On January 9, 2025, ProtonMail, a leading provider of encrypted email services, suffered a significant disruption that left many users unable to access their accounts. The outage, which lasted approximately two hours, exposed the limits of the platform's infrastructure while it is partway through a major migration.
Incident Overview
The outage began at approximately 4:00 PM Zurich time with a sudden surge in database connections across Proton's global infrastructure. The unexpected load overwhelmed the system and made services intermittently unavailable. Proton VPN, Proton Pass, and other related services were restored quickly, but ProtonMail and Proton Calendar remained affected for longer. At the peak of the outage, nearly 50% of user requests failed.
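To illustrate how a surge in database connections can be contained rather than allowed to overwhelm a backend, here is a minimal sketch of a connection cap that rejects excess requests immediately instead of queueing them. It is not Proton's implementation; `BoundedConnectionGate`, `ConnectionLimitExceeded`, and the cap value are hypothetical names and numbers chosen for illustration.

```python
import threading
from contextlib import contextmanager


class ConnectionLimitExceeded(Exception):
    """Raised when a request is shed because the cap is reached (hypothetical)."""


class BoundedConnectionGate:
    """Caps concurrent database connections; illustrative only."""

    def __init__(self, max_connections):
        self._slots = threading.BoundedSemaphore(max_connections)

    @contextmanager
    def connection(self):
        # Fail fast instead of blocking, so a surge sheds load
        # rather than piling up behind the database.
        if not self._slots.acquire(blocking=False):
            raise ConnectionLimitExceeded("connection cap reached; request shed")
        try:
            yield
        finally:
            self._slots.release()
```

With a cap of 2, a third concurrent `with gate.connection():` raises `ConnectionLimitExceeded`, and a slot becomes available again once an earlier block exits. Shedding requests this way trades some failed requests for keeping the database responsive to the rest.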
Root Cause
The disruption was traced back to a software change deployed before the outage began. The delay between the update and the onset of the outage initially obscured the connection between the two. A preliminary review of the code found nothing that should have affected database connections, so further investigation was needed to establish exactly how the change triggered the surge.
Infrastructure Challenges
ProtonMail is in the process of migrating to a new Kubernetes-based infrastructure intended to improve scalability and resilience. During the migration, two infrastructures run side by side, which limits the platform's ability to redistribute load efficiently. Many services have already moved, but ProtonMail is still mid-migration and therefore could not scale automatically in response to the unexpected load spike.
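For a sense of what automatic scaling means in Kubernetes terms, the sketch below reproduces the replica calculation documented for the Horizontal Pod Autoscaler: the desired replica count is the current count multiplied by the ratio of the observed metric to its target, rounded up, with no change when that ratio is within a tolerance of 1.0 (0.1 by default). Nothing here reflects Proton's configuration; the function name, metric values, and replica bounds are assumptions for illustration.

```python
import math


def desired_replicas(current_replicas, current_metric, target_metric,
                     min_replicas=1, max_replicas=10, tolerance=0.1):
    """Approximate the HPA replica formula; bounds and inputs are hypothetical."""
    ratio = current_metric / target_metric
    # Within the tolerance band the autoscaler leaves the replica count alone.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    desired = math.ceil(current_replicas * ratio)
    # Clamp to the configured minimum and maximum replica counts.
    return max(min_replicas, min(max_replicas, desired))
```

For example, `desired_replicas(4, 200, 100, max_replicas=20)` returns 8: when the observed load is twice the target, the replica count doubles, subject to the configured maximum.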
"Is @ProtonMail down? I've tried on several devices." (Ken, @Kheniwyze, January 9, 2025)
Resolution and Future Plans
By 6:00 PM Zurich time, ProtonMail services were fully restored. The ongoing Kubernetes migration aims to prevent such outages in the future, improving redundancy and scalability. Proton’s technical team has committed to accelerating the migration to bolster the platform’s reliability.
User Impact
During the outage, users worldwide reported difficulties accessing ProtonMail, with many encountering error messages indicating server unavailability. The intermittent nature of the disruption caused frustration among individuals and businesses relying on ProtonMail for secure communications.
Lessons Learned
- Importance of Scalability: The incident underscores the need for infrastructure that can dynamically adapt to unexpected demand.
- Improved Monitoring: Real-time monitoring systems are essential for detecting anomalies before they escalate.
- Commitment to Security: Despite the outage, ProtonMail remains steadfast in its mission to provide secure and private communication solutions.
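As one way to picture the monitoring point above, the following sketch flags a connection count that rises far above a rolling baseline. It is a simplified illustration, not a description of Proton's monitoring; the class name, window size, threshold, and sample values are all assumptions.

```python
from collections import deque
from statistics import mean, stdev


class ConnectionSpikeDetector:
    """Flags samples far above a rolling baseline; illustrative only."""

    def __init__(self, window=30, threshold=3.0, min_samples=5):
        self.history = deque(maxlen=window)
        self.threshold = threshold
        self.min_samples = min_samples

    def observe(self, count):
        """Return True if count is anomalous relative to recent history."""
        anomalous = False
        if len(self.history) >= self.min_samples:
            baseline = mean(self.history)
            # Floor the spread at 1.0 so flat traffic does not trigger alerts.
            spread = max(stdev(self.history), 1.0)
            anomalous = count > baseline + self.threshold * spread
        # Keep spikes out of the baseline so a sustained surge
        # does not become the new "normal".
        if not anomalous:
            self.history.append(count)
        return anomalous
```

Fed a steady series around 100 connections, `observe` returns False; a sudden reading of 400 returns True. A production system would typically combine such a signal with alert routing and rate-of-change checks.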
Conclusion
The January 9 outage was disruptive, but it also offered valuable insight into ProtonMail's systems and operations. Completing the Kubernetes migration is expected to improve the platform's ability to absorb unforeseen incidents and support its commitment to reliable and secure services.
For ongoing updates and information, visit the ProtonMail status page.