Serraview Detailed Root Cause Analysis – January 19, 2024
UAT Inaccessible
We are truly grateful for your continued support and loyalty. We value your feedback and appreciate your patience as we worked to resolve this incident.
Description:
On January 19, 2024, internal and external customers began reporting that they could not access their Serraview UAT instances. When users tried to log in, they were presented with a timeout error or a server error.
Type of Event:
Due to the adverse effects experienced by a growing subset of customers, an incident was initiated, and our internal teams promptly recognized and addressed the issue. It's important to note that incidents arising from UAT environments do not result in any breach of SLA.
Services/Modules Impacted:
UAT
Timeline:
9:19am EST – Internal teams and customers began to report that they could not access their Serraview UAT instances. When users tried to log in, they were presented with a timeout error or a server error.
9:46am EST (approximate) – As additional customers reported the issue, the initial investigation ticket was upgraded to high priority, and all customers were notified of the incident via the Status Page.
9:51am EST – Our cloud ops team acknowledged the issue and began investigating.
10:15am EST (approximate) – The team implemented the fix.
1:18pm EST – After continuing to monitor the resolution, customer support notified all customers that the issue had been resolved.
Root Cause Analysis:
An internal service was consuming 100% CPU, which caused connections to drop.
Remediation:
After investigating, our internal teams restarted services to resolve the disruption to UAT.
Preventative Action:
Our internal teams continue to enhance monitoring for these internal services.