Description:
On September 26, 2023, our customer experience team identified an issue with US Production instances that prevented users from logging in.
Our engineering team investigated and traced the problem to a lost connection between the Windows server and the Serraview US Production domain.
Type of Event:
Unplanned outage for US clients.
Services/Modules Impacted:
All services and modules in US Production instances.
Remediation:
The connection between the Windows server and the domain was restored and the service was restarted, bringing US Production instances back online at 16:48 AEST.
Timeline (AEST):
26 September 2023
15:05 – Issue raised
15:51 – Issue identified – Severity 1 outage incident triggered.
16:48 – Service restarted – incident resolved.
Total Duration of Event:
Approximately 1 hour and 43 minutes (15:05 to 16:48 AEST).
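The duration can be checked directly from the timeline entries. The snippet below is a minimal sketch of that arithmetic; the helper name hours_minutes is illustrative only, and the timestamps are taken as naive AEST values from the timeline above.

```python
from datetime import datetime

# Timestamps copied from the timeline (AEST, 26 September 2023).
FMT = "%Y-%m-%d %H:%M"
raised = datetime.strptime("2023-09-26 15:05", FMT)
identified = datetime.strptime("2023-09-26 15:51", FMT)
resolved = datetime.strptime("2023-09-26 16:48", FMT)


def hours_minutes(start, end):
    """Return the elapsed time between two datetimes as (hours, minutes)."""
    total_minutes = int((end - start).total_seconds() // 60)
    return divmod(total_minutes, 60)


print(hours_minutes(raised, resolved))      # (1, 43): total duration of the event
print(hours_minutes(raised, identified))    # (0, 46): issue raised to issue identified
print(hours_minutes(identified, resolved))  # (0, 57): issue identified to resolution
```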
Root Cause Analysis:
The connection between the Windows server and the Serraview US Production domain was lost, which prevented users from logging in.
Preventative Action:
Dedicated monitoring and alerting will be set up to detect when the connection between the Windows server and the domain is lost.
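As an illustration only, the alerting logic described above might look something like the following sketch. This report does not specify how the connection is checked or which alerting tool will be used, so the check and alert callables, the class name DomainConnectionMonitor, and the failure threshold of 3 are all assumptions made for the example. The sketch raises one alert after a run of consecutive failed checks and one notification when the connection is restored, rather than alerting on every poll.

```python
from typing import Callable


class DomainConnectionMonitor:
    """Hypothetical monitor: alerts once after repeated failed connection checks."""

    def __init__(
        self,
        check: Callable[[], bool],     # assumed probe: returns True when the connection is healthy
        alert: Callable[[str], None],  # assumed notifier: e.g. pages the on-call engineer
        failure_threshold: int = 3,    # assumed value; tune to balance noise against detection time
    ) -> None:
        self.check = check
        self.alert = alert
        self.failure_threshold = failure_threshold
        self.consecutive_failures = 0
        self.alerted = False

    def poll_once(self) -> None:
        """Run one check and raise or clear the alert as needed."""
        try:
            healthy = bool(self.check())
        except Exception:
            # A probe that raises is treated the same as a failed check.
            healthy = False

        if healthy:
            if self.alerted:
                self.alert("Connection between the Windows server and the domain restored.")
            self.consecutive_failures = 0
            self.alerted = False
            return

        self.consecutive_failures += 1
        if self.consecutive_failures >= self.failure_threshold and not self.alerted:
            self.alert("Connection between the Windows server and the domain is disconnected.")
            self.alerted = True


# Simulated run: one healthy check, four failures, then recovery.
simulated = iter([True, False, False, False, False, True])
demo = DomainConnectionMonitor(lambda: next(simulated), print, failure_threshold=3)
for _ in range(6):
    demo.poll_once()
```

In practice poll_once would be called on a fixed schedule, and the interval together with the threshold determines how quickly a disconnection is detected.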