We are truly grateful for your continued support and loyalty. We value your feedback and appreciate your patience as we worked to resolve this incident.
Description:
On November 21, 2023, our customer experience team identified an issue with the European server for Serraview Live (SVLive) that caused an outage across Locator, Engage and reporting.
Our engineering team investigated and found the root cause: one of the AWS services was using more memory than expected, leaving the server without enough available memory.
Type of Event:
Unplanned SVLive outage for European clients.
Services/Modules Impacted:
Locator, Engage and reports (including Insights).
Remediation:
We temporarily disabled the service responsible for the excess memory usage and restarted the application pool on the SVLive server.
Timeline (AEDT):
21st November
10:56 – Issue raised
13:32 – Service disabled temporarily and app pool restarted
14:32 – Issue confirmed resolved
Total Duration of Event:
Approximately 3 hours and 36 minutes, from the issue being raised (10:56) to confirmed resolution (14:32).
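The duration can be checked against the timeline with a short calculation. This is only an illustrative sketch: the helper name `event_duration` is hypothetical, and it assumes both timestamps fall on the same day in the same time zone (AEDT here).

```python
from datetime import datetime


def event_duration(start: str, end: str, fmt: str = "%H:%M") -> tuple[int, int]:
    """Return (hours, minutes) between two same-day clock times."""
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    total_minutes = int(delta.total_seconds() // 60)
    return divmod(total_minutes, 60)


# Issue raised at 10:56, confirmed resolved at 14:32 (from the timeline).
hours, minutes = event_duration("10:56", "14:32")
print(f"{hours} hours and {minutes} minutes")  # 3 hours and 36 minutes
```

The same helper gives 2 hours and 36 minutes for the interval between the issue being raised (10:56) and the service being disabled (13:32).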
Root Cause Analysis:
One of the AWS services used more memory than intended, leaving the European SVLive server without enough available memory.
Preventative Action:
A support case was raised with AWS to identify the underlying issue and the steps needed to resolve it permanently.
Additional SVLive URLs were added to uptime monitoring so that we are alerted promptly if similar issues occur.
We will increase resiliency by adding high availability (running multiple instances of the app at the same time).
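As a rough illustration of what an uptime check on a set of URLs involves, the sketch below probes each URL and reports the ones that do not respond. It is not the monitoring tool we use; the function names `is_up` and `failing_urls`, the 5-second timeout, and the rule that any 2xx or 3xx response counts as healthy are all assumptions made for the example.

```python
import urllib.request
from typing import Callable, Iterable, List


def is_up(url: str, timeout: float = 5.0) -> bool:
    """Return True if the URL answers with a 2xx/3xx status within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (OSError, ValueError):
        # OSError covers connection failures, timeouts and HTTP error statuses
        # (urllib.error.URLError and HTTPError are subclasses of OSError);
        # ValueError covers malformed URLs.
        return False


def failing_urls(urls: Iterable[str], probe: Callable[[str], bool] = is_up) -> List[str]:
    """Return the URLs whose probe failed, in the order they were given."""
    return [url for url in urls if not probe(url)]
```

A scheduler would call `failing_urls` with the monitored SVLive URLs at a fixed interval and raise an alert whenever the returned list is non-empty. Passing the probe as a parameter keeps the check easy to test without network access.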