S1 Inability to Access Serraview
Incident Report for Serraview
Postmortem

Serraview Detailed Root Cause Analysis | 4.08.2024

S1 – Login errors

We are truly grateful for your continued support and loyalty. We value your feedback and appreciate your patience as we worked to resolve this incident.

Description:

On April 8th, 2024, internal teams and customer support received reports that users of some Serraview instances were unexpectedly unable to log in. This issue impacted a small number of Serraview customers.

The login issue was caused by a routine migration that encountered a performance error and became stuck, which prevented some customers from logging in.

Upon receiving notice of the login issues, our dedicated team promptly took action to restart the migration process, ensuring its successful completion and resolving the issue.

Type of Event:

Service login disruption.

Remediation:

Immediately upon notice of the login issues, our dedicated Serraview team restarted the migration process and saw it through to successful completion, resolving the issue.

Timeline:

 April 8

  • (8:41 AM) – Internal teams were notified that some customers were experiencing unexpected login issues because the migration process to upgrade the product had not completed properly.
  • (9:58 AM) – The migration process to update Serraview was restarted by our team and monitored through successful completion.
  • (11:18 AM) – The migration completed successfully. Monitoring began to confirm the resolution, after which the all-clear was declared and the incident was closed.

Total Duration of Event:

(0 days/1 hour/17 minutes)

Root Cause Analysis:

Our Engineering team determined that the root cause was a processing error in a routine migration to upgrade the Serraview database and some pods, which caused the process to get stuck. As a result, some customers had difficulty logging in and did not receive notice of the downtime.

Preventative Action:

The Serraview team restarted the migration process, monitored it through successful completion, and reviewed the result with internal testing to ensure the issue was resolved for customers. This incident has been closed, but our team is dedicated to closely monitoring future updates as they are released to ensure the best customer experience.

Posted Apr 16, 2024 - 16:04 UTC

Resolved
The issue is confirmed resolved and we will be taking this out of monitoring. Thank you for the continued patience as we worked through the issue at hand.
Posted Apr 08, 2024 - 18:43 UTC
Monitoring
The issue has been found and a fix has been implemented by our Engineering team. We will continue monitoring for the next hour while we wait for further confirmation that the issue is resolved. If you are still experiencing issues, please don't hesitate to reach out.
Posted Apr 08, 2024 - 16:08 UTC
Investigating
We are currently investigating an issue with the inability to access Serraview. Our Engineering team is working to determine the cause of the disruption. The next update will be posted at 12:14pm CST.
Posted Apr 08, 2024 - 15:15 UTC
This incident affected: Core Services (APAC - Core Services, EMEA - Core Services, NA - Core Services), Space Planning (APAC - Space Planning, EMEA - Space Planning, NA - Space Planning), and Engage, Insights, Public API, SV Live / Locator.