S2 - Engage Outage
Incident Report for Serraview
Postmortem

We are truly grateful for your continued support and loyalty. We value your feedback and appreciate your patience as we worked to resolve this incident.

Description:

On February 4, 2024, at approximately 11:16 PM EST, customer support received reports of Serraview Engage Web being inaccessible for some users.

This was caused by the deployment of a release that contained an incorrect configuration.

The configuration change was reverted, and a hotfix was pushed to Production which restored service for all users.

Type of Event:

Service disruption.

Services/Modules Impacted:

Engage Web (excludes Engage Mobile).

Remediation:

The configuration change made to the docker file was reverted, and the fix was pushed to Production.

Timeline: 

Feb 4

  • 11:16 PM EST – Customers reported that Engage Web was not loading for some users. Customer support escalated the reports to an S2 incident and posted an alert to our status page. Internal teams acknowledged the issue and began regression testing against the previous product build.

Feb 5

  • 4:35 AM EST – Internal teams finished developing the fix and deployed it to Production.

Total Duration of Event:

5 hours and 19 minutes.
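
For readers who want to check the duration against the timeline, the arithmetic can be sketched as follows. This is a minimal illustration, not part of our tooling: it assumes EST is a fixed UTC-5 offset, and the names `EST`, `format_duration`, `start`, and `end` are chosen for this example only.

```python
from datetime import datetime, timedelta, timezone

# Assumption: EST is treated as a fixed UTC-5 offset (no daylight saving in early February).
EST = timezone(timedelta(hours=-5), "EST")

# Timestamps taken from the timeline in this report.
start = datetime(2024, 2, 4, 23, 16, tzinfo=EST)  # first customer reports
end = datetime(2024, 2, 5, 4, 35, tzinfo=EST)     # fix deployed to Production


def format_duration(delta: timedelta) -> str:
    """Render a timedelta as 'H hours and M minutes'."""
    total_minutes = int(delta.total_seconds()) // 60
    hours, minutes = divmod(total_minutes, 60)
    return f"{hours} hours and {minutes} minutes"


duration_text = format_duration(end - start)
start_utc = start.astimezone(timezone.utc)

print(duration_text)  # 5 hours and 19 minutes
print(start_utc.isoformat())
```

Converting the start time to UTC gives 04:16 UTC on Feb 5, which lines up with the first status page post listed at the end of this report.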

Root Cause Analysis:

Our Engineering team determined that the root cause was an incorrect configuration change made to the docker file in the latest version of Engage Web. As a result, service pods restarted with the incorrect version.

This prevented Engage Web from loading for some users.
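
To illustrate the failure mode described above, the sketch below compares the image version each pod is running against the version a release is expected to deploy. It is a hypothetical example, not the check our teams use: the function names, pod names, registry host, and version numbers are all invented for illustration, and image references pinned by digest are treated as untagged.

```python
def image_tag(image: str) -> str:
    """Return the tag of an image reference, or 'latest' if no tag is given.

    Assumption: a digest suffix ('@sha256:...') is ignored for simplicity.
    """
    without_digest = image.split("@", 1)[0]
    # Only the last path segment can carry a tag; earlier colons may be a registry port.
    last_segment = without_digest.rsplit("/", 1)[-1]
    if ":" in last_segment:
        return last_segment.rsplit(":", 1)[1]
    return "latest"


def find_mismatched_pods(pod_images: dict, expected_tag: str) -> dict:
    """Return the pods whose image tag differs from the expected release tag."""
    return {
        pod: image
        for pod, image in pod_images.items()
        if image_tag(image) != expected_tag
    }


# Hypothetical data: pod names, registry, and versions are made up for this example.
running = {
    "engage-web-1": "registry.example.com:5000/engage-web:2.4.1",
    "engage-web-2": "registry.example.com:5000/engage-web:2.4.0",
}
mismatched = find_mismatched_pods(running, expected_tag="2.4.1")
print(mismatched)
```

In this made-up data set, only `engage-web-2` is reported, because it is running a different version from the one the release was meant to deploy.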

Preventative Action:

The release process for Engage now requires additional testing and approval from the Cloud Ops team before each deployment, to reduce the risk of an incorrect configuration reaching Production.

Posted Feb 14, 2024 - 02:56 UTC

Resolved
This incident has been resolved.
Posted Feb 05, 2024 - 12:29 UTC
Update
We are currently experiencing an issue with Engage and are investigating as we speak.
Posted Feb 05, 2024 - 08:17 UTC
Investigating
We are currently experiencing an issue with Engage and are investigating as we speak.
Posted Feb 05, 2024 - 04:16 UTC
This incident affected: Engage.