S2 - Insights data is slow to load or intermittently fails to load
Incident Report for Serraview
Postmortem

Description:  

On September 15, 2023, our customer experience team identified an issue with Insights not displaying data after the dashboard had loaded.  

Upon investigation, our engineering team determined that the root cause was several ETL functions running longer than their allowed time, causing the ETL to time out. 

 

Type of Event:  

Insights dashboards loading but not displaying data.  

  

Services/Modules Impacted:  

Insights (including Direct Connect and Editor). 

  

Remediation:  

The ETL functions were migrated from the SQL server to Insights' Snowflake server, improving ETL speed by 30-50%. 

  

Timeline (AEST):  

12th September 

  • 00:00 – Issue first began 

15th September  

  • 02:57 – ETL failed, alert triggered 
  • 09:42 – Server restarted; incident resolved 

19th September 

  • 09:00 – Issue reappeared 

23rd September 

  • 15:24 – Fix applied, issue resolved 

   

Total Duration of Event:  

~7 days and 16 hours (excluding the period between the 15 September restart and the 19 September recurrence).  
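The total can be checked against the timeline. A minimal Python sketch, assuming the two outage windows are 12 September 00:00 to 15 September 09:42 and 19 September 09:00 to 23 September 15:24 (all AEST), with the time between the restart and the recurrence not counted:

```python
from datetime import datetime, timedelta

# Outage windows taken from the timeline (AEST). The gap between the
# 15 September restart and the 19 September recurrence is not counted.
outages = [
    (datetime(2023, 9, 12, 0, 0), datetime(2023, 9, 15, 9, 42)),
    (datetime(2023, 9, 19, 9, 0), datetime(2023, 9, 23, 15, 24)),
]

# Sum the length of each window, starting from a zero timedelta.
total = sum((end - start for start, end in outages), timedelta())

# Break the total into days, hours and minutes for display.
days = total.days
hours, remainder = divmod(total.seconds, 3600)
minutes = remainder // 60
print(f"{days} days, {hours} hours, {minutes} minutes")
```

This prints `7 days, 16 hours, 6 minutes`, consistent with the ~7 days and 16 hours stated above.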

  

Root Cause Analysis:  

The scalar functions used by the Insights ETL (run hourly for live-data clients and nightly for all other clients) were slow because they ran on the SQL server rather than on Insights' Snowflake server, so the ETL timed out before completing. Once the functions were moved to the Snowflake server, the ETL completed within the timeout. 

 

Preventative Action:   

We will optimize the data extraction method that Insights uses and improve proactive ETL monitoring to prevent timeout issues in the future.

Posted Oct 17, 2023 - 00:27 UTC

Resolved
This incident has been resolved.
Posted Sep 26, 2023 - 13:33 UTC
Monitoring
A fix was implemented overnight, and the Insights ETL is now running without timeout errors; as a result, dashboards load as expected. We will remain in a monitoring phase while we continue working on the two known issues below:

* Floorplan highlight shapes (polygons) are not being updated and, for the time being, will remain in Insights as they appeared on 2023-09-20.
* Custom field/tag definitions (not the values of the custom fields/tags) for client Looker environments will remain as they appeared on 2023-09-20.
Posted Sep 23, 2023 - 15:18 UTC
Update
We have made progress and are continuing to work on the fix for this issue. We will continue to post updates as information becomes available.
Posted Sep 23, 2023 - 00:44 UTC
Update
We are continuing to work on a complete fix for this issue. Engineering and QA teams are currently testing an interim workaround that restores key Insights functionality but may affect Floorplan Highlight updates.
Posted Sep 22, 2023 - 21:10 UTC
Update
We are continuing to work on a fix for this issue.
Posted Sep 22, 2023 - 15:16 UTC
Update
We are continuing to work on a fix for this issue.
Posted Sep 22, 2023 - 06:07 UTC
Update
As we continue to investigate the issue, please note that some clients may be able to run Insights reports for date periods before the affected dates as a temporary workaround.
To find the dates for which you may be able to run reports, open the ETL Status Dashboard (Shared folder > Utility Dashboards folder > ETL Status) and look for the latest row where Mode = "Full" and Status = "SUCCESS". Any dashboards filtered for date periods before this time should display correctly.
More information can be found at https://knowledge.eptura.com/Serraview/Insights/Insights_Dashboards/Utility_Folder/ETL_Status_Dashboard
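The lookup described above amounts to filtering the dashboard rows and taking the most recent match. A minimal Python sketch, assuming the dashboard rows have been exported as records; the field names `run_time`, `mode` and `status`, the function `latest_full_success`, and the sample rows are all hypothetical and for illustration only:

```python
from datetime import datetime

# Hypothetical rows as they might appear in the ETL Status dashboard.
# Field names and values are illustrative, not the dashboard's actual schema.
rows = [
    {"run_time": datetime(2023, 9, 19, 2, 0), "mode": "Full", "status": "SUCCESS"},
    {"run_time": datetime(2023, 9, 20, 2, 0), "mode": "Full", "status": "SUCCESS"},
    {"run_time": datetime(2023, 9, 20, 14, 0), "mode": "Incremental", "status": "SUCCESS"},
    {"run_time": datetime(2023, 9, 21, 2, 0), "mode": "Full", "status": "FAILED"},
]


def latest_full_success(rows):
    """Return the run time of the latest Full run with status SUCCESS, or None."""
    candidates = [
        r["run_time"]
        for r in rows
        if r["mode"] == "Full" and r["status"] == "SUCCESS"
    ]
    return max(candidates, default=None)


cutoff = latest_full_success(rows)
if cutoff is None:
    print("No successful Full ETL run found")
else:
    print(f"Dashboards filtered to dates before {cutoff:%Y-%m-%d %H:%M} should display correctly")
```

With these sample rows, the failed Full run and the non-Full run are skipped, so the cutoff is the 2023-09-20 02:00 run.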
Posted Sep 22, 2023 - 03:43 UTC
Update
We are continuing to work on a fix for this issue.
Posted Sep 22, 2023 - 00:22 UTC
Identified
We have identified the issue and will restart our database service at 5:00 PM CST; the restart is expected to take approximately 20-30 minutes.
Posted Sep 21, 2023 - 18:19 UTC
Update
We are continuing our investigation and have determined that a service restart is required. We will post an update with the maintenance window.
Posted Sep 21, 2023 - 17:58 UTC
Investigating
We are currently investigating this issue.
Posted Sep 21, 2023 - 14:25 UTC
This incident affected: Insights.