Recent DNS Activity Unavailable
Incident Report for Twingate
Postmortem

Components impacted

Management: Admin Console

Summary

On April 20, 2024, between 5:32 GMT and 6:57 GMT, Recent DNS Activity on the Admin Console became unavailable.

Shortly after the incident began, the Twingate on-call team received alerts about abnormal database activity. Workers on the clusters that manage DNS filtering logs started seeing errors from the logs API, which led to excessive retries and database writes. To mitigate the issue, the team temporarily disabled the DNS Log Streaming workers.

The root cause was identified as a malfunction in the DNS Filtering Log API caused by a problematic dependency upgrade. As a result, DNS filtering logs and analytics were temporarily unavailable in the Admin Console.

The dependency upgrade was rolled back, and normal operations were restored at 6:57 GMT, after which DNS filtering logs and analytics were available again in the Admin Console.

Root cause

The DNS Filtering Log API went down due to a bad dependency upgrade.

Corrective actions

Already completed:

  • Fixed the Admin Console's infinite retry logic by improving how DNS activity logs are retrieved when errors occur.
  • Optimized DNS Log Streaming retry and database write procedures to reduce unnecessary operations when no events are returned from the DNS Filtering API.
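The two completed fixes follow a common pattern: cap retries against a failing upstream API instead of retrying forever, and skip the database write when a poll returns no events. The sketch below illustrates only that general pattern, assuming a worker that polls a logs API and writes batches to a database; it is not Twingate's implementation, and `poll_once`, `fetch_events`, `write_events`, `UpstreamUnavailable`, and the retry limits are hypothetical names and values.

```python
import time
from typing import Callable, Sequence

# Hypothetical limits; the report does not state the real values.
MAX_ATTEMPTS = 5
BASE_DELAY_S = 0.5
MAX_DELAY_S = 30.0


class UpstreamUnavailable(Exception):
    """Raised when the (hypothetical) logs API keeps failing after all retries."""


def poll_once(
    fetch_events: Callable[[], Sequence[dict]],
    write_events: Callable[[Sequence[dict]], None],
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Fetch one batch with bounded retries; write it only if it is non-empty.

    Returns the number of events written.
    """
    for attempt in range(MAX_ATTEMPTS):
        try:
            events = fetch_events()
        except Exception as exc:
            if attempt == MAX_ATTEMPTS - 1:
                # Give up instead of retrying forever; the caller decides what to do next.
                raise UpstreamUnavailable(
                    f"logs API failed {MAX_ATTEMPTS} times"
                ) from exc
            # Exponential backoff, capped so a long outage does not cause huge waits.
            sleep(min(BASE_DELAY_S * 2 ** attempt, MAX_DELAY_S))
            continue
        if not events:
            # No new events: skip the database write entirely.
            return 0
        write_events(events)
        return len(events)
    return 0  # Only reached if MAX_ATTEMPTS is 0.


if __name__ == "__main__":
    calls = {"n": 0}

    def flaky_api() -> list:
        # Fails twice, then returns one event.
        calls["n"] += 1
        if calls["n"] < 3:
            raise ConnectionError("logs API error")
        return [{"domain": "example.com"}]

    written: list = []
    print(poll_once(flaky_api, written.extend, sleep=lambda s: None))
```

Running the script prints `1`: the first two calls fail and are retried with backoff, and the third returns one event, which is written once. Injecting `sleep` keeps the backoff testable without real delays.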

Short-term:

  • Improve the dependency upgrade process for the DNS Filtering API.
Posted Apr 25, 2024 - 14:45 UTC

Resolved
This incident has been resolved.
Posted Apr 19, 2024 - 22:04 UTC
Monitoring
A fix has been implemented and we are monitoring the results.
Posted Apr 19, 2024 - 21:58 UTC
Identified
Recent DNS Activity in the Admin Console is unavailable. We have identified the issue and are working on a fix.
Posted Apr 19, 2024 - 21:45 UTC
This incident affected: Management (Admin Console).