Admin Console and authentication incident
Incident Report for Twingate
Postmortem

Summary

On June 7th at 17:08 UTC, a new version of our Controller software was rolled out. Shortly after the rollout completed, our on-call team received automated exception alerts and began investigating. The new version had inadvertently included database cleanup changes that were out of sync with the deployed code, and the team decided to roll back the deployment. The rollback itself proceeded smoothly, but the previous code version expected the database fields that the cleanup had already removed, and the incident started at 17:24 UTC.

A fix was prepared and rolled out starting at 17:29 UTC. Deployment completed on the first cluster at 17:38 UTC and proceeded to the remaining clusters once we verified that the error state had been resolved. Deployment was completed on all clusters at 17:48 UTC.

Post-incident Analysis

We initially thought the incident had impacted only Admin Console users. However, the following systems were impacted:

  • Admin Console sign in.
  • Client initial authentication and re-authentication requests.
  • Linux and container-based Connectors. This incident exposed an issue that resulted in Connectors incorrectly shutting down on transient Controller unavailability.
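
The Connector behavior described above (shutting down when the Controller is briefly unavailable) is commonly avoided by retrying with capped exponential backoff. The following is a minimal sketch of that general technique, not Twingate's actual Connector code; `TransientControllerError`, `backoff_delays`, and `call_with_retry` are hypothetical names introduced only for illustration.

```python
import random
import time


class TransientControllerError(Exception):
    """Hypothetical error: the Controller is temporarily unreachable."""


def backoff_delays(base=1.0, cap=60.0, attempts=5):
    """Return capped exponential delays: base, 2*base, 4*base, ... up to cap."""
    return [min(cap, base * (2 ** i)) for i in range(attempts)]


def call_with_retry(operation, delays, sleep=time.sleep, jitter=random.random):
    """Call operation(), retrying on transient errors instead of exiting."""
    for delay in delays:
        try:
            return operation()
        except TransientControllerError:
            # Sleep for a random fraction of the delay ("full jitter") so that
            # many Connectors do not all retry at the same instant.
            sleep(delay * jitter())
    # Final attempt: if this still fails, the error propagates to the caller.
    return operation()
```

In this sketch the process keeps running through a short outage and gives up only after the last delay has elapsed. A long-running service might instead keep retrying indefinitely at the capped delay.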

Root Cause

An error in our deployment process logic led to a mismatch between the deployed code and the database schema.
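
One general way to catch this class of mismatch is to check, before a deploy or rollback, that every column the target code version reads still exists in the database. The sketch below illustrates the idea using SQLite from the Python standard library as a stand-in; it is an assumption-laden example, not a description of Twingate's tooling, and the `missing_columns` helper, the table, and the column names are all hypothetical.

```python
import sqlite3


def missing_columns(conn, required):
    """Return sorted (table, column) pairs the code needs but the database lacks."""
    missing = []
    for table, columns in required.items():
        # PRAGMA table_info yields one row per column; index 1 is the column name.
        # Table names come from a trusted, hard-coded mapping, not user input.
        rows = conn.execute(f"PRAGMA table_info({table})").fetchall()
        present = {row[1] for row in rows}
        for column in columns:
            if column not in present:
                missing.append((table, column))
    return sorted(missing)


# Hypothetical scenario: a cleanup has already dropped `legacy_token`,
# but the code version being rolled back to still reads it.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
required_by_old_code = {"users": ["id", "email", "legacy_token"]}

problems = missing_columns(conn, required_by_old_code)
if problems:
    print("Refusing to deploy; missing columns:", problems)
```

A check like this would fail the rollout step rather than letting the mismatched code serve traffic. It complements, rather than replaces, sequencing schema removals so they ship only after no deployed code version depends on the removed fields.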

Corrective Actions

  1. Improve our processes for merging software changes that are linked to database schema changes.
  2. We have fixed the bug in our Connector uptime / retry logic and will be testing the fix.
  3. Improve overall build and rollout performance to be able to push fixes more promptly.
Posted Jun 16, 2023 - 17:27 UTC

Resolved
This incident has been resolved. It was caused by an issue with a software update, and we are working on a plan to avoid this in the future. Authentication flows were also impacted; sessions that were already authenticated continued to function.
Posted Jun 07, 2023 - 17:51 UTC
Investigating
We are seeing issues with our admin consoles not loading properly. We are investigating.
Posted Jun 07, 2023 - 17:42 UTC
This incident affected: Management (Admin Console).