Summary
On August 19 at 7:51 AM UTC, Twingate received alerts about problems with its login services, and within a few minutes the engineering team began investigating. The team quickly identified that the backend was seeing excessive timeouts from a 3rd-party API, which prevented it from processing other requests, such as authentication. After initial fixes were unsuccessful, Twingate contacted the 3rd party and disabled support for the real-time updates that depend on these 3rd-party API calls. Services began recovering at 8:10 AM UTC; most recovered quickly, and the incident was fully resolved at 8:15 AM UTC.
The vendor later confirmed and fixed the issue, and Twingate re-enabled the real-time update feature later the same day, August 19.
Root cause
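Calls from the Twingate backend to a 3rd-party API were timing out in large numbers. Waiting on those calls exhausted the backend's capacity, leaving it unable to process other requests such as authentication.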
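The report does not describe how the backend calls the 3rd-party API, so the following is only a minimal sketch of the general failure mode and of the mitigation described in the summary, not Twingate's implementation. It assumes a hypothetical feature flag (`FLAGS["realtime_updates"]`), a hypothetical helper (`push_realtime_update`), and stand-in vendor functions. The idea is that the caller waits only a bounded time for the 3rd-party call, and that the feature can be switched off so the call is skipped entirely.

```python
import concurrent.futures
import time

# Hypothetical kill switch; the report says the feature was disabled, not how.
FLAGS = {"realtime_updates": True}

# A small dedicated pool, so slow 3rd-party calls cannot occupy the workers
# that serve other requests (an assumption made for this sketch).
_third_party_pool = concurrent.futures.ThreadPoolExecutor(max_workers=4)


def push_realtime_update(send, payload, timeout_s=1.0):
    """Try to send a real-time update; return True only if it finished in time."""
    if not FLAGS["realtime_updates"]:
        return False  # feature disabled: skip the 3rd-party call entirely
    future = _third_party_pool.submit(send, payload)
    try:
        future.result(timeout=timeout_s)
        return True
    except concurrent.futures.TimeoutError:
        # The caller stops waiting here. The worker thread keeps running until
        # send() returns; cancel() only helps if the call has not started yet.
        future.cancel()
        return False


def slow_vendor_api(payload):
    """Stand-in for a 3rd-party call that hangs."""
    time.sleep(0.3)
    return "ok"


def fast_vendor_api(payload):
    """Stand-in for a healthy 3rd-party call."""
    return "ok"
```

With the flag enabled, `push_realtime_update(slow_vendor_api, {"id": 2}, timeout_s=0.05)` gives up after the deadline instead of blocking the caller. Because the worker thread is still tied up until the vendor call returns, a caller-side deadline alone does not free capacity; setting a timeout on the HTTP client itself, where the client supports one, is usually the more direct control.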
Post-incident Analysis
Twingate had already separated most services into their own deployments, which allowed those services to keep functioning throughout the incident. As a result, only some users who needed to authenticate or re-authenticate were affected; users who had authenticated before the incident were not impacted.
Post-incident analysis of logs showed that the incident started at 7:49 AM UTC and that services had fully recovered by 8:15 AM UTC.
Corrective actions
Short Term:
Medium / Long Term: