Database Connection Issues

Incident Report for Twingate

Postmortem

Components impacted

Management: Public API

Management: Admin Console

Summary

On June 26, 2024 between 20:16 and 20:24 UTC, Twingate’s SQL proxies restarted, causing a brief failure to a small percentage of calls made to our Public API (Terraform, Pulumi, k8s Operator, etc.) and to the Admin console. There was no impact to Clients or Connectors. A change to our SQL proxy deployments that was targeting staging and development environments was pushed to production due to a misconfiguration, causing our SQL proxy instances to restart.

Root cause

Due to a misconfiguration, a change to our SQL proxy deployments intended for staging and development environments was pushed to production, causing them to restart.

Corrective actions

Short Term:

  • Ensure that SQL proxy deployments are only pushed in a controlled manner by resuming the GitOps workflow manually.
  • Fix the misconfiguration in our GitOps deployment mechanism for our SQL proxy deployments and set the Helm chart version to a static value so that all upgrades are done in a controlled manner.
  • Enhance our SQL proxy Helm chart to reduce the impact to services during updates and upgrades.
Posted Jun 29, 2024 - 08:31 UTC

Resolved

On June 26, 2024, between 8:16 pm UTC and 8:28 pm UTC, Twingate experienced several database connectivity alerts due to a failed rollout of one of its components. The rollout was promptly reversed, and our existing reliability measures prevented any major disruption to customer traffic.
Posted Jun 26, 2024 - 22:01 UTC
This incident affected: Management (Public API, Admin Console) and Control Plane (Connector Heartbeat).