Previous incidents

October 2024
Oct 04, 2024
2 incidents

Degraded availability (test runs, prompt playground)

Degraded

Resolved Oct 04 at 08:39pm IST

All services are back up.

RCA

  • We use a third-party library to acquire distributed locks, and it expects specific Lua scripts to be cached in Redis. At 6:00 AM PT today, we found that the Redis cache had been cleared because of disk corruption, which deleted these scripts.
  • We learned that the library does not reload the scripts on its own, so we had to reload them manually. Once they were reloaded, the system worked as expected.
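
For readers hitting a similar failure, here is a minimal sketch of one common safeguard: call the script by its SHA1 and, if Redis reports that the script is not cached, load it again and retry once. This is not the library's code or our production fix. The Lua script, the `run_cached_script` helper, the local `NoScriptError` class, and the `FakeRedis` stand-in are all hypothetical names used for illustration. With the redis-py client, the equivalent error would be `redis.exceptions.NoScriptError`, and `evalsha` / `script_load` take the arguments shown here.

```python
import hashlib

# Hypothetical lock-release script: delete the key only if the caller still owns it.
# The scripts used by the actual library are not shown in this report.
RELEASE_LOCK_LUA = """
if redis.call("get", KEYS[1]) == ARGV[1] then
    return redis.call("del", KEYS[1])
end
return 0
"""


class NoScriptError(Exception):
    """Local stand-in for the NOSCRIPT error a Redis client raises."""


def run_cached_script(client, script, keys, args):
    """Run `script` by SHA1, reloading it if the server no longer has it cached."""
    # Redis identifies cached scripts by the SHA1 hex digest of the script body.
    sha = hashlib.sha1(script.encode("utf-8")).hexdigest()
    try:
        return client.evalsha(sha, len(keys), *keys, *args)
    except NoScriptError:
        # The script cache was lost (restart, flush, data loss): reload and retry once.
        client.script_load(script)
        return client.evalsha(sha, len(keys), *keys, *args)


class FakeRedis:
    """Tiny in-memory stand-in, used only to exercise the retry path."""

    def __init__(self):
        self.scripts = set()
        self.loads = 0

    def script_load(self, script):
        sha = hashlib.sha1(script.encode("utf-8")).hexdigest()
        self.scripts.add(sha)
        self.loads += 1
        return sha

    def evalsha(self, sha, numkeys, *keys_and_args):
        if sha not in self.scripts:
            raise NoScriptError("NOSCRIPT No matching script")
        return 1  # pretend the lock was released
```

With a guard like this, a cleared script cache costs one extra round trip on the first call after the loss instead of requiring a manual reload.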

Dashboard and API Service are down

Downtime

Resolved Oct 04 at 02:06pm IST

API Service recovered.

September 2024
Sep 25, 2024
1 incident

Degraded performance for test runs

Degraded

Resolved Sep 25 at 12:37pm IST

The issue is mitigated. We are now working on a long-term fix.

August 2024
No incidents reported