Previous incidents
October 2024
Oct 04, 2024
2 incidents
Degraded availability (test runs, prompt playground)
Degraded
Resolved Oct 04 at 08:39pm IST
Test runs and the prompt playground are available again.
RCA
- We use a third-party library to acquire distributed locks, and it expects specific Lua scripts to be cached in Redis. At 6:00 AM PT today, we found that the Redis cache had been wiped because of disk corruption, which deleted these scripts.
- We learned that the library does not reload the scripts on its own, so we had to load them manually. With the scripts back in place, the system is working as expected.
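The failure mode in the RCA can be guarded against with a reload-and-retry wrapper. The following is a minimal sketch, not the code of the library involved: `RELEASE_LOCK_LUA`, `run_script`, and the local `NoScriptError` class are hypothetical names used for illustration. It assumes a client object exposing `evalsha(sha, numkeys, *keys_and_args)` and `script_load(script)`, which is the shape redis-py offers; with redis-py you would pass `redis.exceptions.NoScriptError` as `no_script_error`.

```python
import hashlib

# Hypothetical lock-release script; the incident report does not show the real ones.
RELEASE_LOCK_LUA = """
if redis.call("get", KEYS[1]) == ARGV[1] then
    return redis.call("del", KEYS[1])
end
return 0
"""


class NoScriptError(Exception):
    """Stand-in for the error a client raises when Redis replies NOSCRIPT."""


def script_sha1(script: str) -> str:
    # Redis identifies a cached script by the SHA-1 hex digest of its source.
    return hashlib.sha1(script.encode("utf-8")).hexdigest()


def run_script(client, script, keys, args, no_script_error=NoScriptError):
    """Run a cached script, reloading it once if the script cache was emptied."""
    sha = script_sha1(script)
    try:
        return client.evalsha(sha, len(keys), *keys, *args)
    except no_script_error:
        # The cache no longer holds the script (restart, flush, data loss):
        # load it again and retry a single time.
        client.script_load(script)
        return client.evalsha(sha, len(keys), *keys, *args)
```

Retrying only once keeps a genuinely broken script from looping forever, while still recovering automatically when the script cache has been lost.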
Dashboard and API Service are down
Downtime
Resolved Oct 04 at 02:06pm IST
API Service recovered.
September 2024
Sep 25, 2024
1 incident
Degraded performance for test runs
Degraded
Resolved Sep 25 at 12:37pm IST
The issue is mitigated. We are now working on a long-term fix.
August 2024
No incidents reported