Previous incidents
Dashboard, API Service, and 1 other service are down
Resolved Oct 31 at 01:07am IST
Dashboard and API Service have recovered.
6 previous updates
Test runs are degraded
Resolved Oct 30 at 11:30pm IST
Pipelines are working normally. We are still monitoring all states.
1 previous update
Log Ingestion Delay Due to Google Pub/Sub Slowness
Resolved Aug 28 at 03:00pm IST
This is now resolved, and ingestion is back to its original speed.
2 previous updates
Log ingestion has slowed down
Resolved Aug 13 at 01:43am IST
The node has recovered, and ingestion has resumed.
1 previous update
[Downstream service issue] Log ingestion buffer is taking more time than expe...
Resolved Aug 06 at 01:19am IST
We’ve received an update from the ClickHouse team. Here’s the crux of the issue:
"Due to memory starvation, other processes in your cluster are starting to fail, resulting in degraded performance, likely including the failed writes you are experiencing."
All queued logs have been written to disk, and the instance is back to normal.
We’re still keeping an eye on our data pipelines. We’ve also put together a checklist with the team to help prevent this from happening again.
3 previous updates