Previous incidents
Log Ingestion Delay Due to Google Pub/Sub Slowness
Resolved Aug 28 at 03:00pm IST
This is now resolved, and ingestion is back to normal speed.
Log ingestion has slowed down
Resolved Aug 13 at 01:43am IST
The node has recovered and ingestion has resumed.
[Downstream service issue] Log ingestion buffer is taking more time than expected
Resolved Aug 06 at 01:19am IST
We’ve received an update from the ClickHouse team. Here’s the crux of the issue:
"Due to memory starvation, other processes in your cluster are starting to fail, resulting in degraded performance, likely including the failed writes you are experiencing."
All queued logs have been written to the disk and the instance is back to normal.
We’re still keeping an eye on our data pipelines. We’ve also put together a checklist with the team to help prevent this from happening again.
Degraded service
Resolved Jul 17 at 08:42pm IST
Latency is back to normal. We will keep an eye on the system for the next few hours.
Dashboard is down
Resolved Jul 09 at 09:40pm IST
Postmortem:
A failover of one of the Kafka brokers triggered a cascading effect on an event queue, which led to degraded dashboard rendering and increased latency on some API endpoints.
We fully recovered within 3 minutes, but the system experienced intermittent degradation for the following 20 minutes. To prevent recurrence, we have added an extra replica and increased pod affinity.
We apologize for the inconvenience.
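For readers curious what the mitigation above looks like in practice, here is a minimal sketch of a Kubernetes deployment fragment with an extra replica and an affinity rule that spreads pods across nodes. All names are hypothetical and our actual manifests differ; this is an illustration, not our production config:

```yaml
# Hypothetical fragment illustrating the mitigation: one additional
# replica, plus an anti-affinity rule so that replicas of the
# event-queue consumer are scheduled onto different nodes.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: event-queue-consumer    # hypothetical workload name
spec:
  replicas: 3                   # raised by one as the extra replica
  selector:
    matchLabels:
      app: event-queue-consumer
  template:
    metadata:
      labels:
        app: event-queue-consumer
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchLabels:
                  app: event-queue-consumer
              topologyKey: kubernetes.io/hostname  # spread across nodes
```

With this rule in place, a single node or broker failover takes out at most one replica, so the event queue keeps draining while the failed pod reschedules.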
API Service is down
Resolved Jun 08 at 09:09am IST
API Service recovered.