Degraded

[Downstream service issue] Log ingestion buffer is taking more time than expected

Aug 06 at 12:42am IST
Affected services
Dashboard

Resolved
Aug 06 at 01:19am IST

We’ve received an update from the ClickHouse team. Here’s the crux of the issue:

"Due to memory starvation, other processes in your cluster are starting to fail, resulting in degraded performance, likely including the failed writes you are experiencing."

All queued logs have been written to disk, and the instance is back to normal.

We’re still keeping an eye on our data pipelines. We’ve also put together a checklist with the team to help prevent this from happening again.

Updated
Aug 06 at 01:10am IST

The ClickHouse team is still looking into it. Rest assured, all the data is safe in our data lake. We’ve marked the timestamps where the slowdown started. Once the ClickHouse team gives us the green light, we’ll re-audit the volume and replay ingestion if any logs are missing.

Updated
Aug 06 at 12:58am IST

Ingestion is now back to normal speed. We have asked the ClickHouse team for an RCA, which will be attached to this degradation update.

Created
Aug 06 at 12:42am IST

ClickHouse queries are taking longer than expected. We are working with the ClickHouse team to debug the issue and will keep this status updated.