Degraded

[Downstream service issue] Log ingestion buffer is taking more time than expected

Aug 05, 2025 at 07:12pm UTC

Affected services

Dashboard

Resolved
Aug 05, 2025 at 07:49pm UTC

We’ve received an update from the ClickHouse team. Here’s the crux of the issue:

"Due to memory starvation, other processes in your cluster are starting to fail, resulting in degraded performance, likely including the failed writes you are experiencing."

All queued logs have been written to disk, and the instance is back to normal.

We’re still keeping an eye on our data pipelines. We’ve also put together a checklist with the team to help prevent this from happening again.

Updated
Aug 05, 2025 at 07:40pm UTC

The ClickHouse team is still investigating. Rest assured, all the data is safe in our data lake. We’ve marked the timestamps where the slowdown started. Once the ClickHouse team gives us the green light, we’ll re-audit the volume and replay ingestion if any logs are missing.

Updated
Aug 05, 2025 at 07:28pm UTC

Ingestion is now back to normal speed. We have asked the ClickHouse team for an RCA, which will be attached to this degradation update.

Created
Aug 05, 2025 at 07:12pm UTC

ClickHouse queries are taking longer than expected. We are working with the ClickHouse team to debug the issue and will keep this status updated.