[Downstream service issue] Log ingestion buffer is taking more time than expected
Resolved
Aug 05, 2025 at 07:49pm UTC
We’ve received an update from the ClickHouse team. Here’s the crux of the issue:
"Due to memory starvation, other processes in your cluster are starting to fail, resulting in degraded performance, likely including the failed writes you are experiencing."
All queued logs have been written to disk, and the instance is back to normal.
We’re still keeping an eye on our data pipelines. We’ve also put together a checklist with the team to help prevent this from happening again.
Affected services
Updated
Aug 05, 2025 at 07:40pm UTC
The ClickHouse team is still investigating. Rest assured, all the data is safe in our data lake. We’ve marked the timestamps where the slowdown started. Once the ClickHouse team gives us the green light, we’ll re-audit the log volume and replay ingestion if any logs are missing.
Updated
Aug 05, 2025 at 07:28pm UTC
Ingestion is now back to normal speed. We have asked the ClickHouse team for an RCA, which will be attached to this degradation update.
Created
Aug 05, 2025 at 07:12pm UTC
ClickHouse queries are taking longer than expected. We are working with the ClickHouse team to debug the issue and will keep this status updated.