Case Study: Real-time Advertisement Placement Through ClickHouse
A quick case study on handling advertising placement data through ClickHouse
Today we'll explore how we handle advertising placement data through ClickHouse.
In general, the operation staff needs to view the real-time effect of the advertising placement. Due to the characteristics of the business, data collected in one day is often mixed with those collated from the past few days. This system was originally implemented based on Druid. However, we discovered that Druid had some flaws in this case.
ClickHouse is able to resolve majority of the problems faced by Druid. However, there are some challenges left to be fixed.
Problem #1: Buffer Engine is unable to work with ReplicatedMergeTree
Problem & Challenge: Clickhouse community provides Buffer Engine to solve the problem of generating too many parts in a single write, but it does not work well with ReplicatedMergeTree. The Buffer written for different Replicas only caches the newly written data on their respective nodes, resulting in inconsistent queries.
We made the following enhancements to the Buffer Engine:
- Combine the Kafka/Buffer/MergeTree tables together to provide APIs that are easier to use.
- Buffer is built into the Kafka engine, it can be turned on/off as an option, which is easier to use.
- Use internal pipeline mode in Buffer table to handle multiple blocks.
- Supports queries in ReplicatedMergeTree Engine.
First of all, it ensures that only one node is consuming for a pair of replicas, so only one out of two Buffer tables has data. If a query is sent to the unconsumed replica, a special query logic is built to read data from the Buffer table of the other replica.
Result: Enhance Buffer Engine to solve the problem of query consistency when Buffer Engine and ReplicatedMergeTree are used at the same time.
Problem #2: Data loss and duplicate consumption issue after downtime
Challenge: ClickHouse lacks transactional support. During the service downtime, some parts of the batch write operations to ClickHouse might be lost or engaged in double consumption. This is due to the lack of transaction protection when the service is restarting.
With reference to Druid's KIS solution, we manage Kafka's Offset by ourselves. To achieve the atomic semantics of single batch consumption/writing: the implementation chooses to bind Offset and Parts data together to enhance the stability of consumption. During each consumption, a transaction is created by default, and the transaction is responsible for writing Part data and Offset to disk together. If there is a failure, the transaction will roll back the Offset and the written part, then consume again.
Result: Ensures the atomicity of each inserted data and enhances the stability of data consumption.