No more query timeouts in ClickHouse? ByteHouse makes it possible with resource groups
Learn how ByteHouse ensures minimum resource allocation and eliminates timeouts in ClickHouse
ClickHouse's resource management
ClickHouse's resource management is not satisfactory and can cause execution failures in a high insert/select concurrency scenario, affecting the user experience. This is because ClickHouse currently only supports maximum memory consumption per user, which kills the executed query when the threshold is exceeded.
In ByteDance's advertising business, there is a need to prioritise different queries. There is low jitter tolerance and a need for ad hoc analysis. A wide range of queries exist, causing resource intensive processes.
The concurrency control enabled by ClickHouse causes the following issues:
The inability to control concurrency flexibly leads to quick saturation of cluster resources, causing subsequent high-priority queries to remain pending, resulting in errors
The inability to reserve CPU resources for specific tasks results in large queries saturating CPU and subsequent queries taking significantly longer time to execute.
ByteHouse solution: Resource Group
To solve this, the ByteHouse team developed a component for resource management: Resource Group.
The basic idea is to divide resources like memory and CPU into different resource groups and then achieve resource sharing via a parent-child relationship.
When a user submits a query to the engine, the corresponding resource group is chosen based on the defined rules, and then the resource group and its parent resource group are evaluated to see if they can successfully execute the query; if so, it is executed directly; otherwise, it enters the resource group's waiting queue and waits for resources to be released.
Figure 1. Resource management using resource groups
max_concurrent_queries sets the maximum number of operations that can be run concurrently by a resource group.
When the concurrency limit is reached for a resource group, or when the concurrency limit is reached for the resource group's parent resource group, the engine puts the query into the resource group's waiting queue. When a query from the resource group ends, the engine will execute the first query in the queue. If the resource group's wait queue is empty at this point, it will trigger the release of the parent resource group's resources and further trigger the execution of queries in the wait queue of other child resource groups under the parent resource group, enabling concurrent quotas to be shared under the same parent resource group.
Each resource group can be configured with a soft memory limit. When a query in the resource group uses more memory than this soft limit, the new query is placed in a waiting queue. Memory control, like concurrency control, determines the limit of the parent resource group and implements memory sharing within a parent resource group using a similar approach.
Because there is no accurate model for estimating a query's memory utilisation, the current strategy is a prediction + actual memory correction model, in which the engine evaluates whether a new query can be executed based on the estimated memory when it enters, and then corrects the prediction value before the next round of judgement based on the existing memory tracker value after execution has begun.
This soft memory limit, unlike the hard memory limit of ClickHouse, does not kill queries that are already executing, but is used to judge the executability of new queries.
ByteHouse implements CPU control for resource groups using the cgroups-based CPU controller. The CPU controller uses the CFS scheduler to allocate CPU resources based on the same time slice, allowing different groups to occupy the corresponding CPU resources based on the predefined CPU shares.
We have implemented a new ThreadPool Class inside ByteHouse. When allocating thread resources to a query, the threads associated to the corresponding cgroup are allocated based on the information of resource groups recorded in the current Context.
Adopting the CFS scheduler:
When all resource groups have queries executing, the proportion of CPU available to each resource group is cpu_shares / sum (cpu_shares).
When only one resource group has queries executing, the proportion of CPU available to that resource group is 100%.
So the ratio of CPU resources available to each resource group is [cpu_shares / sum(cpu_shares), 100%], and by doing this we achieve two desired goals:
The lower limit of CPU resources that can be used by each query is guaranteed.
Overall CPU resource utilisation rate of the server in any workload is guaranteed.
Resource Group delivers improved results
Resource Group can significantly improve the query experience, providing security for queries with high priority and reducing the variance of query return times. It also improves cluster stability by not killing executing queries due to Out of Memory (OOM) and preventing failure of one service from affecting the entire cluster.
ByteHouse's Resource Group has several advantages.
Provides resource isolation in multiple aspects, including CPU, memory, etc.
Manages the impact of queries with low priority
Reduces the potential negative impact of other simultaneous workloads
In the advertising business mentioned above, shorter query time and better query experience were achieved by replacing ClickHouse with ByteHouse.
Figure 2. Performance comparison with and without Resource Groups
Query time without Resource Group (Unit: ms):
Query time with Resource Group (Unit: ms):
As you can see, before the Resource Group went live, the average query time per day was between 1.4s and 14.1s, resulting in poor user experience. Following the launch, the average query time per day was between 0.4s and 1.7s, which improved the guarantee of query resources for high-priority services and significantly reduced the average query return time.