How ByteHouse's self-developed query optimiser boosts ClickHouse's performance manifold
ByteHouse combined rule-based and cost-based optimisation for highly efficient query execution plans
ByteHouse employs a query optimiser developed by its team, which solves a key limitation of ClickHouse. The enhanced query optimiser has much more to offer. Let's see its offerings and how they’re put into practice.
The importance of query optimiser
A database can perform better with a query optimiser. This is true for OLAP applications as well, where it matters even more. The outcome of a query plan might differ significantly for complex queries in analytical scenarios. A good query optimiser can help run SQL queries efficiently and can accurately determine the fastest execution path, significantly reducing query time.
There are two common query optimisation techniques - Rule-Based Optimisation (RBO) and Cost-Based Optimisation (CBO), both of which should be considered for best results in practice.
Rule-Based Optimisation (RBO)
RBO converts a relational expression into another relational expression based on the optimisation rules and discards the original expression. After a series of conversions, it generates the final execution plan. RBO only recognises rules and is data insensitive, thus generating the same execution plan for the same SQL query. Since the order of the rules may affect its performance, query performance varies.
Cost-Based Optimisation (CBO)
CBO converts a relational expression into another relational expression based on the optimisation rules and retains the original expression. After a series of conversions, it generates multiple execution plans. The CBO calculates a relative cost for each plan based on statistics, then picks the plan with the lowest cost.
ByteHouse's query optimiser
ClickHouse is designed to be fast for simplified schema; but without a query optimiser, it performs poorly on complex multi-table join queries. When large-scale operations are required for each dimension change or a new requirement, ClickHouse loses its steam during analysis.
A well-developed and powerful optimiser is essential to enterprise-class OLAP database, so ByteHouse has developed its own query optimiser from scratch.
Figure 1. Query optimisation process at ByteHouse
The diagram above depicts the entire query execution plan, from SQL parse to everything during execution, which has been redesigned (the purple module) to build a complete and standardised query optimiser.
Core modules
Analyser
The Analyser consists of two modules.
The first is the QueryRewriter. It realises some syntactic features using Abstract Syntax Tree (AST), rewriting either for ClickHouse SQL or Standard SQL, and also ensures that the semantics of SQL queries in ClickHouse SQL are consistent with the native Interpreter's strategies for program execution.
The other is the QueryAnalyzer, which is used to analyse and validate the semantics of the rewritten AST, and it functions differently for ANSI SQL and ClickHouse SQL.
QueryRewriter's ANSI SQL rewrite rules include:
Rewrite using CTE/view
Rewrite using UDF
Rewrite of specific functions, e.g. count(*) to count(), countDistinct(...) to uniqExact(...) ;
QueryRewriter's ClickHouse SQL rewrite rules include:
Rewrite using CTE/view
Rewrite using UDF
Function-specific rewrite
Rewrite using JoinToSubquery which corresponds to the JoinToSubqueryTransformVisitor under Interpreter
Qualified name normalisation, corresponding to the TranslateQualifiedNamesVisitor under Interpreter.
Rewrite using Alias, corresponding to the QueryNormalizer under Interpreter
QueryAnalyzer analyses and verifies semantics, transforming AST into a data structure that provides the data for plan building. The module distinguishes between standard SQL and ClickHouse SQL, with one set of code offering two different interpretations.
Planner
The data structure generated by QueryAnalyzer is used to build the initial query plan. Query Planner extends the developer community's QueryPlanStep by including serialisation and deserialisation to allow the plan to be sent down for execution based on Query Plan rather than AST or SQL text.
On the other hand, the community's Step has been modified so that each Step only expresses the semantics of relational algebra rather than a lot of execution-related data and parameters, and this execution related information is actually obtained when the execution pipeline is built on the server.
Optimiser
After the execution plan is built, the most critical remaining step is the Optimiser. The PlanOptimizer class is the entry class for query optimisation. First, it does a coarse-grained classification of SQL queries based on PlanPattern, using different rules for queries with varying complexity to improve efficiency.
The optimiser, whether RBO or CBO, is essentially a rewrite of the query, and the difference lies in the rewrite ideas and framework.
Three rewrite frameworks have been proposed to address different requirements:
Visitor-based rewrite framework: You can rewrite a QueryPlan in a Top-Down or Bottom-Up manner. It is more suitable for optimisation rules with context dependencies, such as PredicatePushDown which aims to push down the filtering to the "bare metal".
Pattern-match based rewrite framework: This applies to simple, generic rewriting rules, e.g. for merging two consecutive Filters. As long as the sub plan in the QueryPlan matches the Filter-Filter pattern, it can match the corresponding optimisation rules and rewrite.
Cascade-based rewrite framework: By iterating through the equivalent plans and storing them in a memory space, the cost of each plan is evaluated and an optimal solution is chosen.
6x faster performance with ByteHouse query optimiser
In terms of performance, open-source ClickHouse is limited by the lack of a query optimiser and can only support a small percentage of the TPC-DS. Even with manual rewrite, many of them can’t be successfully executed.
The improved ClickHouse is a truly usable OLAP database thanks to implementing an optimiser that runs all of these SQL use cases directly. ByteHouse has a far more scope of work due to improvements in schema. Not only can these complex queries be executed correctly, but they can also be executed six times faster than manually rewritten SQL without an optimiser. Performance in some internal applications was seen to increase nearly tenfold.
The optimiser's capabilities
RBO: column pruning, partition pruning, expression simplification, subquery unnesting, predicate pushdown, redundant operator elimination, Outer-JOIN to Inner-JOIN, push operators to storage layer, operator splitting for distributed optimisation, etc.
CBO: An efficient Join enumeration algorithm was implemented on basis of Cascade framework. Histogram-based cost estimation enables full enumeration and optimal solution for Join Reorder of 10-table joins, as well as heuristic enumeration and optimal solution for Join Reorder of >10 tables. CBO supports rule-based expansion of the search space, Outer-Join/Join Reorder, Magic Set Placement, etc.
Distributed Plan Optimisation: Distributed query plans were generated for MPP databases. The common solution adopts a two-stage approach, finding the optimal standalone plan and then distributing it. Our solution combines these two phases, catering to the requirements of the distributed plan throughout the search for the optimal solution using CBO, and determining the optimal distributed plan from a cost perspective. The Join/Aggregate also supports the expansion of the Partition attribute.
Higher order optimisation capabilities: Dynamic Filter pushdown, Rewrite for Materialised Views, Cost-based Common Table Expressions(CTE).
In figure 2, we use the TPC-DS test set to show you the difference before and after enabling the optimiser:
Figure 2. TPC-DS test comparison (Unit: ms)
Without the optimiser, it could execute only 26 SQL queries. With the optimiser, it could run all 99 SQL use cases of TPC-DS, and the performance was greatly improved.