xinghuayu007 opened a new issue #6359: URL: https://github.com/apache/incubator-doris/issues/6359
### Background

Doris sorts data by the aggregate/unique/duplicate key. It sorts a batch of rows in memory by the key and writes them into a segment file. For example, take this table:

| column | type |
| --- | --- |
| a | varchar(12) |
| b | varchar(12) |
| c | varchar(12) |
| d | varchar(12) |

aggregate by (a, b, c)

The rows are sorted by columns a, b, c.

Doris then creates a zone map (min-max index) for every column at the segment-file level and at the page level. For a query like `select * from table where a = 1 and b = 2 and c = 3`, it compares each predicate (a = 1, b = 2, c = 3) with the min-max of columns a, b, c to skip whole segment files or pages.

But when a query has no predicate on column a, the min-max zone map may be useless. For example, a table contains two columns, a and b, each with values in the range [0, 9], and stores 100 rows. Suppose we sort the data by (a, b) and store it evenly in 10 files. The 1st file contains a = 0 for every row and b in the range [0, 9]. The 2nd file contains a = 1 for every row and b in the range [0, 9]. The other files follow the same pattern. Therefore a query such as `select * from table where b = 2`, with no filter on column a, cannot use the min-max of column b to skip any file, because every file covers the full range of b.

In short, with ordinary multi-column sorting only the min-max of the first column is reliably useful; the min-max of the other columns may be useless. How can we solve this problem for queries that filter on different columns?

### Introduction of Z-order indexing

Z-order is a technique that maps multidimensional data to a single dimension. As described on Wikipedia: imagine that you have a collection of (X, Y) coordinate pairs laid out on a 2-dimensional plane. Using Z-ordering, you can arrange those 2D pairs on a 1-dimensional line. Importantly, values that were close together in the 2D plane tend to remain close to each other on the line. The figure below shows the Z-values for the two-dimensional case with integer coordinates 0 ≤ x ≤ 7, 0 ≤ y ≤ 7 (shown both in decimal and binary).

[Figure: Z-values for integer coordinates 0 ≤ x ≤ 7, 0 ≤ y ≤ 7, in decimal and binary]
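
As a rough illustration of how the Z-values for that 8 × 8 grid can be computed, here is a minimal Python sketch. The function name `interleave_bits` and the bit order (x in the even bit positions, y in the odd ones) are assumptions made for this example, not something specified in the issue or taken from Doris.

```python
def interleave_bits(x: int, y: int, bits: int = 3) -> int:
    """Return the Z-value of (x, y) by interleaving their binary digits.

    Assumed convention: bit i of x lands at position 2*i,
    bit i of y lands at position 2*i + 1.
    """
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)
        z |= ((y >> i) & 1) << (2 * i + 1)
    return z


def z_grid(size: int = 8, bits: int = 3):
    """Build the grid of Z-values; row index is y, column index is x."""
    return [[interleave_bits(x, y, bits) for x in range(size)]
            for y in range(size)]


if __name__ == "__main__":
    # Print each Z-value in decimal and in 6-digit binary.
    for row in z_grid():
        print("  ".join(f"{z:2d}({z:06b})" for z in row))
```

With 3 bits per coordinate, the 64 cells receive the 64 distinct values 0 to 63, so sorting by Z-value gives a total order over the grid.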
Interleaving the binary coordinate values yields the binary z-values as shown. Connecting the z-values in numerical order produces the recursively Z-shaped curve. Two-dimensional Z-values are also known as quadkeys.

If we sort the data by z-value and divide it evenly into four files, a point query that filters on either the X field or the Y field can skip half of the files, because they are irrelevant to it. The larger the data set, the better the effect. In other words, files partitioned in z-order give better data skipping across multiple fields. Fortunately, Z-order is not limited to 2-dimensional space; it can be generalized to any number of dimensions.

**How to implement Z-order in Doris**

With z-order sorting, the min-max of every sort column is useful: a query with no filter on the first column can still use the min-max of the other columns. Doris already supports a zone-map index for every column, so it only needs to write the data in z-order.

1. Support the SQL grammar: `create table ..... z-order by (a, b, c)`
2. Implement a z-order comparator.
3. When writing data into this table, sort the rows with the z-order comparator, then dump the data into the segment file.
4. Make sure a zone-map index has been created for columns a, b, c.
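
The effect of sorting with a z-order comparator on min-max skipping can be sketched in Python as follows. This is a toy model under stated assumptions, not Doris code: columns are small integers in [0, 7] rather than varchar, a "file" is just an equal-sized slice of the sorted rows, and the names `z_value`, `split_into_files` and `files_to_scan` are hypothetical.

```python
from itertools import product


def z_value(a: int, b: int, bits: int = 3) -> int:
    """Sort key acting as a z-order comparator (assumed bit order:
    bit i of b at position 2*i, bit i of a at position 2*i + 1)."""
    z = 0
    for i in range(bits):
        z |= ((b >> i) & 1) << (2 * i)
        z |= ((a >> i) & 1) << (2 * i + 1)
    return z


def split_into_files(rows, n_files):
    """Cut the sorted rows into n_files equal-sized slices."""
    size = len(rows) // n_files
    return [rows[i * size:(i + 1) * size] for i in range(n_files)]


def files_to_scan(files, col, value):
    """Count files whose min-max range on column `col` contains `value`."""
    count = 0
    for rows in files:
        values = [row[col] for row in rows]
        if min(values) <= value <= max(values):
            count += 1
    return count


# All 64 (a, b) pairs with a, b in [0, 7], stored in 4 files.
rows = list(product(range(8), range(8)))
lex_files = split_into_files(sorted(rows), 4)
z_files = split_into_files(sorted(rows, key=lambda r: z_value(*r)), 4)

if __name__ == "__main__":
    for name, files in (("sort by (a, b)", lex_files), ("z-order", z_files)):
        print(name,
              "| a = 2 scans", files_to_scan(files, 0, 2), "files",
              "| b = 2 scans", files_to_scan(files, 1, 2), "files")
```

In this toy setup, a filter on b alone must scan all 4 files under the ordinary (a, b) sort but only 2 files under z-order. The trade-off is that a filter on a alone scans 1 file under the ordinary sort and 2 under z-order, so z-order spreads the skipping benefit across columns instead of concentrating it on the first one.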