xinghuayu007 opened a new issue #6359:
URL: https://github.com/apache/incubator-doris/issues/6359


   ### Background
   
   Doris sorts data by the aggregate/unique/duplicate key: it sorts rows in 
memory by the key and then writes them into a segment file. For example, 
consider the following table:
   
   | column | type |
   | --- | --- |
   | a | varchar(12) |
   | b | varchar(12) |
   | c | varchar(12) |
   | d | varchar(12) |

   The table uses the aggregate key (a, b, c), so rows are sorted by columns a, 
b, and c, as in the following example:
   
![image](https://user-images.githubusercontent.com/12771191/127860261-15acb8e9-6c71-4f11-ab06-3030675950b5.png)
   
   Doris then builds a zone map (min-max index) for every column at the segment 
file level and at the page level. For a query such as `select * from table where 
a = 1 and b = 2 and c = 3`, each predicate (a = 1, b = 2, c = 3) is compared with 
the min-max of column a, b, or c respectively, and any segment file or page whose 
range cannot contain the value is skipped.
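   The min-max check described above can be sketched as follows. This is a 
minimal illustration, assuming equality predicates combined with AND and integer 
values for simplicity; `ZoneMap` and `may_contain` are hypothetical names, not 
part of Doris's (C++) backend API.

```python
from dataclasses import dataclass


@dataclass
class ZoneMap:
    min_value: int
    max_value: int


def may_contain(zone_maps, predicates):
    """Return False only if the zone maps prove that no row matches every predicate.

    zone_maps:  column name -> ZoneMap for one segment file or page
    predicates: column name -> value, combined with AND (e.g. a = 1 AND b = 2)
    """
    for column, value in predicates.items():
        zm = zone_maps.get(column)
        if zm is not None and not zm.min_value <= value <= zm.max_value:
            return False  # this predicate alone rules out the whole page
    return True  # cannot skip; the rows must be read and filtered


page = {"a": ZoneMap(0, 0), "b": ZoneMap(0, 9), "c": ZoneMap(3, 5)}
print(may_contain(page, {"a": 1, "b": 2, "c": 3}))  # → False (a = 1 is outside [0, 0])
print(may_contain(page, {"a": 0, "b": 2, "c": 3}))  # → True
```

   A single predicate whose value falls outside its column's range is enough to 
skip the page, which is why each column's zone map is only as useful as its range 
is narrow.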
   
   But when a query has no predicate on column a, the min-max zone map may be 
useless. For example, suppose a table has two columns, a and b, each with values 
in [0, 9], and stores 100 rows, one per (a, b) pair. If we sort the data by 
(a, b) and split it evenly into 10 files, the 1st file contains rows with a = 0 
and b in [0, 9], the 2nd file contains rows with a = 1 and b in [0, 9], and so 
on. A query such as `select * from table where b = 2`, which has no filter on 
column a, therefore cannot skip any file using the min-max of column b, because 
every file's b range is [0, 9].
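   The 100-row example above can be reproduced with a small simulation. It is a 
sketch under the example's assumptions (one row per (a, b) pair, 10 equal files); 
the helper names are hypothetical.

```python
def split_evenly(rows, num_files):
    size = len(rows) // num_files
    return [rows[i * size:(i + 1) * size] for i in range(num_files)]


def files_skipped(files, col, value):
    """Count files whose min-max on column `col` excludes `value`."""
    skipped = 0
    for f in files:
        values = [row[col] for row in f]
        if not min(values) <= value <= max(values):
            skipped += 1
    return skipped


rows = sorted((a, b) for a in range(10) for b in range(10))  # sorted by (a, b)
files = split_evenly(rows, 10)
print(files_skipped(files, 0, 2))  # filter a = 2 → 9
print(files_skipped(files, 1, 2))  # filter b = 2 → 0
```

   The leading sort column prunes 9 of 10 files, while the second column prunes 
none.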
   In short, with an ordinary multi-column sort only the min-max of the first 
sort column is reliably selective; the min-max of the other columns may be 
useless.
   
   The problem to solve is therefore how to serve queries that filter on any of 
several columns.
   
   ### Introduction to Z-order indexing
   
   Z-order is a technique that allows you to map multidimensional data to a 
single dimension.
   
   From Wikipedia: imagine that you have a collection of (X, Y) coordinate pairs 
laid out on a 2-dimensional plane. Using Z-ordering, you could arrange those 2D 
pairs on a 1-dimensional line. Importantly, values that were close together in 
the 2D plane would still be close to each other on the line. The figure below 
shows the Z-values for the two-dimensional case with integer coordinates 
0 ≤ x ≤ 7, 0 ≤ y ≤ 7 (shown both in decimal and binary). Interleaving the binary 
coordinate values yields binary z-values as shown. Connecting the z-values in 
their numerical order produces the recursively Z-shaped curve. Two-dimensional 
Z-values are also known as quadkeys.
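   The bit interleaving described above can be sketched as a Morton-code 
function. This is an illustrative sketch, not Doris code; `morton_2d` is a 
hypothetical name, and the choice that x fills the even bit positions and y the 
odd ones is a convention (swapping them transposes the curve).

```python
def morton_2d(x: int, y: int, bits: int = 3) -> int:
    """Interleave the low `bits` bits of x and y into one Z-value.

    Bit i of x goes to bit 2*i of the result and bit i of y to bit 2*i + 1,
    so x fills the even positions and y the odd ones.
    """
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)
        z |= ((y >> i) & 1) << (2 * i + 1)
    return z


# Print the Z-values of the 8 x 8 grid (0 <= x, y <= 7), one row per y.
for y in range(8):
    print(" ".join(f"{morton_2d(x, y):2d}" for x in range(8)))
```

   For example, (x, y) = (3, 5) is (011, 101) in binary and interleaves to 
100111, i.e. 39.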
   
   
![image](https://user-images.githubusercontent.com/12771191/127862941-373518bf-2522-4011-9b1e-c59a4bf7c68d.png)
   
   
   If we sort the data by z-value and split it evenly into four files, then for 
a point query that filters on either X or Y we can skip half of the files, 
because each file covers only half of the X range and half of the Y range. With 
more data split into more files, the effect is stronger. In other words, storing 
files partitioned by z-order gives better data skipping on multiple fields. 
Fortunately, Z-order is not limited to 2-dimensional space; it generalizes to 
any number of dimensions.
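   The four-file claim above can be checked by sorting the 8 x 8 grid both 
lexicographically by (x, y) and by Z-value, then counting how many files a point 
predicate can skip using per-file min-max. This is a toy sketch; the helper names 
are hypothetical and the bit layout is the same assumed convention as a standard 
2-D Morton code.

```python
def morton_2d(x, y, bits=3):
    # x fills the even bit positions of the Z-value, y the odd ones
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)
        z |= ((y >> i) & 1) << (2 * i + 1)
    return z


def split_evenly(rows, num_files):
    size = len(rows) // num_files
    return [rows[i * size:(i + 1) * size] for i in range(num_files)]


def files_skipped(files, col, value):
    """Count files whose min-max on `col` excludes `value` (point predicate)."""
    skipped = 0
    for f in files:
        values = [row[col] for row in f]
        if not min(values) <= value <= max(values):
            skipped += 1
    return skipped


points = [(x, y) for x in range(8) for y in range(8)]
by_xy = split_evenly(sorted(points), 4)                              # sort by (x, y)
by_z = split_evenly(sorted(points, key=lambda p: morton_2d(*p)), 4)  # sort by Z-value

print(files_skipped(by_xy, 0, 2), files_skipped(by_xy, 1, 2))  # → 3 0
print(files_skipped(by_z, 0, 2), files_skipped(by_z, 1, 2))    # → 2 2
```

   The trade-off is visible: the lexicographic sort prunes more for a filter on 
its leading column (3 of 4 files) but nothing for the second column, while 
z-order prunes half of the files for either column.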
   
   **How to implement Z-order in Doris**
   
   With z-order indexing, the min-max of every z-ordered column becomes useful: 
a query that has no filter on column a can still skip data using the min-max of 
the other columns.
   
   Doris already supports a zone-map index on every column, so all that is 
needed is to write the data in z-order:
   1. Support the SQL syntax `create table ..... z-order by (a, b, c)`.
   2. Implement a z-order comparator.
   3. When writing data into such a table, sort the rows with the z-order 
comparator, then flush them into segment files.
   4. Make sure zone-map indexes are created for columns a, b, and c.
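   Steps 2 to 4 can be sketched in Python as follows (Doris itself is C++; this is 
not its code). The comparator uses the "most significant differing bit" trick 
described in the Wikipedia article on the Z-order curve, which orders rows as 
their interleaved Morton codes would without building those codes. It assumes 
every key column is already a non-negative integer of the same bit width; how to 
map `varchar(12)` keys to such integers (for example, a fixed-length byte prefix) 
is an open assumption the issue does not specify. `zorder_cmp`, `interleave`, and 
`flush_segments` are hypothetical names.

```python
from functools import cmp_to_key


def less_msb(a: int, b: int) -> bool:
    """True if the most significant set bit of a is lower than that of b."""
    return a < b and a < (a ^ b)


def zorder_cmp(p, q):
    """Compare two rows of non-negative ints in Z-order.

    The column whose values differ in the highest bit decides the order; on a
    tie in bit position the earlier column wins, so column 0 is the most
    significant column within each bit level.
    """
    msd = 0
    for dim in range(1, len(p)):
        if less_msb(p[msd] ^ q[msd], p[dim] ^ q[dim]):
            msd = dim
    return (p[msd] > q[msd]) - (p[msd] < q[msd])


def interleave(values, bits):
    """Reference Morton code with the same bit layout as zorder_cmp."""
    z = 0
    for level in range(bits - 1, -1, -1):  # most significant bit level first
        for v in values:
            z = (z << 1) | ((v >> level) & 1)
    return z


def flush_segments(sorted_rows, rows_per_segment):
    """Cut sorted rows into segments and record a (min, max) per column."""
    segments = []
    for start in range(0, len(sorted_rows), rows_per_segment):
        chunk = sorted_rows[start:start + rows_per_segment]
        zone_maps = [(min(col), max(col)) for col in zip(*chunk)]
        segments.append((chunk, zone_maps))
    return segments


rows = [(3, 1, 2), (0, 7, 7), (1, 1, 1), (4, 0, 0)]  # (a, b, c) key values
rows.sort(key=cmp_to_key(zorder_cmp))
print(rows)  # → [(1, 1, 1), (3, 1, 2), (0, 7, 7), (4, 0, 0)]
for chunk, zone_maps in flush_segments(rows, 2):
    print(chunk, zone_maps)
```

   Comparing rows directly avoids materializing a wide interleaved key per row; 
`interleave` is included only as a reference to check that both give the same 
order.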

