hubgeter opened a new pull request, #37530:
URL: https://github.com/apache/doris/pull/37530

   Backport of #37377.
   
   ## Proposed changes
   Since the value of a partition column is the same for every row read from a
   given partition of a partitioned table, we can deserialize the value only
   once and then repeatedly insert it into the block.
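
   The idea can be illustrated with a minimal, self-contained sketch. It is not
   the code from this PR: `IntColumn`, `fill_per_row` and `fill_parse_once` are
   hypothetical names, and a plain `std::vector` stands in for a column of a
   block. It only contrasts parsing the partition value text for every row with
   parsing it once and appending the parsed value many times.

   ```cpp
   #include <cstddef>
   #include <cstdint>
   #include <string>
   #include <vector>

   // Hypothetical stand-in for an INT column inside a block (assumption,
   // not a Doris type).
   using IntColumn = std::vector<int32_t>;

   // Baseline: deserialize the partition value text once per row.
   void fill_per_row(const std::string& text, std::size_t rows, IntColumn& column) {
       for (std::size_t i = 0; i < rows; ++i) {
           column.push_back(static_cast<int32_t>(std::stoi(text)));
       }
   }

   // Sketch of the optimization: deserialize once, then append the same
   // parsed value `rows` times.
   void fill_parse_once(const std::string& text, std::size_t rows, IntColumn& column) {
       const auto value = static_cast<int32_t>(std::stoi(text));
       column.insert(column.end(), rows, value);
   }
   ```

   Both functions produce the same column contents; the second performs a
   single parse regardless of the number of rows, which is presumably where a
   saving of the kind measured in this PR comes from.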
   ```sql
   -- In Hive:
   CREATE TABLE parquet_partition_tb (
       col1 STRING,
       col2 INT,
       col3 DOUBLE
   ) PARTITIONED BY (
       partition_col1 STRING,
       partition_col2 INT
   )
   STORED AS PARQUET;
   
   insert into parquet_partition_tb partition (partition_col1="hello", partition_col2=1)
   values ("word", 2, 2.3);

   insert into parquet_partition_tb partition (partition_col1="hello", partition_col2=1)
   select col1, col2, col3 from parquet_partition_tb
   where partition_col1="hello" and partition_col2=1;
   -- Repeat the `insert into xxx select xxx` operation several times.
   
   
   -- In Doris, before this change:
   mysql>  select count(partition_col1) from parquet_partition_tb;
   +-----------------------+
   | count(partition_col1) |
   +-----------------------+
   |              33554432 |
   +-----------------------+
   1 row in set (3.24 sec)
   
   mysql>  select count(partition_col2) from parquet_partition_tb;
   +-----------------------+
   | count(partition_col2) |
   +-----------------------+
   |              33554432 |
   +-----------------------+
   1 row in set (3.34 sec)
   
   
   -- In Doris, after this change:
   mysql>  select count(partition_col1) from parquet_partition_tb ;
   +-----------------------+
   | count(partition_col1) |
   +-----------------------+
   |              33554432 |
   +-----------------------+
   1 row in set (0.79 sec)
   
   mysql> select count(partition_col2) from parquet_partition_tb;
   +-----------------------+
   | count(partition_col2) |
   +-----------------------+
   |              33554432 |
   +-----------------------+
   1 row in set (0.51 sec)
   
   ```
   ## Summary
   Test SQL: `select count(partition_col) from tbl;`
   Number of rows: 33554432

   | Partition column type | Before (sec) | After (sec) |
   |---|---|---|
   | boolean | 3.96 | 0.47 |
   | tinyint | 3.39 | 0.47 |
   | smallint | 3.14 | 0.50 |
   | int | 3.34 | 0.51 |
   | bigint | 3.61 | 0.51 |
   | float | 4.59 | 0.51 |
   | double | 4.60 | 0.55 |
   | decimal(5,2) | 3.96 | 0.61 |
   | date | 5.80 | 0.52 |
   | timestamp | 7.68 | 0.52 |
   | string | 3.24 | 0.79 |
   
   Issue Number: close #xxx
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@doris.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

