yangzhg opened a new issue #3245: optimize cross join performance when where clause is or predicate and has common equal predicate exprs URL: https://github.com/apache/incubator-doris/issues/3245 queies like below cannot finish in a acceptable time, `store_sales` has 2800w rows, `customer_address` has 5w rows, for now doris will create only one cross join node to execute this sql, the time of eavl the where clause is about 200-300 ns, the total count of eval will be 2800w * 5w, this is extremely large, and this will cost 2800w * 5w * 250 ns = 4 billion seconds; ``` select avg(ss_quantity) ,avg(ss_ext_sales_price) ,avg(ss_ext_wholesale_cost) ,sum(ss_ext_wholesale_cost) from store_sales, customer_address where ((ss_addr_sk = ca_address_sk and ca_country = 'United States' and ca_state in ('CO', 'IL', 'MN') and ss_net_profit between 100 and 200 ) or (ss_addr_sk = ca_address_sk and ca_country = 'United States' and ca_state in ('OH', 'MT', 'NM') and ss_net_profit between 150 and 300 ) or (ss_addr_sk = ca_address_sk and ca_country = 'United States' and ca_state in ('TX', 'MO', 'MI') and ss_net_profit between 50 and 250 )) ``` but this sql can be rewrite to ``` select avg(ss_quantity) ,avg(ss_ext_sales_price) ,avg(ss_ext_wholesale_cost) ,sum(ss_ext_wholesale_cost) from store_sales, customer_address where ss_addr_sk = ca_address_sk and ca_country = 'United States' and (((ca_state in ('CO', 'IL', 'MN') and ss_net_profit between 100 and 200 ) or (ca_state in ('OH', 'MT', 'NM') and ss_net_profit between 150 and 300 ) or (ca_state in ('TX', 'MO', 'MI') and ss_net_profit between 50 and 250 )) ) ``` there for we can do a hash join first and then use ``` (((ca_state in ('CO', 'IL', 'MN') and ss_net_profit between 100 and 200 ) or (ca_state in ('OH', 'MT', 'NM') and ss_net_profit between 150 and 300 ) or (ca_state in ('TX', 'MO', 'MI') and ss_net_profit between 50 and 250 )) ) ``` to filter the value, in tpcds 10g dataset, the rewrited sql only cost about 1 seconds  so we should implements this optimize
---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services --------------------------------------------------------------------- To unsubscribe, e-mail: commits-unsubscr...@doris.apache.org For additional commands, e-mail: commits-h...@doris.apache.org