morningman commented on a change in pull request #3148: improve performent of hash join in some case
URL: https://github.com/apache/incubator-doris/pull/3148#discussion_r395001565
 
 

 ##########
 File path: be/src/exec/hash_join_node_ir.cpp
 ##########
 @@ -137,11 +137,20 @@ int HashJoinNode::process_probe_batch(RowBatch* out_batch, RowBatch* probe_batch
     return rows_returned;
 }
 
+// when build table has too many duplicated rows, the collisions will be very serious,
+// so in some case will don't need to store duplicated value in hash table, we can build an unique one
 void HashJoinNode::process_build_batch(RowBatch* build_batch) {
     // insert build row into our hash table
     for (int i = 0; i < build_batch->num_rows(); ++i) {
-        _hash_tbl->insert(build_batch->get_row(i));
+        if (_join_op == TJoinOp::LEFT_ANTI_JOIN|| _join_op == TJoinOp::RIGHT_ANTI_JOIN
 
 Review comment:
   I think these comparisons can be done just once, when initializing the
HashJoinNode; there is no need to evaluate them for every row.
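
   A minimal sketch of what this could look like, not the actual Doris code.
`JoinOp`, `SketchJoinNode`, and `_build_unique` are hypothetical names, the
keys are plain ints, and the hash table is a `std::unordered_multimap` stand-in.
The condition in the diff is cut off, so only the two join types visible above
are used here.

```cpp
#include <cstddef>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for TJoinOp; only the values needed for the sketch.
enum class JoinOp { INNER_JOIN, LEFT_ANTI_JOIN, RIGHT_ANTI_JOIN };

// Hypothetical, simplified node: build rows are int keys.
class SketchJoinNode {
public:
    explicit SketchJoinNode(JoinOp op)
        // The join-type comparison is evaluated once, at construction time.
        : _build_unique(op == JoinOp::LEFT_ANTI_JOIN ||
                        op == JoinOp::RIGHT_ANTI_JOIN) {}

    void process_build_batch(const std::vector<int>& build_keys) {
        for (int key : build_keys) {
            // The per-row path only reads the cached flag.
            if (_build_unique && _hash_tbl.count(key) > 0) {
                continue;  // duplicate key, not stored again
            }
            _hash_tbl.emplace(key, key);
        }
    }

    std::size_t size() const { return _hash_tbl.size(); }

private:
    const bool _build_unique;  // decided once per node, not per row
    std::unordered_multimap<int, int> _hash_tbl;
};
```

   With this shape, the loop body no longer depends on how many join types
take part in the condition; it tests a single bool per row.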

