morningman commented on a change in pull request #3148: improve performance of hash join in some cases
URL: https://github.com/apache/incubator-doris/pull/3148#discussion_r395001565
##########
File path: be/src/exec/hash_join_node_ir.cpp
##########
@@ -137,11 +137,20 @@ int HashJoinNode::process_probe_batch(RowBatch* out_batch, RowBatch* probe_batch
return rows_returned;
}
+// when build table has too many duplicated rows, the collisions will be very serious,
+// so in some case will don't need to store duplicated value in hash table, we can build an unique one
void HashJoinNode::process_build_batch(RowBatch* build_batch) {
// insert build row into our hash table
for (int i = 0; i < build_batch->num_rows(); ++i) {
- _hash_tbl->insert(build_batch->get_row(i));
+ if (_join_op == TJoinOp::LEFT_ANTI_JOIN|| _join_op == TJoinOp::RIGHT_ANTI_JOIN
Review comment:
I think this comparison can be done just once, when initializing the
HashJoinNode; there is no need to evaluate it for every row.
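
A minimal sketch of that suggestion, under stated assumptions: `JoinOp`, `SketchJoinNode`, `_build_unique`, and `stored_rows()` are hypothetical stand-ins, not the real `TJoinOp` or `HashJoinNode` API, and integer keys stand in for build rows. The condition mirrors only the two operators visible in the quoted fragment, which is cut off, so the PR's actual condition may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for TJoinOp; only the values needed for this sketch.
enum class JoinOp { INNER_JOIN, LEFT_ANTI_JOIN, RIGHT_ANTI_JOIN };

// Hypothetical, simplified node; not the real HashJoinNode.
class SketchJoinNode {
public:
    // The join-operator comparison is evaluated once here, at construction,
    // and cached in a flag (assumption: the operator never changes afterwards).
    explicit SketchJoinNode(JoinOp op)
        : _build_unique(op == JoinOp::LEFT_ANTI_JOIN ||
                        op == JoinOp::RIGHT_ANTI_JOIN) {}

    // The per-row loop only tests the cached flag.
    void process_build_batch(const std::vector<int>& build_batch) {
        for (int key : build_batch) {
            if (_build_unique) {
                // Store a key only the first time it is seen.
                if (_seen.insert(key).second) {
                    _rows.push_back(key);
                }
            } else {
                // Keep every row, duplicates included.
                _rows.push_back(key);
            }
        }
    }

    std::size_t stored_rows() const { return _rows.size(); }

private:
    const bool _build_unique;        // decided once, in the constructor
    std::unordered_set<int> _seen;   // keys already stored (unique mode only)
    std::vector<int> _rows;          // stand-in for the hash table contents
};
```

With this shape, a node constructed with `JoinOp::LEFT_ANTI_JOIN` stores each distinct key once, while one constructed with `JoinOp::INNER_JOIN` keeps every row. In the real code the flag would presumably be a member set during the node's initialization.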