zabetak commented on code in PR #6382:
URL: https://github.com/apache/hive/pull/6382#discussion_r3026598022
##########
ql/src/java/org/apache/hadoop/hive/ql/parse/TezCompiler.java:
##########
@@ -1977,8 +1977,9 @@ private void
removeSemijoinOptimizationByBenefit(OptimizeTezProcContext procCtx)
LOG.debug("Old stats for {}: {}", roi.filterOperator,
roi.filterStats);
LOG.debug("Number of rows reduction: {}/{}", newNumRows,
roi.filterStats.getNumRows());
}
+ boolean useColStats = roi.filterStats.getColumnStats() != null;
StatsUtils.updateStats(roi.filterStats, newNumRows,
- true, roi.filterOperator, roi.colNames);
+ useColStats, roi.filterOperator, roi.colNames);
Review Comment:
This is the core fix that addresses the issue described in the Jira case. To
avoid unnecessary side effects let's keep the scope of the PR restricted to
this and revert all the other changes. This will also help in advancing and
merging this PR faster.
##########
ql/src/test/queries/clientpositive/semijoin_stats_missing_colstats.q:
##########
@@ -0,0 +1,45 @@
+-- HIVE-29516: Test that semijoin optimization handles missing column
statistics gracefully
Review Comment:
The `removeSemijoinOptimizationByBenefit` is not really triggered cause
there are zero `SemiJoinBranchInfo` when entering the method. Moreover, the
test does not exercise the code that was added in TezCompiler so it seems like
the new code has zero test coverage.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]