zabetak commented on code in PR #6382:
URL: https://github.com/apache/hive/pull/6382#discussion_r3026598022


##########
ql/src/java/org/apache/hadoop/hive/ql/parse/TezCompiler.java:
##########
@@ -1977,8 +1977,9 @@ private void 
removeSemijoinOptimizationByBenefit(OptimizeTezProcContext procCtx)
           LOG.debug("Old stats for {}: {}", roi.filterOperator, 
roi.filterStats);
           LOG.debug("Number of rows reduction: {}/{}", newNumRows, 
roi.filterStats.getNumRows());
         }
+        boolean useColStats = roi.filterStats.getColumnStats() != null;
         StatsUtils.updateStats(roi.filterStats, newNumRows,
-            true, roi.filterOperator, roi.colNames);
+            useColStats, roi.filterOperator, roi.colNames);

Review Comment:
   This is the core fix that addresses the issue described in the Jira case. To 
avoid unnecessary side effects let's keep the scope of the PR restricted to 
this and revert all the other changes. This will also help in advancing and 
merging this PR faster.



##########
ql/src/test/queries/clientpositive/semijoin_stats_missing_colstats.q:
##########
@@ -0,0 +1,45 @@
+-- HIVE-29516: Test that semijoin optimization handles missing column 
statistics gracefully

Review Comment:
   The `removeSemijoinOptimizationByBenefit` is not really triggered cause 
there are zero `SemiJoinBranchInfo` when entering the method. Moreover, the 
test does not exercise the code that was added in TezCompiler so it seems like 
the new code has zero test coverage.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to