shubhluck commented on code in PR #6382:
URL: https://github.com/apache/hive/pull/6382#discussion_r3012523355
##########
ql/src/test/queries/clientpositive/semijoin_stats_missing_colstats.q:
##########
@@ -0,0 +1,45 @@
+-- HIVE-29516: Test that semijoin optimization handles missing column statistics gracefully
Review Comment:
1. **Added .q.out file** - The expected output file is now included in
   the PR.
2. **Test not failing on master** - You're correct that the test doesn't
   reproduce the exact NPE on master. The NPE occurs only under specific
   conditions seen in production (observed with TPC-DS scale 10000), where:
   - Tables have basic statistics but no column statistics
   - The semijoin optimization threshold is met (large row count ratios)
   - The `removeSemijoinOptimizationByBenefit` code path is triggered

The .q test serves as a regression test to verify that:
- Compilation succeeds when column stats are missing
- The fix doesn't break the normal semijoin optimization flow

The actual bug fix is validated by the unit tests in `TestStatsUtils`,
which verify that `updateStats` throws `IllegalArgumentException` when
called with `useColStats=true` but no column stats are available.
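
For illustration, the kind of guard those unit tests exercise can be sketched roughly as below. This is a self-contained stand-in, not the code in the PR: the `TableStats` class, the simplified `updateStats` signature, and the exception message are all hypothetical and only mirror the behavior described above.

```java
import java.util.Collections;
import java.util.List;

// Hypothetical sketch only: none of these types are Hive classes.
class ColStatsGuardSketch {

    /** Stand-in for a statistics holder: a row count plus optional column stats. */
    static final class TableStats {
        final long numRows;
        final List<String> columnStats; // null or empty means "no column stats"

        TableStats(long numRows, List<String> columnStats) {
            this.numRows = numRows;
            this.columnStats = columnStats;
        }
    }

    /**
     * Simplified, assumed signature. Fails fast with IllegalArgumentException
     * instead of hitting an NPE later when column stats are requested but missing.
     */
    static TableStats updateStats(TableStats stats, long newNumRows, boolean useColStats) {
        boolean missing = stats.columnStats == null || stats.columnStats.isEmpty();
        if (useColStats && missing) {
            throw new IllegalArgumentException(
                "useColStats=true but no column statistics are available");
        }
        // Only the row count is updated in this sketch; column stats pass through.
        return new TableStats(newNumRows, stats.columnStats);
    }

    /** Returns true when the guard rejects a call that lacks column stats. */
    static boolean throwsOnMissingColStats() {
        TableStats basicOnly = new TableStats(1000L, Collections.<String>emptyList());
        try {
            updateStats(basicOnly, 10L, true);
            return false;
        } catch (IllegalArgumentException expected) {
            return true;
        }
    }
}
```

With `useColStats=false` the same call succeeds and only the row count changes, which matches the case where a table has basic statistics but no column statistics.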
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.