zabetak commented on code in PR #6382:
URL: https://github.com/apache/hive/pull/6382#discussion_r3001055825
##########
ql/src/java/org/apache/hadoop/hive/ql/plan/Statistics.java:
##########
@@ -328,6 +329,19 @@ public List<ColStatistics> getColumnStats() {
return null;
}
+ /**
+ * Returns the column statistics as a list, or an empty list if column
statistics are unavailable.
+ * This method is useful to avoid null checks when iterating over column
statistics.
+ *
+ * @return list of column statistics, or empty list if unavailable
+ */
+ public List<ColStatistics> getColumnStatsOrEmpty() {
+ if (columnStats != null) {
+ return Lists.newArrayList(columnStats.values());
+ }
+ return Collections.emptyList();
+ }
+
Review Comment:
Adding a new method increases the complexity of the code since now
developers have to decide which of the existing methods to call in order to
obtain column statistics. Given that the old method remains in place the
likelihood of NPE is still there.
I would suggest to do one of the following:
1. Modify `getColumnStats` ensuring that it never returns null and adapt all
callers.
2. Leave `getColumnStats` as is but add the missing null checks where
relevant.
3. Fix the specific NPE at its root by ensuring that
`removeSemijoinOptimizationByBenefit` will call `updateStats` by first ensuring
that its feasible (e.g., by using `satisfyPrecondition` or something along
these lines).
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]