zabetak commented on code in PR #6382:
URL: https://github.com/apache/hive/pull/6382#discussion_r3001055825


##########
ql/src/java/org/apache/hadoop/hive/ql/plan/Statistics.java:
##########
@@ -328,6 +329,19 @@ public List<ColStatistics> getColumnStats() {
     return null;
   }
 
+  /**
+   * Returns the column statistics as a list, or an empty list if column 
statistics are unavailable.
+   * This method is useful to avoid null checks when iterating over column 
statistics.
+   *
+   * @return list of column statistics, or empty list if unavailable
+   */
+  public List<ColStatistics> getColumnStatsOrEmpty() {
+    if (columnStats != null) {
+      return Lists.newArrayList(columnStats.values());
+    }
+    return Collections.emptyList();
+  }
+

Review Comment:
   Adding a new method increases the complexity of the code since now 
developers have to decide which of the existing methods to call in order to 
obtain column statistics. Given that the old method remains in place the 
likelihood of NPE is still there.
   
   I would suggest to do one of the following:
   1. Modify `getColumnStats` ensuring that it never returns null and adapt all 
callers.
   2. Leave `getColumnStats` as is but add the missing null checks where 
relevant.
   3. Fix the specific NPE at its root by ensuring that 
`removeSemijoinOptimizationByBenefit` will call `updateStats` by first ensuring 
that its feasible (e.g., by using `satisfyPrecondition` or something along 
these lines).



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to