zabetak commented on code in PR #6402:
URL: https://github.com/apache/hive/pull/6402#discussion_r3021970427


##########
ql/src/test/queries/clientpositive/cbo_component_access_nvl.q:
##########
@@ -0,0 +1,27 @@
+-- Repro for HIVE-29538 issue: AssertionError when Calcite operator 
COMPONENT_ACCESS
+
+set hive.cbo.enable=true;
+
+DROP TABLE IF EXISTS cbo_component_access_if_tbl;
+
+CREATE TABLE cbo_component_access_if_tbl (
+  `data` array<struct<jobs:array<struct<code:string,label:string>>>>
+) STORED AS ORC;
+
+INSERT INTO TABLE cbo_component_access_if_tbl
+SELECT array(
+  named_struct(
+    'jobs', array(
+      named_struct('code', 'j1', 'label', 'l1'),
+      named_struct('code', 'j2', 'label', 'l2')
+    )
+  )
+);
+
+-- `data` is both the column name and the LATERAL VIEW table alias (matches 
common real-world queries).
+-- Expression `data.dat.jobs.code` forces Calcite to introduce 
COMPONENT_ACCESS for array-of-struct field access.
+-- `if(...)` is translated to CASE, and 
HiveFunctionHelper.checkForStatefulFunctions walks the Rex tree.
+SELECT
+  if(concat_ws(',', data.dat.jobs.code) = '', null, concat_ws(',', 
data.dat.jobs.code)) AS jobs_codes
+FROM cbo_component_access_if_tbl t
+LATERAL VIEW explode(t.`data`) `data` AS `dat`;

Review Comment:
   To verify the fix EXPLAIN CBO should suffice since we don't care about the 
actual execution of the query. In addition some elements seem redundant like 
`concat_ws`, `explode`, and `LATERAL VIEW` so please simplify the case as much 
as possible.



##########
ql/src/test/queries/clientpositive/cbo_component_access_nvl.q:
##########
@@ -0,0 +1,27 @@
+-- Repro for HIVE-29538 issue: AssertionError when Calcite operator 
COMPONENT_ACCESS
+
+set hive.cbo.enable=true;
+
+DROP TABLE IF EXISTS cbo_component_access_if_tbl;

Review Comment:
   Cleanup is taken care of by the test framework so no need to include this.



##########
ql/src/test/queries/clientpositive/cbo_component_access_nvl.q:
##########
@@ -0,0 +1,27 @@
+-- Repro for HIVE-29538 issue: AssertionError when Calcite operator 
COMPONENT_ACCESS

Review Comment:
   NVL appears nowhere inside this file so please use a more descriptive file 
name.



##########
ql/src/test/queries/clientpositive/cbo_component_access_nvl.q:
##########
@@ -0,0 +1,27 @@
+-- Repro for HIVE-29538 issue: AssertionError when Calcite operator 
COMPONENT_ACCESS
+
+set hive.cbo.enable=true;
+
+DROP TABLE IF EXISTS cbo_component_access_if_tbl;
+
+CREATE TABLE cbo_component_access_if_tbl (
+  `data` array<struct<jobs:array<struct<code:string,label:string>>>>
+) STORED AS ORC;
+
+INSERT INTO TABLE cbo_component_access_if_tbl
+SELECT array(
+  named_struct(
+    'jobs', array(
+      named_struct('code', 'j1', 'label', 'l1'),
+      named_struct('code', 'j2', 'label', 'l2')
+    )
+  )
+);

Review Comment:
   The problem is a compilation error so we don't care about the data thus 
INSERT can be removed. This makes the test faster and more minimal avoiding 
potential unrelated failures in the future.



##########
ql/src/test/org/apache/hadoop/hive/ql/parse/type/TestHiveFunctionHelper.java:
##########
@@ -62,4 +64,24 @@ public void testGetUDTFFunctionThrowingException() throws 
SemanticException {
     // 'upper' is not a udtf so should throw exception
     functionHelper.getUDTFFunction("upper", operands);
   }
+
+  @Test
+  public void testCoalesceWithComponentAccessDoesNotAssert() throws 
SemanticException {

Review Comment:
   Since we use NVL rename to `testNVL...`



##########
ql/src/test/queries/clientpositive/cbo_component_access_nvl.q:
##########
@@ -0,0 +1,27 @@
+-- Repro for HIVE-29538 issue: AssertionError when Calcite operator 
COMPONENT_ACCESS

Review Comment:
   Comment is not strictly necessary but if we want put something let's simply 
put the Jira ID plus summary and nothing more. 



##########
ql/src/test/queries/clientpositive/cbo_component_access_nvl.q:
##########
@@ -0,0 +1,27 @@
+-- Repro for HIVE-29538 issue: AssertionError when Calcite operator 
COMPONENT_ACCESS
+
+set hive.cbo.enable=true;

Review Comment:
   This is the default value so no need to set it explicitly.



##########
ql/src/test/queries/clientpositive/cbo_component_access_nvl.q:
##########
@@ -0,0 +1,27 @@
+-- Repro for HIVE-29538 issue: AssertionError when Calcite operator 
COMPONENT_ACCESS
+
+set hive.cbo.enable=true;
+
+DROP TABLE IF EXISTS cbo_component_access_if_tbl;
+
+CREATE TABLE cbo_component_access_if_tbl (
+  `data` array<struct<jobs:array<struct<code:string,label:string>>>>

Review Comment:
   Probably one level of nesting should suffice to repro the problem don't need 
to make the DDL overly complex.



##########
ql/src/java/org/apache/hadoop/hive/ql/parse/type/HiveFunctionHelper.java:
##########
@@ -346,8 +347,14 @@ public Void visitCall(final RexCall call) {
         GenericUDF nodeUDF = SqlFunctionConverter.getHiveUDF(
             call.getOperator(), call.getType(), call.getOperands().size());
         if (nodeUDF == null) {
+          // Some internal Calcite operators (e.g. COMPONENT_ACCESS used for 
nested field access over
+          // collections) are not backed by a Hive GenericUDF. They are not 
stateful and should not
+          // fail query compilation; still recurse into operands to find any 
stateful expressions.
+          if (call.getOperator() == HiveComponentAccess.COMPONENT_ACCESS) {
+            return super.visitCall(call);
+          }
           throw new AssertionError("Cannot retrieve function " + 
call.getOperator().getName()
-              + " within StatefulFunctionsChecker");
+                  + " (kind=" + call.getOperator().getKind() + ") within 
StatefulFunctionsChecker");
         }

Review Comment:
   The proposed fix addresses the problem for `COMPONENT_ACCESS` operator but 
we may hit it again for some other internal operator that does not have a Hive 
mapping. If that happens we will have to update the code here and add more 
if/else statements which makes the code a bit brittle.
   
   Personally, I would get rid of the `AssertionError` and I would opt for a 
more general change that does not depend on the specific Calcite operator.
   
   ```java
   if (nodeUDF != null && FunctionRegistry.isStateful(nodeUDF)) {
     throw new Util.FoundOne(call);
   }
   ```
   In other words when nodeUDF is null (i.e., Hive mapping is missing) then we 
assume that the operator is not stateful similar to what happens in 
`HiveFunctionHelper.isStateful(FunctionInfo)`. This should be fine since 
`UDFType#stateful` is false by default so in general when appropriate info is 
missing we assume that that functions/operators are not statefull.
   
   Moreover, there are very few (only 2 in the entire Hive codebase) actual 
usages of `@UDFType(stateful = true)` so the specific annotations does not seem 
widely adopted/used.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to