Noemi Pap-Takacs has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/23889 )

Change subject: IMPALA-12027: Support additional details for DataSink nodes in 
ExecSummary
......................................................................


Patch Set 5:

(9 comments)

http://gerrit.cloudera.org:8080/#/c/23889/5//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/23889/5//COMMIT_MSG@13
PS5, Line 13: is
'was' Please be careful with the tenses. When someone reads the commit message, 
this will already be merged.


http://gerrit.cloudera.org:8080/#/c/23889/4/fe/src/main/java/org/apache/impala/planner/MultiDataSink.java
File fe/src/main/java/org/apache/impala/planner/MultiDataSink.java:

http://gerrit.cloudera.org:8080/#/c/23889/4/fe/src/main/java/org/apache/impala/planner/MultiDataSink.java@104
PS4, Line 104:   protected String getLabelDetail() {
> Actually, it feels a bit redundant to add one more end-to-end test just for
We would like to test the presence of table name in the label details. None of 
those test check for it, so it would be great to add an automated test case 
that checks this so that we can catch behavioral changes in the future.


http://gerrit.cloudera.org:8080/#/c/23889/5/fe/src/main/java/org/apache/impala/planner/MultiDataSink.java
File fe/src/main/java/org/apache/impala/planner/MultiDataSink.java:

http://gerrit.cloudera.org:8080/#/c/23889/5/fe/src/main/java/org/apache/impala/planner/MultiDataSink.java@105
PS5, Line 105: Set<String> table_sinks_set = new HashSet<String>();
             :     for (DataSink dsink : dataSinks_) {
             :       if (dsink.getSinkType() == TDataSinkType.TABLE_SINK) {
             :         table_sinks_set.add(((TableSink)dsink).getLabelDetail());
             :       }
             :     }
             :     String minString = table_sinks_set.iterator().next();
             :     for (String dsink_name : table_sinks_set) {
             :       if (dsink_name.length() < minString.length()) {
             :         minString = dsink_name;
             :       }
             :     }
             :     return minString;
Csaba is right that it is also an assumption that the shortest string will be 
the real table name.

We could simply output the first table name. AFAIK the first data sink is the 
insertSink in both inheritors - please make sure it is.
Writing a comment about this logic and testing it ensures that the output 
remains correct even if the implementation changes.


http://gerrit.cloudera.org:8080/#/c/23889/5/tests/query_test/test_observability.py
File tests/query_test/test_observability.py:

http://gerrit.cloudera.org:8080/#/c/23889/5/tests/query_test/test_observability.py@288
PS5, Line 288: is populated in runtime profile correctly in every
             :     query state
"contains the correct table details"


http://gerrit.cloudera.org:8080/#/c/23889/5/tests/query_test/test_observability.py@292
PS5, Line 292: _async
Doesn't have to be async


http://gerrit.cloudera.org:8080/#/c/23889/5/tests/query_test/test_observability.py@294
PS5, Line 294: # After completion of the admission control phase, the 
coordinator would have started
             :     # and we should get a populated exec_summary.
             :     self.client.wait_for_admission_control(handle)
No need to test this stage.


http://gerrit.cloudera.org:8080/#/c/23889/5/tests/query_test/test_observability.py@297
PS5, Line 297: get_runtime_profile(handle)
get_exec_summary(handle)

Getting the ExecSummary is enough, we don't need to parse the entire profile.


http://gerrit.cloudera.org:8080/#/c/23889/5/tests/query_test/test_observability.py@299
PS5, Line 299: assert "ExecSummary:" in profile, profile
             :
             :     exec_summary = re.search(r"ExecSummary:(.*?)Errors", 
profile, re.DOTALL)
             :     if exec_summary:
             :       exec_summary = exec_summary.group(1).strip()
             :     else:
             :       assert 1 == 0, "ExecSummary does not exist in profile"
This is unnecessary, since it is not what we want to test here.


http://gerrit.cloudera.org:8080/#/c/23889/5/tests/query_test/test_observability.py@314
PS5, Line 314:
Please add test cases for MultiDataSink e.g. UPDATE and MERGE operations on 
Iceberg table. To be able to do this in this test case, you can create an 
Iceberg table in line 290 instead of a Hive table by using the `stored as 
Iceberg` syntax and setting the `format-version` table property to `2`.



--
To view, visit http://gerrit.cloudera.org:8080/23889
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2652dd896f72c5c6bbe7e76facdede2a237808d5
Gerrit-Change-Number: 23889
Gerrit-PatchSet: 5
Gerrit-Owner: Surya Hebbar <[email protected]>
Gerrit-Reviewer: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Noemi Pap-Takacs <[email protected]>
Gerrit-Reviewer: Surya Hebbar <[email protected]>
Gerrit-Comment-Date: Wed, 08 Apr 2026 13:00:37 +0000
Gerrit-HasComments: Yes

Reply via email to