Dmitriy Maslov created IMPALA-14993:
---------------------------------------
Summary: Iceberg V2 count(*) optimization is incorrectly applied
to queries without count(*), causing row loss
Key: IMPALA-14993
URL: https://issues.apache.org/jira/browse/IMPALA-14993
Project: IMPALA
Issue Type: Bug
Components: Frontend
Affects Versions: Impala 4.5.0
Reporter: Dmitriy Maslov
On Iceberg V2 tables that contain delete files, queries without {{count(*)}} in
the select list (e.g. {{SELECT 1 FROM tbl}}) silently return fewer rows
than they should.
h3. Steps to reproduce
{code:sql}
CREATE TABLE ice1 (id INT, c1 INT)
STORED AS ICEBERG TBLPROPERTIES ('format-version' = '2');
INSERT INTO ice1 SELECT 1, 10;
INSERT INTO ice1 SELECT 2, 20;
DELETE FROM ice1 WHERE id = 1;
SELECT 1 FROM ice1; -- expected: 1 row, actual: 0 rows
{code}
h3. Root cause
{{SelectStmt.optimizePlainCountStarQueryV2()}} decides whether to enable the
optimization with a loop over the select list that _rejects_ (returns early on)
any item that is neither {{count(*)}} nor a constant, but it never checks that
at least one {{count(*)}} is actually present. For {{SELECT 1 FROM ice1}} the
loop accepts the constant, falls through, and reaches
{{tableRef.setOptimizeCountStarForIcebergV2(true)}}, even though the query has
no {{count(*)}} to optimize.
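To make the gap concrete, here is a minimal, self-contained Java sketch of the pre-fix decision. It is not Impala code: {{BuggyCountStarCheck}} and {{ItemKind}} are hypothetical names, and each select-list item is reduced to whether it is a constant, a {{count(*)}}, or anything else. Because the loop only rejects the last kind, a constant-only list such as {{SELECT 1}} reaches the end and enables the optimization.

```java
import java.util.List;

// Hypothetical, simplified model of the pre-fix select-list check in
// SelectStmt.optimizePlainCountStarQueryV2(). Names are illustrative only.
final class BuggyCountStarCheck {
    // Each select-list item reduced to the property the check looks at.
    enum ItemKind { CONSTANT, COUNT_STAR, OTHER }

    // Mirrors the pre-fix loop: it rejects items that are neither constant
    // nor count(*), but never records whether a count(*) was seen.
    static boolean enablesOptimization(List<ItemKind> selectList) {
        for (ItemKind kind : selectList) {
            if (kind == ItemKind.CONSTANT) continue;
            if (kind != ItemKind.COUNT_STAR) return false;
        }
        // Reached for SELECT 1 just as for SELECT count(*).
        return true;
    }

    public static void main(String[] args) {
        // SELECT 1 FROM ice1: wrongly enables the optimization
        System.out.println(enablesOptimization(List.of(ItemKind.CONSTANT)));
        // SELECT count(*) FROM ice1: correctly enables it
        System.out.println(enablesOptimization(List.of(ItemKind.COUNT_STAR)));
        // SELECT c1 FROM ice1: correctly rejects it
        System.out.println(enablesOptimization(List.of(ItemKind.OTHER)));
    }
}
```

Running {{main}} prints {{true}}, {{true}}, {{false}}; the first line is the incorrect case.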
h3. Proposed fix
Protect the V2 method the same way as the V1 method: in
{{fe/src/main/java/org/apache/impala/analysis/SelectStmt.java}}, add a
{{hasCountStarFunc}} flag to {{optimizePlainCountStarQueryV2()}} and return
early unless the loop saw at least one {{count(*)}} (or an existing
{{IcebergV2CountStarAccumulator}}):
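{code:java}
// Track whether a count(*) (or an already-inserted
// IcebergV2CountStarAccumulator) is actually present in the select list.
boolean hasCountStarFunc = false;
boolean alreadyOptimized = false;
for (SelectListItem selectItem : getSelectList().getItems()) {
  Expr expr = selectItem.getExpr();
  if (expr == null) return;
  // Constants (e.g. the 1 in "SELECT 1, count(*)") do not block the rewrite.
  if (expr.isConstant()) continue;
  // An IcebergV2CountStarAccumulator means the rewrite was already applied.
  if (expr instanceof IcebergV2CountStarAccumulator) {
    alreadyOptimized = true;
    continue;
  }
  // Any other non-count(*) expression disables the optimization.
  if (!FunctionCallExpr.isCountStarFunctionCallExpr(expr)) return;
  hasCountStarFunc = true;
}
// New guard: a select list made only of constants (e.g. SELECT 1)
// must not enable the count(*) optimization.
if (!hasCountStarFunc && !alreadyOptimized) return;
{code}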
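The same kind of hypothetical model, with the proposed guard applied, can serve as a quick check of the intended behavior. {{FixedCountStarCheck}} and {{ItemKind}} are illustrative names, not Impala classes; {{ACCUMULATOR}} stands for an {{IcebergV2CountStarAccumulator}} already present in the select list, and the {{expr == null}} early return is left out of the model.

```java
import java.util.List;

// Hypothetical model of the proposed fix; not Impala code.
final class FixedCountStarCheck {
    enum ItemKind { CONSTANT, COUNT_STAR, ACCUMULATOR, OTHER }

    static boolean enablesOptimization(List<ItemKind> selectList) {
        boolean hasCountStarFunc = false;
        boolean alreadyOptimized = false;
        for (ItemKind kind : selectList) {
            if (kind == ItemKind.CONSTANT) continue;
            if (kind == ItemKind.ACCUMULATOR) {
                alreadyOptimized = true;
                continue;
            }
            // Any other non-count(*) item disables the optimization.
            if (kind != ItemKind.COUNT_STAR) return false;
            hasCountStarFunc = true;
        }
        // The new guard: constants alone are not enough.
        return hasCountStarFunc || alreadyOptimized;
    }

    public static void main(String[] args) {
        // SELECT 1
        System.out.println(enablesOptimization(List.of(ItemKind.CONSTANT)));
        // SELECT count(*), 1
        System.out.println(enablesOptimization(List.of(ItemKind.COUNT_STAR, ItemKind.CONSTANT)));
        // select list already rewritten
        System.out.println(enablesOptimization(List.of(ItemKind.ACCUMULATOR)));
        // SELECT c1, count(*)
        System.out.println(enablesOptimization(List.of(ItemKind.OTHER, ItemKind.COUNT_STAR)));
    }
}
```

Running {{main}} prints {{false}}, {{true}}, {{true}}, {{false}}: {{SELECT 1}} is no longer optimized, while a {{count(*)}} mixed with constants and an already-rewritten select list still qualify.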
--
This message was sent by Atlassian Jira
(v8.20.10#820010)