[I] Bug triage results: 2026-04-27 [datafusion-comet]

via GitHub Mon, 27 Apr 2026 09:20:23 -0700


andygrove opened a new issue, #4110:
URL: https://github.com/apache/datafusion-comet/issues/4110


   ## Triage summary for 2026-04-27
   
   Triaged **19** open issues that carried `requires-triage`. Labels have 
already been applied; please spot-check below and close this issue when 
satisfied. Any individual relabel can be done directly on the affected issue.
   
   Triage criteria come from 
[`docs/source/contributor-guide/bug_triage.md`](../blob/main/docs/source/contributor-guide/bug_triage.md).
 Most issues in this batch are enhancement requests rather than bugs; the 
priority decision tree is bug-focused, so enhancement priorities reflect 
judgment about user impact and urgency relative to the priority descriptions in 
the guide (low = tooling/cosmetic/process; medium = functional gap with 
workaround; high = crash/major breakage; critical = silent wrong results).
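
A minimal sketch of how the four tiers summarized above could be encoded, assuming only the one-line descriptions in this message (the guide's actual decision tree may have more branches). The `Impact` type and `suggest_priority` function are hypothetical names used for illustration; they are not part of Comet or its tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Impact:
    """Hypothetical summary of an issue's user impact (illustrative only)."""
    silent_wrong_results: bool = False
    crash_or_major_breakage: bool = False
    functional_gap: bool = False  # a workaround is assumed to exist


def suggest_priority(impact: Impact) -> str:
    """Map an impact summary to a priority label.

    Checks run from most to least severe, so an issue that both crashes
    and returns wrong results is treated as a correctness problem first.
    """
    if impact.silent_wrong_results:
        return "priority:critical"
    if impact.crash_or_major_breakage:
        return "priority:high"
    if impact.functional_gap:
        return "priority:medium"
    # Tooling, cosmetic, and process items fall through to low.
    return "priority:low"


if __name__ == "__main__":
    print(suggest_priority(Impact(functional_gap=True)))
```

The ordering of the checks is a design choice for this sketch: correctness is tested before crashes, so the most severe applicable tier wins.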
   
   ### Counts by priority applied
   
   | Priority | Count |
   | --- | --- |
   | `priority:critical` | 0 |
   | `priority:high` | 1 |
   | `priority:medium` | 5 |
   | `priority:low` | 13 |
   
   `good first issue` applied: 3.
   
   ### Triaged
   
   | Issue | Priority | Areas | GFI | Rationale |
   | --- | --- | --- | --- | --- |
   | #4005 | medium | (none) | no | EPIC tracking broad planner improvements; cross-cutting impact on perf/correctness puts this above the tooling tier, but it is umbrella scope, so not high. |
   | #4006 | low | (none) | no | Internal API enhancement to distinguish info vs fallback messages; no user-visible bug, fits "tooling" tier. |
   | #4007 | medium | `area:aggregation` | no | Functional gap with possible correctness issues, but windowed aggregates are disabled by default so users are not silently affected. See escalation note. |
   | #4020 | low | `area:scan`, `native_iceberg_compat` | no | Internal refactor / cleanup; no user-visible bug. |
   | #4027 | low | (existing `spark 4`) | no | Forward-port to next Spark 4.0.x patch; routine dependency bump. |
   | #4045 | medium | `area:scan`, `area:shuffle` | no | Functional bug: Comet+vanilla mismatch breaks `ReusedExchangeExec` under AQE+DPP. A workaround exists (disable AQE), so this fits medium. |
   | #4050 | low | (none) | no | Process / tooling enhancement for audit reporting. |
   | #4054 | low | (existing `spark 4`) | no | Build-profile default change discussion; no user-visible bug. |
   | #4059 | low | `area:ci`, `spark 4` | yes | Add Java 21 row to CI matrix; well-scoped YAML change, contributor docs update. |
   | #4069 | low | `area:expressions` | no | Exploratory: try DataFusion lambda functions in Comet. Tooling/research scope. |
   | #4074 | medium | `area:expressions` | no | Mix of expression support-level fixes. One sub-item (CometStringRepeat does not fall back for incompatibility) is potentially a correctness issue; see escalation note. |
   | #4081 | low | (none) | no | Release planning for 0.16.0; tracking/process. |
   | #4094 | low | (none) | no | Release planning for 0.15.1; tracking/process. |
   | #4096 | medium | `spark 4` | no | Functional gap: Spark 4.1 introduces `OneRowRelationExec`, causing ~30 sql-file test fallbacks. Workaround: run on Spark 4.0. |
   | #4098 | high | `spark 4` | no | Tracking issue for four Spark 4.1 failure clusters; includes potential silent-wrong-results bloom filter mismatch. See escalation note. |
   | #4099 | low | (none) | no | Cosmetic / branding update; fits the "cosmetic" tier of `priority:low`. |
   | #4101 | low | `area:ci`, `area:Iceberg` | yes | CI cost optimization; scoped YAML matrix edit. |
   | #4102 | low | `area:ci` | yes | CI cost optimization; scoped YAML matrix edit. |
   | #4107 | low | `spark 4` | no | Internal refactor of 4.0/4.1 shims after 4.1 stabilizes; no user impact. |
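
When spot-checking, one option is to tally the Priority and GFI columns of the table above and compare the result with the summary counts. The sketch below copies the issue numbers, priorities, and GFI flags from that table; the variable names are illustrative and not part of any Comet tooling.

```python
from collections import Counter

# Issue number -> priority, copied from the "Triaged" table.
TRIAGED = {
    4005: "medium", 4006: "low", 4007: "medium", 4020: "low",
    4027: "low", 4045: "medium", 4050: "low", 4054: "low",
    4059: "low", 4069: "low", 4074: "medium", 4081: "low",
    4094: "low", 4096: "medium", 4098: "high", 4099: "low",
    4101: "low", 4102: "low", 4107: "low",
}

# Issues marked "yes" in the GFI column of the same table.
GOOD_FIRST_ISSUE = {4059, 4101, 4102}

counts = Counter(TRIAGED.values())

if __name__ == "__main__":
    for priority in ("critical", "high", "medium", "low"):
        print(f"priority:{priority}: {counts[priority]}")
    print(f"total triaged: {sum(counts.values())}")
    print(f"good first issue: {len(GOOD_FIRST_ISSUE)}")
```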
   
   ### Escalations to consider
   
   - **#4098 (Spark 4.1 CI failures)** — One cluster (bloom filter result 
mismatch) produces different `might_contain` results between Comet and Spark, 
i.e. silent wrong results. Per the guide's "correctness over crashes" principle 
this would normally warrant `priority:critical`, but Spark 4.1 is not yet a 
supported target so users are not exposed today. Worth splitting that cluster 
into its own `priority:critical` issue once Spark 4.1 support lands.
   - **#4074** — The CometStringRepeat sub-item ("incompatibility that we do 
not fall back for") could produce silent wrong results on the relevant inputs; 
if confirmed, escalate that sub-item to `priority:critical` and split into its 
own issue.
   - **#4007** — Title says "fix correctness issues"; if the underlying 
correctness issues are reproducible once users explicitly enable windowed 
aggregates, this should escalate to `priority:high` (or 
`priority:critical` if results are silently wrong).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


