schenksj commented on PR #3932:
URL: https://github.com/apache/datafusion-comet/pull/3932#issuecomment-4363668195

   ### Full regression re-baseline
   
   A fresh full-regression run (Delta 3.3.2, ~24h with the in-flight macOS host 
this time) just finished:
   
   - **Total: 13,612** tests run
   - **Passed: 13,555** (was 13,437 at the prior baseline)
   - **Failed: 57** (was 139)
   - Canceled: 5, Ignored: 3,683
   
   Net **82 tests cleared** since the previous full run, via:
   
   | Commit | Cluster cleared |
   | --- | --- |
   | `da19481a` | DV-prefix override revert (CheckpointsSuite + entire DeletionVectorsSuite, ~30 tests) |
   | `bb35abf0` | File-splitting infra (latent: no direct count; lays groundwork for future kernel-path large-file scans) |
   | `b1fce4f8` | Synthetic-column gate (DeltaParquetFileFormatSuite, 9 tests) |
   | `7d8c3b0b` | AQE logical-link preservation on Comet exchange wrappers (broad fix: cleared 18 tests in IdentityColumnIngestionScalaSuite alone, plus knock-on fixes across MERGE / CDF / streaming-watermark plans) |
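
The headline figures can be cross-checked with a few lines of arithmetic. The sketch below is only a sanity check on the numbers quoted above; the variable names are illustrative, and it assumes a run's total is its passed plus failed counts, with canceled and ignored tests counted separately.

```python
# Figures copied from the summary above.
current = {"total": 13_612, "passed": 13_555, "failed": 57}
previous = {"passed": 13_437, "failed": 139}

# Assumption: total = passed + failed (canceled/ignored are tallied separately).
assert current["passed"] + current["failed"] == current["total"]
previous["total"] = previous["passed"] + previous["failed"]

cleared = previous["failed"] - current["failed"]        # failures that went away
newly_passing = current["passed"] - previous["passed"]  # growth in passing tests
extra_tests = current["total"] - previous["total"]      # tests added to the run

print(f"cleared={cleared} newly_passing={newly_passing} extra_tests={extra_tests}")
```

Under that assumption, the passed count grew by 118 while failures dropped by 82, which would mean the current run executed 36 more tests than the prior baseline.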
   
   ### Remaining 57 fails — top clusters
   
   | Cluster | Count | Suites |
   | --- | --- | --- |
   | PredicatePushdown `partitions.size === 2` (DV-bypass, partition splitting) | 15 | DeletionVectorsWithPredicatePushdownSuite |
   | `huge table: read/delete… 2B rows with DVs` | 4 | DeletionVectorsSuite (variants) |
   | `query with predicates should skip partitions` (numFiles metric on Comet exec) | 2 | DeltaSuite + ColumnMapping variants |
   | `optimization not supported - join …` | 4 | OptimizeMetadataOnlyDeltaQuery (Id + Name CM variants) |
   | `create table with NOT NULL - check violation through file writing` | 3 | DeltaDDLSuite (Id + Name + Hive variants) |
   | `update-metrics` / `delete-metrics` / `merge-metrics` | ~6 | UpdateMetricsSuite / DeleteMetricsSuite |
   | streaming progress / `numInputRows == 0` (`basic`, `streaming`, `SC-8810`, `SC-11561`, `recreate the reservoir…`, `initial snapshot ends at base index of next version`, `allow to change schema…`, `skip change commits`, `can delete old files of a snapshot without update`, `can consume new data without update`) | ~16 | DeltaSourceSuite / DeltaSourceLargeLogSuite |
   | `partitioned writing and batch reading` (CM variants) | 3 | DeltaSinkSuite (+ Id/Name CM variants) |
   | `column mapping batch scan should detect physical name changes` | 1 | DeltaColumnMappingSuite |
   | `explicit id matching` | 1 | DeltaColumnMappingSuite |
   | `use TIMESTAMP_NTZ in a partition column` | 1 | DeltaTimestampNTZSuite |
   | `Validate that links to docs in DeltaErrors are correct` | 1 | DeltaErrorsSuite (HTTP 301, environmental) |
   | Misc | balance | various |
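
As a rough tally of the table above, the sketch below sums the named clusters and ranks them by size. The short cluster labels are abbreviations chosen here for illustration, and the two approximate counts (`~6`, `~16`) are taken at face value, so the resulting balance is only indicative.

```python
# Cluster counts copied from the table above; "~" counts are taken literally.
clusters = {
    "PredicatePushdown DV-bypass": 15,
    "huge table with DVs": 4,
    "skip partitions (numFiles)": 2,
    "optimization not supported - join": 4,
    "NOT NULL violation through file writing": 3,
    "update/delete/merge metrics": 6,   # listed as ~6
    "streaming progress": 16,           # listed as ~16
    "partitioned writing and batch reading": 3,
    "physical name changes": 1,
    "explicit id matching": 1,
    "TIMESTAMP_NTZ partition column": 1,
    "DeltaErrors doc links": 1,
}
TOTAL_FAILED = 57

named = sum(clusters.values())
misc_balance = TOTAL_FAILED - named

# Rank clusters by size to see which fixes would clear the most failures.
ranked = sorted(clusters.items(), key=lambda kv: kv[1], reverse=True)
for name, count in ranked[:3]:
    print(f"{count:>3}  {name}")
print(f"named={named} misc_balance={misc_balance}")
```

With the approximate counts read literally, the named clusters already account for all 57 failures, so the Misc balance is close to zero; the exact split depends on the two approximate rows.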
   
   ### Next step
   
   Three highest-yield clusters for the next stretch:
   
   1. **Streaming progress (~16 tests)** — likely a single fix in how 
`CometDeltaNativeScanExec` populates `numOutputRows` / `numInputRows`. Worth 
one targeted look.
   2. **PredicatePushdown DV-bypass (15)** — already-traced cluster; needs the 
DV-fallback to either go through the kernel path (so `splitTasks` from 
`bb35abf0` activates) or a separate split fix in the fallback path.
   3. **DDL "NOT NULL violation through file writing" (3) + 
UpdateMetrics/DeleteMetrics/MergeMetrics (~6)** — likely a single scan-metric 
fix where the test reads `numFiles` off `c.wrapped` but the metric lives on the 
Comet exec.
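
Item 3 describes a metric that a test looks up on the wrapped node while it is actually recorded on the enclosing Comet exec. The toy model below illustrates that mismatch, together with a lookup that checks the wrapper before the node it wraps. `PlanNode`, its `wrapped` field, `find_metric`, and the node labels are hypothetical names for illustration only; this is not Spark or Comet code, and the eventual fix may look quite different.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class PlanNode:
    """Hypothetical stand-in for a physical plan node carrying named metrics."""
    name: str
    metrics: dict = field(default_factory=dict)
    wrapped: Optional["PlanNode"] = None  # the original node this one wraps, if any


def find_metric(node: Optional[PlanNode], metric: str) -> Optional[int]:
    """Return the metric from the outermost node that records it, else None."""
    while node is not None:
        if metric in node.metrics:
            return node.metrics[metric]
        node = node.wrapped
    return None


# Illustrative setup: the wrapper records numFiles; the wrapped scan does not.
original_scan = PlanNode("original scan")
comet_scan = PlanNode("comet scan", metrics={"numFiles": 2}, wrapped=original_scan)

# What a test reading the wrapped node directly would see.
seen_on_wrapped = original_scan.metrics.get("numFiles")
# What a lookup starting at the wrapper would see.
seen_from_wrapper = find_metric(comet_scan, "numFiles")
print(seen_on_wrapped, seen_from_wrapper)
```

In this model the direct read on the wrapped node finds nothing, while the lookup that starts at the wrapper finds the value, which is the shape of the mismatch described in item 3.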
   
   Going to start with (1) since streaming has the largest single-cluster yield 
and the fix is most likely localised.
   
   Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

