schenksj commented on PR #3932:
URL: https://github.com/apache/datafusion-comet/pull/3932#issuecomment-4363668195
### Full regression re-baseline

A fresh full-regression run (Delta 3.3.2, ~24h, this time with the in-flight macOS host) just finished:

- **Total: 13,612** tests run
- **Passed: 13,555** (was 13,437 at the prior baseline)
- **Failed: 57** (was 139)
- Canceled: 5, Ignored: 3,683

Net **82 tests cleared** since the previous full run, via:

| Commit | Cluster cleared |
| --- | --- |
| `da19481a` | DV-prefix override revert (CheckpointsSuite + entire DeletionVectorsSuite, ~30 tests) |
| `bb35abf0` | File-splitting infra (latent: no direct count; lays groundwork for future kernel-path large-file scans) |
| `b1fce4f8` | Synthetic-column gate (DeltaParquetFileFormatSuite, 9 tests) |
| `7d8c3b0b` | AQE logical-link preservation on Comet exchange wrappers (broad fix: 18 tests in IdentityColumnIngestionScalaSuite alone, plus knock-on fixes across MERGE / CDF / streaming-watermark plans) |

### Remaining 57 failures: top clusters

| Cluster | Count | Suites |
| --- | --- | --- |
| PredicatePushdown `partitions.size === 2` (DV-bypass, partition splitting) | 15 | DeletionVectorsWithPredicatePushdownSuite |
| `huge table: read/delete… 2B rows with DVs` | 4 | DeletionVectorsSuite (variants) |
| `query with predicates should skip partitions` (numFiles metric on Comet exec) | 2 | DeltaSuite + ColumnMapping variants |
| `optimization not supported - join …` | 4 | OptimizeMetadataOnlyDeltaQuery (Id + Name CM variants) |
| `create table with NOT NULL - check violation through file writing` | 3 | DeltaDDLSuite (Id + Name + Hive variants) |
| `update-metrics` / `delete-metrics` / `merge-metrics` | ~6 | UpdateMetricsSuite / DeleteMetricsSuite |
| streaming progress / `numInputRows == 0` (`basic`, `streaming`, `SC-8810`, `SC-11561`, `recreate the reservoir…`, `initial snapshot ends at base index of next version`, `allow to change schema…`, `skip change commits`, `can delete old files of a snapshot without update`, `can consume new data without update`) | ~16 | DeltaSourceSuite / DeltaSourceLargeLogSuite |
| `partitioned writing and batch reading` (CM variants) | 3 | DeltaSinkSuite (+ Id/Name CM variants) |
| `column mapping batch scan should detect physical name changes` | 1 | DeltaColumnMappingSuite |
| `explicit id matching` | 1 | DeltaColumnMappingSuite |
| `use TIMESTAMP_NTZ in a partition column` | 1 | DeltaTimestampNTZSuite |
| `Validate that links to docs in DeltaErrors are correct` | 1 | DeltaErrorsSuite (HTTP 301, environmental) |
| Misc | balance | various |

### Next step

Three highest-yield clusters for the next stretch:

1. **Streaming progress (~16 tests):** likely a single fix in how `CometDeltaNativeScanExec` populates `numOutputRows` / `numInputRows`. Worth one targeted look.
2. **PredicatePushdown DV-bypass (15):** an already-traced cluster. It needs either the DV fallback to go through the kernel path (so `splitTasks` from `bb35abf0` activates) or a separate split fix in the fallback path.
3. **DDL "NOT NULL violation through file writing" (3) + UpdateMetrics/DeleteMetrics/MergeMetrics (~6):** likely a single scan-metric fix. The tests read `numFiles` off `c.wrapped`, but the metric lives on the Comet exec.

Going to start with (1), since streaming has the largest single-cluster yield and the fix is most likely localised.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
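The streaming hypothesis can be made concrete with a simplified Python model. This is not Spark's actual code. Roughly, Spark's streaming progress reporting derives a source's `numInputRows` from the `numOutputRows` metric on the physical scan nodes that feed it. The `LeafExec` class is hypothetical, and so is the empty metrics map on the Comet node. That empty map stands in for the suspected failure mode: a replacement scan that never populates `numOutputRows`, so the batch reports 0 input rows.

```python
from dataclasses import dataclass, field


@dataclass
class LeafExec:
    """Hypothetical stand-in for a physical scan node and its SQL metrics."""
    name: str
    metrics: dict = field(default_factory=dict)


def num_input_rows(leaves):
    """Simplified model: a micro-batch's input-row count is the sum of the
    `numOutputRows` metric over the scan leaves feeding the source. A leaf
    that never registers or updates the metric contributes 0."""
    return sum(leaf.metrics.get("numOutputRows", 0) for leaf in leaves)


# A scan that registers and updates numOutputRows is counted...
vanilla_scan = LeafExec("FileSourceScanExec", {"numOutputRows": 100})
# ...but (suspected failure mode, not confirmed) a replacement scan that
# leaves the metric unpopulated makes the batch look empty.
comet_scan = LeafExec("CometDeltaNativeScanExec", {})

print(num_input_rows([vanilla_scan]))  # 100
print(num_input_rows([comet_scan]))    # 0
```

If the real cause matches this model, the fix is localised to the exec's metric registration, which fits the expectation that one change clears the whole cluster.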
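For the PredicatePushdown cluster, the `partitions.size === 2` expectation depends on a large file being cut into byte ranges and packed into read partitions. The Python sketch below models that step. It assumes the kernel-path `splitTasks` behaves like Spark's own file-scan planning: a target split size shaped like `FilePartition.maxSplitBytes`, followed by next-fit-decreasing packing. It is not the code in `bb35abf0`, and the function names and parameters are hypothetical.

```python
def max_split_bytes(file_lengths, max_partition_bytes, open_cost, min_partitions):
    """Target split size, shaped like Spark's FilePartition.maxSplitBytes:
    spread total bytes (each file padded by an open cost) across
    min_partitions, capped at max_partition_bytes."""
    total = sum(length + open_cost for length in file_lengths)
    bytes_per_core = total // min_partitions
    return min(max_partition_bytes, max(open_cost, bytes_per_core))


def split_file(length, split_bytes, splittable=True):
    """Cut one file into (offset, size) byte ranges of at most split_bytes.
    An unsplittable file stays a single range."""
    if not splittable:
        return [(0, length)]
    return [(off, min(split_bytes, length - off))
            for off in range(0, length, split_bytes)]


def pack_partitions(splits, split_bytes, open_cost):
    """Next-fit decreasing: visit splits largest first and close the current
    partition whenever the next split would push it past split_bytes."""
    partitions, current, current_size = [], [], 0
    for split in sorted(splits, key=lambda s: s[1], reverse=True):
        if current and current_size + split[1] > split_bytes:
            partitions.append(current)
            current, current_size = [], 0
        current.append(split)
        current_size += split[1] + open_cost
    if current:
        partitions.append(current)
    return partitions


target = max_split_bytes([100], max_partition_bytes=50, open_cost=4, min_partitions=1)
split_parts = pack_partitions(split_file(100, target), target, open_cost=4)
whole_parts = pack_partitions(split_file(100, target, splittable=False), target, open_cost=4)

print(target)            # 50
print(len(split_parts))  # 2
print(len(whole_parts))  # 1
```

In this model, one 100-byte file yields two partitions when it is split and one when it is read whole. That would match the `partitions.size === 2` failures if the DV fallback path reads files without splitting them.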
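Item 3's hypothesis is that the metrics tests look at the wrong node. The Python model below is hypothetical throughout: the class and function names are illustrative and the metric values are example numbers. In the model, a Comet exec that replaced a Spark scan keeps the original in `wrapped`, but only the Comet exec actually runs. A metric read through `wrapped` therefore stays at its initial 0.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ScanNode:
    """Hypothetical model of a physical scan node and its metrics."""
    name: str
    metrics: dict = field(default_factory=dict)
    # For a Comet exec: the original Spark scan it replaced.
    wrapped: Optional["ScanNode"] = None


def metric_via_wrapped(node, key):
    """What the failing tests are suspected to do: read through `wrapped`."""
    target = node.wrapped if node.wrapped is not None else node
    return target.metrics.get(key, 0)


def metric_on_executing_node(node, key):
    """Read from the node that actually executed; consult `wrapped` only when
    the executing node does not define the metric."""
    if key in node.metrics:
        return node.metrics[key]
    if node.wrapped is not None:
        return node.wrapped.metrics.get(key, 0)
    return 0


# Example values: the Spark scan was planned but never executed.
spark_scan = ScanNode("FileSourceScanExec", {"numFiles": 0})
comet_scan = ScanNode("CometExec (hypothetical)", {"numFiles": 3}, wrapped=spark_scan)

print(metric_via_wrapped(comet_scan, "numFiles"))        # 0
print(metric_on_executing_node(comet_scan, "numFiles"))  # 3
```

The sketch only shows why the two reads disagree. It leaves open whether the fix belongs in the tests' metric lookup or in propagating metrics onto `wrapped`.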
