zeroshade opened a new pull request, #1114:
URL: https://github.com/apache/iceberg-go/pull/1114

   Restores main CI to green by reverting only the test-expectation byte counts 
that #1102 ("fix(manifest): stop writing deprecated distinct_counts") 
inadvertently changed.
   
   ## What broke
   
   After #1102 merged to main, both the `Go` and `Audit and Verify` workflows 
started failing on every push, across all 6 matrix entries 
(ubuntu/windows/macos × Go 1.25.5/1.26.1):
   
   ```
   --- FAIL: TestTableWriting (10.16s)
       --- FAIL: TestTableWriting/TestAddFilesPartitionedTable
       --- FAIL: TestTableWriting/TestAddFilesUnpartitioned
       --- FAIL: TestTableWriting/TestReplaceDataFiles
       --- FAIL: TestTableWriting/TestAddFilesPartitionedTable#01
       --- FAIL: TestTableWriting/TestAddFilesUnpartitioned#01
       --- FAIL: TestTableWriting/TestReplaceDataFiles#01
   --- FAIL: TestPositionDeletePartitionedFanoutWriterProcessBatch (0.00s)
       --- FAIL: TestPositionDeletePartitionedFanoutWriterProcessBatch/success
       --- FAIL: 
TestPositionDeletePartitionedFanoutWriterProcessBatch/batch_with_records_having_different_file_paths
   ```
   
   with diffs like:
   
   ```
   - "added-files-size": "3590"   (actual on CI / arrow-go v18.6.0)
   + "added-files-size": "3070"   (test source, post-#1102)
   
   - ColumnSizes: 2147483545:88   (actual on CI / arrow-go v18.6.0)
   + ColumnSizes: 2147483545:86   (test source, post-#1102)
   ```
   
   ## Root cause
   
   These two test files assert on **exact byte counts** of the on-disk parquet 
files written by `arrow-go`. Those byte counts depend on the parquet writer's 
metadata encoding, which differs across arrow-go versions.
   
   #1102 lowered the expected counts (3590→3070, 1066→963, 1816→2132+→1816, 
4264→3687, 88→86, 174→172, 96→94, 187→185). Those new values only reproduce 
against an unreleased arrow-go build (presumably wired in via a local 
`go.work`). Against the pinned `github.com/apache/arrow-go/v18 v18.6.0` from 
`go.sum` — which every CI runner resolves — the writer still emits the original 
larger sizes, so the assertions fail every time.
   
   The PR's own CI runs were `CANCELLED` before completion, so this didn't 
surface at merge time.
   
   The functional change in #1102 (dropping `distinct_counts` field-id 111 from 
the manifest entry Avro schema) only alters **manifest** serialization, not the 
**data** parquet files whose sizes these `Summary`/`ColumnSizes` fields tally — 
so the test-value updates were always unrelated to the production fix and can 
be safely reverted on their own.
   
   ## Fix
   
   Revert only the eight byte-count lines back to the pre-#1102 values verified 
via `git show 51b3140^`:
   
   | File | Test | Field | Before | #1102 | This PR |
   |---|---|---|---|---|---|
   | `table/table_test.go:549,554` | `TestAddFilesUnpartitioned` | 
`added-/total-files-size` | 3590 | 3070 | **3590** |
   | `table/table_test.go:770,776` | `TestAddFilesPartitionedTable` | 
`added-/total-files-size` | 3590 | 3070 | **3590** |
   | `table/table_test.go:1136,1140,1144` | `TestReplaceDataFiles` | 
`added-/removed-/total-files-size` | 1066/2132/4264 | 963/1816/3687 | 
**1066/2132/4264** |
   | `table/pos_delete_partitioned_fanout_writer_test.go:77` | `success` | 
`ColumnSizes` | 88,174 | 86,172 | **88,174** |
   | `table/pos_delete_partitioned_fanout_writer_test.go:87` | 
`batch_with_records_having_different_file_paths` | `ColumnSizes` | 96,187 | 
94,185 | **96,187** |
   
   No production code changes — #1102's manifest fix is preserved intact.
   
   ## Verification
   
   Reproduced CI failure locally with `GOWORK=off go test ./...` (forces use of 
pinned v18.6.0):
   
   - **Before this PR**: same 8 sub-test failures with identical 
expected/actual diffs as CI.
   - **After this PR**: `ok` across every package — `iceberg-go`, 
`catalog/{glue,hadoop,hive,rest,sql}`, `cmd/iceberg`, `codec`, `config`, 
`internal`, `io`, `io/gocloud`, `puffin`, `table`, 
`table/{compaction,dv,internal,substrait}`, `view`, `view/internal`.
   - `go vet ./...` clean, `go build ./...` clean, LSP diagnostics clean on 
both touched files.
   
   ## Followup (out of scope)
   
   These tests are inherently brittle — they'll keep breaking on every arrow-go 
bump that nudges parquet metadata encoding. A future cleanup could replace 
exact byte assertions with bounds (`> 0`, monotonic relationships between 
added/removed/total) or assert on row counts only. Not addressing here to keep 
the diff minimal and unblock main.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to