[I] feat(table): public helper to build datafile from in-heap parquet metadata [iceberg-go]

via GitHub Thu, 11 Jun 2026 22:09:58 -0700


badalprasadsingh opened a new issue, #1186:
URL: https://github.com/apache/iceberg-go/issues/1186


   ### Feature Request / Improvement
   
   External writers that create Parquet files in-process already hold the file's
`*metadata.FileMetaData` after `(*file.Writer).Close()`. After uploading the
bytes to object storage, they need a fully populated `iceberg.DataFile`, with
per-column statistics, to commit via `Transaction.AddDataFiles` or `RowDelta`.
   
   Today the only path is `Transaction.AddFiles(filePaths)`, which re-opens
each file from storage to read its footer, even though the caller already has
that exact metadata in memory. This adds unnecessary object-storage I/O for
every file on every commit.
   
   **Proposed Solution**
   
   A new exported function in package `table` that validates its inputs, calls
the existing internal helpers to extract statistics, sets `EqualityFieldIDs`
when `Content == EntryContentEqDeletes`, and returns the resulting `DataFile`.
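
   A minimal, self-contained Go sketch of the validate-then-assemble step
described above. Everything in it is hypothetical: `BuildDataFileFromFooter`,
`FooterStats`, and the trimmed-down `DataFile` and `ManifestEntryContent` types
are local stand-ins for `iceberg.DataFile`, the package's content constants, and
the statistics a real implementation would extract from `*metadata.FileMetaData`
through the existing internals; none of this is the iceberg-go API. The rule
that equality delete files must carry at least one equality field ID, and that
other content types must carry none, is an assumption based on our reading of
the table spec.

```go
package main

import (
	"errors"
	"fmt"
)

// ManifestEntryContent is a hypothetical stand-in for iceberg-go's content kinds.
type ManifestEntryContent int

const (
	EntryContentData ManifestEntryContent = iota
	EntryContentPosDeletes
	EntryContentEqDeletes
)

// FooterStats is a hypothetical, simplified stand-in for statistics that a real
// helper would extract from *metadata.FileMetaData via existing internals.
type FooterStats struct {
	NumRows     int64
	ColumnSizes map[int]int64 // field ID -> compressed size in bytes
	NullCounts  map[int]int64 // field ID -> null value count
}

// DataFile is a hypothetical, trimmed-down stand-in for iceberg.DataFile.
type DataFile struct {
	Path             string
	Content          ManifestEntryContent
	FileSizeBytes    int64
	RecordCount      int64
	ColumnSizes      map[int]int64
	NullValueCounts  map[int]int64
	EqualityFieldIDs []int
}

// BuildDataFileFromFooter (hypothetical name) validates inputs, copies the
// footer statistics, and sets EqualityFieldIDs only for equality deletes.
func BuildDataFileFromFooter(path string, content ManifestEntryContent, fileSize int64,
	stats *FooterStats, eqFieldIDs []int) (*DataFile, error) {
	switch {
	case path == "":
		return nil, errors.New("path must not be empty")
	case fileSize <= 0:
		return nil, errors.New("file size must be positive")
	case stats == nil:
		return nil, errors.New("footer metadata must not be nil")
	}

	df := &DataFile{
		Path:            path,
		Content:         content,
		FileSizeBytes:   fileSize,
		RecordCount:     stats.NumRows,
		ColumnSizes:     stats.ColumnSizes,
		NullValueCounts: stats.NullCounts,
	}

	// Assumption: equality IDs are required for equality deletes and rejected otherwise.
	if content == EntryContentEqDeletes {
		if len(eqFieldIDs) == 0 {
			return nil, errors.New("equality delete files require at least one equality field ID")
		}
		df.EqualityFieldIDs = append([]int(nil), eqFieldIDs...) // defensive copy
	} else if len(eqFieldIDs) > 0 {
		return nil, fmt.Errorf("equality field IDs are only valid for equality deletes, got content %d", content)
	}
	return df, nil
}

func main() {
	stats := &FooterStats{NumRows: 3, ColumnSizes: map[int]int64{1: 64}, NullCounts: map[int]int64{1: 0}}
	df, err := BuildDataFileFromFooter("s3://bucket/data/00000.parquet", EntryContentEqDeletes, 1024, stats, []int{1})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(df.RecordCount, df.EqualityFieldIDs) // prints "3 [1]"
}
```

   Calling it with `EntryContentEqDeletes` and field IDs `[1]` yields a file
whose `EqualityFieldIDs` is `[1]`, while passing field IDs together with
`EntryContentData` is rejected rather than silently ignored.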
   
   **Contribution**
   
   I’d be happy to work on this enhancement. Please feel free to assign this 
issue to me and I can prepare a PR implementing it.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
