pvary commented on PR #14435: URL: https://github.com/apache/iceberg/pull/14435#issuecomment-3551701479
> Great point and example! You're right! I think there could be two solutions:
>
> Solution 1: Group files by row ID continuity
>
> Group files with continuous _row_id ranges, preserve IDs by setting merged file's firstRowId to match.

This is typically not an option, and very hard to find continuous stretches of row_ids.

> Solution 2: Write _row_id as physical column
>
> Read virtual _row_id from each row and write as physical column.

Can we do this without rewriting the rowgroup? Do we still have gains compared to the "normal" read/write compaction?
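
To make the concern about Solution 1 concrete, here is a minimal sketch of grouping files by row ID continuity. It is not code from the PR: `FileInfo`, `group_contiguous`, and the sample file names are hypothetical, and it assumes each file covers the range starting at its first row ID and spanning its record count.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class FileInfo:
    # Hypothetical stand-in for data file metadata, not an Iceberg class.
    path: str
    first_row_id: int
    record_count: int


def group_contiguous(files: List[FileInfo]) -> List[List[FileInfo]]:
    """Split files into runs whose row ID ranges touch end to start."""
    groups: List[List[FileInfo]] = []
    for f in sorted(files, key=lambda x: x.first_row_id):
        if groups:
            last = groups[-1][-1]
            # Contiguous only if this file starts exactly where the previous one ends.
            if last.first_row_id + last.record_count == f.first_row_id:
                groups[-1].append(f)
                continue
        groups.append([f])
    return groups


files = [
    FileInfo("a.parquet", 0, 100),
    FileInfo("b.parquet", 100, 50),
    FileInfo("c.parquet", 400, 25),  # gap between 150 and 400, so it stands alone
]
groups = group_contiguous(files)
```

Only groups with more than one file could be merged while keeping row IDs, with the merged file taking the first row ID of the first file in the group. Any gap in the ranges breaks a run, which is why long contiguous stretches are hard to find in practice.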
