stevenzwu commented on issue #9410: URL: https://github.com/apache/iceberg/issues/9410#issuecomment-1890156116
Ah, I didn't know it is a batch read mode using `asOfSnapshotId`. Note that they are `delete` (not `deleted`) files, which capture the row-level deletes. The actual delete files are not loaded during scan planning on the jobmanager/coordinator node; splits only contain the locations of those delete files. The problem is that an equality delete file can be associated with many data files, which is probably why you are seeing so many of them in one split. That is an unfortunate implication of equality deletes, and skipping those delete files would not be correct.

The delete compaction that was suggested earlier should help (a sketch of one way to run it is below). Did you use Spark for that? Spark batch writes should generate position deletes, which are easier for the read path.

Regardless, I agree with @pvary's suggestion of using `writeBytes` to fix the 64 KB size limit (see the second sketch below). Out of curiosity, how many delete files did you see in one split/data file?
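To illustrate the compaction route: below is a minimal sketch of triggering compaction from Spark with Iceberg's `rewriteDataFiles` action, using the `delete-file-threshold` option so that any data file with at least one associated delete file gets rewritten (and its deletes dropped). This is one way to do it under the assumption that you drive compaction from a Spark job; the class and method names around the action are illustrative, not necessarily what was suggested earlier in the thread.

```java
import org.apache.iceberg.Table;
import org.apache.iceberg.actions.RewriteDataFiles;
import org.apache.iceberg.spark.actions.SparkActions;
import org.apache.spark.sql.SparkSession;

public class CompactAwayDeletes {

  // Rewrites every data file that has at least one associated delete file.
  // The rewritten files carry no deletes, so subsequent batch reads no longer
  // need to fan the equality-delete file locations out across many splits.
  static void rewriteFilesWithDeletes(SparkSession spark, Table table) {
    RewriteDataFiles.Result result =
        SparkActions.get(spark)
            .rewriteDataFiles(table)
            // rewrite any data file with >= 1 delete file attached
            .option("delete-file-threshold", "1")
            .execute();

    System.out.printf("rewrote %d data files%n", result.rewrittenDataFilesCount());
  }
}
```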
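On the 64 KB limit: `java.io.DataOutputStream.writeUTF` stores the payload length in an unsigned 16-bit prefix, so it throws `UTFDataFormatException` once the encoded string exceeds 65535 bytes, which is easy to hit when one split references many delete file paths. Below is a minimal sketch of the workaround for a hand-rolled serializer; it is illustrative, not Iceberg's actual split serializer. Note that `DataOutputStream.writeBytes(String)` itself truncates each char to its low byte, so writing UTF-8 bytes with an explicit `int` length prefix is the safer spelling of the same idea.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

final class LargeStringSerde {

  // writeUTF() stores the length in two bytes, capping payloads at 64 KB.
  // A 4-byte length prefix plus raw UTF-8 bytes has no such cap.
  static void writeLargeString(DataOutputStream out, String value) throws IOException {
    byte[] bytes = value.getBytes(StandardCharsets.UTF_8);
    out.writeInt(bytes.length);
    out.write(bytes);
  }

  static String readLargeString(DataInputStream in) throws IOException {
    byte[] bytes = new byte[in.readInt()];
    in.readFully(bytes);
    return new String(bytes, StandardCharsets.UTF_8);
  }
}
```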