Fokko commented on issue #1604: URL: https://github.com/apache/iceberg-rust/issues/1604#issuecomment-3205827542
@ZENOTME Are you benchmarking against a local FS? I think one of the important objectives is to be efficient against object stores, which a local-FS benchmark doesn't capture. I share @liurenjie1024's concern about opening the Parquet file in the driver/kernel, since the overhead of opening a file is significant (milliseconds until the first byte).

> The planning phase typically happens on the master/driver node in a distributed compute engine.

That's true, but not always the case. For example, Spark leverages distributed planning. Each manifest file has a target size of 8MB, so manifests could be dispatched to executors for distributed planning.

I would flip it around: what's the problem if an executor opens a Parquet file and ends up reading very little data? This is unlikely, since we already have the file-level min/max, but a row group could still be skipped if the predicate doesn't fall within that row group's min/max. Parsing the Parquet footer is also pretty slow, especially for wide schemas, so I would be reluctant to send that over the wire.
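To make the row-group point concrete, here is a minimal sketch of min/max row-group pruning on the executor side. It is a simplification, not the iceberg-rust or `parquet` crate API: `RowGroupStats`, `may_match`, and `row_groups_to_read` are hypothetical names, stats are assumed to be for a single `i64` column, and the predicate is assumed to be a closed range `lo <= col <= hi` with `lo <= hi`. In a real reader these statistics come from the Parquet footer, which is exactly the part that has to be fetched and parsed first.

```rust
/// Hypothetical, simplified per-row-group statistics for one i64 column,
/// standing in for the min/max values stored in a Parquet footer.
#[derive(Debug, Clone, Copy)]
struct RowGroupStats {
    min: i64,
    max: i64,
    num_rows: u64,
}

/// True if the row group may contain values in [lo, hi], i.e. its
/// [min, max] range overlaps the predicate range (assumes lo <= hi).
fn may_match(rg: &RowGroupStats, lo: i64, hi: i64) -> bool {
    rg.max >= lo && rg.min <= hi
}

/// Indices of the row groups that must be read for `lo <= col <= hi`;
/// every other row group can be skipped without fetching its data pages.
fn row_groups_to_read(stats: &[RowGroupStats], lo: i64, hi: i64) -> Vec<usize> {
    stats
        .iter()
        .enumerate()
        .filter(|(_, rg)| may_match(rg, lo, hi))
        .map(|(i, _)| i)
        .collect()
}

fn main() {
    // File-level bounds are [0, 299], so a file-level min/max check
    // would NOT prune this file for the predicate below...
    let stats = [
        RowGroupStats { min: 0, max: 99, num_rows: 1_000 },
        RowGroupStats { min: 100, max: 199, num_rows: 1_000 },
        RowGroupStats { min: 200, max: 299, num_rows: 1_000 },
    ];
    // ...but for 120 <= col <= 150 only row group 1 needs to be read.
    let selected = row_groups_to_read(&stats, 120, 150);
    let rows: u64 = selected.iter().map(|&i| stats[i].num_rows).sum();
    println!("read row groups {:?} ({} rows)", selected, rows);
}
```

The file passes the file-level check, yet only one of three row groups is read. This is the "reads very little data" case, and it is cheap for the executor as long as the footer is parsed there rather than shipped from the driver.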
