Fokko commented on issue #1604:
URL: https://github.com/apache/iceberg-rust/issues/1604#issuecomment-3205827542

   @ZENOTME Are you benchmarking against a local FS? I think one of the 
important objectives is to be efficient against object stores, which a 
local-FS benchmark doesn't take into consideration.
   
   I share @liurenjie1024's concern about opening the Parquet file in the 
driver/kernel, since the overhead of opening a file is significant (milliseconds 
until first byte).
   
   > Planning phase typically happens master/driver node in a distributed 
compute engine.
   
   That's true, but not always the case. For example, Spark leverages 
distributed planning. Each manifest file has a target size of 8MB, so manifests 
can be dispatched to workers for distributed planning.
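
   As a rough illustration of that dispatch, the sketch below plans each manifest independently and merges the results. Everything in it is hypothetical: `plan_manifest`, `distributed_plan`, and the dict layout are made up for illustration and are not Spark or iceberg-rust APIs, and a thread pool merely stands in for remote executors.

```python
from concurrent.futures import ThreadPoolExecutor

def plan_manifest(manifest, lower, upper):
    """Return the data files in one manifest whose [min, max] overlaps [lower, upper]."""
    return [
        f["path"]
        for f in manifest["files"]
        if f["max"] >= lower and f["min"] <= upper
    ]

def distributed_plan(manifests, lower, upper, workers=4):
    """Plan every manifest on its own worker and concatenate the surviving files."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves the order of the input manifests.
        results = pool.map(lambda m: plan_manifest(m, lower, upper), manifests)
        return [path for paths in results for path in paths]

manifests = [
    {"files": [{"path": "a.parquet", "min": 0, "max": 10},
               {"path": "b.parquet", "min": 50, "max": 60}]},
    {"files": [{"path": "c.parquet", "min": 5, "max": 20}]},
]
selected = distributed_plan(manifests, lower=8, upper=12)
```

   Each manifest is a self-contained unit of work here, which is what makes it a natural granularity for dispatch.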
   
   I would flip it around: what's the problem if an executor opens up a Parquet 
file and reads very little data? This is unlikely since we have the min/max of 
the file, but a row-group could still be skipped if the predicate falls outside 
that row-group's own min/max. Parsing the Parquet footer is also pretty slow, 
especially for wide schemas, so I would be reluctant to send that over the wire.
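
   To make the two pruning levels concrete, here is a minimal sketch that assumes a single integer column and a range predicate. `Stats`, `may_match`, and `row_groups_to_read` are hypothetical names for illustration, not the iceberg-rust or Parquet reader API.

```python
from dataclasses import dataclass

@dataclass
class Stats:
    min: int
    max: int

def may_match(stats, lower, upper):
    """True if the [min, max] range overlaps the predicate range [lower, upper]."""
    return stats.max >= lower and stats.min <= upper

def row_groups_to_read(file_stats, row_group_stats, lower, upper):
    """Return the indices of the row groups an executor still has to read."""
    if not may_match(file_stats, lower, upper):
        return []  # pruned at planning time from file-level min/max; file is never opened
    # The file passed the coarse check, so the executor opens it and checks each row group.
    return [i for i, rg in enumerate(row_group_stats) if may_match(rg, lower, upper)]

file_stats = Stats(0, 100)
row_groups = [Stats(0, 10), Stats(90, 100)]
# The file-level range covers 40..50, but neither row group does.
to_read = row_groups_to_read(file_stats, row_groups, 40, 50)
```

   In this example the file survives planning yet no row group needs to be read, which is the "opens the file and reads very little data" case described above.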


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

