steveloughran opened a new pull request, #15586:
URL: https://github.com/apache/iceberg/pull/15586

   
   Fixes #15353
   
   Improve file opening and read times by
   - keeping the file status when it is known and passing it to the openFile() call, which eliminates HEAD requests
   - choosing a file input policy when reading a file (`Util.determineReadPolicy()`).
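
   As a rough illustration of the second point, here is a minimal sketch of how a read policy might be chosen from a file's name. This is not the code in `Util.determineReadPolicy()`: the class `ReadPolicySketch`, the method `choosePolicy()` and the extension-based mapping are all hypothetical. The policy strings are assumed to follow the Hadoop `fs.option.openfile.read.policy` convention of a comma-separated list in which the first policy the filesystem recognizes is used; check the names against the Hadoop release in use.

```java
import java.util.Locale;

/** Hypothetical sketch: map a file location to a read-policy list. */
public final class ReadPolicySketch {

  private ReadPolicySketch() {}

  /**
   * Returns a comma-separated list of read policies, most specific first.
   * The policy names are assumptions, not taken from the PR.
   */
  public static String choosePolicy(String location) {
    String name = location.toLowerCase(Locale.ROOT);
    if (name.endsWith(".parquet")) {
      // Columnar formats seek around the file: prefer random IO.
      return "parquet, columnar, vector, random";
    } else if (name.endsWith(".orc")) {
      return "orc, columnar, vector, random";
    } else if (name.endsWith(".avro")) {
      // Avro files are read start to end: prefer sequential IO.
      return "avro, sequential";
    }
    // Unknown format: let the filesystem adapt to the access pattern.
    return "adaptive";
  }

  public static void main(String[] args) {
    String[] files = {
      "s3a://bucket/table/data/00000-0.parquet",
      "s3a://bucket/table/metadata/snap-1.avro",
      "s3a://bucket/table/metadata/v1.metadata.json"
    };
    for (String file : files) {
      System.out.println(file + " -> " + choosePolicy(file));
    }
  }
}
```

   Listing the policies from most to least specific is intended to let a filesystem that does not know a format-specific name fall back to a generic one.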
   
   ParquetIO already hands file opening down to Parquet, which does the right thing.
   What matters for it is retaining any FileStatus already obtained, which is what the changes in `TableMigrationUtil` do.
   
   It's a shame that Parquet (currently) lacks a way to skip the stat() call it makes to get the file length, as this adds a HEAD request to every opening of a Parquet file, even when the length is already known from a manifest. That is fixable and would save 100+ ms per file opening, as well as the associated IO capacity.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

