sungwy commented on issue #1032:
URL: https://github.com/apache/iceberg-python/issues/1032#issuecomment-2278456915

   Hi @jkleinkauff, that's indeed an interesting observation.
   
   I have some follow-up questions to help us understand it better.
   1. Where are your files stored?
   2. Is there a way we can profile your IO and plot it against your IO download limit? (See the sketch below for what I have in mind.)
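   For question 2, here is a minimal sketch of the kind of profiling I mean. The catalog and table names are placeholders, and it assumes `psutil` is available to sample network counters; it is not meant as a precise measurement, just a rough read of elapsed time versus bytes pulled over the network.

```python
import time

import psutil  # assumption: psutil is installed; used only to sample network counters
from pyiceberg.catalog import load_catalog

# Placeholder names -- substitute your own catalog and table.
catalog = load_catalog("default")
table = catalog.load_table("db.tbl")

bytes_before = psutil.net_io_counters().bytes_recv
start = time.perf_counter()

result = table.scan(limit=1).to_arrow()

elapsed = time.perf_counter() - start
downloaded = psutil.net_io_counters().bytes_recv - bytes_before

print(f"rows={result.num_rows} elapsed={elapsed:.2f}s downloaded={downloaded / 1e6:.1f} MB")
print(f"effective throughput ~{downloaded / 1e6 / elapsed:.1f} MB/s")
```

   Comparing that effective throughput against your connection's advertised download limit would tell us whether the scan is simply bandwidth-bound.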
   
   As a point of comparison, I just ran a scan using `to_arrow` against a table made up of 63 Parquet files of roughly 5.5 MB each. I'd expect a table with fewer files to take less time to return (although the limit here should ensure that we aren't even reading the Parquet files past the first one).
   
   It returned in 6 seconds.
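   If it helps to reproduce that comparison on your side, this is roughly how I'd check how many data files a limited scan plans to touch and how much data they add up to (catalog and table names are placeholders again):

```python
from pyiceberg.catalog import load_catalog

# Placeholder names -- substitute your own catalog and table.
catalog = load_catalog("default")
table = catalog.load_table("db.tbl")

# plan_files() lists the data files a scan would read, without downloading them.
tasks = list(table.scan(limit=1).plan_files())
total_mb = sum(task.file.file_size_in_bytes for task in tasks) / 1e6

print(f"planned files: {len(tasks)}, total size: {total_mb:.1f} MB")
```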
   
   
   Your observation that limits of 1 to 100 took similar times makes sense to me as well. If you have 100+ MB files, you are going to have to download the same amount of data regardless in order to return the limited result.
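   A quick way to confirm that on your table is to time the same scan across a few limits (placeholder names once more); if the files are large, the timings should come out close to each other:

```python
import time

from pyiceberg.catalog import load_catalog

# Placeholder names -- substitute your own catalog and table.
catalog = load_catalog("default")
table = catalog.load_table("db.tbl")

# If the table is made of large files, these timings should all be similar,
# since roughly the same data has to be downloaded regardless of the limit.
for limit in (1, 10, 100):
    start = time.perf_counter()
    batch = table.scan(limit=limit).to_arrow()
    elapsed = time.perf_counter() - start
    print(f"limit={limit:>4} rows={batch.num_rows} elapsed={elapsed:.2f}s")
```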

