a-agmon commented on issue #265:
URL: https://github.com/apache/iceberg-rust/issues/265#issuecomment-2013869724

   Perhaps I am missing something, but I ran [this simple 
test](https://gist.github.com/a-agmon/65fe8e6f065404f039937befbbfa401e) on a 
small Parquet file (65 MB) with a simple predicate on the country code column.
   This is the result I saw:
   
   ```
   Predicate KR - row count: 12660 with_filter: true => time taken: 656.518875ms
   Predicate KR - row count: 12660 with_filter: false => time taken: 844.822917ms
   Predicate US - row count: 158015 with_filter: true => time taken: 1.085824833s
   Predicate US - row count: 158015 with_filter: false => time taken: 862.845125ms
   ```
   
   As you can see, when the predicate value is less common (as with KR), where 
I guess skipping is beneficial, the row filter improves performance (about 
657 ms vs. 845 ms). But when the predicate value is very common (as with US), 
where I guess it appears in almost every batch, the row filter in fact has a 
negative impact (about 1.09 s vs. 863 ms).
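
To make the guess above concrete, here is a minimal toy sketch of the "does the value show up in almost every batch" intuition. It is not the Parquet reader or the code from the gist: the helper names (`fraction_of_batches_with_match`, `synthetic_codes`), the batch size of 1000, and the synthetic distribution (US in every batch, KR in only every tenth batch) are all assumptions made for illustration.

```rust
/// Fraction of fixed-size batches that contain at least one row equal to `target`.
/// In this toy model, batches without a match are the ones that could be skipped.
fn fraction_of_batches_with_match(codes: &[&str], target: &str, batch_size: usize) -> f64 {
    let mut total = 0usize;
    let mut with_match = 0usize;
    for batch in codes.chunks(batch_size) {
        total += 1;
        if batch.iter().any(|code| *code == target) {
            with_match += 1;
        }
    }
    if total == 0 {
        0.0
    } else {
        with_match as f64 / total as f64
    }
}

/// Hypothetical synthetic column (an assumption, not the real data):
/// "US" appears in every batch, "KR" only in the first row of every tenth batch.
fn synthetic_codes(batches: usize, batch_size: usize) -> Vec<&'static str> {
    let mut codes = Vec::with_capacity(batches * batch_size);
    for b in 0..batches {
        for r in 0..batch_size {
            let code = if r == 0 && b % 10 == 0 {
                "KR"
            } else if r % 2 == 0 {
                "US"
            } else {
                "DE"
            };
            codes.push(code);
        }
    }
    codes
}

fn main() {
    let codes = synthetic_codes(100, 1000);
    for target in ["KR", "US"] {
        let frac = fraction_of_batches_with_match(&codes, target, 1000);
        println!("{target}: {:.0}% of batches contain a match", frac * 100.0);
    }
}
```

If the real file looks anything like this, a KR predicate leaves most batches skippable, while a US predicate leaves none, so the filter would only add its own evaluation cost. Whether the actual data is distributed this way is something I have not checked.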
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

