a-agmon commented on issue #265: URL: https://github.com/apache/iceberg-rust/issues/265#issuecomment-2013869724
Perhaps I am missing something, but I ran [this simple test](https://gist.github.com/a-agmon/65fe8e6f065404f039937befbbfa401e) on a small Parquet file (65 MB) with a simple predicate on the country code column. These are the results I saw:

```
Predicate KR - row count: 12660 with_filter: true => time taken: 656.518875ms
Predicate KR - row count: 12660 with_filter: false => time taken: 844.822917ms
Predicate US - row count: 158015 with_filter: true => time taken: 1.085824833s
Predicate US - row count: 158015 with_filter: false => time taken: 862.845125ms
```

As you can see, when the value is less common (as with the KR predicate), where I guess skipping is beneficial, the row filter improves performance. But when the value is very common (as with the US predicate), and I guess it exists in almost every batch, the row filter in fact has a negative impact.
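To make the guess about batches concrete, here is a toy sketch in plain Rust (standard library only). It is not the Parquet reader and does not measure anything. It only counts how many fixed-size batches contain at least one matching row. The helper `batches_with_match`, the batch size, and the sample data are all hypothetical and chosen for illustration.

```rust
/// Count how many fixed-size batches contain at least one row equal to `needle`.
/// Returns (batches_with_a_match, total_batches).
/// Hypothetical helper for illustration; not part of iceberg-rust or parquet.
fn batches_with_match(values: &[&str], batch_size: usize, needle: &str) -> (usize, usize) {
    let total = values.chunks(batch_size).count();
    let hits = values
        .chunks(batch_size)
        .filter(|batch| batch.iter().any(|v| *v == needle))
        .count();
    (hits, total)
}

fn main() {
    // Made-up data: 10 batches of 4 rows each.
    // "US" appears in every batch, "KR" appears in exactly one.
    let mut values: Vec<&str> = Vec::new();
    for i in 0..10 {
        values.extend_from_slice(&["US", "DE", "US", if i == 3 { "KR" } else { "FR" }]);
    }

    for needle in ["KR", "US"] {
        let (hits, total) = batches_with_match(&values, 4, needle);
        println!("{needle}: {hits}/{total} batches contain a match");
    }
}
```

With this made-up data, the rare value touches 1 of 10 batches while the common value touches all 10. In the second case there is nothing to skip, so any extra work spent evaluating the filter would be pure overhead. Whether this is what actually explains the timings above is still an assumption.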