aokolnychyi commented on code in PR #8278:
URL: https://github.com/apache/iceberg/pull/8278#discussion_r1291690525
##########
core/src/main/java/org/apache/iceberg/TableProperties.java:
##########
@@ -231,6 +231,10 @@ private TableProperties() {}
public static final String ORC_BATCH_SIZE =
"read.orc.vectorization.batch-size";
public static final int ORC_BATCH_SIZE_DEFAULT = 5000;
+ public static final String DELETE_COLUMN_STATS_FILTERING_ENABLED =
+ "read.delete.column-stats-filtering.enabled";
Review Comment:
I reconsidered that a bit after spending more time profiling.
- We want to avoid any extra overhead (such as the Stream API) and transforms
when stats filtering is disabled. Deletes are looked up in tight loops, and MoR
is typically used for larger tables where CoW is just too expensive. That means
we need to look up deletes for a large number of data files, so any extra work
matters. All of this overhead shows up in flamegraphs, and it becomes even more
critical when data files are loaded in memory (distributed planning).
- We want to skip even loading delete stats if they will not be used.
- We may reconsider how our paths are generated and sometimes include a
temporal part to facilitate filtering. That won't always be possible, but we
can't assume the current way of generating file names is the only one.
- We want a way to disable column stats filtering for equality delete use
cases where filtering is not beneficial. We don't know how useful stats for
equality deletes are. Even if the stats differ across delete files, they may
not be selective. So we can't automatically skip indexing equality delete stats
when they differ, even though they may still be useless.
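
To make the first two points concrete, here is a rough sketch of what I mean by gating the work on the flag. It is not the actual index code: `DeleteLookupSketch`, `DeleteFile`, `lookup`, and `ASSUMED_DEFAULT` are hypothetical names, the sequence number rule is simplified, and the default value of the property is an assumption since the diff above does not show it. The point is only that the disabled path is a plain loop that never touches bounds, so stats do not even need to be loaded.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class DeleteLookupSketch {
  // Property name taken from the diff above; the default value is an assumption.
  public static final String DELETE_COLUMN_STATS_FILTERING_ENABLED =
      "read.delete.column-stats-filtering.enabled";
  public static final boolean ASSUMED_DEFAULT = true;

  /** Hypothetical stand-in for a delete file with bounds on a single long column. */
  public static final class DeleteFile {
    final long sequenceNumber;
    final Long lowerBound; // null when stats were not loaded
    final Long upperBound; // null when stats were not loaded

    public DeleteFile(long sequenceNumber, Long lowerBound, Long upperBound) {
      this.sequenceNumber = sequenceNumber;
      this.lowerBound = lowerBound;
      this.upperBound = upperBound;
    }
  }

  /** Reads the flag from a plain properties map, falling back to the assumed default. */
  public static boolean statsFilteringEnabled(Map<String, String> props) {
    String value = props.get(DELETE_COLUMN_STATS_FILTERING_ENABLED);
    return value == null ? ASSUMED_DEFAULT : Boolean.parseBoolean(value);
  }

  /** Plain loop, no streams; bounds are only inspected when filterByStats is true. */
  public static List<DeleteFile> lookup(
      DeleteFile[] deletes, long dataSeq, long dataLower, long dataUpper, boolean filterByStats) {
    List<DeleteFile> matches = new ArrayList<>();
    for (DeleteFile delete : deletes) {
      if (delete.sequenceNumber < dataSeq) {
        continue; // simplified rule for this sketch: older deletes never apply
      }
      if (filterByStats && !mayOverlap(delete, dataLower, dataUpper)) {
        continue; // pruned by column stats
      }
      matches.add(delete);
    }
    return matches;
  }

  private static boolean mayOverlap(DeleteFile delete, long dataLower, long dataUpper) {
    if (delete.lowerBound == null || delete.upperBound == null) {
      return true; // no stats available, so the file cannot be pruned
    }
    return delete.lowerBound <= dataUpper && delete.upperBound >= dataLower;
  }

  public static void main(String[] args) {
    Map<String, String> props = new HashMap<>();
    props.put(DELETE_COLUMN_STATS_FILTERING_ENABLED, "false");

    DeleteFile[] deletes = {
      new DeleteFile(5L, 0L, 10L),
      new DeleteFile(5L, 100L, 200L),
      new DeleteFile(1L, 0L, 10L)
    };

    // Data file with sequence number 3 and values in [5, 20].
    boolean enabled = statsFilteringEnabled(props);
    System.out.println("flag from props: " + lookup(deletes, 3L, 5L, 20L, enabled).size());
    System.out.println("stats filtering on: " + lookup(deletes, 3L, 5L, 20L, true).size());
  }
}
```

With the flag off, the lookup degrades to a sequence number check per delete file, which is the cheap path we want in the tight loop.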
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]