pavibhai opened a new pull request, #7197: URL: https://github.com/apache/iceberg/pull/7197
## What?

Currently Iceberg does not support the use of the selected vector when reading ORC files. As a result, ORC reads must run in compatibility mode, that is, without setting `orc.filter.use.selected`, whenever filter processing is triggered via `orc.sarg.to.filter`.

Filter processing was introduced as part of [ORC-744](https://issues.apache.org/jira/browse/ORC-744), which gave ORC the ability to filter out records and to mark the surviving rows of a partially filtered batch using the selected vector in VectorizedRowBatch.

This PR uses the selected vector to determine valid rows when applicable.

## Why?

Without this change, ORC reads can only operate in compatibility mode. Enabling use of the selected vector should speed up row processing further by skipping rows that have already been filtered out of the batch.

## Tested?

New unit tests have been added to verify the behavior.
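
For readers unfamiliar with the selected vector, here is a minimal sketch of how a reader might use it to pick out valid rows. It is not the code from this PR. The `Batch` class is a hypothetical stand-in that mirrors only the `size`, `selectedInUse` and `selected` fields of VectorizedRowBatch, so the example runs without any ORC or Hive dependency; `readValidRows` and the single `values` column are likewise made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class SelectedVectorSketch {

  /** Hypothetical stand-in for the VectorizedRowBatch fields used in this sketch. */
  static final class Batch {
    final long[] values;         // a single column of data, for illustration only
    final int size;              // number of valid rows in the batch
    final boolean selectedInUse; // true when 'selected' lists the surviving rows
    final int[] selected;        // row indices that passed the filter (first 'size' entries)

    Batch(long[] values, int size, boolean selectedInUse, int[] selected) {
      this.values = values;
      this.size = size;
      this.selectedInUse = selectedInUse;
      this.selected = selected;
    }
  }

  /** Returns the values of the valid rows, honoring the selected vector when it is in use. */
  static List<Long> readValidRows(Batch batch) {
    List<Long> out = new ArrayList<>(batch.size);
    for (int i = 0; i < batch.size; i++) {
      // With the selected vector in use, position i maps to row selected[i];
      // otherwise rows 0..size-1 are all valid.
      int row = batch.selectedInUse ? batch.selected[i] : i;
      out.add(batch.values[row]);
    }
    return out;
  }

  public static void main(String[] args) {
    long[] values = {10L, 20L, 30L, 40L, 50L};

    // Unfiltered batch: every row is valid.
    Batch unfiltered = new Batch(values, 5, false, null);
    // Partially filtered batch: only rows 1 and 3 survived the filter.
    Batch filtered = new Batch(values, 2, true, new int[] {1, 3, 0, 0, 0});

    System.out.println(readValidRows(unfiltered)); // [10, 20, 30, 40, 50]
    System.out.println(readValidRows(filtered));   // [20, 40]
  }
}
```

In the filtered case the loop touches only two rows instead of five, which is the kind of saving the Why? section refers to.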
