[GitHub] [iceberg] pavibhai opened a new pull request, #7197: [ORC] - Support selected vector with ORC reader on the row and batch reader

via GitHub Fri, 24 Mar 2023 09:02:41 -0700


pavibhai opened a new pull request, #7197:
URL: https://github.com/apache/iceberg/pull/7197


   ## What?
   Currently Iceberg does not support the selected vector when reading ORC 
files. As a result, whenever filter processing is triggered via 
`orc.sarg.to.filter`, ORC reads have to run in compatibility mode, that is, 
with `orc.filter.use.selected` left unset.
   
   Filter processing was introduced in 
[ORC-744](https://issues.apache.org/jira/browse/ORC-744), which lets ORC 
filter out records and mark the surviving rows of a partially filtered batch 
through the selected vector in `VectorizedRowBatch`.
   
   This PR uses the selected vector to determine valid rows when applicable.
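
To illustrate what "using the selected vector to determine valid rows" means, here is a minimal, self-contained sketch. It is not the code from this PR: `SelectedVectorSketch`, `Batch`, `applyFilter`, and `readValidRows` are hypothetical names, and `Batch` is only a stand-in that mirrors the `size`, `selectedInUse`, and `selected` fields of `VectorizedRowBatch` for a single long column.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; Batch is a stand-in for VectorizedRowBatch, not the real class.
final class SelectedVectorSketch {

  /** Minimal batch with one long column plus the selected-vector bookkeeping. */
  static final class Batch {
    final long[] values;    // column data, indexed by physical row position
    int size;               // number of valid rows in the batch
    boolean selectedInUse;  // true when `selected` lists the surviving rows
    int[] selected;         // physical positions of surviving rows (first `size` entries)

    Batch(long[] values) {
      this.values = values;
      this.size = values.length;
      this.selectedInUse = false;
      this.selected = new int[values.length];
    }
  }

  /** Simulates a filter that keeps only the given physical rows. */
  static void applyFilter(Batch batch, int... survivingRows) {
    System.arraycopy(survivingRows, 0, batch.selected, 0, survivingRows.length);
    batch.size = survivingRows.length;
    batch.selectedInUse = true;
  }

  /** Reads valid rows, mapping each logical index to its physical row when needed. */
  static List<Long> readValidRows(Batch batch) {
    List<Long> rows = new ArrayList<>(batch.size);
    for (int i = 0; i < batch.size; i++) {
      int row = batch.selectedInUse ? batch.selected[i] : i;
      rows.add(batch.values[row]);
    }
    return rows;
  }

  public static void main(String[] args) {
    Batch batch = new Batch(new long[] {10L, 20L, 30L, 40L, 50L});
    System.out.println(readValidRows(batch)); // prints [10, 20, 30, 40, 50]

    applyFilter(batch, 1, 3);
    System.out.println(readValidRows(batch)); // prints [20, 40]
  }
}
```

A reader that ignores `selectedInUse` would read physical rows 0 and 1 of the filtered batch instead of rows 1 and 3, which is why a reader has to honor the selected vector before that mode can be enabled.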
   
   ## Why?
   ORC reads can currently only run in compatibility mode, with the 
selected-vector option named above left unset. Enabling it should further 
speed up row processing by skipping rows that are already filtered out of 
the batch.
   
   ## Tested?
   New unit tests have been added to verify the behavior.



