[GitHub] [iceberg] pavibhai opened a new issue, #7191: Add support for selected vector based processing on ORC Files

via GitHub Thu, 23 Mar 2023 17:27:57 -0700


pavibhai opened a new issue, #7191:
URL: https://github.com/apache/iceberg/issues/7191


   ### Feature Request / Improvement
   
   ## What?
   Iceberg currently does not support the selected vector when reading ORC 
files. As a result, ORC reads must run in compatibility mode, leaving 
`orc.filter.use.selected` unset, whenever filter processing is triggered via 
`orc.sarg.to.filter`.
   
   Filter processing was introduced in 
[ORC-744](https://issues.apache.org/jira/browse/ORC-744). With it, ORC can 
filter out records and mark the surviving rows of a partially filtered batch 
through the selected vector in `VectorizedRowBatch`.
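   
   To illustrate what a reader has to do to honor the selected vector, here is 
a minimal, self-contained sketch. The `Batch` class is a hypothetical stand-in 
with one `long` column; its `selected`, `selectedInUse`, and `size` fields 
mirror the public fields of the same names on `VectorizedRowBatch`. 
`SelectedVectorSketch` and `readRows` are illustrative names, not Iceberg or 
ORC APIs.

```java
import java.util.ArrayList;
import java.util.List;

public class SelectedVectorSketch {

    /** Hypothetical, simplified stand-in for VectorizedRowBatch (one long column). */
    static final class Batch {
        long[] values;          // column data for every row physically in the batch
        int[] selected;         // positions of the rows that passed the filter
        boolean selectedInUse;  // true when `selected` must be consulted
        int size;               // row count, or selected-entry count when selectedInUse

        Batch(long[] values) {
            this.values = values;
            this.selected = new int[values.length];
            this.selectedInUse = false;
            this.size = values.length;
        }
    }

    /** Returns only the rows the filter kept, resolving positions through `selected`. */
    static List<Long> readRows(Batch batch) {
        List<Long> rows = new ArrayList<>(batch.size);
        for (int i = 0; i < batch.size; i++) {
            // With a selected vector in use, i indexes into `selected`,
            // which in turn holds the actual row position.
            int row = batch.selectedInUse ? batch.selected[i] : i;
            rows.add(batch.values[row]);
        }
        return rows;
    }

    public static void main(String[] args) {
        Batch batch = new Batch(new long[] {10, 20, 30, 40});

        // Simulate a filter that kept only the rows at positions 1 and 3.
        batch.selected[0] = 1;
        batch.selected[1] = 3;
        batch.selectedInUse = true;
        batch.size = 2;

        System.out.println(readRows(batch)); // prints [20, 40]
    }
}
```

   A reader that ignores `selectedInUse` would return the first `size` rows 
instead (10 and 20 here), which is why the option cannot simply be switched on 
without reader support.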
   
   ## Why?
   Without this support, ORC reads in Iceberg can only run in compatibility 
mode, with the selected vector left disabled. Enabling it would speed up row 
processing further, because rows already filtered out of a batch are skipped.
   
   ### Query engine
   
   Spark


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


