Re: [PR] Parquet, Arrow: Refactor vectorized reader [iceberg]

via GitHub Fri, 15 Mar 2024 08:58:06 -0700


wgtmac commented on PR #9772:
URL: https://github.com/apache/iceberg/pull/9772#issuecomment-1999964002


   Not yet. My rough plan is to do the following:
   
   1. add a new VectorizedValuesReader base class to support different encodings. This is similar to what Spark does, but it reads into Arrow field vectors:
   https://github.com/apache/spark/blob/b7aa9740249b50ad9db254626c530ff5bc33d385/sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/VectorizedValuesReader.java#L30
   2. extend VectorizedValuesReader to add v2 encodings one by one.
   3. support vectorized readers for nested types.
   
   This patch is step 1 above and only adds vectorized reading interfaces for the float/double/int32/int64 physical types. It already shows the big picture of how the following steps will be done. Interfaces for the other physical types, and for reading logical types into Arrow field vectors, will be added progressively for a better review experience.
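   
   To make step 1 concrete, here is a minimal Java sketch, assuming a shape loosely modeled on Spark's VectorizedValuesReader: an abstract reader with one bulk-read method per physical type (INT32, INT64, FLOAT, DOUBLE) and one subclass per encoding. The method signatures and the `PlainValuesReader` class are hypothetical, not the API in this PR, and plain Java arrays stand in for Arrow field vectors (e.g. `IntVector`, `BigIntVector`, `Float4Vector`, `Float8Vector`) so the example runs without dependencies.
   
   ```java
   import java.nio.ByteBuffer;
   import java.nio.ByteOrder;
   import java.util.Arrays;
   
   // Hypothetical sketch: one bulk-read method per Parquet physical type and
   // one subclass per encoding. Java arrays stand in for Arrow field vectors.
   abstract class VectorizedValuesReader {
     // Reads `total` INT32 values into out[offset .. offset + total).
     abstract void readIntegers(int total, int[] out, int offset);
   
     // Reads `total` INT64 values.
     abstract void readLongs(int total, long[] out, int offset);
   
     // Reads `total` FLOAT values.
     abstract void readFloats(int total, float[] out, int offset);
   
     // Reads `total` DOUBLE values.
     abstract void readDoubles(int total, double[] out, int offset);
   }
   
   // Hypothetical PLAIN-encoding reader: fixed-width little-endian values
   // stored back to back in the page.
   class PlainValuesReader extends VectorizedValuesReader {
     private final ByteBuffer page;
   
     PlainValuesReader(ByteBuffer page) {
       // slice() resets the byte order to big-endian, so set little-endian explicitly.
       this.page = page.slice().order(ByteOrder.LITTLE_ENDIAN);
     }
   
     @Override
     void readIntegers(int total, int[] out, int offset) {
       for (int i = 0; i < total; i++) {
         out[offset + i] = page.getInt();
       }
     }
   
     @Override
     void readLongs(int total, long[] out, int offset) {
       for (int i = 0; i < total; i++) {
         out[offset + i] = page.getLong();
       }
     }
   
     @Override
     void readFloats(int total, float[] out, int offset) {
       for (int i = 0; i < total; i++) {
         out[offset + i] = page.getFloat();
       }
     }
   
     @Override
     void readDoubles(int total, double[] out, int offset) {
       for (int i = 0; i < total; i++) {
         out[offset + i] = page.getDouble();
       }
     }
   }
   
   public class Main {
     public static void main(String[] args) {
       // Build a tiny PLAIN-encoded INT32 page holding 7, -3, 100.
       ByteBuffer page = ByteBuffer.allocate(12).order(ByteOrder.LITTLE_ENDIAN);
       page.putInt(7).putInt(-3).putInt(100);
       page.flip();
   
       VectorizedValuesReader reader = new PlainValuesReader(page);
       int[] vector = new int[4];
       // Two batches: the reader keeps its position between calls.
       reader.readIntegers(2, vector, 1);
       reader.readIntegers(1, vector, 3);
       System.out.println(Arrays.toString(vector)); // prints [0, 7, -3, 100]
     }
   }
   ```
   
   Under this shape, supporting a v2 encoding such as DELTA_BINARY_PACKED (step 2) would mean adding another subclass with the same methods, leaving callers unchanged. For PLAIN, a real implementation could likely bulk-copy the page bytes into the Arrow vector's data buffer instead of looping value by value.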
   
   Does this make sense to you? @nastra 
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

