[PR] fix(table/scanner): Fix nested field scan [iceberg-go]

via GitHub Thu, 20 Feb 2025 11:31:01 -0800


zeroshade opened a new pull request, #311:
URL: https://github.com/apache/iceberg-go/pull/311


   Fixes #309 
   
   Requires https://github.com/apache/arrow-go/pull/293 to be merged first
   
   There was a combination of factors that caused the initial problem:
   
   1. The arrow-go/v18/parquet/pqarrow library wasn't properly propagating 
`PARQUET:field_id` metadata for children of List or Map typed fields.
   2. We only iterated the *fields* and skipped list/map types when selecting 
column indexes, which caused us to miss their children. Instead we need to 
iterate all of the *field IDs*; this change does that.
   3. When pruning parquet fields we were not propagating the correct ColIndex 
for map typed columns. We want the leaves, so we need the ColIndex of the 
children.
   4. Creating the output arrays during `ToRequestedSchema` led to a memory 
leak for list/map columns, which needed to be fixed.
   
   A unit test has been added to ensure that we can read the 
`test_all_types` table and get the rows without error.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


