Re: [I] [BUG] Valid column characters fail on to_arrow() or to_pandas() ArrowInvalid: No match for FieldRef.Name [iceberg-python]

via GitHub Mon, 08 Apr 2024 22:03:24 -0700


kevinjqliu commented on issue #584:
URL: https://github.com/apache/iceberg-python/issues/584#issuecomment-2044152708


   > I would argue that the Python one is correct
   
   Yeah, me too. But I don't think Java Iceberg supports this, since a Parquet file with an `ABC-GG-1-A` column will be read as the Iceberg column `ABC_x2DGG_x2D1_x2DA`. I think it's worth opening an issue to track this in the main Iceberg repo.
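   For reference, the `ABC-GG-1-A` to `ABC_x2DGG_x2D1_x2DA` mapping follows an Avro-style name sanitization rule. A minimal sketch, assuming that rule: the first character must be a letter or `_`, the remaining characters must be letters, digits, or `_`, an invalid leading digit becomes `_<digit>`, and any other invalid character becomes `_x` plus its uppercase hex code point. The names `sanitize_column_name` and `_sanitize_char` are hypothetical and are not PyIceberg's or Java Iceberg's API.

```python
def _sanitize_char(ch: str) -> str:
    # Hypothetical helper: a digit becomes "_<digit>", anything else "_x<HEX>".
    return "_" + ch if ch.isdigit() else "_x" + format(ord(ch), "X")


def sanitize_column_name(name: str) -> str:
    # Hypothetical helper modeled on Avro name rules (assumption, see lead-in).
    if not name:
        raise ValueError("column name must not be empty")
    first = name[0]
    # The first character must be a letter or "_".
    out = [first if (first.isalpha() or first == "_") else _sanitize_char(first)]
    # Later characters may be letters, digits, or "_".
    for ch in name[1:]:
        out.append(ch if (ch.isalnum() or ch == "_") else _sanitize_char(ch))
    return "".join(out)


print(sanitize_column_name("ABC-GG-1-A"))  # ABC_x2DGG_x2D1_x2DA
print(sanitize_column_name("valid_name"))  # valid_name
```

   Here `-` is code point 0x2D, which is where each `_x2D` comes from. Already-valid names pass through unchanged.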
   
   PyIceberg can already read Parquet files with "sanitized" column names; this is handled by #83. [file_project_schema](https://github.com/apache/iceberg-python/pull/83/files#diff-8d5e63f2a87ead8cebe2fd8ac5dcf2198d229f01e16bb9e06e21f7277c328abdR834), the schema with the sanitized column names, is used to read the Parquet files.
   
   So we just need to make sure PyIceberg writes Parquet files the same way Java Iceberg does, i.e. with sanitized column names.
   
   #590 verifies that fixing the write behavior fixes the entire round trip described in this issue.
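   To illustrate why the write side has to match the read side, here is a toy round trip using plain dicts in place of Parquet files. It is a sketch under stated assumptions: `write_rows`, `read_rows`, and `sanitize_column_name` are hypothetical helpers, the sanitization rule is the Avro-style one behind `ABC_x2DGG_x2D1_x2DA`, and the "file" is just a list of dicts keyed by physical column name.

```python
from typing import Dict, List

Row = Dict[str, object]


def sanitize_column_name(name: str) -> str:
    # Hypothetical Avro-style rule: invalid leading digit -> "_<digit>",
    # any other invalid character -> "_x<HEX>".
    def fix(ch: str) -> str:
        return "_" + ch if ch.isdigit() else "_x" + format(ord(ch), "X")

    first = name[0]
    out = [first if (first.isalpha() or first == "_") else fix(first)]
    out += [ch if (ch.isalnum() or ch == "_") else fix(ch) for ch in name[1:]]
    return "".join(out)


def write_rows(rows: List[Row]) -> List[Row]:
    # Write side (hypothetical): store each column under its sanitized physical name.
    return [{sanitize_column_name(k): v for k, v in row.items()} for row in rows]


def read_rows(stored: List[Row], iceberg_names: List[str]) -> List[Row]:
    # Read side (hypothetical): look up sanitized physical names,
    # then expose the results under the original Iceberg column names.
    mapping = {name: sanitize_column_name(name) for name in iceberg_names}
    return [{name: row[phys] for name, phys in mapping.items()} for row in stored]


stored = write_rows([{"ABC-GG-1-A": 1}])
print(stored)  # [{'ABC_x2DGG_x2D1_x2DA': 1}]
print(read_rows(stored, ["ABC-GG-1-A"]))  # [{'ABC-GG-1-A': 1}]
```

   If the writer skips sanitization, e.g. `read_rows([{"ABC-GG-1-A": 1}], ["ABC-GG-1-A"])`, the lookup of `ABC_x2DGG_x2D1_x2DA` raises `KeyError`, which loosely mirrors the "No match for FieldRef.Name" failure in this issue.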


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
