Re: [PR] Allow writing `pa.Table` that are either a subset of table schema or in arbitrary order [iceberg-python]

via GitHub Sun, 14 Jul 2024 09:10:21 -0700


syun64 commented on code in PR #921:
URL: https://github.com/apache/iceberg-python/pull/921#discussion_r1676970868



##########
pyiceberg/io/pyarrow.py:
##########
@@ -1450,14 +1451,17 @@ def field_partner(self, partner_struct: Optional[pa.Array], field_id: int, _: st
             except ValueError:
                 return None
 
-            if isinstance(partner_struct, pa.StructArray):
-                return partner_struct.field(name)
-            elif isinstance(partner_struct, pa.Table):
-                return partner_struct.column(name).combine_chunks()
-            elif isinstance(partner_struct, pa.RecordBatch):
-                return partner_struct.column(name)
-            else:
-                raise ValueError(f"Cannot find {name} in expected partner_struct type {type(partner_struct)}")
+            try:
+                if isinstance(partner_struct, pa.StructArray):
+                    return partner_struct.field(name)
+                elif isinstance(partner_struct, pa.Table):
+                    return partner_struct.column(name).combine_chunks()
+                elif isinstance(partner_struct, pa.RecordBatch):
+                    return partner_struct.column(name)
+                else:
+                    raise ValueError(f"Cannot find {name} in expected partner_struct type {type(partner_struct)}")
+            except KeyError:

Review Comment:
   Yeah, as I pointed out in this comment:
https://github.com/apache/iceberg-python/pull/921#discussion_r1676751929
I think `write_parquet` is using the Table Schema instead of the Schema
corresponding to the data types of the PyArrow construct.

   I will take that to mean that this behavior isn't intended, and that what we
want here is to make sure we use the Schema corresponding to the data types of
the PyArrow construct.
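
   To make the distinction concrete, here is a minimal sketch in plain Python.
It is not PyIceberg or PyArrow code: the dict `incoming` stands in for a
`pa.Table`, and `TABLE_FIELDS` and `find_partner` are hypothetical names.
Returning `None` on `KeyError` is an assumption that mirrors the existing
`except ValueError: return None` branch; the diff excerpt above does not show
what the PR actually does in that branch.

```python
from typing import Dict, List, Optional

# Hypothetical table schema: every field, in the table's canonical order.
TABLE_FIELDS: List[str] = ["id", "name", "ts"]

# Stand-in for a pa.Table that holds only a subset of the fields,
# in a different order than the table schema.
incoming: Dict[str, List[int]] = {"ts": [10, 20], "id": [1, 2]}


def find_partner(columns: Dict[str, List[int]], name: str) -> Optional[List[int]]:
    """Return the column called `name`, or None if the data does not carry it."""
    try:
        return columns[name]
    except KeyError:
        # Assumption: an absent column is reported as "no partner".
        return None


# Walking the table schema finds a gap for every field the data lacks.
partners = {name: find_partner(incoming, name) for name in TABLE_FIELDS}
missing = [name for name, column in partners.items() if column is None]

# The schema implied by the data itself: only its own fields, in its own order.
data_fields = list(incoming)
```

   Here the data carries only `ts` and `id`, in that order, so a writer driven
by the full table schema would expect a `name` column that the data does not
have, while a writer driven by the data's own schema would not.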



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


