Re: [PR] Arrow: Allow missing field-ids from Schema [iceberg-python]

via GitHub Tue, 12 Dec 2023 13:20:46 -0800


rdblue commented on code in PR #183:
URL: https://github.com/apache/iceberg-python/pull/183#discussion_r1424585279



##########
pyiceberg/io/pyarrow.py:
##########
@@ -713,28 +714,50 @@ def primitive(self, primitive: pa.DataType) -> Optional[T]:
         """Visit a primitive type."""
 
 
-def _get_field_id(field: pa.Field) -> Optional[int]:
-    for pyarrow_field_id_key in PYARROW_FIELD_ID_KEYS:
-        if field_id_str := field.metadata.get(pyarrow_field_id_key):
-            return int(field_id_str.decode())
-    return None
+class _ConvertToIceberg(PyArrowSchemaVisitor[Union[IcebergType, Schema]]):
+    counter: count[int]
+    missing_is_metadata: Optional[bool]
 
+    def __init__(self) -> None:
+        self.counter = count()

Review Comment:
   > I happen to have an iceberg table (migrated from delta lake) whose parquet files contain no field-id. With this change, I am now able to use pyiceberg to read its data.
   
   There is already a way to assign field IDs when they are not in a data file, using a name mapping. All reads that need to infer field IDs must use a name mapping rather than assigning IDs per data file.
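
To illustrate the difference, here is a minimal, library-free sketch. It is not the pyiceberg API: the function names, the column names, and the sample mapping are all hypothetical, and the JSON only follows the general `field-id` / `names` shape that the Iceberg spec uses for name mappings. The per-file counter is a simplified stand-in for assigning IDs file by file, not a copy of the code in this PR.

```python
import json
from itertools import count
from typing import Dict, List, Optional

# Hypothetical table-level name mapping (assumed sample data, not from the PR).
# Each entry ties one field ID to every name the column may have in a data file.
NAME_MAPPING_JSON = """
[
  {"field-id": 1, "names": ["id", "record_id"]},
  {"field-id": 2, "names": ["data"]}
]
"""


def build_lookup(mapping_json: str) -> Dict[str, int]:
    """Flatten the mapping into a column-name -> field-ID dictionary."""
    lookup: Dict[str, int] = {}
    for entry in json.loads(mapping_json):
        for name in entry["names"]:
            lookup[name] = entry["field-id"]
    return lookup


def ids_from_name_mapping(
    columns: List[str], lookup: Dict[str, int]
) -> Dict[str, Optional[int]]:
    """Resolve IDs by name; columns missing from the mapping get None."""
    return {name: lookup.get(name) for name in columns}


def ids_from_per_file_counter(columns: List[str]) -> Dict[str, int]:
    """Hypothetical per-file assignment: number columns in file order."""
    counter = count(1)
    return {name: next(counter) for name in columns}


# Two data files for the same table, written with different column orders
# and, for one column, a different name.
file_a_columns = ["id", "data"]
file_b_columns = ["data", "record_id", "extra"]

lookup = build_lookup(NAME_MAPPING_JSON)
mapped_a = ids_from_name_mapping(file_a_columns, lookup)
mapped_b = ids_from_name_mapping(file_b_columns, lookup)
counted_a = ids_from_per_file_counter(file_a_columns)
counted_b = ids_from_per_file_counter(file_b_columns)

# With the mapping, "data" resolves to the same ID in both files.
print(mapped_a["data"], mapped_b["data"])
# With a per-file counter, the ID of "data" depends on its position in each file.
print(counted_a["data"], counted_b["data"])
```

In this sketch the name mapping gives `data` the same ID in both files and resolves `record_id` to the same ID as `id`, while the counter gives `data` a different ID in each file because the ID depends only on column position. Columns absent from the mapping, such as `extra`, get no ID at all instead of a guessed one.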



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
