[GitHub] [iceberg] JonasJ-ap commented on a diff in pull request #6997: Python: Infer Iceberg schema from the Parquet file

via GitHub Thu, 27 Apr 2023 13:06:06 -0700


JonasJ-ap commented on code in PR #6997:
URL: https://github.com/apache/iceberg/pull/6997#discussion_r1179626777



##########
python/pyiceberg/io/pyarrow.py:
##########
@@ -507,11 +709,8 @@ def _file_to_table(
         schema_raw = None
         if metadata := physical_schema.metadata:
             schema_raw = metadata.get(ICEBERG_SCHEMA)
-        if schema_raw is None:
-            raise ValueError(
-                "Iceberg schema is not embedded into the Parquet file, see https://github.com/apache/iceberg/issues/6505"
-            )
-        file_schema = Schema.parse_raw(schema_raw)
+        # TODO: if field_ids are not present, Name Mapping should be implemented to look them up in the table schema

Review Comment:
   I created the issue: https://github.com/apache/iceberg/issues/7451.
   
   But I am not sure what the proper way to raise an exception is in this case. 
Based on my understanding, name mapping is also needed if only some of the 
Parquet fields are missing field ids. However, in that case `pyarrow_to_schema` 
can still generate a valid Iceberg schema for the rest of the Parquet fields, 
so it seems we should not raise an exception.
   
   Should we only raise an exception when no field id exists in the data file? I 
think we can also log a warning when a pyarrow field is missing a field id. 
What do you think?
   



