pdpark commented on PR #829:
URL: https://github.com/apache/iceberg-python/pull/829#issuecomment-2218062095

FYI: I tried using the changed `__init__.py` file in this commit to fix the "Mismatch in fields" error when calling `append`, but I was still getting the error.
   
The problem appears to be that, after conversion, one of the schemas has a `doc` field and the other does not. One of the schema conversions uses `visit_pyarrow(schema, _HasIds())` and the other uses `visit_pyarrow(schema, _ConvertToIcebergWithoutIDs())`.
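A minimal stdlib-only sketch of why this trips the check. The `Field` dataclass below is a hypothetical stand-in for pyiceberg's `NestedField` (it is not the library class) and only mirrors the attributes involved. Comparing whole fields fails as soon as one side carries a `doc`, while comparing a key that leaves `doc` out succeeds.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Field:
    """Hypothetical stand-in for pyiceberg's NestedField (not the real class)."""

    field_id: int
    name: str
    field_type: str
    required: bool
    doc: Optional[str] = None


def field_key(f: Field) -> tuple:
    """Everything that identifies a field except its documentation string."""
    return (f.field_id, f.name, f.field_type, f.required)


# One conversion keeps the column doc, the other drops it.
table_fields = [Field(1, "id", "long", False, doc="primary key"), Field(2, "name", "string", False)]
task_fields = [Field(1, "id", "long", False), Field(2, "name", "string", False)]

# Dataclass equality compares every attribute, including `doc`.
full_match = table_fields == task_fields
# Comparing the doc-free keys ignores the documentation difference.
key_match = [field_key(f) for f in table_fields] == [field_key(f) for f in task_fields]
print(full_match, key_match)
```

Here `full_match` is `False` only because of the `doc` on field 1, which is the same shape of failure as the "Mismatch in fields" error.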
   
The following hack worked (I added the `ADDED` and `CHANGED` comments for clarity):
   
```python
# Excerpt from the schema check: `table_schema` is the Iceberg table schema
# and `other_schema` is the pyarrow schema of the data being appended.
from pyiceberg.io.pyarrow import _pyarrow_to_schema_without_ids, pyarrow_to_schema

name_mapping = table_schema.name_mapping
try:
    task_schema = pyarrow_to_schema(other_schema, name_mapping=name_mapping)
except ValueError as e:
    other_schema = _pyarrow_to_schema_without_ids(other_schema)
    additional_names = set(other_schema.column_names) - set(table_schema.column_names)
    raise ValueError(
        f"PyArrow table contains more columns: {', '.join(sorted(additional_names))}. "
        "Update the schema first (hint, use union_by_name)."
    ) from e


# --> ADDED: compare everything except `doc`, since only one of the two
# conversions populates it
def _field_key(field):
    return (field.field_id, field.name, field.field_type, field.required)


table_schema_fields = [_field_key(f) for f in table_schema.fields]
task_schema_fields = [_field_key(f) for f in task_schema.fields]

# --> CHANGED:
if table_schema_fields != task_schema_fields:
    from rich.console import Console
    from rich.table import Table as RichTable

    console = Console(record=True)

    rich_table = RichTable(show_header=True, header_style="bold")
    rich_table.add_column("")
    rich_table.add_column("Table field")
    rich_table.add_column("Dataframe field")

    for lhs_field in table_schema.fields:
        try:
            # --> CHANGED:
            rhs_field = task_schema.find_field(lhs_field.field_id)
            matches = _field_key(lhs_field) == _field_key(rhs_field)
            rich_table.add_row("✅" if matches else "❌", str(lhs_field), str(rhs_field))
        except ValueError:
            # The field ID is absent from the dataframe schema
            rich_table.add_row("❌", str(lhs_field), "Missing")

    console.print(rich_table)
    raise ValueError(f"Mismatch in fields:\n{console.export_text()}")
```
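The report loop at the end of the check can be sketched without rich: for each table field, look up the field with the same ID on the dataframe side and mark the row ✅, ❌, or Missing. In this sketch, plain dicts keyed by field ID stand in for `task_schema.find_field`, and joined strings stand in for `rich.table.Table`; the field descriptions and the `mismatch_rows` helper are hypothetical, for illustration only.

```python
from typing import Dict, List, Tuple

# Hypothetical inputs: field ID -> printable field description.
table_fields: Dict[int, str] = {
    1: "1: id: optional long (primary key)",
    2: "2: name: optional string",
    3: "3: ts: optional timestamp",
}
task_fields: Dict[int, str] = {
    1: "1: id: optional long",
    2: "2: name: optional string",
}


def mismatch_rows(lhs: Dict[int, str], rhs: Dict[int, str]) -> List[Tuple[str, str, str]]:
    """Build (status, table field, dataframe field) rows, matching fields by ID."""
    rows = []
    for field_id, lhs_text in lhs.items():
        rhs_text = rhs.get(field_id)
        if rhs_text is None:
            # Plays the role of the `except ValueError` branch around find_field().
            rows.append(("❌", lhs_text, "Missing"))
        else:
            rows.append(("✅" if lhs_text == rhs_text else "❌", lhs_text, rhs_text))
    return rows


rows = mismatch_rows(table_fields, task_fields)
if any(status != "✅" for status, _, _ in rows):
    report = "\n".join(f"{s} | {l} | {r}" for s, l, r in rows)
    print(f"Mismatch in fields:\n{report}")
```

Because this sketch compares full descriptions, the doc-only difference on field 1 shows up as ❌, which is exactly the behaviour the hack above works around.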
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

