asheeshgarg commented on issue #208: URL: https://github.com/apache/iceberg-python/issues/208#issuecomment-1908955580
@Fokko @jqin61 Today I tried a basic partitioned-write example:

    import pyarrow as pa
    import pyarrow.dataset  # makes pa.dataset available
    from pyarrow import parquet as pq
    from pyiceberg.io.pyarrow import schema_to_pyarrow
    from pyiceberg.schema import Schema
    from pyiceberg.types import NestedField, StringType

    data = {'key': ['001', '001', '002', '002'], 'value_1': [10, 20, 100, 200], 'value_2': ['a', 'b', 'a', 'b']}
    my_partitioning = pa.dataset.partitioning(pa.schema([pa.field("key", pa.string())]), flavor='hive')
    TABLE_SCHEMA = Schema(
        NestedField(field_id=1, name="key", field_type=StringType(), required=False),
        NestedField(field_id=2, name="value_1", field_type=StringType(), required=False),
        NestedField(field_id=3, name="value_2", field_type=StringType(), required=False),
    )
    schema = schema_to_pyarrow(TABLE_SCHEMA)
    patbl = pa.Table.from_pydict(data)
    pq.write_to_dataset(patbl, 'partitioned_data', partitioning=my_partitioning, schema=schema)

If I don't pass a schema to the write, it works fine. But if I pass the schema created with schema = schema_to_pyarrow(TABLE_SCHEMA), it fails with:

    ArrowTypeError: Item has schema
    key: string
    value_1: int64
    value_2: string
    which does not match expected schema
    key: string
      -- field metadata --
      PARQUET:field_id: '1'
    value_1: string
      -- field metadata --
      PARQUET:field_id: '2'
    value_2: string
      -- field metadata --
      PARQUET:field_id: '3'

I also tried the Parquet write the way we currently do it:

    writer = pq.ParquetWriter("test", schema=schema, version="1.0")
    writer.write_table(patbl)

which fails with:

    ValueError: Table schema does not match schema used to create file:
    table:
    key: string
    value_1: int64
    value_2: string vs.
    file:
    key: string
      -- field metadata --
      PARQUET:field_id: '1'
    value_1: string
      -- field metadata --
      PARQUET:field_id: '2'
    value_2: string
      -- field metadata --
      PARQUET:field_id: '3

Do we apply any other transformation to the schema before writing in the current write support?
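Reading the two error messages side by side, the field_id metadata is likely not what trips the check: the only type that differs is value_1, which pa.Table.from_pydict infers as int64 from [10, 20, 100, 200], while TABLE_SCHEMA declares it as StringType. A minimal sketch of one way to line the data up with the schema, assuming plain pyarrow: the Arrow schema below is built by hand to mirror the one printed in the error (a stand-in for schema_to_pyarrow so the snippet runs without pyiceberg), Table.cast converts value_1 to string and attaches the field_id metadata, and the output goes to a temporary file instead of "test".

    import os
    import tempfile

    import pyarrow as pa
    import pyarrow.parquet as pq

    # Assumption: hand-built equivalent of schema_to_pyarrow(TABLE_SCHEMA),
    # copied from the "expected schema" shown in the error message.
    schema = pa.schema([
        pa.field("key", pa.string(), nullable=True, metadata={"PARQUET:field_id": "1"}),
        pa.field("value_1", pa.string(), nullable=True, metadata={"PARQUET:field_id": "2"}),
        pa.field("value_2", pa.string(), nullable=True, metadata={"PARQUET:field_id": "3"}),
    ])

    data = {'key': ['001', '001', '002', '002'], 'value_1': [10, 20, 100, 200], 'value_2': ['a', 'b', 'a', 'b']}
    patbl = pa.Table.from_pydict(data)  # value_1 is inferred as int64

    # Compare column types only (metadata ignored) to find the real mismatch.
    mismatched = [f.name for f in schema if patbl.schema.field(f.name).type != f.type]

    # Table.cast converts each column to the target type (int64 -> string for
    # value_1) and returns a table carrying the target schema, field_id metadata included.
    aligned = patbl.cast(schema)

    path = os.path.join(tempfile.mkdtemp(), "test.parquet")
    with pq.ParquetWriter(path, schema=schema, version="1.0") as writer:
        writer.write_table(aligned)

    written = pq.read_schema(path)  # field_id comes back as PARQUET:field_id metadata

The other direction would also work: declaring value_1 as LongType in TABLE_SCHEMA so the Iceberg schema matches the data instead of casting the data to the schema.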