kevinjqliu commented on issue #1255: URL: https://github.com/apache/iceberg-python/issues/1255#issuecomment-2442979795
> So I expect that the value of the second record other than "string_field_1" to be null when I insert these records into the iceberg table using pyiceberg.

```
import pyarrow as pa

schema = pa.schema(
    [
        pa.field("string_field_1", pa.string(), True),
        pa.field("int_field_1", pa.int32(), True),
        pa.field("float_field_1", pa.float32(), True),
        pa.field(
            "struct_field_1",
            pa.struct(
                [
                    pa.field("string_nested_1", pa.string()),
                    pa.field("int_item_2", pa.int32()),
                    pa.field("float_item_2", pa.float32()),
                ]
            ),
        ),
        pa.field("list_field_1", pa.list_(pa.string())),
        pa.field("list_field_2", pa.list_(pa.int32())),
        pa.field("list_field_3", pa.list_(pa.float32())),
        pa.field("map_field_1", pa.map_(pa.string(), pa.string())),
        pa.field("map_field_2", pa.map_(pa.string(), pa.int32())),
        pa.field("map_field_3", pa.map_(pa.string(), pa.float32())),
    ]
)

records = [
    {
        "string_field_1": "field_1",
        "int_field_1": 123,
        "float_field_1": 1.23,
        "struct_field_1": {
            "string_nested_1": "nest_1",
            "int_item_2": 1234,
            "float_item_2": 1.234,
        },
        "list_field_1": ["a", "b", "c"],
        "list_field_2": [1, 2, 3],
        "list_field_3": [0.1, 0.2, 0.3],
        "map_field_1": {"a": "b", "b": "c"},
        "map_field_2": {"a": 1, "b": 2},
        "map_field_3": {"a": 0.1, "b": 0.2},
    },
    {
        "string_field_1": "field_1_b",
    },
]

pyarrow_table: pa.Table = pa.Table.from_pylist(records, schema=schema)
print(pyarrow_table["struct_field_1"].to_pandas())
```

returns

```
0    {'string_nested_1': 'nest_1', 'int_item_2': 12...
1    None
Name: struct_field_1, dtype: object
```

which confirms the value is None in pyarrow.

After appending to the table, is the record None when you read it back? `table.scan().to_pandas()`

> I then checked the table using AWS Athena, but the "struct_field_1" of the second record is not null.

It is not clear to me that Athena returns a non-null value in the example you provided.