SGA-taichi-kato commented on issue #1255: URL: https://github.com/apache/iceberg-python/issues/1255#issuecomment-2442934911
Hi @kevinjqliu,

Here is the PyArrow schema. My question is about the struct field `struct_field_1`.

```python
from pyiceberg.catalog import load_catalog
import pyarrow as pa

schema = pa.schema(
    [
        pa.field("string_field_1", pa.string(), True),
        pa.field("int_field_1", pa.int32(), True),
        pa.field("float_field_1", pa.float32(), True),
        pa.field(
            "struct_field_1",
            pa.struct(
                [
                    pa.field("string_nested_1", pa.string()),
                    pa.field("int_item_2", pa.int32()),
                    pa.field("float_item_2", pa.float32()),
                ]
            ),
        ),
        pa.field("list_field_1", pa.list_(pa.string())),
        pa.field("list_field_2", pa.list_(pa.int32())),
        pa.field("list_field_3", pa.list_(pa.float32())),
        pa.field("map_field_1", pa.map_(pa.string(), pa.string())),
        pa.field("map_field_2", pa.map_(pa.string(), pa.int32())),
        pa.field("map_field_3", pa.map_(pa.string(), pa.float32())),
    ]
)
```

Next, I create two records. The second record has no value other than `string_field_1`, so I expect all of its other fields to be null when I insert these records into the Iceberg table using PyIceberg.

```python
records = [
    {
        "string_field_1": "field_1",
        "int_field_1": 123,
        "float_field_1": 1.23,
        "struct_field_1": {
            "string_nested_1": "nest_1",
            "int_item_2": 1234,
            "float_item_2": 1.234,
        },
        "list_field_1": ["a", "b", "c"],
        "list_field_2": [1, 2, 3],
        "list_field_3": [0.1, 0.2, 0.3],
        "map_field_1": {"a": "b", "b": "c"},
        "map_field_2": {"a": 1, "b": 2},
        "map_field_3": {"a": 0.1, "b": 0.2},
    },
    {
        "string_field_1": "field_1_b",
    },
]
```

Then I inserted these records into a Glue Iceberg table:
```python
catalog = load_catalog(
    "glue",
    **{
        "type": "glue",
        "glue.region": "us-west-2",
        "s3.region": "us-west-2",
    },
)

table_name = "iceberg_test"
location = f"s3://tmp_bucket/test/iceberg/{table_name}"

# Drop the table from a previous run (raises NoSuchTableError if it does not exist).
catalog.drop_table(f"test.{table_name}")
table = catalog.create_table(
    f"test.{table_name}",
    schema,
    location=location,
)

pyarrow_table: pa.Table = pa.Table.from_pylist(records, schema=schema)
table.append(pyarrow_table)
```

I then queried the table in AWS Athena, and `struct_field_1` of the second record is not null: it holds a struct with an empty string, `0`, and `0.0` instead. Why does this happen, and how can I avoid it?

```
"string_field_1","int_field_1","float_field_1","struct_field_1","list_field_1","list_field_2","list_field_3","map_field_1","map_field_2","map_field_3"
"field_1","123","1.23","{string_nested_1=nest_1, int_item_2=1234, float_item_2=1.234}","[a, b, c]","[1, 2, 3]","[0.1, 0.2, 0.3]","{a=b, b=c}","{a=1, b=2}","{a=0.1, b=0.2}"
"field_1_b",,,"{string_nested_1=, int_item_2=0, float_item_2=0.0}",,,,,,
```