SGA-taichi-kato commented on issue #1255:
URL: 
https://github.com/apache/iceberg-python/issues/1255#issuecomment-2442934911

   Hi @kevinjqliu
   Here is the pyarrow schema I define. My question is about the struct field "struct_field_1".
   ```python
   from pyiceberg.catalog import load_catalog
   import pyarrow as pa
   
   schema = pa.schema(
       [
           pa.field("string_field_1", pa.string(), True),
           pa.field("int_field_1", pa.int32(), True),
           pa.field("float_field_1", pa.float32(), True),
           pa.field(
               "struct_field_1",
               pa.struct(
                   [
                       pa.field("string_nested_1", pa.string()),
                       pa.field("int_item_2", pa.int32()),
                       pa.field("float_item_2", pa.float32()),
                   ]
               ),
           ),
           pa.field("list_field_1", pa.list_(pa.string())),
           pa.field("list_field_2", pa.list_(pa.int32())),
           pa.field("list_field_3", pa.list_(pa.float32())),
           pa.field("map_field_1", pa.map_(pa.string(), pa.string())),
           pa.field("map_field_2", pa.map_(pa.string(), pa.int32())),
           pa.field("map_field_3", pa.map_(pa.string(), pa.float32())),
       ]
   )
   ```
   
   Next, I create two records. The second record has a value only for "string_field_1".
   So I expect every other field of the second record to be null when I insert these records into the Iceberg table using pyiceberg.
   ```python
   records = [
       {
           "string_field_1": "field_1",
           "int_field_1": 123,
           "float_field_1": 1.23,
           "struct_field_1": {
               "string_nested_1": "nest_1",
               "int_item_2": 1234,
               "float_item_2": 1.234,
           },
           "list_field_1": ["a", "b", "c"],
           "list_field_2": [1, 2, 3],
           "list_field_3": [0.1, 0.2, 0.3],
           "map_field_1": {"a": "b", "b": "c"},
           "map_field_2": {"a": 1, "b": 2},
           "map_field_3": {"a": 0.1, "b": 0.2},
       },
       {
           "string_field_1": "field_1_b",
       },
   ]
   ```
   
   Then I inserted the records above into a Glue Iceberg table.
   ```python
   catalog = load_catalog(
       "glue",
       **{
           "type": "glue",
           "glue.region": "us-west-2",
           "s3.region": "us-west-2",
       },
   )
   
   
   table_name = "iceberg_test"
   location = f"s3://tmp_bucket/test/iceberg/{table_name}"
   # Start from a clean table; drop_table raises NoSuchTableError if the table does not exist yet.
   catalog.drop_table(f"test.{table_name}")
   table = catalog.create_table(
       f"test.{table_name}",
       schema,
       location=location,
   )
   
   
   pyarrow_table: pa.Table = pa.Table.from_pylist(records, schema=schema)
   table.append(pyarrow_table)
   ```
   
   I then checked the table using AWS Athena, but "struct_field_1" of the second record is not null. Instead, it is a struct whose nested fields hold empty or zero values (see the output below).
   Why does this occur, and how can I avoid it?
   
   ```
   
   "string_field_1","int_field_1","float_field_1","struct_field_1","list_field_1","list_field_2","list_field_3","map_field_1","map_field_2","map_field_3"
   "field_1","123","1.23","{string_nested_1=nest_1, int_item_2=1234, float_item_2=1.234}","[a, b, c]","[1, 2, 3]","[0.1, 0.2, 0.3]","{a=b, b=c}","{a=1, b=2}","{a=0.1, b=0.2}"
   "field_1_b",,,"{string_nested_1=, int_item_2=0, float_item_2=0.0}",,,,,,
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

