grbinho opened a new issue, #1728:
URL: https://github.com/apache/iceberg-python/issues/1728

   ### Apache Iceberg version
   
   0.8.1 (latest release)
   
   ### Please describe the bug 🐞
   
   Hi,
   
   I'm using pyiceberg with Glue and S3. While creating a table that contains list types, I noticed that the `element_id` values in the created table do not match the ids defined in the schema itself.
   This happens specifically when using `ListType` with a primitive element type.
   The element id I give to the list type is not preserved. Instead, list elements are assigned ids that follow the last id of the root fields.
   
   My expectation is that the ids would be preserved.
   Inserting data then fails due to the schema mismatch.
   
   Here is an example:
   
   Python object
   ```
   NestedField(field_id=1, name='__export_id', field_type=StringType(), required=True),
   NestedField(field_id=2, name='__export_timestamp', field_type=TimestampType(), required=True),
   NestedField(field_id=3, name='id', field_type=IntegerType(), required=True),
   NestedField(field_id=4, name='first_name', field_type=StringType(), required=True),
   NestedField(field_id=5, name='last_name', field_type=StringType(), required=True),
   NestedField(field_id=6, name='email', field_type=StringType(), required=False),
   NestedField(field_id=7, name='telephone', field_type=StringType(), required=False),
   NestedField(field_id=8, name='timezone', field_type=StringType(), required=True),
   NestedField(field_id=9, name='has_access_to_all_future_projects', field_type=BooleanType(), required=True),
   NestedField(field_id=10, name='is_contractor', field_type=BooleanType(), required=True),
   NestedField(field_id=11, name='is_active', field_type=BooleanType(), required=True),
   NestedField(field_id=12, name='weekly_capacity', field_type=IntegerType(), required=True),
   NestedField(field_id=13, name='default_hourly_rate', field_type=DecimalType(precision=14, scale=2), required=False),
   NestedField(field_id=14, name='cost_rate', field_type=DecimalType(precision=14, scale=2), required=False),
   NestedField(field_id=15, name='roles', field_type=ListType(type='list', element_id=16, element_type=StringType(), element_required=False), required=True),
   NestedField(field_id=17, name='access_roles', field_type=ListType(type='list', element_id=18, element_type=StringType(), element_required=False), required=True),
   NestedField(field_id=19, name='created_at', field_type=TimestampType(), required=True),
   NestedField(field_id=20, name='updated_at', field_type=TimestampType(), required=True)
   ```
   
   Iceberg metadata
   ```
   {
    
       "schemas": [
           {
               "type": "struct",
               "fields": [
                   {
                       "id": 1,
                       "name": "__export_id",
                       "type": "string",
                       "required": true,
                       "doc": "Unique identifier of the run that wrote this data."
                   },
                   {
                       "id": 2,
                       "name": "__export_timestamp",
                       "type": "timestamp",
                       "required": true,
                       "doc": "Timestamp of when export that wrote this data started."
                   },
                   {
                       "id": 3,
                       "name": "id",
                       "type": "int",
                       "required": true,
                       "doc": "Unique id of the user"
                   },
                   {
                       "id": 4,
                       "name": "first_name",
                       "type": "string",
                       "required": true,
                       "doc": "First name of the user"
                   },
                   {
                       "id": 5,
                       "name": "last_name",
                       "type": "string",
                       "required": true,
                       "doc": "Last name of the user"
                   },
                   {
                       "id": 6,
                       "name": "email",
                       "type": "string",
                       "required": false,
                       "doc": "Email address of the user"
                   },
                   {
                       "id": 7,
                       "name": "telephone",
                       "type": "string",
                       "required": false,
                       "doc": "The user's telephone number"
                   },
                   {
                       "id": 8,
                       "name": "timezone",
                       "type": "string",
                       "required": true,
                       "doc": "The user's timezone"
                   },
                   {
                       "id": 9,
                       "name": "has_access_to_all_future_projects",
                       "type": "boolean",
                       "required": true,
                       "doc": "Whether the user should be automatically added to future projects"
                   },
                   {
                       "id": 10,
                       "name": "is_contractor",
                       "type": "boolean",
                       "required": true,
                       "doc": "Whether the user is a contractor or an employee"
                   },
                   {
                       "id": 11,
                       "name": "is_active",
                       "type": "boolean",
                       "required": true,
                       "doc": "Whether the user is active or archived"
                   },
                   {
                       "id": 12,
                       "name": "weekly_capacity",
                       "type": "int",
                       "required": true,
                       "doc": "The number of hours per week this person is available to work (in seconds, in half hour increments)"
                   },
                   {
                       "id": 13,
                       "name": "default_hourly_rate",
                       "type": "decimal(14, 2)",
                       "required": false,
                       "doc": "The billable rate to use for this user when they are added to a project"
                   },
                   {
                       "id": 14,
                       "name": "cost_rate",
                       "type": "decimal(14, 2)",
                       "required": false,
                       "doc": "The cost rate to use for this user when calculating a projects cost vs billable amount"
                   },
                   {
                       "id": 15,
                       "name": "roles",
                       "type": {
                           "type": "list",
                           "element-id": 19,
                           "element": "string",
                           "element-required": false
                       },
                       "required": true,
                       "doc": "Descriptive names of the business roles assigned to this user. They have no effect on the permissions. Can be used for reporting."
                   },
                   {
                       "id": 16,
                       "name": "access_roles",
                       "type": {
                           "type": "list",
                           "element-id": 20,
                           "element": "string",
                           "element-required": false
                       },
                       "required": true,
                       "doc": "Access roles that determine users permissions."
                   },
                   {
                       "id": 17,
                       "name": "created_at",
                       "type": "timestamp",
                       "required": true,
                       "doc": "Date and time the time entry was created"
                   },
                   {
                       "id": 18,
                       "name": "updated_at",
                       "type": "timestamp",
                       "required": true,
                       "doc": "Date and time the time entry was updated"
                   }
               ],
               "schema-id": 0,
               "identifier-field-ids": [
                   1
               ]
           }
       ]
   }
   ```
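
The renumbering visible in the metadata above looks consistent with ids being reassigned in two passes: every root field is numbered first, and list elements are numbered afterwards. Note that the root field ids after the first list shift as well (`access_roles` goes from 17 to 16). The sketch below is only a toy model of that observed pattern, not pyiceberg's actual code; `Field`, `reassign_ids` and the `col_N` placeholder names are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Field:
    """Toy stand-in for a schema field; NOT a pyiceberg class."""
    name: str
    field_id: int
    element_id: Optional[int] = None  # set only for list-typed fields


def reassign_ids(fields):
    """Renumber ids: all root fields first, then list elements (assumed order)."""
    counter = 0
    renumbered = []
    # Pass 1: root fields get consecutive ids in declaration order.
    for f in fields:
        counter += 1
        renumbered.append(Field(f.name, counter, f.element_id))
    # Pass 2: list elements continue after the last root field id.
    for f in renumbered:
        if f.element_id is not None:
            counter += 1
            f.element_id = counter
    return renumbered


# Ids as declared in the report; fields 1-14 are primitives, so their
# names are replaced by col_N placeholders here.
declared = [Field(f"col_{i}", i) for i in range(1, 15)] + [
    Field("roles", 15, element_id=16),
    Field("access_roles", 17, element_id=18),
    Field("created_at", 19),
    Field("updated_at", 20),
]

stored = reassign_ids(declared)

# name -> (declared field id, declared element id, stored field id, stored element id)
changed = {
    new.name: (old.field_id, old.element_id, new.field_id, new.element_id)
    for old, new in zip(declared, stored)
    if (old.field_id, old.element_id) != (new.field_id, new.element_id)
}

for name, ids in changed.items():
    print(name, ids)
```

Under this model the four affected fields end up with the same ids as in the stored metadata (`roles` keeps id 15 but its element becomes 19, `access_roles` becomes 16 with element 20, and the two timestamps become 17 and 18), which is why data written against the declared ids no longer lines up.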
   
   ### Willingness to contribute
   
   - [x] I can contribute a fix for this bug independently
   - [x] I would be willing to contribute a fix for this bug with guidance from 
the Iceberg community
   - [ ] I cannot contribute a fix for this bug at this time


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

