Fokko commented on issue #1100:
URL: 
https://github.com/apache/iceberg-python/issues/1100#issuecomment-2312887591

   Agree with @sungwy that this is mostly a documentation issue, so let's 
extend the docs so ChatGPT can give better answers.
   
   Another solution would be:
   
   ```python
   ts_table = catalog.create_table_if_not_exists(
       'default.time_series',
       schema=ts_schema,
       location="local_s3",
   )
   with ts_table.update_spec() as update_spec:
       update_spec.add_identity("campaign_id")
   ```
   
   This will first create the table, and then set the spec, but that's probably 
alright. 
   
   > Since we require the field ID, we should only accept the PyIceberg Schema.
   
   I don't think this is the most user-friendly option. In the end, we don't 
want to put the burden of field-IDs on the users. Keep in mind that they also 
get re-assigned:
   
   ```python
   from pyiceberg.schema import Schema
   from pyiceberg.types import NestedField, TimestampType

   ts_schema = Schema(
       NestedField(field_id=1925, name="timestamp", field_type=TimestampType(), required=True),
   )
   
   ts_table = catalog.create_table('default.time_series', schema=ts_schema)
   
   # Field IDs now start from 1, because they are re-assigned on table creation.
   assert ts_table.schema().fields[0].field_id == 1
   ```
   
   Another thing I noticed:
   
   ```
   ┏━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
   ┃    ┃ Table field                      ┃ Dataframe field                  ┃
   ┡━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
   │ ❌ │ 1: timestamp: required timestamp │ 1: timestamp: optional timestamp │
   │ ❌ │ 2: campaign_id: required int     │ 2: campaign_id: optional long    │
   │ ❌ │ 3: temperature: optional float   │ 3: temperature: optional double  │
   │ ❌ │ 4: pressure: optional float      │ 4: pressure: optional double     │
   │ ❌ │ 5: humidity: optional int        │ 5: humidity: optional long       │
   │ ❌ │ 6: led_0: optional boolean       │ 6: led_0: optional long          │
   ```
   
   Arrow marks every field as nullable by default, even when the data 
contains no nulls. We could verify the nullability by checking whether a 
column actually contains any null records. This could become expensive when 
the table is big, so we probably only want to do it when an optional dataframe 
field is being written to a required field in the table.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

