ox opened a new issue, #2342:
URL: https://github.com/apache/iceberg-python/issues/2342

   ### Apache Iceberg version
   
   0.9.1 (latest release)
   
   ### Please describe the bug 🐞
   
   I've been experimenting with the latest PyIceberg version, 0.10.0 on master, because I was hitting a different issue on 0.9.1, and I keep getting errors when committing table updates. For context, I'm using Ray Data as my data engine (Ray Data uses PyIceberg internally), Google BigLake as my REST catalog, and Google Cloud Storage as the warehouse.
   
   My code looks something like this:
   
   ```py
   import ray
   from pyiceberg.catalog import load_catalog

   catalog_kwargs = {
       "type": "rest",
       "uri": "https://biglake.googleapis.com/iceberg/v1beta/restcatalog",
       "warehouse": "gs://some-warehouse-bucket",

       "header.x-goog-user-project": "my-project",
       "header.X-Iceberg-Access-Delegation": "remote-signing",

       "auth": {
           "type": "google",
       },
   }

   # load the catalog (the name "default" is arbitrary)
   catalog = load_catalog("default", **catalog_kwargs)

   # load the dataset
   dataset = ray.data.read_parquet("gs://some-data-bucket/some-file.parquet")

   # hack to get iceberg-compatible schema
   refs = dataset.to_arrow_refs()
   schema = ray.get(refs[0]).schema

   # write
   table_identifier = "test_namespace.test_data"

   # create the table in iceberg
   catalog.create_table_if_not_exists(table_identifier, schema=schema)

   dataset.write_iceberg(table_identifier=table_identifier, catalog_kwargs=catalog_kwargs)
   ```
   
   Ray Data then makes the following series of requests:
   
   1. Get the table metadata
   2. Write data to the warehouse
   3. Commit updates to the table
   
   The issue comes in step 3, where I get an `INVALID_ARGUMENT` error from BigLake. I traced the calls, and the last payload looks like:
   
   ```
   {
     "identifier":{
       "namespace":["test_namespace"],
       "name":"test_data"
     },
     "requirements":[
       {"type":"assert-ref-snapshot-id","ref":"main"},
       {"type":"assert-table-uuid","uuid":"689f1a7d-0000-2589-aca7-d4f547fce244"}
     ],
     "updates":[ ... ]
   }
   ```
   
   The `assert-ref-snapshot-id` requirement is missing the `snapshot-id` field, which should be present and set to `null` rather than omitted. We contacted BigLake/BigQuery support, and they said their parser expects that key to exist. The [OpenAPI REST catalog spec](https://github.com/apache/iceberg/blob/main/open-api/rest-catalog-open-api.yaml#L3138-L3155) says that `snapshot-id` is required.
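
A minimal sketch of the check described above, assuming only what this issue states about the spec (that `ref` and `snapshot-id` must both be present on an `assert-ref-snapshot-id` requirement). The `REQUIRED_KEYS` mapping and the `missing_requirement_keys` helper are hypothetical names used for illustration, and the elided `updates` list is left empty because it plays no part in the check.

```python
import json

# Keys each requirement type must carry. Only the entry discussed in this
# issue is listed; the mapping is illustrative, not a full copy of the spec.
REQUIRED_KEYS = {
    "assert-ref-snapshot-id": ("ref", "snapshot-id"),
}


def missing_requirement_keys(payload: dict) -> list[tuple[str, str]]:
    """Return (requirement type, missing key) pairs for a commit payload."""
    problems = []
    for requirement in payload.get("requirements", []):
        req_type = requirement.get("type")
        for key in REQUIRED_KEYS.get(req_type, ()):
            # A key set to null is present; only an absent key is reported.
            if key not in requirement:
                problems.append((req_type, key))
    return problems


# The traced payload from this issue, with the updates list left empty.
traced = json.loads("""
{
  "identifier": {"namespace": ["test_namespace"], "name": "test_data"},
  "requirements": [
    {"type": "assert-ref-snapshot-id", "ref": "main"},
    {"type": "assert-table-uuid", "uuid": "689f1a7d-0000-2589-aca7-d4f547fce244"}
  ],
  "updates": []
}
""")

print(missing_requirement_keys(traced))
# → [('assert-ref-snapshot-id', 'snapshot-id')]
```

A payload whose requirement carries `"snapshot-id": null` passes this check, which matches the behavior BigLake support described: the key has to exist even when its value is null.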
   
   I believe this means that the PyIceberg client implementation does not meet the spec.
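
To make the difference concrete, here is a small sketch using only the standard library. The `AssertRefSnapshotId` dataclass below is a hypothetical stand-in, not PyIceberg's actual class, and the assumption that the key disappears because `None` values are dropped during serialization is a guess at the cause rather than something confirmed in this report.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class AssertRefSnapshotId:
    """Hypothetical stand-in for the requirement; not PyIceberg's class."""

    ref: str
    snapshot_id: Optional[int] = None

    def to_dict(self, drop_none: bool) -> dict:
        body = {
            "type": "assert-ref-snapshot-id",
            "ref": self.ref,
            "snapshot-id": self.snapshot_id,
        }
        if drop_none:
            # Dropping None values removes the snapshot-id key entirely.
            body = {key: value for key, value in body.items() if value is not None}
        return body


requirement = AssertRefSnapshotId(ref="main")

# Shape seen in the traced payload: the key is absent.
without_key = json.dumps(requirement.to_dict(drop_none=True))
# Shape the catalog expects: the key is present with a null value.
with_null = json.dumps(requirement.to_dict(drop_none=False))

print(without_key)  # → {"type": "assert-ref-snapshot-id", "ref": "main"}
print(with_null)    # → {"type": "assert-ref-snapshot-id", "ref": "main", "snapshot-id": null}
```

If the cause is along these lines, a fix would need to keep `snapshot-id` in the output for this requirement type even when its value is `None`.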
   
   
   ### Willingness to contribute
   
   - [x] I can contribute a fix for this bug independently
   - [x] I would be willing to contribute a fix for this bug with guidance from 
the Iceberg community
   - [ ] I cannot contribute a fix for this bug at this time


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

