ox opened a new issue, #2342:
URL: https://github.com/apache/iceberg-python/issues/2342
### Apache Iceberg version
0.9.1 (latest release)
### Please describe the bug 🐞
I've been experimenting with the latest PyIceberg version (0.10.0 on master)
because I was hitting a different issue on 0.9.1, and I have been getting errors
when committing table updates. For context, I'm using Ray Data as my data engine
(Ray Data uses PyIceberg internally), Google BigLake as my REST catalog, and
Google Cloud Storage as the warehouse.
My code looks something like this:
```py
import ray
from pyiceberg.catalog import load_catalog

catalog_kwargs = {
    "type": "rest",
    "uri": "https://biglake.googleapis.com/iceberg/v1beta/restcatalog",
    "warehouse": "gs://some-warehouse-bucket",
    "header.x-goog-user-project": "my-project",
    "header.X-Iceberg-Access-Delegation": "remote-signing",
    "auth": {
        "type": "google",
    },
}

# PyIceberg catalog used to create the table up front
# (the name "default" is arbitrary)
catalog = load_catalog("default", **catalog_kwargs)

# load the dataset
dataset = ray.data.read_parquet("gs://some-data-bucket/some-file.parquet")

# hack to get an Iceberg-compatible schema: the Arrow schema of the first block
refs = dataset.to_arrow_refs()
schema = ray.get(refs[0]).schema

# create the table in Iceberg
table_identifier = "test_namespace.test_data"
catalog.create_table_if_not_exists(table_identifier, schema=schema)

# write; Ray Data loads its own catalog from catalog_kwargs
dataset.write_iceberg(
    table_identifier=table_identifier,
    catalog_kwargs=catalog_kwargs,
)
```
Ray Data makes the following series of requests:
1. Get the table metadata
2. Write data to the warehouse
3. Commit updates to the table
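To check whether Ray Data is involved at all, the same three steps can likely be reproduced with PyIceberg alone: `Catalog.load_table` fetches the table metadata, and `Table.append` writes the data files and then sends the commit request. This is a sketch under stated assumptions: `reproduce_without_ray` is a hypothetical helper, `catalog` is assumed to be a PyIceberg catalog (for example from `load_catalog` with the same `catalog_kwargs`), and `arrow_table` is assumed to be a `pyarrow.Table` whose schema matches the Iceberg table.

```python
def reproduce_without_ray(catalog, table_identifier, arrow_table):
    """Hypothetical helper: run load -> write -> commit through PyIceberg only.

    Assumes `catalog` is a PyIceberg catalog and `arrow_table` is a
    pyarrow.Table compatible with the table's schema.
    """
    # Step 1: fetch the table metadata from the REST catalog.
    table = catalog.load_table(table_identifier)
    # Steps 2 and 3: append() writes data files to the warehouse and then
    # commits them to the catalog (requirements + updates).
    table.append(arrow_table)
    return table
```

If this fails with the same `INVALID_ARGUMENT` error, the problem most likely sits in PyIceberg's commit request rather than in Ray Data.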
The issue occurs in step 3, where BigLake returns an `INVALID_ARGUMENT` error.
I traced the calls, and the last payload looks like this:
```json
{
  "identifier": {
    "namespace": ["test_namespace"],
    "name": "test_data"
  },
  "requirements": [
    {"type": "assert-ref-snapshot-id", "ref": "main"},
    {"type": "assert-table-uuid", "uuid": "689f1a7d-0000-2589-aca7-d4f547fce244"}
  ],
  "updates": [ ... ]
}
```
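To catch this kind of omission before a request is sent, a small validator can compare each requirement against the keys the REST spec marks as required. This is a minimal sketch: `REQUIRED_KEYS` and `missing_requirement_keys` are hypothetical helpers, the table covers only the two requirement types that appear in the payload above (not the full spec), and the elided `updates` list is replaced with `[]`.

```python
# Partial, hypothetical table of required keys per requirement type,
# covering only the two types seen in the traced payload.
REQUIRED_KEYS = {
    "assert-ref-snapshot-id": ("ref", "snapshot-id"),
    "assert-table-uuid": ("uuid",),
}


def missing_requirement_keys(commit_request: dict) -> list:
    """Report every required key that is absent from a requirement.

    A key present with a JSON null (Python None) counts as present;
    only keys that are omitted entirely are reported.
    """
    problems = []
    for i, req in enumerate(commit_request.get("requirements", [])):
        for key in REQUIRED_KEYS.get(req.get("type"), ()):
            if key not in req:
                problems.append(f"requirements[{i}] ({req['type']}): missing '{key}'")
    return problems


# The traced payload, with the elided "updates" replaced by [].
issue_payload = {
    "identifier": {"namespace": ["test_namespace"], "name": "test_data"},
    "requirements": [
        {"type": "assert-ref-snapshot-id", "ref": "main"},
        {"type": "assert-table-uuid", "uuid": "689f1a7d-0000-2589-aca7-d4f547fce244"},
    ],
    "updates": [],
}

print(missing_requirement_keys(issue_payload))
# prints ["requirements[0] (assert-ref-snapshot-id): missing 'snapshot-id'"]
```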
The `assert-ref-snapshot-id` requirement is missing its `snapshot-id` field.
That field should be present with a value of `null`, not omitted from the
requirement. We contacted BigLake/BigQuery support, and they said their parser
expects that key to exist.
The [OpenAPI REST catalog
spec](https://github.com/apache/iceberg/blob/main/open-api/rest-catalog-open-api.yaml#L3138-L3155)
marks `snapshot-id` as required.
I believe this means PyIceberg's client implementation does not conform to the
spec.
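One plausible way this happens is a serializer that drops every field whose value is `None`, which turns a null `snapshot-id` into a missing key. The sketch below contrasts that behaviour with spec-conformant output. It uses a hypothetical `AssertRefSnapshotId` dataclass and the standard `json` module; it is not PyIceberg's actual model, only an illustration of the two serialization choices.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class AssertRefSnapshotId:
    """Hypothetical stand-in for the commit requirement (not PyIceberg's class)."""

    ref: str
    # Per the spec, null means the ref must not already exist.
    snapshot_id: Optional[int]

    def to_dict_dropping_nulls(self) -> dict:
        # Generic "exclude None" serialization: omits snapshot-id entirely
        # when it is None, matching the traced payload.
        body = {"type": "assert-ref-snapshot-id", "ref": self.ref,
                "snapshot-id": self.snapshot_id}
        return {k: v for k, v in body.items() if v is not None}

    def to_dict_spec(self) -> dict:
        # Spec-conformant: snapshot-id is always emitted, as JSON null if unset.
        return {"type": "assert-ref-snapshot-id", "ref": self.ref,
                "snapshot-id": self.snapshot_id}


req = AssertRefSnapshotId(ref="main", snapshot_id=None)
print(json.dumps(req.to_dict_dropping_nulls()))
# prints {"type": "assert-ref-snapshot-id", "ref": "main"}
print(json.dumps(req.to_dict_spec()))
# prints {"type": "assert-ref-snapshot-id", "ref": "main", "snapshot-id": null}
```

A fix along these lines keeps "drop `None`" as the general default but makes `snapshot-id` an explicit exception for this one requirement type.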
### Willingness to contribute
- [x] I can contribute a fix for this bug independently
- [x] I would be willing to contribute a fix for this bug with guidance from the Iceberg community
- [ ] I cannot contribute a fix for this bug at this time