kevinjqliu commented on issue #1589:
URL: 
https://github.com/apache/iceberg-python/issues/1589#issuecomment-2646078141

   Thanks for chiming in here! I took some time to look into this. Here's what 
I found:
   
   ### On Catalog Configuration
   The configuration precedence is described in the REST spec for the `/config` 
endpoint. Based on the REST spec, here is the expected behavior for catalog 
configurations. 
   
https://github.com/apache/iceberg/blob/d935460bbed5da0ac205a67912b70fa682ee84df/open-api/rest-catalog-open-api.yaml#L81-L98
   Mainly, 
   ```
   Catalog configuration is constructed by setting the defaults, then client-
           provided configuration, and finally overrides. The final property 
set is then
           used to configure the catalog.
   ```
   
   The REST catalog implementation does indeed follow this order. 
   
https://github.com/apache/iceberg-python/blob/826a00628216993de50f7f0c2111b6c824b02939/pyiceberg/catalog/rest.py#L397-L400
   
   So when Nessie [added the 
feature](https://github.com/projectnessie/nessie/pull/9868/files#diff-35b0e3dabf006d8b221a85dbed21058b2a79856d533015afa740d565514a6eb3R267)
 to send `py-io-impl=pyiceberg.io.fsspec.FsspecFileIO` using the configurations 
endpoint (`/config`), it is sent as part of the configurations endpoint's 
`default` properties. This means the properties specified in `load_catalog` 
should override this value if it is provided. And the server can use `override` 
properties to trump over the `load_catalog` properties. 
   
   To summarize:
   * Client can override `/config`'s `default` parameters
   * `config`'s `override` parameters can override client parameters
   
   ### On Table Configuration
   Here's the REST spec for the `loadTable` endpoint 
   
https://github.com/apache/iceberg/blob/d935460bbed5da0ac205a67912b70fa682ee84df/open-api/rest-catalog-open-api.yaml#L920-L925
   ```
           Load a table from the catalog.
   
   
           The response contains both configuration and table metadata. The 
configuration, if non-empty is used
           as additional configuration for the table that overrides catalog 
configuration. For example, this
           configuration may change the FileIO implementation to be used for 
the table.
   ```
   This corresponds with 
https://github.com/apache/iceberg-python/blob/826a00628216993de50f7f0c2111b6c824b02939/pyiceberg/catalog/rest.py#L527-L531
   
   `self._load_file_io` here is defined in the base catalog class and will 
include the catalog's properties 
   
https://github.com/apache/iceberg-python/blob/826a00628216993de50f7f0c2111b6c824b02939/pyiceberg/catalog/__init__.py#L728
   
   Expanding the parameters 
   ```
   {**self.properties , {**table_response.metadata.properties, 
**table_response.config}}
   ->
   {**{**config_response.defaults, **self.properties, 
**config_response.overrides}, {**table_response.metadata.properties, 
**table_response.config}}
   -> 
   **{config_response.defaults, self.properties, config_response.overrides, 
table_response.metadata.properties, table_response.config}
   ```
   where `config_response` is from `/config` endpoint, `self.properties` is the 
properties given to `load_catalog`, and `table_response` is from `loadTable`. 
Note, `table_response.metadata.properties` should be the same as the table's 
own property. 
   
   ### Conclusion
   I think the real culprit is from [Nessie `icebergConfigDefaults` setting the 
`py-io-impl` 
config](https://github.com/projectnessie/nessie/pull/9868/files#diff-35b0e3dabf006d8b221a85dbed21058b2a79856d533015afa740d565514a6eb3R267).
 `icebergConfigDefaults` is called by 
[`configureIcebergTable`](https://github.com/projectnessie/nessie/blob/9ae0f670a8a67aa573612cde4f547aac0b61f112/catalog/files/impl/src/main/java/org/projectnessie/catalog/files/s3/S3ObjectIO.java#L174-L190)
 which is called by 
[`icebergConfigPerTable`](https://github.com/projectnessie/nessie/blob/main/catalog/service/rest/src/main/java/org/projectnessie/catalog/service/rest/IcebergConfigurer.java#L216-L255)
 and then finally called by 
[loadTable](https://github.com/projectnessie/nessie/blob/main/catalog/service/rest/src/main/java/org/projectnessie/catalog/service/rest/IcebergApiV1TableResource.java#L276-L298)
 
   
   This means every `loadTable` response from Nessie will include the config to 
override `py-io-impl`.
   And based on the hierarchy above, this will always override any parameters 
set by the config endpoint, the catalog itself, and the table's property. 
   
   In conclusion, pyiceberg REST client is working as intended. Nessie should 
not send `py-io-impl` config override as part of the loadTable response. 
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Reply via email to