ccancellieri opened a new issue, #2122: URL: https://github.com/apache/iceberg-python/issues/2122
Dear all,

I'm working in a GCP environment and configuring PyIceberg to work with the BigLake API Metastore catalog. I'm pretty satisfied with the result (it almost works!), but I have a blocking issue that prevents me from instantiating the catalog.

The issue is located here: https://github.com/apache/iceberg-python/blob/f71806ee816cf0fb1e7f785aec81932741a0c6ca/pyiceberg/catalog/rest/__init__.py#L181

Pydantic validates the catalog's config response and requires a mandatory field called `defaults`. Unfortunately, the BigLake catalog does NOT return this field, so we are not able to instantiate the catalog correctly.

I'm currently testing the catalog with the following configuration:

```
config = {
    "type": "rest",
    "uri": "https://biglake.googleapis.com/iceberg/v1beta/restcatalog",
    "warehouse": gcs_warehouse_path,
    "py-io-impl": "pyiceberg.io.pyarrow.PyArrowFileIO",  # Crucial for GCS
    "rest-metrics-reporting-enabled": "false",  # Disable metrics reporting if not needed
    "oauth2-server-uri": "https://oauth2.googleapis.com/token",
    "token": access_token,
    "header.x-goog-user-project": biglake_project_id,
    # Optional: Set the logging level for pyiceberg if you need more debug info
    "pyiceberg.logging-level": "DEBUG",
}
```

For this reason, instead of forking, I would like to ask whether the following fix could be applied:

```
class ConfigResponse(IcebergBaseModel):
    defaults: Optional[Properties] = Field(default={})
    overrides: Properties = Field()
```

This allows `_fetch_config()` to pass `response.json()` to the `ConfigResponse` constructor without failing [here](https://github.com/apache/iceberg-python/blob/f71806ee816cf0fb1e7f785aec81932741a0c6ca/pyiceberg/catalog/rest/__init__.py#L353):

```
def _fetch_config(self) -> None:
    params = {}
    if warehouse_location := self.properties.get(WAREHOUSE_LOCATION):
        params[WAREHOUSE_LOCATION] = warehouse_location

    with self._create_session() as session:
        response = session.get(self.url(Endpoints.get_config, prefixed=False), params=params)
    try:
        response.raise_for_status()
    except HTTPError as exc:
        self._handle_non_200_response(exc, {})
    config_response = ConfigResponse(**response.json())

    config = config_response.defaults
    config.update(self.properties)
    config.update(config_response.overrides)
    self.properties = config
```

With this change I have a working BigLake catalog, and all the calls work.

_Another issue is that `list_namespaces()` and `list_tables()` fail in a similar way, because BigLake does not return an empty list. We can work around this by catching the exception and creating the first namespace and table; after that, all the calls work fine._

I'm not sure what the Iceberg spec says here, but I hope the suggested fix can be applied so that PyIceberg can also be used on GCP without issues!

Thanks all.
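
To illustrate why a missing `defaults` entry is harmless for the merge step in `_fetch_config()`, here is a minimal, dependency-free sketch of that merge. The helper `merge_config` and the sample values are hypothetical and not part of PyIceberg. Unlike the proposed `ConfigResponse`, the sketch also tolerates a missing `overrides` entry, which is an assumption made only to keep the example short.

```python
from typing import Any, Dict

Properties = Dict[str, str]


def merge_config(response_json: Dict[str, Any], client_properties: Properties) -> Properties:
    """Hypothetical helper mirroring the merge order used in _fetch_config()."""
    # Treat a missing or null "defaults"/"overrides" entry as an empty mapping.
    defaults: Properties = response_json.get("defaults") or {}
    overrides: Properties = response_json.get("overrides") or {}

    merged: Properties = dict(defaults)  # lowest precedence
    merged.update(client_properties)     # client settings win over server defaults
    merged.update(overrides)             # server overrides win over everything
    return merged


# A response shaped like the one described above: no "defaults" key at all.
# All values here are made up for illustration.
biglake_like_response = {"overrides": {"prefix": "projects/example"}}
client_properties = {"uri": "https://example.invalid/restcatalog", "prefix": "ignored"}

merged = merge_config(biglake_like_response, client_properties)
print(merged)
```

The precedence is the same as in the quoted method: server defaults first, then the client's own properties, then server overrides. An absent `defaults` entry therefore only means the merge starts from an empty mapping.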