ccancellieri opened a new issue, #2122:
URL: https://github.com/apache/iceberg-python/issues/2122

   Dear all,
   I'm working in a GCP environment and configuring PyIceberg to work with the BigLake Metastore catalog through the BigLake API.
   
   I'm pretty satisfied with the result (it almost works!), but I have a blocking issue that prevents me from instantiating the catalog.
   
   The issue is located here:
   
https://github.com/apache/iceberg-python/blob/f71806ee816cf0fb1e7f785aec81932741a0c6ca/pyiceberg/catalog/rest/__init__.py#L181
   
   Pydantic validates the configuration response returned by the catalog, and the `ConfigResponse` model requires a mandatory field called `defaults`.
   
   Unfortunately, the BigLake catalog does NOT return this field, so the catalog cannot be instantiated.
   
   I'm now testing the catalog using the following configuration:
   
   ```
   # gcs_warehouse_path, access_token and biglake_project_id are defined elsewhere
   config = {
       "type": "rest",
       "uri": "https://biglake.googleapis.com/iceberg/v1beta/restcatalog",
       "warehouse": gcs_warehouse_path,
       "py-io-impl": "pyiceberg.io.pyarrow.PyArrowFileIO",  # Crucial for GCS
       "rest-metrics-reporting-enabled": "false",  # Disable metrics reporting if not needed
       "oauth2-server-uri": "https://oauth2.googleapis.com/token",
       "token": access_token,
       "header.x-goog-user-project": biglake_project_id,
       # Optional: set the logging level for pyiceberg if you need more debug info
       "pyiceberg.logging-level": "DEBUG",
   }
   ```
   
   For this reason, instead of forking, I would like to ask you to apply the following fix, if possible:
   
   
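   ```
   from pydantic import Field

   from pyiceberg.typedef import IcebergBaseModel, Properties


   class ConfigResponse(IcebergBaseModel):
       # Servers such as BigLake omit "defaults"; fall back to a fresh empty dict
       defaults: Properties = Field(default_factory=dict)
       overrides: Properties = Field()
   ```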
   
   This allows `_fetch_config()` to succeed when it passes `response.json()` to the `ConfigResponse` constructor [here](https://github.com/apache/iceberg-python/blob/f71806ee816cf0fb1e7f785aec81932741a0c6ca/pyiceberg/catalog/rest/__init__.py#L353):
   
   ```
   # Method of RestCatalog in pyiceberg/catalog/rest/__init__.py
   def _fetch_config(self) -> None:
       params = {}
       if warehouse_location := self.properties.get(WAREHOUSE_LOCATION):
           params[WAREHOUSE_LOCATION] = warehouse_location

       with self._create_session() as session:
           response = session.get(self.url(Endpoints.get_config, prefixed=False), params=params)
       try:
           response.raise_for_status()
       except HTTPError as exc:
           self._handle_non_200_response(exc, {})
       # With `defaults` optional, a response without that key no longer fails validation
       config_response = ConfigResponse(**response.json())

       # Precedence: server defaults < client properties < server overrides
       config = config_response.defaults
       config.update(self.properties)
       config.update(config_response.overrides)
       self.properties = config
   ```
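The merge order in `_fetch_config()` is what makes an empty `defaults` harmless: server defaults are applied first, the client's own properties override them, and server overrides win last. The sketch below reproduces that precedence with plain dicts, treating a missing `defaults` key as `{}` the way the proposed `ConfigResponse` would. The helper name `merge_catalog_properties` and the sample URIs and bucket names are hypothetical and are not part of PyIceberg's API.

```python
from typing import Any, Dict, Mapping

Properties = Dict[str, str]


def merge_catalog_properties(
    payload: Mapping[str, Any], client_properties: Mapping[str, str]
) -> Properties:
    """Hypothetical helper mirroring the precedence used by _fetch_config().

    Lowest to highest priority: server `defaults`, client properties, server `overrides`.
    """
    if "overrides" not in payload:
        # `overrides` stays required, as in the proposed ConfigResponse model.
        raise ValueError("config response is missing required field 'overrides'")
    # A missing `defaults` key falls back to an empty mapping, like
    # Field(default_factory=dict); copying avoids mutating the caller's payload.
    merged: Properties = dict(payload.get("defaults", {}))
    merged.update(client_properties)
    merged.update(payload["overrides"])
    return merged


# Hypothetical payload without "defaults", as reported for BigLake.
example = merge_catalog_properties(
    {"overrides": {"warehouse": "gs://example-bucket/server-warehouse"}},
    {"uri": "https://example.invalid/catalog", "warehouse": "gs://example-bucket/client-warehouse"},
)
print(example)
# → {'uri': 'https://example.invalid/catalog', 'warehouse': 'gs://example-bucket/server-warehouse'}
```

Because the server's `overrides` are applied last, a client-side `warehouse` value is replaced whenever the server overrides it, whether or not `defaults` was present.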
   
   With this change, I have a working BigLake catalog, and all the calls now work.
    
   _Another issue: `list_namespaces()` and `list_tables()` fail in a similar way, because BigLake does not return an empty list when there are no namespaces or tables yet. We can work around this by catching the exception and creating the first namespace and table; after that, all the calls work fine._
   
   I'm not sure what the Iceberg REST spec requires here, but I hope the suggested fix can be applied so that we can use PyIceberg without issues on GCP as well!
   
   Thanks all.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

