sungwy opened a new pull request, #963:
URL: https://github.com/apache/iceberg-python/pull/963

   Currently, we optionally support catalog names being specified in the 
identifier string. In other words, this means that we currently support 
identifier names that look like `catalog_name.table_namespace.table_name` or 
`table_namespace.table_name` being passed into `catalog.load_table()` method.
   
   This PR proposes to remove this optional support, and change the `Table` 
class's identifier to be `(table_namespace, table_name)` instead of 
`(catalog_name, table_namespace, table_name)`. The reasons are as follows:
   1. The Identifier is specified as namespace + table_name according to the 
REST Catalog Open API Spec
   2. `Table` has the attribute `catalog`; hence the catalog name is accessible 
as: `tbl.catalog.name`, and prepending the catalog name is redundant. If we 
would like to create a long identifier including the catalog name, it may be 
better to leave the `identifier` field as namespace + name pair, and update the 
`name` attribute to be `self.catalog.name` + `self.identifier` instead.
   3. `identifier_to_tuple_without_catalog` is called in all `catalog` 
implementations, and is crucial in allowing `Table` to use the long identifier 
including the catalog_name. However, this function is imperfect, and will 
result in errors if the the table_namespace uses a hierarchical name that is 
repeats the catalog name. For example, namespace: `default.lower` and 
catalog_name: `default`. When we pass `default.lower.table_name` to refer to 
the table `table_name` in namespace `default.lower` on the catalog `default`, 
we will run into an unexpected exception as it will remove the first hierarchy 
of the namespace, and look for `lower.table_name`.
   4. Fixes: https://github.com/apache/iceberg-python/issues/742
   
   Thankfully, we have not explicitly documented that we support catalog_name 
in the identifier string in our API documentation, and other engine partners do 
not seem to be relying on this feature as well (checking one example of 
[Daft](https://github.com/Eventual-Inc/Daft/blob/764d2df9e7c6916b3cfda3e6be682af6430bc90b/tests/integration/iceberg/conftest.py#L54))
   
   It would be great to get this change in for 0.7.0 release as certain rest 
catalog implementations are having issues with the current approach: 
https://github.com/hansetag/iceberg-catalog/issues/18
   
   However this is backward incompatible change that takes away an optional 
feature, so I would like to put this PR up as a first step in facilitating the 
discussion.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Reply via email to