sungwy opened a new pull request, #963: URL: https://github.com/apache/iceberg-python/pull/963
Currently, we optionally support catalog names being specified in the identifier string. In other words, this means that we currently support identifier names that look like `catalog_name.table_namespace.table_name` or `table_namespace.table_name` being passed into `catalog.load_table()` method. This PR proposes to remove this optional support, and change the `Table` class's identifier to be `(table_namespace, table_name)` instead of `(catalog_name, table_namespace, table_name)`. The reasons are as follows: 1. The Identifier is specified as namespace + table_name according to the REST Catalog Open API Spec 2. `Table` has the attribute `catalog`; hence the catalog name is accessible as: `tbl.catalog.name`, and prepending the catalog name is redundant. If we would like to create a long identifier including the catalog name, it may be better to leave the `identifier` field as namespace + name pair, and update the `name` attribute to be `self.catalog.name` + `self.identifier` instead. 3. `identifier_to_tuple_without_catalog` is called in all `catalog` implementations, and is crucial in allowing `Table` to use the long identifier including the catalog_name. However, this function is imperfect, and will result in errors if the the table_namespace uses a hierarchical name that is repeats the catalog name. For example, namespace: `default.lower` and catalog_name: `default`. When we pass `default.lower.table_name` to refer to the table `table_name` in namespace `default.lower` on the catalog `default`, we will run into an unexpected exception as it will remove the first hierarchy of the namespace, and look for `lower.table_name`. 4. Fixes: https://github.com/apache/iceberg-python/issues/742 Thankfully, we have not explicitly documented that we support catalog_name in the identifier string in our API documentation, and other engine partners do not seem to be relying on this feature as well (checking one example of [Daft](https://github.com/Eventual-Inc/Daft/blob/764d2df9e7c6916b3cfda3e6be682af6430bc90b/tests/integration/iceberg/conftest.py#L54)) It would be great to get this change in for 0.7.0 release as certain rest catalog implementations are having issues with the current approach: https://github.com/hansetag/iceberg-catalog/issues/18 However this is backward incompatible change that takes away an optional feature, so I would like to put this PR up as a first step in facilitating the discussion. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org