sungwy opened a new pull request, #994: URL: https://github.com/apache/iceberg-python/pull/994
Today in PyIceberg, we have support for identifier parsing in public APIs belonging to two different classes: - Catalog class: load_table, purge_table, drop_table - Table class: scan These APIs currently have optional support for the identifier that the instance itself belongs to. For example, the catalog class APIs support: ``` catalog = load_catalog(“animals”, **properties) catalog.load_table(“cats.whiskers”) ``` But it also supports: `catalog.load_table(“animals.cats.whiskers”)` Which is redundant, because the catalog.name is already “animals”. Similarly, row_filter in the Table scan API supports: ``` table = catalog.load_table(“cats.whiskers”) table.scan(row_filter=”n_legs== 4”) ``` But we also support `table.scan(row_filter=”whiskers.n_legs == 4”)` Which is also redundant, because the table name is already “whiskers” (or cats.whiskers) The benefits of this change are as follows: - As observed above, specifying instance-level identifier in these APIs is redundant - This optional support adds a lot of complexity to the code base and leads to issues like: #742 <https://github.com/apache/iceberg-python/issues/742> It would be really great to clean this up before as we prepare for a 1.0.0 later this year - The optional support opens up the possibility of resulting in correctness issues if there exists a name in the level below as the instance-level identifier. - For example, if in the above catalog, we have a table namespace named “animals.lower” catalog.load(“animals.lower.cats”) can be construed as table name “cats” in the namespace “animals.lower” but it will be interpreted as table name “cats” in the namespace “lower” which is erroneous. - We would see a similar issue with tables and field names as well. Field name parsing is already complicated because we have to represent nested fields as flat representations. So it would be great to remove one unnecessary level of complication here -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org