isc-patrick commented on issue #1148: URL: https://github.com/apache/iceberg-python/issues/1148#issuecomment-2352758701
I think you were correct about removing ensure_tables_exist. The question comes down to what is the best method to use for checking the existence of a table. Create_all() relies on the default method in SQLAlchemy, https://docs.sqlalchemy.org/en/20/core/internals.html#sqlalchemy.engine.Dialect.has_table, which uses the DB metadata. This is also what the Java implementation is doing: https://github.com/mrcnc/iceberg/blob/928e52ae888f601e1d8d81ae2405d677db10f470/core/src/main/java/org/apache/iceberg/jdbc/JdbcCatalog.java#L155 Also, the current ensure_tables_exists method has the following issues: - It doesn't really check if a table exists, only whether a SELECT statement can be executed on that table. In most RBAC based permissioning, a table can exist and the user could see that the table exists because they have permissions to the DB, but cannot SELECT from that table because a SELECT is a different permission, which would result in a false negative. - It also could fail for other reasons, any other Programming or Database Exception that the Dailect raises, and result in a false positive. This also restricts the databases that are supported to just the ones that throw the errors that are included in the except clause. This was actaully the starting point for my PR because my companies database throws a different error. So I will add the flag init_catalog_tables and can add back the _ensure_tables_exist method if that is desired. If I add that back, the only question is what DB Exceptions should be handled if the SELECT fails? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org