coolderli commented on issue #9249: URL: https://github.com/apache/iceberg/issues/9249#issuecomment-1849747058
@Fokko Thanks for your reply. I think we can use a table for files that have a schema. If a file does not have a schema, a table cannot be used. Of course, files can be used directly on object storage, but directories are difficult to manage.

I am wondering whether it is possible to provide external references to files stored on object storage through a unified directory structure, for example with a fixed prefix: /volume/catalog_name/database_name/volume_name

We could record the actual object storage address of each file in the Iceberg manifest file, so that we can also provide snapshots without relying on atomic rename in the object store.

This does reduce read and write performance compared with plain file access. If commits are too frequent, it may cause a small-files problem, but it should be workable if we use Spark for batch commits.

In addition, I found that Spark does not have a catalog for files, so it may be possible to implement a FileCatalog.
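
To make the proposal concrete, here is a rough Python sketch of the mapping from fixed-prefix logical paths to real object storage addresses, with each commit producing a new snapshot. This is not Iceberg API: `logical_path`, `Snapshot`, and `VolumeManifest` are hypothetical names, the manifest is kept in memory rather than written as a manifest file, and the `s3://` style addresses a caller would pass in are only illustrative.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


def logical_path(catalog: str, database: str, volume: str, relative: str) -> str:
    """Build the fixed-prefix logical path for a file inside a volume."""
    return f"/volume/{catalog}/{database}/{volume}/{relative.lstrip('/')}"


@dataclass(frozen=True)
class Snapshot:
    """Hypothetical snapshot: an immutable view of the volume at one commit."""
    snapshot_id: int
    # logical path -> actual object storage address
    files: Dict[str, str]


@dataclass
class VolumeManifest:
    """Hypothetical in-memory stand-in for the manifest that records addresses."""
    snapshots: List[Snapshot] = field(default_factory=list)

    def current(self) -> Dict[str, str]:
        """Return a copy of the latest path-to-address mapping."""
        return dict(self.snapshots[-1].files) if self.snapshots else {}

    def commit(self, added: Dict[str, str],
               removed: Optional[List[str]] = None) -> Snapshot:
        """Apply a batch of changes as one new snapshot.

        Files are never renamed or overwritten in object storage; only the
        mapping changes, so no atomic rename is required.
        """
        files = self.current()
        for path in removed or []:
            files.pop(path, None)
        files.update(added)
        snapshot = Snapshot(len(self.snapshots) + 1, files)
        self.snapshots.append(snapshot)
        return snapshot

    def resolve(self, path: str, snapshot_id: Optional[int] = None) -> str:
        """Translate a logical path to its storage address.

        Uses the latest snapshot unless snapshot_id is given. Raises KeyError
        if the path is not present in that snapshot.
        """
        if not self.snapshots:
            raise KeyError(path)
        if snapshot_id is None:
            snapshot = self.snapshots[-1]
        else:
            snapshot = self.snapshots[snapshot_id - 1]
        return snapshot.files[path]
```

Batching many file changes into a single `commit` call is what keeps the number of snapshots, and therefore small metadata files, under control; a Spark job would presumably collect its writes and commit once at the end.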