Re: [I] Can databricks volume be implemented on Iceberg? [iceberg]

via GitHub Mon, 11 Dec 2023 02:19:21 -0800


coolderli commented on issue #9249:
URL: https://github.com/apache/iceberg/issues/9249#issuecomment-1849747058


   @Fokko Thanks for your reply. I think we can use a table for files that 
have a schema. 
   
   If a file does not have a schema, a table cannot be used. Of course, such 
files can be used directly on object storage, but directories there are 
difficult to manage.
   
   I am wondering if it is possible to provide external references to files 
stored on object storage through a unified directory structure, for example 
with a fixed prefix: /volume/catalog_name/database_name/volume_name
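
   To make the fixed-prefix idea concrete, here is a minimal sketch of how a 
logical path of the form /volume/<catalog>/<database>/<volume>/<file> could be 
split into its parts. The names `VOLUME_ROOT`, `VolumePath` and 
`parse_volume_path` are hypothetical and only for illustration; nothing like 
this exists in Iceberg today.

```python
from typing import NamedTuple

VOLUME_ROOT = "/volume"  # assumed fixed prefix, taken from the proposal above


class VolumePath(NamedTuple):
    catalog: str
    database: str
    volume: str
    relative_path: str  # path of the file inside the volume; "" for the volume itself


def parse_volume_path(path: str) -> VolumePath:
    """Split /volume/<catalog>/<database>/<volume>/<file...> into its parts."""
    if not path.startswith(VOLUME_ROOT + "/"):
        raise ValueError(f"not a volume path: {path!r}")
    # Split at most three times so that the file part keeps its own slashes.
    parts = path[len(VOLUME_ROOT) + 1:].split("/", 3)
    if len(parts) < 3 or not all(parts[:3]):
        raise ValueError(f"expected catalog/database/volume in: {path!r}")
    relative = parts[3] if len(parts) == 4 else ""
    return VolumePath(parts[0], parts[1], parts[2], relative)
```

   For example, `parse_volume_path("/volume/c/d/v/images/a.png")` yields the 
catalog `c`, database `d`, volume `v` and the relative path `images/a.png`.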
   
   We can record the actual object storage address of each file in the Iceberg 
manifest file, so that we can also provide snapshots without relying on atomic 
renames in object storage.
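
   As a rough illustration of that idea, the sketch below keeps an 
append-only list of snapshots, each mapping a logical file path to its 
physical object storage address. It is an in-memory stand-in, not Iceberg's 
real manifest format or API; `Snapshot`, `VolumeLog` and the URIs in the usage 
note are hypothetical.

```python
from dataclasses import dataclass
from types import MappingProxyType
from typing import Dict, Iterable, List, Mapping, Optional


@dataclass(frozen=True)
class Snapshot:
    snapshot_id: int
    files: Mapping[str, str]  # logical path -> physical object storage URI


class VolumeLog:
    """In-memory stand-in for a volume's metadata (assumption, not Iceberg's format)."""

    def __init__(self) -> None:
        # Snapshot 0 is the empty volume; snapshot ids equal list indices.
        self._snapshots: List[Snapshot] = [Snapshot(0, MappingProxyType({}))]

    def current(self) -> Snapshot:
        return self._snapshots[-1]

    def commit(
        self,
        added: Optional[Dict[str, str]] = None,
        removed: Iterable[str] = (),
    ) -> Snapshot:
        # Copy the previous mapping, so older snapshots are never mutated
        # and no object has to be renamed or moved.
        files = dict(self.current().files)
        for logical in removed:
            files.pop(logical, None)
        files.update(added or {})
        snap = Snapshot(self.current().snapshot_id + 1, MappingProxyType(files))
        self._snapshots.append(snap)
        return snap

    def resolve(self, logical: str, snapshot_id: Optional[int] = None) -> str:
        """Return the physical URI of a logical path, optionally at an older snapshot."""
        snap = self.current() if snapshot_id is None else self._snapshots[snapshot_id]
        return snap.files[logical]
```

   A writer would upload a new object under a fresh key and then commit the 
mapping, for example `log.commit(added={"a.bin": "s3://bucket/x/2"})`. Readers 
of an older snapshot keep resolving `a.bin` to the previous object, which is 
why no atomic rename is needed in this sketch.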
   
   This would indeed reduce read and write performance compared with plain 
file semantics. Committing too frequently may also cause a small-files problem, 
but that should be manageable if we use Spark for batch commits. In addition, I 
found that Spark does not have a catalog for files, so maybe it is possible to 
implement a FileCatalog.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

