coolderli opened a new issue, #9249:
URL: https://github.com/apache/iceberg/issues/9249

   ### Query engine
   
   _No response_
   
   ### Question
   
   
   While researching solutions for managing unstructured files, I came across 
[Databricks volumes](https://docs.databricks.com/en/data-governance/unity-catalog/create-volumes.html).
   I was wondering whether something similar could be implemented on Iceberg. Here is my rough idea.
   
   Using catalogs and volumes to manage unstructured files can facilitate 
better data governance, such as lifecycle management.
   
   I envision a volume as a logical container that maps to actual files and 
achieves transaction isolation through snapshots.
   
   For example, files could be mapped as follows:
   `volume://catalog_name/database_name/table_name/mydb/my-volume/file1` to 
`s3://bucket_name/mydb/my-volume/file1`,
   `volume://catalog_name/database_name/table_name/mydb/my-volume/file2` to 
`abfss://azure_account_name/container_name/mydb/my-volume/file2`
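
   To make the idea concrete, here is a minimal sketch of a logical volume whose snapshots map logical file names to physical locations. Nothing here is an existing Iceberg API: `Volume`, `VolumeSnapshot`, `commit`, and `resolve` are hypothetical names, and the in-memory snapshot list stands in for whatever metadata the catalog would actually persist.

```python
from dataclasses import dataclass
from types import MappingProxyType
from typing import List, Mapping, Optional


@dataclass(frozen=True)
class VolumeSnapshot:
    """Immutable view of a volume: logical file name -> physical location."""
    snapshot_id: int
    files: Mapping[str, str]


class Volume:
    """Hypothetical logical volume; not part of any existing Iceberg API."""

    def __init__(self, uri: str) -> None:
        self.uri = uri.rstrip("/")
        # Snapshot 0 is the empty volume.
        self._snapshots: List[VolumeSnapshot] = [
            VolumeSnapshot(0, MappingProxyType({}))
        ]

    def current(self) -> VolumeSnapshot:
        return self._snapshots[-1]

    def commit(self, added: Mapping[str, str]) -> VolumeSnapshot:
        """Create a new snapshot; earlier snapshots are left untouched."""
        base = self.current()
        files = dict(base.files)
        files.update(added)
        snapshot = VolumeSnapshot(base.snapshot_id + 1, MappingProxyType(files))
        self._snapshots.append(snapshot)
        return snapshot

    def resolve(self, logical_uri: str,
                snapshot: Optional[VolumeSnapshot] = None) -> str:
        """Translate a volume:// URI into its physical location."""
        if snapshot is None:
            snapshot = self.current()
        prefix = self.uri + "/"
        if not logical_uri.startswith(prefix):
            raise ValueError(f"{logical_uri} is not inside {self.uri}")
        return snapshot.files[logical_uri[len(prefix):]]


vol = Volume("volume://catalog_name/database_name/table_name/mydb/my-volume")
s1 = vol.commit({"file1": "s3://bucket_name/mydb/my-volume/file1"})
s2 = vol.commit({
    "file2": "abfss://azure_account_name/container_name/mydb/my-volume/file2"
})
```

   A reader that pinned `s1` keeps seeing only `file1`, even after `file2` is committed, which is the kind of isolation I have in mind.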
   
   Another consideration is that I want to read files rather than tables, 
since file access is supported by many deep learning frameworks such as 
TensorFlow, and the formats these frameworks use are relatively 
difficult to represent as structured tables.
   
   Taking Spark as an example, I would prefer to write 
   `spark.read().csv("volume://catalog_name/database_name/table_name/mydb/my-volume/")` 
   rather than `spark.read().table("catalog_name.database_name.table_name")`.
   
   However, I found that engines like Spark lack API support for this, and I 
was wondering whether it is worth trying.
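
   Until an engine understands `volume://` paths natively, one possible workaround is to resolve a volume directory on the client and pass the physical locations to the engine's existing file reader. The sketch below assumes a hypothetical manifest (logical file name to physical location) has already been fetched from the catalog; `physical_paths` is an illustrative helper, not an existing Spark or Iceberg function.

```python
from typing import Dict, List


def physical_paths(manifest: Dict[str, str], volume_uri: str,
                   directory_uri: str) -> List[str]:
    """Return the physical locations of all files under a volume directory."""
    root = volume_uri.rstrip("/") + "/"
    target = directory_uri.rstrip("/") + "/"
    if not target.startswith(root):
        raise ValueError(f"{directory_uri} is not inside {volume_uri}")
    # '' selects the whole volume, 'sub/' selects a subdirectory.
    relative = target[len(root):]
    return sorted(
        location for name, location in manifest.items()
        if name.startswith(relative)
    )


# Hypothetical manifest for one snapshot of the volume.
manifest = {
    "file1": "s3://bucket_name/mydb/my-volume/file1",
    "file2": "abfss://azure_account_name/container_name/mydb/my-volume/file2",
}
volume = "volume://catalog_name/database_name/table_name/mydb/my-volume"
paths = physical_paths(manifest, volume, volume + "/")

# With PySpark, the resolved list could then go to the regular reader:
#   df = spark.read.csv(paths)
```

   This keeps the engine unchanged, but every client would have to perform the resolution itself, which is why native support in the engine still seems preferable.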


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

