[I] Can databricks volume be implemented on Iceberg? [iceberg]

via GitHub Thu, 07 Dec 2023 23:51:21 -0800


coolderli opened a new issue, #9249:
URL: https://github.com/apache/iceberg/issues/9249

### Query engine

_No response_

### Question

Recently, I was researching solutions for managing unstructured files and
discovered [Databricks
volumes](https://docs.databricks.com/en/data-governance/unity-catalog/create-volumes.html).
I was wondering whether something similar could be implemented on Iceberg. Here is my rough idea.

Using catalogs and volumes to manage unstructured files can facilitate
better data governance, such as lifecycle management.

I envision a volume as a logical container that references the actual files and
achieves transaction isolation through snapshots.
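
To make the snapshot idea concrete, here is a minimal in-memory sketch. It is not Iceberg's API: the `Volume` and `Snapshot` classes and the `commit` method are hypothetical names, and the only assumption is that each commit produces a new immutable file listing while older listings stay readable.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    """Immutable view of the files in a volume at one point in time."""
    snapshot_id: int
    files: frozenset


class Volume:
    """Hypothetical logical volume that tracks files through snapshots."""

    def __init__(self):
        # Start with an empty snapshot so there is always a current one.
        self._snapshots = [Snapshot(0, frozenset())]

    def current(self):
        """Return the latest committed snapshot."""
        return self._snapshots[-1]

    def commit(self, added=(), removed=()):
        """Create a new snapshot; existing snapshots are never modified."""
        base = self.current()
        files = (base.files | frozenset(added)) - frozenset(removed)
        snapshot = Snapshot(base.snapshot_id + 1, files)
        self._snapshots.append(snapshot)
        return snapshot
```

A reader that holds on to the result of `current()` keeps seeing the same file list even if a writer commits afterwards, which is the isolation property I have in mind.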

For example, files could be mapped as follows:
`volume://catalog_name/database_name/table_name/mydb/my-volume/file1` to
`s3://bucket_name/mydb/my-volume/file1`,
`volume://catalog_name/database_name/table_name/mydb/my-volume/file2` to
`abfss://azure_account_name/container_name/mydb/my-volume/file2`.
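
The mapping could be sketched roughly as below. This is only an illustration: the `VOLUME_FILES` table and the `resolve` function are hypothetical, and it assumes the first three URI segments (catalog, database, table) identify the volume while the rest is a path relative to it.

```python
from urllib.parse import urlparse

# Hypothetical per-file mapping that a catalog might store for one volume.
# Keys are paths relative to the volume; values are physical locations.
VOLUME_FILES = {
    ("catalog_name", "database_name", "table_name"): {
        "mydb/my-volume/file1": "s3://bucket_name/mydb/my-volume/file1",
        "mydb/my-volume/file2": (
            "abfss://azure_account_name/container_name/mydb/my-volume/file2"
        ),
    }
}


def resolve(volume_uri):
    """Translate a logical volume:// URI into its physical storage location."""
    parsed = urlparse(volume_uri)
    if parsed.scheme != "volume":
        raise ValueError(f"not a volume URI: {volume_uri}")
    catalog = parsed.netloc
    # The path looks like /database/table/relative/path/to/file.
    parts = parsed.path.lstrip("/").split("/", 2)
    if len(parts) != 3:
        raise ValueError(f"expected database/table/path in: {volume_uri}")
    database, table, relative_path = parts
    return VOLUME_FILES[(catalog, database, table)][relative_path]
```

Because the lookup is per file, a single volume can span several storage systems, as in the two examples above.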

Another consideration is that I want to read files rather than tables,
since file access is supported by many deep learning frameworks such as
TensorFlow, and the formats these frameworks use are relatively
difficult to structure as tables.

Taking Spark as an example, I would prefer to use
`spark.read().csv("volume://catalog_name/database_name/table_name/mydb/my-volume/")`
rather than `spark.read().table("catalogname.databasename.tablename")`.

However, I found that engines like Spark lack API support for this, and I
was wondering whether it is worth trying.

