[I] Questions around Iceberg-rust [iceberg-rust]

via GitHub Tue, 09 Jul 2024 09:24:14 -0700


ChristianCasazza opened a new issue, #450:
URL: https://github.com/apache/iceberg-rust/issues/450


   Hello, I had some questions around Iceberg-rust regarding data interactions 
with S3, authn, and authz.
   
   1. How does connecting an Iceberg catalog with a specific S3 bucket work? I 
understand the layout on S3, where a table is split into Parquet data files 
and Avro metadata files, but I am not sure how this file organization relates 
to a deployed catalog, or how exactly to configure that relationship.
   
   2. Where does PyIceberg fit in with Iceberg-rust? Would it be possible to 
deploy Iceberg-rust on the server side and interact with the REST catalog 
through PyIceberg? I like Python as an interface for data consumers to 
interact with a catalog, and for basic management of tables.
   
   3. What are the options for writing tables with Iceberg-rust? As of now, is 
it only possible with a distributed engine like Spark or Trino? What would be 
the bottlenecks to writes from DuckDB, Polars, or Ibis plus a backend? The vast 
majority of my datasets are currently under 50 GB, and most workloads are a 
fraction of that. I would like to use Iceberg for its superior data management 
compared with plain files, but initially for use cases that can mostly be done 
on a single node and don't really need the power of distributed engines.
   
   4. How do authentication and authorization work with the current 
Iceberg-rust? The access control system I described above works for AWS S3 and 
sharing files. Any pointers about where I could learn to integrate IAM 
permissions into a catalog and tables? It seems the creators of 
https://github.com/hansetag/iceberg-catalog are in the middle of implementing 
some of these exact features. I would love to contribute to these features and 
implement them for my use case. It seems that non-AWS credentials are vended 
to consumers, and the catalog uses AWS credentials to sign S3 requests on 
behalf of the users, but I am not sure. I am also not sure how this 
implementation compares with the open-sourced implementation released by 
Databricks.
   
   5. Where exactly does OpenDAL fit into the Iceberg-rust catalog? Would 
OpenDAL help standardize accessing data from the catalog? The custom metadata 
feature (https://github.com/apache/opendal/issues/4842) could also be useful 
for connecting tables to different authz commands.


