liurenjie1024 commented on issue #450: URL: https://github.com/apache/iceberg-rust/issues/450#issuecomment-2220534721
> How does connecting an Iceberg catalog with a specific S3 bucket work? I understand the structure on S3 with dividing a table into parquet data files and avro metadata files, but I am not sure how the relationship between this file organization and a deployed catalog works, and how to configure that exactly.

It depends on which catalog you are using. For the hms/glue catalogs, which can be classified as client-side catalogs, you need to set up a Hive Metastore or Glue server and pass the `warehouse` configuration to the catalog builder. For the REST catalog, it is the REST catalog server's responsibility to manage the location.

> Where does Pyiceberg fit into Iceberg-rust? Would it be possible to deploy Iceberg-rust on the server side, and interact with the rest catalog through Pyiceberg? I like python as a nice interface for data consumers to interact with a catalog, and for basic management of tables.

Currently there is no relationship between these two libraries; they are just Iceberg implementations in different languages. iceberg-rust is a library, so you can use it in a server, but you need to write the server code yourself. Since pyiceberg and iceberg-rust both implement the Iceberg spec, you can in theory use iceberg-rust to write data into an Iceberg table and use pyiceberg to read it, and vice versa.

> What are the write table options with an Iceberg rust? As of now, is it only possible with a distributed engine like Spark or Trino? What would be the bottlenecks to duckdb, polars, or Ibis+backend writes? The vast majority of my datasets are less than 50Gb currently, and most workloads a fraction of that. I would like to use Iceberg for its superior data management vs files, but initially for use cases that can mostly be done on a single node and don't really need the power of distributed engines.

Currently iceberg-rust has not implemented writing to tables yet. The community has focused on read support in recent releases.
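As a rough illustration of the catalog answer above, the difference between a client-side catalog and a REST catalog can be sketched as a small helper that builds the client's property map. This is only a sketch under stated assumptions: `catalog_properties` is a hypothetical helper, the property keys (`type`, `uri`, `warehouse`) follow the naming commonly used by Iceberg clients, and the URIs and bucket path are placeholders that do not come from this thread.

```python
from typing import Dict, Optional


def catalog_properties(kind: str, uri: str, warehouse: Optional[str] = None) -> Dict[str, str]:
    """Build a property map for an Iceberg catalog client (hypothetical helper)."""
    props = {"type": kind, "uri": uri}
    if kind == "hive":
        # Client-side catalog: the client has to say where table files live,
        # so a warehouse root (for example an S3 prefix) is required here.
        if warehouse is None:
            raise ValueError("a hive catalog needs a warehouse location")
        props["warehouse"] = warehouse
    elif kind == "rest":
        # REST catalog: the server manages table locations, so this sketch
        # deliberately sets no warehouse path on the client side.
        pass
    else:
        raise ValueError(f"unsupported catalog kind in this sketch: {kind}")
    return props


# Placeholder endpoints and bucket, chosen only for illustration.
hive_props = catalog_properties("hive", "thrift://localhost:9083", "s3://my-bucket/warehouse")
rest_props = catalog_properties("rest", "http://localhost:8181")
print(hive_props)
print(rest_props)
```

A map like this could then be handed to a client, for example `pyiceberg.catalog.load_catalog("default", **rest_props)`, assuming a catalog server is actually reachable at that address.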
-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org