Fokko commented on code in PR #11845:
URL: https://github.com/apache/iceberg/pull/11845#discussion_r1913038717
##########
site/docs/spark-quickstart.md:
##########
@@ -267,44 +271,109 @@ To read a table, simply use the Iceberg table's name.
 df = spark.table("demo.nyc.taxis").show()
 ```
 
-### Adding A Catalog
+### Adding catalogs
 
-Iceberg has several catalog back-ends that can be used to track tables, like JDBC, Hive MetaStore and Glue.
-Catalogs are configured using properties under `spark.sql.catalog.(catalog_name)`. In this guide,
-we use JDBC, but you can follow these instructions to configure other catalog types. To learn more, check out
-the [Catalog](docs/latest/spark-configuration.md#catalogs) page in the Spark section.
+Apache Iceberg provides several catalog implementations to manage tables and enable SQL operations.
+Catalogs are configured using properties under `spark.sql.catalog.(catalog_name)`.
+You can configure different catalog types, such as JDBC, Hive Metastore, Glue, and REST, to manage Iceberg tables in Spark.
 
-This configuration creates a path-based catalog named `local` for tables under `$PWD/warehouse` and adds support for Iceberg tables to Spark's built-in catalog.
+This guide covers the configuration of two popular catalog types:
+
+* JDBC Catalog
+* REST Catalog
+
+To learn more, check out the [Catalog](docs/latest/spark-configuration.md#catalogs) page in the Spark section.
+
+#### Configuring JDBC Catalog
+
+The JDBC catalog stores Iceberg table metadata in a relational database.
+
+This configuration creates a JDBC-based catalog named `local` for tables under `$PWD/warehouse` and adds support for Iceberg tables to Spark's built-in catalog.
+
+The JDBC catalog uses file-based SQLite database as the backend.
 
 === "CLI"
 
     ```sh
-    spark-sql --packages org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:{{ icebergVersion }}\
+    spark-sql --packages org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:{{ icebergVersion }},org.xerial:sqlite-jdbc:3.46.1.3 \
         --conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions \
         --conf spark.sql.catalog.spark_catalog=org.apache.iceberg.spark.SparkSessionCatalog \
         --conf spark.sql.catalog.spark_catalog.type=hive \
         --conf spark.sql.catalog.local=org.apache.iceberg.spark.SparkCatalog \
-        --conf spark.sql.catalog.local.type=hadoop \
+        --conf spark.sql.catalog.local.type=jdbc \
+        --conf spark.sql.catalog.local.uri=jdbc:sqlite:$PWD/iceberg_catalog_db.sqlite \
         --conf spark.sql.catalog.local.warehouse=$PWD/warehouse \
         --conf spark.sql.defaultCatalog=local
     ```
 
 === "spark-defaults.conf"
 
     ```sh
-    spark.jars.packages org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:{{ icebergVersion }}
+    spark.jars.packages org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:{{ icebergVersion }},org.xerial:sqlite-jdbc:3.46.1.3
     spark.sql.extensions org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions
     spark.sql.catalog.spark_catalog org.apache.iceberg.spark.SparkSessionCatalog
     spark.sql.catalog.spark_catalog.type hive
     spark.sql.catalog.local org.apache.iceberg.spark.SparkCatalog
-    spark.sql.catalog.local.type hadoop
-    spark.sql.catalog.local.warehouse $PWD/warehouse
+    spark.sql.catalog.local.type jdbc
+    spark.sql.catalog.local.uri jdbc:sqlite:iceberg_catalog_db.sqlite
+    spark.sql.catalog.local.warehouse warehouse
     spark.sql.defaultCatalog local
     ```
 
 !!! note
     If your Iceberg catalog is not set as the default catalog, you will have to switch to it by executing `USE local;`
 
+#### Configuring REST Catalog
+
+The REST catalog provides a language-agnostic way to manage Iceberg tables through a RESTful service.
+
+This configuration creates a REST-based catalog named `rest` for tables under `s3://warehouse/` and adds support for Iceberg tables to Spark's built-in catalog.
+
+The REST catalog uses the `apache/iceberg-rest-fixture` docker container from the `docker-compose.yml` above as the backend service with MinIO for S3-compatible storage.
+
+=== "CLI"
+
+    ```sh
+    spark-sql --packages org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:{{ icebergVersion }},org.apache.iceberg:iceberg-aws-bundle:1.7.1 \
+        --conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions \
+        --conf spark.sql.catalog.spark_catalog=org.apache.iceberg.spark.SparkSessionCatalog \
+        --conf spark.sql.catalog.spark_catalog.type=hive \
+        --conf spark.sql.catalog.rest=org.apache.iceberg.spark.SparkCatalog \
+        --conf spark.sql.catalog.rest.type=rest \
+        --conf spark.sql.catalog.rest.uri=http://localhost:8181 \
+        --conf spark.sql.catalog.rest.warehouse=s3://warehouse/ \
+        --conf spark.sql.catalog.rest.io-impl=org.apache.iceberg.aws.s3.S3FileIO \
+        --conf spark.sql.catalog.rest.s3.endpoint=http://localhost:9000 \
+        --conf spark.sql.catalog.rest.s3.path-style-access=true \
+        --conf spark.sql.catalog.rest.s3.access-key-id=admin \
+        --conf spark.sql.catalog.rest.s3.secret-access-key=password \
+        --conf spark.sql.catalog.rest.client.region=us-east-1 \
+        --conf spark.sql.defaultCatalog=rest
+    ```
+
+=== "spark-defaults.conf"
+
+    ```sh
+    spark.jars.packages org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:{{ icebergVersion }},org.apache.iceberg:iceberg-aws-bundle:1.7.1
+    spark.sql.extensions org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions
+    spark.sql.catalog.spark_catalog org.apache.iceberg.spark.SparkSessionCatalog
+    spark.sql.catalog.spark_catalog.type hive
+    spark.sql.catalog.rest org.apache.iceberg.spark.SparkCatalog
+    spark.sql.catalog.rest.type rest
+    spark.sql.catalog.rest.uri http://localhost:8181
+    spark.sql.catalog.rest.warehouse s3://warehouse/
+    spark.sql.catalog.rest.io-impl org.apache.iceberg.aws.s3.S3FileIO
+    spark.sql.catalog.rest.s3.endpoint http://localhost:9000
+    spark.sql.catalog.rest.s3.path-style-access true
+    spark.sql.catalog.rest.s3.access-key-id admin
+    spark.sql.catalog.rest.s3.secret-access-key password
+    spark.sql.catalog.rest.client.region us-east-1
+    spark.sql.defaultCatalog rest
+    ```
+
+!!! note
+    If your Iceberg catalog is not set as the default catalog, you will have to switch to it by executing `USE rest;`

Review Comment:
   ```suggestion
       If your Iceberg catalog is not set as the default catalog using `spark.sql.defaultCatalog`, you will have to switch to it by executing `USE rest;`
   ```
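For readers who want to try the REST catalog configuration discussed above from PySpark rather than through `spark-sql` flags or `spark-defaults.conf`, here is a minimal sketch that sets the same properties programmatically. It is illustrative only and not part of the PR diff: the endpoints, `admin`/`password` credentials, and region mirror the docker-compose defaults quoted in the diff, and the `1.7.1` runtime version is an assumption chosen to match the `iceberg-aws-bundle` pin above; adjust these values for your environment.

```python
# Illustrative sketch: the REST catalog settings from the quickstart diff, applied
# via SparkSession config. Endpoints, credentials, and version pins are assumptions
# taken from the docker-compose defaults quoted above; adjust as needed.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("iceberg-rest-quickstart")
    .config(
        "spark.jars.packages",
        "org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.7.1,"
        "org.apache.iceberg:iceberg-aws-bundle:1.7.1",
    )
    .config("spark.sql.extensions",
            "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
    .config("spark.sql.catalog.rest", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.rest.type", "rest")
    .config("spark.sql.catalog.rest.uri", "http://localhost:8181")
    .config("spark.sql.catalog.rest.warehouse", "s3://warehouse/")
    .config("spark.sql.catalog.rest.io-impl", "org.apache.iceberg.aws.s3.S3FileIO")
    .config("spark.sql.catalog.rest.s3.endpoint", "http://localhost:9000")
    .config("spark.sql.catalog.rest.s3.path-style-access", "true")
    .config("spark.sql.catalog.rest.s3.access-key-id", "admin")
    .config("spark.sql.catalog.rest.s3.secret-access-key", "password")
    .config("spark.sql.catalog.rest.client.region", "us-east-1")
    .config("spark.sql.defaultCatalog", "rest")
    .getOrCreate()
)

# Because `rest` is set as the default catalog, unqualified table names resolve
# against it directly; no `USE rest;` statement is needed here.
spark.sql("CREATE NAMESPACE IF NOT EXISTS nyc")
spark.sql(
    "CREATE TABLE IF NOT EXISTS nyc.taxis (vendor_id bigint, trip_id bigint) USING iceberg"
)
spark.sql("SHOW TABLES IN nyc").show()
```

If `spark.sql.defaultCatalog` were left unset, the suggested wording in the review comment applies: you would first run `USE rest;` (or fully qualify names such as `rest.nyc.taxis`) before creating or querying tables.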