kevinjqliu commented on code in PR #11845:
URL: https://github.com/apache/iceberg/pull/11845#discussion_r1895040536
##########
docs/docs/spark-getting-started.md:
##########

Review Comment:
   Note: there are two "getting started" docs, this one and `site/docs/spark-quickstart.md`.

##########
site/docs/spark-quickstart.md:
##########

@@ -26,7 +26,11 @@ highlight some powerful features. You can learn more about Iceberg's Spark runti

 - [Writing Data to a Table](#writing-data-to-a-table)
 - [Reading Data from a Table](#reading-data-from-a-table)
 - [Adding A Catalog](#adding-a-catalog)
-- [Next Steps](#next-steps)
+  - [Configuring JDBC Catalog](#configuring-jdbc-catalog)
+  - [Configuring REST Catalog](#configuring-rest-catalog)
+- [Next steps](#next-steps)
+  - [Adding Iceberg to Spark](#adding-iceberg-to-spark)
+  - [Learn More](#learn-more)

Review Comment:
   This renders the subsections correctly.

##########
site/docs/spark-quickstart.md:
##########

@@ -269,42 +273,104 @@ To read a table, simply use the Iceberg table's name.

 ### Adding A Catalog

-Iceberg has several catalog back-ends that can be used to track tables, like JDBC, Hive MetaStore and Glue.
-Catalogs are configured using properties under `spark.sql.catalog.(catalog_name)`. In this guide,
-we use JDBC, but you can follow these instructions to configure other catalog types. To learn more, check out

Review Comment:
   Weird that the guide already mentions JDBC here, while the example still uses the Hadoop catalog.

##########
site/docs/spark-quickstart.md:
##########

@@ -267,44 +271,109 @@ To read a table, simply use the Iceberg table's name.

 df = spark.table("demo.nyc.taxis").show()
 ```

-### Adding A Catalog
+### Adding catalogs

-Iceberg has several catalog back-ends that can be used to track tables, like JDBC, Hive MetaStore and Glue.
-Catalogs are configured using properties under `spark.sql.catalog.(catalog_name)`. In this guide,
-we use JDBC, but you can follow these instructions to configure other catalog types. To learn more, check out
-the [Catalog](docs/latest/spark-configuration.md#catalogs) page in the Spark section.
+Apache Iceberg provides several catalog implementations to manage tables and enable SQL operations.
+Catalogs are configured using properties under `spark.sql.catalog.(catalog_name)`.
+You can configure different catalog types, such as JDBC, Hive Metastore, Glue, and REST, to manage Iceberg tables in Spark.

-This configuration creates a path-based catalog named `local` for tables under `$PWD/warehouse` and adds support for Iceberg tables to Spark's built-in catalog.
+This guide covers the configuration of two popular catalog types:
+
+* JDBC Catalog
+* REST Catalog
+
+To learn more, check out the [Catalog](docs/latest/spark-configuration.md#catalogs) page in the Spark section.
+
+#### Configuring JDBC Catalog
+
+The JDBC catalog stores Iceberg table metadata in a relational database.
+
+This configuration creates a JDBC-based catalog named `local` for tables under `$PWD/warehouse` and adds support for Iceberg tables to Spark's built-in catalog.
+
+The JDBC catalog uses a file-based SQLite database as the backend.

 === "CLI"

     ```sh
-    spark-sql --packages org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:{{ icebergVersion }}\
+    spark-sql --packages org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:{{ icebergVersion }},org.xerial:sqlite-jdbc:3.46.1.3 \
         --conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions \
         --conf spark.sql.catalog.spark_catalog=org.apache.iceberg.spark.SparkSessionCatalog \
         --conf spark.sql.catalog.spark_catalog.type=hive \
         --conf spark.sql.catalog.local=org.apache.iceberg.spark.SparkCatalog \
-        --conf spark.sql.catalog.local.type=hadoop \
+        --conf spark.sql.catalog.local.type=jdbc \
+        --conf spark.sql.catalog.local.uri=jdbc:sqlite:$PWD/iceberg_catalog_db.sqlite \
         --conf spark.sql.catalog.local.warehouse=$PWD/warehouse \
         --conf spark.sql.defaultCatalog=local
     ```

 === "spark-defaults.conf"

     ```sh
-    spark.jars.packages org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:{{ icebergVersion }}
+    spark.jars.packages org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:{{ icebergVersion }},org.xerial:sqlite-jdbc:3.46.1.3
     spark.sql.extensions org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions
     spark.sql.catalog.spark_catalog org.apache.iceberg.spark.SparkSessionCatalog
     spark.sql.catalog.spark_catalog.type hive
     spark.sql.catalog.local org.apache.iceberg.spark.SparkCatalog
-    spark.sql.catalog.local.type hadoop
-    spark.sql.catalog.local.warehouse $PWD/warehouse

Review Comment:
   `$PWD` does not expand in `spark-defaults.conf`; keeping it here will create a literal folder named `$PWD`.
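   To make that concrete, here is a minimal sketch of what the replacement `spark-defaults.conf` entries could look like with the path spelled out literally; `/home/me/iceberg` is a hypothetical directory, not a value from this PR:

   ```sh
   # spark-defaults.conf is read as plain key/value pairs, not by a shell,
   # so $PWD would stay a literal string. Use an absolute path instead.
   spark.sql.catalog.local.uri        jdbc:sqlite:/home/me/iceberg/iceberg_catalog_db.sqlite
   spark.sql.catalog.local.warehouse  /home/me/iceberg/warehouse
   ```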
=== "CLI" ```sh - spark-sql --packages org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:{{ icebergVersion }}\ + spark-sql --packages org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:{{ icebergVersion }},org.xerial:sqlite-jdbc:3.46.1.3 \ --conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions \ --conf spark.sql.catalog.spark_catalog=org.apache.iceberg.spark.SparkSessionCatalog \ --conf spark.sql.catalog.spark_catalog.type=hive \ --conf spark.sql.catalog.local=org.apache.iceberg.spark.SparkCatalog \ - --conf spark.sql.catalog.local.type=hadoop \ + --conf spark.sql.catalog.local.type=jdbc \ + --conf spark.sql.catalog.local.uri=jdbc:sqlite:$PWD/iceberg_catalog_db.sqlite \ --conf spark.sql.catalog.local.warehouse=$PWD/warehouse \ --conf spark.sql.defaultCatalog=local ``` === "spark-defaults.conf" ```sh - spark.jars.packages org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:{{ icebergVersion }} + spark.jars.packages org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:{{ icebergVersion }},org.xerial:sqlite-jdbc:3.46.1.3 spark.sql.extensions org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions spark.sql.catalog.spark_catalog org.apache.iceberg.spark.SparkSessionCatalog spark.sql.catalog.spark_catalog.type hive spark.sql.catalog.local org.apache.iceberg.spark.SparkCatalog - spark.sql.catalog.local.type hadoop - spark.sql.catalog.local.warehouse $PWD/warehouse Review Comment: `$PWD` does not expand in `spark-defaults.conf`. keeping this here will create a folder named `$PWD` ########## docs/docs/spark-getting-started.md: ########## @@ -41,20 +41,26 @@ spark-shell --packages org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:{{ iceb ### Adding catalogs -Iceberg comes with [catalogs](spark-configuration.md#catalogs) that enable SQL commands to manage tables and load them by name. Catalogs are configured using properties under `spark.sql.catalog.(catalog_name)`. +Apache Iceberg provides several [catalog](spark-configuration.md#catalogs) implementations to manage tables and enable SQL operations. +Catalogs are configured using properties under `spark.sql.catalog.(catalog_name)`. -This command creates a path-based catalog named `local` for tables under `$PWD/warehouse` and adds support for Iceberg tables to Spark's built-in catalog: +This command creates a JDBC-based catalog named `local` for tables under `$PWD/warehouse` and adds support for Iceberg tables to Spark's built-in catalog. + +The JDBC catalog uses file-based SQLite database as the backend. ```sh -spark-sql --packages org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:{{ icebergVersion }}\ +spark-sql --packages org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:{{ icebergVersion }},org.xerial:sqlite-jdbc:3.46.1.3 \ --conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions \ --conf spark.sql.catalog.spark_catalog=org.apache.iceberg.spark.SparkSessionCatalog \ --conf spark.sql.catalog.spark_catalog.type=hive \ --conf spark.sql.catalog.local=org.apache.iceberg.spark.SparkCatalog \ - --conf spark.sql.catalog.local.type=hadoop \ + --conf spark.sql.catalog.local.type=jdbc \ + --conf spark.sql.catalog.local.uri=jdbc:sqlite:$PWD/iceberg_catalog_db.sqlite \ --conf spark.sql.catalog.local.warehouse=$PWD/warehouse ``` +For example configuring a REST-based catalog, see [Configuring REST Catalog](/spark-quickstart#configuring-rest-catalog) Review Comment: instead of repeating here for configuring REST catalog, just link to `site/docs/spark-quickstart.md`. 
##########
docs/docs/spark-getting-started.md:
##########

@@ -41,20 +41,27 @@ spark-shell --packages org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:{{ iceb

 ### Adding catalogs

-Iceberg comes with [catalogs](spark-configuration.md#catalogs) that enable SQL commands to manage tables and load them by name. Catalogs are configured using properties under `spark.sql.catalog.(catalog_name)`.
+Apache Iceberg provides several [catalog](spark-configuration.md#catalogs) implementations to manage tables and enable SQL operations.
+Catalogs are configured using properties under `spark.sql.catalog.(catalog_name)`.

-This command creates a path-based catalog named `local` for tables under `$PWD/warehouse` and adds support for Iceberg tables to Spark's built-in catalog:
+This command creates a JDBC-based catalog named `local` for tables under `$PWD/warehouse` and adds support for Iceberg tables to Spark's built-in catalog.
+
+The JDBC catalog uses a file-based SQLite database as the backend.

 ```sh
-spark-sql --packages org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:{{ icebergVersion }}\
+spark-sql --packages org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:{{ icebergVersion }},org.xerial:sqlite-jdbc:3.46.1.3 \
     --conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions \
     --conf spark.sql.catalog.spark_catalog=org.apache.iceberg.spark.SparkSessionCatalog \
     --conf spark.sql.catalog.spark_catalog.type=hive \
     --conf spark.sql.catalog.local=org.apache.iceberg.spark.SparkCatalog \
-    --conf spark.sql.catalog.local.type=hadoop \
-    --conf spark.sql.catalog.local.warehouse=$PWD/warehouse
+    --conf spark.sql.catalog.local.type=jdbc \
+    --conf spark.sql.catalog.local.uri=jdbc:sqlite:$PWD/iceberg_catalog_db.sqlite \
+    --conf spark.sql.catalog.local.warehouse=$PWD/warehouse \
+    --conf spark.sql.defaultCatalog=local

Review Comment:
   Add `spark.sql.defaultCatalog` to match the other pages.
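   To illustrate what `spark.sql.defaultCatalog` changes, assuming the configuration above is in `spark-defaults.conf`, unqualified table names resolve against the `local` catalog; `nyc.taxis` is the quickstart's running example table:

   ```sh
   # With spark.sql.defaultCatalog=local, both queries resolve to the same table.
   spark-sql -e "SELECT COUNT(*) FROM nyc.taxis"        # implicitly local.nyc.taxis
   spark-sql -e "SELECT COUNT(*) FROM local.nyc.taxis"  # explicit catalog prefix
   ```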