Fokko commented on code in PR #9935: URL: https://github.com/apache/iceberg/pull/9935#discussion_r1521056978
########## 1.5.0/docs/spark-getting-started.md: ########## @@ -123,18 +123,78 @@ SELECT * FROM local.db.table.snapshots; +-------------------------+----------------+-----------+-----------+----------------------------------------------------+-----+ ``` -[DataFrame reads](spark-queries.md#querying-with-dataframes) are supported and can now reference tables by name using `spark.table`: +[DataFrame reads](../spark-queries.md#querying-with-dataframes) are supported and can now reference tables by name using `spark.table`: ```scala val df = spark.table("local.db.table") df.count() ``` +### Type compatibility + +Spark and Iceberg support different set of types. Iceberg does the type conversion automatically, but not for all combinations, +so you may want to understand the type conversion in Iceberg in prior to design the types of columns in your tables. + +#### Spark type to Iceberg type + +This type conversion table describes how Spark types are converted to the Iceberg types. The conversion applies on both creating Iceberg table and writing to Iceberg table via Spark. + +| Spark | Iceberg | Notes | +|-----------------|----------------------------|-------| +| boolean | boolean | | +| short | integer | | +| byte | integer | | +| integer | integer | | +| long | long | | +| float | float | | +| double | double | | +| date | date | | +| timestamp | timestamp with timezone | | +| timestamp_ntz | timestamp without timezone | | Review Comment: OCD ```suggestion | timestamp_ntz | timestamp without timezone | | ``` ########## 1.5.0/docs/spark-getting-started.md: ########## @@ -123,18 +123,78 @@ SELECT * FROM local.db.table.snapshots; +-------------------------+----------------+-----------+-----------+----------------------------------------------------+-----+ ``` -[DataFrame reads](spark-queries.md#querying-with-dataframes) are supported and can now reference tables by name using `spark.table`: +[DataFrame reads](../spark-queries.md#querying-with-dataframes) are supported and can now reference tables by name using `spark.table`: ```scala val df = spark.table("local.db.table") df.count() ``` +### Type compatibility + +Spark and Iceberg support different set of types. Iceberg does the type conversion automatically, but not for all combinations, +so you may want to understand the type conversion in Iceberg in prior to design the types of columns in your tables. + +#### Spark type to Iceberg type + +This type conversion table describes how Spark types are converted to the Iceberg types. The conversion applies on both creating Iceberg table and writing to Iceberg table via Spark. + +| Spark | Iceberg | Notes | +|-----------------|----------------------------|-------| +| boolean | boolean | | +| short | integer | | +| byte | integer | | +| integer | integer | | +| long | long | | +| float | float | | +| double | double | | +| date | date | | +| timestamp | timestamp with timezone | | +| timestamp_ntz | timestamp without timezone | | +| char | string | | +| varchar | string | | +| string | string | | +| binary | binary | | +| decimal | decimal | | +| struct | struct | | +| array | list | | +| map | map | | + +!!! info + The table is based on representing conversion during creating table. In fact, broader supports are applied on write. Here're some points on write: + + * Iceberg numeric types (`integer`, `long`, `float`, `double`, `decimal`) support promotion during writes. e.g. You can write Spark types `short`, `byte`, `integer`, `long` to Iceberg type `long`. + * You can write to Iceberg `fixed` type using Spark `binary` type. Note that assertion on the length will be performed. + +#### Iceberg type to Spark type + +This type conversion table describes how Iceberg types are converted to the Spark types. The conversion applies on reading from Iceberg table via Spark. + +| Iceberg | Spark | Note | +|----------------------------|-------------------------|---------------| +| boolean | boolean | | +| integer | integer | | +| long | long | | +| float | float | | +| double | double | | +| date | date | | +| time | | Not supported | +| timestamp with timezone | timestamp | | +| timestamp without timezone | timestamp_ntz | | Review Comment: ```suggestion | timestamp without timezone | timestamp_ntz | | ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org