mrcnc opened a new issue, #10084: URL: https://github.com/apache/iceberg/issues/10084
### Query engine

Using Spark 3.4.0 with Iceberg 1.4.3

### Question

To reproduce this behavior, start a Spark shell configured with two Iceberg catalogs, `rest` and `hadoop`:

```
SPARK_VERSION=3.4
ICEBERG_VERSION=1.4.3
$SPARK_HOME/bin/spark-shell \
  --packages="org.apache.iceberg:iceberg-spark-runtime-${SPARK_VERSION}_2.12:${ICEBERG_VERSION}" \
  --conf spark.driver.host=127.0.0.1 \
  --conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions \
  --conf spark.sql.catalog.spark_catalog=org.apache.iceberg.spark.SparkSessionCatalog \
  --conf spark.sql.catalog.rest=org.apache.iceberg.spark.SparkCatalog \
  --conf spark.sql.catalog.rest.catalog-impl=org.apache.iceberg.rest.RESTCatalog \
  --conf spark.sql.catalog.rest.uri=http://127.0.0.1:8080/catalog/ \
  --conf spark.sql.catalog.rest.warehouse=/tmp/warehouse/rest \
  --conf spark.sql.defaultCatalog=rest \
  --conf spark.sql.catalog.hadoop=org.apache.iceberg.spark.SparkCatalog \
  --conf spark.sql.catalog.hadoop.type=hadoop \
  --conf spark.sql.catalog.hadoop.warehouse=/tmp/warehouse/hadoop \
  --conf spark.sql.warehouse.dir=/tmp/warehouse/
```

First, create a table in the rest catalog:

```
spark.sql("CREATE TABLE rest.test.table1(id bigint, data string)").show(false)
```

The request body sent to the createTable endpoint looks like this:

```json
{
  "name": "table1",
  "location": null,
  "schema": {
    "type": "struct",
    "schema-id": 0,
    "fields": [
      { "id": 0, "name": "id", "required": false, "type": "long" },
      { "id": 1, "name": "data", "required": false, "type": "string" }
    ]
  },
  "partition-spec": { "spec-id": 0, "fields": [] },
  "write-order": null,
  "properties": { "owner": "your.name" },
  "stage-create": false
}
```

If you then create the same table in the hadoop catalog:

```
spark.sql("CREATE TABLE hadoop.test.table1(id bigint, data string)").show(false)
```

it writes the metadata file `/tmp/warehouse/hadoop/test/table1/metadata/v1.metadata.json` with contents like this:

```json
{
  "format-version": 2,
  "table-uuid": "d1768dd2-cacd-45c0-b6ae-2481292e7682",
  "location": "/tmp/warehouse/hadoop/test/table1",
  "last-sequence-number": 0,
  "last-updated-ms": 1712168876272,
  "last-column-id": 2,
  "current-schema-id": 0,
  "schemas": [
    {
      "type": "struct",
      "schema-id": 0,
      "fields": [
        { "id": 1, "name": "id", "required": false, "type": "long" },
        { "id": 2, "name": "data", "required": false, "type": "string" }
      ]
    }
  ],
  "default-spec-id": 0,
  "partition-specs": [ { "spec-id": 0, "fields": [] } ],
  "last-partition-id": 999,
  "default-sort-order-id": 0,
  "sort-orders": [ { "order-id": 0, "fields": [] } ],
  "properties": {
    "owner": "your.name",
    "write.parquet.compression-codec": "zstd"
  },
  "current-snapshot-id": -1,
  "refs": {},
  "snapshots": [],
  "statistics": [],
  "snapshot-log": [],
  "metadata-log": []
}
```

The field ids in the schema differ between the two catalogs: the rest request starts at 0, while the hadoop metadata file starts at 1. I find it odd that this differs across catalogs. Is this expected behavior?