voducdan opened a new issue, #13370: URL: https://github.com/apache/iceberg/issues/13370
### Query engine

Spark

### Question

I'm using Spark to read data from GCS into a DataFrame and write it to an Iceberg table. My table's DDL looks like this:

```
CREATE TABLE catalog.database.table (
  row ARRAY<STRUCT<
    granularity: ARRAY<STRUCT<avgCPT: STRUCT<amount: STRING, currency: STRING>>>,
    metadata: STRUCT<campaignId: BIGINT, orgId: BIGINT>>>,
  etl_date STRING)
USING iceberg
PARTITIONED BY (etl_date)
LOCATION 'gs://bucket/database/table'
TBLPROPERTIES (
  'compression' = 'snappy',
  'format' = 'iceberg/parquet',
  'format-version' = '2',
  'mergeSchema' = 'true',
  'write.distribution-mode' = 'none',
  'write.format.default' = 'parquet',
  'write.location-provider.impl' = 'com.idz.iceberg.FileOnlyLocationProvider',
  'write.object-storage.partitioned-paths' = 'false',
  'write.parquet.compression-codec' = 'zstd',
  'write.spark.accept-any-schema' = 'true')
```

Later, the data sources in GCS changed their structure, and the schema now looks like this:

```
root
 |-- row: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- granularity: array (nullable = true)
 |    |    |    |-- element: struct (containsNull = true)
 |    |    |    |    |-- avgCPT: struct (nullable = true)
 |    |    |    |    |    |-- amount: string (nullable = true)
 |    |    |    |    |    |-- currency: string (nullable = true)
 |    |    |    |    |-- date: string (nullable = true)
 |    |    |    |    |-- impressions: long (nullable = true)
 |    |    |    |    |-- localSpend: struct (nullable = true)
 |    |    |    |    |    |-- amount: string (nullable = true)
 |    |    |    |    |    |-- currency: string (nullable = true)
 |    |    |    |    |-- tapInstallCPI: struct (nullable = true)
 |    |    |    |    |    |-- amount: string (nullable = true)
 |    |    |    |    |    |-- currency: string (nullable = true)
 |    |    |    |    |-- tapInstallRate: double (nullable = true)
 |    |    |    |    |-- tapInstalls: long (nullable = true)
 |    |    |    |    |-- tapNewDownloads: long (nullable = true)
 |    |    |    |    |-- tapRedownloads: long (nullable = true)
 |    |    |    |    |-- taps: long (nullable = true)
 |    |    |    |    |-- totalAvgCPI: struct (nullable = true)
 |    |    |    |    |    |-- amount: string (nullable = true)
 |    |    |    |    |    |-- currency: string (nullable = true)
 |    |    |    |    |-- totalInstallRate: double (nullable = true)
 |    |    |    |    |-- totalInstalls: long (nullable = true)
 |    |    |    |    |-- totalNewDownloads: long (nullable = true)
 |    |    |    |    |-- totalRedownloads: long (nullable = true)
 |    |    |    |    |-- ttr: double (nullable = true)
 |    |    |    |    |-- viewInstalls: long (nullable = true)
 |    |    |    |    |-- viewNewDownloads: long (nullable = true)
 |    |    |    |    |-- viewRedownloads: long (nullable = true)
 |    |    |-- insights: struct (nullable = true)
 |    |    |    |-- bidRecommendation: struct (nullable = true)
 |    |    |    |    |-- suggestedBidAmount: struct (nullable = true)
 |    |    |    |    |    |-- amount: string (nullable = true)
 |    |    |    |    |    |-- currency: string (nullable = true)
 |    |    |-- metadata: struct (nullable = true)
 |    |    |    |-- adGroupDeleted: boolean (nullable = true)
 |    |    |    |-- adGroupId: long (nullable = true)
 |    |    |    |-- adGroupName: string (nullable = true)
 |    |    |    |-- bidAmount: struct (nullable = true)
 |    |    |    |    |-- amount: string (nullable = true)
 |    |    |    |    |-- currency: string (nullable = true)
 |    |    |    |-- campaignId: long (nullable = true)
 |    |    |    |-- countryOrRegion: string (nullable = true)
 |    |    |    |-- deleted: boolean (nullable = true)
 |    |    |    |-- keyword: string (nullable = true)
 |    |    |    |-- keywordDisplayStatus: string (nullable = true)
 |    |    |    |-- keywordId: long (nullable = true)
 |    |    |    |-- keywordStatus: string (nullable = true)
 |    |    |    |-- matchType: string (nullable = true)
 |    |    |    |-- modificationTime: string (nullable = true)
 |    |    |    |-- orgId: long (nullable = true)
 |    |    |-- other: boolean (nullable = true)
```

Since that change, running `df.writeTo(output_path).overwritePartitions()` fails with the error `pyspark.errors.exceptions.captured.IllegalArgumentException: Field date not found in source schema`.

What am I missing here? Is this a known bug?
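To pin down exactly which nested fields differ between the incoming DataFrame and the table, it can help to diff the two schemas path by path. The following is only a diagnostic sketch, not a fix and not a statement about the cause of the error. It assumes schemas in the JSON form returned by PySpark's `StructType.jsonValue()`; `field_paths`, `struct`, and `array` are hypothetical helper names, and the two schemas are trimmed-down stand-ins for the ones shown above.

```python
def field_paths(dtype, prefix=""):
    """Return the dotted paths of all named fields in a JSON-form Spark schema."""
    paths = set()
    if isinstance(dtype, str):
        # Primitive types ("string", "long", ...) are plain strings: nothing nested.
        return paths
    kind = dtype.get("type")
    if kind == "struct":
        for field in dtype["fields"]:
            path = f"{prefix}.{field['name']}" if prefix else field["name"]
            paths.add(path)
            paths |= field_paths(field["type"], path)
    elif kind == "array":
        # Array elements have no name of their own; use "element" as the path step.
        paths |= field_paths(dtype["elementType"], f"{prefix}.element")
    elif kind == "map":
        paths |= field_paths(dtype["valueType"], f"{prefix}.value")
    return paths


# Hypothetical helpers that build the same dict layout jsonValue() produces.
def struct(**fields):
    return {
        "type": "struct",
        "fields": [
            {"name": name, "type": dtype, "nullable": True, "metadata": {}}
            for name, dtype in fields.items()
        ],
    }


def array(element):
    return {"type": "array", "elementType": element, "containsNull": True}


money = struct(amount="string", currency="string")

# Trimmed-down version of the table schema from the DDL.
table_schema = struct(
    row=array(struct(
        granularity=array(struct(avgCPT=money)),
        metadata=struct(campaignId="long", orgId="long"),
    )),
    etl_date="string",
)

# Trimmed-down version of the new source schema (only a few of the new fields).
source_schema = struct(
    row=array(struct(
        granularity=array(struct(avgCPT=money, date="string", impressions="long")),
        metadata=struct(campaignId="long", orgId="long", keyword="string"),
        other="boolean",
    )),
    etl_date="string",
)

# Fields present in the source but absent from the table.
missing_in_table = sorted(field_paths(source_schema) - field_paths(table_schema))
for path in missing_in_table:
    print(path)
```

Against a live session, the two inputs would presumably come from `df.schema.jsonValue()` and `spark.table("catalog.database.table").schema.jsonValue()`. In this reduced example the `date` field named in the error shows up as `row.element.granularity.element.date`, which makes it easier to see how deeply nested the new columns are.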