[I] Catching error Field xxx not found in source schema when merging schema for struct column [iceberg]

via GitHub Mon, 23 Jun 2025 03:47:29 -0700


voducdan opened a new issue, #13370:
URL: https://github.com/apache/iceberg/issues/13370


   ### Query engine
   
   Spark
   
   ### Question
   
   I'm using Spark to read data from GCS into a DataFrame and write it to an 
Iceberg table. My table's DDL looks like this:
   
   ```
   CREATE TABLE catalog.database.table( 
     row ARRAY<STRUCT<granularity: ARRAY<STRUCT<avgCPT: STRUCT<amount: STRING, currency: STRING>>>, metadata: STRUCT<campaignId: BIGINT, orgId: BIGINT>>>,
     etl_date STRING)
   USING iceberg
   PARTITIONED BY (etl_date)
   LOCATION 'gs://bucket/database/table'
   TBLPROPERTIES (
     'compression' = 'snappy',
     'format' = 'iceberg/parquet',
     'format-version' = '2',
     'mergeSchema' = 'true',
     'write.distribution-mode' = 'none',
     'write.format.default' = 'parquet',
     'write.location-provider.impl' = 'com.idz.iceberg.FileOnlyLocationProvider',
     'write.object-storage.partitioned-paths' = 'false',
     'write.parquet.compression-codec' = 'zstd',
     'write.spark.accept-any-schema' = 'true')
   ```
   
   Later, the data sources in GCS changed their structure. The new DataFrame 
schema looks like this:
   ```
   root
    |-- row: array (nullable = true)
    |    |-- element: struct (containsNull = true)
    |    |    |-- granularity: array (nullable = true)
    |    |    |    |-- element: struct (containsNull = true)
    |    |    |    |    |-- avgCPT: struct (nullable = true)
    |    |    |    |    |    |-- amount: string (nullable = true)
    |    |    |    |    |    |-- currency: string (nullable = true)
    |    |    |    |    |-- date: string (nullable = true)
    |    |    |    |    |-- impressions: long (nullable = true)
    |    |    |    |    |-- localSpend: struct (nullable = true)
    |    |    |    |    |    |-- amount: string (nullable = true)
    |    |    |    |    |    |-- currency: string (nullable = true)
    |    |    |    |    |-- tapInstallCPI: struct (nullable = true)
    |    |    |    |    |    |-- amount: string (nullable = true)
    |    |    |    |    |    |-- currency: string (nullable = true)
    |    |    |    |    |-- tapInstallRate: double (nullable = true)
    |    |    |    |    |-- tapInstalls: long (nullable = true)
    |    |    |    |    |-- tapNewDownloads: long (nullable = true)
    |    |    |    |    |-- tapRedownloads: long (nullable = true)
    |    |    |    |    |-- taps: long (nullable = true)
    |    |    |    |    |-- totalAvgCPI: struct (nullable = true)
    |    |    |    |    |    |-- amount: string (nullable = true)
    |    |    |    |    |    |-- currency: string (nullable = true)
    |    |    |    |    |-- totalInstallRate: double (nullable = true)
    |    |    |    |    |-- totalInstalls: long (nullable = true)
    |    |    |    |    |-- totalNewDownloads: long (nullable = true)
    |    |    |    |    |-- totalRedownloads: long (nullable = true)
    |    |    |    |    |-- ttr: double (nullable = true)
    |    |    |    |    |-- viewInstalls: long (nullable = true)
    |    |    |    |    |-- viewNewDownloads: long (nullable = true)
    |    |    |    |    |-- viewRedownloads: long (nullable = true)
    |    |    |-- insights: struct (nullable = true)
    |    |    |    |-- bidRecommendation: struct (nullable = true)
    |    |    |    |    |-- suggestedBidAmount: struct (nullable = true)
    |    |    |    |    |    |-- amount: string (nullable = true)
    |    |    |    |    |    |-- currency: string (nullable = true)
    |    |    |-- metadata: struct (nullable = true)
    |    |    |    |-- adGroupDeleted: boolean (nullable = true)
    |    |    |    |-- adGroupId: long (nullable = true)
    |    |    |    |-- adGroupName: string (nullable = true)
    |    |    |    |-- bidAmount: struct (nullable = true)
    |    |    |    |    |-- amount: string (nullable = true)
    |    |    |    |    |-- currency: string (nullable = true)
    |    |    |    |-- campaignId: long (nullable = true)
    |    |    |    |-- countryOrRegion: string (nullable = true)
    |    |    |    |-- deleted: boolean (nullable = true)
    |    |    |    |-- keyword: string (nullable = true)
    |    |    |    |-- keywordDisplayStatus: string (nullable = true)
    |    |    |    |-- keywordId: long (nullable = true)
    |    |    |    |-- keywordStatus: string (nullable = true)
    |    |    |    |-- matchType: string (nullable = true)
    |    |    |    |-- modificationTime: string (nullable = true)
    |    |    |    |-- orgId: long (nullable = true)
    |    |    |-- other: boolean (nullable = true)
   ```
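
To make the difference between the DDL and the new schema explicit, here is a minimal sketch in plain Python that lists the fields present in the incoming schema but absent from the table's `row` column. It is not Iceberg's own schema-merge logic. The helper name `new_fields` and the dict/list encoding of the schemas are assumptions made for illustration, and the incoming schema is trimmed to a few of the fields shown above.

```python
def new_fields(incoming, existing, path):
    """Return dotted paths that exist in `incoming` but not in `existing`.

    Hypothetical encoding: a struct is a dict of field name -> type,
    an array is a one-element list holding its element type,
    and a primitive is a plain string such as "long".
    """
    found = []
    if isinstance(incoming, dict):
        existing_fields = existing if isinstance(existing, dict) else {}
        for name, sub_type in incoming.items():
            child = f"{path}.{name}"
            if name not in existing_fields:
                found.append(child)  # whole field is new, do not descend
            else:
                found.extend(new_fields(sub_type, existing_fields[name], child))
    elif isinstance(incoming, list) and isinstance(existing, list):
        # Arrays: compare the element types.
        found.extend(new_fields(incoming[0], existing[0], f"{path}.element"))
    return found


money = {"amount": "string", "currency": "string"}

# The `row` column as declared in the DDL.
table_row = [{
    "granularity": [{"avgCPT": money}],
    "metadata": {"campaignId": "long", "orgId": "long"},
}]

# A trimmed subset of the new DataFrame schema for `row`.
incoming_row = [{
    "granularity": [{"avgCPT": money, "date": "string", "impressions": "long"}],
    "insights": {"bidRecommendation": {"suggestedBidAmount": money}},
    "metadata": {"adGroupId": "long", "campaignId": "long", "orgId": "long"},
    "other": "boolean",
}]

result = new_fields(incoming_row, table_row, "row")
for field_path in result:
    print(field_path)
```

Among the reported paths is `row.element.granularity.element.date`, which corresponds to the `date` field named in the error message. New fields appear both inside the nested array element structs and directly under the outer `row` element.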
   
   After that change, running `df.writeTo(output_path).overwritePartitions()` 
fails with `pyspark.errors.exceptions.captured.IllegalArgumentException: Field 
date not found in source schema`.
   
   What am I missing here? Is this a known bug?


