ammarchalifah opened a new pull request, #16424: URL: https://github.com/apache/iceberg/pull/16424
### Problem

When a table is partitioned by `bucket(N, string_column)`, the bucket transform produces an `Integer` partition value. During Storage-Partitioned Joins (SPJ), Spark reads partition values through `StructInternalRow`, which calls `struct.get(ordinal, CharSequence.class)` in `getUTF8StringInternal()`. This assumes the value is always a `CharSequence`, so reading a bucket value fails the type check with an `IllegalArgumentException`:

```
IllegalArgumentException: Wrong class, expected java.lang.CharSequence, but was java.lang.Integer, for object: 1
```

This affects any SPJ query (e.g. `MERGE INTO` or `JOIN`) on tables partitioned with `bucket(N, string_column)`.

### Fix

Changed `getUTF8StringInternal()` to call `struct.get(ordinal, Object.class)` instead of `struct.get(ordinal, CharSequence.class)`, and then convert the result with `value.toString()`. This follows the pattern already used by `getBinaryInternal()` in the same class, which requests `Object.class` so it can handle multiple possible runtime types.

The fix is applied to all Spark versions: 3.4, 3.5, 4.0, and 4.1.

### Testing

- Added `testJoinsWithBucketingOnStringColumn`, which uses the existing `checkJoin` helper to cover bucket-only partitioning on string columns.
- Added `testJoinsWithIdentityAndBucketOnStringColumn` as a targeted regression test for the exact scenario from the issue: identity + bucket partitioning on a string column with an SPJ join.

Both tests are added consistently across all 4 Spark versions.

### Notes

AI tools were used to assist with drafting this change. I have reviewed and validated the logic, tests, and code style end-to-end.

Closes #15349
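For readers without the diff at hand, here is a minimal, self-contained sketch of the failure and the fix described in the Problem and Fix sections. It is not the Iceberg code: `Row`, `readStrict`, and `readLenient` are hypothetical stand-ins, the class check in `Row.get` is assumed to behave like the one that produced the error message above, and plain `String` is returned where the real method would return Spark's `UTF8String`.

```java
import java.util.Arrays;
import java.util.List;

public class BucketValueSketch {

  /** Hypothetical stand-in for a struct accessor that enforces the requested class. */
  static final class Row {
    private final List<Object> values;

    Row(Object... values) {
      this.values = Arrays.asList(values);
    }

    <T> T get(int pos, Class<T> javaClass) {
      Object value = values.get(pos);
      // Assumed behavior: reject values that are not instances of the requested class.
      if (value != null && !javaClass.isInstance(value)) {
        throw new IllegalArgumentException(String.format(
            "Wrong class, expected %s, but was %s, for object: %s",
            javaClass.getName(), value.getClass().getName(), value));
      }
      return javaClass.cast(value);
    }
  }

  /** Old shape: requests CharSequence, so an Integer bucket value is rejected. */
  static String readStrict(Row row, int pos) {
    CharSequence seq = row.get(pos, CharSequence.class);
    return seq == null ? null : seq.toString();
  }

  /** New shape: requests Object and converts whatever comes back with toString(). */
  static String readLenient(Row row, int pos) {
    Object value = row.get(pos, Object.class);
    return value == null ? null : value.toString();
  }

  public static void main(String[] args) {
    // Position 0 mimics an identity partition on a string column,
    // position 1 mimics the Integer result of bucket(N, string_column).
    Row partition = new Row("us-east", 1);

    System.out.println(readLenient(partition, 0));
    System.out.println(readLenient(partition, 1));

    try {
      readStrict(partition, 1);
    } catch (IllegalArgumentException e) {
      System.out.println("strict read failed: " + e.getMessage());
    }
  }
}
```

The lenient read trades a strict type check for tolerance of any runtime type, which is why the PR points to `getBinaryInternal()` as precedent for the same choice.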
