ebyhr opened a new pull request, #16829: URL: https://github.com/apache/iceberg/pull/16829
## Summary

Add `varchar(N)` and `char(N)` primitive types to the Iceberg v4 specification. These types improve compatibility with traditional SQL engines (DB2, Netezza, Oracle, SQL Server) and are already supported by major query engines:

- **Spark**: Supports `VarcharType(length)` and `CharType(length)` since 3.1.0
- **Trino**: Supports `varchar(n)` and `char(n)` natively

## Motivation

Traditional SQL engines support four string types (`char(N)`, `varchar(N)`, `nchar(N)`, `nvarchar(N)`), while Iceberg supports only the unbounded `string` type (equivalent to `varchar` with unlimited length). This creates friction in three areas:

1. **SQL compatibility**: Operations such as `LIKE` behave differently on `char` (which compares padded values) than on `varchar` (no padding). Without native support, users cannot migrate data while preserving query semantics.
2. **Memory optimization**: Long-lived SQL engines rely on maximum string lengths for query planning and memory allocation. Without length-bounded types, they must either silently truncate or fail on oversized strings.
3. **Ecosystem gap**: Spark and Trino already support these types in their type systems but cannot expose them in Iceberg tables.
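To make the `LIKE` point concrete, here is a toy Python model of the padding difference. It is a sketch under stated assumptions, not any engine's implementation: the `like` helper is hypothetical, supports only `%` and `_` (no escape character), and assumes a `char(5)` value is matched in its space-padded form while a `varchar(5)` value is matched as written. Real engines differ in exactly how they apply padding to `LIKE`.

```python
import re


def like(value: str, pattern: str) -> bool:
    """Hypothetical toy SQL LIKE: '%' matches any run, '_' matches one character.

    Escape characters are not supported in this sketch.
    """
    regex = "".join(
        ".*" if ch == "%" else "." if ch == "_" else re.escape(ch)
        for ch in pattern
    )
    # LIKE must match the whole value, hence fullmatch
    return re.fullmatch(regex, value, flags=re.DOTALL) is not None


# Assumption: a char(5) value is compared in its space-padded form,
# while a varchar(5) value keeps its original length.
stored = "abc"
as_char = stored.ljust(5)   # "abc  "
as_varchar = stored         # "abc"

print(like(as_varchar, "abc"))  # True
print(like(as_char, "abc"))     # False: the pad spaces are not matched
print(like(as_char, "abc%"))    # True
```

Under this model the same stored text answers the same query differently depending on its declared type, which is the behavior a migration would need to preserve.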
## Changes

Added to the **Primitive Types** table (v4):

- `varchar(N)`: Variable-length UTF-8 string of at most N characters (code points)
- `char(N)`: Fixed-length UTF-8 string, right-padded with spaces to N characters

**Schema Evolution** (v4):

- `varchar(N)` → `varchar(N')` if N' > N
- `varchar(N)` → `string`
- `char(N)` → `char(N')` if N' > N
- `char(N)` → `varchar(N')` if N' ≥ N
- `char(N)` → `string`

**Semantics**:

- The length parameter N counts Unicode characters (code points), not bytes.
- `char(N)` values are right-padded with spaces to N characters for comparison operations.
- Trailing spaces are preserved in storage, but stored values may need to be padded to N at read time before comparison.

## Related

- Issue: apache/iceberg#10461
- Proposal: https://docs.google.com/document/d/1chIx22dNZcMSsS607F96ARu4m4Yyf1dbsYxQTrfQ-5o/edit?usp=sharing
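The proposed schema evolution rules and length semantics can be modeled in a few lines. This is a minimal Python sketch, not Iceberg code: `StrType`, `parse_type`, `can_promote`, `check_varchar` and `pad_char` are hypothetical names, and the sketch assumes values arrive as Python `str`, where `len()` counts code points rather than UTF-8 bytes.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class StrType:
    """Hypothetical model of the string-family types; not an Iceberg API."""
    kind: str                     # "string", "varchar" or "char"
    length: Optional[int] = None  # N for varchar/char, None for string


_BOUNDED = re.compile(r"(varchar|char)\((\d+)\)")


def parse_type(text: str) -> StrType:
    """Parse "string", "varchar(N)" or "char(N)"."""
    if text == "string":
        return StrType("string")
    match = _BOUNDED.fullmatch(text)
    if match is None:
        raise ValueError(f"unsupported type: {text!r}")
    return StrType(match.group(1), int(match.group(2)))


def can_promote(src: StrType, dst: StrType) -> bool:
    """True if the proposed v4 evolution rules allow src -> dst.

    Only the listed promotions are modeled: an unchanged type is not
    treated as a promotion, and string has no outgoing promotions.
    """
    if dst.kind == "string":
        return src.kind in ("varchar", "char")
    if src.kind == "varchar":
        return dst.kind == "varchar" and dst.length > src.length
    if src.kind == "char":
        if dst.kind == "char":
            return dst.length > src.length
        if dst.kind == "varchar":
            return dst.length >= src.length
    return False


def check_varchar(value: str, n: int) -> str:
    """Reject values longer than N code points (Python len counts code points)."""
    if len(value) > n:
        raise ValueError(f"{len(value)} code points exceeds varchar({n})")
    return value


def pad_char(value: str, n: int) -> str:
    """Right-pad a char(N) value with spaces to N code points for comparison."""
    if len(value) > n:
        raise ValueError(f"{len(value)} code points exceeds char({n})")
    return value.ljust(n)


word = "na\u00efve"                  # 5 code points
print(len(word.encode("utf-8")))     # 6: the UTF-8 encoding is longer
print(check_varchar(word, 5))        # accepted, since N limits code points
```

Counting code points rather than bytes is what lets `varchar(5)` accept this value even though its UTF-8 encoding is six bytes long.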
