ebyhr opened a new pull request, #16829:
URL: https://github.com/apache/iceberg/pull/16829

   ## Summary
   
   Add `varchar(N)` and `char(N)` primitive types to the Iceberg v4 specification.
   
   These types improve compatibility with traditional SQL engines (DB2, 
Netezza, Oracle, SQL Server) and are already supported by major query engines:
   - **Spark**: Supports `VarcharType(length)` and `CharType(length)` since 
3.1.0
   - **Trino**: Supports `varchar(n)` and `char(n)` natively
   
   ## Motivation
   
   Traditional SQL engines support four string types (`char(N)`, `varchar(N)`, 
`nchar(N)`, `nvarchar(N)`), while Iceberg only supports unbounded `string` 
(equivalent to a `varchar` with no length limit). This creates friction in three areas:
   
   1. **SQL compatibility**: Operations like `LIKE` behave differently on 
`char` (compares with padding) vs `varchar` (no padding). Without native 
support, users cannot migrate data while preserving query semantics.
   
   2. **Memory optimization**: Long-lived SQL engines assume maximum string 
lengths for query planning and memory allocation. Without length-bounded types, 
they must either silently truncate or fail on oversized strings.
   
   3. **Ecosystem gap**: Both Spark and Trino already support these types in 
their type systems, but cannot expose them in Iceberg tables.
   
   ## Changes
   
   Added to **Primitive Types** table (v4):
   - `varchar(N)`: Variable-length UTF-8 strings, max N characters (code points)
   - `char(N)`: Fixed-length UTF-8 strings, padded with spaces to N characters
   
   **Schema Evolution** (v4):
   - `varchar(N)` → `varchar(N')` if N' > N
   - `varchar(N)` → `string`
   - `char(N)` → `char(N')` if N' > N
   - `char(N)` → `varchar(N')` if N' ≥ N
   - `char(N)` → `string`
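
The promotion rules above can be read as a small decision function. The following is a minimal Python sketch for illustration only; the class names `Varchar`, `Char`, `String` and the helper `is_allowed_promotion` are hypothetical and are not part of the Iceberg API or of this PR.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Varchar:
    length: int  # maximum length in code points (hypothetical model type)


@dataclass(frozen=True)
class Char:
    length: int  # fixed length in code points (hypothetical model type)


@dataclass(frozen=True)
class String:
    pass  # unbounded string


StringLike = Union[Varchar, Char, String]


def is_allowed_promotion(source: StringLike, target: StringLike) -> bool:
    """Return True if source -> target matches one of the listed rules."""
    if isinstance(source, Varchar):
        if isinstance(target, Varchar):
            return target.length > source.length  # varchar(N) -> varchar(N'), N' > N
        return isinstance(target, String)  # varchar(N) -> string
    if isinstance(source, Char):
        if isinstance(target, Char):
            return target.length > source.length  # char(N) -> char(N'), N' > N
        if isinstance(target, Varchar):
            return target.length >= source.length  # char(N) -> varchar(N'), N' >= N
        return isinstance(target, String)  # char(N) -> string
    # No rule in the list starts from string, so nothing else is accepted.
    return False
```

Under this reading, `is_allowed_promotion(Char(5), Varchar(5))` is accepted, while narrowing changes such as `Varchar(10)` to `Varchar(5)`, or `Varchar(5)` to `Char(10)`, are rejected because no listed rule covers them.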
   
   **Semantics**:
   - The length parameter N counts characters (Unicode code points), not bytes
   - `char(N)` values are right-padded with spaces for comparison operations
   - Trailing spaces are preserved in storage, but values may need padding 
during reads for comparison
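
To make the code point and padding semantics concrete, here is a small Python sketch. It assumes one possible reader-side interpretation: length is checked in code points, and `char(N)` values are right-padded with spaces before being compared. The helper names `fits_varchar`, `pad_char`, and `char_equals` are hypothetical and do not come from Iceberg or from this PR.

```python
def fits_varchar(value: str, n: int) -> bool:
    """Check a value against varchar(n), counting code points rather than bytes."""
    # len() on a Python str counts Unicode code points.
    return len(value) <= n


def pad_char(value: str, n: int) -> str:
    """Right-pad a char(n) value with spaces to exactly n code points."""
    if len(value) > n:
        raise ValueError(f"value has {len(value)} code points, exceeds char({n})")
    return value.ljust(n, " ")


def char_equals(a: str, b: str, n: int) -> bool:
    """Compare two char(n) values after padding (assumed comparison model)."""
    return pad_char(a, n) == pad_char(b, n)


# "h\u00e9llo" is 5 code points but 6 bytes in UTF-8, so it fits varchar(5).
sample = "h\u00e9llo"
assert len(sample) == 5
assert len(sample.encode("utf-8")) == 6
assert fits_varchar(sample, 5)

# Values that differ only in trailing spaces compare equal as char(4).
assert char_equals("ab", "ab ", 4)
```

Counting bytes instead would reject `sample` for `varchar(5)`, which is the distinction the first bullet draws.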
   
   ## Related
   
   - Issue: apache/iceberg#10461
   - Proposal: 
https://docs.google.com/document/d/1chIx22dNZcMSsS607F96ARu4m4Yyf1dbsYxQTrfQ-5o/edit?usp=sharing


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

