Re: [PR] #10668 - Support case-insensitivity for column names in PartitionSpec [iceberg]

via GitHub Fri, 12 Jul 2024 10:24:45 -0700


dramaticlly commented on PR #10678:
URL: https://github.com/apache/iceberg/pull/10678#issuecomment-2226008338


   > > @sl255051 appreciate you taking a stab at this PR.
   > > But I am wondering why you think column name case insensitivity is 
the right behavior when building a PartitionSpec? I think an Iceberg schema 
can have columns named both `data` and `DATA`, each with a different field id 
assigned, like below
   > > ```
   > > table {
   > >   1: id: required int
   > >   2: data: required string
   > >   3: DATA: required string
   > > }
   > > ```
   > > 
   > > Would this change introduce additional ambiguity when resolving a column 
name in a case-insensitive way?
   > 
   > Thanks for taking the time to review my PR. I did notice that the Schema 
object uses a simple Map<String, Integer> for column names which means the 
schema is case sensitive. But I wonder if that is a bug too. I believe 
partition columns should be case-insensitive based on this issue #83. That 
issue says to make Iceberg case-insensitive. I can see lots of work was done to 
enable case-insensitivity in Iceberg. Several objects even have multiple 
methods to enable case-insensitivity. Take the Schema object as an example. If 
case-insensitivity is not a feature of Iceberg why would that class have both 
methods, `findField` and `caseInsensitiveFindField`?
   > 
   > In summary, I believe case-insensitivity is the correct path forward. I 
can accept that I may not have implemented it in the best way. If that is the 
case, I would appreciate some pointers on how best to implement 
case-insensitivity.
   
   I am not fully aware of the current status of case sensitivity support in 
Iceberg, as it's not documented in the spec. Maybe we can ask whether any of 
the experts want to chime in: @rdblue or @RussellSpitzer
   
   But as you mentioned, if the current schema is case sensitive, I don't 
think it's correct to build a partition spec by finding columns by name in a 
case-insensitive manner, as it introduces additional ambiguity, per my example 
illustrated above.
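
To make the ambiguity concrete, here is a minimal sketch. It does not use Iceberg's `Schema` class; `ColumnLookupSketch`, `findExact`, `findIgnoringCase` and `exampleSchema` are hypothetical names, and the name-to-field-id map only stands in for the kind of index described in the quoted reply. With the example schema above, an exact lookup resolves to one field id, while a case-insensitive lookup matches two.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical stand-in for a schema's name-to-field-id index; this is NOT Iceberg's Schema class.
final class ColumnLookupSketch {

  // Case-sensitive lookup: exact name match, returns null when the name is absent.
  static Integer findExact(Map<String, Integer> nameToId, String name) {
    return nameToId.get(name);
  }

  // Case-insensitive lookup: collects every field id whose name matches ignoring case.
  // More than one element in the result means the name is ambiguous.
  static List<Integer> findIgnoringCase(Map<String, Integer> nameToId, String name) {
    String wanted = name.toLowerCase(Locale.ROOT);
    List<Integer> matches = new ArrayList<>();
    for (Map.Entry<String, Integer> entry : nameToId.entrySet()) {
      if (entry.getKey().toLowerCase(Locale.ROOT).equals(wanted)) {
        matches.add(entry.getValue());
      }
    }
    return matches;
  }

  // Mirrors the example table: 1: id, 2: data, 3: DATA (insertion order preserved).
  static Map<String, Integer> exampleSchema() {
    Map<String, Integer> nameToId = new LinkedHashMap<>();
    nameToId.put("id", 1);
    nameToId.put("data", 2);
    nameToId.put("DATA", 3);
    return nameToId;
  }

  public static void main(String[] args) {
    Map<String, Integer> schema = exampleSchema();
    System.out.println(findExact(schema, "data"));        // prints 2
    System.out.println(findExact(schema, "DATA"));        // prints 3
    System.out.println(findIgnoringCase(schema, "Data")); // prints [2, 3]
  }
}
```

A case-insensitive partition spec builder would need some rule for the two-match case, for example rejecting the name as ambiguous; which rule is right is exactly the open question here.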


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


