dramaticlly commented on PR #10678:
URL: https://github.com/apache/iceberg/pull/10678#issuecomment-2226008338
> > @sl255051 I appreciate you taking a stab at this PR.
> > But I am wondering why you think column-name case insensitivity is
the right behavior when building a PartitionSpec. I think an Iceberg schema
can have columns named both `data` and `DATA`, each with a different field id
assigned, like below:
> > ```
> > table {
> > 1: id: required int
> > 2: data: required string
> > 3: DATA: required string
> > }
> > ```
> >
> > Would this change introduce additional ambiguity when resolving a column
name in a case-insensitive way?
>
> Thanks for taking the time to review my PR. I did notice that the Schema
object uses a simple Map<String, Integer> for column names which means the
schema is case sensitive. But I wonder if that is a bug too. I believe
partition columns should be case-insensitive based on this issue #83. That
issue says to make Iceberg case-insensitive. I can see lots of work was done to
enable case-insensitivity in Iceberg. Several objects even have multiple
methods to enable case-insensitivity. Take the Schema object as an example. If
case-insensitivity is not a feature of Iceberg why would that class have both
methods, `findField` and `caseInsensitiveFindField`?
>
> In summary, I believe case-insensitivity is the correct path forward. I
can accept that I may not have implemented it in the best way. If that is the
case, I would appreciate some pointers on how best to implement case-insensitivity.
I am not fully aware of the current status of case-sensitivity support in
Iceberg, as it's not documented in the spec. Maybe we can ask whether any of
the experts want to chime in, @rdblue or @RussellSpitzer.
But as you mentioned, if the current schema supports case sensitivity, I don't
think it's correct to build a partition spec by finding columns by name in a
case-insensitive manner, as it introduces additional ambiguity, per the example
I illustrated above.
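
To make the ambiguity concrete, here is a minimal sketch using only the JDK. It
assumes a plain name-to-id map as a stand-in for the schema index;
`NameLookupSketch` and its methods are hypothetical names for illustration and
are not Iceberg's `Schema` or `PartitionSpec` API.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical stand-in for a schema's name-to-id index; not Iceberg's Schema class.
public class NameLookupSketch {
  // Mirrors the example schema above: 1: id, 2: data, 3: DATA
  static Map<String, Integer> exampleSchema() {
    Map<String, Integer> nameToId = new LinkedHashMap<>();
    nameToId.put("id", 1);
    nameToId.put("data", 2);
    nameToId.put("DATA", 3);
    return nameToId;
  }

  // Case-sensitive lookup: a name maps to at most one field id.
  static Integer findCaseSensitive(Map<String, Integer> nameToId, String name) {
    return nameToId.get(name);
  }

  // Case-insensitive lookup: collects every field id whose name matches ignoring case.
  static List<Integer> findCaseInsensitive(Map<String, Integer> nameToId, String name) {
    String wanted = name.toLowerCase(Locale.ROOT);
    List<Integer> matches = new ArrayList<>();
    for (Map.Entry<String, Integer> entry : nameToId.entrySet()) {
      if (entry.getKey().toLowerCase(Locale.ROOT).equals(wanted)) {
        matches.add(entry.getValue());
      }
    }
    return matches;
  }

  public static void main(String[] args) {
    Map<String, Integer> schema = exampleSchema();
    System.out.println(findCaseSensitive(schema, "data"));   // prints 2
    System.out.println(findCaseSensitive(schema, "DATA"));   // prints 3
    System.out.println(findCaseInsensitive(schema, "Data")); // prints [2, 3]
  }
}
```

With two candidates for one name, a case-insensitive partition spec builder
would have to either pick one field arbitrarily or reject the name, which is
the ambiguity I am worried about.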
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]