adamhonen opened a new issue, #10875:
URL: https://github.com/apache/iceberg/issues/10875

   ### Apache Iceberg version
   
   1.6.0 (latest release)
   
   ### Query engine
   
   None
   
   ### Please describe the bug 🐞
   
   I've reproduced this with the following test added to 
`TestSchemaUpdate.java`:
   
   ```java
     @Test
     public void testAddFieldWithDotInName() {
       Schema expected =
               new Schema(
                       required(1, "id", Types.IntegerType.get()),
                       optional(2, "dot", Types.StructType.of(
                               required(3, "field", Types.IntegerType.get()))),
                       optional(4, "dot.field", Types.IntegerType.get()));
   
       Schema schema = new Schema(
               required(1, "id", Types.IntegerType.get()),
               optional(2, "dot", Types.StructType.of(
                               required(3, "field", Types.IntegerType.get()))));
   
       Schema result =
               new SchemaUpdate(schema, schema.highestFieldId())
                       .addColumn(null,"dot.field", Types.IntegerType.get())
                       .apply();
   
       assertThat(result.asStruct()).isEqualTo(expected.asStruct());
     }
   ```
   
   Seems like there are some internal maps (e.g. `Schema.nameToId`) that use 
plain strings as keys. These strings get built through joining nested fields 
using dots. This makes `field` nested under `dot` and a non-nested field 
`dot.field` result in the same key, causing the above test to fail.
   
   I started looking at what would be needed to resolve this. Ideally the key 
isn't a string, but maybe a list of strings that represents different levels of 
nesting, or something similar. Otherwise, the dots should be escaped and 
unescaped when dealing with fields that actually have a dot in the name.
   
   The main issue with this is that there are already quite a few methods that 
implicitly assume the field name doesn't have a dot inside. Some examples: 
`deleteColumn`, `moveAfter` and basically anything on `UpdateSchema` that 
doesn't accept a parent field (though even then, what if the parent field has a 
dot inside?). So more methods (or another interface with said methods) would 
need to be added in order to support the use case of a dot inside field names, 
in order to not break existing code.
   
   ### Willingness to contribute
   
   - [ ] I can contribute a fix for this bug independently
   - [X] I would be willing to contribute a fix for this bug with guidance from 
the Iceberg community
   - [ ] I cannot contribute a fix for this bug at this time


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Reply via email to