syun64 commented on issue #278: URL: https://github.com/apache/iceberg-python/issues/278#issuecomment-1900574539
That sounds good @Fokko. I think having a _CreateMappingFromPyArrowSchema pre-order visitor does a good job of separating out the two concerns above.

> I think the outcome will be the same as the pre-order visitor, but we don’t do it by position, but by name.

This bit about not doing it by position is catching me a bit off guard, because I’m not convinced that we can assign IDs without relying on position when generating the name mapping.

Just to make sure we are on the same page, this new visitor will:

1. Map field_ids from the PyArrow schema if the field_id exists
2. Have a Boolean flag to _assign_fresh_ids by ignoring existing field_ids (or an automatic fallback to assign IDs if field_ids don’t exist) and assign field IDs **by position**

And then, we will use the name mapping generated from the PyArrow schema to assign field IDs **by name** and create a new Iceberg schema.

Does that approach sound consistent with your current thought?
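To make the two-step flow in the comment concrete, here is a minimal sketch in plain Python. It is not the PyIceberg or PyArrow API: the `Field` class and the `create_mapping` and `apply_mapping` functions are hypothetical stand-ins invented for illustration, and the simple pre-order counter is an assumption about how fresh IDs might be numbered. Step one builds a name mapping (reusing existing IDs, or assigning fresh ones by position), and step two assigns IDs to a new schema by looking each field up by name.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import Dict, List, Optional, Tuple

Path = Tuple[str, ...]


@dataclass
class Field:
    """Hypothetical stand-in for a schema field (not a PyArrow/PyIceberg type)."""
    name: str
    field_id: Optional[int] = None
    children: List["Field"] = field(default_factory=list)


def _has_all_ids(fields: List[Field]) -> bool:
    # True only if every field, at every nesting level, already carries an ID.
    return all(f.field_id is not None and _has_all_ids(f.children) for f in fields)


def create_mapping(fields: List[Field], assign_fresh_ids: bool = False) -> Dict[Path, int]:
    """Step 1: pre-order walk that maps each full field path to an ID.

    Existing IDs are reused unless assign_fresh_ids is set or any ID is
    missing, in which case IDs are assigned by position (visit order).
    """
    use_fresh = assign_fresh_ids or not _has_all_ids(fields)
    counter = count(1)
    mapping: Dict[Path, int] = {}

    def visit(level: List[Field], prefix: Path) -> None:
        for f in level:
            path = prefix + (f.name,)
            # Parent is recorded before its children: pre-order.
            mapping[path] = next(counter) if use_fresh else f.field_id
            visit(f.children, path)

    visit(fields, ())
    return mapping


def apply_mapping(fields: List[Field], mapping: Dict[Path, int], prefix: Path = ()) -> List[Field]:
    """Step 2: build a new schema whose IDs are looked up by name, not position."""
    return [
        Field(f.name, mapping[prefix + (f.name,)],
              apply_mapping(f.children, mapping, prefix + (f.name,)))
        for f in fields
    ]


# A schema without any IDs triggers the positional fallback.
schema = [
    Field("id"),
    Field("location", children=[Field("lat"), Field("lon")]),
    Field("name"),
]
mapping = create_mapping(schema)
with_ids = apply_mapping(schema, mapping)
```

In this sketch position only matters while the mapping is being generated. Once the mapping exists, `apply_mapping` resolves every field purely by its name path, which is the split between the two concerns that the comment describes.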