mattmartin14 commented on PR #1665: URL: https://github.com/apache/iceberg-python/pull/1665#issuecomment-2661415391
> @kevinjqliu Yes, that is an issue, but we don't respect this for any of the operations (`append`, etc). Doing this would make the operations expensive so we could leave this up to the user. Two more opinionated approaches are:
>
> - Don't allow `join_cols` if the table has identifier fields.
> - Remove the `join_cols` column.
>
> I think it would be nice to push Iceberg-specific features like the identifier fields, but I think the above might be _too_ opinionated. Would love to hear what others think.
>
> @ananthdurai Kevin already provided an excellent answer, If you want to learn more, I would recommend reading the docs on [hidden partition pruning](https://iceberg.apache.org/docs/nightly/partitioning/).

@Fokko, I honestly didn't even know about the Iceberg-specific identifier fields until you recently mentioned them. I can't imagine many have.

I see situations where teams have already built a ton of Iceberg tables, and it would be easier and more explicit for the user if `join_cols` is an option they can call out. Otherwise, users who do not know the internal schema of the table and see the code for the first time with no `join_cols` specified will probably be puzzled and wonder, "how is this thing doing this correctly?"

I'd personally leave `join_cols` as an optional way for users to use upsert.
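A minimal sketch of the "optional `join_cols`" behavior discussed above, assuming an explicit `join_cols` argument takes precedence and the table's identifier fields are only used as a fallback. `resolve_join_cols` and its parameters are hypothetical names for illustration and are not taken from the PyIceberg code in this PR.

```python
from typing import List, Optional, Sequence


def resolve_join_cols(
    join_cols: Optional[Sequence[str]],
    identifier_field_names: Sequence[str],
) -> List[str]:
    """Hypothetical helper: choose the columns an upsert matches rows on."""
    # An explicit argument wins, so the call site documents the match key.
    if join_cols:
        return list(join_cols)
    # Assumed fallback: use the schema's identifier fields, if any are set.
    if identifier_field_names:
        return list(identifier_field_names)
    # Nothing to match on; fail loudly rather than guess a key.
    raise ValueError(
        "No join_cols given and the table schema has no identifier fields"
    )


# Explicit join_cols is used even when identifier fields exist.
explicit = resolve_join_cols(["order_id"], ["id"])
# With no join_cols, the identifier fields are used instead.
fallback = resolve_join_cols(None, ["id"])
```

Under this assumption, a reader who sees `join_cols=["order_id"]` at the call site knows the match key without inspecting the table schema, while tables that already declare identifier fields still work when the argument is omitted.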