amogh-jahagirdar commented on code in PR #10352: URL: https://github.com/apache/iceberg/pull/10352#discussion_r1676213273
########## core/src/main/java/org/apache/iceberg/SchemaUpdate.java: ########## @@ -533,6 +537,34 @@ private static Schema applyChanges( } } + Map<Integer, List<Integer>> specToDeletes = Maps.newHashMap(); Review Comment: I think that's a better option @advancedxy , I was having a slack discussion with someone on removing unused partition specs and unused schemas and thinking the same thing about this PR. I think there's a legitimate need for those operations and we could leverage that need here. With this approach, this means to drop a column which used to be referenced in a partition spec a user must first call `RemovedUnusedPartitionSpec` to remove any partition specs (this procedure would have to go through the manifests) and only then can they perform the column drop. This seems better because then we don't have to have potential manifest reads on the schema evolution path which I can see being problematic for folks. One aspect is on ordering of these. There are two ways of going about this: 1.) Prevent dropping of columns which are part of the specs in the metadata and get in the RemovedUnusedPartitionSpec procedure after that. The downside of this approach is that there are cases where a user could not drop a partition column even though they should be able to. Here's a trivial case: Imagine a user just creates the table and they don't write any data. Then they realize they want to drop a partition column, but the procedure will fail unexpectedly whereas before it would work. The benefit of this approach though is we will prevent users from shooting themselves in the foot generally as seen by the reported issues for this. 2.) First get in the RemovedUnusedPartitionSpec procedure and then prevent dropping if it's part of a spec. The downside of this is, it may take some more time in to get the whole API in? Maybe not much more time since it looks mostly there, but I'd have to check. Another downside is until the procedure is in, users may end up in bad states. The benefit of this approach is that users have a way of dropping historical partition columns that are unreachable upfront. Right now in my mind approach 1 seems better even though there are some behavior changes, it seems net better for users. cc @Fokko @RussellSpitzer @aokolnychyi @rdblue @nastra in case they had any comments. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org For additional commands, e-mail: issues-h...@iceberg.apache.org