amogh-jahagirdar commented on code in PR #10352:
URL: https://github.com/apache/iceberg/pull/10352#discussion_r1676213273


##########
core/src/main/java/org/apache/iceberg/SchemaUpdate.java:
##########
@@ -533,6 +537,34 @@ private static Schema applyChanges(
       }
     }
 
+    Map<Integer, List<Integer>> specToDeletes = Maps.newHashMap();

Review Comment:
   I think that's a better option @advancedxy , I was having a slack discussion 
with someone on removing unused partition specs and unused schemas and thinking 
the same thing about this PR. I think there's a legitimate need for those 
operations and we could leverage that need here.
   
   With this approach, this means to drop a column which used to be referenced 
in a partition spec a user must first call `RemovedUnusedPartitionSpec` to 
remove any partition specs (this procedure would have to go through the 
manifests) and only then can they perform the column drop. This seems better 
because then we don't have to have potential manifest reads on the schema 
evolution path which I can see being problematic for folks.
   
   One aspect is on ordering of these. There are two ways of going about this:
   
   1.) Prevent dropping of columns which are part of the specs in the metadata 
and get in the RemovedUnusedPartitionSpec procedure after that. The downside of 
this approach is that there are cases where a user could not drop a partition 
column even though they should be able to. Here's a trivial case: Imagine a 
user just creates the table and they don't write any data. Then they realize 
they want to drop a partition column, but the procedure will fail unexpectedly 
whereas before it would work. The benefit of this approach though is we will 
prevent users from shooting themselves in the foot generally as seen by the 
reported issues for this.
   
   2.) First get in the RemovedUnusedPartitionSpec procedure and then prevent 
dropping if it's part of a spec. The downside of this is, it may take some more 
time in to get the whole API in? Maybe not much more time since it looks mostly 
there, but I'd have to check. Another downside is until the procedure is in, 
users may end up in bad states. The benefit of this approach is that users have 
a way of dropping historical partition columns that are unreachable upfront. 
   
   Right now in my mind approach 1 seems better even though there are some 
behavior changes, it seems net better for users.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Reply via email to