bmaisonn commented on issue #7822: URL: https://github.com/apache/iceberg/issues/7822#issuecomment-1632242215
@aokolnychyi the behavior mentioned above makes sense to me. In this example we didn't necessarily expect the two columns with the same name to line up in the changelog. We tried that after noticing that the column disappeared from the changelog once we dropped it, to figure out whether it would come back and how.

The use case here is that we want to keep track of the changes users make to those tables, for quality purposes. Basically, we want to be able to easily answer the question "What happened to that datapoint during the last week?" and see the modifications made during that period. In that context we'd like to see the records as they were at the time of each modification.

In that case I think it would be acceptable to see two columns with the same name in the changelog, or to return the column ids and let us resolve them afterward. It would still be difficult to tell the difference between "this column was there with null values" and "this column wasn't there at that time", but we could display the schema changes separately.
