[GitHub] [iceberg] bmaisonn commented on issue #7822: CDC data inconsistencies with schema changes

via GitHub Wed, 12 Jul 2023 03:23:10 -0700


bmaisonn commented on issue #7822:
URL: https://github.com/apache/iceberg/issues/7822#issuecomment-1632242215


   @aokolnychyi, the behavior mentioned above makes sense to me. In this example, 
we didn't necessarily expect the two columns with the same name to line up 
in the changelog. We set that up after noticing that the column disappeared from 
the changelog once we dropped it, to figure out whether and how it would come 
back.
   
   The use case here is that we want to keep track of the changes users make 
to those tables for quality purposes. Basically, we want to be able to 
easily answer the question "What happened to that data point during the last 
week?" and see the modifications made over that period. In that context, 
we'd like to see the records as they were at the time of each modification.
   
   In that case, I think it would be acceptable to see two columns with the same 
name in the changelog, or to get the column IDs back and resolve them ourselves 
afterward. It would still be difficult to tell the difference between "this 
column was there with null values" and "this column wasn't there at that time", 
but we could display the schema changes separately.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

