bmaisonn opened a new issue, #7822:
URL: https://github.com/apache/iceberg/issues/7822

   ### Apache Iceberg version
   
   1.2.1 (latest release)
   
   ### Query engine
   
   Spark
   
   ### Please describe the bug 🐞
   
   We're testing the CDC feature released in version 1.2.1 (also tested with 1.3.0) and noticed some unexpected behavior.
   
   We performed the following actions:
   
   | snapshot | change type                                 |
   |----------|---------------------------------------------|
   | s0       | initial insert of record A                  |
   | s1       | update of record A on col1                  |
   | s2       | ADD column col2 + update record A on col2   |
   | s3       | DROP column col2 + update record A on col1  |
   | s4       | ADD column col2 + update record A on col2   |
   
   Before we drop col2, the CDC generated between s2 and (s1 or s0) includes col2.
   After we drop col2, col2 is no longer visible in the CDC, whichever snapshots are given. For instance, the CDC between s2 and s1 is now empty, whereas it was not empty before we dropped col2.
   If we create col2 again with new data (i.e. s4):
   * CDC between s2 and s1: col2 is visible but the DataFrame is empty
   * CDC between s4 and s1: col2 is visible, but only the value added in s4 appears
   
   In this example we would expect to see the changes for col2 between s2 and s1 even after col2 is dropped.
   
   To reproduce:
   
   * create a table with a few columns and insert records
   * add a new column and perform some updates
   * drop that last column
   * check the CDC between snapshots taken before the drop
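
The steps above could be scripted roughly as follows. This is a minimal sketch, not the exact commands used for this report: the table name `local.db.cdc_repro`, the column types, the literal values, and the view name `s1_to_s2` are all assumptions made for illustration. It also assumes a Spark session with an Iceberg catalog named `local` and the Iceberg SQL extensions enabled (needed for `UPDATE`), and that each `UPDATE` commits exactly one snapshot.

```python
def repro_statements(table="local.db.cdc_repro"):
    """Return SQL statements that build snapshots s0..s3 as described in the issue.

    The table name, column types, and values are hypothetical placeholders.
    """
    return [
        f"CREATE TABLE {table} (id INT, col1 STRING) USING iceberg",
        f"INSERT INTO {table} VALUES (1, 'a0')",           # s0: initial insert of record A
        f"UPDATE {table} SET col1 = 'a1' WHERE id = 1",    # s1: update on col1
        f"ALTER TABLE {table} ADD COLUMN col2 STRING",     # schema change only
        f"UPDATE {table} SET col2 = 'b2' WHERE id = 1",    # s2: update on col2
        f"ALTER TABLE {table} DROP COLUMN col2",           # schema change only
        f"UPDATE {table} SET col1 = 'a3' WHERE id = 1",    # s3: update on col1
    ]


def changelog_call(table, start_snapshot_id, end_snapshot_id, view):
    """Build a CALL to Iceberg's create_changelog_view procedure.

    start-snapshot-id is exclusive and end-snapshot-id is inclusive.
    """
    catalog, identifier = table.split(".", 1)
    return (
        f"CALL {catalog}.system.create_changelog_view("
        f"table => '{identifier}', "
        f"options => map('start-snapshot-id', '{start_snapshot_id}', "
        f"'end-snapshot-id', '{end_snapshot_id}'), "
        f"changelog_view => '{view}')"
    )


def run_repro(spark, table="local.db.cdc_repro"):
    """Run the scenario on an existing SparkSession and return the s1..s2 changelog."""
    for stmt in repro_statements(table):
        spark.sql(stmt)
    # Look up snapshot ids in commit order from the snapshots metadata table.
    rows = spark.sql(
        f"SELECT snapshot_id FROM {table}.snapshots ORDER BY committed_at"
    ).collect()
    ids = [row["snapshot_id"] for row in rows]
    s1, s2 = ids[1], ids[2]
    spark.sql(changelog_call(table, s1, s2, "s1_to_s2"))
    return spark.sql("SELECT * FROM s1_to_s2")
```

Calling `run_repro(spark).show()` after the drop is the check in question: per the report, the changelog between s1 and s2 comes back empty, whereas the same call made before the `DROP COLUMN` statement returns the col2 change.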
   
   



