sourajitsaha17 commented on issue #14093:
URL: https://github.com/apache/iceberg/issues/14093#issuecomment-3308781842
I can reproduce it with our application's functional tests, but I haven't been
able to reproduce it yet in unit tests using the `TestTables` implementation.
Here is the test scenario in more detail:
The test ingests data that introduces new columns. For each new batch:
1. The schema is updated to include any missing columns.
2. The data file is committed asynchronously in an executor thread.
Because data arrives much faster than a commit completes, the table schema has
already been updated again before the first data file is committed.
Example:
Initial schema:
```
0 = table {
1: id: required int
2: data: optional string
}
```
The first batch arrives and evolves the schema to:
```
1 = table {
1: id: required int
2: data: optional string
3: column1: optional string
}
```
The data file is then staged, and its commit is submitted to an executor thread.
Before that commit finishes, the next batch arrives and the schema evolves again:
```
2 = table {
1: id: required int
2: data: optional string
3: column1: optional string
4: column2: optional string
}
```
When the commit for the first batch finally completes, the table schema is
reverted to:
```
3 = table {
1: id: required int
2: data: optional string
3: column1: optional string
}
```
At this point, `column2` is missing from the current schema.
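The sequence above can be reproduced with a small, self-contained Java model. This is a hypothetical sketch, not Iceberg code: `Meta`, `addColumn`, and `commitSnapshotUnchecked` are illustrative stand-ins, and the model assumes (as one plausible explanation, not a confirmed diagnosis) that the delayed commit writes metadata derived from the base it was staged against, without checking whether that base is still current.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicReference;

final class LostSchemaUpdateDemo {
    // Hypothetical stand-in for table metadata: schema id, column names, snapshot count.
    record Meta(int schemaId, List<String> columns, int snapshots) {}

    // Hypothetical "catalog" pointer to the current metadata.
    static final AtomicReference<Meta> current = new AtomicReference<>();

    // Step 1: evolve the schema by appending a missing column (assigns a new schema id).
    static void addColumn(String name) {
        Meta base = current.get();
        List<String> cols = new ArrayList<>(base.columns());
        cols.add(name);
        current.set(new Meta(base.schemaId() + 1, List.copyOf(cols), base.snapshots()));
    }

    // Step 2 (assumed failure mode): the commit writes metadata derived from the base it
    // was staged against and swaps it in without checking that the base is still current.
    static void commitSnapshotUnchecked(Meta stagedBase) {
        current.set(new Meta(stagedBase.schemaId(), stagedBase.columns(), stagedBase.snapshots() + 1));
    }

    static List<String> run() {
        current.set(new Meta(0, List.of("id", "data"), 0));
        ExecutorService executor = Executors.newSingleThreadExecutor();
        CountDownLatch secondBatchEvolved = new CountDownLatch(1);
        try {
            // Batch 1: evolve the schema, then stage the commit against the metadata seen now.
            addColumn("column1");
            Meta stagedBase = current.get();
            Future<?> commit = executor.submit(() -> {
                secondBatchEvolved.await(); // the slow commit completes only after batch 2
                commitSnapshotUnchecked(stagedBase);
                return null;
            });
            // Batch 2 arrives and evolves the schema before the first commit completes.
            addColumn("column2");
            secondBatchEvolved.countDown();
            commit.get();
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            executor.shutdown();
        }
        return current.get().columns();
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints [id, data, column1]: column2 is gone
    }
}
```

The latch forces the first commit to complete only after the second batch has evolved the schema, mirroring the ordering observed in the functional test.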
I noticed that when the snapshot is added, the current schema is not validated
as an update requirement. In this case, I expected a `CommitFailedException`,
followed by a retry against the refreshed metadata.
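The behavior expected here, failing the commit when the schema it was staged against is no longer current and then retrying on refreshed metadata, can be sketched with the same simplified model. This is a hypothetical illustration, not Iceberg's implementation: `commitSnapshot`, `commitWithRetry`, and the local `CommitFailedException` class are stand-ins, and the schema-id check is only analogous in spirit to the REST catalog's `assert-current-schema-id` requirement.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;

final class GuardedCommitDemo {
    // Same hypothetical metadata stand-in: schema id, column names, snapshot count.
    record Meta(int schemaId, List<String> columns, int snapshots) {}

    // Local stand-in for Iceberg's CommitFailedException (not the real class).
    static final class CommitFailedException extends RuntimeException {
        CommitFailedException(String message) { super(message); }
    }

    static final AtomicReference<Meta> current = new AtomicReference<>();

    static void addColumn(String name) {
        Meta base = current.get();
        List<String> cols = new ArrayList<>(base.columns());
        cols.add(name);
        current.set(new Meta(base.schemaId() + 1, List.copyOf(cols), base.snapshots()));
    }

    // Requirement check before applying the update: the schema the commit was staged
    // against must still be the current schema.
    static Meta commitSnapshot(Meta stagedBase) {
        Meta actual = current.get();
        if (actual.schemaId() != stagedBase.schemaId()) {
            throw new CommitFailedException("Requirement failed: current schema id "
                + actual.schemaId() + " != expected " + stagedBase.schemaId());
        }
        // Add the snapshot on top of the latest metadata; the CAS guards against concurrent writers.
        Meta next = new Meta(actual.schemaId(), actual.columns(), actual.snapshots() + 1);
        if (!current.compareAndSet(actual, next)) {
            throw new CommitFailedException("Metadata changed concurrently");
        }
        return next;
    }

    // On CommitFailedException, refresh the metadata and re-stage the commit against it.
    static Meta commitWithRetry(Meta stagedBase, int maxAttempts) {
        Meta base = stagedBase;
        for (int attempt = 1; ; attempt++) {
            try {
                return commitSnapshot(base);
            } catch (CommitFailedException e) {
                if (attempt >= maxAttempts) {
                    throw e;
                }
                base = current.get();
            }
        }
    }

    static Meta run() {
        current.set(new Meta(0, List.of("id", "data"), 0));
        addColumn("column1");             // batch 1 evolves the schema (schema id 1)
        Meta stagedBase = current.get();  // batch 1's commit is staged here
        addColumn("column2");             // batch 2 evolves the schema first (schema id 2)
        return commitWithRetry(stagedBase, 3);
    }

    public static void main(String[] args) {
        System.out.println(run().columns()); // prints [id, data, column1, column2]
    }
}
```

The run is single-threaded for determinism: the first attempt fails because the schema id moved from 1 to 2, and the retry re-applies the snapshot on top of the refreshed metadata, so `column2` survives.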
Any suggestions here would be really helpful. Let me know if you need further
details or would like to see debug logs.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.