sgrebnov opened a new pull request, #1721:
URL: https://github.com/apache/iceberg-rust/pull/1721
## Which issue does this PR close?
This PR fixes schema mismatch errors (similar to the example shown below) when
using `IcebergCommitExec` with DataFusion. The error occurs when `IcebergCommitExec`
is not the top-level plan but is instead wrapped as the input to another plan
node, for example when it is added by a custom optimization rule.
>An internal error occurred. Internal error: PhysicalOptimizer rule
'OutputRequirements' failed. Schema mismatch. Expected original schema: Schema
{ fields: [Field { name: "count", data_type: UInt64, nullable: false, dict_id:
0, dict_is_ordered: false, metadata: {} }], metadata: {} }, got new schema:
Schema { fields: [Field { name: "r_regionkey", data_type: Int32, nullable:
true, dict_id: 0, dict_is_ordered: false, metadata: {"PARQUET:field_id": "1"}
}, Field { name: "r_name", data_type: Utf8, nullable: true, dict_id: 0,
dict_is_ordered: false, metadata: {"PARQUET:field_id": "2"} }, Field { name:
"r_comment", data_type: Utf8, nullable: true, dict_id: 0, dict_is_ordered:
false, metadata: {"PARQUET:field_id": "3"} }], metadata: {} }.
This issue was likely caused by a bug in DataFusion's code. Please help us
to resolve this by filing a bug report in our issue tracker:
https://github.com/apache/datafusion/issues
## What changes are included in this PR?
This PR updates the `compute_properties` logic to use the target (output) schema
instead of the input schema. Below is DataFusion's `DataSinkExec` implementation,
which demonstrates that properties must be created from the target schema, not the input schema.
https://github.com/apache/datafusion/blob/4eacb6046773b759dae0b3d801fe8cb1c6b65c0f/datafusion/datasource/src/sink.rs#L101C1-L117C6
```rust
impl DataSinkExec {
    /// Create a plan to write to `sink`
    pub fn new(
        input: Arc<dyn ExecutionPlan>,
        sink: Arc<dyn DataSink>,
        sort_order: Option<LexRequirement>,
    ) -> Self {
        // The cached plan properties are built from the count (output) schema,
        // not from the schema of `input`.
        let count_schema = make_count_schema();
        let cache = Self::create_schema(&input, count_schema);
        Self {
            input,
            sink,
            count_schema: make_count_schema(),
            sort_order,
            cache,
        }
    }
    // ... remainder of the impl omitted from this excerpt
}
```
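
To make the difference concrete, here is a minimal toy model of the idea, not the actual change in this PR. Every type and function in it (`Schema`, `PropertiesFrom`, `CommitSink`, `check_consistent`) is a hypothetical stand-in written with the standard library only, and none of it is DataFusion or iceberg-rust API. The consistency check is also an assumption, since the exact check DataFusion runs is not shown here. The sketch only illustrates why properties derived from the input schema disagree with a node that emits a single `count` column.

```rust
// Toy model only: these types are hypothetical stand-ins, not the DataFusion
// or iceberg-rust API.

/// A schema reduced to (field name, data type) pairs.
#[derive(Debug, Clone, PartialEq)]
pub struct Schema {
    pub fields: Vec<(String, String)>,
}

/// The single-column schema a sink reports: the number of rows written.
pub fn make_count_schema() -> Schema {
    Schema {
        fields: vec![("count".to_string(), "UInt64".to_string())],
    }
}

/// Which schema the cached plan properties are derived from.
#[derive(Debug, Clone, Copy)]
pub enum PropertiesFrom {
    /// Derive properties from the input schema (what this PR moves away from).
    Input,
    /// Derive properties from the output schema (what this PR moves to).
    Output,
}

/// Hypothetical commit node: consumes input rows and emits one count row.
pub struct CommitSink {
    output_schema: Schema,
    properties_schema: Schema,
}

impl CommitSink {
    pub fn new(input_schema: &Schema, from: PropertiesFrom) -> Self {
        let output_schema = make_count_schema();
        let properties_schema = match from {
            PropertiesFrom::Input => input_schema.clone(),
            PropertiesFrom::Output => output_schema.clone(),
        };
        Self {
            output_schema,
            properties_schema,
        }
    }

    /// Assumed consistency check: the schema recorded in the properties must
    /// equal the schema the node actually emits.
    pub fn check_consistent(&self) -> Result<(), String> {
        if self.properties_schema == self.output_schema {
            Ok(())
        } else {
            Err(format!(
                "Schema mismatch. Expected {:?}, got {:?}",
                self.output_schema, self.properties_schema
            ))
        }
    }
}
```

In this model, `CommitSink::new(&input, PropertiesFrom::Input).check_consistent()` returns an `Err` for any input schema other than the count schema, while the `PropertiesFrom::Output` variant always returns `Ok(())`.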
## Are these changes tested?
Tested manually. The existing test was also expanded to verify the output schema.