ion-elgreco opened a new issue, #14218:
URL: https://github.com/apache/datafusion/issues/14218

   ### Describe the bug
   
   I am rewriting our CDF operation in delta-rs, the code looks roughtly like 
this:
   
   ```rust
       let mut projected = if should_cdc {
           operation_count
               .clone()
               .with_column(
                   CDC_COLUMN_NAME,
                   when(col(TARGET_DELETE_COLUMN).is_null(), lit("delete")) // 
nulls are equal to True
                       .when(col(DELETE_COLUMN).is_null(), lit("source_delete"))
                       .when(col(TARGET_COPY_COLUMN).is_null(), lit("copy"))
                       .when(col(TARGET_INSERT_COLUMN).is_null(), lit("insert"))
                       .when(col(TARGET_UPDATE_COLUMN).is_null(), lit("update"))
                       .end()?,
               )?
               // .drop_columns(&["__delta_rs_path"])? // WEIRD bug caused by 
interaction with unnest_columns, has to be dropped otherwise throws schema error
               .with_column(
                   "__delta_rs_update_expanded",
                   when(
                       col(CDC_COLUMN_NAME).eq(lit("update")),
                       lit(ScalarValue::List(ScalarValue::new_list(
                           &[
                               
ScalarValue::Utf8(Some("update_preimage".into())),
                               
ScalarValue::Utf8(Some("update_postimage".into())),
                           ],
                           &DataType::List(Field::new("element", 
DataType::Utf8, false).into()),
                           true,
                       ))),
                   )
                   .end()?,
               )?
               .unnest_columns(&["__delta_rs_update_expanded"])?
               .with_column(
                   CDC_COLUMN_NAME,
                   when(
                       col(CDC_COLUMN_NAME).eq(lit("update")),
                       col("__delta_rs_update_expanded"),
                   )
                   .otherwise(col(CDC_COLUMN_NAME))?,
               )?
               .drop_columns(&["__delta_rs_update_expanded"])?
               .select(write_projection_with_cdf)?
   ```
   I noticed that when I do unnest_columns on another column, it complains 
afterwards about a schema error: 
   
   ```
   Result::unwrap()` on an `Err` value: Arrow { source: 
InvalidArgumentError("column types must match schema types, expected Utf8 but 
found Dictionary(UInt16, Utf8) at column index 7") }
   ```
   Since I don't need the column, I can safely drop it beforehand, but I don't 
understand why doesn't Dictionary(UInt16, Utf8) just coerce to utf8?
   
   
   ### To Reproduce
   
   Bit difficult but, you can run grab my branch: 
https://github.com/ion-elgreco/delta-rs/tree/refactor--combine_execution_plans
   
   And then you run the test test_merge_cdc_enabled_simple, with this line 
commented out: `.drop_columns(&["__delta_rs_path"])? `
   
   ### Expected behavior
   
   I guess coerce gracefully?
   
   ### Additional context
   
   _No response_


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to