niebayes opened a new pull request, #21432:
URL: https://github.com/apache/datafusion/pull/21432

   ## Summary
   
   This PR fixes a bug in the 
`datafusion-examples/examples/relation_planner/pivot_unpivot.rs`
   example implementation.
   
   The example planner rewrites `PIVOT` into a `GROUP BY` plus `CASE` 
expressions. During that
   rewrite, it rebuilt column references using unquoted `col("...")` 
expressions. That loses the
   original identifier quoting and case-sensitivity, which breaks queries that 
pivot on quoted
   mixed-case columns such as `"pointNumber"`.
   
   This PR preserves the original parsed column expression instead of 
reconstructing it from a plain
   string.
   
   ## Problem
   
   Consider a query like:
   
   ```sql
   SELECT *
   FROM point_stats
   PIVOT (
     MAX(max_value)
     FOR "pointNumber" IN ('16951' AS p16951, '16952' AS p16952)
   )
   ORDER BY ts
   ```
   
   Before this change, the example planner:
   
   1. Parsed `"pointNumber"` correctly as a quoted, case-sensitive identifier.
   2. Extracted its name as `pointNumber`.
   3. Reconstructed new expressions with `col(&pivot_col_name)` and `col(...)` 
for `GROUP BY`.
   
   That reconstruction treated the identifier as an unquoted column reference, 
which could be
   normalized differently from the original schema field. In practice, this 
means the planner could
   end up looking for `pointnumber` while the schema still contained 
`"pointNumber"`.
   
   ## Root Cause
   
   Two places in the example rewrite logic rebuilt column references from bare 
strings:
   
   1. The generated `CASE` expression for the pivot column
   2. The inferred `GROUP BY` expressions
   
   That is fine for simple lowercase identifiers, but it is not correct for 
quoted identifiers or
   qualified fields because the reconstructed expression no longer carries the 
original identifier
   semantics.
   
   ## Fix
   
   The fix is minimal:
   
   1. Reuse the already planned `pivot_col` expression when building the `CASE` 
expression.
   2. Build `GROUP BY` expressions directly from the input schema via 
`Expr::from(...)` rather than
      re-creating them with `col(field_name)`.
   
   This preserves:
   
   - quoted mixed-case identifiers
   - qualifiers
   - original field resolution semantics
   
   ## Code Changes
   
   File changed:
   
   - `datafusion-examples/examples/relation_planner/pivot_unpivot.rs`
   
   Main changes:
   
   - Replace:
   
   ```rust
   case(col(&pivot_col_name))
   ```
   
   with:
   
   ```rust
   case(pivot_col.clone())
   ```
   
   - Replace string-based `GROUP BY` reconstruction:
   
   ```rust
   schema
       .fields()
       .iter()
       .map(|f| f.name().as_str())
       ...
       .map(col)
   ```
   
   with schema-derived expressions:
   
   ```rust
   schema
       .iter()
       .filter(...)
       .map(Expr::from)
   ```
   
   ## Example Coverage
   
   This PR also adds an additional example scenario to the same file:
   
   - a `point_stats` input table
   - a `PIVOT` query using quoted mixed-case column `"pointNumber"`
   - a snapshot asserting the expected output
   
   This makes the bug and the fix visible directly in the example itself.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to