andygrove commented on issue #4137:
URL: 
https://github.com/apache/datafusion-comet/issues/4137#issuecomment-4367008650

   Investigated this and the title/description undersell the bug. Reproducer 
(Comet on Spark 4.1.1, no custom `CachedBatchSerializer` needed):
   
   ```scala
   val data = spark.read.parquet(path)  // 2 rows
   data.cache()
   data.count()             // → 2  ✓
   data.collect()           // → [100], [200]  ✓
   data.union(data).count() // → 0  ✗  expected 4
   ```
   
   Executed plan:
   
   ```
   CometNativeColumnarToRow
   +- CometUnion
      :- CometSparkColumnarToColumnar
      :  +- InMemoryTableScan ...
      +- CometSparkColumnarToColumnar
         +- InMemoryTableScan ...
   ```
   
   The plan-structure mismatch the title describes (`CometNativeColumnarToRow 
+ CometNativeScan` instead of `WholeStageCodegenExec(ColumnarToRowExec)`) is 
real but secondary. The blocking failure when this test is un-ignored is the 
`assert(df.count() == 4)` line, which fires before the structure check.
   
   Either `CometSparkColumnarToColumnar` is producing empty batches when fed 
an `InMemoryTableScan`, or `CometUnion` is mishandling 
non-`CometNativeExec` children. Worth narrowing down which one.
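   
   One possible way to start narrowing it down, as a rough sketch rather than 
a verified procedure: rerun the union with Comet's native union turned off and 
compare both the count and the executed plan. The object and method names below 
are hypothetical, and the config key `spark.comet.exec.union.enabled` is an 
assumption that should be checked against the Comet configuration docs for the 
version in use.
   
   ```scala
   import org.apache.spark.sql.SparkSession

   // Hypothetical helper, not part of Comet or its test suite.
   object UnionOverCacheCheck {
     def run(spark: SparkSession, path: String): Unit = {
       val data = spark.read.parquet(path)
       data.cache()
       data.count() // materialize the cache before building the unions

       // Baseline: the failing shape from the reproducer above.
       val withCometUnion = data.union(data)
       println(withCometUnion.queryExecution.executedPlan)
       println(s"count with Comet union: ${withCometUnion.count()}")

       // ASSUMPTION: this key disables Comet's native union operator.
       // Verify the exact name in the Comet configuration reference.
       spark.conf.set("spark.comet.exec.union.enabled", "false")

       // Build a fresh Dataset so it is planned under the new setting.
       val withoutCometUnion = data.union(data)
       println(withoutCometUnion.queryExecution.executedPlan)
       println(s"count without Comet union: ${withoutCometUnion.count()}")
     }
   }
   ```
   
   If the second count comes back correct, that only localizes the problem to 
the Comet union path: disabling the operator may also drop the 
`CometSparkColumnarToColumnar` nodes, so the printed plans need to be compared 
to see which nodes are still present before blaming either one.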
   
   Updating the title and description to reflect the actual root cause would 
help anyone picking this up. The fix belongs in Comet code, not in a diff 
matcher tweak.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

