ShreyeshArangath commented on issue #2029:
URL: https://github.com/apache/datafusion-comet/issues/2029#issuecomment-3075918084
Update:
Looking at the executor logs, it seems likely that this is because it is not
using HDFS as the data source. The file reader is being auto-configured with
S3-related settings?
```
... 36 more
25/07/15 22:09:40 INFO task 0.0 in stage 79.0 (TID 7530) ReadOptions: File reader auto configured 'fs.s3a.connection.maximum=256'
25/07/15 22:09:40 INFO task 0.0 in stage 79.0 (TID 7530) ReadOptions: File reader auto configured 'fs.s3a.readahead.range=1048576'
25/07/15 22:09:40 INFO YarnCoarseGrainedExecutorBackend: Got assigned task 7568
25/07/15 22:09:40 INFO task 52.0 in stage 79.0 (TID 7568) Executor: Running task 52.0 in stage 79.0 (TID 7568)
25/07/15 22:09:40 INFO task 52.0 in stage 79.0 (TID 7568) CometExecIterator: Calculated per-task memory limit of 0 (0 * 1.0 / 8.0)
25/07/15 22:09:40 INFO task 52.0 in stage 79.0 (TID 7568) CometExecIterator: Calculated per-task memory limit of 0 (0 * 1.0 / 8.0)
25/07/15 22:09:40 INFO task 52.0 in stage 79.0 (TID 7568) FileScanRDD: Reading File path: hdfs://cluster/jobs/x/y/tpcds-unpartitioned/tpcds-1000/store_returns/part-00017-85e008a2-66d5-4ca4-a957-da1b024bc0ae-c000.snappy.parquet, range: 45493820-90987640, partition values: [empty row]
25/07/15 22:09:40 INFO task 52.0 in stage 79.0 (TID 7568) ReadOptions: File reader auto configured 'fs.s3a.connection.maximum=256'
25/07/15 22:09:40 INFO task 52.0 in stage 79.0 (TID 7568) ReadOptions: File reader auto configured 'fs.s3a.readahead.range=1048576'
25/07/15 22:09:40 ERROR task 345.0 in stage 78.0 (TID 7413) Executor: Exception in task 345.0 in stage 78.0 (TID 7413)
org.apache.spark.SparkException: Parquet column cannot be converted in file hdfs://cluster/jobs/x/y/tpcds-unpartitioned/tpcds-1000/store_returns/part-00035-85e008a2-66d5-4ca4-a957-da1b024bc0ae-c000.snappy.parquet. Column: [sr_return_amt], Expected: decimal(7,2), Found: DOUBLE.
	at org.apache.spark.sql.errors.QueryExecutionErrors$.unsupportedSchemaColumnConvertError(QueryExecutionErrors.scala:855)
	at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:302)
	at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:142)
	at org.apache.spark.sql.comet.CometScanExec$$anon$1.hasNext(CometScanExec.scala:268)
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]