ShreyeshArangath commented on issue #2029:
URL: https://github.com/apache/datafusion-comet/issues/2029#issuecomment-3075918084
Update:
Looking at the executor logs, it seems likely that this is because it is not
using HDFS as the data source. The file reader is being auto-configured with
S3-related settings?
```
... 36 more
25/07/15 22:09:40 INFO task 0.0 in stage 79.0 (TID 7530) ReadOptions: File reader auto configured 'fs.s3a.connection.maximum=256'
25/07/15 22:09:40 INFO task 0.0 in stage 79.0 (TID 7530) ReadOptions: File reader auto configured 'fs.s3a.readahead.range=1048576'
25/07/15 22:09:40 INFO YarnCoarseGrainedExecutorBackend: Got assigned task 7568
25/07/15 22:09:40 INFO task 52.0 in stage 79.0 (TID 7568) Executor: Running task 52.0 in stage 79.0 (TID 7568)
25/07/15 22:09:40 INFO task 52.0 in stage 79.0 (TID 7568) CometExecIterator: Calculated per-task memory limit of 0 (0 * 1.0 / 8.0)
25/07/15 22:09:40 INFO task 52.0 in stage 79.0 (TID 7568) CometExecIterator: Calculated per-task memory limit of 0 (0 * 1.0 / 8.0)
25/07/15 22:09:40 INFO task 52.0 in stage 79.0 (TID 7568) FileScanRDD: Reading File path: hdfs://cluster/jobs/x/y/tpcds-unpartitioned/tpcds-1000/store_returns/part-00017-85e008a2-66d5-4ca4-a957-da1b024bc0ae-c000.snappy.parquet, range: 45493820-90987640, partition values: [empty row]
25/07/15 22:09:40 INFO task 52.0 in stage 79.0 (TID 7568) ReadOptions: File reader auto configured 'fs.s3a.connection.maximum=256'
25/07/15 22:09:40 INFO task 52.0 in stage 79.0 (TID 7568) ReadOptions: File reader auto configured 'fs.s3a.readahead.range=1048576'
25/07/15 22:09:40 ERROR task 345.0 in stage 78.0 (TID 7413) Executor: Exception in task 345.0 in stage 78.0 (TID 7413)
org.apache.spark.SparkException: Parquet column cannot be converted in file hdfs://cluster/jobs/x/y/tpcds-unpartitioned/tpcds-1000/store_returns/part-00035-85e008a2-66d5-4ca4-a957-da1b024bc0ae-c000.snappy.parquet. Column: [sr_return_amt], Expected: decimal(7,2), Found: DOUBLE.
	at org.apache.spark.sql.errors.QueryExecutionErrors$.unsupportedSchemaColumnConvertError(QueryExecutionErrors.scala:855)
	at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:302)
	at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:142)
	at org.apache.spark.sql.comet.CometScanExec$$anon$1.hasNext(CometScanExec.scala:268)
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]