monchickey opened a new issue, #16987:
URL: https://github.com/apache/dolphinscheduler/issues/16987
### Search before asking

- [x] I had searched in the [issues](https://github.com/apache/dolphinscheduler/issues?q=is%3Aissue) and found no similar issues.

### What happened

DolphinScheduler version: 3.2.2
Deployment: pseudo-cluster
Spark: standalone cluster, version 3.5.4
Resource files are stored in MinIO via the S3 interface.

The configuration files involved are `api-server/conf/common.properties` and `worker-server/conf/common.properties`. The main changes are:

```
resource.storage.type=S3
resource.storage.upload.base.path=/dolphinscheduler
resource.aws.access.key.id=<minio access key>
resource.aws.secret.access.key=<minio secret key>
resource.aws.region=cn-north-1
resource.aws.s3.bucket.name=dolphinscheduler
resource.aws.s3.endpoint=http://<ip>:9000
resource.hdfs.root.user=root
resource.hdfs.fs.defaultFS=s3a://dolphinscheduler
```

The rest of the configuration was left at its defaults. After starting the services, the jar file could be uploaded normally. I then added a SPARK task to a workflow, selected the jar package uploaded to MinIO, and chose `cluster` as the deploy mode.
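For reference, the remote location an uploaded resource should resolve to can be derived from the settings above: the file lives under `resource.storage.upload.base.path` inside the configured bucket. A minimal sketch of that composition (the `<tenant>/resources` directory layout used here is an assumption for illustration, not taken from the DolphinScheduler docs):

```python
# Sketch: compose the s3a URI an uploaded resource should resolve to,
# from the common.properties values in this report.
# NOTE: the "<tenant>/resources" path layout is an assumption.

def resource_s3a_uri(bucket: str, base_path: str, tenant: str, filename: str) -> str:
    """Join bucket, base path, tenant directory, and file name into an s3a URI."""
    parts = [p.strip("/") for p in (base_path, tenant, "resources", filename)]
    return f"s3a://{bucket}/" + "/".join(p for p in parts if p)

uri = resource_s3a_uri(
    bucket="dolphinscheduler",          # resource.aws.s3.bucket.name
    base_path="/dolphinscheduler",      # resource.storage.upload.base.path
    tenant="default",
    filename="spark-examples_2.12-3.5.4.jar",
)
print(uri)
```

A URI of this shape, unlike a local `/tmp/...` path, is resolvable from any node that has the s3a connector and the MinIO credentials.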
I then ran the workflow instance; the full output log is attached: [1737699046243.log](https://github.com/user-attachments/files/18531446/1737699046243.log). The important error information is:

```
[INFO] 2025-01-24 13:53:34.674 +0800 - ********************************* Execute task instance *************************************
[INFO] 2025-01-24 13:53:34.675 +0800 - ***********************************************************************************************
[INFO] 2025-01-24 13:53:34.677 +0800 - Final Shell file is:
[INFO] 2025-01-24 13:53:34.677 +0800 - ****************************** Script Content *****************************************************************
[INFO] 2025-01-24 13:53:34.677 +0800 - #!/bin/bash
BASEDIR=$(cd `dirname $0`; pwd)
cd $BASEDIR
export SPARK_HOME=/opt/spark-3.5.4-bin-hadoop3
${SPARK_HOME}/bin/spark-submit --master spark://192.168.11.17:7077 --deploy-mode cluster --class org.apache.spark.examples.JavaSparkPi --conf spark.driver.cores=1 --conf spark.driver.memory=512M --conf spark.executor.instances=2 --conf spark.executor.cores=2 --conf spark.executor.memory=2G /tmp/dolphinscheduler/exec/process/default/131329535157952/131329769571008_2/6/6/spark-examples_2.12-3.5.4.jar
[INFO] 2025-01-24 13:53:34.678 +0800 - ****************************** Script Content *****************************************************************
[INFO] 2025-01-24 13:53:34.678 +0800 - Executing shell command : sudo -u default -i /tmp/dolphinscheduler/exec/process/default/131329535157952/131329769571008_2/6/6/6_6.sh
[INFO] 2025-01-24 13:53:34.687 +0800 - process start, process id is: 172698
[INFO] 2025-01-24 13:53:37.688 +0800 - -> 25/01/24 13:53:37 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
	25/01/24 13:53:37 INFO SecurityManager: Changing view acls to: default
	25/01/24 13:53:37 INFO SecurityManager: Changing modify acls to: default
	25/01/24 13:53:37 INFO SecurityManager: Changing view acls groups to:
	25/01/24 13:53:37 INFO SecurityManager: Changing modify acls groups to:
	25/01/24 13:53:37 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: default; groups with view permissions: EMPTY; users with modify permissions: default; groups with modify permissions: EMPTY
[INFO] 2025-01-24 13:53:38.691 +0800 - -> 25/01/24 13:53:37 INFO Utils: Successfully started service 'driverClient' on port 39639.
	25/01/24 13:53:37 INFO TransportClientFactory: Successfully created connection to /192.168.11.17:7077 after 57 ms (0 ms spent in bootstraps)
	25/01/24 13:53:38 INFO ClientEndpoint: ... waiting before polling master for driver state
	25/01/24 13:53:38 INFO ClientEndpoint: Driver successfully submitted as driver-20250124135338-0056
[INFO] 2025-01-24 13:53:43.693 +0800 - -> 25/01/24 13:53:43 INFO ClientEndpoint: State of driver-20250124135338-0056 is ERROR
	25/01/24 13:53:43 ERROR ClientEndpoint: Exception from cluster was: java.nio.file.NoSuchFileException: /tmp/dolphinscheduler/exec/process/default/131329535157952/131329769571008_2/6/6/spark-examples_2.12-3.5.4.jar
	java.nio.file.NoSuchFileException: /tmp/dolphinscheduler/exec/process/default/131329535157952/131329769571008_2/6/6/spark-examples_2.12-3.5.4.jar
		at sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
		at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
		at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
		at sun.nio.fs.UnixCopyFile.copy(UnixCopyFile.java:526)
		at sun.nio.fs.UnixFileSystemProvider.copy(UnixFileSystemProvider.java:253)
		at java.nio.file.Files.copy(Files.java:1274)
		at org.apache.spark.util.Utils$.copyRecursive(Utils.scala:681)
		at org.apache.spark.util.Utils$.copyFile(Utils.scala:652)
		at org.apache.spark.util.Utils$.doFetchFile(Utils.scala:725)
		at org.apache.spark.util.Utils$.fetchFile(Utils.scala:467)
		at org.apache.spark.deploy.worker.DriverRunner.downloadUserJar(DriverRunner.scala:162)
		at org.apache.spark.deploy.worker.DriverRunner.prepareAndRunDriver(DriverRunner.scala:179)
		at org.apache.spark.deploy.worker.DriverRunner$$anon$2.run(DriverRunner.scala:99)
	25/01/24 13:53:43 INFO ShutdownHookManager: Shutdown hook called
	25/01/24 13:53:43 INFO ShutdownHookManager: Deleting directory /tmp/spark-2af4f41d-c583-4698-9d8e-546a656bcf17
[INFO] 2025-01-24 13:53:43.695 +0800 - process has exited. execute path:/tmp/dolphinscheduler/exec/process/default/131329535157952/131329769571008_2/6/6, processId:172698 ,exitStatusCode:255 ,processWaitForStatus:true ,processExitValue:255
[INFO] 2025-01-24 13:53:43.697 +0800 - Start finding appId in /opt/apache-dolphinscheduler-3.2.2-bin/worker-server/logs/20250124/131329769571008/2/6/6.log, fetch way: log
[INFO] 2025-01-24 13:53:43.698 +0800 - ***********************************************************************************************
[INFO] 2025-01-24 13:53:43.699 +0800 - ********************************* Finalize task instance ************************************
[INFO] 2025-01-24 13:53:43.699 +0800 - ***********************************************************************************************
```

The error shows that even though the jar on MinIO was selected when configuring the workflow, DolphinScheduler still passed the path of its local temporary execution directory to `spark-submit` at runtime. In cluster deploy mode the driver runs on a Spark worker node, which cannot read that local path, so the task fails.

### What you expected to happen

Tasks can be submitted and run normally.

### How to reproduce

Follow the steps above.

### Anything else

The problem occurs whenever DolphinScheduler and the Spark driver are not running on the same node.
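The failure pattern is general to `--deploy-mode cluster`: `spark-submit` only passes the application-jar string to the master, and a worker-side `DriverRunner` later tries to fetch it (as the stack trace's `downloadUserJar` shows), so the path must be resolvable from every worker node. A minimal sketch of the distinction (the helper and scheme list are illustrative, not DolphinScheduler or Spark code):

```python
from urllib.parse import urlparse

# Schemes an arbitrary Spark worker can fetch a jar from (illustrative list);
# a bare local path (empty scheme) or file:// URI is only visible on the
# submitting host, which is exactly what breaks in this report.
REMOTE_SCHEMES = {"s3a", "s3", "hdfs", "http", "https", "ftp"}

def fetchable_in_cluster_mode(app_jar: str) -> bool:
    """Return True if the application-jar URI is resolvable from any worker."""
    return urlparse(app_jar).scheme in REMOTE_SCHEMES

# What DolphinScheduler actually passed (path truncated here for brevity):
local_jar = "/tmp/dolphinscheduler/exec/process/default/.../spark-examples_2.12-3.5.4.jar"
# What a cluster-mode submission would need instead:
remote_jar = "s3a://dolphinscheduler/dolphinscheduler/default/resources/spark-examples_2.12-3.5.4.jar"

print(fetchable_in_cluster_mode(local_jar))   # local path: not fetchable by a remote worker
print(fetchable_in_cluster_mode(remote_jar))  # s3a URI: fetchable, given s3a connector + credentials
```

This is why the same workflow succeeds when the driver happens to land on the DolphinScheduler node: the local path exists there by coincidence, not by design.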
### Version

3.2.x

### Are you willing to submit PR?

- [ ] Yes I am willing to submit a PR!

### Code of Conduct

- [x] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)
