This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
     new bfbd3df69956 [MINOR][PYTHON] Better error message when Python worker crashes
bfbd3df69956 is described below

commit bfbd3df699560b901c71ffef5912ace97106bca3
Author: Hyukjin Kwon <[email protected]>
AuthorDate: Mon Nov 13 17:40:22 2023 +0900

    [MINOR][PYTHON] Better error message when Python worker crashes
    
    ### What changes were proposed in this pull request?
    
    This PR improves the Python UDF error messages to be more actionable.
    
    ### Why are the changes needed?
    
    Suppose you face a segfault error:
    
    ```python
    from pyspark.sql.functions import udf
    import ctypes
    spark.range(1).select(udf(lambda x: ctypes.string_at(0))("id")).collect()
    ```
    
    The current error message is not actionable:
    
    ```
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      ...
    get_return_value
        raise Py4JJavaError(
    py4j.protocol.Py4JJavaError: An error occurred while calling o82.collectToPython.
    : org.apache.spark.SparkException: Job aborted due to stage failure: Task 15 in stage 1.0 failed 1 times, most recent failure: Lost task 15.0 in stage 1.0 (TID 31) (192.168.123.102 executor driver): org.apache.spark.SparkException:
    Python worker exited unexpectedly (crashed)
    ```
    
    After this PR, the error message reads as below:
    
    ```
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      ...
    get_return_value
        raise Py4JJavaError(
    py4j.protocol.Py4JJavaError: An error occurred while calling o59.collectToPython.
    : org.apache.spark.SparkException: Job aborted due to stage failure: Task 15 in stage 0.0 failed 1 times, most recent failure: Lost task 15.0 in stage 0.0 (TID 15) (192.168.123.102 executor driver): org.apache.spark.SparkException:
    Python worker exited unexpectedly (crashed). Consider setting 'spark.sql.execution.pyspark.udf.faulthandler.enabled' or 'spark.python.worker.faulthandler.enabled' configuration to 'true' for the better Python traceback.
    ```
    
    So you can try this out
    
    ```python
    from pyspark.sql.functions import udf
    import ctypes
    spark.conf.set("spark.sql.execution.pyspark.udf.faulthandler.enabled", "true")
    spark.range(1).select(udf(lambda x: ctypes.string_at(0))("id")).collect()
    ```
    
    that now shows where the segfault happens:
    
    ```
    Caused by: org.apache.spark.SparkException: Python worker exited unexpectedly (crashed): Fatal Python error: Segmentation fault
    
    Current thread 0x00007ff84ae4b700 (most recent call first):
      File "/.../envs/python3.9/lib/python3.9/ctypes/__init__.py", line 525 in string_at
      File "<stdin>", line 1 in <lambda>
      File "/.../lib/pyspark.zip/pyspark/util.py", line 88 in wrapper
      File "/.../lib/pyspark.zip/pyspark/worker.py", line 99 in <lambda>
      File "/.../lib/pyspark.zip/pyspark/worker.py", line 1403 in <genexpr>
      File "/.../lib/pyspark.zip/pyspark/worker.py", line 1403 in mapper
    ```
    
    ### Does this PR introduce _any_ user-facing change?
    
    Yes, it makes the error message more actionable.
    
    ### How was this patch tested?
    
    Manually tested as above.
    
    ### Was this patch authored or co-authored using generative AI tooling?
    
    No.
    
    Closes #43778 from HyukjinKwon/minor-error-improvement.
    
    Authored-by: Hyukjin Kwon <[email protected]>
    Signed-off-by: Hyukjin Kwon <[email protected]>
---
 .../main/scala/org/apache/spark/api/python/PythonRunner.scala    | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/core/src/main/scala/org/apache/spark/api/python/PythonRunner.scala b/core/src/main/scala/org/apache/spark/api/python/PythonRunner.scala
index 1a01ad1bc219..d6363182606d 100644
--- a/core/src/main/scala/org/apache/spark/api/python/PythonRunner.scala
+++ b/core/src/main/scala/org/apache/spark/api/python/PythonRunner.scala
@@ -31,7 +31,7 @@ import scala.util.control.NonFatal
 
 import org.apache.spark._
 import org.apache.spark.internal.Logging
-import org.apache.spark.internal.config.{BUFFER_SIZE, EXECUTOR_CORES}
+import org.apache.spark.internal.config.{BUFFER_SIZE, EXECUTOR_CORES, Python}
 import org.apache.spark.internal.config.Python._
 import org.apache.spark.rdd.InputFileBlockHolder
 import org.apache.spark.resource.ResourceProfile.{EXECUTOR_CORES_LOCAL_PROPERTY, PYSPARK_MEMORY_LOCAL_PROPERTY}
@@ -549,6 +549,13 @@ private[spark] abstract class BasePythonRunner[IN, OUT](
         JavaFiles.deleteIfExists(path)
         throw new SparkException(s"Python worker exited unexpectedly (crashed): $error", e)
 
+      case eof: EOFException if !faultHandlerEnabled =>
+        throw new SparkException(
+          s"Python worker exited unexpectedly (crashed). " +
+            "Consider setting 'spark.sql.execution.pyspark.udf.faulthandler.enabled' or " +
+            s"'${Python.PYTHON_WORKER_FAULTHANLDER_ENABLED.key}' configuration to 'true' for " +
+            "the better Python traceback.", eof)
+
       case eof: EOFException =>
         throw new SparkException("Python worker exited unexpectedly (crashed)", eof)
     }
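
For reference, the Python-side mechanism behind both configurations is the standard-library `faulthandler` module: when enabled in the worker, a fatal signal dumps a Python-level traceback to stderr, which Spark then forwards into the exception message. A minimal standalone sketch of that mechanism (no Spark involved; running the crash in a child process is just a way to keep the demo itself alive):

```python
import subprocess
import sys

# Reproduces the segfault from the PR description, with faulthandler enabled
# the way the Spark worker does when the configuration is set to 'true'.
crashing_script = """
import ctypes
import faulthandler

faulthandler.enable()   # dump a Python traceback on fatal signals
ctypes.string_at(0)     # NULL dereference -> segmentation fault
"""

# Run the crash in a child process so this script survives to inspect stderr.
proc = subprocess.run(
    [sys.executable, "-c", crashing_script],
    capture_output=True,
    text=True,
)
print(proc.stderr)
```

Without `faulthandler.enable()`, the child simply dies with a non-zero exit status; with it, stderr carries the `Fatal Python error: Segmentation fault` banner and the Python traceback, analogous to the output shown above.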

