Repository: spark
Updated Branches:
  refs/heads/master ac06a85da -> 5603e4c47


[SPARK-2242] HOTFIX: pyspark shell hangs on simple job

This reverts a change introduced in 3870248740d83b0292ccca88a494ce19783847f0, 
which redirected all stderr to the OS pipe instead of directly to the 
`bin/pyspark` shell output. This causes a simple job to hang in two ways:

1. If the cluster is not configured correctly or does not have enough 
resources, the job hangs without producing any output, because the relevant 
warning messages are masked.
2. If the stderr volume is large, this could lead to a deadlock if we redirect 
everything to the OS pipe. From the [python 
docs](https://docs.python.org/2/library/subprocess.html):

```
Note Do not use stdout=PIPE or stderr=PIPE with this function as that can 
deadlock
based on the child process output volume. Use Popen with the communicate() 
method
when you need pipes.
```

Note that we cannot remove `stdout=PIPE` in a similar way, because we currently 
use it to communicate the py4j port. However, it should be fine (as it has been 
for a long time) because we do not produce a ton of traffic through `stdout`.

That commit was not merged in branch-1.0, so this fix is for master only.

Author: Andrew Or <andrewo...@gmail.com>

Closes #1178 from andrewor14/fix-python and squashes the following commits:

e68e870 [Andrew Or] Merge branch 'master' of github.com:apache/spark into 
fix-python
20849a8 [Andrew Or] Tone down stdout interference message
a09805b [Andrew Or] Return more than 1 line of error message to user
6dfbd1e [Andrew Or] Don't swallow original exception
0d1861f [Andrew Or] Provide more helpful output if stdout is garbled
21c9d7c [Andrew Or] Do not mask stderr from output


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/5603e4c4
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/5603e4c4
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/5603e4c4

Branch: refs/heads/master
Commit: 5603e4c47f1dc1b87336f57ed4d6bd9e88f5abcc
Parents: ac06a85
Author: Andrew Or <andrewo...@gmail.com>
Authored: Wed Jun 25 10:47:22 2014 -0700
Committer: Xiangrui Meng <m...@databricks.com>
Committed: Wed Jun 25 10:47:22 2014 -0700

----------------------------------------------------------------------
 python/pyspark/java_gateway.py | 21 +++++++++++++--------
 1 file changed, 13 insertions(+), 8 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/spark/blob/5603e4c4/python/pyspark/java_gateway.py
----------------------------------------------------------------------
diff --git a/python/pyspark/java_gateway.py b/python/pyspark/java_gateway.py
index 19235d5..0dbead4 100644
--- a/python/pyspark/java_gateway.py
+++ b/python/pyspark/java_gateway.py
@@ -43,18 +43,23 @@ def launch_gateway():
             # Don't send ctrl-c / SIGINT to the Java gateway:
             def preexec_func():
                 signal.signal(signal.SIGINT, signal.SIG_IGN)
-            proc = Popen(command, stdout=PIPE, stdin=PIPE, stderr=PIPE, 
preexec_fn=preexec_func)
+            proc = Popen(command, stdout=PIPE, stdin=PIPE, 
preexec_fn=preexec_func)
         else:
             # preexec_fn not supported on Windows
-            proc = Popen(command, stdout=PIPE, stdin=PIPE, stderr=PIPE)
-        
+            proc = Popen(command, stdout=PIPE, stdin=PIPE)
+
         try:
             # Determine which ephemeral port the server started on:
-            gateway_port = int(proc.stdout.readline())
-        except:
-            error_code = proc.poll()
-            raise Exception("Launching GatewayServer failed with exit code %d: 
%s" %
-                (error_code, "".join(proc.stderr.readlines())))
+            gateway_port = proc.stdout.readline()
+            gateway_port = int(gateway_port)
+        except ValueError:
+            (stdout, _) = proc.communicate()
+            exit_code = proc.poll()
+            error_msg = "Launching GatewayServer failed"
+            error_msg += " with exit code %d!" % exit_code if exit_code else 
"! "
+            error_msg += "(Warning: unexpected output detected.)\n\n"
+            error_msg += gateway_port + stdout
+            raise Exception(error_msg)
 
         # Create a thread to echo output from the GatewayServer, which is 
required
         # for Java log output to show up:

Reply via email to