Repository: spark
Updated Branches:
  refs/heads/master 6e27cb630 -> 5b4a5b1ac


[SPARK-3706][PySpark] Cannot run IPython REPL with IPYTHON set to "1" and PYSPARK_PYTHON unset

### Problem

The section "Using the shell" in the Spark Programming Guide 
(https://spark.apache.org/docs/latest/programming-guide.html#using-the-shell) 
says that we can run the pyspark REPL through IPython.
But the following command launches the default Python executable rather than IPython:

```
$ IPYTHON=1 ./bin/pyspark
Python 2.7.8 (default, Jul  2 2014, 10:14:46)
...
```

The spark/bin/pyspark script, as of commit 
b235e013638685758885842dc3268e9800af3678, decides which executable and options 
to use in the following way:

1. if PYSPARK_PYTHON is unset
   * → default to "python"
2. if IPYTHON_OPTS is set
   * → set IPYTHON to "1"
3. if a Python script is passed to ./bin/pyspark → run it with ./bin/spark-submit
   * out of the scope of this issue
4. if IPYTHON is set to "1"
   * → execute $PYSPARK_PYTHON (default: ipython) with arguments $IPYTHON_OPTS
   * otherwise execute $PYSPARK_PYTHON
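
The old control flow above can be sketched as a small shell function (a hypothetical paraphrase for illustration, not the literal bin/pyspark code; `resolve_old` is an invented name):

```shell
#!/bin/sh
# Paraphrase of the OLD resolution order in bin/pyspark.
resolve_old() {
  PYSPARK_PYTHON="$1"; IPYTHON_OPTS="$2"; IPYTHON="$3"
  # step 1: PYSPARK_PYTHON defaults to "python" BEFORE IPYTHON is consulted
  [ -z "$PYSPARK_PYTHON" ] && PYSPARK_PYTHON="python"
  # step 2: any IPYTHON_OPTS implies IPYTHON=1
  [ -n "$IPYTHON_OPTS" ] && IPYTHON=1
  # step 4: the "ipython" fallback never fires, because PYSPARK_PYTHON is
  # already set to "python" by step 1
  if [ "$IPYTHON" = "1" ]; then
    echo "${PYSPARK_PYTHON:-ipython}${IPYTHON_OPTS:+ $IPYTHON_OPTS}"
  else
    echo "$PYSPARK_PYTHON"
  fi
}

resolve_old "" "" "1"   # prints "python", not the expected "ipython"
```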

Therefore, when PYSPARK_PYTHON is unset, python is executed even though IPYTHON is 
"1".
In other words, when PYSPARK_PYTHON is unset, IPYTHON_OPTS and IPYTHON have no 
effect on which command is used.

PYSPARK_PYTHON | IPYTHON_OPTS | IPYTHON | resulting command | expected command
---- | ---- | ---- | ---- | ----
(unset → defaults to python) | (unset) | (unset) | python | (same)
(unset → defaults to python) | (unset) | 1 | python | ipython
(unset → defaults to python) | an_option | (unset → set to 1) | python an_option | ipython an_option
(unset → defaults to python) | an_option | 1 | python an_option | ipython an_option
ipython | (unset) | (unset) | ipython | (same)
ipython | (unset) | 1 | ipython | (same)
ipython | an_option | (unset → set to 1) | ipython an_option | (same)
ipython | an_option | 1 | ipython an_option | (same)

### Suggestion

The pyspark script should first determine whether the user wants to run IPython 
or another executable.

1. if IPYTHON_OPTS is set
   * set IPYTHON to "1"
2. if IPYTHON has the value "1"
   * PYSPARK_PYTHON defaults to "ipython" if not set
3. PYSPARK_PYTHON defaults to "python" if not set
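
The proposed resolution order can be sketched the same way (again a hypothetical paraphrase; `resolve_new` is an invented name):

```shell
#!/bin/sh
# Paraphrase of the PROPOSED resolution order.
resolve_new() {
  PYSPARK_PYTHON="$1"; IPYTHON_OPTS="$2"; IPYTHON="$3"
  # step 1: any IPYTHON_OPTS implies IPYTHON=1
  [ -n "$IPYTHON_OPTS" ] && IPYTHON=1
  # steps 2-3: decide whether IPython was requested BEFORE falling back
  # to plain "python"
  if [ -z "$PYSPARK_PYTHON" ]; then
    if [ "$IPYTHON" = "1" ]; then
      PYSPARK_PYTHON="ipython"
    else
      PYSPARK_PYTHON="python"
    fi
  fi
  echo "$PYSPARK_PYTHON${IPYTHON_OPTS:+ $IPYTHON_OPTS}"
}

resolve_new "" "" "1"   # prints "ipython"
```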

See the pull request for more detailed modification.

Author: cocoatomo <[email protected]>

Closes #2554 from cocoatomo/issues/cannot-run-ipython-without-options and 
squashes the following commits:

d2a9b06 [cocoatomo] [SPARK-3706][PySpark] Use PYTHONUNBUFFERED environment variable instead of -u option
264114c [cocoatomo] [SPARK-3706][PySpark] Remove the sentence about deprecated environment variables
42e02d5 [cocoatomo] [SPARK-3706][PySpark] Replace environment variables used to customize execution of PySpark REPL
10d56fb [cocoatomo] [SPARK-3706][PySpark] Cannot run IPython REPL with IPYTHON set to "1" and PYSPARK_PYTHON unset


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/5b4a5b1a
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/5b4a5b1a
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/5b4a5b1a

Branch: refs/heads/master
Commit: 5b4a5b1acdc439a58aa2a3561ac0e3fb09f529d6
Parents: 6e27cb6
Author: cocoatomo <[email protected]>
Authored: Thu Oct 2 11:13:19 2014 -0700
Committer: Josh Rosen <[email protected]>
Committed: Thu Oct 2 11:13:19 2014 -0700

----------------------------------------------------------------------
 bin/pyspark                                     | 24 ++++++++++----------
 .../org/apache/spark/deploy/PythonRunner.scala  |  3 ++-
 docs/programming-guide.md                       |  8 +++----
 3 files changed, 18 insertions(+), 17 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/spark/blob/5b4a5b1a/bin/pyspark
----------------------------------------------------------------------
diff --git a/bin/pyspark b/bin/pyspark
index 5142411..6655725 100755
--- a/bin/pyspark
+++ b/bin/pyspark
@@ -52,10 +52,20 @@ fi
 
 # Figure out which Python executable to use
 if [[ -z "$PYSPARK_PYTHON" ]]; then
-  PYSPARK_PYTHON="python"
+  if [[ "$IPYTHON" = "1" || -n "$IPYTHON_OPTS" ]]; then
+    # for backward compatibility
+    PYSPARK_PYTHON="ipython"
+  else
+    PYSPARK_PYTHON="python"
+  fi
 fi
 export PYSPARK_PYTHON
 
+if [[ -z "$PYSPARK_PYTHON_OPTS" && -n "$IPYTHON_OPTS" ]]; then
+  # for backward compatibility
+  PYSPARK_PYTHON_OPTS="$IPYTHON_OPTS"
+fi
+
 # Add the PySpark classes to the Python path:
 export PYTHONPATH="$SPARK_HOME/python/:$PYTHONPATH"
 export PYTHONPATH="$SPARK_HOME/python/lib/py4j-0.8.2.1-src.zip:$PYTHONPATH"
@@ -64,11 +74,6 @@ export PYTHONPATH="$SPARK_HOME/python/lib/py4j-0.8.2.1-src.zip:$PYTHONPATH"
 export OLD_PYTHONSTARTUP="$PYTHONSTARTUP"
 export PYTHONSTARTUP="$FWDIR/python/pyspark/shell.py"
 
-# If IPython options are specified, assume user wants to run IPython
-if [[ -n "$IPYTHON_OPTS" ]]; then
-  IPYTHON=1
-fi
-
 # Build up arguments list manually to preserve quotes and backslashes.
 # We export Spark submit arguments as an environment variable because shell.py must run as a
 # PYTHONSTARTUP script, which does not take in arguments. This is required for IPython notebooks.
@@ -106,10 +111,5 @@ if [[ "$1" =~ \.py$ ]]; then
 else
   # PySpark shell requires special handling downstream
   export PYSPARK_SHELL=1
-  # Only use ipython if no command line arguments were provided [SPARK-1134]
-  if [[ "$IPYTHON" = "1" ]]; then
-    exec ${PYSPARK_PYTHON:-ipython} $IPYTHON_OPTS
-  else
-    exec "$PYSPARK_PYTHON"
-  fi
+  exec "$PYSPARK_PYTHON" $PYSPARK_PYTHON_OPTS
 fi

http://git-wip-us.apache.org/repos/asf/spark/blob/5b4a5b1a/core/src/main/scala/org/apache/spark/deploy/PythonRunner.scala
----------------------------------------------------------------------
diff --git a/core/src/main/scala/org/apache/spark/deploy/PythonRunner.scala b/core/src/main/scala/org/apache/spark/deploy/PythonRunner.scala
index b66c3ba..79b4d7e 100644
--- a/core/src/main/scala/org/apache/spark/deploy/PythonRunner.scala
+++ b/core/src/main/scala/org/apache/spark/deploy/PythonRunner.scala
@@ -54,9 +54,10 @@ object PythonRunner {
     val pythonPath = PythonUtils.mergePythonPaths(pathElements: _*)
 
     // Launch Python process
-    val builder = new ProcessBuilder(Seq(pythonExec, "-u", formattedPythonFile) ++ otherArgs)
+    val builder = new ProcessBuilder(Seq(pythonExec, formattedPythonFile) ++ otherArgs)
     val env = builder.environment()
     env.put("PYTHONPATH", pythonPath)
+    env.put("PYTHONUNBUFFERED", "YES") // value is needed to be set to a non-empty string
     env.put("PYSPARK_GATEWAY_PORT", "" + gatewayServer.getListeningPort)
     builder.redirectErrorStream(true) // Ugly but needed for stdout and stderr to synchronize
     val process = builder.start()

http://git-wip-us.apache.org/repos/asf/spark/blob/5b4a5b1a/docs/programming-guide.md
----------------------------------------------------------------------
diff --git a/docs/programming-guide.md b/docs/programming-guide.md
index 1d61a3c..8e8cc1d 100644
--- a/docs/programming-guide.md
+++ b/docs/programming-guide.md
@@ -211,17 +211,17 @@ For a complete list of options, run `pyspark --help`. Behind the scenes,
 
 It is also possible to launch the PySpark shell in [IPython](http://ipython.org), the
 enhanced Python interpreter. PySpark works with IPython 1.0.0 and later. To
-use IPython, set the `IPYTHON` variable to `1` when running `bin/pyspark`:
+use IPython, set the `PYSPARK_PYTHON` variable to `ipython` when running `bin/pyspark`:
 
 {% highlight bash %}
-$ IPYTHON=1 ./bin/pyspark
+$ PYSPARK_PYTHON=ipython ./bin/pyspark
 {% endhighlight %}
 
-You can customize the `ipython` command by setting `IPYTHON_OPTS`. For example, to launch
+You can customize the `ipython` command by setting `PYSPARK_PYTHON_OPTS`. For example, to launch
 the [IPython Notebook](http://ipython.org/notebook.html) with PyLab plot support:
 
 {% highlight bash %}
-$ IPYTHON_OPTS="notebook --pylab inline" ./bin/pyspark
+$ PYSPARK_PYTHON=ipython PYSPARK_PYTHON_OPTS="notebook --pylab inline" ./bin/pyspark
 {% endhighlight %}
 
 </div>

