Repository: zeppelin Updated Branches: refs/heads/master df320481b -> bab985461
ZEPPELIN-2195. Use PYSPARK_PYTHON and PYSPARK_DRIVER_PYTHON over zeppelin.pyspark.python ### What is this PR for? `zeppelin.pyspark.python` is the Zeppelin configuration for the python executable on the driver side only; it does not affect the executor side. It is better to use `PYSPARK_PYTHON` and `PYSPARK_DRIVER_PYTHON`, which are what Spark uses officially. That way users can define their own python executable in the interpreter setting for different versions of python, rather than defining it in `zeppelin-env.sh`, which is shared globally. ### What type of PR is it? [ Improvement ] ### Todos * [ ] - Task ### What is the Jira issue? * https://issues.apache.org/jira/browse/ZEPPELIN-2195 ### How should this be tested? Tested manually. ### Questions: * Do the license files need an update? No * Are there breaking changes for older versions? No * Does this need documentation? No Author: Jeff Zhang <zjf...@apache.org> Closes #2079 from zjffdu/ZEPPELIN-2195 and squashes the following commits: fa71cb2 [Jeff Zhang] address comments fd89a1e [Jeff Zhang] ZEPPELIN-2195. 
Use PYSPARK_PYTHON and PYSPARK_DRIVER_PYTHON over zeppelin.pyspark.python Project: http://git-wip-us.apache.org/repos/asf/zeppelin/repo Commit: http://git-wip-us.apache.org/repos/asf/zeppelin/commit/bab98546 Tree: http://git-wip-us.apache.org/repos/asf/zeppelin/tree/bab98546 Diff: http://git-wip-us.apache.org/repos/asf/zeppelin/diff/bab98546 Branch: refs/heads/master Commit: bab985461c507f276e648f335997acc340416f56 Parents: df32048 Author: Jeff Zhang <zjf...@apache.org> Authored: Tue Feb 28 17:42:51 2017 +0800 Committer: Felix Cheung <felixche...@apache.org> Committed: Thu Mar 2 22:51:06 2017 -0800 ---------------------------------------------------------------------- docs/install/upgrade.md | 5 +++++ docs/interpreter/spark.md | 11 +++++++++-- .../org/apache/zeppelin/spark/PySparkInterpreter.java | 11 ++++++++++- 3 files changed, 24 insertions(+), 3 deletions(-) ---------------------------------------------------------------------- http://git-wip-us.apache.org/repos/asf/zeppelin/blob/bab98546/docs/install/upgrade.md ---------------------------------------------------------------------- diff --git a/docs/install/upgrade.md b/docs/install/upgrade.md index f310196..44e38a3 100644 --- a/docs/install/upgrade.md +++ b/docs/install/upgrade.md @@ -56,3 +56,8 @@ So, copying `notebook` and `conf` directory should be enough. - From 0.7, we uses `pegdown` as the `markdown.parser.type` option for the `%md` interpreter. Rendered markdown might be different from what you expected - From 0.7 note.json format has been changed to support multiple outputs in a paragraph. Zeppelin will automatically convert old format to new format. 0.6 or lower version can read new note.json format but output will not be displayed. For the detail, see [ZEPPELIN-212](http://issues.apache.org/jira/browse/ZEPPELIN-212) and [pull request](https://github.com/apache/zeppelin/pull/1658). 
- From 0.7 note storage layer will utilize `GitNotebookRepo` by default instead of `VFSNotebookRepo` storage layer, which is an extension of latter one with versioning capabilities on top of it. + +### Upgrading from Zeppelin 0.7 to 0.8 + + - From 0.8, we recommend using `PYSPARK_PYTHON` and `PYSPARK_DRIVER_PYTHON` instead of `zeppelin.pyspark.python`, because `zeppelin.pyspark.python` only affects the driver. You can use `PYSPARK_PYTHON` and `PYSPARK_DRIVER_PYTHON` just as you would in Spark. + \ No newline at end of file http://git-wip-us.apache.org/repos/asf/zeppelin/blob/bab98546/docs/interpreter/spark.md ---------------------------------------------------------------------- diff --git a/docs/interpreter/spark.md b/docs/interpreter/spark.md index 6d92561..a19eda2 100644 --- a/docs/interpreter/spark.md +++ b/docs/interpreter/spark.md @@ -104,9 +104,16 @@ You can also set other Spark properties which are not listed in the table. For a <td>Local repository for dependency loader</td> </tr> <tr> - <td>zeppelin.pyspark.python</td> + <td>PYSPARK_PYTHON</td> <td>python</td> - <td>Python command to run pyspark with</td> + <td>Python binary executable to use for PySpark in both driver and workers (default is <code>python</code>). + Property <code>spark.pyspark.python</code> takes precedence if it is set</td> + </tr> + <tr> + <td>PYSPARK_DRIVER_PYTHON</td> + <td>python</td> + <td>Python binary executable to use for PySpark in driver only (default is <code>PYSPARK_PYTHON</code>). 
+ Property <code>spark.pyspark.driver.python</code> takes precedence if it is set</td> </tr> <tr> <td>zeppelin.spark.concurrentSQL</td> http://git-wip-us.apache.org/repos/asf/zeppelin/blob/bab98546/spark/src/main/java/org/apache/zeppelin/spark/PySparkInterpreter.java ---------------------------------------------------------------------- diff --git a/spark/src/main/java/org/apache/zeppelin/spark/PySparkInterpreter.java b/spark/src/main/java/org/apache/zeppelin/spark/PySparkInterpreter.java index 0679fcc..b7dc67d 100644 --- a/spark/src/main/java/org/apache/zeppelin/spark/PySparkInterpreter.java +++ b/spark/src/main/java/org/apache/zeppelin/spark/PySparkInterpreter.java @@ -200,7 +200,16 @@ public class PySparkInterpreter extends Interpreter implements ExecuteResultHand gatewayServer.start(); // Run python shell - CommandLine cmd = CommandLine.parse(getProperty("zeppelin.pyspark.python")); + // Choose python in the order of + // PYSPARK_DRIVER_PYTHON > PYSPARK_PYTHON > zeppelin.pyspark.python + String pythonExec = getProperty("zeppelin.pyspark.python"); + if (System.getenv("PYSPARK_PYTHON") != null) { + pythonExec = System.getenv("PYSPARK_PYTHON"); + } + if (System.getenv("PYSPARK_DRIVER_PYTHON") != null) { + pythonExec = System.getenv("PYSPARK_DRIVER_PYTHON"); + } + CommandLine cmd = CommandLine.parse(pythonExec); cmd.addArgument(scriptPath, false); cmd.addArgument(Integer.toString(port), false); cmd.addArgument(Integer.toString(getSparkInterpreter().getSparkVersion().toNumber()), false);
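The selection rule introduced by this patch (PYSPARK_DRIVER_PYTHON > PYSPARK_PYTHON > zeppelin.pyspark.python) can be sketched in isolation as follows. This is a minimal standalone sketch, not the actual interpreter code: `PythonExecResolver` and `resolvePythonExec` are hypothetical names, and the environment is passed in as a `Map` so the rule is testable, whereas the real `PySparkInterpreter` reads `System.getenv()` directly.

```java
import java.util.Map;

/**
 * Standalone sketch of the python-executable precedence rule from this
 * commit: PYSPARK_DRIVER_PYTHON > PYSPARK_PYTHON > zeppelin.pyspark.python.
 * Hypothetical helper for illustration; the real code lives inline in
 * PySparkInterpreter and consults System.getenv() directly.
 */
public class PythonExecResolver {
  public static String resolvePythonExec(String zeppelinPysparkPython,
                                         Map<String, String> env) {
    // Start with the interpreter property, the lowest-priority source.
    String pythonExec = zeppelinPysparkPython;
    // PYSPARK_PYTHON overrides the property (applies to driver and workers).
    if (env.get("PYSPARK_PYTHON") != null) {
      pythonExec = env.get("PYSPARK_PYTHON");
    }
    // PYSPARK_DRIVER_PYTHON wins on the driver side, matching Spark's own rule.
    if (env.get("PYSPARK_DRIVER_PYTHON") != null) {
      pythonExec = env.get("PYSPARK_DRIVER_PYTHON");
    }
    return pythonExec;
  }
}
```

Checking each source in ascending priority and overwriting keeps the code free of nested conditionals while preserving the documented ordering.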