Repository: zeppelin Updated Branches: refs/heads/master df320481b -> bab985461
ZEPPELIN-2195. Use PYSPARK_PYTHON and PYSPARK_DRIVER_PYTHON over zeppelin.pyspark.python ### What is this PR for? `zeppelin.pyspark.python` is the Zeppelin configuration for the python executable on the driver side only; it does not affect the executor side. It is better to use `PYSPARK_PYTHON` and `PYSPARK_DRIVER_PYTHON`, which are what Spark uses officially. That way users can define their own python executable in the interpreter setting for different versions of python, rather than defining it in `zeppelin-env.sh`, which is shared globally. ### What type of PR is it? [ Improvement ] ### Todos * [ ] - Task ### What is the Jira issue? * https://issues.apache.org/jira/browse/ZEPPELIN-2195 ### How should this be tested? Tested manually. ### Questions: * Do the license files need an update? No * Are there breaking changes for older versions? No * Does this need documentation? No Author: Jeff Zhang <zjf...@apache.org> Closes #2079 from zjffdu/ZEPPELIN-2195 and squashes the following commits: fa71cb2 [Jeff Zhang] address comments fd89a1e [Jeff Zhang] ZEPPELIN-2195. 
Use PYSPARK_PYTHON and PYSPARK_DRIVER_PYTHON over zeppelin.pyspark.python Project: http://git-wip-us.apache.org/repos/asf/zeppelin/repo Commit: http://git-wip-us.apache.org/repos/asf/zeppelin/commit/bab98546 Tree: http://git-wip-us.apache.org/repos/asf/zeppelin/tree/bab98546 Diff: http://git-wip-us.apache.org/repos/asf/zeppelin/diff/bab98546 Branch: refs/heads/master Commit: bab985461c507f276e648f335997acc340416f56 Parents: df32048 Author: Jeff Zhang <zjf...@apache.org> Authored: Tue Feb 28 17:42:51 2017 +0800 Committer: Felix Cheung <felixche...@apache.org> Committed: Thu Mar 2 22:51:06 2017 -0800 ---------------------------------------------------------------------- docs/install/upgrade.md | 5 +++++ docs/interpreter/spark.md | 11 +++++++++-- .../org/apache/zeppelin/spark/PySparkInterpreter.java | 11 ++++++++++- 3 files changed, 24 insertions(+), 3 deletions(-) ---------------------------------------------------------------------- http://git-wip-us.apache.org/repos/asf/zeppelin/blob/bab98546/docs/install/upgrade.md ---------------------------------------------------------------------- diff --git a/docs/install/upgrade.md b/docs/install/upgrade.md index f310196..44e38a3 100644 --- a/docs/install/upgrade.md +++ b/docs/install/upgrade.md @@ -56,3 +56,8 @@ So, copying `notebook` and `conf` directory should be enough. - From 0.7, we uses `pegdown` as the `markdown.parser.type` option for the `%md` interpreter. Rendered markdown might be different from what you expected - From 0.7 note.json format has been changed to support multiple outputs in a paragraph. Zeppelin will automatically convert old format to new format. 0.6 or lower version can read new note.json format but output will not be displayed. For the detail, see [ZEPPELIN-212](http://issues.apache.org/jira/browse/ZEPPELIN-212) and [pull request](https://github.com/apache/zeppelin/pull/1658). 
- From 0.7 note storage layer will utilize `GitNotebookRepo` by default instead of `VFSNotebookRepo` storage layer, which is an extension of latter one with versioning capabilities on top of it. + +### Upgrading from Zeppelin 0.7 to 0.8 + + - From 0.8, we recommend using `PYSPARK_PYTHON` and `PYSPARK_DRIVER_PYTHON` instead of `zeppelin.pyspark.python`, because `zeppelin.pyspark.python` only affects the driver. You can use `PYSPARK_PYTHON` and `PYSPARK_DRIVER_PYTHON` just as you would in Spark. + \ No newline at end of file http://git-wip-us.apache.org/repos/asf/zeppelin/blob/bab98546/docs/interpreter/spark.md ---------------------------------------------------------------------- diff --git a/docs/interpreter/spark.md b/docs/interpreter/spark.md index 6d92561..a19eda2 100644 --- a/docs/interpreter/spark.md +++ b/docs/interpreter/spark.md @@ -104,9 +104,16 @@ You can also set other Spark properties which are not listed in the table. For a <td>Local repository for dependency loader</td> </tr> <tr> - <td>zeppelin.pyspark.python</td> + <td>PYSPARK_PYTHON</td> <td>python</td> - <td>Python command to run pyspark with</td> + <td>Python binary executable to use for PySpark in both driver and workers (default is <code>python</code>). + Property <code>spark.pyspark.python</code> takes precedence if it is set</td> + </tr> + <tr> + <td>PYSPARK_DRIVER_PYTHON</td> + <td>python</td> + <td>Python binary executable to use for PySpark in driver only (default is <code>PYSPARK_PYTHON</code>). 
+ Property <code>spark.pyspark.driver.python</code> takes precedence if it is set</td> </tr> <tr> <td>zeppelin.spark.concurrentSQL</td> http://git-wip-us.apache.org/repos/asf/zeppelin/blob/bab98546/spark/src/main/java/org/apache/zeppelin/spark/PySparkInterpreter.java ---------------------------------------------------------------------- diff --git a/spark/src/main/java/org/apache/zeppelin/spark/PySparkInterpreter.java b/spark/src/main/java/org/apache/zeppelin/spark/PySparkInterpreter.java index 0679fcc..b7dc67d 100644 --- a/spark/src/main/java/org/apache/zeppelin/spark/PySparkInterpreter.java +++ b/spark/src/main/java/org/apache/zeppelin/spark/PySparkInterpreter.java @@ -200,7 +200,16 @@ public class PySparkInterpreter extends Interpreter implements ExecuteResultHand gatewayServer.start(); // Run python shell - CommandLine cmd = CommandLine.parse(getProperty("zeppelin.pyspark.python")); + // Choose python in the order of + // PYSPARK_DRIVER_PYTHON > PYSPARK_PYTHON > zeppelin.pyspark.python + String pythonExec = getProperty("zeppelin.pyspark.python"); + if (System.getenv("PYSPARK_PYTHON") != null) { + pythonExec = System.getenv("PYSPARK_PYTHON"); + } + if (System.getenv("PYSPARK_DRIVER_PYTHON") != null) { + pythonExec = System.getenv("PYSPARK_DRIVER_PYTHON"); + } + CommandLine cmd = CommandLine.parse(pythonExec); cmd.addArgument(scriptPath, false); cmd.addArgument(Integer.toString(port), false); cmd.addArgument(Integer.toString(getSparkInterpreter().getSparkVersion().toNumber()), false);
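The selection rule introduced by this patch (PYSPARK_DRIVER_PYTHON > PYSPARK_PYTHON > zeppelin.pyspark.python) can be sketched in isolation as follows. This is a minimal standalone sketch, not the actual interpreter code: `PythonExecResolver` and `resolvePythonExec` are hypothetical names, and the environment is passed in as a `Map` so the rule is testable, whereas the real `PySparkInterpreter` reads `System.getenv()` directly.

```java
import java.util.Map;

/**
 * Standalone sketch of the python-executable precedence rule from this
 * commit: PYSPARK_DRIVER_PYTHON > PYSPARK_PYTHON > zeppelin.pyspark.python.
 * Hypothetical helper for illustration; the real code lives inline in
 * PySparkInterpreter and consults System.getenv() directly.
 */
public class PythonExecResolver {
  public static String resolvePythonExec(String zeppelinPysparkPython,
                                         Map<String, String> env) {
    // Start with the interpreter property, the lowest-priority source.
    String pythonExec = zeppelinPysparkPython;
    // PYSPARK_PYTHON overrides the property (applies to driver and workers).
    if (env.get("PYSPARK_PYTHON") != null) {
      pythonExec = env.get("PYSPARK_PYTHON");
    }
    // PYSPARK_DRIVER_PYTHON wins on the driver side, matching Spark's own rule.
    if (env.get("PYSPARK_DRIVER_PYTHON") != null) {
      pythonExec = env.get("PYSPARK_DRIVER_PYTHON");
    }
    return pythonExec;
  }
}
```

Checking each source in ascending priority and overwriting keeps the code free of nested conditionals while preserving the documented ordering.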