This is an automated email from the ASF dual-hosted git repository.

zjffdu pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/zeppelin.git

The following commit(s) were added to refs/heads/master by this push:
     new 1a6bce6  ZEPPELIN-4437. Update python document
1a6bce6 is described below

commit 1a6bce627abfcd3d6ce0100665f2b444ae2d1fcc
Author: Jeff Zhang <zjf...@apache.org>
AuthorDate: Fri Nov 8 17:22:43 2019 +0800

    ZEPPELIN-4437. Update python document

    ### What is this PR for?
    This PR is to polish the python interpreter document.

    ### What type of PR is it?
    [Documentation]

    ### Todos
    * [ ] - Task

    ### What is the Jira issue?
    * https://issues.apache.org/jira/browse/ZEPPELIN-4437

    ### How should this be tested?
    * CI pass

    ### Screenshots (if appropriate)

    ### Questions:
    * Does the licenses files need update? No
    * Is there breaking changes for older versions? No
    * Does this needs documentation? No

    Author: Jeff Zhang <zjf...@apache.org>

    Closes #3538 from zjffdu/ZEPPELIN-4437 and squashes the following commits:

    48163d089 [Jeff Zhang] ZEPPELIN-4437. Update python document
---
 .../img/docs-img/ipython_code_completion.png       | Bin 0 -> 56915 bytes
 .../themes/zeppelin/img/docs-img/ipython_error.png | Bin 0 -> 57506 bytes
 .../zeppelin/img/docs-img/ipython_hvplot.png       | Bin 0 -> 293938 bytes
 docs/interpreter/python.md                         | 462 +++++++++++++--------
 .../src/main/resources/python/zeppelin_context.py  |  12 +-
 5 files changed, 296 insertions(+), 178 deletions(-)

diff --git a/docs/assets/themes/zeppelin/img/docs-img/ipython_code_completion.png b/docs/assets/themes/zeppelin/img/docs-img/ipython_code_completion.png
new file mode 100644
index 0000000..75a642f
Binary files /dev/null and b/docs/assets/themes/zeppelin/img/docs-img/ipython_code_completion.png differ
diff --git a/docs/assets/themes/zeppelin/img/docs-img/ipython_error.png b/docs/assets/themes/zeppelin/img/docs-img/ipython_error.png
new file mode 100644
index 0000000..8747969
Binary files /dev/null and b/docs/assets/themes/zeppelin/img/docs-img/ipython_error.png differ
diff --git a/docs/assets/themes/zeppelin/img/docs-img/ipython_hvplot.png
b/docs/assets/themes/zeppelin/img/docs-img/ipython_hvplot.png
new file mode 100644
index 0000000..b5b6dfe
Binary files /dev/null and b/docs/assets/themes/zeppelin/img/docs-img/ipython_hvplot.png differ
diff --git a/docs/interpreter/python.md b/docs/interpreter/python.md
index 82280ac..6bb7f29 100644
--- a/docs/interpreter/python.md
+++ b/docs/interpreter/python.md
@@ -23,6 +23,34 @@ limitations under the License.

 <div id="toc"></div>

+## Overview
+
+Zeppelin supports python language which is very popular in data analytics and machine learning.
+
+<table class="table-configuration">
+  <tr>
+    <th>Name</th>
+    <th>Class</th>
+    <th>Description</th>
+  </tr>
+  <tr>
+    <td>%python</td>
+    <td>PythonInterpreter</td>
+    <td>Vanilla python interpreter, with least dependencies, only python environment installed is required</td>
+  </tr>
+  <tr>
+    <td>%python.ipython</td>
+    <td>IPythonInterpreter</td>
+    <td>Provide more fancy python runtime via IPython, almost the same experience like Jupyter. It requires more things, but is the recommended interpreter for using python in Zeppelin, see below</td>
+  </tr>
+  <tr>
+    <td>%python.sql</td>
+    <td>PythonInterpreterPandasSql</td>
+    <td>Provide sql capability to query data in Pandas DataFrame via <code>pandasql</code></td>
+  </tr>
+</table>
+
+
 ## Configuration
 <table class="table-configuration">
   <tr>
@@ -33,8 +61,8 @@ limitations under the License.
   <tr>
     <td>zeppelin.python</td>
     <td>python</td>
-    <td>Path of the already installed Python binary (could be python2 or python3).
-    If python is not in your $PATH you can set the absolute directory (example : /usr/bin/python)
+    <td>Path of the installed Python binary (could be python2 or python3).
+    You should set this property explicitly if python is not in your <code>$PATH</code>(example: /usr/bin/python).
     </td>
   </tr>
   <tr>
@@ -42,139 +70,35 @@ limitations under the License.
     <td>1000</td>
     <td>Max number of dataframe rows to display.</td>
   </tr>
+  <tr>
+    <td>zeppelin.python.useIPython</td>
+    <td>true</td>
+    <td>When this property is true, <code>%python</code> would be delegated to <code>%python.ipython</code> if IPython is available, otherwise
+    IPython is only used in <code>%python.ipython</code>.
+    </td>
+  </tr>
 </table>
-## Enabling Python Interpreter
-
-In a notebook, to enable the **Python** interpreter, click on the **Gear** icon and select **Python**
-
-## Using the Python Interpreter
-
-In a paragraph, use **_%python_** to select the **Python** interpreter and then input all commands.
-
-The interpreter can only work if you already have python installed (the interpreter doesn't bring it own python binaries).
-
-To access the help, type **help()**
-
-## Python environments
-
-### Default
-By default, PythonInterpreter will use python command defined in `zeppelin.python` property to run python process.
-The interpreter can use all modules already installed (with pip, easy_install...)
-
-### Conda
-[Conda](http://conda.pydata.org/) is an package management system and environment management system for python.
-`%python.conda` interpreter lets you change between environments.
- #### Usage
-
- get the Conda Infomation:
-
-    ```
-    %python.conda info
-    ```
-
- list the Conda environments:
-
-    ```
-    %python.conda env list
-    ```
-
- create a conda enviornment:
-
-    ```
-    %python.conda create --name [ENV NAME]
-    ```
-
- activate an environment (python interpreter will be restarted):
-
-    ```
-    %python.conda activate [ENV NAME]
-    ```
-
- deactivate
-
-    ```
-    %python.conda deactivate
-    ```
-
- get installed package list inside the current environment
-
-    ```
-    %python.conda list
-    ```
-
- install package
-
-    ```
-    %python.conda install [PACKAGE NAME]
-    ```
-
- uninstall package
-
-    ```
-    %python.conda uninstall [PACKAGE NAME]
-    ```
-
-### Docker
-
-`%python.docker` interpreter allows PythonInterpreter creates python process in a specified docker container.
-#### Usage
+## Vanilla Python Interpreter (`%python`)
-- activate an environment
-
-    ```
-    %python.docker activate [Repository]
-    %python.docker activate [Repository:Tag]
-    %python.docker activate [Image Id]
-    ```
-
-- deactivate
-
-    ```
-    %python.docker deactivate
-    ```
-
-<br/>
-Here is an example
-
-```
-# activate latest tensorflow image as a python environment
-%python.docker activate gcr.io/tensorflow/tensorflow:latest
-```
-
-## Using Zeppelin Dynamic Forms
-You can leverage [Zeppelin Dynamic Form]({{BASE_PATH}}/usage/dynamic_form/intro.html) inside your Python code.
-
-**Zeppelin Dynamic Form can only be used if py4j Python library is installed in your system. If not, you can install it with `pip install py4j`.**
-
-Example :
-
-```python
-%python
-### Input form
-print (z.input("f1","defaultValue"))
-
-### Select form
-print (z.select("f1",[("o1","1"),("o2","2")],"2"))
-
-### Checkbox form
-print("".join(z.checkbox("f3", [("o1","1"), ("o2","2")],["1"])))
-```
+The vanilla python interpreter provides basic python interpreter feature, only python installed is required.
-## Matplotlib integration
+### Matplotlib integration
- The python interpreter can display matplotlib figures inline automatically using the `pyplot` module:
+The vanilla python interpreter can display matplotlib figures inline automatically using the `matplotlib`:
 ```python
 %python
+
 import matplotlib.pyplot as plt
 plt.plot([1, 2, 3])
 ```
-This is the recommended method for using matplotlib from within a Zeppelin notebook. The output of this command will by default be converted to HTML by implicitly making use of the `%html` magic. Additional configuration can be achieved using the builtin `z.configure_mpl()` method. For example,
+
+The output of this command will by default be converted to HTML by implicitly making use of the `%html` magic. Additional configuration can be achieved using the builtin `z.configure_mpl()` method. For example,
 ```python
+
 z.configure_mpl(width=400, height=300, fmt='svg')
 plt.plot([1, 2, 3])
 ```
@@ -191,6 +115,7 @@ If you are unable to load the inline backend, use `z.show(plt)`:
 ```python
 %python
+
 import matplotlib.pyplot as plt
 plt.figure()
 (.. ..)
@@ -201,20 +126,88 @@ The `z.show()` function can take optional parameters to adapt graph dimensions (
 ```python
 %python
+
 z.show(plt, width='50px')
 z.show(plt, height='150px', fmt='svg')
 ```
 <img class="img-responsive" src="{{BASE_PATH}}/assets/themes/zeppelin/img/docs-img/pythonMatplotlib.png" />
+
+## IPython Interpreter (`%python.ipython`) (recommended)
+
+IPython is more powerful than the vanilla python interpreter with extra functionality. You can use IPython with Python2 or Python3 which depends on which python you set in `zeppelin.python`.
+
+For non-anaconda environment
+
+ **Prerequisites**
+
+ - Jupyter `pip install jupyter`
+ - grpcio `pip install grpcio`
+ - protobuf `pip install protobuf`
+
+For anaconda environment (`zeppelin.python` points to the python under anaconda)
+
+ **Prerequisites**
+
+ - grpcio `pip install grpcio`
+ - protobuf `pip install protobuf`
+
+In addition to all the basic functions of the vanilla python interpreter, you can use all the IPython advanced features as you use it in Jupyter Notebook.
+
+e.g.
+
+### Use IPython magic
+
+```
+%python.ipython
+
+#python help
+range?
+
+#timeit
+%timeit range(100)
+```
+
+### Use matplotlib
+
+```
+%python.ipython
+
+%matplotlib inline
+import matplotlib.pyplot as plt
+
+print("hello world")
+data=[1,2,3,4]
+plt.figure()
+plt.plot(data)
+```
+
+### Colored text output
+
+<img class="img-responsive" src="{{BASE_PATH}}/assets/themes/zeppelin/img/docs-img/ipython_error.png" />
+
+### More types of visualization
+e.g. IPython supports hvplot
+<img class="img-responsive" src="{{BASE_PATH}}/assets/themes/zeppelin/img/docs-img/ipython_hvplot.png" />
+
+### Better code completion
+<img class="img-responsive" src="{{BASE_PATH}}/assets/themes/zeppelin/img/docs-img/ipython_code_completion.png" />
+
+
+By default, Zeppelin would use IPython in `%python` if IPython prerequisites are meet, otherwise it would use vanilla Python interpreter in `%python`.
+If you don't want to use IPython via `%python`, then you can set `zeppelin.python.useIPython` as `false` in interpreter setting.
+
+
 ## Pandas integration
 Apache Zeppelin [Table Display System](../usage/display_system/basic.html#table) provides built-in data visualization capabilities.
-Python interpreter leverages it to visualize Pandas DataFrames though similar `z.show()` API,
-same as with [Matplotlib integration](#matplotlib-integration).
+Python interpreter leverages it to visualize Pandas DataFrames though similar `z.show()` API, same as with [Matplotlib integration](#matplotlib-integration).
 Example:
 ```python
+%python
+
 import pandas as pd
 rates = pd.read_csv("bank.csv", sep=";")
 z.show(rates)
@@ -226,16 +219,18 @@ There is a convenience `%python.sql` interpreter that matches Apache Spark exper
 enables usage of SQL language to query [Pandas DataFrames](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.html) and visualization of results though built-in [Table Display System](../usage/display_system/basic.html#table).
- **Pre-requests**
+ **Prerequisites**
 - Pandas `pip install pandas`
 - PandaSQL `pip install -U pandasql`
-In case default binded interpreter is Python (first in the interpreter list, under the _Gear Icon_), you can just use it as `%sql` i.e
+Here's one example:
 - first paragraph
 ```python
+%python
+
 import pandas as pd
 rates = pd.read_csv("bank.csv", sep=";")
 ```
@@ -243,88 +238,211 @@ rates = pd.read_csv("bank.csv", sep=";")
 - next paragraph
 ```sql
-%sql
+%python.sql
+
 SELECT * FROM rates WHERE age < 40
 ```
-Otherwise it can be referred to as `%python.sql`
+## Using Zeppelin Dynamic Forms
+You can leverage [Zeppelin Dynamic Form]({{BASE_PATH}}/usage/dynamic_form/intro.html) inside your Python code.
-## IPython Support
+Example :
-IPython is more powerful than the default python interpreter with extra functionality. You can use IPython with Python2 or Python3 which depends on which python you set `zeppelin.python`.
+```python
+%python
- **Pre-requests**
-
- - Jupyter `pip install jupyter`
- - grpcio `pip install grpcio`
- - protobuf `pip install protobuf`
+### Input form
+print(z.input("f1","defaultValue"))
-If you already install anaconda, then you just need to install `grpcio` as Jupyter is already included in anaconda. For grpcio version >= 1.12.0 you'll also need to install protobuf separately.
+### Select form
+print(z.select("f2",[("o1","1"),("o2","2")],"o1"))
-In addition to all basic functions of the python interpreter, you can use all the IPython advanced features as you use it in Jupyter Notebook.
+### Checkbox form
+print("".join(z.checkbox("f3", [("o1","1"), ("o2","2")],["o1"])))
+```
-e.g.
+## ZeppelinContext API
-Use IPython magic
+Python interpreter create a variable `z` which represent `ZeppelinContext` for you. User can use it to do more fancy and complex things in Zeppelin.
-```
-%python.ipython
+<table class="table-configuration">
+  <tr>
+    <th>API</th>
+    <th>Description</th>
+  </tr>
+  <tr>
+    <td>z.put(key, value)</td>
+    <td>Put object <code>value</code> with identifier <code>key</code> to distributed resource pool of Zeppelin,
+    so that it can be used by other interpreters</td>
+  </tr>
+  <tr>
+    <td>z.get(key)</td>
+    <td>Get object with identifier <code>key</code> from distributed resource pool of Zeppelin</td>
+  </tr>
+  <tr>
+    <td>z.remove(key)</td>
+    <td>Remove object with identifier <code>key</code> from distributed resource pool of Zeppelin</td>
+  </tr>
+  <tr>
+    <td>z.getAsDataFrame(key)</td>
+    <td>Get object with identifier <code>key</code> from distributed resource pool of Zeppelin and converted into pandas dataframe.
+    The object in the distributed resource pool must be table type, e.g. jdbc interpreter result.
+    </td>
+  </tr>
+  <tr>
+    <td>z.angular(name, noteId = None, paragraphId = None)</td>
+    <td>Get the angular object with identifier <code>name</code></td>
+  </tr>
+  <tr>
+    <td>z.angularBind(name, value, noteId = None, paragraphId = None)</td>
+    <td>Bind value to angular object with identifier <code>name</code></td>
+  </tr>
+  <tr>
+    <td>z.angularUnbind(name, noteId = None)</td>
+    <td>Unbind value from angular object with identifier <code>name</code></td>
+  </tr>
+  <tr>
+    <td>z.show(p)</td>
+    <td>Show python object <code>p</code> in Zeppelin, if it is pandas dataframe, it would be displayed in Zeppelin's table format,
+    others will be converted to string</td>
+  </tr>
+  <tr>
+    <td>z.textbox(name, defaultValue="")</td>
+    <td>Create dynamic form Textbox <code>name</code> with defaultValue</td>
+  </tr>
+  <tr>
+    <td>z.select(name, options, defaultValue="")</td>
+    <td>Create dynamic form Select <code>name</code> with options and defaultValue. options should be a list of Tuple(first element is key,
+    the second element is the displayed value) e.g. <code>z.select("f2",[("o1","1"),("o2","2")],"o1")</code></td>
+  </tr>
+  <tr>
+    <td>z.checkbox(name, options, defaultChecked=[])</td>
+    <td>Create dynamic form Checkbox `name` with options and defaultChecked. options should be a list of Tuple(first element is key,
+    the second element is the displayed value) e.g.
<code>z.checkbox("f3", [("o1","1"), ("o2","2")],["o1"])</code></td>
+  </tr>
+  <tr>
+    <td>z.noteTextbox(name, defaultValue="")</td>
+    <td>Create note level dynamic form Textbox</td>
+  </tr>
+  <tr>
+    <td>z.noteSelect(name, options, defaultValue="")</td>
+    <td>Create note level dynamic form Select</td>
+  </tr>
+  <tr>
+    <td>z.noteCheckbox(name, options, defaultChecked=[])</td>
+    <td>Create note level dynamic form Checkbox</td>
+  </tr>
+  <tr>
+    <td>z.run(paragraphId)</td>
+    <td>Run paragraph</td>
+  </tr>
+  <tr>
+    <td>z.run(noteId, paragraphId)</td>
+    <td>Run paragraph</td>
+  </tr>
+  <tr>
+    <td>z.runNote(noteId)</td>
+    <td>Run the whole note</td>
+  </tr>
+</table>
-#python help
-range?
+## Python environments
-#timeit
-%timeit range(100)
-```
+### Default
+By default, PythonInterpreter will use python command defined in `zeppelin.python` property to run python process.
+The interpreter can use all modules already installed (with pip, easy_install...)
+
+### Conda
+[Conda](http://conda.pydata.org/) is an package management system and environment management system for python.
+`%python.conda` interpreter lets you change between environments.
+- deactivate -Create dynamic form + ``` + %python.conda deactivate + ``` + +- get installed package list inside the current environment -``` -z.input(name='my_name', defaultValue='hello') -``` + ``` + %python.conda list + ``` + +- install package -Show pandas dataframe + ``` + %python.conda install [PACKAGE NAME] + ``` + +- uninstall package + + ``` + %python.conda uninstall [PACKAGE NAME] + ``` -``` -import pandas as pd -df = pd.DataFrame({'id':[1,2,3], 'name':['a','b','c']}) -z.show(df) +### Docker -``` +`%python.docker` interpreter allows PythonInterpreter creates python process in a specified docker container. -By default, we would use IPython in `%python.python` if IPython is available. Otherwise it would fall back to the original Python implementation. -If you don't want to use IPython, then you can set `zeppelin.python.useIPython` as `false` in interpreter setting. +#### Usage + +- activate an environment + + ``` + %python.docker activate [Repository] + %python.docker activate [Repository:Tag] + %python.docker activate [Image Id] + ``` + +- deactivate + + ``` + %python.docker deactivate + ``` + +<br/> +Here is an example + +``` +# activate latest tensorflow image as a python environment +%python.docker activate gcr.io/tensorflow/tensorflow:latest +``` ## Technical description For in-depth technical details on current implementation please refer to [python/README.md](https://github.com/apache/zeppelin/blob/master/python/README.md). -### Some features not yet implemented in the Python Interpreter +## Some features not yet implemented in the vanilla Python interpreter * Interrupt a paragraph execution (`cancel()` method) is currently only supported in Linux and MacOs. If interpreter runs in another operating system (for instance MS Windows) , interrupt a paragraph will close the whole interpreter. A JIRA ticket ([ZEPPELIN-893](https://issues.apache.org/jira/browse/ZEPPELIN-893)) is opened to implement this feature in a next release of the interpreter. 
 * Progression bar in webUI (`getProgress()` method) is currently not implemented.
-* Code-completion is currently not implemented.
-
diff --git a/python/src/main/resources/python/zeppelin_context.py b/python/src/main/resources/python/zeppelin_context.py
index 0eb02db..b0cdadc 100644
--- a/python/src/main/resources/python/zeppelin_context.py
+++ b/python/src/main/resources/python/zeppelin_context.py
@@ -66,12 +66,12 @@ class PyZeppelinContext(object):
       print("fail to call getAsDataFrame as pandas is not installed")
     return pd.read_csv(StringIO(value), sep="\t")

-  def angular(self, key, noteId = None, paragraphId = None):
-    return self.z.angular(key, noteId, paragraphId)
-
   def remove(self, key):
     self.z.remove(key)

+  def angular(self, key, noteId = None, paragraphId = None):
+    return self.z.angular(key, noteId, paragraphId)
+
   def contains(self, key):
     return self.contains(key)

@@ -120,11 +120,11 @@ class PyZeppelinContext(object):
   def runAll(self):
     return self.z.runAll()

-  def angular(self, key, noteId = None, paragraphId = None):
+  def angular(self, name, noteId = None, paragraphId = None):
     if noteId == None:
-      return self.z.angular(key, self.z.getInterpreterContext().getNoteId(), paragraphId)
+      return self.z.angular(name, self.z.getInterpreterContext().getNoteId(), paragraphId)
     else:
-      return self.z.angular(key, noteId, paragraphId)
+      return self.z.angular(name, noteId, paragraphId)

   def angularBind(self, name, value, noteId = None, paragraphId = None):
     if noteId == None: