This is an automated email from the ASF dual-hosted git repository.

zjffdu pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/zeppelin.git
The following commit(s) were added to refs/heads/master by this push:
     new b09d57e  [ZEPPELIN-4839]. Update flink interpreter doc
b09d57e is described below

commit b09d57e99f4cbdc9d1a1d71f14b53998424b82fc
Author: Jeff Zhang <zjf...@apache.org>
AuthorDate: Thu May 28 10:39:25 2020 +0800

    [ZEPPELIN-4839]. Update flink interpreter doc
    
    ### What is this PR for?
    
    This PR updates the flink interpreter doc for the recent improvements in the flink interpreter and adds some screenshots.
    
    ### What type of PR is it?
    [Documentation]
    
    ### Todos
    * [ ] - Task
    
    ### What is the Jira issue?
    * https://issues.apache.org/jira/browse/ZEPPELIN-4839
    
    ### How should this be tested?
    * CI pass
    
    ### Screenshots (if appropriate)
    
    ### Questions:
    * Do the license files need to be updated? No
    * Are there breaking changes for older versions? No
    * Does this need documentation? No
    
    Author: Jeff Zhang <zjf...@apache.org>
    
    Closes #3779 from zjffdu/ZEPPELIN-4839 and squashes the following commits:
    
    ddc774078 [Jeff Zhang] [ZEPPELIN-4839]. Update flink interpreter doc
---
 .../zeppelin/img/docs-img/flink_append_mode.gif    | Bin 0 -> 294307 bytes
 .../zeppelin/img/docs-img/flink_single_mode.gif    | Bin 0 -> 58198 bytes
 .../zeppelin/img/docs-img/flink_update_mode.gif    | Bin 0 -> 131055 bytes
 .../zeppelin/img/docs-img/flink_z_batch_table.png  | Bin 0 -> 189710 bytes
 .../zeppelin/img/docs-img/flink_z_dataset.png      | Bin 0 -> 160627 bytes
 .../zeppelin/img/docs-img/flink_z_stream_table.gif | Bin 0 -> 226356 bytes
 docs/interpreter/flink.md                          | 174 ++++++++++++++++-----
 7 files changed, 137 insertions(+), 37 deletions(-)

diff --git a/docs/assets/themes/zeppelin/img/docs-img/flink_append_mode.gif b/docs/assets/themes/zeppelin/img/docs-img/flink_append_mode.gif
new file mode 100644
index 0000000..3c827f4
Binary files /dev/null and b/docs/assets/themes/zeppelin/img/docs-img/flink_append_mode.gif differ
diff --git a/docs/assets/themes/zeppelin/img/docs-img/flink_single_mode.gif b/docs/assets/themes/zeppelin/img/docs-img/flink_single_mode.gif
new file mode 100644
index 0000000..91b49ed
Binary files /dev/null and b/docs/assets/themes/zeppelin/img/docs-img/flink_single_mode.gif differ
diff --git a/docs/assets/themes/zeppelin/img/docs-img/flink_update_mode.gif b/docs/assets/themes/zeppelin/img/docs-img/flink_update_mode.gif
new file mode 100644
index 0000000..fe7e2e9
Binary files /dev/null and b/docs/assets/themes/zeppelin/img/docs-img/flink_update_mode.gif differ
diff --git a/docs/assets/themes/zeppelin/img/docs-img/flink_z_batch_table.png b/docs/assets/themes/zeppelin/img/docs-img/flink_z_batch_table.png
new file mode 100644
index 0000000..216675e
Binary files /dev/null and b/docs/assets/themes/zeppelin/img/docs-img/flink_z_batch_table.png differ
diff --git a/docs/assets/themes/zeppelin/img/docs-img/flink_z_dataset.png b/docs/assets/themes/zeppelin/img/docs-img/flink_z_dataset.png
new file mode 100644
index 0000000..052b74d
Binary files /dev/null and b/docs/assets/themes/zeppelin/img/docs-img/flink_z_dataset.png differ
diff --git a/docs/assets/themes/zeppelin/img/docs-img/flink_z_stream_table.gif b/docs/assets/themes/zeppelin/img/docs-img/flink_z_stream_table.gif
new file mode 100644
index 0000000..fb52e99
Binary files /dev/null and b/docs/assets/themes/zeppelin/img/docs-img/flink_z_stream_table.gif differ
diff --git a/docs/interpreter/flink.md b/docs/interpreter/flink.md
index 40cf058..108d7b5 100644
--- a/docs/interpreter/flink.md
+++ b/docs/interpreter/flink.md
@@ -26,7 +26,7 @@ limitations under the License.
 ## Overview
 
 [Apache Flink](https://flink.apache.org) is an open source platform for distributed stream and batch data processing. Flink’s core is a streaming dataflow engine that provides data distribution, communication, and fault tolerance for distributed computations over data streams. Flink also builds batch processing on top of the streaming engine, overlaying native iteration support, managed memory, and program optimization.
 
-In Zeppelin 0.9, we refactor the Flink interpreter in Zeppelin to support the latest version of Flink. **Only Flink 1.10+ is supported, old version of flink may not work.**
+In Zeppelin 0.9, we refactored the Flink interpreter to support the latest version of Flink. **Only Flink 1.10+ is supported; older versions of flink won't work.**
 
 Apache Flink is supported in Zeppelin with Flink interpreter group which consists of below five interpreters.
@@ -65,11 +65,10 @@ Apache Flink is supported in Zeppelin with Flink interpreter group which consist
 ## Prerequisites
 
 * Download Flink 1.10 for scala 2.11 (Only scala-2.11 is supported, scala-2.12 is not supported yet in Zeppelin)
-* Download [flink-hadoop-shaded](https://repo1.maven.org/maven2/org/apache/flink/flink-shaded-hadoop-2/2.8.3-10.0/flink-shaded-hadoop-2-2.8.3-10.0.jar) and put it under lib folder of flink (flink interpreter need that to support yarn mode)
 
 ## Configuration
 
 The Flink interpreter can be configured with properties provided by Zeppelin (as following table).
-You can also set other flink properties which are not listed in the table. For a list of additional properties, refer to [Flink Available Properties](https://ci.apache.org/projects/flink/flink-docs-master/ops/config.html).
+You can also add and set other flink properties that are not listed in the table. For a list of additional properties, refer to [Flink Available Properties](https://ci.apache.org/projects/flink/flink-docs-master/ops/config.html).
 
 <table class="table-configuration">
   <tr>
     <th>Property</th>
     <th>Default</th>
     <th>Description</th>
   </tr>
@@ -191,11 +190,7 @@ You can also set other flink properties which are not listed in the table. For a
     <td>true</td>
     <td>Whether display scala shell output in colorful format</td>
   </tr>
-  <tr>
-    <td>zeppelin.flink.enableHive</td>
-    <td>false</td>
-    <td>Whether enable hive</td>
-  </tr>
+
   <tr>
     <td>zeppelin.flink.enableHive</td>
     <td>false</td>
@@ -249,6 +244,16 @@ And will create 6 variables as pyflink (`%flink.pyflink` or `%flink.ipyflink`) e
 * `st_env_2` (StreamTableEnvironment for flink planner)
 * `bt_env_2` (BatchTableEnvironment for flink planner)
 
+## Blink/Flink Planner
+
+There are 2 planners supported by Flink's table api: `flink` & `blink`.
+
+* If you want to use the DataSet api and convert a DataSet to a flink table, please use the flink planner (`btenv_2` and `stenv_2`).
+* In all other cases, we recommend the `blink` planner. This is also what the flink batch/streaming sql interpreters (`%flink.bsql` & `%flink.ssql`) use.
+
+Check this [page](https://ci.apache.org/projects/flink/flink-docs-release-1.10/dev/table/common.html#main-differences-between-the-two-planners) for the differences between the flink planner and the blink planner.
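+
+Below is a minimal sketch of the difference, written against the scala variables the interpreter creates (`benv`, `btenv`, `btenv_2`); the sample data and the `people` table name are made up for illustration:
+
+```scala
+%flink
+
+// flink planner: required when you start from a DataSet
+val ds = benv.fromElements((1, "jeff"), (2, "andy"))
+val table = btenv_2.fromDataSet(ds)
+
+// blink planner: recommended for everything else, e.g. plain SQL on a
+// table that has already been registered under the name `people`
+val result = btenv.sqlQuery("select name from people")
+```
+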
 ## Execution mode (Local/Remote/Yarn)
 
 Flink in Zeppelin supports 3 execution modes (`flink.execution.mode`):
 
@@ -274,15 +279,7 @@ In order to run flink in Yarn mode, you need to make the following settings:
 
 * Set `flink.execution.mode` to `yarn`
 * Set `HADOOP_CONF_DIR` in flink's interpreter setting.
-* Make sure `hadoop` command is your PATH. Because internally flink will call command `hadoop classpath` and load all the hadoop related jars in the flink interpreter process
-
-
-## Blink/Flink Planner
-
-There're 2 planners supported by Flink's table api: `flink` & `blink`.
-
-* If you want to use DataSet api, and convert it to flink table then please use flink planner (`btenv_2` and `stenv_2`).
-* In other cases, we would always recommend you to use `blink` planner. This is also what flink batch/streaming sql interpreter use (`%flink.bsql` & `%flink.ssql`)
+* Make sure the `hadoop` command is on your PATH. Internally, flink calls `hadoop classpath` and loads all the hadoop related jars into the flink interpreter process.
 
 ## How to use Hive
 
 In order to use Hive in Flink, you have to make the following setting.
 
 * Set `zeppelin.flink.enableHive` to be true
 * Set `zeppelin.flink.hive.version` to be the hive version you are using.
-* Set `HIVE_CONF_DIR` to be the location where `hive-site.xml` is located. Make sure hive metastore is started and you have configure `hive.metastore.uris` in `hive-site.xml`
+* Set `HIVE_CONF_DIR` to be the location where `hive-site.xml` is located. Make sure the hive metastore is started and you have configured `hive.metastore.uris` in `hive-site.xml`.
 * Copy the following dependencies to the lib folder of flink installation.
     * flink-connector-hive_2.11–1.10.0.jar
    * flink-hadoop-compatibility_2.11–1.10.0.jar
    * hive-exec-2.x.jar (for hive 1.x, you need to copy hive-exec-1.x.jar, hive-metastore-1.x.jar, libfb303–0.9.2.jar and libthrift-0.9.2.jar)
 
-After these settings, you will be able to query hive table via either table api `%flink` or batch sql `%flink.bsql`
 
 ## Flink Batch SQL
 
-`%flink.bsql` is used for flink's batch sql. You just type `help` to get all the available commands.
+`%flink.bsql` is used for flink's batch sql. You can type `help` to get all the available commands.
+It supports all flink sql statements, including DML/DDL/DQL.
 
 * Use `insert into` statement for batch ETL
-* Use `select` statement for exploratory data analytics
+* Use `select` statement for batch data analytics
 
 ## Flink Streaming SQL
 
-`%flink.ssql` is used for flink's streaming sql. You just type `help` to get all the available commands. Mainlly there're 2 cases:
+`%flink.ssql` is used for flink's streaming sql. You can type `help` to get all the available commands.
+It supports all flink sql statements, including DML/DDL/DQL.
 
-* Use `insert into` statement for streaming processing
+* Use `insert into` statement for streaming ETL
 * Use `select` statement for streaming data analytics
 
+## Streaming Data Visualization
+
+Zeppelin supports 3 types of streaming data analytics:
+* Single
+* Update
+* Append
+
+### type=single
+Single mode is for the case when the result of the sql statement is always one row, as in the following example. The output format is HTML,
+and you can specify the paragraph local property `template` for the final output content template,
+using `{i}` as a placeholder for the i-th column of the result.
+
+ <center>
+  
+ </center>
+
+### type=update
+Update mode is suitable for the case when the output is more than one row and is updated continuously.
+Here's one example where we use group by.
+
+ <center>
+  
+ </center>
+
+### type=append
+Append mode is suitable for the scenario where output data is always appended, e.g. the following example which uses a tumble window.
+
+ <center>
+  
+ </center>
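+
+As a rough sketch of how such a paragraph looks (the source table `log` with a `url` column is hypothetical; adjust the names to your own tables):
+
+```sql
+%flink.ssql(type=update)
+
+-- one continuously updated count per url
+select url, count(1) as pv from log group by url
+```
+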
 ## Flink UDF
 
-You can use Flink scala UDF or Python UDF in sql. UDF for batch and streaming sql is the same. Here's 2 examples.
+You can use Flink scala UDF or Python UDF in sql. UDF for batch and streaming sql is the same. Here are 2 examples.
 
 * Scala UDF
 
@@ -343,29 +372,100 @@ bt_env.register_function("python_upper", udf(PythonUpper(), DataTypes.STRING(),
 ```
 
-Besides defining udf in Zeppelin, you can also load udfs in jars via `flink.udf.jars`. For example, you can create
-udfs in intellij and then build these udfs in one jar. After that you can specify `flink.udf.jars` to this jar, and flink
+The flink interpreter in Zeppelin only supports scala and python. If you want to write a java udf, or the udf is complicated enough that it is not suitable to write in Zeppelin,
+then you can write the udf in an IDE and build a udf jar.
+In Zeppelin you just need to set `flink.udf.jars` to this jar, and the flink
 interpreter will detect all the udfs in this jar and register all the udfs to TableEnvironment, the udf name is the class name.
 
-## ZeppelinContext
-Zeppelin automatically injects `ZeppelinContext` as variable `z` in your Scala/Python environment. `ZeppelinContext` provides some additional functions and utilities.
-See [Zeppelin-Context](../usage/other_features/zeppelin_context.html) for more details.
+## PyFlink(%flink.pyflink)
+In order to use PyFlink in Zeppelin, you just need to do the following configuration.
+* Install apache-flink (e.g. pip install apache-flink)
+* Set `zeppelin.pyflink.python` to the python executable where apache-flink is installed, in case you have multiple pythons installed.
+* Copy flink-python_2.11-1.10.0.jar from the flink opt folder to the flink lib folder
+
+And PyFlink will create 6 variables for you:
+
+* `s_env` (StreamExecutionEnvironment)
+* `b_env` (ExecutionEnvironment)
+* `st_env` (StreamTableEnvironment for blink planner)
+* `bt_env` (BatchTableEnvironment for blink planner)
+* `st_env_2` (StreamTableEnvironment for flink planner)
+* `bt_env_2` (BatchTableEnvironment for flink planner)
 
-## IPython Support
+### IPython Support(%flink.ipyflink)
 
 By default, zeppelin would use IPython in `%flink.pyflink` when IPython is available, Otherwise it would fall back to the original python implementation.
 For the IPython features, you can refer doc[Python Interpreter](python.html)
 
-## Tutorial Notes
+## ZeppelinContext
+Zeppelin automatically injects `ZeppelinContext` as variable `z` in your Scala/Python environment. `ZeppelinContext` provides some additional functions and utilities.
+See [Zeppelin-Context](../usage/other_features/zeppelin_context.html) for more details. You can use `z` to display both flink DataSets and batch/stream tables.
+
+* Display DataSet
+ <center>
+  
+ </center>
+
+* Display Batch Table
+ <center>
+  
+ </center>
+
+* Display Stream Table
+ <center>
+  
+ </center>
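+
+For example, a minimal sketch of displaying a DataSet (the sample values are made up; `z.show` renders the result with Zeppelin's built-in table visualization):
+
+```scala
+%flink
+
+// a small DataSet with made-up values, displayed via ZeppelinContext
+val data = benv.fromElements((1, "jeff"), (2, "andy"), (3, "james"))
+z.show(data)
+```
+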
+## Paragraph local properties
+
+In the section `Streaming Data Visualization`, we demonstrated the different visualization types via the paragraph local property `type`.
+In this section, we list and explain all the local properties supported by the flink interpreter.
 
-Zeppelin is shipped with several Flink tutorial notes which may be helpful for you. Except the first one, the below 4 notes cover the 4 main scenarios of flink.
+<table class="table-configuration">
+  <tr>
+    <th>Property</th>
+    <th>Default</th>
+    <th>Description</th>
+  </tr>
+  <tr>
+    <td>type</td>
+    <td></td>
+    <td>Used in %flink.ssql to specify the streaming visualization type (single, update, append)</td>
+  </tr>
+  <tr>
+    <td>refreshInterval</td>
+    <td>3000</td>
+    <td>Used in `%flink.ssql` to specify the frontend refresh interval (in milliseconds) for streaming data visualization.</td>
+  </tr>
+  <tr>
+    <td>template</td>
+    <td>{0}</td>
+    <td>Used in `%flink.ssql` to specify the html template for the `single` type of streaming data visualization. You can use `{i}` as a placeholder for the i-th column of the result.</td>
+  </tr>
+  <tr>
+    <td>parallelism</td>
+    <td></td>
+    <td>Used in %flink.ssql & %flink.bsql to specify the flink sql job parallelism</td>
+  </tr>
+  <tr>
+    <td>maxParallelism</td>
+    <td></td>
+    <td>Used in %flink.ssql & %flink.bsql to specify the flink sql job max parallelism, in case you want to change the parallelism later. For more details, refer to this [link](https://ci.apache.org/projects/flink/flink-docs-release-1.10/dev/parallel.html#setting-the-maximum-parallelism)</td>
+  </tr>
+  <tr>
+    <td>savepointDir</td>
+    <td></td>
+    <td>If you specify it, then when you cancel your flink job in Zeppelin, a savepoint is also taken and the job state is stored in this directory; when you resume the job, it resumes from this savepoint.</td>
+  </tr>
+  <tr>
+    <td>runAsOne</td>
+    <td>false</td>
+    <td>If true, all the insert into sql statements in the paragraph will run in a single flink job.</td>
+  </tr>
+</table>
+
+A sketch that combines several of these properties is shown at the end of this page.
 
-* Flink Basic
-* Batch ETL
-* Exploratory Data Analytics
-* Streaming ETL
-* Streaming Data Analytics
+## Tutorial Notes
+
+Zeppelin is shipped with several Flink tutorial notes which may be helpful for you. You can check out more features in the tutorial notes.
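+
+As promised above, here is a rough sketch that combines several of the paragraph local properties. The table names `source_t`, `sink_a` and `sink_b` are hypothetical; they would have to be created via DDL first:
+
+```sql
+%flink.ssql(runAsOne=true, parallelism=2)
+
+-- with runAsOne=true, both inserts below are compiled into one flink job
+insert into sink_a select * from source_t;
+insert into sink_b select * from source_t;
+```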