Modified: zeppelin/site/docs/0.9.0-SNAPSHOT/search_data.json
URL: 
http://svn.apache.org/viewvc/zeppelin/site/docs/0.9.0-SNAPSHOT/search_data.json?rev=1869172&r1=1869171&r2=1869172&view=diff
==============================================================================
--- zeppelin/site/docs/0.9.0-SNAPSHOT/search_data.json (original)
+++ zeppelin/site/docs/0.9.0-SNAPSHOT/search_data.json Wed Oct 30 13:45:26 2019
@@ -3,7 +3,7 @@
 
     "/interpreter/livy.html": {
       "title": "Livy Interpreter for Apache Zeppelin",
-      "content"  : "<!--Licensed under the Apache License, Version 2.0 (the 
"License");you may not use this file except in compliance with the 
License.You may obtain a copy of the License 
athttp://www.apache.org/licenses/LICENSE-2.0Unless required by applicable law 
or agreed to in writing, softwaredistributed under the License is distributed 
on an "AS IS" BASIS,WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, 
either express or implied.See the License for the specific language governing 
permissions andlimitations under the License.-->Livy Interpreter for Apache 
ZeppelinOverviewLivy is an open source REST interface for interacting with 
Spark from anywhere. It supports executing snippets of code or programs in a 
Spark context that runs locally or in YARN.Interactive Scala, Python and R 
shellsBatch submissions in Scala, Java, PythonMulti users can share the same 
server (impersonation support)Can be used for submitting jobs from anywhere 
with RESTDoes not require a
 ny code change to your programsRequirementsAdditional requirements for the 
Livy interpreter are:Spark 1.3 or above.Livy server.ConfigurationWe added some 
common configurations for spark, and you can set any configuration you want.You 
can find all Spark configurations in here.And instead of starting property with 
spark. it should be replaced with livy.spark..Example: spark.driver.memory to 
livy.spark.driver.memory      Property    Default    Description        
zeppelin.livy.url    http://localhost:8998    URL where livy server is running  
      zeppelin.livy.spark.sql.maxResult    1000    Max number of Spark SQL 
results to display.        zeppelin.livy.spark.sql.field.truncate    true    
Whether to truncate field values longer than 20 characters or not        
zeppelin.livy.session.create_timeout    120    Timeout in seconds for session 
creation        zeppelin.livy.displayAppInfo    true    Whether to display app 
info        zeppelin.livy.pull_status.interval.millis    1000    The int
 erval for checking paragraph execution status        livy.spark.driver.cores   
     Driver cores. ex) 1, 2.          livy.spark.driver.memory        Driver 
memory. ex) 512m, 32g.          livy.spark.executor.instances        Executor 
instances. ex) 1, 4.          livy.spark.executor.cores        Num cores per 
executor. ex) 1, 4.        livy.spark.executor.memory        Executor memory 
per worker instance. ex) 512m, 32g.        livy.spark.dynamicAllocation.enabled 
       Use dynamic resource allocation. ex) True, False.        
livy.spark.dynamicAllocation.cachedExecutorIdleTimeout        Remove an 
executor which has cached data blocks.        
livy.spark.dynamicAllocation.minExecutors        Lower bound for the number of 
executors.        livy.spark.dynamicAllocation.initialExecutors        Initial 
number of executors to run.        livy.spark.dynamicAllocation.maxExecutors    
    Upper bound for the number of executors.            
livy.spark.jars.packages            Adding extra libr
 aries to livy interpreter          zeppelin.livy.ssl.trustStore        client 
trustStore file. Used when livy ssl is enabled        
zeppelin.livy.ssl.trustStorePassword        password for trustStore file. Used 
when livy ssl is enabled        zeppelin.livy.http.headers    key_1: value_1; 
key_2: value_2    custom http headers when calling livy rest api. Each http 
header is separated by `;`, and each header is one key value pair where key 
value is separated by `:`  We removed livy.spark.master in Zeppelin 0.7 because we suggest users run Livy 0.3 with Zeppelin 0.7, and Livy 0.3 does not allow specifying livy.spark.master; it enforces yarn-cluster mode.Adding External librariesYou can load dynamic libraries into the livy interpreter by setting the livy.spark.jars.packages property to a comma-separated list of maven coordinates of jars to include on the driver and executor classpaths. The format for the 
coordinates should be groupId:artifactId:version.Example      Property    
Example    Description
           livy.spark.jars.packages      io.spray:spray-json_2.10:1.3.1      
Adding extra libraries to livy interpreter      How to useBasically, you can 
usespark%livy.sparksc.versionpyspark%livy.pysparkprint 
"1"sparkR%livy.sparkrhello <- function( name ) {    
sprintf( "Hello, %s", name 
);}hello("livy")ImpersonationWhen Zeppelin server is running 
with authentication enabled,then this interpreter utilizes Livy’s user 
impersonation featurei.e. sends extra parameter for creating and running a 
session ("proxyUser": 
"${loggedInUser}").This is particularly useful when multi 
users are sharing a Notebook server.Apply Zeppelin Dynamic FormsYou can 
leverage Zeppelin Dynamic Form. Form templates are only available for the livy sql interpreter.%livy.sqlselect * from products where ${product_id=1}Creating dynamic forms programmatically is not feasible in the livy interpreter, because 
ZeppelinContext i
 s not available in livy interpreter.Shared SparkContextStarting from livy 0.5 
which is supported by Zeppelin 0.8.0, SparkContext is shared between scala, 
python, r and sql.That means you can query the table via %livy.sql when this 
table is registered in %livy.spark, %livy.pyspark, %livy.sparkr.FAQLivy 
debugging: If you see any of these in error consoleConnect to livyhost:8998 
[livyhost/127.0.0.1, livyhost/0:0:0:0:0:0:0:1] failed: Connection refusedLooks 
like the livy server is not up yet or the config is wrongException: Session not found: the Livy server may have restarted or lost the session, or the session may have timed out; you may need to restart the interpreter.Blacklisted configuration 
values in session config: spark.masterEdit conf/spark-blacklist.conf file in 
livy server and comment out #spark.master line.If you choose to work on livy in 
apps/spark/java directory in https://github.com/cloudera/hue,copy 
spark-user-configurable-options.template to spark-user-configurable-options.con
 f file in livy server and comment out #spark.master.",
+      "content"  : "<!--Licensed under the Apache License, Version 2.0 (the 
"License");you may not use this file except in compliance with the 
License.You may obtain a copy of the License 
athttp://www.apache.org/licenses/LICENSE-2.0Unless required by applicable law 
or agreed to in writing, softwaredistributed under the License is distributed 
on an "AS IS" BASIS,WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, 
either express or implied.See the License for the specific language governing 
permissions andlimitations under the License.-->Livy Interpreter for Apache 
ZeppelinOverviewLivy is an open source REST interface for interacting with 
Spark from anywhere. It supports executing snippets of code or programs in a 
Spark context that runs locally or in YARN.Interactive Scala, Python and R 
shellsBatch submissions in Scala, Java, PythonMulti users can share the same 
server (impersonation support)Can be used for submitting jobs from anywhere 
with RESTDoes not require a
 ny code change to your programsRequirementsAdditional requirements for the 
Livy interpreter are:Spark 1.3 or above.Livy server.ConfigurationWe added some 
common configurations for spark, and you can set any configuration you want.You 
can find all Spark configurations in here.And instead of starting property with 
spark. it should be replaced with livy.spark..Example: spark.driver.memory to 
livy.spark.driver.memory      Property    Default    Description        
zeppelin.livy.url    http://localhost:8998    URL where livy server is running  
      zeppelin.livy.spark.sql.maxResult    1000    Max number of Spark SQL 
results to display.        zeppelin.livy.spark.sql.field.truncate    true    
Whether to truncate field values longer than 20 characters or not        
zeppelin.livy.session.create_timeout    120    Timeout in seconds for session 
creation        zeppelin.livy.displayAppInfo    true    Whether to display app 
info        zeppelin.livy.pull_status.interval.millis    1000    The int
 erval for checking paragraph execution status        livy.spark.driver.cores   
     Driver cores. ex) 1, 2.          livy.spark.driver.memory        Driver 
memory. ex) 512m, 32g.          livy.spark.executor.instances        Executor 
instances. ex) 1, 4.          livy.spark.executor.cores        Num cores per 
executor. ex) 1, 4.        livy.spark.executor.memory        Executor memory 
per worker instance. ex) 512m, 32g.        livy.spark.dynamicAllocation.enabled 
       Use dynamic resource allocation. ex) True, False.        
livy.spark.dynamicAllocation.cachedExecutorIdleTimeout        Remove an 
executor which has cached data blocks.        
livy.spark.dynamicAllocation.minExecutors        Lower bound for the number of 
executors.        livy.spark.dynamicAllocation.initialExecutors        Initial 
number of executors to run.        livy.spark.dynamicAllocation.maxExecutors    
    Upper bound for the number of executors.            
livy.spark.jars.packages            Adding extra libr
 aries to livy interpreter          zeppelin.livy.ssl.trustStore        client 
trustStore file. Used when livy ssl is enabled        
zeppelin.livy.ssl.trustStorePassword        password for trustStore file. Used 
when livy ssl is enabled        zeppelin.livy.ssl.trustStoreType    JKS    type 
of truststore. Either JKS or PKCS12.        zeppelin.livy.ssl.keyStore        
client keyStore file. Needed if Livy requires two way SSL authentication.       
 zeppelin.livy.ssl.keyStorePassword        password for keyStore file.        
zeppelin.livy.ssl.keyStoreType    JKS    type of keystore. Either JKS or 
PKCS12.        zeppelin.livy.ssl.keyPassword        password for key in the 
keyStore file. Defaults to zeppelin.livy.ssl.keyStorePassword.               
zeppelin.livy.http.headers    key_1: value_1; key_2: value_2    custom http 
headers when calling livy rest api. Each http header is separated by `;`, and 
each header is one key value pair where key value is separated by `:`  We removed livy.spark.master in Zeppelin 0.7 because we suggest users run Livy 0.3 with Zeppelin 0.7, and Livy 0.3 does not allow specifying livy.spark.master; it enforces yarn-cluster mode.Adding External librariesYou can load dynamic libraries into the livy interpreter by setting the livy.spark.jars.packages property to a 
comma-separated list of maven coordinates of jars to include on the driver and 
executor classpaths. The format for the coordinates should be 
groupId:artifactId:version.Example      Property    Example    Description      
    livy.spark.jars.packages      io.spray:spray-json_2.10:1.3.1      Adding 
extra libraries to livy interpreter      How to useBasically, you can 
usespark%livy.sparksc.versionpyspark%livy.pysparkprint 
"1"sparkR%livy.sparkrhello <- function( name ) {    
sprintf( "Hello, %s", name 
);}hello("livy")ImpersonationWhen Zeppelin server is running 
with authentication enabled,then this interpreter utilizes Livy’s user im
 personation featurei.e. sends extra parameter for creating and running a 
session ("proxyUser": 
"${loggedInUser}").This is particularly useful when multi 
users are sharing a Notebook server.Apply Zeppelin Dynamic FormsYou can 
leverage Zeppelin Dynamic Form. Form templates are only available for the livy sql interpreter.%livy.sqlselect * from products where ${product_id=1}Creating dynamic forms programmatically is not feasible in the livy interpreter, because 
ZeppelinContext is not available in livy interpreter.Shared 
SparkContextStarting from livy 0.5 which is supported by Zeppelin 0.8.0, 
SparkContext is shared between scala, python, r and sql.That means you can 
query the table via %livy.sql when this table is registered in %livy.spark, 
%livy.pyspark, %livy.sparkr.FAQLivy debugging: If you see any of these in error 
consoleConnect to livyhost:8998 [livyhost/127.0.0.1, livyhost/0:0:0:0:0:0:0:1] 
failed: Connection refusedLooks like the livy server is not up yet or the config is wrongException: Session not found: the Livy server may have restarted or lost the session, or the session may have timed out; you may need to restart the interpreter.Blacklisted configuration values in session config: 
spark.masterEdit conf/spark-blacklist.conf file in livy server and comment out 
#spark.master line.If you choose to work on livy in apps/spark/java directory 
in https://github.com/cloudera/hue,copy 
spark-user-configurable-options.template to 
spark-user-configurable-options.conf file in livy server and comment out 
#spark.master.",
       "url": " /interpreter/livy.html",
       "group": "interpreter",
       "excerpt": "Livy is an open source REST interface for interacting with 
Spark from anywhere. It supports executing snippets of code or programs in a 
Spark context that runs locally or in YARN."
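The Livy entry above describes two things worth seeing together: spark.* properties are set as livy.spark.* in the interpreter setting, and (with Livy 0.5+ / Zeppelin 0.8+) the same SparkContext is shared, so a table registered in %livy.spark can be queried from %livy.sql. A minimal sketch of a note, assuming a Livy server at the default http://localhost:8998 and a hypothetical input file /tmp/products.json:

    # interpreter setting: spark.driver.memory becomes livy.spark.driver.memory
    livy.spark.driver.memory  = 2g
    livy.spark.jars.packages  = io.spray:spray-json_2.10:1.3.1

    %livy.spark
    // Scala paragraph: register a view so %livy.sql can see it
    val products = spark.read.json("/tmp/products.json")
    products.createOrReplaceTempView("products")

    %livy.sql
    select * from products where ${product_id=1}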
@@ -25,7 +25,7 @@
 
     "/interpreter/markdown.html": {
       "title": "Markdown Interpreter for Apache Zeppelin",
-      "content"  : "<!--Licensed under the Apache License, Version 2.0 (the 
"License");you may not use this file except in compliance with the 
License.You may obtain a copy of the License 
athttp://www.apache.org/licenses/LICENSE-2.0Unless required by applicable law 
or agreed to in writing, softwaredistributed under the License is distributed 
on an "AS IS" BASIS,WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, 
either express or implied.See the License for the specific language governing 
permissions andlimitations under the License.-->Markdown Interpreter for 
Apache ZeppelinOverviewMarkdown is a plain text formatting syntax designed so 
that it can be converted to HTML.Apache Zeppelin uses pegdown and markdown4j as 
markdown parsers.In Zeppelin notebook, you can use %md in the beginning of a 
paragraph to invoke the Markdown interpreter and generate static html from 
Markdown plain text.In Zeppelin, Markdown interpreter is enabled by default and 
uses the pegdown par
 ser.ExampleThe following example demonstrates the basic usage of Markdown in a 
Zeppelin notebook.Mathematical expressionMarkdown interpreter leverages %html 
display system internally. That means you can mix mathematical expressions with 
markdown syntax. For more information, please see Mathematical Expression 
section.Configuration      Name    Default Value    Description        
markdown.parser.type    pegdown    Markdown Parser Type.  Available values: 
pegdown, markdown4j.  Pegdown Parserpegdown parser provides github flavored 
markdown.It also provides YUML and Websequence plugins. Markdown4j ParserSince the pegdown parser is more accurate and provides much more markdown syntax, the markdown4j option might be removed later; it is kept only for backward compatibility.",
+      "content"  : "<!--Licensed under the Apache License, Version 2.0 (the 
"License");you may not use this file except in compliance with the 
License.You may obtain a copy of the License 
athttp://www.apache.org/licenses/LICENSE-2.0Unless required by applicable law 
or agreed to in writing, softwaredistributed under the License is distributed 
on an "AS IS" BASIS,WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, 
either express or implied.See the License for the specific language governing 
permissions andlimitations under the License.-->Markdown Interpreter for 
Apache ZeppelinOverviewMarkdown is a plain text formatting syntax designed so 
that it can be converted to HTML.Apache Zeppelin uses flexmark, pegdown and 
markdown4j as markdown parsers.In Zeppelin notebook, you can use %md in the 
beginning of a paragraph to invoke the Markdown interpreter and generate static 
html from Markdown plain text.In Zeppelin, Markdown interpreter is enabled by 
default and uses the p
 egdown parser.ExampleThe following example demonstrates the basic usage of 
Markdown in a Zeppelin notebook.Mathematical expressionMarkdown interpreter 
leverages %html display system internally. That means you can mix mathematical 
expressions with markdown syntax. For more information, please see Mathematical 
Expression section.Configuration      Name    Default Value    Description      
  markdown.parser.type    flexmark    Markdown Parser Type.  Available values: 
flexmark, pegdown, markdown4j.  Flexmark parser (Default Markdown 
Parser)CommonMark/Markdown Java parser with source level AST.flexmark parser 
provides YUML and Websequence extensions also.Pegdown Parserpegdown parser 
provides github flavored markdown. Although still one of the most popular 
Markdown parsing libraries for the JVM, pegdown has reached its end of life.The 
project is essentially unmaintained with tickets piling up and crucial bugs not 
being fixed.pegdown’s parsing performance is not great, but the parser is kept for backward compatibility.Markdown4j ParserSince the pegdown parser is more accurate and provides much more markdown syntax, the markdown4j option might be removed later; it is kept only for backward compatibility.",
       "url": " /interpreter/markdown.html",
       "group": "interpreter",
       "excerpt": "Markdown is a plain text formatting syntax designed so that 
it can be converted to HTML. Apache Zeppelin uses markdown4j."
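The Markdown entry above is driven by a single setting, markdown.parser.type (flexmark by default in this release, with pegdown and markdown4j as alternatives). A small illustrative paragraph, with placeholder content, showing what the interpreter renders to static HTML:

    %md
    ## Release notes
    * GitHub flavored markdown is handled by both the flexmark and pegdown parsers
    * mathematical expressions can be mixed in through the %html display system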
@@ -34,6 +34,17 @@
     
   
 
+    "/interpreter/submarine.html": {
+      "title": "Apache Hadoop Submarine Interpreter for Apache Zeppelin",
+      "content"  : "<!--Licensed under the Apache License, Version 2.0 (the 
"License");you may not use this file except in compliance with the 
License.You may obtain a copy of the License 
athttp://www.apache.org/licenses/LICENSE-2.0Unless required by applicable law 
or agreed to in writing, softwaredistributed under the License is distributed 
on an "AS IS" BASIS,WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, 
either express or implied.See the License for the specific language governing 
permissions andlimitations under the License.-->Submarine Interpreter for 
Apache ZeppelinHadoop Submarine  is the latest machine learning framework 
subproject in the Hadoop 3.1 release. It allows Hadoop to support Tensorflow, 
MXNet, Caffe, Spark, etc. A variety of deep learning frameworks provide a 
full-featured system framework for machine learning algorithm development, 
distributed model training, model management, and model publishing, combined 
with hadoop’s intrinsic data storage and data processing capabilities, enabling data scientists to better mine and extract value from their data.A deep learning algorithm project requires data acquisition, data processing, data cleaning, interactive visual programming, parameter tuning, algorithm testing, algorithm publishing, algorithm job scheduling, offline model training, online model services and many other steps. Zeppelin is a web-based notebook that 
supports interactive data analysis. You can use SQL, Scala, Python, etc. to 
make data-driven, interactive, collaborative documents.You can use the more 
than 20 interpreters in zeppelin (for example: spark, hive, Cassandra, 
Elasticsearch, Kylin, HBase, etc.) to collect, clean, and extract features from the data in Hadoop, completing the data preprocessing before machine learning model training.By integrating submarine in 
zeppelin, we use zeppelin's data discovery, data analysis and data 
visualization and coll
 aboration capabilities to visualize the results of algorithm development and 
parameter adjustment during machine learning model training.ArchitectureAs 
shown in the figure above, the system architecture explains how Submarine develops and trains machine learning algorithms through Zeppelin.After installing and deploying Hadoop 3.1+ and Zeppelin, submarine 
will create a fully separate Zeppelin Submarine interpreter Docker container 
for each user in YARN. This container contains the development and runtime 
environment for Tensorflow. Zeppelin Server connects to the Zeppelin Submarine 
interpreter Docker container in YARN. This allows algorithm engineers to perform 
algorithm development and data visualization in Tensorflow's 
stand-alone environment in Zeppelin Notebook.After the algorithm is developed, 
the algorithm engineer can submit the algorithm directly from Zeppelin to YARN for offline training, with a real-time view of model training through Submarine’s TensorBoard for each algorithm engineer.You can not only complete the model training of the algorithm, but also use the more than twenty interpreters in Zeppelin to complete the data preprocessing of the model. For 
example, you can perform data extraction, filtering, and feature extraction 
through the Spark interpreter in Zeppelin in the Algorithm Note.In the future, 
you can also use Zeppelin's upcoming Workflow workflow orchestration 
service. You can complete Spark, Hive data processing and Tensorflow model 
training in one Note. It is organized into a workflow through visualization, 
etc., and the scheduling of jobs is performed in the production 
environment.OverviewAs shown in the figure above, from the perspective of the internal implementation, Submarine combines Zeppelin’s machine learning algorithm development and model training.The algorithm engineer creates a Tensorflow notebook (left image) in Zeppelin by using the Submarine interpreter.It 
is important to not
 e that you need to complete the development of the entire algorithm in a 
Note.You can use Spark for data preprocessing in some of the paragraphs in 
Note.Use Python for algorithm development and debugging of Tensorflow in other 
paragraphs of the notebook. Submarine creates a Zeppelin Submarine Interpreter 
Docker Container for you in YARN, which contains the following features and 
services:Shell Command line tool:Allows you to view the system environment in 
the Zeppelin Submarine Interpreter Docker Container, Install the extension 
tools you need or the Python dependencies.Kerberos lib:Allows you to perform 
kerberos authentication and access to Hadoop clusters with Kerberos 
authentication enabled.Tensorflow environment:Allows you to develop 
tensorflow algorithm code.Python environment:Allows you to develop tensorflow 
code.Complete the entire algorithm development within a Note in Zeppelin. If this 
algorithm contains multiple modules, You can write different algorithm modu
 les in multiple paragraphs in Note. The title of each paragraph is the name of 
the algorithm module. The content of the paragraph is the code content of this 
algorithm module.HDFS Client:Zeppelin Submarine Interpreter will 
automatically submit the algorithm code you wrote in Note to HDFS.Submarine 
interpreter Docker Image Submarine provides you with an image file that supports Tensorflow (CPU and GPU versions), with commonly used Python algorithm libraries preinstalled.You can also install other development 
dependencies you need on top of the base image provided by Submarine.When you 
complete the development of the algorithm module, You can do this by creating a 
new paragraph in Note and typing %submarine dashboard. Zeppelin will create a 
Submarine Dashboard. The machine learning algorithm written in this Note can be 
submitted to YARN as a JOB by selecting the JOB RUN command option in the 
Control Panel. Create a Tensorflow Model Training Docker Container, The contai
 ner contains the following sections:Tensorflow environmentHDFS Client Will 
automatically download the algorithm file from HDFS into the container for distributed model training and mount it to the Work Dir path of the container.Submarine Tensorflow Docker Image Submarine provides you with an image file that supports Tensorflow (CPU and GPU versions), with commonly used Python algorithm libraries preinstalled. You can 
also install other development dependencies you need on top of the base image 
provided by Submarine.      Name    Class    Description        %submarine    
SubmarineInterpreter    Provides interpreter for Apache Submarine dashboard     
   %submarine.sh    SubmarineShellInterpreter    Provides interpreter for 
Apache Submarine shell        %submarine.python    PySubmarineInterpreter    
Provides interpreter for Apache Submarine python  Submarine shellAfter creating 
a Note with Submarine Interpreter in Zeppelin, You can add a paragraph to N
 ote if you need it. Using the %submarine.sh identifier, you can use the Shell 
command to perform various operations on the Submarine Interpreter Docker 
Container, such as:View the Python version in the ContainerView the system 
environment of the ContainerInstall the dependencies you need yourselfKerberos 
certification with kinitUse Hadoop in Container for HDFS operations, 
etc.Submarine pythonYou can add one or more paragraphs to Note. Write the 
algorithm module for Tensorflow in Python using the %submarine.python 
identifier.Submarine DashboardAfter writing the Tensorflow algorithm by using 
%submarine.python, You can add a paragraph to Note. Enter the %submarine 
dashboard and execute it. Zeppelin will create a Submarine Dashboard.With 
Submarine Dashboard you can do all the operational control of Submarine, for 
example:Usage:Display Submarine's command description to help 
developers locate problems.Refresh:Zeppelin will erase all your input in the 
Dashboard.Tensorbo
 ard:You will be redirected to the Tensorboard WEB system created by 
Submarine for each user. With Tensorboard you can view the real-time status of 
the Tensorflow model training in real time.CommandJOB RUN:Selecting JOB RUN 
will display the parameter input interface for submitting JOB.      Name    
Description        Checkpoint Path    Submarine sets up a separate 
Checkpoint path for each user’s Note for Tensorflow training. It stores the training history of this Note and the model output data of training; Tensorboard uses the data in this path for model presentation. Users cannot 
modify it. For example: `hdfs://cluster1/...` , The environment variable name 
for Checkpoint Path is `%checkpoint_path%`, You can use `%checkpoint_path%` 
instead of the input value in Data Path in `PS Launch Cmd` and `Worker Launch 
Cmd`.        Input Path    The user specifies the data directory of the 
Tensorflow algorithm. Only HDFS-enabled directories are supported. The envir
 onment variable name for Data Path is `%input_path%`, You can use 
`%input_path%` instead of the input value in Data Path in `PS Launch Cmd` and 
`Worker Launch Cmd`.        PS Launch Cmd    Tensorflow Parameter services 
launch command, e.g. `python cifar10_main.py --data-dir=%input_path% --job-dir=%checkpoint_path% --num-gpus=0 ...`        Worker Launch Cmd    Tensorflow Worker services launch command, e.g. `python cifar10_main.py 
--data-dir=%input_path% --job-dir=%checkpoint_path% --num-gpus=1 ...`  JOB 
STOPYou can choose to execute the JOB STOP command to stop a Tensorflow model training task that has been submitted and is running.TENSORBOARD STARTYou can 
choose to execute the TENSORBOARD START command to create your TENSORBOARD 
Docker Container.TENSORBOARD STOPYou can choose to execute the TENSORBOARD STOP 
command to stop and destroy your TENSORBOARD Docker Container.Run 
Command:Execute the action command of your choiceClean Checkpoint:Che
 cking this option will clear the data in this Note's Checkpoint Path 
before each JOB RUN execution.ConfigurationZeppelin Submarine interpreter 
provides the following properties to customize the Submarine interpreter      
Attribute name    Attribute value    Description        
DOCKER_CONTAINER_TIME_ZONE    Etc/UTC    Set the time zone in the container     
        DOCKER_HADOOP_HDFS_HOME    /hadoop-3.1-0    Hadoop path in the following 3 images (SUBMARINE_INTERPRETER_DOCKER_IMAGE, tf.parameter.services.docker.image, tf.worker.services.docker.image)        DOCKER_JAVA_HOME    /opt/java    JAVA path in the following 3 images (SUBMARINE_INTERPRETER_DOCKER_IMAGE, tf.parameter.services.docker.image, tf.worker.services.docker.image)        HADOOP_YARN_SUBMARINE_JAR        Path to the Submarine JAR package in 
the Hadoop-3.1+ release installed on the Zeppelin server        INTERPRETER_LAUNCH_MODE    local/yarn    Run the Submarine interpreter instance in local or YARN mode; local is mainly for submarine interpreter development and debugging, while YARN mode is for production environments        SUBMARINE_HADOOP_CONF_DIR        Set the HADOOP-CONF path to support 
multiple Hadoop cluster environments        SUBMARINE_HADOOP_HOME        
Hadoop-3.1+ above path installed on the Zeppelin server        
SUBMARINE_HADOOP_KEYTAB        Keytab file path for a hadoop cluster with 
kerberos authentication turned on        SUBMARINE_HADOOP_PRINCIPAL        
PRINCIPAL information for the keytab file of the hadoop cluster with kerberos 
authentication turned on        SUBMARINE_INTERPRETER_DOCKER_IMAGE        When INTERPRETER_LAUNCH_MODE=yarn, Submarine uses this image to create a Zeppelin Submarine interpreter container as an algorithm development environment for the user.        docker.container.network        YARN’s Docker 
network name        machinelearing.distributed.enable        Whether the JOB RUN submission uses distributed-mode model training        shell.command.timeout.millisecs    
60000    Execute timeout settings for shell commands in the Submarine 
interpreter container        submarine.algorithm.hdfs.path        Save 
machine-based algorithms developed using Submarine interpreter to HDFS as files 
       submarine.yarn.queue    root.default    Submarine submits model training 
YARN queue name        tf.checkpoint.path        Tensorflow checkpoint path, 
Each user will create a user's checkpoint secondary path using the username 
under this path. Each algorithm submitted by the user will create a checkpoint 
three-level path using the note id (the user's Tensorboard uses the 
checkpoint data in this path for visual display)        
tf.parameter.services.cpu        Number of CPU cores applied to Tensorflow 
parameter services when Submarine submits model distributed training        
tf.parameter.services.docker.image        The Docker image Submarine uses for Tensorflow parameter services
  when submitting model distributed training        tf.parameter.services.gpu   
     GPU cores applied to Tensorflow parameter services when Submarine submits 
model distributed training        tf.parameter.services.memory    2G    Memory 
resources requested by Tensorflow parameter services when Submarine submits 
model distributed training        tf.parameter.services.num        Number of 
Tensorflow parameter services used by Submarine to submit model distributed 
training        tf.tensorboard.enable    true    Create a separate Tensorboard 
for each user        tf.worker.services.cpu        Number of CPU cores applied to Tensorflow worker services when Submarine submits model training        
tf.worker.services.docker.image        The Docker image Submarine uses for 
Tensorflow worker services when submitting model distributed training        
tf.worker.services.gpu        GPU cores applied to Tensorflow worker services when Submarine submits model training        tf.worker.services.memory        Memory resources requested by Tensorflow worker services when Submarine submits model training        tf.worker.services.num        Number of 
Tensorflow worker services used by Submarine to submit model distributed 
training        yarn.webapp.http.address    http://hadoop:8088    YARN web ui 
address        zeppelin.interpreter.rpc.portRange    29914    You need to 
export this port in the SUBMARINE_INTERPRETER_DOCKER_IMAGE configuration image. 
RPC communication for Zeppelin Server and Submarine interpreter containers      
  zeppelin.ipython.grpc.message_size    33554432    Message size setting for 
IPython grpc in Submarine interpreter container        
zeppelin.ipython.launch.timeout    30000    IPython execution timeout setting 
in Submarine interpreter container        zeppelin.python    python    
Execution path of python in Submarine interpreter container        
zeppelin.python.maxResult    10000    The maximum number of python execution 
results returned from the Subma
 rine interpreter container        zeppelin.python.useIPython    false    
IPython is currently not supported and must be false        
zeppelin.submarine.auth.type    simple/kerberos    Whether the Hadoop cluster has Kerberos authentication enabled  Docker imagesThe docker image files are stored in the 
zeppelin/scripts/docker/submarine directory.submarine interpreter cpu 
versionsubmarine interpreter gpu versiontensorflow 1.10 & hadoop 3.1.2 
cpu versiontensorflow 1.10 & hadoop 3.1.2 gpu versionChange Log0.1.0 
(Zeppelin 0.9.0) :Support distributed or standalone tensorflow model training.Support submarine interpreter running locally.Support submarine interpreter running on YARN.Support Docker on YARN-3.3.0; compatibility with lower versions of YARN is planned.Bugs & ContactsSubmarine interpreter BUGIf you 
encounter a bug for this interpreter, please create a sub JIRA ticket on 
ZEPPELIN-3856.Submarine Running problemIf you encounter a problem for Submarine 
runtime, please create a ISSUE on hadoop
 -submarine-ecosystem.YARN Submarine BUGIf you encounter a bug for Yarn 
Submarine, please create a JIRA ticket on SUBMARINE.DependencyYARNSubmarine 
currently need to run on Hadoop 3.3+The hadoop version of the hadoop submarine 
team git repository is periodically submitted to the code repository of the 
hadoop.The version of the git repository for the hadoop submarine team will be 
faster than the hadoop version release cycle.You can use the hadoop version of 
the hadoop submarine team git repository.Submarine runtime environmentyou can 
use Submarine-installer https://github.com/hadoopsubmarine, Deploy Docker and 
network environments.MoreHadoop Submarine Project: 
https://hadoop.apache.org/submarineYoutube Submarine Channel: 
https://www.youtube.com/channel/UC4JBt8Y8VJ0BW0IM9YpdCyQ";,
+      "url": " /interpreter/submarine.html",
+      "group": "interpreter",
+      "excerpt": "Hadoop Submarine is the latest machine learning framework 
subproject in the Hadoop 3.1 release. It allows Hadoop to support Tensorflow, 
MXNet, Caffe, Spark, etc."
+    }
+    ,
+    
+  
+
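To make the Submarine workflow above easier to follow, the three paragraph types it describes are normally combined in a single note; the shell commands below are hypothetical examples of the "view the system environment" and "install dependencies" steps:

    %submarine.sh
    # inspect the interpreter Docker container and add what you need
    python --version
    pip install numpy

    %submarine.python
    # one paragraph per algorithm module: paragraph title = module name, body = Tensorflow code

    %submarine dashboard
    # opens the Submarine Dashboard; JOB RUN submits the algorithm in this note to YARN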
     "/interpreter/mahout.html": {
       "title": "Mahout Interpreter for Apache Zeppelin",
       "content"  : "<!--Licensed under the Apache License, Version 2.0 (the 
"License");you may not use this file except in compliance with the 
License.You may obtain a copy of the License 
athttp://www.apache.org/licenses/LICENSE-2.0Unless required by applicable law 
or agreed to in writing, softwaredistributed under the License is distributed 
on an "AS IS" BASIS,WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, 
either express or implied.See the License for the specific language governing 
permissions andlimitations under the License.-->Apache Mahout Interpreter 
for Apache ZeppelinInstallationApache Mahout is a collection of packages that 
enable machine learning and matrix algebra on underlying engines such as Apache 
Flink or Apache Spark.  A convenience script for creating and configuring two 
Mahout enabled interpreters exists.  The %sparkMahout and %flinkMahout 
interpreters do not exist by default but can be easily created using this 
script.  Easy InstallationTo
  quickly and easily get up and running using Apache Mahout, run the following 
command from the top-level directory of the Zeppelin install:python 
scripts/mahout/add_mahout.pyThis will create the %sparkMahout and %flinkMahout 
interpreters, and restart Zeppelin.Advanced InstallationThe add_mahout.py 
script contains several command line arguments for advanced users.      
Argument    Description    Example        --zeppelin_home    This is the path 
to the Zeppelin installation.  This flag is not needed if the script is run 
from the top-level installation directory or from the zeppelin/scripts/mahout 
directory.    /path/to/zeppelin        --mahout_home    If the user has already 
installed Mahout, this flag can set the path to MAHOUT_HOME.  If this is set, 
downloading Mahout will be skipped.    /path/to/mahout_home        
--restart_later    Restarting is necessary for updates to take effect. By 
default the script will restart Zeppelin for you. Restart will be skipped if 
this flag is set. 
    NA        --force_download    This flag will force the script to 
re-download the binary even if it already exists.  This is useful for 
previously failed downloads.    NA          --overwrite_existing      This flag 
will force the script to overwrite existing %sparkMahout and %flinkMahout 
interpreters. Useful when you want to just start over.      NA    NOTE 1: 
Apache Mahout at this time only supports Spark 1.5 and Spark 1.6 and Scala 
2.10.  If the user is using another version of Spark (e.g. 2.0), the 
%sparkMahout will likely not work.  The %flinkMahout interpreter will still 
work and the user is encouraged to develop with that engine as the code can be 
ported via copy and paste, as is evidenced by the tutorial notebook.NOTE 2: If 
using Apache Flink in cluster mode, the following libraries will also need to 
be copied to ${FLINK_HOME}/lib- mahout-math-0.12.2.jar- 
mahout-math-scala2.10-0.12.2.jar- mahout-flink2.10-0.12.2.jar- 
mahout-hdfs-0.12.2.jar- com.google.guava:guava:14.0.1Ov
 erviewThe Apache Mahout™ project's goal is to build an environment 
for quickly creating scalable performant machine learning applications.Apache 
Mahout software provides three major features:A simple and extensible 
programming environment and framework for building scalable algorithmsA wide 
variety of premade algorithms for Scala + Apache Spark, H2O, Apache 
FlinkSamsara, a vector math experimentation environment with R-like syntax 
which works at scaleIn other words:Apache Mahout provides a unified API for 
quickly creating machine learning algorithms on a variety of engines.How to 
useWhen starting a session with Apache Mahout, depending on which engine you 
are using (Spark or Flink), a few imports must be made and a Distributed 
Context must be declared.  Copy and paste the following code and run once to 
get started.Flink%flinkMahoutimport org.apache.flink.api.scala._import 
org.apache.mahout.math.drm._import 
org.apache.mahout.math.drm.RLikeDrmOps._import org.apache.mahout
 .flinkbindings._import org.apache.mahout.math._import scalabindings._import 
RLikeOps._implicit val ctx = new 
FlinkDistributedContext(benv)Spark%sparkMahoutimport 
org.apache.mahout.math._import org.apache.mahout.math.scalabindings._import 
org.apache.mahout.math.drm._import 
org.apache.mahout.math.scalabindings.RLikeOps._import 
org.apache.mahout.math.drm.RLikeDrmOps._import 
org.apache.mahout.sparkbindings._implicit val sdc: 
org.apache.mahout.sparkbindings.SparkDistributedContext = sc2sdc(sc)Same Code, 
Different EnginesAfter importing and setting up the distributed context, the 
Mahout R-Like DSL is consistent across engines.  The following code will run in 
both %flinkMahout and %sparkMahoutval drmData = drmParallelize(dense(  (2, 2, 
10.5, 10, 29.509541),  // Apple Cinnamon Cheerios  (1, 2, 12,   12, 18.042851), 
 // Cap'n'Crunch  (1, 1, 12,   13, 22.736446),  // Cocoa Puffs  
(2, 1, 11,   13, 32.207582),  // Froot Loops  (1, 2, 12,   11, 21.871292),  // 
Honey Graham Ohs  (
 2, 1, 16,   8,  36.187559),  // Wheaties Honey Gold  (6, 2, 17,   1,  
50.764999),  // Cheerios  (3, 2, 13,   7,  40.400208),  // Clusters  (3, 3, 13, 
  4,  45.811716)), numPartitions = 2)drmData.collect(::, 0 until 4)val drmX = 
drmData(::, 0 until 4)val y = drmData.collect(::, 4)val drmXtX = drmX.t %*% 
drmXval drmXty = drmX.t %*% yval XtX = drmXtX.collectval Xty = 
drmXty.collect(::, 0)val beta = solve(XtX, Xty)Leveraging Resource Pools and R 
for VisualizationResource Pools are a powerful Zeppelin feature that lets us 
share information between interpreters. A fun trick is to take the output of 
our work in Mahout and analyze it in other languages.Setting up a Resource Pool 
in FlinkIn Spark based interpreters resource pools are accessed via the 
ZeppelinContext API.  Putting and getting things from the resource pool can be done simply:val myVal = 1z.put(“foo”, myVal)val myFetchedVal = 
z.get("foo")To add this functionality to a Flink based 
interpreter we declare the following%flinkMahoutimport 
org.apache.zeppelin.interpreter.InterpreterContextval z = 
InterpreterContext.get().getResourcePool()Now we can access the resource pool 
in a consistent manner from the %flinkMahout interpreter.Passing a variable 
from Mahout to R and PlottingIn this simple example, we use Mahout (on Flink or 
Spark, the code is the same) to create a random matrix and then take the Sin of 
each element. We then randomly sample the matrix and create a tab separated 
string. Finally we pass that string to R where it is read as a .tsv file, and a 
DataFrame is created and plotted using native R plotting libraries.val mxRnd = 
Matrices.symmetricUniformView(5000, 2, 1234)val drmRand = 
drmParallelize(mxRnd)val drmSin = drmRand.mapBlock() {case (keys, block) 
=>    val blockB = block.like()  for (i <- 0 until block.nrow) {  
  blockB(i, 0) = block(i, 0)    blockB(i, 1) = Math.sin((block(i, 0) * 8))  }  
keys -> blockB}z.put("sinDrm", org
 .apache.mahout.math.drm.drmSampleToTSV(drmSin, 0.85))And then in an R 
paragraph...%spark.r {"imageWidth": 
"400px"}library("ggplot2")sinStr = 
z.get(“sinDrm”)data <- read.table(text= sinStr, 
sep="t", header=FALSE)plot(data,  
col="red")",
@@ -47,7 +58,7 @@
 
     "/interpreter/spark.html": {
       "title": "Apache Spark Interpreter for Apache Zeppelin",
-      "content"  : "<!--Licensed under the Apache License, Version 2.0 (the 
"License");you may not use this file except in compliance with the 
License.You may obtain a copy of the License 
athttp://www.apache.org/licenses/LICENSE-2.0Unless required by applicable law 
or agreed to in writing, softwaredistributed under the License is distributed 
on an "AS IS" BASIS,WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, 
either express or implied.See the License for the specific language governing 
permissions andlimitations under the License.-->Spark Interpreter for Apache 
ZeppelinOverviewApache Spark is a fast and general-purpose cluster computing 
system.It provides high-level APIs in Java, Scala, Python and R, and an 
optimized engine that supports general execution graphs.Apache Spark is 
supported in Zeppelin with Spark interpreter group which consists of below five 
interpreters.      Name    Class    Description        %spark    
SparkInterpreter    Creates a SparkConte
 xt and provides a Scala environment        %spark.pyspark    
PySparkInterpreter    Provides a Python environment        %spark.r    
SparkRInterpreter    Provides an R environment with SparkR support        
%spark.sql    SparkSQLInterpreter    Provides a SQL environment        
%spark.dep    DepInterpreter    Dependency loader  ConfigurationThe Spark 
interpreter can be configured with properties provided by Zeppelin.You can also 
set other Spark properties which are not listed in the table. For a list of 
additional properties, refer to Spark Available Properties.      Property    
Default    Description        args        Spark commandline args      master    
local[*]    Spark master uri.  ex) spark://masterhost:7077      spark.app.name  
  Zeppelin    The name of spark application.        spark.cores.max        
Total number of cores to use.  Empty value uses all available cores.        
spark.executor.memory     1g    Executor memory per worker instance.  ex) 512m, 
32g        zeppelin.dep
 .additionalRemoteRepository    spark-packages,  
http://dl.bintray.com/spark-packages/maven,  false;    A list of 
id,remote-repository-URL,is-snapshot;  for each remote repository.        
zeppelin.dep.localrepo    local-repo    Local repository for dependency loader  
      PYSPARKPYTHON    python    Python binary executable to use for PySpark in 
both driver and workers (default is python).            Property 
spark.pyspark.python take precedence if it is set        PYSPARKDRIVERPYTHON    
python    Python binary executable to use for PySpark in driver only (default 
is PYSPARKPYTHON).            Property spark.pyspark.driver.python take 
precedence if it is set        zeppelin.spark.concurrentSQL    false    Execute 
multiple SQL concurrently if set true.        zeppelin.spark.concurrentSQL.max  
  10    Max number of SQL concurrently executed        zeppelin.spark.maxResult 
   1000    Max number of Spark SQL results to display.        
zeppelin.spark.printREPLOutput    true    Print REPL o
 utput        zeppelin.spark.useHiveContext    true    Use HiveContext instead 
of SQLContext if it is true.        zeppelin.spark.importImplicit    true    
Import implicits, UDF collection, and sql if set true.        
zeppelin.spark.enableSupportedVersionCheck    true    Do not change - developer 
only setting, not for production use        zeppelin.spark.sql.interpolation    
false    Enable ZeppelinContext variable interpolation into paragraph text      
zeppelin.spark.uiWebUrl        Overrides Spark UI default URL. Value should be 
a full URL (ex: http://{hostName}/{uniquePath})    zeppelin.spark.scala.color    
true    Whether to enable color output of spark scala interpreter  Without any 
configuration, Spark interpreter works out of the box in local mode. But if you want to connect to your Spark cluster, you'll need to follow the two simple steps below.1. Export SPARK_HOMEIn conf/zeppelin-env.sh, export the SPARK_HOME 
environment variable with your Spark installation path.For example,expo
 rt SPARK_HOME=/usr/lib/sparkYou can optionally set more environment variables# 
set hadoop conf direxport HADOOP_CONF_DIR=/usr/lib/hadoop# set options to pass 
spark-submit commandexport SPARK_SUBMIT_OPTIONS="--packages 
com.databricks:spark-csv_2.10:1.2.0"# extra classpath. e.g. set 
classpath for hive-site.xmlexport 
ZEPPELIN_INTP_CLASSPATH_OVERRIDES=/etc/hive/confFor Windows, ensure you have 
winutils.exe in %HADOOP_HOME%bin. Please see Problems running Hadoop on Windows 
for the details.2. Set master in Interpreter menuAfter starting Zeppelin, go to the Interpreter menu and edit the master property in your Spark interpreter setting. 
The value may vary depending on your Spark cluster deployment type.For 
example,local[*] in local modespark://master:7077 in standalone 
clusteryarn-client in Yarn client modeyarn-cluster in Yarn cluster 
modemesos://host:5050 in Mesos clusterThat's it. Zeppelin will work 
with any version of Spark and any deployment type without rebuilding Zeppe
 lin in this way.For the further information about Spark & Zeppelin 
version compatibility, please refer to "Available 
Interpreters" section in Zeppelin download page.Note that without 
exporting SPARK_HOME, it's running in local mode with included version 
of Spark. The included version may vary depending on the build profile.3. Yarn 
modeZeppelin supports both yarn client and yarn cluster mode (yarn cluster mode is supported from 0.8.0). For yarn mode, you must specify SPARK_HOME & HADOOP_CONF_DIR.You can either specify them in zeppelin-env.sh, or in the interpreter setting page. Specifying them in zeppelin-env.sh means you can use only one version of spark & hadoop. Specifying them in the interpreter setting page means you can use multiple versions of spark & hadoop in 
one zeppelin instance.4. New Version of SparkInterpreterThere's one new 
version of SparkInterpreter with better spark support and code completion 
starting from Zep
 pelin 0.8.0. We enable it by default, but user can still use the old version 
of SparkInterpreter by setting zeppelin.spark.useNew as false in its 
interpreter setting.SparkContext, SQLContext, SparkSession, 
ZeppelinContextSparkContext, SQLContext and ZeppelinContext are automatically 
created and exposed as variable names sc, sqlContext and z, respectively, in 
Scala, Python and R environments.Starting from 0.6.1 SparkSession is available 
as variable spark when you are using Spark 2.x.Note that Scala/Python/R 
environment shares the same SparkContext, SQLContext and ZeppelinContext 
instance. How to pass property to SparkConfThere're 2 kinds of 
properties that would be passed to SparkConfStandard spark property (prefix 
with spark.). e.g. spark.executor.memory will be passed to 
SparkConfNon-standard spark property (prefix with zeppelin.spark.).  e.g. 
zeppelin.spark.property_1, property_1 will be passed to SparkConfDependency 
ManagementThere are two ways to load external libraries i
 n Spark interpreter. First is using interpreter setting menu and second is 
loading Spark properties.1. Setting Dependencies via Interpreter SettingPlease 
see Dependency Management for the details.2. Loading Spark PropertiesOnce 
SPARK_HOME is set in conf/zeppelin-env.sh, Zeppelin uses spark-submit as spark 
interpreter runner. spark-submit supports two ways to load configurations.The 
first is command line options such as --master and Zeppelin can pass these 
options to spark-submit by exporting SPARK_SUBMIT_OPTIONS in 
conf/zeppelin-env.sh. Second is reading configuration options from 
SPARK_HOME/conf/spark-defaults.conf. Spark properties that user can set to 
distribute libraries are:      spark-defaults.conf    SPARK_SUBMIT_OPTIONS    
Description        spark.jars    --jars    Comma-separated list of local jars 
to include on the driver and executor classpaths.        spark.jars.packages    
--packages    Comma-separated list of maven coordinates of jars to include on 
the driver and execu
 tor classpaths. Will search the local maven repo, then maven central and any 
additional remote repositories given by --repositories. The format for the 
coordinates should be groupId:artifactId:version.        spark.files    --files 
   Comma-separated list of files to be placed in the working directory of each 
executor.  Here are few examples:SPARK_SUBMIT_OPTIONS in 
conf/zeppelin-env.shexport SPARK_SUBMIT_OPTIONS="--packages 
com.databricks:spark-csv_2.10:1.2.0 --jars /path/mylib1.jar,/path/mylib2.jar 
--files 
/path/mylib1.py,/path/mylib2.zip,/path/mylib3.egg"SPARK_HOME/conf/spark-defaults.confspark.jars
        /path/mylib1.jar,/path/mylib2.jarspark.jars.packages   
com.databricks:spark-csv_2.10:1.2.0spark.files       
/path/mylib1.py,/path/mylib2.egg,/path/mylib3.zip3. Dynamic Dependency Loading 
via %spark.dep interpreterNote: %spark.dep interpreter loads libraries to 
%spark and %spark.pyspark but not to  %spark.sql interpreter. So we recommend 
you to use the first opt
 ion instead.When your code requires external library, instead of doing 
download/copy/restart Zeppelin, you can easily do following jobs using 
%spark.dep interpreter.Load libraries recursively from maven repositoryLoad 
libraries from local filesystemAdd additional maven repositoryAutomatically add 
libraries to SparkCluster (You can turn off)Dep interpreter leverages Scala 
environment. So you can write any Scala code here.Note that %spark.dep 
interpreter should be used before %spark, %spark.pyspark, 
%spark.sql.Here are the usages.%spark.depz.reset() // clean up previously 
added artifact and repository// add maven 
repositoryz.addRepo("RepoName").url("RepoURL")//
 add maven snapshot 
repositoryz.addRepo("RepoName").url("RepoURL").snapshot()//
 add credentials for private maven 
repositoryz.addRepo("RepoName").url("RepoURL").username("username").password("p
 assword")// add artifact from 
filesystemz.load("/path/to.jar")// add artifact from maven 
repository, with no 
dependencyz.load("groupId:artifactId:version").excludeAll()// 
add artifact 
recursivelyz.load("groupId:artifactId:version")// add 
artifact recursively except comma separated GroupID:ArtifactId 
listz.load("groupId:artifactId:version").exclude("groupId:artifactId,groupId:artifactId,
 ...")// exclude with 
patternz.load("groupId:artifactId:version").exclude(*)z.load("groupId:artifactId:version").exclude("groupId:artifactId:*")z.load("groupId:artifactId:version").exclude("groupId:*")//
 local() skips adding artifact to spark clusters (skipping 
sc.addJar())z.load("groupId:artifactId:version").local()ZeppelinContextZeppelin
 automatically injects ZeppelinContext as variable z in your
  Scala/Python environment. ZeppelinContext provides some additional functions 
and utilities.See Zeppelin-Context for more details.Matplotlib Integration 
(pyspark)Both the python and pyspark interpreters have built-in support for 
inline visualization using matplotlib,a popular plotting library for python. 
More details can be found in the python interpreter documentation,since 
matplotlib support is identical. More advanced interactive plotting can be done 
with pyspark throughutilizing Zeppelin's built-in Angular Display 
System, as shown below:Running spark sql concurrentlyBy default, each sql 
statement would run sequentially in %spark.sql. But you can run them 
concurrently with the following setup.set zeppelin.spark.concurrentSQL to true to 
enable the sql concurrent feature, underneath zeppelin will change to use 
fairscheduler for spark. And also set zeppelin.spark.concurrentSQL.max to 
control the max number of sql statements running concurrently.configure pools 
by creating fairscheduler.xml under your SPARK_CONF_DIR, check the official spark doc Configuring 
Pool Propertiesset pool property via setting paragraph property. 
e.g.%spark(pool=pool1)sql statementThis feature is available for all versions of scala spark and pyspark. For sparkr, it is only available starting from 2.3.0.Interpreter setting optionYou can choose one of the shared, scoped and isolated options when you configure the Spark interpreter.The Spark interpreter creates a separate Scala compiler per notebook but shares a single SparkContext in scoped mode (experimental).It creates a separate SparkContext per notebook 
in isolated mode.IPython supportBy default, zeppelin would use IPython in 
pyspark when IPython is available; otherwise it would fall back to the original PySpark implementation.If you don't want to use IPython, then you can set zeppelin.pyspark.useIPython as false in the interpreter setting. For the IPython features, you can refer to the doc Python InterpreterSetting up Zeppelin with 
Kerbero
 sLogical setup with Zeppelin, Kerberos Key Distribution Center (KDC), and 
Spark on YARN:Configuration SetupOn the server that Zeppelin is installed, 
install Kerberos client modules and configuration, krb5.conf.This is to make 
the server communicate with KDC.Set SPARK_HOME in 
[ZEPPELIN_HOME]/conf/zeppelin-env.sh to use spark-submit(Additionally, you 
might have to set export HADOOP_CONF_DIR=/etc/hadoop/conf)Add the two 
properties below to Spark configuration 
([SPARK_HOME]/conf/spark-defaults.conf):spark.yarn.principalspark.yarn.keytabNOTE:
 If you do not have permission to access the above spark-defaults.conf 
file, optionally, you can add the above lines to the Spark Interpreter setting 
through the Interpreter tab in the Zeppelin UI.That's it. Play with 
Zeppelin!",
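The "How to pass property to SparkConf" rule in the Spark entry above is easiest to see with two interpreter-setting entries; the property name and values on the second line are examples only:

    spark.executor.memory        4g         # standard spark.* property, passed to SparkConf unchanged
    zeppelin.spark.property_1    value_1    # zeppelin.spark. prefix is stripped, SparkConf receives property_1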
+      "content"  : "<!--Licensed under the Apache License, Version 2.0 (the 
"License");you may not use this file except in compliance with the 
License.You may obtain a copy of the License 
athttp://www.apache.org/licenses/LICENSE-2.0Unless required by applicable law 
or agreed to in writing, softwaredistributed under the License is distributed 
on an "AS IS" BASIS,WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, 
either express or implied.See the License for the specific language governing 
permissions andlimitations under the License.-->Spark Interpreter for Apache 
ZeppelinOverviewApache Spark is a fast and general-purpose cluster computing 
system.It provides high-level APIs in Java, Scala, Python and R, and an 
optimized engine that supports general execution graphs.Apache Spark is 
supported in Zeppelin with Spark interpreter group which consists of below five 
interpreters.      Name    Class    Description        %spark    
SparkInterpreter    Creates a SparkConte
 xt and provides a Scala environment        %spark.pyspark    
PySparkInterpreter    Provides a Python environment        %spark.r    
SparkRInterpreter    Provides an R environment with SparkR support        
%spark.sql    SparkSQLInterpreter    Provides a SQL environment        
%spark.dep    DepInterpreter    Dependency loader  ConfigurationThe Spark 
interpreter can be configured with properties provided by Zeppelin.You can also 
set other Spark properties which are not listed in the table. For a list of 
additional properties, refer to Spark Available Properties.      Property    
Default    Description        args        Spark commandline args      master    
local[*]    Spark master uri.  ex) spark://masterhost:7077      spark.app.name  
  Zeppelin    The name of spark application.        spark.cores.max        
Total number of cores to use.  An empty value uses all available cores.        
spark.executor.memory     1g    Executor memory per worker instance.  ex) 512m, 
32g        zeppelin.dep.additionalRemoteRepository    spark-packages,  
http://dl.bintray.com/spark-packages/maven,  false;    A list of 
id,remote-repository-URL,is-snapshot;  for each remote repository.        
zeppelin.dep.localrepo    local-repo    Local repository for dependency loader  
      PYSPARK_PYTHON    python    Python binary executable to use for PySpark 
in both driver and workers (default is python).            Property 
spark.pyspark.python takes precedence if it is set        PYSPARK_DRIVER_PYTHON  
  python    Python binary executable to use for PySpark in driver only (default 
is PYSPARK_PYTHON).            Property spark.pyspark.driver.python takes 
precedence if it is set        zeppelin.spark.concurrentSQL    false    Execute 
multiple SQL concurrently if set true.        zeppelin.spark.concurrentSQL.max  
  10    Max number of SQL concurrently executed        zeppelin.spark.maxResult 
   1000    Max number of Spark SQL result to display.        
zeppelin.spark.printREPLOutput    true    Print REPL output        zeppelin.spark.useHiveContext    true    Use HiveContext 
instead of SQLContext if it is true.        zeppelin.spark.importImplicit    
true    Import implicits, UDF collection, and sql if set true.        
zeppelin.spark.enableSupportedVersionCheck    true    Do not change - developer 
only setting, not for production use        zeppelin.spark.sql.interpolation    
false    Enable ZeppelinContext variable interpolation into paragraph text      
zeppelin.spark.uiWebUrl        Overrides Spark UI default URL. Value should be 
a full URL (ex: http://{hostName}/{uniquePath})    zeppelin.spark.scala.color    
true    Whether to enable color output of spark scala interpreter  Without any 
configuration, the Spark interpreter works out of the box in local mode. But if you 
want to connect to your Spark cluster, you&#39;ll need to follow the two 
simple steps below.1. Export SPARK_HOMEIn conf/zeppelin-env.sh, export the SPARK_HOME 
environment variable with your Spark installation path.For example,
 export SPARK_HOME=/usr/lib/sparkYou can optionally set more environment 
variables# set hadoop conf direxport HADOOP_CONF_DIR=/usr/lib/hadoop# set 
options to pass spark-submit commandexport 
SPARK_SUBMIT_OPTIONS="--packages 
com.databricks:spark-csv_2.10:1.2.0"# extra classpath. e.g. set 
classpath for hive-site.xmlexport 
ZEPPELIN_INTP_CLASSPATH_OVERRIDES=/etc/hive/confFor Windows, ensure you have 
winutils.exe in %HADOOP_HOME%\bin. Please see Problems running Hadoop on Windows 
for the details.2. Set master in Interpreter menuAfter starting Zeppelin, go to the 
Interpreter menu and edit the master property in your Spark interpreter setting. 
The value may vary depending on your Spark cluster deployment type.For 
example,local[*] in local modespark://master:7077 in standalone 
clusteryarn-client in Yarn client modeyarn-cluster in Yarn cluster 
modemesos://host:5050 in Mesos clusterThat's it. Zeppelin will work 
with any version of Spark and any deployment type without rebuilding Zeppelin in this way.For further information about Spark &amp; Zeppelin 
version compatibility, please refer to "Available 
Interpreters" section in Zeppelin download page.Note that without 
exporting SPARK_HOME, it's running in local mode with included version 
of Spark. The included version may vary depending on the build profile.3. Yarn 
modeZeppelin supports both yarn client and yarn cluster mode (yarn cluster mode 
is supported from 0.8.0). For yarn mode, you must specify SPARK_HOME &amp; 
HADOOP_CONF_DIR.You can either specify them in zeppelin-env.sh, or in the 
interpreter setting page. Specifying them in zeppelin-env.sh means you can use 
only one version of spark &amp; hadoop. Specifying them in the interpreter 
setting page means you can use multiple versions of spark &amp; hadoop in 
one zeppelin instance.4. New Version of SparkInterpreterStarting from 0.9, we 
completely removed the old spark interpreter implementation and made the new 
spark interpreter the official spark interpreter.SparkContext, SQLContext, SparkSession, 
ZeppelinContextSparkContext, SQLContext and ZeppelinContext are automatically 
created and exposed as variable names sc, sqlContext and z, respectively, in 
Scala, Python and R environments.Starting from 0.6.1, SparkSession is available 
as the variable spark when you are using Spark 2.x.Note that the Scala/Python/R 
environments share the same SparkContext, SQLContext and ZeppelinContext 
instance. How to pass property to SparkConfThere are 2 kinds of 
properties that are passed to SparkConfStandard spark property (prefix 
with spark.). e.g. spark.executor.memory will be passed to 
SparkConfNon-standard spark property (prefix with zeppelin.spark.).  e.g. 
zeppelin.spark.property_1 will be passed to SparkConf as property_1Dependency 
ManagementFor spark interpreter, you should not use Zeppelin's 
Dependency Management for managing third party dependencies (%spark.dep is also 
not the recommended approach starting from Zeppelin 0.8). Instead you should set spark properties (spark.jars, 
spark.files, spark.jars.packages) in 2 ways.      spark-defaults.conf    
SPARK_SUBMIT_OPTIONS    Description        spark.jars    --jars    
Comma-separated list of local jars to include on the driver and executor 
classpaths.        spark.jars.packages    --packages    Comma-separated list of 
maven coordinates of jars to include on the driver and executor classpaths. 
Will search the local maven repo, then maven central and any additional remote 
repositories given by --repositories. The format for the coordinates should be 
groupId:artifactId:version.        spark.files    --files    Comma-separated 
list of files to be placed in the working directory of each executor.  1. Set 
spark properties on the zeppelin side.On the zeppelin side, you can either set them in the 
spark interpreter setting page or via the Generic ConfInterpreter.It is not 
recommended to set them in SPARK_SUBMIT_OPTIONS, because that is shared by 
all spark interpreters, so you can not set different dependencies for different users.2. 
Set spark properties on the spark side.On the spark side, you can set them in 
spark-defaults.conf.e.g.    spark.jars        /path/mylib1.jar,/path/mylib2.jar 
   spark.jars.packages   com.databricks:spark-csv_2.10:1.2.0    spark.files     
  /path/mylib1.py,/path/mylib2.egg,/path/mylib3.zipZeppelinContextZeppelin 
automatically injects ZeppelinContext as variable z in your Scala/Python 
environment. ZeppelinContext provides some additional functions and 
utilities.See Zeppelin-Context for more details.Matplotlib Integration 
(pyspark)Both the python and pyspark interpreters have built-in support for 
inline visualization using matplotlib, a popular plotting library for python. 
More details can be found in the python interpreter documentation, since 
matplotlib support is identical. More advanced interactive plotting can be done 
with pyspark through utilizing Zeppelin&#39;s built-in Angular Display 
System, as shown below:
 Running spark sql concurrentlyBy default, each sql statement would run 
sequentially in %spark.sql. But you can run them concurrently with the following 
setup.set zeppelin.spark.concurrentSQL to true to enable the concurrent sql 
feature; underneath, zeppelin will switch to the fair scheduler for spark. And 
also set zeppelin.spark.concurrentSQL.max to control the max number of sql 
statements running concurrently.configure pools by creating fairscheduler.xml 
under your SPARK_CONF_DIR, check the official spark doc Configuring Pool 
Propertiesset the pool property via a paragraph property, 
e.g.%spark(pool=pool1)sql statementThis feature is available for all 
versions of scala spark and pyspark. For sparkr, it is only available starting 
from 2.3.0.Interpreter setting optionYou can choose one of the shared, scoped and 
isolated options when you configure the Spark interpreter.The Spark interpreter creates 
a separate Scala compiler per notebook but shares a single SparkContext in 
scoped mode (experimental). It creates a separate SparkContext per notebook in isolated mode.IPython 
supportBy default, zeppelin would use IPython in pyspark when IPython is 
available. Otherwise it would fall back to the original PySpark 
implementation.If you don&#39;t want to use IPython, then you can set 
zeppelin.pyspark.useIPython to false in the interpreter setting. For the IPython 
features, you can refer to the doc Python InterpreterSetting up Zeppelin with 
KerberosLogical setup with Zeppelin, Kerberos Key Distribution Center (KDC), 
and Spark on YARN:Deprecate Spark 2.2 and earlier versionsStarting from 0.9, 
Zeppelin deprecates Spark 2.2 and earlier versions, so you will see a warning 
message when you use Spark 2.2 and earlier.You can get rid of this message by 
setting zeppelin.spark.deprecatedMsg.show to false.Configuration SetupOn the 
server where Zeppelin is installed, install the Kerberos client modules and 
configuration, krb5.conf.This is to make the server communicate with the KDC.Set 
SPARK_HOME in [ZEPPELIN_HOME]/conf/zeppelin-env.sh to use spark-submit(Additionally, you might have to set 
export HADOOP_CONF_DIR=/etc/hadoop/conf)Add the two properties below to Spark 
configuration 
([SPARK_HOME]/conf/spark-defaults.conf):spark.yarn.principalspark.yarn.keytabNOTE:
 If you do not have permission to access the above spark-defaults.conf 
file, you can optionally add the above lines to the Spark Interpreter setting 
through the Interpreter tab in the Zeppelin UI.That's it. Play with 
Zeppelin!",
       "url": " /interpreter/spark.html",
       "group": "interpreter",
       "excerpt": "Apache Spark is a fast and general-purpose cluster computing 
system. It provides high-level APIs in Java, Scala, Python and R, and an 
optimized engine that supports general execution graphs."
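As the spark.html entry above notes, sc, sqlContext, spark (on Spark 2.x) and the ZeppelinContext z are injected into every %spark paragraph. A minimal sketch of what that looks like in a notebook, assuming Spark 2.x and an illustrative temp view name:

%spark
// sc, spark and z are provided by the interpreter; nothing needs to be created here.
val df = spark.range(0, 10).toDF("value")
df.createOrReplaceTempView("numbers")   // "numbers" is just an illustrative view name
// z is the ZeppelinContext described above; z.show renders the result through Zeppelin's table display.
z.show(spark.sql("select value, value * value as squared from numbers"))

Using z.show rather than DataFrame.show keeps the output in Zeppelin's display system instead of plain console text.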
@@ -80,7 +91,7 @@
 
     "/interpreter/ignite.html": {
       "title": "Ignite Interpreter for Apache Zeppelin",
-      "content"  : "<!--Licensed under the Apache License, Version 2.0 (the 
"License");you may not use this file except in compliance with the 
License.You may obtain a copy of the License 
athttp://www.apache.org/licenses/LICENSE-2.0Unless required by applicable law 
or agreed to in writing, softwaredistributed under the License is distributed 
on an "AS IS" BASIS,WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, 
either express or implied.See the License for the specific language governing 
permissions andlimitations under the License.-->Ignite Interpreter for 
Apache ZeppelinOverviewApache Ignite In-Memory Data Fabric is a 
high-performance, integrated and distributed in-memory platform for computing 
and transacting on large-scale data sets in real-time, orders of magnitude 
faster than possible with traditional disk-based or flash technologies.You can 
use Zeppelin to retrieve distributed data from cache using Ignite SQL 
interpreter. Moreover, Ignite interpreter allo
 ws you to execute any Scala code in cases when SQL doesn't fit to your 
requirements. For example, you can populate data into your caches or execute 
distributed computations.Installing and Running Ignite exampleIn order to use 
Ignite interpreters, you may install Apache Ignite in some simple steps:Ignite 
provides examples only with source or binary release. Download Ignite source 
release or binary release whatever you want. But you must download Ignite as 
the same version of Zeppelin's. If it is not, you can't use 
scala code on Zeppelin. The supported Ignite version is specified in Supported 
Interpreter table for each Zeppelin release. If you're using Zeppelin 
master branch, please see ignite.version in 
path/to/your-Zeppelin/ignite/pom.xml.Examples are shipped as a separate Maven 
project, so to start running you simply need to import provided 
<dest_dir>/apache-ignite-fabric-{version}-bin/examples/pom.xml 
file into your favourite IDE, such
  as Eclipse.In case of Eclipse, Eclipse -> File -> Import 
-> Existing Maven ProjectsSet examples directory path to Eclipse and 
select the pom.xml.Then start org.apache.ignite.examples.ExampleNodeStartup (or 
whatever you want) to run at least one or more ignite node. When you run 
example code, you may notice that the number of node is increase one by 
one.Tip. If you want to run Ignite examples on the cli not IDE, you can export 
executable Jar file from IDE. Then run it by using below command.nohup java 
-jar </path/to/your Jar file name>Configuring Ignite 
InterpreterAt the "Interpreters" menu, you may edit Ignite 
interpreter or create new one. Zeppelin provides these properties for Ignite.   
   Property Name    value    Description        ignite.addresses    
127.0.0.1:47500..47509    Coma separated list of Ignite cluster hosts. See 
[Ignite Cluster 
Configuration](https://apacheignite.readme.io/docs/cluster-config) section for 
more de
 tails.        ignite.clientMode    true    You can connect to the Ignite 
cluster as client or server node. See [Ignite Clients vs. 
Servers](https://apacheignite.readme.io/docs/clients-vs-servers) section for 
details. Use true or false values in order to connect in client or server mode 
respectively.        ignite.config.url        Configuration URL. Overrides all 
other settings.        ignite.jdbc.url    
jdbc:ignite:cfg://default-ignite-jdbc.xml    Ignite JDBC connection URL.        
ignite.peerClassLoadingEnabled    true    Enables peer-class-loading. See [Zero 
Deployment](https://apacheignite.readme.io/docs/zero-deployment) section for 
details. Use true or false values in order to enable or disable P2P class 
loading respectively.  How to useAfter configuring Ignite interpreter, create 
your own notebook. Then you can bind interpreters like below image.For more 
interpreter binding information see here.Ignite SQL interpreterIn order to 
execute SQL query, use %ignite.ignitesql prefix. 
 Supposing you are running 
org.apache.ignite.examples.streaming.wordcount.StreamWords, then you can use 
"words" cache( Of course you have to specify this cache name 
to the Ignite interpreter setting section ignite.jdbc.url of Zeppelin ).For 
example, you can select top 10 words in the words cache using the following 
query%ignite.ignitesqlselect _val, count(_val) as cnt from String group by _val 
order by cnt desc limit 10As long as your Ignite version and Zeppelin Ignite 
version is same, you can also use scala code. Please check the Zeppelin Ignite 
version before you download your own Ignite.%igniteimport 
org.apache.ignite._import org.apache.ignite.cache.affinity._import 
org.apache.ignite.cache.query._import org.apache.ignite.configuration._import 
scala.collection.JavaConversions._val cache: IgniteCache[AffinityUuid, String] 
= ignite.cache("words")val qry = new 
SqlFieldsQuery("select avg(cnt), min(cnt), max(cnt) from (select 
count(_val) as c
 nt from String group by _val)", true)val res = 
cache.query(qry).getAll()collectionAsScalaIterable(res).foreach(println 
_)Apache Ignite also provides a guide docs for Zeppelin "Ignite with 
Apache Zeppelin"",
+      "content"  : "<!--Licensed under the Apache License, Version 2.0 (the 
"License");you may not use this file except in compliance with the 
License.You may obtain a copy of the License 
athttp://www.apache.org/licenses/LICENSE-2.0Unless required by applicable law 
or agreed to in writing, softwaredistributed under the License is distributed 
on an "AS IS" BASIS,WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, 
either express or implied.See the License for the specific language governing 
permissions andlimitations under the License.-->Ignite Interpreter for 
Apache ZeppelinOverviewApache Ignite In-Memory Data Fabric is a 
high-performance, integrated and distributed in-memory platform for computing 
and transacting on large-scale data sets in real-time, orders of magnitude 
faster than possible with traditional disk-based or flash technologies.You can 
use Zeppelin to retrieve distributed data from cache using Ignite SQL 
interpreter. Moreover, the Ignite interpreter allows you to execute any Scala code in cases when SQL doesn&#39;t fit your 
requirements. For example, you can populate data into your caches or execute 
distributed computations.Installing and Running Ignite exampleIn order to use 
Ignite interpreters, you may install Apache Ignite in a few simple steps:Ignite 
provides examples only with the source or binary release. Download either the Ignite source 
release or the binary release, whichever you want. But you must download the same Ignite version 
as Zeppelin&#39;s. If the versions do not match, you can&#39;t use 
scala code on Zeppelin. The supported Ignite version is specified in the Supported 
Interpreter table for each Zeppelin release. If you&#39;re using the Zeppelin 
master branch, please see ignite.version in 
path/to/your-Zeppelin/ignite/pom.xml.Examples are shipped as a separate Maven 
project, so to start running you simply need to import provided 
<dest_dir>/apache-ignite-fabric-{version}-bin/examples/pom.xml 
file into your favourite IDE, such as Eclipse.In case of Eclipse: Eclipse -&gt; File -&gt; Import 
-&gt; Existing Maven ProjectsSet the examples directory path in Eclipse and 
select the pom.xml.Then start org.apache.ignite.examples.ExampleNodeStartup (or 
whatever you want) to run one or more ignite nodes. When you run the 
example code, you may notice that the number of nodes increases one by 
one.Tip. If you want to run the Ignite examples on the cli rather than in an IDE, you can export an 
executable Jar file from the IDE. Then run it by using the command below.nohup java 
-jar </path/to/your Jar file name>Configuring Ignite 
InterpreterAt the "Interpreters" menu, you may edit Ignite 
interpreter or create a new one. Zeppelin provides these properties for Ignite.   
   Property Name    Value    Description        ignite.addresses    
127.0.0.1:47500..47509    Comma-separated list of Ignite cluster hosts. See the 
Ignite Cluster Configuration section for more details.        ignite.clientMode 
   true    You can connect to the Ignite cluster as a client or server node. See the Ignite Clients vs. 
Servers section for details. Use true or false values in order to connect in 
client or server mode respectively.        ignite.config.url        
Configuration URL. Overrides all other settings.        ignite.jdbc.url    
jdbc:ignite:cfg://default-ignite-jdbc.xml    Ignite JDBC connection URL.        
ignite.peerClassLoadingEnabled    true    Enables peer-class-loading. See Zero 
Deployment section for details. Use true or false values in order to enable or 
disable P2P class loading respectively.  How to useAfter configuring Ignite 
interpreter, create your own notebook. Then you can bind interpreters like 
below image.For more interpreter binding information see here.Ignite SQL 
interpreterIn order to execute SQL query, use %ignite.ignitesql prefix. 
Supposing you are running 
org.apache.ignite.examples.streaming.wordcount.StreamWords, then you can use 
"words" cache( Of course you have to specify t
 his cache name to the Ignite interpreter setting section ignite.jdbc.url of 
Zeppelin ).For example, you can select top 10 words in the words cache using 
the following query%ignite.ignitesqlselect _val, count(_val) as cnt from String 
group by _val order by cnt desc limit 10As long as your Ignite version and 
the Zeppelin Ignite version are the same, you can also use scala code. Please check the 
Zeppelin Ignite version before you download your own Ignite.%igniteimport 
org.apache.ignite._import org.apache.ignite.cache.affinity._import 
org.apache.ignite.cache.query._import org.apache.ignite.configuration._import 
scala.collection.JavaConversions._val cache: IgniteCache[AffinityUuid, String] 
= ignite.cache("words")val qry = new 
SqlFieldsQuery("select avg(cnt), min(cnt), max(cnt) from (select 
count(_val) as cnt from String group by _val)", true)val res = 
cache.query(qry).getAll()collectionAsScalaIterable(res).foreach(println 
_)Apache Ignite also provides a guide doc for Zeppelin, &quot;Ignite with Apache Zeppelin&quot;",
       "url": " /interpreter/ignite.html",
       "group": "interpreter",
       "excerpt": "Apache Ignite in-memory Data Fabric is a high-performance, 
integrated and distributed in-memory platform for computing and transacting on 
large-scale data sets in real-time, orders of magnitude faster than possible 
with traditional disk-based or flash technologies."
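The ignite.html entry above reads from the "words" cache with %ignite Scala code; a companion sketch of the cache-population case it mentions, assuming the same StreamWords example node is running and reusing the injected ignite instance:

%ignite
// Populate the "words" cache used by the SQL example above.
// AffinityUuid matches the key type used by the StreamWords example.
import org.apache.ignite.IgniteCache
import org.apache.ignite.cache.affinity.AffinityUuid

val words: IgniteCache[AffinityUuid, String] = ignite.cache("words")
words.put(new AffinityUuid("zeppelin"), "zeppelin")
println("cache size: " + words.size())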
@@ -201,7 +212,7 @@
 
     "/interpreter/flink.html": {
       "title": "Flink Interpreter for Apache Zeppelin",
-      "content"  : "<!--Licensed under the Apache License, Version 2.0 (the 
"License");you may not use this file except in compliance with the 
License.You may obtain a copy of the License 
athttp://www.apache.org/licenses/LICENSE-2.0Unless required by applicable law 
or agreed to in writing, softwaredistributed under the License is distributed 
on an "AS IS" BASIS,WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, 
either express or implied.See the License for the specific language governing 
permissions andlimitations under the License.-->Flink interpreter for Apache 
ZeppelinOverviewApache Flink is an open source platform for distributed stream 
and batch data processing. Flink’s core is a streaming dataflow engine that 
provides data distribution, communication, and fault tolerance for distributed 
computations over data streams. Flink also builds batch processing on top of 
the streaming engine, overlaying native iteration support, managed memory, and 
program opt
 imization.How to start local Flink cluster, to test the interpreterZeppelin 
comes with pre-configured flink-local interpreter, which starts Flink in a 
local mode on your machine, so you do not need to install anything.How to 
configure interpreter to point to Flink clusterAt the 
"Interpreters" menu, you have to create a new Flink 
interpreter and provide next properties:      property    value    Description  
      host    local    host name of running JobManager. 'local' runs 
flink in local mode (default)        port    6123    port of running JobManager 
 For more information about Flink configuration, you can find it here.How to 
test it's workingYou can find an example of Flink usage in the Zeppelin 
Tutorial folder or try the following word count example, by using the Zeppelin 
notebook from Till Rohrmann's presentation Interactive data analysis 
with Apache Flink for Apache Flink Meetup.%shrm 10.txt.utf-8wget 
http://www.gutenberg.org/ebooks/1
 0.txt.utf-8%flinkcase class WordCount(word: String, frequency: Int)val 
bible:DataSet[String] = benv.readTextFile("10.txt.utf-8")val 
partialCounts: DataSet[WordCount] = bible.flatMap{    line =>        
"""bw+b""".r.findAllIn(line).map(word
 => WordCount(word, 1))//        line.split(" 
").map(word => WordCount(word, 1))}val wordCounts = 
partialCounts.groupBy("word").reduce{    (left, right) 
=> WordCount(left.word, left.frequency + right.frequency)}val result10 = 
wordCounts.first(10).collect()",
+      "content"  : "<!--Licensed under the Apache License, Version 2.0 (the 
"License");you may not use this file except in compliance with the 
License.You may obtain a copy of the License 
athttp://www.apache.org/licenses/LICENSE-2.0Unless required by applicable law 
or agreed to in writing, softwaredistributed under the License is distributed 
on an "AS IS" BASIS,WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, 
either express or implied.See the License for the specific language governing 
permissions andlimitations under the License.-->Flink interpreter for Apache 
ZeppelinOverviewApache Flink is an open source platform for distributed stream 
and batch data processing. Flink’s core is a streaming dataflow engine that 
provides data distribution, communication, and fault tolerance for distributed 
computations over data streams. Flink also builds batch processing on top of 
the streaming engine, overlaying native iteration support, managed memory, and 
program optimization.Apache Flink is supported in Zeppelin with the Flink interpreter group, 
which consists of the five interpreters below.      Name    Class    Description    
    %flink    FlinkInterpreter    Creates 
ExecutionEnvironment/StreamExecutionEnvironment/BatchTableEnvironment/StreamTableEnvironment
 and provides a Scala environment        %flink.pyflink    PyFlinkInterpreter   
 Provides a python environment        %flink.ipyflink    IPyFlinkInterpreter    
Provides an ipython environment        %flink.ssql    FlinkStreamSqlInterpreter 
   Provides a stream sql environment        %flink.bsql    
FlinkBatchSqlInterpreter    Provides a batch sql environment  ConfigurationThe 
Flink interpreter can be configured with properties provided by Zeppelin.You 
can also set other flink properties which are not listed in the table. For a 
list of additional properties, refer to Flink Available Properties.      
Property    Default    Description        FLINK_HOME        Location of flink 
installation. It must be specified, otherwise you can not use flink in zeppelin        
flink.execution.mode    local    Execution mode of flink, e.g. 
local/yarn/remote        flink.execution.remote.host        jobmanager hostname 
if it is remote mode        flink.execution.remote.port        jobmanager port 
if it is remote mode        flink.jm.memory    1024    Total memory (MB) of the 
JobManager        flink.tm.memory    1024    Total memory (MB) of the 
TaskManager        flink.tm.num    2    Number of TaskManagers     
   flink.tm.slot    1    Number of slots per TaskManager        
flink.yarn.appName    Zeppelin Flink Session    Yarn app name        
flink.yarn.queue        queue name of yarn app        flink.yarn.jars        
additional user jars (comma separated)        zeppelin.flink.scala.color    
true    whether display scala shell output in colorful format        
zeppelin.flink.enableHive    false    whether enable hive        
zeppelin.flink.printREPLOutput    true    Print REPL output    
     zeppelin.flink.maxResult    1000    max number of row returned by sql 
interpreter        zeppelin.flink.planner    blink    planner or flink table 
api, blink or flink        zeppelin.pyflink.python    python    python 
executable for pyflink  StreamExecutionEnvironment, ExecutionEnvironment, 
StreamTableEnvironment, BatchTableEnvironmentZeppelin will create 4 variables 
to represent flink&#39;s entrypoint: * senv (StreamExecutionEnvironment) * env (ExecutionEnvironment) * stenv (StreamTableEnvironment) * btenv 
(BatchTableEnvironment)ZeppelinContextZeppelin automatically injects 
ZeppelinContext as variable z in your Scala/Python environment. ZeppelinContext 
provides some additional functions and utilities.See Zeppelin-Context for more 
details.IPython supportBy default, zeppelin would use IPython in pyflink when 
IPython is available. Otherwise it would fall back to the original PyFlink 
implementation.If you don&#39;t want to use IPython, then you can set 
zeppelin.pyflink.useIPython to false in the interpreter setting. For the IPython features, 
you can refer to the doc Python Interpreter",
       "url": " /interpreter/flink.html",
       "group": "interpreter",
       "excerpt": "Apache Flink is an open source platform for distributed 
stream and batch data processing."
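The flink.html entry above lists senv, env, stenv and btenv as the injected entry points. A minimal word-count sketch against the batch entry point env from that list, assuming the interpreter pre-imports the Flink Scala API as in the earlier word-count example:

%flink
// env is the injected ExecutionEnvironment named in the variable list above.
// The input strings are purely illustrative.
val lines = env.fromElements("to be or not to be", "that is the question")
val counts = lines
  .flatMap { _.toLowerCase.split("\\W+").filter(_.nonEmpty) }
  .map { (_, 1) }
  .groupBy(0)
  .sum(1)
counts.print()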
@@ -223,7 +234,7 @@
 
     "/interpreter/cassandra.html": {
       "title": "Cassandra CQL Interpreter for Apache Zeppelin",

[... 159 lines stripped ...]
