Author: zjffdu
Date: Wed Oct 30 13:30:46 2019
New Revision: 1869170

URL: http://svn.apache.org/viewvc?rev=1869170&view=rev
Log:
Fix formatting of PYSPARK_PYTHON and PYSPARK_DRIVER_PYTHON in the Spark interpreter docs

Modified:
    zeppelin/site/docs/0.8.2/atom.xml
    zeppelin/site/docs/0.8.2/interpreter/spark.html
    zeppelin/site/docs/0.8.2/rss.xml
    zeppelin/site/docs/0.8.2/search_data.json

Modified: zeppelin/site/docs/0.8.2/atom.xml
URL: http://svn.apache.org/viewvc/zeppelin/site/docs/0.8.2/atom.xml?rev=1869170&r1=1869169&r2=1869170&view=diff
==============================================================================
--- zeppelin/site/docs/0.8.2/atom.xml (original)
+++ zeppelin/site/docs/0.8.2/atom.xml Wed Oct 30 13:30:46 2019
@@ -4,7 +4,7 @@
  <title>Apache Zeppelin</title>
 <link href="http://zeppelin.apache.org/" rel="self"/>
  <link href="http://zeppelin.apache.org"/>
- <updated>2019-09-29T15:48:18+08:00</updated>
+ <updated>2019-10-30T21:27:31+08:00</updated>
  <id>http://zeppelin.apache.org</id>
  <author>
    <name>The Apache Software Foundation</name>

Modified: zeppelin/site/docs/0.8.2/interpreter/spark.html
URL: http://svn.apache.org/viewvc/zeppelin/site/docs/0.8.2/interpreter/spark.html?rev=1869170&r1=1869169&r2=1869170&view=diff
==============================================================================
--- zeppelin/site/docs/0.8.2/interpreter/spark.html (original)
+++ zeppelin/site/docs/0.8.2/interpreter/spark.html Wed Oct 30 13:30:46 2019
@@ -343,15 +343,15 @@ You can also set other Spark properties
     <td>Local repository for dependency loader</td>
   </tr>
   <tr>
-    <td><code>PYSPARK<em>PYTHON</code></td>
+    <td><code>PYSPARK_PYTHON</code></td>
     <td>python</td>
    <td>Python binary executable to use for PySpark in both driver and workers (default is <code>python</code>).
            Property <code>spark.pyspark.python</code> take precedence if it is set</td>
  </tr>
  <tr>
-    <td><code>PYSPARK</em>DRIVER<em>PYTHON</code></td>
+    <td><code>PYSPARK_DRIVER_PYTHON</code></td>
    <td>python</td>
-    <td>Python binary executable to use for PySpark in driver only (default is <code>PYSPARK</em>PYTHON</code>).
+    <td>Python binary executable to use for PySpark in driver only (default is <code>PYSPARK_PYTHON</code>).
            Property <code>spark.pyspark.driver.python</code> take precedence if it is set</td>
   </tr>
   <tr>

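For context, the corrected rows document environment variables that can be exported in conf/zeppelin-env.sh. A minimal sketch (the python3 values below are illustrative; per the docs, spark.pyspark.python and spark.pyspark.driver.python take precedence over these variables when set):

# conf/zeppelin-env.sh -- illustrative values, adjust to your installation
export SPARK_HOME=/usr/lib/spark
# Python binary used by PySpark on both driver and workers
export PYSPARK_PYTHON=python3
# Python binary used by the driver only (defaults to PYSPARK_PYTHON)
export PYSPARK_DRIVER_PYTHON=python3
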
Modified: zeppelin/site/docs/0.8.2/rss.xml
URL: http://svn.apache.org/viewvc/zeppelin/site/docs/0.8.2/rss.xml?rev=1869170&r1=1869169&r2=1869170&view=diff
==============================================================================
--- zeppelin/site/docs/0.8.2/rss.xml (original)
+++ zeppelin/site/docs/0.8.2/rss.xml Wed Oct 30 13:30:46 2019
@@ -5,8 +5,8 @@
         <description>Apache Zeppelin - The Apache Software 
Foundation</description>
         <link>http://zeppelin.apache.org</link>
         <link>http://zeppelin.apache.org</link>
-        <lastBuildDate>2019-09-29T15:48:18+08:00</lastBuildDate>
-        <pubDate>2019-09-29T15:48:18+08:00</pubDate>
+        <lastBuildDate>2019-10-30T21:27:31+08:00</lastBuildDate>
+        <pubDate>2019-10-30T21:27:31+08:00</pubDate>
         <ttl>1800</ttl>
 
 

Modified: zeppelin/site/docs/0.8.2/search_data.json
URL: http://svn.apache.org/viewvc/zeppelin/site/docs/0.8.2/search_data.json?rev=1869170&r1=1869169&r2=1869170&view=diff
==============================================================================
--- zeppelin/site/docs/0.8.2/search_data.json (original)
+++ zeppelin/site/docs/0.8.2/search_data.json Wed Oct 30 13:30:46 2019
@@ -47,7 +47,7 @@
 
     "/interpreter/spark.html": {
       "title": "Apache Spark Interpreter for Apache Zeppelin",
-      "content"  : "&lt;!--Licensed under the Apache License, Version 2.0 (the 
&quot;License&quot;);you may not use this file except in compliance with the 
License.You may obtain a copy of the License 
athttp://www.apache.org/licenses/LICENSE-2.0Unless required by applicable law 
or agreed to in writing, softwaredistributed under the License is distributed 
on an &quot;AS IS&quot; BASIS,WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, 
either express or implied.See the License for the specific language governing 
permissions andlimitations under the License.--&gt;Spark Interpreter for Apache 
ZeppelinOverviewApache Spark is a fast and general-purpose cluster computing 
system.It provides high-level APIs in Java, Scala, Python and R, and an 
optimized engine that supports general execution graphs.Apache Spark is 
supported in Zeppelin with Spark interpreter group which consists of below five 
interpreters.      Name    Class    Description        %spark    
SparkInterpreter    Creates a SparkConte
 xt and provides a Scala environment        %spark.pyspark    
PySparkInterpreter    Provides a Python environment        %spark.r    
SparkRInterpreter    Provides an R environment with SparkR support        
%spark.sql    SparkSQLInterpreter    Provides a SQL environment        
%spark.dep    DepInterpreter    Dependency loader  ConfigurationThe Spark 
interpreter can be configured with properties provided by Zeppelin.You can also 
set other Spark properties which are not listed in the table. For a list of 
additional properties, refer to Spark Available Properties.      Property    
Default    Description        args        Spark commandline args      master    
local[*]    Spark master uri.  ex) spark://masterhost:7077      spark.app.name  
  Zeppelin    The name of spark application.        spark.cores.max        
Total number of cores to use.  Empty value uses all available core.        
spark.executor.memory     1g    Executor memory per worker instance.  ex) 512m, 
32g        zeppelin.dep
 .additionalRemoteRepository    spark-packages,  
http://dl.bintray.com/spark-packages/maven,  false;    A list of 
id,remote-repository-URL,is-snapshot;  for each remote repository.        
zeppelin.dep.localrepo    local-repo    Local repository for dependency loader  
      PYSPARKPYTHON    python    Python binary executable to use for PySpark in 
both driver and workers (default is python).            Property 
spark.pyspark.python take precedence if it is set        PYSPARKDRIVERPYTHON    
python    Python binary executable to use for PySpark in driver only (default 
is PYSPARKPYTHON).            Property spark.pyspark.driver.python take 
precedence if it is set        zeppelin.spark.concurrentSQL    false    Execute 
multiple SQL concurrently if set true.        zeppelin.spark.maxResult    1000  
  Max number of Spark SQL result to display.        
zeppelin.spark.printREPLOutput    true    Print REPL output        
zeppelin.spark.useHiveContext    true    Use HiveContext instead of SQLConte
 xt if it is true.        zeppelin.spark.importImplicit    true    Import 
implicits, UDF collection, and sql if set true.        
zeppelin.spark.enableSupportedVersionCheck    true    Do not change - developer 
only setting, not for production use        zeppelin.spark.sql.interpolation    
false    Enable ZeppelinContext variable interpolation into paragraph text      
zeppelin.spark.uiWebUrl        Overrides Spark UI default URL. Value should be 
a full URL (ex: http://{hostName}/{uniquePath}  Without any configuration, 
Spark interpreter works out of box in local mode. But if you want to connect to 
your Spark cluster, you&amp;#39;ll need to follow below two simple steps.1. 
Export SPARK_HOMEIn conf/zeppelin-env.sh, export SPARK_HOME environment 
variable with your Spark installation path.For example,export 
SPARK_HOME=/usr/lib/sparkYou can optionally set more environment variables# set 
hadoop conf direxport HADOOP_CONF_DIR=/usr/lib/hadoop# set options to pass 
spark-submit commandexport SPA
 RK_SUBMIT_OPTIONS=&amp;quot;--packages 
com.databricks:spark-csv_2.10:1.2.0&amp;quot;# extra classpath. e.g. set 
classpath for hive-site.xmlexport 
ZEPPELIN_INTP_CLASSPATH_OVERRIDES=/etc/hive/confFor Windows, ensure you have 
winutils.exe in %HADOOP_HOME%bin. Please see Problems running Hadoop on Windows 
for the details.2. Set master in Interpreter menuAfter start Zeppelin, go to 
Interpreter menu and edit master property in your Spark interpreter setting. 
The value may vary depending on your Spark cluster deployment type.For 
example,local[*] in local modespark://master:7077 in standalone 
clusteryarn-client in Yarn client modeyarn-cluster in Yarn cluster 
modemesos://host:5050 in Mesos clusterThat&amp;#39;s it. Zeppelin will work 
with any version of Spark and any deployment type without rebuilding Zeppelin 
in this way.For the further information about Spark &amp;amp; Zeppelin version 
compatibility, please refer to &amp;quot;Available Interpreters&amp;quot; 
section in Zeppelin download pa
 ge.Note that without exporting SPARK_HOME, it&amp;#39;s running in local mode 
with included version of Spark. The included version may vary depending on the 
build profile.3. Yarn modeZeppelin support both yarn client and yarn cluster 
mode (yarn cluster mode is supported from 0.8.0). For yarn mode, you must 
specify SPARK_HOME &amp;amp; HADOOP_CONF_DIR.You can either specify them in 
zeppelin-env.sh, or in interpreter setting page. Specifying them in 
zeppelin-env.sh means you can use only one version of spark &amp;amp; hadoop. 
Specifying themin interpreter setting page means you can use multiple versions 
of spark &amp;amp; hadoop in one zeppelin instance.4. New Version of 
SparkInterpreterThere&amp;#39;s one new version of SparkInterpreter with better 
spark support and code completion starting from Zeppelin 0.8.0. We enable it by 
default, but user can still use the old version of SparkInterpreter by setting 
zeppelin.spark.useNew as false in its interpreter setting.SparkContext, SQLConte
 xt, SparkSession, ZeppelinContextSparkContext, SQLContext and ZeppelinContext 
are automatically created and exposed as variable names sc, sqlContext and z, 
respectively, in Scala, Python and R environments.Staring from 0.6.1 
SparkSession is available as variable spark when you are using Spark 2.x.Note 
that Scala/Python/R environment shares the same SparkContext, SQLContext and 
ZeppelinContext instance. How to pass property to SparkConfThere&amp;#39;re 2 
kinds of properties that would be passed to SparkConfStandard spark property 
(prefix with spark.). e.g. spark.executor.memory will be passed to 
SparkConfNon-standard spark property (prefix with zeppelin.spark.).  e.g. 
zeppelin.spark.property_1, property_1 will be passed to SparkConfDependency 
ManagementThere are two ways to load external libraries in Spark interpreter. 
First is using interpreter setting menu and second is loading Spark 
properties.1. Setting Dependencies via Interpreter SettingPlease see Dependency 
Management for the 
 details.2. Loading Spark PropertiesOnce SPARK_HOME is set in 
conf/zeppelin-env.sh, Zeppelin uses spark-submit as spark interpreter runner. 
spark-submit supports two ways to load configurations.The first is command line 
options such as --master and Zeppelin can pass these options to spark-submit by 
exporting SPARK_SUBMIT_OPTIONS in conf/zeppelin-env.sh. Second is reading 
configuration options from SPARK_HOME/conf/spark-defaults.conf. Spark 
properties that user can set to distribute libraries are:      
spark-defaults.conf    SPARK_SUBMIT_OPTIONS    Description        spark.jars    
--jars    Comma-separated list of local jars to include on the driver and 
executor classpaths.        spark.jars.packages    --packages    
Comma-separated list of maven coordinates of jars to include on the driver and 
executor classpaths. Will search the local maven repo, then maven central and 
any additional remote repositories given by --repositories. The format for the 
coordinates should be groupId:artifa
 ctId:version.        spark.files    --files    Comma-separated list of files 
to be placed in the working directory of each executor.  Here are few 
examples:SPARK_SUBMIT_OPTIONS in conf/zeppelin-env.shexport 
SPARK_SUBMIT_OPTIONS=&amp;quot;--packages com.databricks:spark-csv_2.10:1.2.0 
--jars /path/mylib1.jar,/path/mylib2.jar --files 
/path/mylib1.py,/path/mylib2.zip,/path/mylib3.egg&amp;quot;SPARK_HOME/conf/spark-defaults.confspark.jars
        /path/mylib1.jar,/path/mylib2.jarspark.jars.packages   
com.databricks:spark-csv_2.10:1.2.0spark.files       
/path/mylib1.py,/path/mylib2.egg,/path/mylib3.zip3. Dynamic Dependency Loading 
via %spark.dep interpreterNote: %spark.dep interpreter loads libraries to 
%spark and %spark.pyspark but not to  %spark.sql interpreter. So we recommend 
you to use the first option instead.When your code requires external library, 
instead of doing download/copy/restart Zeppelin, you can easily do following 
jobs using %spark.dep interpreter.Load libraries recursiv
 ely from maven repositoryLoad libraries from local filesystemAdd additional 
maven repositoryAutomatically add libraries to SparkCluster (You can turn 
off)Dep interpreter leverages Scala environment. So you can write any Scala 
code here.Note that %spark.dep interpreter should be used before %spark, 
%spark.pyspark, %spark.sql.Here&amp;#39;s usages.%spark.depz.reset() // clean 
up previously added artifact and repository// add maven 
repositoryz.addRepo(&amp;quot;RepoName&amp;quot;).url(&amp;quot;RepoURL&amp;quot;)//
 add maven snapshot 
repositoryz.addRepo(&amp;quot;RepoName&amp;quot;).url(&amp;quot;RepoURL&amp;quot;).snapshot()//
 add credentials for private maven 
repositoryz.addRepo(&amp;quot;RepoName&amp;quot;).url(&amp;quot;RepoURL&amp;quot;).username(&amp;quot;username&amp;quot;).password(&amp;quot;password&amp;quot;)//
 add artifact from filesystemz.load(&amp;quot;/path/to.jar&amp;quot;)// add 
artifact from maven repository, with no 
dependencyz.load(&amp;quot;groupId:artifactId:versio
 n&amp;quot;).excludeAll()// add artifact 
recursivelyz.load(&amp;quot;groupId:artifactId:version&amp;quot;)// add 
artifact recursively except comma separated GroupID:ArtifactId 
listz.load(&amp;quot;groupId:artifactId:version&amp;quot;).exclude(&amp;quot;groupId:artifactId,groupId:artifactId,
 ...&amp;quot;)// exclude with 
patternz.load(&amp;quot;groupId:artifactId:version&amp;quot;).exclude(*)z.load(&amp;quot;groupId:artifactId:version&amp;quot;).exclude(&amp;quot;groupId:artifactId:*&amp;quot;)z.load(&amp;quot;groupId:artifactId:version&amp;quot;).exclude(&amp;quot;groupId:*&amp;quot;)//
 local() skips adding artifact to spark clusters (skipping 
sc.addJar())z.load(&amp;quot;groupId:artifactId:version&amp;quot;).local()ZeppelinContextZeppelin
 automatically injects ZeppelinContext as variable z in your Scala/Python 
environment. ZeppelinContext provides some additional functions and 
utilities.See Zeppelin-Context for more details.Matplotlib Integration 
(pyspark)Both the python and pyspar
 k interpreters have built-in support for inline visualization using 
matplotlib,a popular plotting library for python. More details can be found in 
the python interpreter documentation,since matplotlib support is identical. 
More advanced interactive plotting can be done with pyspark throughutilizing 
Zeppelin&amp;#39;s built-in Angular Display System, as shown below:Interpreter 
setting optionYou can choose one of shared, scoped and isolated options wheh 
you configure Spark interpreter.Spark interpreter creates separated Scala 
compiler per each notebook but share a single SparkContext in scoped mode 
(experimental).It creates separated SparkContext per each notebook in isolated 
mode.IPython supportBy default, zeppelin would use IPython in pyspark when 
IPython is available, Otherwise it would fall back to the original PySpark 
implementation.If you don&amp;#39;t want to use IPython, then you can set 
zeppelin.pyspark.useIPython as false in interpreter setting. For the IPython 
features, you
  can refer docPython InterpreterSetting up Zeppelin with KerberosLogical setup 
with Zeppelin, Kerberos Key Distribution Center (KDC), and Spark on 
YARN:Configuration SetupOn the server that Zeppelin is installed, install 
Kerberos client modules and configuration, krb5.conf.This is to make the server 
communicate with KDC.Set SPARK_HOME in [ZEPPELIN_HOME]/conf/zeppelin-env.sh to 
use spark-submit(Additionally, you might have to set export 
HADOOP_CONF_DIR=/etc/hadoop/conf)Add the two properties below to Spark 
configuration 
([SPARK_HOME]/conf/spark-defaults.conf):spark.yarn.principalspark.yarn.keytabNOTE:
 If you do not have permission to access for the above spark-defaults.conf 
file, optionally, you can add the above lines to the Spark Interpreter setting 
through the Interpreter tab in the Zeppelin UI.That&amp;#39;s it. Play with 
Zeppelin!",
+      "content"  : "&lt;!--Licensed under the Apache License, Version 2.0 (the 
&quot;License&quot;);you may not use this file except in compliance with the 
License.You may obtain a copy of the License 
athttp://www.apache.org/licenses/LICENSE-2.0Unless required by applicable law 
or agreed to in writing, softwaredistributed under the License is distributed 
on an &quot;AS IS&quot; BASIS,WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, 
either express or implied.See the License for the specific language governing 
permissions andlimitations under the License.--&gt;Spark Interpreter for Apache 
ZeppelinOverviewApache Spark is a fast and general-purpose cluster computing 
system.It provides high-level APIs in Java, Scala, Python and R, and an 
optimized engine that supports general execution graphs.Apache Spark is 
supported in Zeppelin with Spark interpreter group which consists of below five 
interpreters.      Name    Class    Description        %spark    
SparkInterpreter    Creates a SparkConte
 xt and provides a Scala environment        %spark.pyspark    
PySparkInterpreter    Provides a Python environment        %spark.r    
SparkRInterpreter    Provides an R environment with SparkR support        
%spark.sql    SparkSQLInterpreter    Provides a SQL environment        
%spark.dep    DepInterpreter    Dependency loader  ConfigurationThe Spark 
interpreter can be configured with properties provided by Zeppelin.You can also 
set other Spark properties which are not listed in the table. For a list of 
additional properties, refer to Spark Available Properties.      Property    
Default    Description        args        Spark commandline args      master    
local[*]    Spark master uri.  ex) spark://masterhost:7077      spark.app.name  
  Zeppelin    The name of spark application.        spark.cores.max        
Total number of cores to use.  Empty value uses all available core.        
spark.executor.memory     1g    Executor memory per worker instance.  ex) 512m, 
32g        zeppelin.dep
 .additionalRemoteRepository    spark-packages,  
http://dl.bintray.com/spark-packages/maven,  false;    A list of 
id,remote-repository-URL,is-snapshot;  for each remote repository.        
zeppelin.dep.localrepo    local-repo    Local repository for dependency loader  
      PYSPARK_PYTHON    python    Python binary executable to use for PySpark 
in both driver and workers (default is python).            Property 
spark.pyspark.python take precedence if it is set        PYSPARK_DRIVER_PYTHON  
  python    Python binary executable to use for PySpark in driver only (default 
is PYSPARK_PYTHON).            Property spark.pyspark.driver.python take 
precedence if it is set        zeppelin.spark.concurrentSQL    false    Execute 
multiple SQL concurrently if set true.        zeppelin.spark.maxResult    1000  
  Max number of Spark SQL result to display.        
zeppelin.spark.printREPLOutput    true    Print REPL output        
zeppelin.spark.useHiveContext    true    Use HiveContext instead of SQLC
 ontext if it is true.        zeppelin.spark.importImplicit    true    Import 
implicits, UDF collection, and sql if set true.        
zeppelin.spark.enableSupportedVersionCheck    true    Do not change - developer 
only setting, not for production use        zeppelin.spark.sql.interpolation    
false    Enable ZeppelinContext variable interpolation into paragraph text      
zeppelin.spark.uiWebUrl        Overrides Spark UI default URL. Value should be 
a full URL (ex: http://{hostName}/{uniquePath}  Without any configuration, 
Spark interpreter works out of box in local mode. But if you want to connect to 
your Spark cluster, you&amp;#39;ll need to follow below two simple steps.1. 
Export SPARK_HOMEIn conf/zeppelin-env.sh, export SPARK_HOME environment 
variable with your Spark installation path.For example,export 
SPARK_HOME=/usr/lib/sparkYou can optionally set more environment variables# set 
hadoop conf direxport HADOOP_CONF_DIR=/usr/lib/hadoop# set options to pass 
spark-submit commandexport
  SPARK_SUBMIT_OPTIONS=&amp;quot;--packages 
com.databricks:spark-csv_2.10:1.2.0&amp;quot;# extra classpath. e.g. set 
classpath for hive-site.xmlexport 
ZEPPELIN_INTP_CLASSPATH_OVERRIDES=/etc/hive/confFor Windows, ensure you have 
winutils.exe in %HADOOP_HOME%bin. Please see Problems running Hadoop on Windows 
for the details.2. Set master in Interpreter menuAfter start Zeppelin, go to 
Interpreter menu and edit master property in your Spark interpreter setting. 
The value may vary depending on your Spark cluster deployment type.For 
example,local[*] in local modespark://master:7077 in standalone 
clusteryarn-client in Yarn client modeyarn-cluster in Yarn cluster 
modemesos://host:5050 in Mesos clusterThat&amp;#39;s it. Zeppelin will work 
with any version of Spark and any deployment type without rebuilding Zeppelin 
in this way.For the further information about Spark &amp;amp; Zeppelin version 
compatibility, please refer to &amp;quot;Available Interpreters&amp;quot; 
section in Zeppelin downloa
 d page.Note that without exporting SPARK_HOME, it&amp;#39;s running in local 
mode with included version of Spark. The included version may vary depending on 
the build profile.3. Yarn modeZeppelin support both yarn client and yarn 
cluster mode (yarn cluster mode is supported from 0.8.0). For yarn mode, you 
must specify SPARK_HOME &amp;amp; HADOOP_CONF_DIR.You can either specify them 
in zeppelin-env.sh, or in interpreter setting page. Specifying them in 
zeppelin-env.sh means you can use only one version of spark &amp;amp; hadoop. 
Specifying themin interpreter setting page means you can use multiple versions 
of spark &amp;amp; hadoop in one zeppelin instance.4. New Version of 
SparkInterpreterThere&amp;#39;s one new version of SparkInterpreter with better 
spark support and code completion starting from Zeppelin 0.8.0. We enable it by 
default, but user can still use the old version of SparkInterpreter by setting 
zeppelin.spark.useNew as false in its interpreter setting.SparkContext, SQLC
 ontext, SparkSession, ZeppelinContextSparkContext, SQLContext and 
ZeppelinContext are automatically created and exposed as variable names sc, 
sqlContext and z, respectively, in Scala, Python and R environments.Staring 
from 0.6.1 SparkSession is available as variable spark when you are using Spark 
2.x.Note that Scala/Python/R environment shares the same SparkContext, 
SQLContext and ZeppelinContext instance. How to pass property to 
SparkConfThere&amp;#39;re 2 kinds of properties that would be passed to 
SparkConfStandard spark property (prefix with spark.). e.g. 
spark.executor.memory will be passed to SparkConfNon-standard spark property 
(prefix with zeppelin.spark.).  e.g. zeppelin.spark.property_1, property_1 will 
be passed to SparkConfDependency ManagementThere are two ways to load external 
libraries in Spark interpreter. First is using interpreter setting menu and 
second is loading Spark properties.1. Setting Dependencies via Interpreter 
SettingPlease see Dependency Management for 
 the details.2. Loading Spark PropertiesOnce SPARK_HOME is set in 
conf/zeppelin-env.sh, Zeppelin uses spark-submit as spark interpreter runner. 
spark-submit supports two ways to load configurations.The first is command line 
options such as --master and Zeppelin can pass these options to spark-submit by 
exporting SPARK_SUBMIT_OPTIONS in conf/zeppelin-env.sh. Second is reading 
configuration options from SPARK_HOME/conf/spark-defaults.conf. Spark 
properties that user can set to distribute libraries are:      
spark-defaults.conf    SPARK_SUBMIT_OPTIONS    Description        spark.jars    
--jars    Comma-separated list of local jars to include on the driver and 
executor classpaths.        spark.jars.packages    --packages    
Comma-separated list of maven coordinates of jars to include on the driver and 
executor classpaths. Will search the local maven repo, then maven central and 
any additional remote repositories given by --repositories. The format for the 
coordinates should be groupId:ar
 tifactId:version.        spark.files    --files    Comma-separated list of 
files to be placed in the working directory of each executor.  Here are few 
examples:SPARK_SUBMIT_OPTIONS in conf/zeppelin-env.shexport 
SPARK_SUBMIT_OPTIONS=&amp;quot;--packages com.databricks:spark-csv_2.10:1.2.0 
--jars /path/mylib1.jar,/path/mylib2.jar --files 
/path/mylib1.py,/path/mylib2.zip,/path/mylib3.egg&amp;quot;SPARK_HOME/conf/spark-defaults.confspark.jars
        /path/mylib1.jar,/path/mylib2.jarspark.jars.packages   
com.databricks:spark-csv_2.10:1.2.0spark.files       
/path/mylib1.py,/path/mylib2.egg,/path/mylib3.zip3. Dynamic Dependency Loading 
via %spark.dep interpreterNote: %spark.dep interpreter loads libraries to 
%spark and %spark.pyspark but not to  %spark.sql interpreter. So we recommend 
you to use the first option instead.When your code requires external library, 
instead of doing download/copy/restart Zeppelin, you can easily do following 
jobs using %spark.dep interpreter.Load libraries recu
 rsively from maven repositoryLoad libraries from local filesystemAdd 
additional maven repositoryAutomatically add libraries to SparkCluster (You can 
turn off)Dep interpreter leverages Scala environment. So you can write any 
Scala code here.Note that %spark.dep interpreter should be used before %spark, 
%spark.pyspark, %spark.sql.Here&amp;#39;s usages.%spark.depz.reset() // clean 
up previously added artifact and repository// add maven 
repositoryz.addRepo(&amp;quot;RepoName&amp;quot;).url(&amp;quot;RepoURL&amp;quot;)//
 add maven snapshot 
repositoryz.addRepo(&amp;quot;RepoName&amp;quot;).url(&amp;quot;RepoURL&amp;quot;).snapshot()//
 add credentials for private maven 
repositoryz.addRepo(&amp;quot;RepoName&amp;quot;).url(&amp;quot;RepoURL&amp;quot;).username(&amp;quot;username&amp;quot;).password(&amp;quot;password&amp;quot;)//
 add artifact from filesystemz.load(&amp;quot;/path/to.jar&amp;quot;)// add 
artifact from maven repository, with no 
dependencyz.load(&amp;quot;groupId:artifactId:ve
 rsion&amp;quot;).excludeAll()// add artifact 
recursivelyz.load(&amp;quot;groupId:artifactId:version&amp;quot;)// add 
artifact recursively except comma separated GroupID:ArtifactId 
listz.load(&amp;quot;groupId:artifactId:version&amp;quot;).exclude(&amp;quot;groupId:artifactId,groupId:artifactId,
 ...&amp;quot;)// exclude with 
patternz.load(&amp;quot;groupId:artifactId:version&amp;quot;).exclude(*)z.load(&amp;quot;groupId:artifactId:version&amp;quot;).exclude(&amp;quot;groupId:artifactId:*&amp;quot;)z.load(&amp;quot;groupId:artifactId:version&amp;quot;).exclude(&amp;quot;groupId:*&amp;quot;)//
 local() skips adding artifact to spark clusters (skipping 
sc.addJar())z.load(&amp;quot;groupId:artifactId:version&amp;quot;).local()ZeppelinContextZeppelin
 automatically injects ZeppelinContext as variable z in your Scala/Python 
environment. ZeppelinContext provides some additional functions and 
utilities.See Zeppelin-Context for more details.Matplotlib Integration 
(pyspark)Both the python and py
 spark interpreters have built-in support for inline visualization using 
matplotlib,a popular plotting library for python. More details can be found in 
the python interpreter documentation,since matplotlib support is identical. 
More advanced interactive plotting can be done with pyspark throughutilizing 
Zeppelin&amp;#39;s built-in Angular Display System, as shown below:Interpreter 
setting optionYou can choose one of shared, scoped and isolated options wheh 
you configure Spark interpreter.Spark interpreter creates separated Scala 
compiler per each notebook but share a single SparkContext in scoped mode 
(experimental).It creates separated SparkContext per each notebook in isolated 
mode.IPython supportBy default, zeppelin would use IPython in pyspark when 
IPython is available, Otherwise it would fall back to the original PySpark 
implementation.If you don&amp;#39;t want to use IPython, then you can set 
zeppelin.pyspark.useIPython as false in interpreter setting. For the IPython 
features,
  you can refer docPython InterpreterSetting up Zeppelin with KerberosLogical 
setup with Zeppelin, Kerberos Key Distribution Center (KDC), and Spark on 
YARN:Configuration SetupOn the server that Zeppelin is installed, install 
Kerberos client modules and configuration, krb5.conf.This is to make the server 
communicate with KDC.Set SPARK_HOME in [ZEPPELIN_HOME]/conf/zeppelin-env.sh to 
use spark-submit(Additionally, you might have to set export 
HADOOP_CONF_DIR=/etc/hadoop/conf)Add the two properties below to Spark 
configuration 
([SPARK_HOME]/conf/spark-defaults.conf):spark.yarn.principalspark.yarn.keytabNOTE:
 If you do not have permission to access for the above spark-defaults.conf 
file, optionally, you can add the above lines to the Spark Interpreter setting 
through the Interpreter tab in the Zeppelin UI.That&amp;#39;s it. Play with 
Zeppelin!",
       "url": " /interpreter/spark.html",
       "group": "interpreter",
       "excerpt": "Apache Spark is a fast and general-purpose cluster computing 
system. It provides high-level APIs in Java, Scala, Python and R, and an 
optimized engine that supports general execution engine."

