This is an automated email from the ASF dual-hosted git repository.

zjffdu pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/zeppelin.git

The following commit(s) were added to refs/heads/master by this push:
     new 908efa3  [ZEPPELIN-4529]. Update R interpreter document
908efa3 is described below

commit 908efa3fc71c23ae3ec94f868fa835f208ca05d3
Author: Jeff Zhang <zjf...@apache.org>
AuthorDate: Thu Jan 16 16:10:56 2020 +0800

    [ZEPPELIN-4529]. Update R interpreter document

    ### What is this PR for?
    This PR updates the R interpreter document. It highlights how to use `%r.r` and `%r.ir`, and how to create a Shiny app in the R interpreter.

    ### What type of PR is it?
    [ Documentation ]

    ### Todos
    * [ ] - Task

    ### What is the Jira issue?
    * https://issues.apache.org/jira/browse/ZEPPELIN-4529

    ### How should this be tested?
    * No test needed

    ### Screenshots (if appropriate)

    ### Questions:
    * Do the license files need an update? No
    * Are there breaking changes for older versions? No
    * Does this need documentation? No

    Author: Jeff Zhang <zjf...@apache.org>

    Closes #3629 from zjffdu/ZEPPELIN-4529 and squashes the following commits:

    ce5767712 [Jeff Zhang] [ZEPPELIN-4529]. 
Update R interpreter document --- docs/_includes/themes/zeppelin/_navigation.html | 2 +- .../themes/zeppelin/img/docs-img/backtoscala.png | Bin 36308 -> 0 bytes .../themes/zeppelin/img/docs-img/knitgeo.png | Bin 59594 -> 0 bytes .../themes/zeppelin/img/docs-img/knitstock.png | Bin 108868 -> 0 bytes .../themes/zeppelin/img/docs-img/r_basic.png | Bin 0 -> 50805 bytes .../themes/zeppelin/img/docs-img/r_ggplot.png | Bin 0 -> 240763 bytes .../themes/zeppelin/img/docs-img/r_googlevis.png | Bin 0 -> 143390 bytes .../themes/zeppelin/img/docs-img/r_plotting.png | Bin 0 -> 163075 bytes .../themes/zeppelin/img/docs-img/r_shiny.png | Bin 0 -> 235777 bytes .../themes/zeppelin/img/docs-img/replhead.png | Bin 42923 -> 0 bytes .../zeppelin/img/docs-img/sparkrfaithful.png | Bin 52235 -> 0 bytes docs/assets/themes/zeppelin/img/docs-img/varr1.png | Bin 16703 -> 0 bytes docs/assets/themes/zeppelin/img/docs-img/varr2.png | Bin 18973 -> 0 bytes .../themes/zeppelin/img/docs-img/varscala.png | Bin 21073 -> 0 bytes docs/interpreter/r.md | 209 ++++++++++++++++----- 15 files changed, 165 insertions(+), 46 deletions(-) diff --git a/docs/_includes/themes/zeppelin/_navigation.html b/docs/_includes/themes/zeppelin/_navigation.html index a1796e7..f19dc30 100644 --- a/docs/_includes/themes/zeppelin/_navigation.html +++ b/docs/_includes/themes/zeppelin/_navigation.html @@ -127,6 +127,7 @@ <li><a href="{{BASE_PATH}}/interpreter/spark.html">Spark</a></li> <li><a href="{{BASE_PATH}}/interpreter/jdbc.html">JDBC</a></li> <li><a href="{{BASE_PATH}}/interpreter/python.html">Python</a></li> + <li><a href="{{BASE_PATH}}/interpreter/r.html">R</a></li> <li role="separator" class="divider"></li> <li><a href="{{BASE_PATH}}/interpreter/alluxio.html">Alluxio</a></li> <li><a href="{{BASE_PATH}}/interpreter/beam.html">Beam</a></li> @@ -151,7 +152,6 @@ <li><a href="{{BASE_PATH}}/interpreter/neo4j.html">Neo4j</a></li> <li><a href="{{BASE_PATH}}/interpreter/pig.html">Pig</a></li> <li><a 
href="{{BASE_PATH}}/interpreter/postgresql.html">Postgresql, HAWQ</a></li> - <li><a href="{{BASE_PATH}}/interpreter/r.html">R</a></li> <li><a href="{{BASE_PATH}}/interpreter/scalding.html">Scalding</a></li> <li><a href="{{BASE_PATH}}/interpreter/scio.html">Scio</a></li> <li><a href="{{BASE_PATH}}/interpreter/shell.html">Shell</a></li> diff --git a/docs/assets/themes/zeppelin/img/docs-img/backtoscala.png b/docs/assets/themes/zeppelin/img/docs-img/backtoscala.png deleted file mode 100644 index c0c897a..0000000 Binary files a/docs/assets/themes/zeppelin/img/docs-img/backtoscala.png and /dev/null differ diff --git a/docs/assets/themes/zeppelin/img/docs-img/knitgeo.png b/docs/assets/themes/zeppelin/img/docs-img/knitgeo.png deleted file mode 100644 index d1eb0d8..0000000 Binary files a/docs/assets/themes/zeppelin/img/docs-img/knitgeo.png and /dev/null differ diff --git a/docs/assets/themes/zeppelin/img/docs-img/knitstock.png b/docs/assets/themes/zeppelin/img/docs-img/knitstock.png deleted file mode 100644 index 7a27c60..0000000 Binary files a/docs/assets/themes/zeppelin/img/docs-img/knitstock.png and /dev/null differ diff --git a/docs/assets/themes/zeppelin/img/docs-img/r_basic.png b/docs/assets/themes/zeppelin/img/docs-img/r_basic.png new file mode 100644 index 0000000..4c635b9 Binary files /dev/null and b/docs/assets/themes/zeppelin/img/docs-img/r_basic.png differ diff --git a/docs/assets/themes/zeppelin/img/docs-img/r_ggplot.png b/docs/assets/themes/zeppelin/img/docs-img/r_ggplot.png new file mode 100644 index 0000000..6fb0588 Binary files /dev/null and b/docs/assets/themes/zeppelin/img/docs-img/r_ggplot.png differ diff --git a/docs/assets/themes/zeppelin/img/docs-img/r_googlevis.png b/docs/assets/themes/zeppelin/img/docs-img/r_googlevis.png new file mode 100644 index 0000000..a876683 Binary files /dev/null and b/docs/assets/themes/zeppelin/img/docs-img/r_googlevis.png differ diff --git a/docs/assets/themes/zeppelin/img/docs-img/r_plotting.png 
b/docs/assets/themes/zeppelin/img/docs-img/r_plotting.png new file mode 100644 index 0000000..7272384 Binary files /dev/null and b/docs/assets/themes/zeppelin/img/docs-img/r_plotting.png differ diff --git a/docs/assets/themes/zeppelin/img/docs-img/r_shiny.png b/docs/assets/themes/zeppelin/img/docs-img/r_shiny.png new file mode 100644 index 0000000..4372b8c Binary files /dev/null and b/docs/assets/themes/zeppelin/img/docs-img/r_shiny.png differ diff --git a/docs/assets/themes/zeppelin/img/docs-img/replhead.png b/docs/assets/themes/zeppelin/img/docs-img/replhead.png deleted file mode 100644 index b09ccab..0000000 Binary files a/docs/assets/themes/zeppelin/img/docs-img/replhead.png and /dev/null differ diff --git a/docs/assets/themes/zeppelin/img/docs-img/sparkrfaithful.png b/docs/assets/themes/zeppelin/img/docs-img/sparkrfaithful.png deleted file mode 100644 index ec956c7..0000000 Binary files a/docs/assets/themes/zeppelin/img/docs-img/sparkrfaithful.png and /dev/null differ diff --git a/docs/assets/themes/zeppelin/img/docs-img/varr1.png b/docs/assets/themes/zeppelin/img/docs-img/varr1.png deleted file mode 100644 index ac997a8..0000000 Binary files a/docs/assets/themes/zeppelin/img/docs-img/varr1.png and /dev/null differ diff --git a/docs/assets/themes/zeppelin/img/docs-img/varr2.png b/docs/assets/themes/zeppelin/img/docs-img/varr2.png deleted file mode 100644 index b49988d..0000000 Binary files a/docs/assets/themes/zeppelin/img/docs-img/varr2.png and /dev/null differ diff --git a/docs/assets/themes/zeppelin/img/docs-img/varscala.png b/docs/assets/themes/zeppelin/img/docs-img/varscala.png deleted file mode 100644 index 7f95ad2..0000000 Binary files a/docs/assets/themes/zeppelin/img/docs-img/varscala.png and /dev/null differ diff --git a/docs/interpreter/r.md b/docs/interpreter/r.md index 966dc1e..15bbe2c 100644 --- a/docs/interpreter/r.md +++ b/docs/interpreter/r.md @@ -74,79 +74,198 @@ We recommend you to also install the following optional R libraries for happy da 
+ sqldf + wordcloud -## Configuration - -To run Zeppelin with the R Interpreter, the `SPARK_HOME` environment variable must be set. The best way to do this is by editing `conf/zeppelin-env.sh`. -If it is not set, the R Interpreter will not be able to interface with Spark. - -You should also copy `conf/zeppelin-site.xml.template` to `conf/zeppelin-site.xml`. That will ensure that Zeppelin sees the R Interpreter the first time it starts up. - -## Using the R Interpreter +## Supported Interpreters + +Zeppelin supports the R language through 3 interpreters: + +<table class="table-configuration"> + <tr> + <th>Name</th> + <th>Class</th> + <th>Description</th> + </tr> + <tr> + <td>%r.r</td> + <td>RInterpreter</td> + <td>Vanilla R interpreter with the fewest dependencies; only an installed R environment is required. + Always use the fully qualified interpreter name <code>%r.r</code>, because <code>%r</code> is ambiguous: + it could mean either <code>%spark.r</code> or <code>%r.r</code>.</td> + </tr> + <tr> + <td>%r.ir</td> + <td>IRInterpreter</td> + <td>Provides a richer R runtime via <a href="https://github.com/IRkernel/IRkernel">IRKernel</a>, with almost the same experience as using R in Jupyter. It requires more dependencies, but is the recommended interpreter for using R in Zeppelin.</td> + </tr> + <tr> + <td>%r.shiny</td> + <td>ShinyInterpreter</td> + <td>Runs Shiny apps in Zeppelin</td> + </tr> +</table> + +If you want to use R with Spark, usage is almost the same via `%spark.r`, `%spark.ir` and `%spark.shiny`. Refer to the Spark Interpreter docs for more details. -By default, the R Interpreter appears as two Zeppelin Interpreters, `%r` and `%knitr`. - -`%r` will behave like an ordinary REPL. You can execute commands as in the CLI. 
+## Configuration -<img class="img-responsive" src="{{BASE_PATH}}/assets/themes/zeppelin/img/docs-img/repl2plus2.png" width="700px"/> +<table class="table-configuration"> + <tr> + <th>Property</th> + <th>Default</th> + <th>Description</th> + </tr> + <tr> + <td>zeppelin.R.cmd</td> + <td>R</td> + <td>Path of the installed R binary. Set this property explicitly if R is not in your <code>$PATH</code> (example: <code>/usr/bin/R</code>). + </td> + </tr> + <tr> + <td>zeppelin.R.knitr</td> + <td>true</td> + <td>Whether to use knitr or not. Installing <a href="https://yihui.org/knitr/">knitr</a> is recommended.</td> + </tr> + <tr> + <td>zeppelin.R.image.width</td> + <td>100%</td> + <td>Image width of R plots</td> + </tr> + <tr> + <td>zeppelin.R.shiny.iframe_width</td> + <td>100%</td> + <td>IFrame width of Shiny App</td> + </tr> + <tr> + <td>zeppelin.R.shiny.iframe_height</td> + <td>500px</td> + <td>IFrame height of Shiny App</td> + </tr> +</table> + +## Using the R Interpreter (`%r.r` & `%r.ir`) + +By default, the R Interpreter appears as two Zeppelin Interpreters, `%r.r` and `%r.ir`. + +`%r.r` behaves like an ordinary REPL and uses SparkR to communicate between the R process and the JVM process. +`%r.ir` uses IRKernel underneath and behaves like IRKernel in a Jupyter notebook. + +Basic R expressions + +<img class="img-responsive" src="{{BASE_PATH}}/assets/themes/zeppelin/img/docs-img/r_basic.png" width="800px"/> R base plotting is fully supported -<img class="img-responsive" src="{{BASE_PATH}}/assets/themes/zeppelin/img/docs-img/replhist.png" width="550px"/> - -If you return a data.frame, Zeppelin will attempt to display it using Zeppelin's built-in visualizations. +<img class="img-responsive" src="{{BASE_PATH}}/assets/themes/zeppelin/img/docs-img/r_plotting.png" width="800px"/> -<img class="img-responsive" src="{{BASE_PATH}}/assets/themes/zeppelin/img/docs-img/replhead.png" width="550px"/> +Besides R base plotting, you can use other visualization libraries, e.g. 
`ggplot2` and `googleVis` -`%knitr` interfaces directly against `knitr`, with chunk options on the first line: +<img class="img-responsive" src="{{BASE_PATH}}/assets/themes/zeppelin/img/docs-img/r_ggplot.png" width="800px"/> -<img class="img-responsive" src="{{BASE_PATH}}/assets/themes/zeppelin/img/docs-img/knitgeo.png" width="550px"/> +<img class="img-responsive" src="{{BASE_PATH}}/assets/themes/zeppelin/img/docs-img/r_googlevis.png" width="800px"/> -<img class="img-responsive" src="{{BASE_PATH}}/assets/themes/zeppelin/img/docs-img/knitstock.png" width="550px"/> -<img class="img-responsive" src="{{BASE_PATH}}/assets/themes/zeppelin/img/docs-img/knitmotion.png" width="550px"/> +## Make Shiny App in Zeppelin -The two interpreters share the same environment. If you define a variable from `%r`, it will be within-scope if you then make a call using `knitr`. +[Shiny](https://shiny.rstudio.com/tutorial/) is an R package that makes it easy to build interactive web applications (apps) straight from R. +To develop a Shiny app in Zeppelin, you need at least 3 paragraphs (a server paragraph, a UI paragraph and a run paragraph). -## Using SparkR & Moving Between Languages +* Server type R shiny paragraph -If `SPARK_HOME` is set, the `SparkR` package will be loaded automatically: +```r -<img class="img-responsive" src="{{BASE_PATH}}/assets/themes/zeppelin/img/docs-img/sparkrfaithful.png" width="550px"/> +%r.shiny(type=server) -The Spark Context and SQL Context are created and injected into the local environment automatically as `sc` and `sql`. 
+# Define server logic to summarize and view selected dataset ---- +server <- function(input, output) { -The same context are shared with the `%spark`, `%sql` and `%pyspark` interpreters: + # Return the requested dataset ---- + datasetInput <- reactive({ + switch(input$dataset, + "rock" = rock, + "pressure" = pressure, + "cars" = cars) + }) -<img class="img-responsive" src="{{BASE_PATH}}/assets/themes/zeppelin/img/docs-img/backtoscala.png" width="700px"/> + # Generate a summary of the dataset ---- + output$summary <- renderPrint({ + dataset <- datasetInput() + summary(dataset) + }) -You can also make an ordinary R variable accessible in scala and Python: - -<img class="img-responsive" src="{{BASE_PATH}}/assets/themes/zeppelin/img/docs-img/varr1.png" width="550px"/> - -And vice versa: + # Show the first "n" observations ---- + output$view <- renderTable({ + head(datasetInput(), n = input$obs) + }) +} +``` -<img class="img-responsive" src="{{BASE_PATH}}/assets/themes/zeppelin/img/docs-img/varscala.png" width="550px"/> +* UI type R shiny paragraph + +```r +%r.shiny(type=ui) + +# Define UI for dataset viewer app ---- +ui <- fluidPage( + + # App title ---- + titlePanel("Shiny Text"), + + # Sidebar layout with a input and output definitions ---- + sidebarLayout( + + # Sidebar panel for inputs ---- + sidebarPanel( + + # Input: Selector for choosing dataset ---- + selectInput(inputId = "dataset", + label = "Choose a dataset:", + choices = c("rock", "pressure", "cars")), + + # Input: Numeric entry for number of obs to view ---- + numericInput(inputId = "obs", + label = "Number of observations to view:", + value = 10) + ), + + # Main panel for displaying outputs ---- + mainPanel( + + # Output: Verbatim text for data summary ---- + verbatimTextOutput("summary"), + + # Output: HTML table with requested number of observations ---- + tableOutput("view") + + ) + ) +) +``` -<img class="img-responsive" src="{{BASE_PATH}}/assets/themes/zeppelin/img/docs-img/varr2.png" 
width="550px"/> +* Run type R shiny paragraph -## Caveats & Troubleshooting +```r -* Almost all issues with the R interpreter turned out to be caused by an incorrectly set `SPARK_HOME`. The R interpreter must load a version of the `SparkR` package that matches the running version of Spark, and it does this by searching `SPARK_HOME`. If Zeppelin isn't configured to interface with Spark in `SPARK_HOME`, the R interpreter will not be able to connect to Spark. +%r.shiny(type=run) -* The `knitr` environment is persistent. If you run a chunk from Zeppelin that changes a variable, then run the same chunk again, the variable has already been changed. Use immutable variables. +``` -* (Note that `%spark.r` and `%r` are two different ways of calling the same interpreter, as are `%spark.knitr` and `%knitr`. By default, Zeppelin puts the R interpreters in the `%spark.` Interpreter Group. +After executing the run type R shiny paragraph, the Shiny app is launched and embedded as an iframe in the paragraph. -* Using the `%r` interpreter, if you return a data.frame, HTML, or an image, it will dominate the result. So if you execute three commands, and one is `hist()`, all you will see is the histogram, not the results of the other commands. This is a Zeppelin limitation. +<img class="img-responsive" src="{{BASE_PATH}}/assets/themes/zeppelin/img/docs-img/r_shiny.png" width="800px"/> -* If you return a data.frame (for instance, from calling `head()`) from the `%spark.r` interpreter, it will be parsed by Zeppelin's built-in data visualization system. +### Run multiple Shiny apps -* Why `knitr` Instead of `rmarkdown`? Why no `htmlwidgets`? In order to support `htmlwidgets`, which has indirect dependencies, `rmarkdown` uses `pandoc`, which requires writing to and reading from disc. This makes it many times slower than `knitr`, which can operate entirely in RAM. +If you want to run multiple Shiny apps, you can specify `app` in the paragraph local properties to differentiate them. 
-* Why no `ggvis` or `shiny`? Supporting `shiny` would require integrating a reverse-proxy into Zeppelin, which is a task. +e.g. -* Max OS X & case-insensitive filesystem. If you try to install on a case-insensitive filesystem, which is the Mac OS X default, maven can unintentionally delete the install directory because `r` and `R` become the same subdirectory. +```r +%r.shiny(type=ui, app=app_1) +``` -* Error `unable to start device X11` with the repl interpreter. Check your shell login scripts to see if they are adjusting the `DISPLAY` environment variable. This is common on some operating systems as a workaround for ssh issues, but can interfere with R plotting. +```r +%r.shiny(type=server, app=app_1) +``` -* akka Library Version or `TTransport` errors. This can happen if you try to run Zeppelin with a SPARK_HOME that has a version of Spark other than the one specified with `-Pspark-1.x` when Zeppelin was compiled. +```r +%r.shiny(type=run, app=app_1) +``` \ No newline at end of file
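
The updated `r.md` states that R base plotting is fully supported in `%r.r` and `%r.ir`, but shows it only as a screenshot (`r_plotting.png`). A minimal sketch of what such a paragraph body might look like, assuming only base R and its built-in `faithful` dataset; the dataset choice and the variable names `eruptions` and `h` are illustrative and not taken from the commit. In Zeppelin the paragraph would start with `%r.ir` (or `%r.r`); it is shown here as a comment so the snippet also runs in a plain R session.

```r
# Zeppelin paragraph header (first line of the paragraph): %r.ir

# `faithful` ships with base R (datasets package), so nothing extra is installed.
eruptions <- faithful$eruptions

# hist() draws the plot and invisibly returns the computed bins.
h <- hist(eruptions,
          breaks = 20,
          main = "Old Faithful eruptions",
          xlab = "Duration (minutes)")

# Textual summary printed alongside the plot.
summary(eruptions)
```

According to the Configuration table in the diff, the width of the rendered image is controlled by the `zeppelin.R.image.width` property.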