[Rd] Runnable R packages
Dear all,

I’m working as a data scientist in a major tech company. I have been using R for almost 20 years now and there’s one issue that’s been bugging me of late. I apologize in advance if this has been discussed before.

R has traditionally been used for running short scripts or data analysis notebooks, but there has recently been growing interest in developing full applications in the language. Three examples come to mind:

1) The Shiny web application framework, which facilitates the development of rich, interactive web applications
2) The httr package, which provides lower-level facilities than Shiny for writing web services
3) Batch jobs run by data scientists according to, say, a cron schedule

Compared with other languages, R’s support for such applications is rather poor. The Rscript program is generally used to run an R script or an arbitrary R expression, but I feel it suffers from a few problems:

1) It encourages developers of batch jobs to provide their code in a single R file (bad for code structure and unit-testability)
2) It provides no way to deal with dependencies on other packages
3) It provides no way to "run" an application provided as an R package

For example, let’s say I want to run a Shiny application that I provide as an R package (to keep the code modular, to benefit from unit tests, and to declare dependencies properly). I would then need to a) uncompress my R package, b) somehow ensure my dependencies are installed, and c) call runApp(). This can get tedious, fast.

Other languages let the developer package their code in "runnable" artefacts, and let the developer specify the main entry point. The mechanics depend on the language but are remarkably similar, and suggest a way to implement this in R. Through declarations in some file, the developer can often specify dependencies and declare where the program’s "main" function resides.

Consider Java:

  Artefact:          .jar file
  Declarations file: Manifest file
  Entry point:       declared as 'Main-Class'
  Executed as:       java -jar

Or Python:

  Artefact:          Python package, typically as .tar.gz source distribution file
  Declarations file: setup.py (which specifies dependencies)
  Entry point:       special __main__ module (__main__.py)
  Executed as:       python -m

R already has much of this machinery:

  Artefact:          R package
  Declarations file: DESCRIPTION
  Entry point:       ?
  Executed as:       ?

I feel that R could benefit from letting the developer specify, possibly in DESCRIPTION, how to "run" the package. The package could then be run through a new R CMD command, for example:

  R CMD RUN

I’m sure there are plenty of wrinkles in this idea that need to be ironed out, but is this something that has ever been considered, or that is on R’s roadmap?

Thanks for reading so far,

David Lindelöf, Ph.D.
+41 (0)79 415 66 41 or skype:david.lindelof
http://computersandbuildings.com
Follow me on Twitter: http://twitter.com/dlindelof

[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
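As a rough illustration of the idea in the message above, an entry point declared in DESCRIPTION could be read with base R's `read.dcf()`. The `Main` field and the `read_entry_point()` helper are hypothetical: R does not currently recognise such a field, and nothing in the thread fixes its name. This is a sketch under those assumptions, not a proposed implementation.

```r
# Sketch only: "Main" is a hypothetical DESCRIPTION field, not one R defines.
read_entry_point <- function(description_file, default = "main") {
  # read.dcf() parses the Debian-control format used by DESCRIPTION files;
  # fields that are absent come back as NA.
  fields <- read.dcf(description_file, fields = c("Package", "Main"))
  entry <- unname(fields[1, "Main"])
  if (is.na(entry)) entry <- default  # fall back to a conventional name
  list(package = unname(fields[1, "Package"]), entry = entry)
}

# Demonstration with a throwaway DESCRIPTION file
desc <- tempfile()
writeLines(c("Package: foo", "Version: 0.1.0", "Main: start_app"), desc)
ep <- read_entry_point(desc)
cat(sprintf("would call %s::%s()\n", ep$package, ep$entry))
```

Falling back to a default name when the field is absent would let existing packages opt in simply by defining a function called `main`.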
Re: [Rd] Runnable R packages
Belated thanks to all who replied to my initial query. In summary, three approaches have been mentioned to run R code "in production":

1) ShinyProxy, mentioned by Tobias, for deploying Shiny applications;
2) Docker-like solutions, mentioned by Gergely and Iñaki; and
3) Solutions based on Rscript or littler, mentioned by Dirk.

I can't speak to 1) because I don't currently use Shiny. And it seems to me that Docker-like solutions will still need some "point of entry" for the R application, which will have to be Rscript or littler.

In my first email, I observed that Rscript expects a single expression or a single script, which is probably why (in my experience) many data scientists tend to provide their code in a very limited number of files. Gergely disagreed, arguing on the contrary that data scientists are encouraged to provide their application as an R package called by a short script executed by Rscript. But this doesn't happen where I work, for several reasons:

- it implies installing your package on the production machine(s), including its dependencies, which must be done by hand
- some machine learning platforms will simply not accept code provided as an R package
- we have some "big data" use cases for which we need Spark; Spark can run R or Python code, but only when it is provided as a single file. (On the other hand, Spark can run applications provided as JAR files)

In summary, I'm convinced R would benefit from something similar to Java's `Main-Class` header or Python's `__main__` module. A new R CMD command would take a package, install its dependencies, and run its "main" function. If we had this machinery available, we could even consider reaching out to the developers of Spark (and other tech stacks) to make it easier to develop R applications for those platforms.

A candid comment from Dirk suggested that I should implement this myself, which I would be happy to do, provided this is the normal procedure. Or is there a more formal process I should follow?

Kind regards,
David Lindelöf
Re: [Rd] Runnable R packages
Would you care to share how your package installs its own dependencies? I assume this is done during the call to `main()`? (Last time I checked, R CMD INSTALL would not install a package's dependencies...)

On Thu, Jan 31, 2019 at 4:38 PM Barry Rowlingson <b.rowling...@lancaster.ac.uk> wrote:

> On Thu, Jan 31, 2019 at 3:14 PM David Lindelof wrote:
>
>> In summary, I'm convinced R would benefit from something similar to Java's
>> `Main-Class` header or Python's `__main__` module. A new R CMD command
>> would take a package, install its dependencies, and run its "main"
>> function.
>
> I just created and built a very boilerplate R package called "runme". I
> can install its dependencies and run its "main" function with:
>
>     $ R CMD INSTALL runme_0.0.0.9000.tar.gz
>     $ R -e 'runme::main()'
>
> No new R CMDs needed. Now my choice of "main" is arbitrary, whereas with
> python and java and C the entrypoint is more tightly specified (__name__ ==
> "__main__" in python, int main(..) in C and so on). But I don't think
> that's much of a problem.
>
> Does that not satisfy your requirements close enough? If you want it in
> one line then:
>
>     R CMD INSTALL runme_0.0.0.9000.tar.gz && R -e 'runme::main()'
>
> will do the second if the first succeeds (Unix shells).
>
> You could write a script for $RHOME/bin/RUN which would be a two-liner and
> that could mandate the use of "main" as an entry point. But good luck
> getting anything into base R.
>
> Barry
>
>> If we have this machinery available, we could even consider
>> reaching out to Spark (and other tech stacks) developers and make it
>> easier to develop R applications for those platforms.
Re: [Rd] Runnable R packages
@Barry I'm not sure your proposal would work, since `R CMD INSTALL` won't install a package's dependencies. Indeed, it will fail with an error unless all the dependencies are already installed before it is called.

Speaking of which, why doesn't R CMD INSTALL install a package's dependencies? Would it make sense to submit this as a feature request?

Cheers,
David
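One way to work around the point above is to read the dependency fields from a package's DESCRIPTION and install whatever is missing before calling `R CMD INSTALL`. The sketch below uses only base R; the helper name `declared_dependencies()` is hypothetical, it looks only at Depends, Imports and LinkingTo, and it ignores version constraints, so treat it as an illustration and not a complete resolver.

```r
# Hypothetical helper: list the packages a DESCRIPTION file declares as
# hard dependencies (version constraints are dropped, "R" itself is skipped).
declared_dependencies <- function(description_file) {
  fields <- c("Depends", "Imports", "LinkingTo")
  dcf <- read.dcf(description_file, fields = fields)
  present <- dcf[1, ][!is.na(dcf[1, ])]          # keep only declared fields
  raw <- unlist(strsplit(present, ","))          # one entry per package
  pkgs <- trimws(sub("\\(.*\\)", "", raw))       # strip "(>= 1.0)" and spaces
  setdiff(unique(pkgs[nzchar(pkgs)]), "R")
}

# Demonstration with a throwaway DESCRIPTION file
desc <- tempfile()
writeLines(c("Package: runme",
             "Version: 0.0.0.9000",
             "Depends: R (>= 3.5.0)",
             "Imports: shiny, jsonlite (>= 1.5)"), desc)
deps <- declared_dependencies(desc)
to_install <- setdiff(deps, rownames(installed.packages()))
print(deps)
# install.packages(to_install)  # left commented out: needs a CRAN mirror
```

With the missing packages installed this way, the `R CMD INSTALL ... && R -e 'runme::main()'` one-liner quoted earlier would no longer stop at the dependency check.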
Re: [Rd] Runnable R packages
I see some value in Duncan’s proposal to implement this as an extra package instead of a change to base R, if only to see if the idea has legs. I’m minded to do so myself using your suggestion, but is there a particular reason why you recommend using the remotes package instead of devtools? The latter seems to have the same functions I would need, and I believe it is more widely installed than remotes?

Kind regards,
David Lindelöf

From: Duncan Murdoch
Reply: Duncan Murdoch
Date: 2 February 2019 at 15:37:16
To: Barry Rowlingson, Abs Spurdle
Cc: r-devel
Subject: Re: [Rd] Runnable R packages

On 02/02/2019 8:27 a.m., Barry Rowlingson wrote:
> I don't think anyone denies that you *could* make an EXE to do all
> that. The discussion is on *how easy* it should be to create a single
> file that contains an initial "main" function plus a set of bundled
> code (potentially as a package) and which when run will install its
> package code (which is contained in itself, it's not in a repo),
> install dependencies, and run the main() function.
>
> Now, I could build a self-executable shar file that bundled a package
> together with a script to do all the above. But if there was a "RUN"
> command in R, and a convention that a function called "foo::main"
> would be run by `R CMD RUN foo_1.1.1.tar.gz` then it would be so much
> easier to develop and test.

I don't believe the "so much easier" argument that this requires a change to base R. If you put that functionality into a package, then the only extra effort the user would require is to install that other package. After that, they could run

    Rscript -e "yourpackage::run_main('foo_1.1.1.tar.gz')"

as I suggested before. This is no harder than running

    R CMD RUN foo_1.1.1.tar.gz

The advantage of this from R Core's perspective is that you would be developing and maintaining "yourpackage", you wouldn't be passing the burden on to them. The advantage from your perspective is that you could work with whatever packages you liked.

The "remotes" package has almost everything you need so that "yourpackage" could be nearly trivial. You wouldn't need to duplicate it within base R.

Duncan Murdoch

> If people think this adds value, then if they want to offer that value
> to me as $ or £, I'd consider writing it if their total value was more
> than my cost
>
> Barry
>
> On Sat, Feb 2, 2019 at 12:54 AM Abs Spurdle wrote:
>>
>> Further to my previous post,
>> it would be possible to create an .exe file, say:
>>
>> my_r_application.exe
>>
>> That starts R, loads your R package(s), calls the R function of your choice
>> and does whatever else you want.
>>
>> However, I don't think that it would add much value.
>> But feel free to correct me if you think that I'm wrong.
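To make Duncan's suggestion concrete, here is a minimal sketch of what a `run_main()` in "yourpackage" might look like. It assumes the tarball follows the usual `<package>_<version>.tar.gz` naming, that the package exports its entry function, and that the remotes package is installed; `remotes::install_local()` with `dependencies = TRUE` is used for the install step. The `entry` and `install` parameters are my own additions for illustration and do not come from the thread.

```r
# Sketch under stated assumptions; not the API of any existing package.
run_main <- function(tarball, entry = "main",
                     install = function(path) {
                       # installs the tarball and its declared dependencies
                       remotes::install_local(path, dependencies = TRUE)
                     }) {
  # Package names cannot contain "_", so everything before the first
  # underscore in the file name is taken to be the package name.
  pkg <- sub("_.*$", "", basename(tarball))
  install(tarball)
  # Look up the exported entry function and call it without arguments.
  getExportedValue(pkg, entry)()
}

# Typical use from the shell (not executed here):
#   Rscript -e "run_main('foo_1.1.1.tar.gz')"
```

Passing the installer as an argument keeps the lookup-and-call logic testable without touching the package library.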
Re: [Rd] Runnable R packages
Yesterday I wrote and submitted to CRAN a package `run`, which implements the ideas discussed in this thread. Given a package tarball foo_0.1.0.tar.gz, users will be able to run

    Rscript -e "run::run('foo_0.1.0.tar.gz')"

which will pull all the dependencies of package `foo`, look up a function `main` in that package's namespace, and call it.

It's an early draft but I'd appreciate any feedback (once its submission is accepted, of course).

Thanks all for your help and advice,

David
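The "look up `main` in the namespace and call it" step described above can be sketched in base R as follows. This is only an illustration of the general technique and not necessarily how the `run` package does it; the helper name `call_entry_point()` is hypothetical. Looking in the namespace, as opposed to the exports, means the entry function does not have to be exported.

```r
# Hypothetical helper: find a function in a package namespace and call it.
call_entry_point <- function(pkg, entry = "main", ...) {
  ns <- asNamespace(pkg)  # loads the namespace if it is not loaded yet
  if (!exists(entry, envir = ns, mode = "function", inherits = FALSE)) {
    stop(sprintf("package '%s' has no function '%s'", pkg, entry),
         call. = FALSE)
  }
  fun <- get(entry, envir = ns, mode = "function", inherits = FALSE)
  fun(...)
}

# Demonstration with a package that ships with R
call_entry_point("stats", "median", c(1, 5, 3))  # returns 3
```

Failing with an explicit message when the entry function is absent gives a clearer error than the generic "object not found" a bare `pkg::main()` would produce.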
Re: [Rd] Runnable R packages
Sure, you can find it here: https://github.com/dlindelof/run

On Fri, Feb 8, 2019 at 9:41 AM Rainer M Krug wrote:

> Sounds interesting. Do you have it on GitHub or similar?
>
> Rainer
>
> --
> Rainer M. Krug, PhD (Conservation Ecology, SUN), MSc
> (Conservation Biology, UCT), Dipl. Phys. (Germany)
>
> Department of Evolutionary Biology and Environmental Studies
> University of Zürich
> Office Y34-J-74
> Winterthurerstrasse 190
> 8075 Zürich
> Switzerland
>
> Office: +41 (0)44 635 47 64
> Cell: +41 (0)78 630 66 57
> email: rainer.k...@uzh.ch
>        rai...@krugs.de
> Skype: RMkrug
>
> PGP: 0x0F52F982