On Sat, Nov 8, 2014 at 12:29 AM, Wolfgang Huber <whu...@embl.de> wrote:
> On Nov 2, 2014, at 16:10 GMT+1, Duncan Murdoch
> <murdoch.dun...@gmail.com> wrote:
>
>> On 01/11/2014, 8:44 PM, Martin Morgan wrote:
>>> If I understand correctly, all vignettes in a package are built in the
>>> same R process. Global options, loaded packages, etc., in an earlier
>>> vignette persist in later vignettes. This can introduce user confusion
>>> (e.g., when a later vignette builds successfully because a package is
>>> require()'ed in an earlier vignette, but not the current one),
>>> difficult-to-identify bugs (e.g., when a setting in an earlier vignette
>>> influences calculation in a later vignette), and misleading information
>>> about reproducibility (e.g., when the sessionInfo() of a later vignette
>>> reflects packages used in earlier vignettes).
>>>
>>> I believe the relevant code is at
>>>
>>> src/library/tools/R/Vignettes.R:505
>>>
>>>     output <- tryCatch({
>>>         ## FIXME: run this in a separate process
>>>         engine$weave(file, quiet = quiet)
>>>         setwd(startdir)
>>>         find_vignette_product(name, by = "weave", engine = engine)
>>>     }, error = function(e) {
>>>         stop(gettextf("processing vignette '%s' failed with diagnostics:\n%s",
>>>                       file, conditionMessage(e)), domain = NA, call. = FALSE)
>>>     })
>>>
>>> Is building of each vignette in separate processes a reasonable feature
>>> request?
>>
>> I'm not sure. It's not perfect: users may still see different output
>> than the package contains, because when they run the vignette it will
>> see their system state, but at least it gives them a way to get the
>> identical output. On the other hand, they already have a way to do
>> that: just build the whole package. Overall I'd say it's probably a
>> good idea.
>
> Let the perfect be the enemy of the good?
> Martin's proposed improvement would eliminate unnecessary complexity and a
> lot of potential (and actual) confusion.

I agree that this is likely a good move and will make reproducibility
a bit more solid. If this is changed, several things have to be
considered:

1. Make sure to run using the exact same R executable and architecture.

2. Make sure to use the exact same .libPaths(), which is particularly
important under R CMD check, where it is composed of a minimal set of
temporary paths.

3. Preserve the working directory. ...or should the working directories
also be unique, in order to isolate the vignettes from each other?

4. What other settings need to be set in order to replicate the state
of R CMD build/check?

5. How to deal with standard output and standard error?

6. How to propagate conditions such as warnings and errors?

7. Remember that buildVignette[s]() can be called manually too, not
only via R CMD build/check.

8. When you build a vignette manually via buildVignette(), should the
vignette change the state of R so that it is available for
troubleshooting/debugging, inspecting variables and so on?

9. Maybe there is a suite of vignettes that need to be run sequentially
in order for them to work. For instance, the first vignette
preprocesses the data and the second does EDA on it. I don't think
this is currently supported, because I don't think the order in which
vignettes are processed is guaranteed (it depends on locale), but maybe
a decision on supporting/not supporting this needs to be made.

10. Related to 9, when building vignettes in separate R processes, it
is tempting to also add support for parallel processing of vignettes.
If so, what decisions need to be made already now in order to allow
for that?

11. What else?

So any change needs to be made with great care. Adding a local=TRUE
argument to buildVignette[s]() could be a way to satisfy both camps
and allow us to move forward safely until success is proven.

Other things, such as cleaning up after the vignette engine, may
become easier when running in a separate process, e.g. closing stray
graphics devices.

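To make items 1-3, 5 and 6 a bit more concrete, here is a rough sketch
of what "run it in a separate process" could look like. It is only an
illustration under assumptions: run_isolated() is a made-up helper,
not something that exists in 'tools'; using --vanilla is my guess at a
reasonable default, not what R CMD build does; and only errors are
propagated (warnings from the child merely show up in the captured
output).

```r
## Hypothetical helper (not part of 'tools'): evaluate lines of R code
## in a fresh R process and return the captured output.
run_isolated <- function(code, workdir = getwd(), libs = .libPaths()) {
  ## Item 1: same R installation and architecture as the caller
  rscript <- file.path(R.home("bin"), "Rscript")

  ## Item 2: hand the caller's library paths to the child explicitly
  script <- tempfile(fileext = ".R")
  on.exit(unlink(script), add = TRUE)
  libline <- sprintf("invisible(.libPaths(%s))",
                     paste(deparse(libs), collapse = ""))
  writeLines(c(libline, code), script)

  ## Item 3: the child inherits the working directory we set here
  owd <- setwd(workdir)
  on.exit(setwd(owd), add = TRUE)

  ## Item 5: capture stdout and stderr together as a character vector
  out <- suppressWarnings(
    system2(rscript, c("--vanilla", shQuote(script)),
            stdout = TRUE, stderr = TRUE)
  )

  ## Item 6: turn a non-zero exit status into an R error in the parent
  status <- attr(out, "status")
  if (!is.null(status) && status != 0L) {
    stop("isolated R process failed with status ", status, ":\n",
         paste(out, collapse = "\n"), call. = FALSE)
  }
  out
}

## Possible use (the vignette path is made up for illustration):
## run_isolated('tools::buildVignette("vignettes/intro.Rnw", quiet = FALSE)')
```

Options set or packages attached by the code stay in the child, so the
calling session is left as it was. Items 4 and 7-10 are not addressed
by this at all.
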
BTW, an alternative to running in a separate process would be to have a
res <- sandbox({ ... }) that resets the state of R to its entry state
upon exit. The major hurdle I see for achieving that is the fact that
packages cannot be unloaded properly. One implementation of
sandbox({ ... }) would probably be to launch a separate R process.

/Henrik

>
> Wolfgang Huber
>
>>
>> I would prefer a way to detect and warn when vignette output depends on
>> the state outside the vignette, but that looks hard to do.
>>
>> Duncan Murdoch

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel