[Rd] Is there a way to disable / warn about forking?
Dear R developers,

with the inclusion of the package "parallel" in the upcoming release of R, users and package developers are likely to make increasing use of parallelization features. In part, these features rely on forking the R process. As ?mcfork points out, fork()ing in a GUI process is typically a bad idea. In RKWard, we "only" seem to have problems with signals arriving in the wrong threads, and occasional failure to collect the results from child processes. I haven't entirely given up hope of fixing this eventually, but in consequence, parallelization based on forking is not currently usable inside an RKWard session.

I am somewhat worried that, as library(parallel) gains acceptance, unsuspecting users will increasingly run into forking-related problems in RKWard and other environments. Therefore, I wish:

- The warning from ?mcfork about potential complications should also be visible on the documentation pages for the higher-level functions mcparallel() and mclapply(), but also makeForkCluster().
- It would be nice to have a way to tell library(parallel) that forking is a bad idea in the current session, so that
  - mcfork() could stop with an informative error message, or at least produce a warning, and mclapply() could fall back to mc.cores=1 with a warning;
  - third-party packages which wish to use parallelization could check whether it is safe to use forking, or whether another mechanism should be used.

I am aware that options(mc.cores=1) will effectively disable forking in mclapply(). However, this would make it look like (local) parallelization is not worthwhile at all, while actually, parallelization with makePSOCKCluster() works just fine. So, I'm looking for a way to selectively disable the use of forking.

Thanks!
Thomas
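A minimal sketch of the selective switch being asked for here. The option name "parallel.fork.unsafe" is purely hypothetical (nothing like it exists in library(parallel)); it stands in for whatever flag a front end such as RKWard might set, while mclapply(), makePSOCKcluster(), parLapply() and stopCluster() are the real parallel functions.

library(parallel)

safe_lapply <- function(X, FUN, ..., cores = 2L) {
    if (isTRUE(getOption("parallel.fork.unsafe"))) {
        # Forking flagged as unsafe (e.g. inside a GUI): use a socket cluster.
        cl <- makePSOCKcluster(cores)
        on.exit(stopCluster(cl))
        parLapply(cl, X, FUN, ...)
    } else {
        # Otherwise take the fork-based path.
        mclapply(X, FUN, ..., mc.cores = cores)
    }
}

# A GUI could set the flag at startup: options(parallel.fork.unsafe = TRUE)
res <- safe_lapply(1:4, function(i) i^2)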
Re: [Rd] Finding inter-function dependencies within a package
Following helpful correspondence with Mark Bravington, mvbutils::foodweb and callers.of can do exactly what I wanted, very neatly and easily. The trick is to use base::asNamespace to see non-exported objects in the package. base::asNamespace is described as "Internal name space support functions. Not intended to be called directly", so I'll forgive myself for not knowing about it previously, and keep an eye open for a more mainstream way to do the same job. But given that, this does my job perfectly:

> library(mvbutils)
> ff <- foodweb( funs=find.funs( asNamespace( 'sensory')), where=asNamespace( 'sensory'), prune='CreateMeanFizz')
> callers.of('CreateMeanFizz', ff)

--- where 'sensory' is the loaded package I want to search, and 'CreateMeanFizz' is the non-exported function of which I want to find callers.

Very nice! Thanks Mark!

Keith J

"Keith Jewell" wrote in message news:j64058$unj$1...@dough.gmane.org...
> Thanks for the suggestions. Just to wrap up this thread...
>
> Rainer Krug pointed out that Roxygen did have dependency graphs, although
> Roxygen2 doesn't. But I guess (probably wrongly!) that I'd need to
> process/modify the .R files to use that, and I'm not the package author.
>
> Duncan Murdoch pointed out codetools::findGlobals which can be used to
> find functions called by a target function. But I want to find functions
> calling a target function.
>
> Mark Bravington pointed out mvbutils::foodweb and callers.of which almost
> do what I want (I think it was this I half remembered). But this works in
> the namespace of the package, and my target function isn't exported so
> foodweb doesn't see it!
>
> Working from Duncan's suggestion I came up with this, not pretty or fast,
> could certainly be improved, but it did my one-off job:
>
> # return a character vector of names of functions in 'tarPack' (character) which directly call the function 'tarFunc' (character)
> called.by <- function(tarFunc, tarPack){
>     require(codetools)
>     flist <- sapply(lsf.str(tarPack, all=TRUE), c)
>     names(flist) <- NULL
>     gotit <- sapply(flist, function(x) tarFunc %in% findGlobals(get(x, tarPack), FALSE)$functions)
>     flist[gotit]
> }
> # e.g.
> called.by("CreateMeanFizz", "package:sensory")
> --
>
> Thanks again for the input.
>
> Keith Jewell
>
>>> Hi,
>>>
>>> I'd like to know which functions in a package call one specific
>>> function. I think I've seen a tool for identifying such dependencies,
>>> but now I can't find it :-( Searches of help and R site search for
>>> keywords like function, call, tree, depend haven't helped :-(
>>>
>>> Can anyone point me in the right direction?
>>>
>>> Thanks in advance,
>>>
>>> Keith Jewell
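For readers who have not used codetools, a tiny illustration of the direction issue in the thread above: findGlobals() lists the functions a given function calls, which is why called.by() has to loop over every function in the package and invert that relation. The function f is a made-up example.

library(codetools)
f <- function(x) mean(x) + sd(x)
findGlobals(f, merge = FALSE)$functions
# returns something like: "+" "mean" "sd"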
Re: [Rd] Is there a way to disable / warn about forking?
On Oct 4, 2011, at 4:43 AM, Thomas Friedrichsmeier wrote:

> Dear R developers,
>
> with the inclusion of the package "parallel" in the upcoming release of R,
> users and package developers are likely to make increasing use of
> parallelization features. In part, these features rely on forking the R
> process. As ?mcfork points out, fork()ing in a GUI process is typically a
> bad idea. In RKWard, we "only" seem to have problems with signals arriving
> in the wrong threads, and occasional failure to collect the results from
> child processes. I haven't entirely given up hope of fixing this eventually,
> but in consequence, parallelization based on forking is not currently usable
> inside an RKWard session.
>
> I am somewhat worried that, as library(parallel) gains acceptance,
> unsuspecting users will increasingly run into forking-related problems in
> RKWard and other environments.

I don't see why this should be anything new - this is already happening since both packages that were folded into parallel (snow and multicore) are well known and well used.

In multicore we were explicitly warning about this and also working around issues where possible (e.g. the Mac GUI). Judging by the widespread use of multicore and the absence of problem reports related to GUIs, my impression would be that this aspect is not really a problem (more below). We get more users confused about the inability to perform side-effects than this, for example.

In general, there are two main issues that can be addressed by the GUI:

a) shared file descriptors. This is a problem if the GUI uses FDs for communication and they are not closed in the child instance. You don't want both the child and the parent to process those FDs. E.g., closeAll() can be used to work around that issue, and with parallel there could be an easier interface for this given that it's in core R.

b) event loop. If the GUI hooks into the event loop then, obviously, this is only intended to be run from the master. multicore was already disabling the event loop hook for AQUA, but it was hard to provide a more comprehensive solution since it needed cooperation of R. In parallel it's much easier, because it can modify R to allow the event loop conditionally and thus only in the master process.

The whole point of parallel is that it can do more than an external package, so I think you're going about it the wrong way - you should be talking to us much earlier so that whatever constraints you have in RKWard can possibly be addressed by the infrastructure. Also note that a lot of this should be seamless: a lot of users don't care what the infrastructure is, they just want their task to run in parallel; they don't care about mcfork() and the like - the choices will be made for them, because there is no fork on Windows, for example.

> Therefore, I wish:
> - The warning from ?mcfork about potential complications should also be
> visible on the documentation pages for the higher-level functions
> mcparallel() and mclapply(), but also makeForkCluster().
> - It would be nice to have a way to tell library(parallel) that forking is
> a bad idea in the current session, so that
> - mcfork() could stop with an informative error message, or at least
> produce a warning, and mclapply() could fall back to mc.cores=1 with a
> warning;
> - third-party packages which wish to use parallelization could check
> whether it is safe to use forking, or whether another mechanism should be
> used.
>
> I am aware that options(mc.cores=1) will effectively disable forking in
> mclapply(). However, this would make it look like (local) parallelization
> is not worthwhile at all, while actually, parallelization with
> makePSOCKCluster() works just fine. So, I'm looking for a way to
> selectively disable the use of forking.

Cheers,
Simon
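A quick illustration of the side-effects confusion mentioned above (assuming a Unix-alike where forking is available): assignments made inside forked children never reach the master's workspace.

library(parallel)
counter <- 0
invisible(mclapply(1:4, function(i) counter <<- counter + i, mc.cores = 2))
counter   # still 0 in the master; each forked child modified only its own copy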
Re: [Rd] number of copies - summary
My thanks to Bill Dunlap and Simon Urbanek for clarifying many of the details. This gives me what I need to go forward.

Yes, I will likely convert more and more things to .Call over time. This clearly gives the most control over excess memory copies. I am getting more comments from people using survival on huge data sets, so memory usage is an issue I'll be spending more thought on.

I'm not nearly as negative about .C as Simon. Part of this is long experience with C standalone code: one just gets used to the idea that mismatched args to a subroutine are deadly. A test of all args to .C (via insertion of a browser call) is part of initial code development. Another is that constructing the return argument from .Call (a list with names) is a bit of a pain. So I will sometimes use dup=F. However, the opinion of R core about .C is clear, so it behooves me to move away from it.

Thanks again for the useful comments.

Terry Therneau
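For illustration, a minimal sketch of the kind of up-front argument checking described above, seen from the R side. The routine name "myfit" and its arguments are hypothetical (they stand in for any .C entry point); the point is only that coercing every argument to the storage mode the C routine expects, before the .C() call, catches the deadly mismatches early.

myfit_wrapper <- function(time, status) {
    stopifnot(length(time) == length(status))
    n <- length(time)
    # explicit coercions so a mismatched type never reaches the C code
    .C("myfit",                        # hypothetical C entry point
       n      = as.integer(n),
       time   = as.double(time),
       status = as.integer(status),
       result = double(n))$result
}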
Re: [Rd] Is there a way to disable / warn about forking?
Hi,

On Tuesday 04 October 2011, Simon Urbanek wrote:
> I don't see why this should be anything new - this is already happening
> since both packages that were folded into parallel (snow and multicore)
> are well known and well used.
>
> In multicore we were explicitly warning about this and also working around
> issues where possible (e.g. the Mac GUI). Judging by the widespread use of
> multicore and the absence of problem reports related to GUIs, my impression
> would be that this aspect is not really a problem (more below). We get more
> users confused about the inability to perform side-effects than this, for
> example.

Well, some users do heed the advice to address their problem reports to the package / GUI maintainers, especially if they experience that the problem only occurs with the GUI loaded, not in a "plain" R session. We've had a problem report about using mclapply() in the RKWard bug tracker for a while already.

> In general, there are two main issues that can be addressed by the GUI:
>
> a) shared file descriptors. This is a problem if the GUI uses FDs for
> communication and they are not closed in the child instance. You don't
> want both the child and the parent to process those FDs. E.g., closeAll()
> can be used to work around that issue, and with parallel there could be an
> easier interface for this given that it's in core R.
>
> b) event loop. If the GUI hooks into the event loop then, obviously, this
> is only intended to be run from the master. multicore was already
> disabling the event loop hook for AQUA, but it was hard to provide a more
> comprehensive solution since it needed cooperation of R. In parallel it's
> much easier, because it can modify R to allow the event loop conditionally
> and thus only in the master process.

For me the problem set was having multiple threads + mutexes, linking to a library that installs a SIGCHLD handler, and code waiting for the "communicator" thread to negotiate something with the frontend, except that thread doesn't exist in the fork()ed child process... After spending the day debugging, I think I have finally solved the key issues for RKWard. That also means the issue is mostly painless for me, now.

However, addressing fork()-related issues is not always a trivial exercise, and I continue to think that it could be useful for maintainers of "problematic" packages to have a way to stop / warn direct and indirect users running mcfork().

> The whole point of parallel is that it can do more than an external
> package, so I think you're going about it the wrong way - you should be
> talking to us much earlier so that whatever constraints you have in RKWard
> can possibly be addressed by the infrastructure. Also note that a lot of
> this should be seamless: a lot of users don't care what the infrastructure
> is, they just want their task to run in parallel; they don't care about
> mcfork() and the like - the choices will be made for them, because there
> is no fork on Windows, for example.

Exactly. I want the choice to be made for the user, where reasonably possible. My point is that knowing whether you're on Windows or a Unix is not enough to decide on the technique to use, in this case. Reliably enumerating all corner cases where forking could be a problem on Unix is probably next to impossible. The developers responsible for those corner cases have a decent chance of being aware of the problem, though. And thus, I think it would be a good idea if they had a standard way of informing library(parallel), and any third party using library(parallel), if there is a problem with forking.

Regards
Thomas
[Rd] segfault after .C call.
Hi there,

I think I'm encountering a bug, and I already reported it here: https://bugs.r-project.org/bugzilla3/show_bug.cgi?id=14695

But meanwhile, could you help me with any suggestions about the problem? I'll place the content of the reported bug here. You can find the attachments on bugzilla if you were that generous to check it for me.

Best and Thanks in Advance,
Adrin.

Bug report content:

I have a C/C++ code and I'm calling a function from that code using .C

I did check for memory leaks in my C code using valgrind and the output of that is attached. But when I run the same code from R, I get a segfault in older versions immediately. In 2.13.2 I will get an exception when I run gc() right after my R script.

Reproduction steps:

1. extract the prj.tar.gz
2. call test.R from the src directory inside the package (source("test.R")). That will compile the C code if the .so file is not up to date.
3. even call something like gc(), or call test.R again.

To reproduce the valgrind output of the C code itself:

1. compile the code. To do that you can run the following g++ command within the src directory:

   g++ -m64 -I/usr/include/R -I/usr/local/include -fpic -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector --param=ssp-buffer-size=4 -m64 -mtune=generic kbest.cpp

2. Run valgrind from the src directory like this:

   valgrind --leak-check=full ./a.out ../data/Signs.csv ../data/Pvals.csv 212000 2
[Rd] R-devel (2.14 alpha) Windows binary
Hello,

This question popped up on the bioc-devel list, I'm forwarding it here.

I know that sources for R-2.14 alpha can be found here: http://cran.r-project.org/src/base-prerelease/

But the OP (below) is asking about Windows binaries.

Dan

-- Forwarded message --
From: Stefan McKinnon Høj-Edwards
Date: 2011/10/4
Subject: Re: [Bioc-devel] Bioc 2.9 New Package submissions. The October deadlines are fast approaching.
To: "bioc-de...@r-project.org"

According to the BioC release plan, we should use R-2.14.0 devel-alpha to check our packages. But on CRAN, we can either choose R 2.13.2 (http://cran.r-project.org/bin/windows/base/) or R 2.15.0 devel (http://cran.r-project.org/bin/windows/base/rdevel.html). Am I missing something? And will it again be possible to get the binaries for Windows of the 2.14.0 devel-alpha?

Kind regards
Stefan McKinnon Høj-Edwards
Dept. of Molecular Biology and Genetics
Ph.D. Fellow, Aarhus University
Blichers Allé 20, Postboks 50
DK-8830 Tjele
Tel.: +45 8715 7969 / +45 8715 6000
Email: stefan.hoj-edwa...@agrsci.dk
Web: www.agrsci.dk

-----Original Message-----
Date: Mon, 03 Oct 2011 16:36:15 -0700
From: Marc Carlson
To: bioc-de...@r-project.org
Subject: [Bioc-devel] Bioc 2.9 New Package submissions. The October deadlines are fast approaching.
Message-ID: <4e8a46ef.3030...@fhcrc.org>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed

Dear Bioconductor developers,

This announcement is for package authors who are in the process of developing packages for inclusion in the upcoming 2.9 release of Bioconductor in October.

The deadline to make a submission for this release is going to be October 10th, 2011. Packages submitted after the deadline will probably not provide us with sufficient time for inclusion in this release. Earlier submissions are encouraged, as this provides more time to correct issues raised during the review process and probably improves the chance of making it into the release.

If you do not have your package ready in time for this deadline, your package will still be included in our development branch of Bioconductor and will be scheduled for inclusion in the subsequent release version of Bioconductor.

Other important deadlines to consider are listed on our release schedule. If you have packages in the repository (and most of you do), it is probably a good idea to follow the link below and see the other deadlines that are approaching.

http://www.bioconductor.org/developers/release-schedule/

Sincerely,

The Biocore Team
Re: [Rd] segfault after .C call.
The bug is in your code! (I see at least one - many buffer overflows in all char** output arguments). Please don't abuse the bug tracking system for usage questions.

You may want to consider using either .Call (if you are familiar with R) or Rcpp (if you are more familiar with C++); .C is not the right tool here.

Cheers,
Simon

On Oct 4, 2011, at 5:49 PM, Adrin wrote:

> Hi there,
>
> I think I'm encountering a bug, and I already reported it here:
> https://bugs.r-project.org/bugzilla3/show_bug.cgi?id=14695
>
> But meanwhile, could you help me with any suggestions about the problem?
>
> I'll place the content of the reported bug here. You can find the
> attachments on bugzilla if you were that generous to check it for me.
>
> Best and Thanks in Advance,
> Adrin.
>
> Bug report content:
>
> I have a C/C++ code and I'm calling a function from that code using .C
>
> I did check for memory leaks in my C code using valgrind and the output of
> that is attached. But when I run the same code from R, I get a segfault in
> older versions immediately. In 2.13.2 I will get an exception when I run
> gc() right after my R script.
>
> Reproduction steps:
>
> 1. extract the prj.tar.gz
> 2. call test.R from the src directory inside the package (source("test.R")).
>    That will compile the C code if the .so file is not up to date.
> 3. even call something like gc(), or call test.R again.
>
> To reproduce the valgrind output of the C code itself:
>
> 1. compile the code. To do that you can run the following g++ command
>    within the src directory:
>
>    g++ -m64 -I/usr/include/R -I/usr/local/include -fpic -O2 -g -pipe -Wall
>    -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector
>    --param=ssp-buffer-size=4 -m64 -mtune=generic kbest.cpp
>
> 2. Run valgrind from the src directory like this:
>
>    valgrind --leak-check=full ./a.out ../data/Signs.csv ../data/Pvals.csv 212000 2
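An R-side illustration of the char** pitfall pointed out above; the routine name is hypothetical and the .C() call is left commented out since it needs the compiled code. With .C(), each element of a character vector reaches C as a buffer only as long as the string R passed in, so the C code must never write past that length. Pre-sizing the output strings is the usual (fragile) workaround; .Call() avoids the problem entirely because the result is built in C with mkChar().

n <- 5L
blank <- paste(rep(" ", 128), collapse = "")   # a generously sized output buffer
out <- rep(blank, n)
# res <- .C("kbest_labels", as.integer(n), labels = out)$labels   # hypothetical routine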
[Rd] Moderating consequences of garbage collection when in C
Allocating many small objects triggers numerous garbage collections as R grows its memory, seriously degrading performance. The specific use case is in creating a STRSXP of several 1,000,000's of elements of 60-100 characters each; a simplified illustration understating the effects (because there is initially little to garbage collect, in contrast to an R session with several packages loaded) is below.

A simple solution is to provide a mechanism for the C programmer to request sufficient memory in advance. R_gc_needed might also be re-used at two other locations in memory.c (2221 and 2361) and could be exposed at the R level via a new argument, with default 0, to gc().

%> time R --vanilla -e "dyn.load('gc.so'); x = .Call('doit', 100, FALSE)"
> dyn.load('gc.so'); x = .Call('doit', 100, FALSE)

real    0m9.865s
user    0m9.697s
sys     0m0.146s

%> time R --vanilla -e "dyn.load('gc.so'); x = .Call('doit', 100, TRUE)"
> dyn.load('gc.so'); x = .Call('doit', 100, TRUE)

real    0m6.952s
user    0m6.802s
sys     0m0.132s

This is the test code

#include <R.h>
#include "Rdefines.h"

SEXP doit(SEXP len, SEXP needed)
{
    int i, n = asInteger(len);
    char **s = Calloc(n, char *);
    SEXP ans;

    for (i = 0; i < n; ++i) {
        s[i] = Calloc(80, char);
        sprintf(s[i], "%78d", i);
    }

    if (asLogical(needed))
        R_gc_needed(80 * n);

    PROTECT(ans = allocVector(STRSXP, n));
    for (i = 0; i < n; ++i)
        SET_STRING_ELT(ans, i, mkChar(s[i]));
    UNPROTECT(1);

    return ans;
}

and a patch

Index: src/include/R_ext/Memory.h
===================================================================
--- src/include/R_ext/Memory.h  (revision 57169)
+++ src/include/R_ext/Memory.h  (working copy)
@@ -36,6 +36,7 @@
 void    vmaxset(const void *);

 void    R_gc(void);
+void    R_gc_needed(size_t);

 char*   R_alloc(size_t, int);
 char*   S_alloc(long, int);
Index: src/main/memory.c
===================================================================
--- src/main/memory.c   (revision 57169)
+++ src/main/memory.c   (working copy)
@@ -2503,6 +2503,17 @@
     R_gc_internal(0);
 }

+void R_gc_needed(R_size_t size_needed)
+{
+    if (FORCE_GC || NO_FREE_NODES() || VHEAP_FREE() < size_needed) {
+        R_gc_internal(size_needed);
+        if (NO_FREE_NODES())
+            mem_err_cons();
+        if (VHEAP_FREE() < size_needed)
+            mem_err_heap(0);
+    }
+}
+
 static void R_gc_full(R_size_t size_needed)
 {
     num_old_gens_to_collect = NUM_OLD_GENERATIONS;

--
Computational Biology
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N. PO Box 19024
Seattle, WA 98109
Location: M1-B861
Telephone: 206 667-2793
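As a small R-level aside (using only existing base functions, not the proposed patch): gcinfo() reports every collection as it happens, so one can watch how many collections a string-heavy allocation like the one above triggers, and compare runs.

gcinfo(TRUE)                          # print a message at each garbage collection
x <- sprintf("%78d", seq_len(1e6))    # build many medium-sized strings
gcinfo(FALSE)
gc()                                  # summary of memory use afterwards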
Re: [Rd] R-devel (2.14 alpha) Windows binary
I suspect that the Windows build system just hasn't caught on to the short time between 2.13.2 and the start of run-in for 2.14.0. I think the usual logic is that pre-releases of the next version replace the patch releases of the old one (by definition, there shouldn't be updates to 2.13.2 anyway).

-pd

On Oct 5, 2011, at 00:46 , Dan Tenenbaum wrote:

> Hello,
> This question popped up on the bioc-devel list, I'm forwarding it here.
>
> I know that sources for R-2.14 alpha can be found here:
> http://cran.r-project.org/src/base-prerelease/
>
> But the OP (below) is asking about Windows binaries.
>
> Dan
>
> -- Forwarded message --
> From: Stefan McKinnon Høj-Edwards
> Date: 2011/10/4
> Subject: Re: [Bioc-devel] Bioc 2.9 New Package submissions. The
> October deadlines are fast approaching.
> To: "bioc-de...@r-project.org"
>
> According to the BioC release plan, we should use R-2.14.0 devel-alpha
> to check our packages.
> But on CRAN, we can either choose R 2.13.2
> (http://cran.r-project.org/bin/windows/base/) or R 2.15.0 devel
> (http://cran.r-project.org/bin/windows/base/rdevel.html). Am I missing
> something? And will it again be possible to get the binaries for
> Windows of the 2.14.0 devel-alpha?
>
> Kind regards
> Stefan McKinnon Høj-Edwards
> Dept. of Molecular Biology and Genetics
> Ph.D. Fellow, Aarhus University
> Blichers Allé 20, Postboks 50
> DK-8830 Tjele
> Tel.: +45 8715 7969 / +45 8715 6000
> Email: stefan.hoj-edwa...@agrsci.dk
> Web: www.agrsci.dk
>
> -----Original Message-----
> Date: Mon, 03 Oct 2011 16:36:15 -0700
> From: Marc Carlson
> To: bioc-de...@r-project.org
> Subject: [Bioc-devel] Bioc 2.9 New Package submissions. The October
> deadlines are fast approaching.
> Message-ID: <4e8a46ef.3030...@fhcrc.org>
> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
>
> Dear Bioconductor developers,
>
> This announcement is for package authors who are in the process of
> developing packages for inclusion in the upcoming 2.9 release of
> Bioconductor in October.
>
> The deadline to make a submission for this release is going to be
> October 10th, 2011. Packages submitted after the deadline will probably
> not provide us with sufficient time for inclusion in this release.
> Earlier submissions are encouraged, as this provides more time to correct
> issues raised during the review process and probably improves the chance
> of making it into the release.
>
> If you do not have your package ready in time for this deadline, your
> package will still be included in our development branch of Bioconductor
> and will be scheduled for inclusion in the subsequent release version of
> Bioconductor.
>
> Other important deadlines to consider are listed on our release
> schedule. If you have packages in the repository (and most of you do),
> it is probably a good idea to follow the link below and see the
> other deadlines that are approaching.
>
> http://www.bioconductor.org/developers/release-schedule/
>
> Sincerely,
>
> The Biocore Team

--
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd@cbs.dk  Priv: pda...@gmail.com