[Rd] Printing warning messages around R_tryEval
Hi! In RKWard we use R_tryEval() at a number of places to run R commands. To make sure that any generated warnings become visible close to the problem, we were following up (most of) these calls with Rf_PrintWarnings(). Rf_PrintWarnings() was never available in the headers (as far as I know), and in r61771 the symbol was hidden from libR.so, so this solution is no longer available. Could you give advice on the best way to print warnings before returning to the top level? Some options I am aware of:

1. Call the R function warnings(). However, this always prints the latest warnings, even if they were generated by some earlier command. I would need a way to print new warnings only.

2. Use options(warn=1) where applicable. However, in some cases it would be preferable to collect warnings until some procedure is complete and only print them then.

3. I see there is an internal call printDeferredWarnings(), which seems to be almost exactly what I want. However, relying on an .Internal() does not look like a terribly stable solution either. Also, having direct access to a similar function from the C API would be very comfortable.

Thanks!
Thomas
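For illustration, here is a minimal R-level sketch of the "new warnings only" behaviour wanted in option 1 (the wrapper name and output format are invented for this sketch, not an existing API); an embedding application could route its commands through such a wrapper via R_tryEval():

    run_and_print_warnings <- function(expr) {
      new_warnings <- character(0)
      result <- withCallingHandlers(
        expr,
        warning = function(w) {
          # collect only warnings raised by this evaluation
          new_warnings <<- c(new_warnings, conditionMessage(w))
          invokeRestart("muffleWarning")  # keep them out of the deferred list
        }
      )
      for (msg in new_warnings) cat("Warning message:", msg, "\n")
      result
    }

    run_and_print_warnings(as.numeric("foo"))  # prints exactly one new warning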
[Rd] Lessons from LibreOffice project
The message below came to me from the Getting Open Source Logic INto Government (GOSLING) list. I'm passing it on to the devel list as the InfoWorld article may have some ideas of relevance to the R project, mainly concerning build and test issues and tracking changes in the code base. While the LibreOffice project is very different from R, there may nevertheless be some tips we can borrow in the best open source tradition. FWIW, the article is quite short.

John Nash

-------- Original Message --------
Subject: [OTT-GOSLING] Interesting article about LibreOffice project
Date: Wed, 06 Mar 2013 04:10:51 +
From: gabriel.cosse...@gmail.com
Reply-To: GOSLING members in Ottawa
To: ottawa-gosl...@list.goslingcommunity.org

Hi everyone! Here's a very interesting article about how the LibreOffice project is evolving:

What you can learn from the monster LibreOffice project
http://www.infoworld.com/print/212908

Have a great day!

Gabriel Cossette
Coordonnateur de communautés sur les logiciels libres | Open Source Software Communities Coordinator
Technologies de l'information | Information Technology
Services partagés Canada | Parcs Canada
Shared Services Canada | Parks Canada
3, passage du Chien-d'Or, Québec (Québec) G1R 4V7
gabriel.cosse...@spc-ssc.gc.ca
Tél. | Tel.: 418 649-8163
Cell. | Cell.: 418 254-8558
Gouvernement du Canada | Government of Canada
Re: [Rd] predict.loess() segfaults for large n?
Thanks. This is in the netlib loess code: the size is used in Fortran (as an INTEGER), so we cannot increase it. I've added a test that throws an error if the dimension is too large.

On 01/03/2013 11:27, Hiroyuki Kawakatsu wrote:

> Hi,
>
> I am getting a segfault when using predict.loess() (checked with r62092). I've traced the source with the help of valgrind (output pasted below), and it appears that this is due to int overflow when allocating an int work array in loess_workspace():
>
>     liv = 50 + ((int)pow((double)2, (double)D) + 4) * nvmax + 2 * N;
>
> where liv is a (global) int. For D=1 (one x variable), this overflows at approx N = 4089, where N is the fitted sample size (not the prediction sample size).
>
> I am aware that you are in the process of introducing long vectors, but a quick fix would be to throw an error when predict.loess(..., se=TRUE) is called and N is too large. (Ideally one would use long int, but does Fortran portably support long int?) The threshold N value may depend on the surface type (the above is for surface=="interpolate").
>
> The following sample code does not segfault, but when run under valgrind it produces the warning about a large range. (In the code that does segfault, N is about 77,000.)
>
>     set.seed(1)
>     n = 5000  # n = 4000 seems ok
>     x = rnorm(n)
>     y = x + rnorm(n)
>     yf = loess(y~x, span=0.75, control=loess.control(trace.hat="approximate"))
>     print( predict(yf, data.frame(x=1), se=TRUE) )
>
> ##--- valgrind output with segfault (abridged):
>
> test4()
> ==30841== Warning: set address range perms: large range [0x3962a040, 0x5fb42608) (defined)
> ==30841== Warning: set address range perms: large range [0x5fb43040, 0xf8c8e130) (defined)
> ==30841== Invalid write of size 4
> ==30841==    at 0xCD719F0: ehg139_ (loessf.f:1444)
> ==30841==    by 0xCD72E0C: ehg131_ (loessf.f:467)
> ==30841==    by 0xCD73A5A: lowesb_ (loessf.f:1530)
> ==30841==    by 0xCD2C774: loess_ise (loessc.c:219)
> ==30841==    by 0x486C7F: do_dotCode (dotcode.c:1744)
> ==30841==    by 0x4AB040: bcEval (eval.c:4544)
> ==30841==    by 0x4B6B3F: Rf_eval (eval.c:498)
> ==30841==    by 0x4BAD87: Rf_applyClosure (eval.c:960)
> ==30841==    by 0x4B6D5E: Rf_eval (eval.c:611)
> ==30841==    by 0x4B7A1E: do_eval (eval.c:2193)
> ==30841==    by 0x4AB040: bcEval (eval.c:4544)
> ==30841==    by 0x4B6B3F: Rf_eval (eval.c:498)
> ==30841== Address 0xf8cd4144 is not stack'd, malloc'd or (recently) free'd
> ==30841==
> *** caught segfault ***
> address 0xf8cd4144, cause 'memory not mapped'
>
> Traceback:
>  1: predLoess(y, x, newx, s, weights, pars$robust, pars$span, pars$degree, pars$normalize, pars$parametric, pars$drop.square, pars$surface, pars$cell, pars$family, kd, divisor, se = se)
>  2: eval(expr, envir, enclos)
>  3: eval(substitute(expr), data, enclos = parent.frame())
>  4: with.default(object, predLoess(y, x, newx, s, weights, pars$robust, pars$span, pars$degree, pars$normalize, pars$parametric, pars$drop.square, pars$surface, pars$cell, pars$family, kd, divisor, se = se))
>  5: with(object, predLoess(y, x, newx, s, weights, pars$robust, pars$span, pars$degree, pars$normalize, pars$parametric, pars$drop.square, pars$surface, pars$cell, pars$family, kd, divisor, se = se))
>  6: predict.loess(y2, data.frame(hours = xmin), se = TRUE)
>  7: predict(y2, data.frame(hours = xmin), se = TRUE)
>  8: test4()
> aborting ...

--
Brian D. Ripley, rip...@stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road,       +44 1865 272866 (PA)
Oxford OX1 3TG, UK        Fax: +44 1865 272595
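The fix described above amounts to computing the requested workspace size before it is truncated to a 32-bit int. A minimal R-level sketch of the idea (purely illustrative; the actual check lives in the C sources, and these function names are invented):

    # Compute the size in double precision first, then refuse to truncate
    # anything that will not fit in an int.
    liv_for <- function(N, D, nvmax) 50 + (2^D + 4) * nvmax + 2 * N

    check_workspace_size <- function(size) {
      if (size > .Machine$integer.max)
        stop("workspace required in loess is too large")
      as.integer(size)
    }

    check_workspace_size(liv_for(N = 5000, D = 1, nvmax = 5200))  # fits fine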
Re: [Rd] [BioC] enabling reproducible research & R package management & install.package.version & BiocLite
Thanks David. I've looked into them both a bit, and I don't think they provide an approach for R (or Perl, for that matter) library management, which is the wicket I'm trying to make less sticky now.

They could be useful for managing the various installed versions of R and analysis tools (we're talking a lot of NextGen sequencing, so bowtie, tophat, and friends) quite nicely, similarly in service of an approach to enabling reproducible results.

Thanks for your thoughts, and if you know of others similar to dotkit/modules I'd be keen to hear of them.

~Malcolm

> -----Original Message-----
> From: Lapointe, David [mailto:david.lapoi...@umassmed.edu]
> Sent: Wednesday, March 06, 2013 7:46 AM
> To: Cook, Malcolm; 'Paul Gilbert'
> Cc: 'r-devel@r-project.org'; 'bioconduc...@r-project.org'; 'r-discuss...@listserv.stowers.org'
> Subject: RE: [BioC] [Rd] enabling reproducible research & R package management & install.package.version & BiocLite
>
> There are utilities (e.g. dotkit and modules) which facilitate version management, basically creating on-the-fly PATH and env setups, if you are comfortable keeping all that around.
>
> David
[Rd] lapply coerce data.frame to a list
Hi R-devel,

When using lapply on a data.frame, I notice that lapply coerces the data.frame to a list before calling the internal lapply function:

R> lapply
function (X, FUN, ...)
{
    FUN <- match.fun(FUN)
    if (!is.vector(X) || is.object(X))
        X <- as.list(X)
    .Internal(lapply(X, FUN))
}

df <- data.frame(V1=seq(100*1024*1024), V2=rnorm(100*1024))

R> is.vector(df)  # btw: a list is a vector, and a data.frame is a list, but a data.frame is NOT a vector -- something is not consistent?
[1] FALSE

So X <- as.list(X) is executed, and it takes time for a large data.frame:

R> object.size(df)
1258291976 bytes
R> system.time(as.list(df))
   user  system elapsed
  1.396   0.472   1.885

The question is: given that a data.frame is a list, is it necessary to coerce the data.frame to a list for lapply? Would the following logic do the same, but more efficiently, when lapply runs on a data.frame?

if (!is.vector(X) && !is.list(X) || is.object(X))
    X <- as.list(X)
.Internal(lapply(X, FUN))

Thanks,
Qin
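For what it's worth, the apparent inconsistency noted above has a simple cause: is.vector() returns FALSE for any object carrying attributes other than names, so a data.frame (which adds class and row.names attributes) fails the test even though it is a list underneath. A small illustration, with sizes scaled down from the example above:

    df <- data.frame(V1 = seq_len(1e6), V2 = rnorm(1e6))
    is.list(df)               # TRUE:  a data.frame is a list internally
    is.vector(df)             # FALSE: extra attributes (class, row.names)
    is.vector(unclass(df))    # still FALSE: row.names survives unclass()
    system.time(as.list(df))  # the coercion lapply() currently performs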
[Rd] do_fileinfo / file.info test for file IS directory during package load pointlessly stresses NIS by getting username / group info
Summary:

During package loading, the library() function calls file.info() to determine whether a file is a directory. This needlessly invokes getpwuid() and getgrgid(), which can be expensive if the user/group databases are held on the network. Note that file_test() ALSO uses file.info() for the same purpose.

Suggestion: rebuild file_test() to use a stat()-based call for the directory test, and use file_test() in the library() function. Note that functions like R_unlink() already use stat() calls to determine whether something is a directory.

-Alex Brown

Detail:

While developing an application using Shiny, my (large Fortune 500) company started to have network issues, and NIS performance issues in particular. Shiny restarts R relatively frequently, which entails loading a small handful of packages. I noticed that R startup time went down the drain, and my Shiny server started failing with timeouts in other parts of the stack (Apache mod_proxy and Shiny's node components have 60s timeouts). I narrowed this down to long-latency NIS calls caused by libc's getpwuid and getgrgid, always for the same user -- but why?

strace:

bind(5, {sa_family=AF_INET, sin_port=htons(994), sin_addr=inet_addr("0.0.0.0")}, 16) = -1 EACCES (Permission denied)
setsockopt(5, SOL_IP, IP_RECVERR, [1], 4) = 0
close(4) = 0
sendto(5, "_*n\230\0\0\0\0\0\0\0\2\0\1\206\244\0\0\0\2\0\0\0\3\0\0\0\0\0\0\0\0"..., 76, 0, {sa_family=AF_INET, sin_port=htons(776), sin_addr=inet_addr("10.3.147.16")}, 16) = 76
poll([{fd=5, events=POLLIN}], 1, 5000) = 1 ([{fd=5, revents=POLLIN}])

gdb (break on bind):

#0  0x76ee3c00 in bind () from /lib64/libc.so.6
#1  0x76f069b3 in bindresvport () from /lib64/libc.so.6
#2  0x76f08a0f in __libc_clntudp_bufcreate_internal () from /lib64/libc.so.6
#3  0x758ddd37 in yp_bind_client_create () from /lib64/libnsl.so.1
#4  0x758dde06 in yp_bind_file () from /lib64/libnsl.so.1
#5  0x758de043 in __yp_bind () from /lib64/libnsl.so.1
#6  0x758de47c in do_ypcall () from /lib64/libnsl.so.1
#7  0x758de569 in do_ypcall_tr () from /lib64/libnsl.so.1
#8  0x758df0a2 in yp_match () from /lib64/libnsl.so.1
#9  0x75af5f79 in _nss_nis_getpwuid_r () from /lib64/libnss_nis.so.2
#10 0x76eb040c in getpwuid_r@@GLIBC_2.2.5 () from /lib64/libc.so.6
#11 0x76eafc6f in getpwuid () from /lib64/libc.so.6
#12 0x77905262 in do_fileinfo (call=, op=, args=, rho=) at platform.c:946
#13 0x77894f62 in bcEval (body=, rho=, useCache=) at eval.c:

R: trace(file.info, browser); where:

where 7: file.info(lib.loc)$isdir %in% TRUE
where 8: library(package, lib.loc = lib.loc, character.only = TRUE, logical.return = TRUE, warn.conflicts = warn.conflicts, quietly = quietly)

R: the relevant lines in library():

lib.loc <- lib.loc[file.info(lib.loc)$isdir %in% TRUE]
if (!character.only)

-Alex Brown
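To make the waste concrete: file.info() computes a whole row of metadata, including the uname and grname columns whose lookups hit NIS, while library() consumes only isdir. A short illustration on a Unix-alike (columns as documented in ?file.info):

    info <- file.info(.libPaths()[1])
    names(info)   # includes "uname" and "grname" -- the getpwuid()/getgrgid() results
    info$isdir    # the only field library() actually uses here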
Re: [Rd] [BioC] enabling reproducible research & R package management & install.package.version & BiocLite
There are utilities (e.g. dotkit and modules) which facilitate version management, basically creating on-the-fly PATH and env setups, if you are comfortable keeping all that around.

David

-----Original Message-----
From: bioconductor-boun...@r-project.org [mailto:bioconductor-boun...@r-project.org] On Behalf Of Cook, Malcolm
Sent: Tuesday, March 05, 2013 6:08 PM
To: 'Paul Gilbert'
Cc: 'r-devel@r-project.org'; 'bioconduc...@r-project.org'; 'r-discuss...@listserv.stowers.org'
Subject: Re: [BioC] [Rd] enabling reproducible research & R package management & install.package.version & BiocLite

Paul,

I think your balanced and reasoned approach addresses all my current concerns. Nice! I will likely adopt your methods. Let me ruminate. Thanks for this.

~ Malcolm

> -----Original Message-----
> From: Paul Gilbert [mailto:pgilbert...@gmail.com]
> Sent: Tuesday, March 05, 2013 4:34 PM
> To: Cook, Malcolm
> Cc: 'r-devel@r-project.org'; 'bioconduc...@r-project.org'; 'r-discuss...@listserv.stowers.org'
> Subject: Re: [Rd] [BioC] enabling reproducible research & R package management & install.package.version & BiocLite
>
> (More on the original question further below.)
>
> On 13-03-05 09:48 AM, Cook, Malcolm wrote:
>> All,
>>
>> What got me started on this line of inquiry was my attempt at balancing the advantages of performing a periodic (daily or weekly) update to the 'release' version of locally installed R/Bioconductor packages on our institute-wide installation of R with the disadvantages of potentially changing the result of an analyst's workflow in mid-project.
>
> I have implemented a strategy to try to address this as follows:
>
> 1/ Install a new version of R when it is released, and packages in the R version's site-library with package versions as available at the time the R version is installed. Only upgrade these package versions in case they are severely broken.
>
> 2/ Install the same packages in site-library-fresh and upgrade these package versions on a regular basis (e.g. daily).
>
> 3/ When a new version of R is released, freeze but do not remove the old R version, at least not for a fairly long time, and freeze site-library-fresh for the old version. Begin with the new version as in 1/ and 2/. The old version remains available, so "reverting" is trivial.
>
> The analysts are then responsible for choosing the R version they use, and the library they use. This means they do not have to change R and package versions mid-project, but they can if they wish. I think the above two libraries will cover most cases, but it is possible that a few projects will need their own special library with a combination of package versions. In this case the user could create their own library, or you might prefer some more official mechanism.
>
> The idea of the above strategy is to provide the stability one might want for an ongoing project, and the possibility of an upgraded package if necessary, but not to encourage analysts to remain indefinitely with old versions (by, say, putting new packages in an old R version library).
>
> This strategy has been implemented in a set of make files in the project RoboAdmin, available at http://automater.r-forge.r-project.org/. It can be run entirely automatically with a cron job. Constructive comments are always appreciated.
>
> (IT departments sometimes think that there should be only one version of everything available, which they test and approve. So the initial reaction to this approach could be negative. I think they have not really thought about the advantages. They usually cannot test/approve an upgrade without user input, and timing is often extremely complicated because of ongoing user needs. This strategy simply shifts responsibility and timing to the users, or user departments, that can actually do the testing and approving.)
>
> Regarding NFS mounts, it is relatively robust. There can be occasional problems, especially for users that have a habit of keeping an R session open for days at a time and using site-library-fresh packages. In my experience this did not happen often enough to worry about a "blackout period".
>
> Regarding the original question, I would like to think it could be possible to keep enough information to reproduce the exact environment, but I think for potentially sensitive numerical problems that is optimistic. As others have pointed out, results can depend not only on R and package versions, configuration, OS versions, and library and compiler versions, but also on the underlying hardware. You might have some hope using something like an Amazon core instance. (BTW, this problem is not specific to R.)
>
> It is true that restricting to a fixed computing environment at your institution may ease things somewhat, but if you occasionally upgrade hardware or the OS then you will
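Under this scheme the choice of library is just a search-path decision on the analyst's side. An R-level sketch (the paths are illustrative placeholders, not part of RoboAdmin):

    # Pin an ongoing project to the frozen per-version site library:
    .libPaths(c("/usr/local/lib/R/site-library", .libPaths()))

    # ... or opt into the regularly upgraded one instead:
    # .libPaths(c("/usr/local/lib/R/site-library-fresh", .libPaths()))

    .libPaths()  # check the search order; the first hit wins for library()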
Re: [Rd] [BioC] enabling reproducible research & R package management & install.package.version & BiocLite
Have you seen QuasR (and/or gmapR, Rsubread, etc.)? One can run Bowtie, GSNAP, etc. from R. This certainly makes it easier for me to remember how I did some ChIP-seq or BS-seq or RNA-seq processing a year ago, when it turns out I need to add a sample or samples and carry on with an existing analysis pipeline.

On Wed, Mar 6, 2013 at 10:17 AM, Cook, Malcolm wrote:

> Thanks David. I've looked into them both a bit, and I don't think they provide an approach for R (or Perl, for that matter) library management, which is the wicket I'm trying to make less sticky now.
>
> They could be useful for managing the various installed versions of R and analysis tools (we're talking a lot of NextGen sequencing, so bowtie, tophat, and friends) quite nicely, similarly in service of an approach to enabling reproducible results.
>
> Thanks for your thoughts, and if you know of others similar to dotkit/modules I'd be keen to hear of them.
>
> ~Malcolm
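For readers who have not seen QuasR, a minimal hedged sketch of the kind of call being described (the file names are placeholders, and the sample-file format is specified in the QuasR vignette, so treat this as an outline rather than a recipe):

    library(QuasR)
    # "samples.txt": a tab-delimited table listing FASTQ files and sample names
    proj <- qAlign("samples.txt", genome = "BSgenome.Hsapiens.UCSC.hg19")
    alignmentStats(proj)  # per-sample alignment summaries from the project object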