[Rd] plot.POSIXct uses wrong x axis (PR#14016)
Full_Name: Karl Ove Hufthammer
Version: 2.10.0 beta
OS: Windows
Submission from: (NULL) (93.124.134.66)

When plotting a single POSIXct variable, 'plot' uses a nonsensical x axis. Here is some example code:

set.seed(1)
x = seq(1, 1e8, length = 100) + round(runif(100) * 1e8)
y = as.POSIXct(x, origin = "2001-01-01")
plot(y)

The y axis correctly shows appropriate labels (years 2002 to 2006), but the x axis shows the single time '59:58' in the lower left corner.

Expected behaviour: the indices should be shown on the x axis, just as for plot(x), where x is the numeric variable in the example code above.

Additional notes: while ?plot.POSIXct does not explicitly say that the second variable ('y') is optional, the help for the generic, ?plot, does, and it seems reasonable that it should be. Also, plot(POSIXct.variable) does produce a 'correct' plot, except for the labels on the x axis.

Output of sessionInfo():

R version 2.10.0 beta (2009-10-17 r50136)
i386-pc-mingw32

locale:
[1] LC_COLLATE=Norwegian-Nynorsk_Norway.1252
[2] LC_CTYPE=Norwegian-Nynorsk_Norway.1252
[3] LC_MONETARY=Norwegian-Nynorsk_Norway.1252
[4] LC_NUMERIC=C
[5] LC_TIME=Norwegian-Nynorsk_Norway.1252

attached base packages:
[1] stats graphics grDevices utils datasets methods base

loaded via a namespace (and not attached):
[1] tools_2.10.0
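Until the axis code is fixed, a workaround consistent with the expected behaviour is to supply the indices explicitly; a minimal sketch, reusing the y defined above (the y-axis labelling then relies on the usual POSIXct axis method):

plot(seq_along(y), y)  # x axis: indices 1..100; y axis: the times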
[Rd] R on Windows crashes when using certain characters in strings in data frames (PR#14125)
Full_Name: Karl Ove Hufthammer
Version: 2.10.0
OS: Windows XP
Submission from: (NULL) (93.124.134.66)

I have found a rather strange bug in R 2.10.0 on Windows, where the choice of characters used in a string makes R crash (i.e., Windows shows a dialogue saying that the application has a problem and must be closed). I can reproduce the bug on two separate systems running Windows XP, with both R 2.10.0 and the latest R 2.10.1 RC. The following commands trigger the crash for me:

n = 1e5
k = 10
x = sample(k, n, replace = TRUE)
y = sample(k, n, replace = TRUE)
xy = paste(x, y, sep = " × ")
z = sample(n)
d = data.frame(xy, z)

The last step takes a very long time, and R crashes before it's finished. Note that if I reduce n, the problem disappears. Also, if I change the × (a multiplication symbol) to an x (a letter), the problem also disappears (and the last command takes almost no time to run).

I originally discovered this (or a related?) bug while using 'unique' on a data frame similar to the 'd' data frame defined above, where R would often, but not always, crash.

> sessionInfo()
R version 2.10.0 (2009-10-26)
i386-pc-mingw32

locale:
[1] LC_COLLATE=Norwegian-Nynorsk_Norway.1252
[2] LC_CTYPE=Norwegian-Nynorsk_Norway.1252
[3] LC_MONETARY=Norwegian-Nynorsk_Norway.1252
[4] LC_NUMERIC=C
[5] LC_TIME=Norwegian-Nynorsk_Norway.1252

attached base packages:
[1] stats graphics grDevices utils datasets methods base
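The time in the last step is apparently spent converting xy to a factor (the default for character columns in data.frame), which exercises the same string matching as 'unique'. If that is indeed where the crash occurs, a hypothetical workaround, untested on 2.10.0, is to keep the column as plain character:

d = data.frame(xy, z, stringsAsFactors = FALSE)  # skip the factor conversion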
[Rd] segfault on functions with 'source' attribute set to a boolean or a number (PR#10437)
Full_Name: Karl Ove Hufthammer
Version: 2.6.0
OS: Linux (Fedora 7)
Submission from: (NULL) (129.177.61.84)

When viewing a function that has its 'source' attribute set to a boolean or a numeric, R crashes with a segfault. (Setting 'source' to a character vector does not make R crash, however.)

Steps to reproduce:

> attr(lm,"source")=FALSE
> lm

*** caught segfault ***
address 0x18, cause 'memory not mapped'
Re: [Rd] (PR#10437) segfault on functions with 'source' attribute
For the record: the reason I used attr(myfun, "source") = FALSE is that I misread the example 'Tidying R Code' in 'Writing R Extensions', which calls for attr(myfun, "source") = NULL. Somehow, setting 'source' to FALSE seems more natural to me than setting it to NULL.

[EMAIL PROTECTED]:
> I am not sure why you would want to do that, but the C code does assume
> source attributes were put there by R, and changing tests from !isNull to
> isString in a few places will fix that.

-- Karl Ove Hufthammer
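For reference, a sketch of the idiom the manual intends, using the same myfun:

attr(myfun, "source") <- NULL  # remove the stored source entirely
# setting the attribute to anything but a character vector (e.g. FALSE)
# is what triggered the segfault in unpatched R 2.6.0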
[Rd] colnames(tapply(...)) (PR#8539)
I would like to bring to your attention the following error message, which didn't appear in previous versions (long ago?). Thanks for all your effort.

Karl

Version 2.2.1 Patched (2006-01-21 r37153)

> f <- rep(c(1,2),each=5)
> x <- tapply(f,f,sum)
> colnames(x)
Error in dn[[2]] : subscript out of bounds

---
Karl Thomaseth, Ph.D.
Research Director
National Research Council
Institute of Biomedical Engineering ISIB-CNR
Corso Stati Uniti 4, 35127 Padova, ITALY
http://www.isib.cnr.it/~karl/
tel.: (+39) 049 8295762, fax: (+39) 049 8295763
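With a single grouping factor, tapply() returns a one-dimensional array, so there is no second dimension for colnames() to index; the group labels live in the only dimension. A sketch, using the x above:

dim(x)            # 2 -- a single dimension, hence no columns
names(x)          # "1" "2" -- the group labels
dimnames(x)[[1]]  # the same labels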
[Rd] Fwd: warning or error upon type/storage mode coercion?
-- Forwarded message --
From: Karl Forner
Date: Wed, Sep 15, 2010 at 10:14 AM
Subject: Re: [Rd] warning or error upon type/storage mode coercion?
To: Stefan Evert

I'm a Perl fan, and I really, really miss the "use strict" feature. IMHO it's very error-prone not to have this safety net.

Best,

On Wed, Sep 15, 2010 at 9:54 AM, Stefan Evert wrote:
>
> On 15 Sep 2010, at 03:23, Benjamin Tyner wrote:
>
>> 2. So, assuming the answer to (1) is a resounding "no", does anyone care
>> to state an opinion regarding the philosophical or historical rationale
>> for why this is the case in R/S, whereas certain other interpreted
>> languages offer the option to perform strict type checking? Basically,
>> I'm trying to explain to someone from a perl background why the
>> (apparent) lack of a "use strict; use warnings;" equivalent is not a
>> hindrance to writing bullet-proof R code.
>
> If they're from a Perl background, you might also want to point out to them
> that (base) Perl doesn't do _any_ type checking at all, and converts types
> as needed. As in ...
>
> $x = "0.0";
> if ($x) ...    # true
> if ($x+0) ...  # false
>
> AFAIK, that's one of the main complaints that people have about Perl. "use
> strict" will just make sure that all variables have to be declared before
> they're used, so you can't mess up by mistyping variable names. Which is
> something I'd very much like to have in R occasionally ...
>
> Best,
> Stefan
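For the variable-name half of "use strict", R does offer a partial equivalent: the codetools package can flag references to undefined globals. A sketch (the mistyped name tota1 is made up for illustration):

library(codetools)
f <- function() {
    total <- 0
    tota1 + 1  # mistyped variable name
}
checkUsage(f)  # flags something like: no visible binding for global variable 'tota1'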
Re: [Rd] Best way to manage configuration for openMP support
Thanks a lot, I have implemented the configure stuff and it works perfectly!! Exactly what I was looking for.

I just added

AC_PREREQ([2.62])

because AC_OPENMP is only supported from that version on, and

AC_MSG_WARN([NO OpenMP support detected. You should use gcc >= 4.2 !!!])

when no OpenMP support is detected. Maybe this could be put into the Writing R Extensions manual.

Thanks again,
Karl
[Rd] Possible bug or annoyance with library.dynam.unload()
Hello,

I have a package with a namespace. Because I use Roxygen, which overwrites the NAMESPACE file each time it is run, I use an R/zzz.R file with .onLoad() and .onUnload() functions to take care of loading and unloading my shared library.

The problem: if I load my library from a local directory, then the unloading of the package fails, e.g.:

# loads fine
> library(Foo, lib.loc=".Rcheck")

> unloadNamespace("Foo")
Warning message:
.onUnload failed in unloadNamespace() for 'Foo', details:
  call: library.dynam.unload("Foo", libpath)
  error: shared library 'Foo' was not loaded

# I traced it a little:
> library.dynam.unload("Foo", ".Rcheck/Foo")
Error in library.dynam.unload("Foo", ".Rcheck/Foo") :
  shared library 'Foo' was not loaded

# using an absolute path works
> library.dynam.unload("Foo", "/home/toto/.Rcheck/Foo")

So from what I understand, the problem is either that the relative libpath is sent to the .onUnload() function instead of the absolute one, or that library.dynam.unload() should be modified to handle relative paths.

Am I missing something? What should I do?

Thanks,
Karl
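Until this is resolved, a work-around sketch on the package side (assuming the zzz.R hooks described above) is to make the path absolute before unloading:

.onUnload <- function(libpath) {
    # normalizePath() turns the relative libpath passed by unloadNamespace()
    # into an absolute one, which library.dynam.unload() handles correctly
    library.dynam.unload("Foo", normalizePath(libpath))
}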
Re: [Rd] Possible bug or annoyance with library.dynam.unload()
Hello,

I got no reply on this issue. It is not critical and I can think of work-arounds, but it really looks like a bug to me. Should I file a bug report instead of posting on this list?

Thanks,
Karl
Re: [Rd] Possible bug or annoyance with library.dynam.unload()
Thanks Duncan for your suggestion.

I could not find any package using a dynamic library and namespaces but not the useDynLib pragma, so I created a minimalistic package to demonstrate the problem. Please find attached a very small package foo (8.8k).

Steps to reproduce the problem:
* unarchive it ( tar zxvf foo_0.1.tar.gz )
* cd foo
* install it locally ( mkdir local; R CMD INSTALL -l local . )
* R

> library(foo, lib.loc="local/")
> .dynLibs()
# there you should be able to see the foo.so lib, in my case
# /x05/people/m160508/workspace/foo/local/foo/libs/foo.so

> unloadNamespace("foo")
.onUnload, libpath= local/foo
Warning message:
.onUnload failed in unloadNamespace() for 'foo', details:
  call: library.dynam.unload("foo", libpath)
  error: shared library 'foo' was not loaded

# The libpath that the .onUnload() gets is "local/foo".

# This fails:
> library.dynam.unload("foo", "local/foo")
Error in library.dynam.unload("foo", "local/foo") :
  shared library 'foo' was not loaded

# but if you use the absolute path it works:
> library.dynam.unload("foo", "/x05/people/m160508/workspace/foo/local/foo")

Karl

On Tue, Sep 21, 2010 at 5:33 PM, Duncan Murdoch wrote:
> I'd probably post instructions for a reproducible example first. Pick some
> CRAN package, tell us what to do with it to trigger the error, and then we
> can see if it's something special about your package or Roxygen or a
> general problem.
>
> Duncan Murdoch

foo_0.1.tar.gz
Description: GNU Zip compressed data
Re: [Rd] Possible bug or annoyance with library.dynam.unload()
> Your package depends on Rcpp, so I didn't try it in the alpha version of
> 2.12.0
>
> Duncan Murdoch

It's a mistake: in fact it does not depend on Rcpp anymore. You can safely delete the src/Makevars file.
[Rd] checking user interrupts in C(++) code
Hello,

My problem is that I have an extension in C++ that can be quite time-consuming, and I'd like to make it interruptible. The problem is that if I use the recommended R_CheckUserInterrupt() method, I have no possibility to clean up (e.g. free the memory). I've seen an old thread about this, but I wonder if there's a new and definitive answer. I just do not understand why a simple R_CheckUserInterrupt()-like method returning a boolean could not be used.

Please enlighten me!

Karl
Re: [Rd] checking user interrupts in C(++) code
Hi,

Thanks for your reply.

> There are several ways in which you can make your code respond to interrupts
> properly - which one is suitable depends on your application. Probably the
> most commonly used for interfacing foreign objects is to create an external
> pointer with a finalizer - that makes sure the object is released even if
> you pass it on to R later. For memory allocated within a call you can either
> use R's transient memory allocation (see Salloc) or use the on.exit handler
> to cleanup any objects you allocated manually and left over.

Using R's transient memory allocation is not really an option when you use some code, like a library, not developed for R. Moreover, what about C++ and the new operator?

One related question: if the code is interrupted, are C++ local objects freed? Otherwise it is very, very complex to track all allocated objects; moreover, it depends on where the interruption happens.

Best,
Karl
[Rd] dendrogram plot does not draw long labels ?
Hello,

It seems that the plot function for dendrograms does not draw labels when they are too long.

> hc <- hclust(dist(USArrests), "ave")
> dend1 <- as.dendrogram(hc)
> dend2 <- cut(dend1, h=70)
> dd <- dend2$lower[[1]]
> plot(dd) # first label is drawn
> attr(dd[[1]], "label") <- "aa"
> plot(dd) # first label is NOT drawn

Is this expected? Is it possible to force the drawing?

Thank you,
Karl
Re: [Rd] dendrogram plot does not draw long labels ?
Hi Tobias, and thank you for your reply.

Using your insight, I managed to work around the issue (with some help) by increasing the "mai" option of par(). For example, a "mai" with its first coordinate (bottom) set to 5 allows ~42 letters to be displayed. We tried to change the xpd value in the text() call that you mentioned, but it did not seem to fix the problem.

But I think this is very annoying: the dendrogram plot is meant to be the common plotting function for all the clustering code, and suddenly, if your labels are just too long, nothing gets displayed, without even a warning. I suppose the margins should be set dynamically, based on the length of the longest drawn label.

The hclust plot seemed to handle these long labels very nicely, but I need to display colored labels, and the only way I found is to use plot.dendrogram for this.

Best,
Karl

On Tue, Jan 25, 2011 at 12:17 PM, Tobias Verbeke <tobias.verb...@openanalytics.eu> wrote:
> Hi Karl,
>
>> Is this expected ?
>
> Reading the code of stats:::plotNode, yes.
>
> Clipping to the figure region is hard-coded.
>
> You can see it is clipping to the figure region as follows:
>
> hc <- hclust(dist(USArrests), "ave")
> dend1 <- as.dendrogram(hc)
> dend2 <- cut(dend1, h=70)
> dd <- dend2$lower[[1]]
> op <- par(oma = c(8,4,4,2)+0.1, xpd = NA)
>
> plot(dd) # first label is drawn
> attr(dd[[1]], "label") <- "abcdefghijklmnopqrstuvwxyz"
>
> plot(dd) # first label is NOT drawn
> box(which = "figure")
> par(op)
>
>> Is it possible to force the drawing ?
>
> These are (from very quick reading -- not verified)
> the culprit lines in plotNode, I think:
>
> text(xBot, yBot + vln, nodeText, xpd = TRUE, # <- clipping hard-coded
>      cex = lab.cex, col = lab.col, font = lab.font)
>
> Best,
> Tobias
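In sketch form, the work-around described above (the bottom-margin value of 5 inches is the one reported to fit roughly 42 characters; adjust it to your labels):

op <- par(mai = c(5, 0.82, 0.82, 0.42))  # enlarge the bottom margin (in inches)
plot(dd)                                 # long labels now fit in the figure region
par(op)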
[Rd] Possible bug in cut.dendrogram when there are only 2 leaves in the tree ?
Hello,

I noticed a behavior of the cut() function that does not seem right. In a dendrogram with only 2 leaves in one cluster, if you cut() at a height above this cluster, you end up with 2 cut clusters, one for each leaf, instead of one. It seems to work fine for dendrograms with more than 2 objects, though.

For instance:

library(stats)

m <- matrix(c(0, 0.1, 0.1, 0), nrow = 2, ncol = 2)
dd <- as.dendrogram(hclust(as.dist(m)))
# plot(dd)
print(cut(dd, 0.2)) # 2 clusters in $lower

m2 <- matrix(c(0, 0.1, 0.5, 0.1, 0, 0.5, 0.5, 0.5, 0), nrow = 3, ncol = 3)
dd <- as.dendrogram(hclust(as.dist(m2)))
print(cut(dd, 0.2)) # here 2 clusters in $lower, as expected

So the question is: is it expected behavior that the whole tree is not reported in $lower if it is itself under the threshold?

Thank you,
Karl FORNER
[Rd] Error in svg() : cairo-based devices are not supported on this build
Hello,

Sorry if this is not the right place. I installed R-2.13.0 on an x86_64 Linux server. All went fine, but the svg() function yells:

> svg()
Error in svg() : cairo-based devices are not supported on this build

I have the Cairo, cairoDevice and RSvgDevice packages installed and running:

> Cairo.capabilities()
  png  jpeg  tiff   pdf   svg    ps   x11   win
 TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE FALSE

I tried to google around, unsuccessfully. The only thing I noticed in config.log is:

r_cv_has_pangocairo=no
r_cv_cairo_works=yes
r_cv_has_cairo=yes
#define HAVE_WORKING_CAIRO 1
#define HAVE_CAIRO_PDF 1
#define HAVE_CAIRO_PS 1
#define HAVE_CAIRO_SVG 1

So what can be wrong?

Thank you,
Karl
[Rd] Fwd: Error in svg() : cairo-based devices are not supported on this build
> Check what configure is saying when you build R and config.log. You may be
> simply missing something like pango-dev - Cairo doesn't use pango while R
> does - but it is usually optional (it works on my Mac without pango) so
> there may be more to it - config.log will tell you.

I managed to compile it successfully with pango-cairo support by editing the configure script and adding the pangoxft module to the pkg-config list:

% diff -c configure.bak configure
*** configure.bak 2011-05-31 16:16:55.0 +0200
--- configure 2011-05-31 16:17:21.0 +0200
***
*** 31313,31319
  $as_echo "$r_cv_has_pangocairo" >&6; }
  if test "x${r_cv_has_pangocairo}" = "xyes"; then
    modlist="pangocairo"
!   for module in cairo-xlib cairo-png; do
      if "${PKGCONF}" --exists ${module}; then
        modlist="${modlist} ${module}"
      fi
--- 31313,31319
  $as_echo "$r_cv_has_pangocairo" >&6; }
  if test "x${r_cv_has_pangocairo}" = "xyes"; then
    modlist="pangocairo"
!   for module in cairo-xlib cairo-png pangoxft; do
      if "${PKGCONF}" --exists ${module}; then
        modlist="${modlist} ${module}"
      fi

I do not know if it is an error in the configure script or just a peculiarity of my installation. All these libs (pango, cairo, gtk, glib) have been installed manually from tarballs.

Best,
Karl
[Rd] mcparallel (parallel:::mcexit) does not call finalizers
Hello,

In the context of trying to measure coverage of package code that runs parallelized tests, via the covr package, I realized that code executed using mcparallel() was not covered; cf.
https://github.com/jimhester/covr/issues/189#issuecomment-226492623

From my understanding, it seems that the package finalizer set by covr (cf. https://github.com/jimhester/covr/blob/79f7e0434f3d14a48c6fea994b67b9814b34e4e5/R/covr.R#L348) is not called, because the forked process exits using parallel:::mcexit, which is a non-standard exit and does not run some of the cleanup code (e.g. the R_CleanUp function is not called).

I was wondering if a modification of parallel's mcexit could be considered, to make it call the finalizers, possibly triggered by a parameter or an option, or if there are solid reasons not to do so.

Regards,
Karl Forner
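A small illustration of the reported behaviour (a sketch, assuming a fork-capable platform; the actual covr hook is not reproduced here):

library(parallel)
job <- mcparallel({
    e <- new.env()
    reg.finalizer(e, function(...) cat("finalizer ran\n"), onexit = TRUE)
    "done"
})
mccollect(job)  # "done" comes back, but "finalizer ran" is never printed:
                # the child exits through parallel:::mcexit, which skips
                # R_CleanUp and the exit-time finalizers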
[Rd] weird dir() behavior with broken symlinks
I encountered very weird behavior of the dir() function, that I just cannot understand.

Reproducible example:

docker run -ti rocker/r-base

R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

> # setup
> tmp <- tempfile()
> dir.create(tmp)
> setwd(tmp)
> file.symlink('from', 'to')

# First weirdness, the behavior of the recursive argument
> dir()
[1] "to"
> dir(recursive=TRUE)
character(0)

# include.dirs makes it work again. The doc states: "Should subdirectory
# names be included in recursive listings? (They always are in non-recursive
# ones.)"
> dir(recursive=TRUE, include.dirs=TRUE)
[1] "to"

Best,
Karl
Re: [Rd] weird dir() behavior with broken symlinks
Another strange behavior, this time of list.dirs(), that seems related:

docker run -ti rocker/r-base

> setwd(tempdir())
> file.symlink('from', 'to')
[1] TRUE
> list.dirs(recursive=FALSE)
[1] "./to"
> file.symlink('C/non_existing.doc', 'broken.txt')
[1] TRUE
> list.dirs(recursive=FALSE)
[1] "./broken.txt"
[Rd] Bug in order function
Dear R-devel(opers),

I wanted to draw your attention to a small problem with the order() function in base. According to the documentation, the radix sort supports a different sort order for each argument. This breaks when one of the arguments is an object. Please have a look at this Stack Overflow question:

https://stackoverflow.com/questions/39737871/r-order-method-on-multiple-columns-gives-error-argument-lengths-differ

It describes the problem well and suggests a solution. Although it is a niche case, it's a very easy thing to fix :)

Best regards,
Karl Nordström
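A repro in the spirit of the linked question (a sketch, for R versions around 3.3.x: any argument that is an object, here a factor, triggers the error once decreasing has length > 1):

x <- c(2, 1, 3)
f <- factor(c("a", "b", "a"))
order(x, f, decreasing = c(TRUE, FALSE), method = "radix")
# Error in order(...) : argument lengths differ
order(x, xtfrm(f), decreasing = c(TRUE, FALSE), method = "radix")  # works once the class is stripped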
[Rd] bug in package.skeleton(), and doc typo.
Hi all,

I think there's a bug in package.skeleton() when using the environment argument. Example:

env <- new.env()
env$hello <- function() { print('hello') }
package.skeleton(name='mypkg', environment=env)

==> does not create any source in mypkg/R/*

By the way, package.skeleton(name='mypkg', environment=env, list="hello") does not work either.

According to the documentation:

> The arguments list, environment, and code_files provide alternative ways
> to initialize the package.
> If code_files is supplied, the files so named will be sourced to form the
> environment, then used to generate the package skeleton.
> Otherwise list defaults to the non-hidden files in environment (those
> whose name does not start with .), but can be supplied to select a subset
> of the objects in that environment.

I believe I have found the problem: in the package.skeleton() body, the two calls to dump():

dump(internalObjs, file = file.path(code_dir, sprintf("%s-internal.R", name)))
dump(item, file = file.path(code_dir, sprintf("%s.R", list0[item])))

should use the extra argument envir=environment.

There's also a typo in the doc. The sentence:

> Otherwise list defaults to the non-hidden **files** in environment (those whose name does not start with .)

should be:

> Otherwise list defaults to the non-hidden **objects** in environment (those whose name does not start with .)

Best,
Karl Forner

> sessionInfo()
R version 3.0.1 (2013-05-16)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8 LC_NAME=en_US.UTF-8
 [9] LC_ADDRESS=en_US.UTF-8 LC_TELEPHONE=en_US.UTF-8
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=en_US.UTF-8

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] rj_1.1.3-1

loaded via a namespace (and not attached):
[1] rj.gd_1.1.3-1 tools_3.0.1
[Rd] sys.source() does not provide the parsing info to eval()
Hello,

It seems that the parsing information attached to expressions parsed by the parse() function when keep.source=TRUE is not provided to the eval() function.

Please consider this code:

path <- tempfile()
code <- '(function() print( str( sys.calls() ) ))()'
writeLines(code, path)
sys.source(path, envir=globalenv(), keep.source=TRUE)

OUTPUT:
Dotted pair list of 4
 $ : language sys.source(path, envir = globalenv(), keep.source = TRUE)
 $ : language eval(i, envir)
 $ : language eval(expr, envir, enclos)
 $ : language (function() print(str(sys.calls())))()
NULL

then:

eval(parse(text=code))

OUTPUT:
Dotted pair list of 3
 $ : language eval(parse(text = code))
 $ : language eval(expr, envir, enclos)
 $ : length 1 (function() print(str(sys.calls())))()
  ..- attr(*, "srcref")=Class 'srcref' atomic [1:8] 1 1 1 42 1 42 1 1
  .. .. ..- attr(*, "srcfile")=Classes 'srcfilecopy', 'srcfile'

As you can see, when using eval() directly, the expression/call has the parsing information available in the "srcref" attribute, but not when using sys.source().

Looking at the sys.source() implementation, this seems to be caused by this line:

for (i in exprs) eval(i, envir)

The "srcref" attribute is no longer available when "exprs" is subsetted, as illustrated by the code below:

ex <- parse( text="1+1; 2+2")
attr(ex, 'srcref')
print(str(ex))
# length 2 expression(1 + 1, 2 + 2)
# - attr(*, "srcref")=List of 2
#  ..$ :Class 'srcref' atomic [1:8] 1 1 1 3 1 3 1 1
#  .. .. ..- attr(*, "srcfile")=Classes 'srcfilecopy', 'srcfile'
#  ..$ :Class 'srcref' atomic [1:8] 1 6 1 8 6 8 1 1
#  .. .. ..- attr(*, "srcfile")=Classes 'srcfilecopy', 'srcfile'
# - attr(*, "srcfile")=Classes 'srcfilecopy', 'srcfile'
# - attr(*, "wholeSrcref")=Class 'srcref' atomic [1:8] 1 0 2 0 0 0 1 2
#  .. ..- attr(*, "srcfile")=Classes 'srcfilecopy', 'srcfile'
# NULL

print( str(ex[[1]]))
# language 1 + 1
# NULL

print( str(ex[1]))
# length 1 expression(1 + 1)
# - attr(*, "srcref")=List of 1
#  ..$ :Class 'srcref' atomic [1:8] 1 1 1 3 1 3 1 1
#  .. .. ..- attr(*, "srcfile")=Classes 'srcfilecopy', 'srcfile'
# NULL

I suppose that the line "for (i in exprs) eval(i, envir)" could be replaced by "eval(exprs, envir)"?

Best,
Karl Forner

P.S.
> sessionInfo()
R version 3.0.1 (2013-05-16)
Platform: x86_64-unknown-linux-gnu (64-bit)
...
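A sketch of the suggested fix: either evaluate the expression object as a whole, or subset it with single brackets so that each piece keeps its "srcref":

exprs <- parse(path, keep.source = TRUE)
eval(exprs, globalenv())  # srcref preserved
# or, keeping sys.source()'s loop:
for (i in seq_along(exprs)) eval(exprs[i], globalenv())  # [ keeps "srcref"; [[ drops it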
Re: [Rd] Comments requested on "changedFiles" function
Hi Duncan,

I think this functionality would be much easier to use and understand if you split the functionality of taking snapshots and comparing them into separate functions. In addition, the 'timestamp' functionality seems both confusing and brittle to me. I think it would be better to store file modification times in the snapshot and use those instead of an external file. Maybe:

# Take a snapshot of the files.
takeFileSnapshot(directory, file.info = TRUE, md5sum = FALSE,
                 full.names = FALSE, recursive = TRUE, ...)

# Take a snapshot using the same options as used for the snapshot.
retakeFileSnapshot(snapshot, directory = snapshot$directory) {
    takeFileSnapshot(directory, file.info = snapshot$file.info,
                     md5sum = snapshot$md5sum, etc)
}

compareFileSnapshots(snapshot1, snapshot2)
# - or -
getNewFiles(snapshot1, snapshot2)      # These names are probably too generic
getDeletedFiles(snapshot1, snapshot2)
getUpdatedFiles(snapshot1, snapshot2)
# - or -
setdiff(snapshot1, snapshot2)          # Unclear how this should treat updated files

This approach does have the difficulty that users could attempt to compare snapshots that were taken with different options and that can't be compared, but that should be an easy error to detect.

Karl

On Wed, Sep 4, 2013 at 10:53 AM, Duncan Murdoch wrote:
> In a number of places internal to R, we need to know which files have
> changed (e.g. after building a vignette). I've just written a general
> purpose function "changedFiles" that I'll probably commit to R-devel.
> Comments on the design (or bug reports) would be appreciated.
>
> The source for the function and the Rd page for it are inline below.
>
> - changedFiles.R:
> changedFiles <- function(snapshot, timestamp = tempfile("timestamp"),
>                          file.info = NULL, md5sum = FALSE,
>                          full.names = FALSE, ...) {
>     dosnapshot <- function(args) {
>         fullnames <- do.call(list.files, c(full.names = TRUE, args))
>         names <- do.call(list.files, c(full.names = full.names, args))
>         if (isTRUE(file.info) || (is.character(file.info) && length(file.info))) {
>             info <- file.info(fullnames)
>             rownames(info) <- names
>             if (isTRUE(file.info))
>                 file.info <- c("size", "isdir", "mode", "mtime")
>         } else
>             info <- data.frame(row.names = names)
>         if (md5sum)
>             info <- data.frame(info, md5sum = tools::md5sum(fullnames))
>         list(info = info, timestamp = timestamp, file.info = file.info,
>              md5sum = md5sum, full.names = full.names, args = args)
>     }
>     if (missing(snapshot) || !inherits(snapshot, "changedFilesSnapshot")) {
>         if (length(timestamp) == 1)
>             file.create(timestamp)
>         if (missing(snapshot)) snapshot <- "."
>         pre <- dosnapshot(list(path = snapshot, ...))
>         pre$pre <- pre$info
>         pre$info <- NULL
>         pre$wd <- getwd()
>         class(pre) <- "changedFilesSnapshot"
>         return(pre)
>     }
>
>     if (missing(timestamp)) timestamp <- snapshot$timestamp
>     if (missing(file.info) || isTRUE(file.info)) file.info <- snapshot$file.info
>     if (identical(file.info, FALSE)) file.info <- NULL
>     if (missing(md5sum)) md5sum <- snapshot$md5sum
>     if (missing(full.names)) full.names <- snapshot$full.names
>
>     pre <- snapshot$pre
>     savewd <- getwd()
>     on.exit(setwd(savewd))
>     setwd(snapshot$wd)
>
>     args <- snapshot$args
>     newargs <- list(...)
>     args[names(newargs)] <- newargs
>     post <- dosnapshot(args)$info
>     prenames <- rownames(pre)
>     postnames <- rownames(post)
>
>     added <- setdiff(postnames, prenames)
>     deleted <- setdiff(prenames, postnames)
>     common <- intersect(prenames, postnames)
>
>     if (length(file.info)) {
>         preinfo <- pre[common, file.info]
>         postinfo <- post[common, file.info]
>         changes <- preinfo != postinfo
>     }
>     else changes <- matrix(logical(0), nrow = length(common), ncol = 0,
>                            dimnames = list(common, character(0)))
>     if (length(timestamp))
>         changes <- cbind(changes, Newer = file_test("-nt", common, timestamp))
>     if (md5sum) {
>         premd5 <- pre[common, "md5sum"]
>         postmd5 <- post[common, "md5sum"]
>         changes <- cbind(changes, md5sum = premd5 != postmd5)
>     }
>     changes1 <- changes[rowSums(changes, na.rm = TRUE) > 0, , drop = FALSE]
>     changed <-
Re: [Rd] Comments requested on "changedFiles" function
Comments inline:

On Wed, Sep 4, 2013 at 6:10 PM, Duncan Murdoch wrote:
> On 13-09-04 8:02 PM, Karl Millar wrote:
>> I think this functionality would be much easier to use and understand if
>> you split the functionality of taking snapshots and comparing them into
>> separate functions.
>
> Yes, that's another possibility. Some more comments below...
>
>> In addition, the 'timestamp' functionality seems both confusing and
>> brittle to me. I think it would be better to store file modification
>> times in the snapshot and use those instead of an external file.
>
> You can do that, using file.info = "mtime", but the file.info snapshots
> are quite a bit slower than using the timestamp file (when looking at a
> big recursive directory of files).

Sorry, I completely failed to explain what I was thinking here. There are a number of issues, but the biggest one is that you're implicitly assuming that files that get modified will have mtimes that come after the timestamp file was created. This isn't always true, with the most notable exception being that if you download a package from CRAN and untar it, the mtimes are usually well in the past (at least with GNU tar on a Linux system), so this code won't notice that the files have changed.

It may be a good idea to store the file sizes as well, which would help prevent false negatives in the (rare, IIRC) cases where the contents have changed but the mtime values have not. Since you already need to call file.info() to get the mtime, this shouldn't increase the runtime, and the extra memory needed is fairly modest.

> I don't want to add too many new functions. The general R style is to
> have functions that do a lot, rather than have a lot of different
> functions to achieve different parts of related tasks. This is better
> for interactive use (fewer functions to remember, a simpler help system
> to navigate), though it probably results in less readable code.

This is somewhat more nuanced, and not particular to interactive use IMHO. Having functions that do a lot is good, _as long as the semantics are always consistent_. For example, lm() does a huge amount and has a wide variety of ways that you can specify your data, but it basically does the same thing no matter how you use it. On the other hand, if you have a function that does different things depending on how you call it (e.g. reshape()), then it's easy to remember the function name but much harder to remember how to call it correctly, harder to understand the documentation, and less readable.

> I can see an argument for two functions (a get and a compare), but I
> don't think there are many cases where doing two gets and comparing the
> snapshots would be worth the extra runtime. (It's extra because
> file.info is only a little faster than list.files, and it would be
> unavoidable to call both twice in that version. Using the timestamp file
> avoids one of those calls, and replaces the other with file_test, which
> takes a similar amount of time. So overall it's about 20-25% faster.)
> It also makes the code a bit more complicated, i.e. three calls (get,
> get, compare) instead of two (get, compare).

I think a 'snapshotDirectory' and 'compareDirectoryToSnapshot' combination might work well.

Thanks,
Karl
Re: [Rd] Comments requested on "changedFiles" function
Hi Duncan,

I like the interface of this version a lot better, but there's still a bunch of implementation details that need fixing:

* As previously mentioned, there are important cases where the mtime values change in ways that this code doesn't detect.
* If the timestamp file (which is usually in the temp directory) gets deleted (which can happen after a moderate amount of inactivity on some systems), then the file_test('-nt', ...) will always return false, even if the file has changed.
* If files get added or deleted between the two calls to list.files in fileSnapshot, it will fail with an error.
* If the path is on a remote file system, tempdir is local, and there's significant clock skew, then you can get incorrect results.

Unfortunately, these aren't just theoretical scenarios -- I've had the misfortune to run up against all of them in the past.

I've attached code that's loosely based on your implementation that solves these problems AFAICT. Alternatively, Hadley's code handles all of these correctly, with the exception that compare_state doesn't handle the case where safe_digest returns NA very well.

Regards,
Karl

On Fri, Sep 6, 2013 at 5:40 PM, Duncan Murdoch wrote:
> On 13-09-06 7:40 PM, Scott Kostyshak wrote:
>> On Fri, Sep 6, 2013 at 3:46 PM, Duncan Murdoch wrote:
>>> On 06/09/2013 2:20 PM, Duncan Murdoch wrote:
>>>> I have now put the code into a temporary package for testing; if anyone
>>>> is interested, for a few days it will be downloadable from
>>>>
>>>> fisher.stats.uwo.ca/faculty/murdoch/temp/testpkg_1.0.tar.gz
>>>
>>> Sorry, error in the URL. It should be
>>>
>>> http://www.stats.uwo.ca/faculty/murdoch/temp/testpkg_1.0.tar.gz
>>
>> Works well. A couple of things I noticed:
>>
>> (1)
>> md5sum is being called on directories, which causes warnings. (If this
>> is not viewed as undesirable, please ignore the rest of this comment.)
>> Should this be the responsibility of the user (by passing arguments to
>> list.files)? In the example, changing
>>   fileSnapshot(dir, file.info=TRUE, md5sum=TRUE)
>> to
>>   fileSnapshot(dir, file.info=TRUE, md5sum=TRUE, include.dirs=FALSE, recursive=TRUE)
>> gets rid of the warnings. But perhaps the user just wants to exclude
>> directories for the md5sum calculations. This can't be controlled from
>> fileSnapshot.
>
> I don't see the warnings, I just get NA values. I'll try to see why there's
> a difference. (One possibility is my platform (Windows); another is that
> I'm generally testing in R-patched and R-devel rather than the 3.0.1 release
> version.) I would rather suppress the warnings than make the user avoid
> them.
>
>> Or, should the "if (md5sum)" chunk subset "fullnames" using file_test
>> or file.info to exclude directories (and then fill in the directories
>> with NA)?
>>
>> (2)
>> If I run example(changedFiles) several times, sometimes I get:
>>
>> chngdF> changedFiles(snapshot)
>> File changes:
>>       mtime md5sum
>> file2  TRUE   TRUE
>>
>> and other times I get:
>>
>> chngdF> changedFiles(snapshot)
>> File changes:
>>       md5sum
>> file2   TRUE
>>
>> I wonder why.
>
> Sometimes the example runs so quickly that the new version has exactly the
> same modification time as the original. That's the risk of the mtime check.
> If you put a delay between, you'll get consistent results.
>
> Duncan Murdoch
>
>> Scott
>>
>>> sessionInfo()
>>
>> R Under development (unstable) (2013-08-31 r63780)
>> Platform: x86_64-unknown-linux-gnu (64-bit)
>>
>> locale:
>>  [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
>>  [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
>>  [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
>>  [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
>>  [9] LC_ADDRESS=C LC_TELEPHONE=C
>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>
>> attached base packages:
>> [1] stats graphics grDevices utils datasets methods base
>>
>> other attached packages:
>> [1] testpkg_1.0
>>
>> loaded via a namespace (and not attached):
>> [1] tools_3.1.0
>>
>> --
>> Scott Kostyshak
>> Economics PhD Candidate
>> Princeton University
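For what it's worth, a minimal sketch of the snapshot approach argued for in this thread -- recording mtime and size per file instead of relying on a timestamp file (the names fileSnapshot2/changedFiles2 are hypothetical, not the API under discussion):

fileSnapshot2 <- function(path = ".") {
    files <- list.files(path, recursive = TRUE, full.names = TRUE)
    info <- file.info(files)
    data.frame(file = files, mtime = info$mtime, size = info$size,
               stringsAsFactors = FALSE)
}
changedFiles2 <- function(pre, post) {
    both <- merge(pre, post, by = "file", suffixes = c(".pre", ".post"))
    list(added   = setdiff(post$file, pre$file),
         deleted = setdiff(pre$file, post$file),
         changed = both$file[both$mtime.pre != both$mtime.post |
                             both$size.pre != both$size.post])
}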
Re: [Rd] Comments requested on "changedFiles" function
On Fri, Sep 6, 2013 at 7:03 PM, Duncan Murdoch wrote:
> On 13-09-06 9:21 PM, Karl Millar wrote:
>> I like the interface of this version a lot better, but there's still a
>> bunch of implementation details that need fixing:
>>
>> * As previously mentioned, there are important cases where the mtime
>>   values change in ways that this code doesn't detect.
>> * If the timestamp file (which is usually in the temp directory) gets
>>   deleted (which can happen after a moderate amount of time of
>>   inactivity on some systems), then the file_test('-nt', ...) will
>>   always return false, even if the file has changed.
>
> If that happened without user intervention, I think it would break other
> things in R -- the temp directory is supposed to last for the whole
> session. But I should be checking anyway.

Yes, it does break other things in R -- my experience has been that the help system seems to be the one that is impacted the most by this. FWIW, I've never seen the entire R temp directory deleted, just individual files and subdirectories in it, but even that probably depends on how the machine is configured. I suspect only a few users ever notice this, but my R use is probably somewhat anomalous, and I think it only happens to R sessions that I haven't used for a few days.

>> * If files get added or deleted between the two calls to list.files in
>>   fileSnapshot, it will fail with an error.
>
> Yours won't work if path contains more than one directory. This is
> probably a reasonable restriction, but it's inconsistent with list.files,
> so I'd like to avoid it if I can find a way.

I'm currently unsure what the behaviour when comparing snapshots with multiple directories should be. Presumably we should have the property that (horribly abusing notation for succinctness) compareSnapshots(c(a1, a2), c(a1, a2)) is the same as concatenating (in some form) compareSnapshots(a1, a1) and compareSnapshots(a2, a2), and there are a bunch of ways we could concatenate -- we could return a list of results, or a single result where each of the 'added, deleted, modified' fields is a list, or where we concatenate the 'added, deleted, modified' fields together into three simple vectors.

Concatenating the vectors together like this is appealing, but unless you're using the full names, it doesn't include the information of which directory the changes are in, and using the full names doesn't work in the case where you're comparing different sets of directories, e.g. compareSnapshots(c(a1, a2), c(b1, b2)), where there is no sensible choice for a full name. The list options don't have this problem, but are harder to work with, particularly for the common case where there's only a single directory. You'd also have to be somewhat careful with filenames that occur in both directories.

Maybe I'm just being dense, but I don't see a way to do this that's clear, easy to use and wouldn't confuse users at the moment.

Karl
Re: [Rd] Using long long types in C++
Romain,

Can you use int64_t and uint64_t instead? IMHO that would be more useful than long long anyway.

Karl

On Sep 19, 2013 5:33 PM, "Patrick Welche" wrote:
> On Fri, Sep 20, 2013 at 12:51:52AM +0200, rom...@r-enthusiasts.com wrote:
>> In Rcpp we'd like to do something useful for types such as long long
>> and unsigned long long.
> ...
>> But apparently this is still not enough and on some versions of gcc
>> (e.g. 4.7 something), -pedantic still generates the warnings unless
>> we also use -Wno-long-long
>
> Can you also add -std=c++0x or is that considered as bad as adding
> -Wno-long-long?
>
> (and why not use autoconf's AC_TYPE_LONG_LONG_INT and
> AC_TYPE_UNSIGNED_LONG_LONG_INT for the tests?)
>
> Cheers,
> Patrick
[Rd] Possible problem with namespaceImportFrom() and methods for generic primitive functions
Hi all,

I have a problem with a package that imports two other packages which both export a method for the `[` primitive function. I set up a reproducible example here:
https://github.com/kforner/namespaceImportFrom_problem.git

Basically, the testPrimitiveImport package imports testPrimitiveExport1 and testPrimitiveExport2, which both export an S4 class and a `[` method for the class. Then:

R CMD INSTALL -l lib testPrimitiveExport1
R CMD INSTALL -l lib testPrimitiveExport2

The command:

R CMD INSTALL -l lib testPrimitiveImport

gives me:

Error in namespaceImportFrom(self, asNamespace(ns)) :
  trying to get slot "package" from an object of a basic class ("function") with no slots

I get the same message if I check the package (with R CMD check), or even if I try to load it using devtools::load_all().

I tried to investigate the problem, and I found that the error arises in the base::namespaceImportFrom() function, and more precisely in this block:

for (n in impnames) if (exists(n, envir = impenv, inherits = FALSE)) {
    if (.isMethodsDispatchOn() && methods:::isGeneric(n, ns)) {
        genNs <- get(n, envir = ns)@package

Here n is '[', and the get(n, envir = ns) expression returns .Primitive("["), which is a function and has no @package slot. This will only occur if exists(n, envir = impenv, inherits = FALSE) returns TRUE, i.e. if the '[' symbol is already in the imports env of the package. In my case, the first call to namespaceImportFrom() is for the first import of testPrimitiveExport1, which runs fine and populates the imports env with '['. But in the second call, exists(n, envir = impenv, inherits = FALSE) is TRUE, so the offending line is reached.

I do not know if the problem is on my side, e.g. a misconfiguration of the NAMESPACE file, or if it is a bug, and in that case what should be done. Any feedback appreciated.

Karl Forner
[Rd] unloadNamespace, getPackageName and "Created a package name xxx " warning
Dear all,

Consider this code:

> library("data.table")
> unloadNamespace('data.table')

It produces some warnings:

Warning in FUN(X[[1L]], ...) :
  Created a package name, 2013-10-29 17:05:51, when none found
Warning in FUN(X[[1L]], ...) :
  Created a package name, 2013-10-29 17:05:51, when none found
...

The warning is produced by the getPackageName() function, e.g.

getPackageName(parent.env(getNamespace('data.table')))

I was wondering what could be done to get rid of these warnings, which I believe are irrelevant in the "unloadNamespace" case. The stack of calls is:

# where 3: sapply(where, getPackageName)
# where 4: findClass(what, classWhere)
# where 5: .removeSuperclassBackRefs(cl, cldef, searchWhere)
# where 6: methods:::cacheMetaData(ns, FALSE, ns)
# where 7: unloadNamespace(pkgname)

So for instance:

> findClass('data.frame', getNamespace('data.table'))

generates a warning, which once again seems irrelevant.

Off the top of my head, I could imagine adding an extra argument to getPackageName, say warning = TRUE, which would be set to FALSE in the getPackageName call in the findClass() body.

I also wonder if, in the case of import namespaces, getPackageName() could not find a more appropriate name:

> parent.env(getNamespace('data.table'))
attr(,"name")
[1] "imports:data.table"

This namespace has a name that might be used to generate the package name.

My question is: what should be done?

Thanks for your attention.
Karl Forner
[Rd] problem using rJava with parallel::mclapply
Dear all,

I got an issue trying to parse Excel files in parallel using XLConnect: the process hangs forever. Martin Studer, the maintainer of XLConnect, kindly investigated the issue and identified rJava as a possible cause of the problem.

This does not work (hangs):

library(parallel)
require(rJava)
.jinit()
res <- mclapply(1:2, function(i) {
    J("java.lang.Runtime")$getRuntime()$gc()
    1
}, mc.cores = 2)

but this works:

library(parallel)
res <- mclapply(1:2, function(i) {
    require(rJava)
    .jinit()
    J("java.lang.Runtime")$getRuntime()$gc()
    1
}, mc.cores = 2)

To cite Martin, it seems to work with mclapply when the JVM process is initialized after forking.

Is this a bug or a limitation of rJava? Or is there a good practice for rJava clients to avoid this problem?

Best,
Karl

P.S.
> sessionInfo()
R version 3.0.1 (2013-05-16)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=C LC_NAME=C
 [9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics grDevices utils datasets methods base

loaded via a namespace (and not attached):
[1] tools_3.0.1
Re: [Rd] problem using rJava with parallel::mclapply
Thanks Malcolm, but it does not seem to solve the problem.

On Mon, Nov 11, 2013 at 6:48 PM, Cook, Malcolm wrote:
> Karl,
>
> I have the following notes to self that may be pertinent:
>
> options(java.parameters=
>     ## Must precede `library(XLConnect)` in order to prevent "Java
>     ## requested System.exit(130), closing R." which happens when
>     ## rJava quits R upon trapping INT (control-c), as is done by
>     ## XLConnect (and playwith?), below. (c.f.:
>     ## https://www.rforge.net/bugzilla/show_bug.cgi?id=237)
>     "-Xrs")
>
> ~Malcolm
[Rd] How to catch warnings sent by arguments of s4 methods ?
Hello,

I apologize if this has already been addressed; I also submitted this problem on SO:
http://stackoverflow.com/questions/20268021/how-to-catch-warnings-sent-during-s4-method-selection

Example code:

setGeneric('my_method', function(x) standardGeneric('my_method'))
setMethod('my_method', 'ANY', function(x) invisible())

withCallingHandlers(my_method(warning('argh')), warning = function(w) {
  stop('got warning:', w)
})
# this does not catch the warning

It seems that warnings emitted during the evaluation of the arguments of S4 methods cannot be caught using withCallingHandlers().

Is this expected? Is there a work-around?

Best,
Karl Forner

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] How to catch warnings sent by arguments of s4 methods ?
Hi,

Just to add some information and to clarify why I feel this is an important issue.

If you have an S4 method with a default argument, it seems that you cannot catch the warnings emitted during its evaluation. It matters because on some occasions those warnings carry essential information that your code needs to use.

Martin Morgan added some information about this issue on:
http://stackoverflow.com/questions/20268021/how-to-catch-warnings-sent-during-s4-method-selection

Basically the C function R_dispatchGeneric uses R_tryEvalSilent to evaluate the method arguments, which seems not to use the calling handlers.

Best,
Karl

On Fri, Nov 29, 2013 at 11:30 AM, Karl Forner wrote:
> Hello,
>
> I apologize if this has already been addressed; I also submitted
> this problem on SO:
> http://stackoverflow.com/questions/20268021/how-to-catch-warnings-sent-during-s4-method-selection
>
> Example code:
> setGeneric('my_method', function(x) standardGeneric('my_method'))
> setMethod('my_method', 'ANY', function(x) invisible())
>
> withCallingHandlers(my_method(warning('argh')), warning = function(w)
> { stop('got warning:', w) })
> # this does not catch the warning
>
> It seems that warnings emitted during the evaluation of the
> arguments of S4 methods cannot be caught using
> withCallingHandlers().
>
> Is this expected? Is there a work-around?
>
> Best,
> Karl Forner

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
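One possible work-around, when you control the call site, is to force the argument yourself inside withCallingHandlers(), so the warning is raised during ordinary evaluation rather than inside R_dispatchGeneric. A sketch, reusing the example generic above:

withCallingHandlers({
  arg <- warning('argh')   # the argument is evaluated here, under the handlers
  my_method(arg)           # dispatch then sees an already-evaluated value
}, warning = function(w) stop('got warning: ', conditionMessage(w)))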
[Rd] Status of reserved keywords and builtins
According to http://cran.r-project.org/doc/manuals/r-release/R-lang.html#Reserved-words

  if else repeat while function for in next break
  TRUE FALSE NULL Inf NaN NA
  NA_integer_ NA_real_ NA_complex_ NA_character_
  ... ..1 ..2 etc.

are all reserved keywords. However, in R 3.0.2 you can do things like:

  `if` <- function(cond, val1, val2) val2

after which

  if(TRUE) 1 else 2

returns 2. Similarly, users can change the implementation of `<-`, `(`, `{`, `||` and `&&`.

Two questions:
- Is this intended behaviour?
- If so, would it be a good idea to change the language definition to prevent this?

Doing so would have two benefits: users could count on keywords having their normal interpretation, and R implementations could implement these more efficiently, including not having to look up the symbol each time. It'd break any code that assumes that this is valid, but hopefully there's little or no code that does.

Thanks
Karl

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
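The same shadowing works for the operators listed above; a small illustration with `&&`, including how to undo it:

`&&` <- function(a, b) FALSE
TRUE && TRUE   # now FALSE: the new binding shadows the builtin
rm(`&&`)       # removing the binding restores the normal behaviour
TRUE && TRUE   # TRUE again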
[Rd] [PATCH] Code coverage support proof of concept
Hello,

I submit a patch for review that implements code coverage tracing in the R interpreter. It records the lines that are actually executed, and their associated frequency, for which srcref information is available.

I perfectly understand that this patch will not make its way into R as it is, and that there are many concerns of stability, compatibility, maintenance and so on. I would like to have the code reviewed, and proper guidance on how to get this feature available at some point in R, in base R or as a package or patch, if other people are interested.

Usage
-----
Rcov_start()
# your code to trace here
res <- Rcov_stop()

res is currently a hashed env, with traced source filenames associated with 2-column matrices holding the line numbers and their frequencies.

How it works
------------
I added a test in getSrcref() that records the line numbers if code coverage is started. The overhead should be minimal since, for a given file, subsequent covered lines will be stored in constant time. I use a hashed env to store the occurrences by file.

I added two entry points in the utils package (Rcov_start() and Rcov_stop()).

Example
-------
* untar the latest R-devel and cd into it
* patch -p1 < rdev-cov-patch.txt
* ./configure [...] && make && [sudo] make install
* install the devtools package
* run the following script using Rscript

library(methods)
library(devtools)
pkg <- download.packages('testthat', '.', repos = "http://stat.ethz.ch/CRAN")
untar(pkg[1, 2])

Rcov_start()
test('testthat')
env <- Rcov_stop()

res <- lapply(ls(env), get, envir = env)
names(res) <- ls(env)
print(res)

This will hopefully output something like:

$`.../testthat/R/auto-test.r`
     [,1] [,2]
[1,]   33    1
[2,]   80    1

$`.../testthat/R/colour-text.r`
      [,1] [,2]
 [1,]   18    1
 [2,]   19  106
 [3,]   20  106
 [4,]   22  106
 [5,]   23  106
 [6,]   40    1
 [7,]   59    1
 [8,]   70    1
 [9,]   71  106
...

Karl Forner

Disclaimer
----------
There are probably bugs and ugly statements, but this is just a proof of concept. This is untested and only run on a linux x86_64.

diff -ruN R-devel/src/library/utils/man/Rcov_start.Rd R-devel-cov/src/library/utils/man/Rcov_start.Rd
--- R-devel/src/library/utils/man/Rcov_start.Rd	1970-01-01 01:00:00.0 +0100
+++ R-devel-cov/src/library/utils/man/Rcov_start.Rd	2014-03-05 16:07:45.907596276 +0100
@@ -0,0 +1,26 @@
+% File src/library/utils/man/Rcov_start.Rd
+% Part of the R package, http://www.R-project.org
+% Copyright 1995-2010 R Core Team
+% Distributed under GPL 2 or later
+
+\name{Rcov_start}
+\alias{Rcov_start}
+\title{Start Code Coverage analysis of R's Execution}
+\description{
+  Start Code Coverage analysis of the execution of \R expressions.
+}
+\usage{
+Rcov_start(nb_lines = 1L, growth_rate = 2)
+}
+\arguments{
+  \item{nb_lines}{
+    Initial max number of lines per source file.
+  }
+  \item{growth_rate}{
+    Growth factor of the line numbers vectors per filename.
+    If a reached line number L is greater than nb_lines, the vector will
+    be reallocated with provisional size of growth_rate * L.
+  }
+}
+
+\keyword{utilities}
diff -ruN R-devel/src/library/utils/man/Rcov_stop.Rd R-devel-cov/src/library/utils/man/Rcov_stop.Rd
--- R-devel/src/library/utils/man/Rcov_stop.Rd	1970-01-01 01:00:00.0 +0100
+++ R-devel-cov/src/library/utils/man/Rcov_stop.Rd	2014-03-03 16:14:25.883440716 +0100
@@ -0,0 +1,20 @@
+% File src/library/utils/man/Rcov_stop.Rd
+% Part of the R package, http://www.R-project.org
+% Copyright 1995-2010 R Core Team
+% Distributed under GPL 2 or later
+
+\name{Rcov_stop}
+\alias{Rcov_stop}
+\title{Stop Code Coverage analysis of R's Execution}
+\description{
+  Stop Code Coverage analysis of the execution of \R expressions.
+}
+\usage{
+Rcov_stop()
+}
+
+\value{
+  a named list of integer vectors holding occurrence counts (line number,
+  frequency), named after the covered source file names.
+}
+\keyword{utilities}
diff -ruN R-devel/src/library/utils/NAMESPACE R-devel-cov/src/library/utils/NAMESPACE
--- R-devel/src/library/utils/NAMESPACE	2013-09-10 03:04:59.0 +0200
+++ R-devel-cov/src/library/utils/NAMESPACE	2014-03-03 16:18:48.407430952 +0100
@@ -1,7 +1,7 @@
 # Refer to all C routines by their name prefixed by C_
 useDynLib(utils, .registration = TRUE, .fixes = "C_")
-export("?", .DollarNames, CRAN.packages, Rprof, Rprofmem, RShowDoc,
+export("?", .DollarNames, CRAN.packages, Rcov_start, Rcov_stop, Rprof, Rprofmem, RShowDoc,
     RSiteSearch, URLdecode, URLencode, View, adist, alarm, apropos,
     aregexec, argsAnywhere, assignInMyNamespace, assignInNamespace,
     as.roman, as.person, as.personList, as.relistable, aspell,
diff -ruN R-devel/src/library/utils/R/Rcov.R R-devel-cov/src/library/utils/R/Rcov.R
--- R-devel/src/library/utils/R/Rcov.R
Re: [Rd] [PATCH] Code coverage support proof of concept
Here's an updated version of the patch that fixes a stack imbalance bug.

N.B.: the patch seems to work fine with R-3.0.2 too.

On Wed, Mar 5, 2014 at 5:16 PM, Karl Forner wrote:
> Hello,
>
> I submit a patch for review that implements code coverage tracing in
> the R interpreter. It records the lines that are actually executed,
> and their associated frequency, for which srcref information is available.
>
> I perfectly understand that this patch will not make its way into R
> as it is, and that there are many concerns of stability, compatibility,
> maintenance and so on.
> I would like to have the code reviewed, and proper guidance on how to
> get this feature available at some point in R, in base R or as a
> package or patch, if other people are interested.
>
> Usage
> -----
> Rcov_start()
> # your code to trace here
> res <- Rcov_stop()
>
> res is currently a hashed env, with traced source filenames associated
> with 2-column matrices holding the line numbers and their
> frequencies.
>
> How it works
> ------------
> I added a test in getSrcref() that records the line numbers if code
> coverage is started. The overhead should be minimal since, for a given
> file, subsequent covered lines will be stored
> in constant time. I use a hashed env to store the occurrences by file.
>
> I added two entry points in the utils package (Rcov_start() and Rcov_stop()).
>
> Example
> -------
> * untar the latest R-devel and cd into it
> * patch -p1 < rdev-cov-patch.txt
> * ./configure [...] && make && [sudo] make install
> * install the devtools package
> * run the following script using Rscript
>
> library(methods)
> library(devtools)
> pkg <- download.packages('testthat', '.', repos = "http://stat.ethz.ch/CRAN")
> untar(pkg[1, 2])
>
> Rcov_start()
> test('testthat')
> env <- Rcov_stop()
>
> res <- lapply(ls(env), get, envir = env)
> names(res) <- ls(env)
> print(res)
>
> This will hopefully output something like:
>
> $`.../testthat/R/auto-test.r`
>      [,1] [,2]
> [1,]   33    1
> [2,]   80    1
>
> $`.../testthat/R/colour-text.r`
>       [,1] [,2]
>  [1,]   18    1
>  [2,]   19  106
>  [3,]   20  106
>  [4,]   22  106
>  [5,]   23  106
>  [6,]   40    1
>  [7,]   59    1
>  [8,]   70    1
>  [9,]   71  106
> ...
>
> Karl Forner
>
> Disclaimer
> ----------
> There are probably bugs and ugly statements, but this is just a proof
> of concept. This is untested and only run on a linux x86_64.

diff -urN -x '.*' R-devel/src/library/utils/man/Rcov_start.Rd R-develcov/src/library/utils/man/Rcov_start.Rd
--- R-devel/src/library/utils/man/Rcov_start.Rd	1970-01-01 01:00:00.0 +0100
+++ R-develcov/src/library/utils/man/Rcov_start.Rd	2014-03-07 18:41:33.117646470 +0100
@@ -0,0 +1,26 @@
+% File src/library/utils/man/Rcov_start.Rd
+% Part of the R package, http://www.R-project.org
+% Copyright 1995-2010 R Core Team
+% Distributed under GPL 2 or later
+
+\name{Rcov_start}
+\alias{Rcov_start}
+\title{Start Code Coverage analysis of R's Execution}
+\description{
+  Start Code Coverage analysis of the execution of \R expressions.
+}
+\usage{
+Rcov_start(nb_lines = 1L, growth_rate = 2)
+}
+\arguments{
+  \item{nb_lines}{
+    Initial max number of lines per source file.
+  }
+  \item{growth_rate}{
+    Growth factor of the line numbers vectors per filename.
+    If a reached line number L is greater than nb_lines, the vector will
+    be reallocated with provisional size of growth_rate * L.
+  }
+}
+
+\keyword{utilities}
diff -urN -x '.*' R-devel/src/library/utils/man/Rcov_stop.Rd R-develcov/src/library/utils/man/Rcov_stop.Rd
--- R-devel/src/library/utils/man/Rcov_stop.Rd	1970-01-01 01:00:00.0 +0100
+++ R-develcov/src/library/utils/man/Rcov_stop.Rd	2014-03-07 18:41:33.117646470 +0100
@@ -0,0 +1,20 @@
+% File src/library/utils/man/Rcov_stop.Rd
+% Part of the R package, http://www.R-project.org
+% Copyright 1995-2010 R Core Team
+% Distributed under GPL 2 or later
+
+\name{Rcov_stop}
+\alias{Rcov_stop}
+\title{Stop Code Coverage analysis of R's Execution}
+\description{
+  Stop Code Coverage analysis of the execution of \R expressions.
+}
+\usage{
+Rcov_stop()
+}
+
+\value{
+  a named list of integer vectors holding occurrence counts (line number,
+  frequency), named after the covered source file names.
+}
+\keyword{utilities}
diff -urN -x '.*' R-devel/src/library/utils/NAMESPACE R-develcov/src/library/utils/NAMESPACE
--- R-devel/src/library/utils/NAMESPACE	2013-09-10 03:04:59.0 +0200
+++ R-develcov/src/library/utils/NAMESPACE	2014-03-07 18:41:33.121646470 +0100
@@ -1,7 +1,7 @@
 #
Re: [Rd] [RFC] A case for freezing CRAN
I think what you really want here is the ability to easily identify and sync to CRAN snapshots. The easy way to do this is to set up a CRAN mirror, but back it up with version control, so that it's easy to reproduce the exact state of CRAN at any given point in time. CRAN's not particularly large and doesn't churn a whole lot, so most version control systems should be able to handle that without difficulty.

Using svn, mod_dav_svn and (maybe) mod_rewrite, you could set up the server so that e.g.:

  http://my.cran.mirror/repos/2013-01-01/

is a mirror of how CRAN looked at midnight 2013-01-01. Users can then set their repository to that URL, will have a stable snapshot to work with, and can have all their packages built with that snapshot if they like. For reproducibility purposes, all users need to do is agree on the same date to use. For publication purposes, the date of the snapshot should be sufficient.

We'd need a version of update.packages() that force-syncs all the packages to the versions in the repository, even if they're downgrades, but otherwise it ought to be fairly straightforward.

FWIW, we do something similar internally at Google. All the packages that a user has installed come from the same source control revision, where we know that all the package versions are mutually compatible. It saves a lot of headaches, and users can roll back to any previous point in time easily if they run into problems.

On Wed, Mar 19, 2014 at 7:45 PM, Jeroen Ooms wrote:
> On Wed, Mar 19, 2014 at 6:55 PM, Michael Weylandt wrote:
>> Reading this thread again, is it a fair summary of your position to say
>> "reproducibility by default is more important than giving users access to
>> the newest bug fixes and features by default?" It's certainly arguable, but
>> I'm not sure I'm convinced: I'd imagine that the ratio of new work being
>> done vs reproductions is rather high and the current setup optimizes for
>> that already.
>
> I think that separating development from released branches can give us
> both reliability/reproducibility (stable branch) as well as new
> features (unstable branch). The user gets to pick (and you can pick
> both!). The same is true for r-base: when using a 'released' version
> you get 'stable' base packages that are up to 12 months old. If you
> want to have the latest stuff you download a nightly build of r-devel.
> For regular users and reproducible research it is recommended to use
> the stable branch. However if you are a developer (e.g. package
> author) you might want to develop/test/check your work with the latest
> r-devel.
>
> I think that extending the R release cycle to CRAN would result both
> in more stable released versions of R, as well as more freedom for
> package authors to implement rigorous change in the unstable branch.
> When writing a script that is part of a production pipeline, or a sweave
> paper that should be reproducible 10 years from now, or a book on
> using R, you use a stable version of R, which is guaranteed to behave
> the same over time. However when developing packages that should be
> compatible with the upcoming release of R, you use r-devel, which has
> the latest versions of other CRAN and base packages.
>
>> What I'm trying to figure out is why the standard "install the following
>> list of package versions" isn't good enough in your eyes?
>
> Almost nobody does this because it is cumbersome and impractical. We
> can do so much better than this. Note that in order to install old
> packages you also need to investigate which versions of the dependencies
> of those packages were used. On win/osx, users need to manually build
> those packages, which can be a pain. All in all it makes reproducible
> research difficult, expensive and error prone. At the end of the
> day most published results obtained with R just won't be reproducible.
>
> Also I believe that keeping it simple is essential for solutions to be
> practical. If every script has to be run inside an environment with
> custom libraries, it takes away much of its power. Running a bash or
> python script in Linux is so easy and reliable that entire
> distributions are based on it. I don't understand why we make our
> lives so difficult in R.
>
> In my estimation, a system where stable versions of R pull packages
> from a stable branch of CRAN will naturally resolve the majority of
> the reproducibility and reliability problems with R. And in contrast
> to what some people here are suggesting, it does not introduce any
> limitations. If you want to get the latest stuff, you either grab a
> copy of r-devel, or just enable the testing branch and off you go.
> Debian 'testing' works in a similar way, see
> http://www.debian.org/devel/testing.
>
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
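From the user side, consuming such a dated snapshot would only require pointing the repos option at the URL, and the force-sync variant of update.packages() could be sketched by comparing installed versions against the snapshot. The mirror address is the hypothetical one from the message above:

snapshot <- "http://my.cran.mirror/repos/2013-01-01"
options(repos = c(CRAN = snapshot))

installed <- installed.packages()[, "Version"]
available <- available.packages()[, "Version"]
common    <- intersect(names(installed), names(available))
to_sync   <- common[installed[common] != available[common]]  # includes downgrades
if (length(to_sync)) install.packages(to_sync)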
Re: [Rd] The case for freezing CRAN
Given versioned / dated snapshots of CRAN, and an agreement that reproducibility is the responsibility of the study author, the author simply needs to sync all their packages to a chosen date, run the analysis and publish the chosen date. It is true that this doesn't include compilers, OS, system packages etc., but in my experience those are significantly more stable than CRAN packages.

Also, my previous description of how to serve up a dated CRAN was way too complicated. Since most of the files on CRAN never change, they don't need version control. Only the metadata about which versions are current really needs to be tracked, and that's small enough that it could be stored in static files.

On Thu, Mar 20, 2014 at 6:32 AM, Dirk Eddelbuettel wrote:
>
> No attempt to summarize the thread, but a few highlighted points:
>
> o Karl's suggestion of versioned / dated access to the repo by adding a
>   layer to web access is (as usual) nice. It works on the 'supply' side. But
>   Jeroen's problem is on the demand side. Even when we know that an
>   analysis was done on 20xx-yy-zz, and we reconstruct CRAN that day, it only
>   gives us a 'ceiling' estimate of what was on the machine. In production
>   or lab environments, installations get stale. Maybe packages were already
>   a year old? To me, this is an issue that needs to be addressed on the
>   'demand' side of the user. But just writing out version numbers is not
>   good enough.
>
> o Roger correctly notes that R scripts and packages are just one issue.
>   Compilers, libraries and the OS matter. To me, the natural approach these
>   days would be to think of something based on Docker or Vagrant or (if you
>   must) VirtualBox. The newer alternatives make snapshotting very cheap
>   (eg by using Linux LXC). That approach reproduces a full environment as
>   best as we can while still ignoring the hardware layer (and some readers
>   may recall the infamous Pentium bug of two decades ago).
>
> o Reproducibility will probably remain the responsibility of study
>   authors. If an investigator on a mega-grant wants to (or needs to) freeze,
>   they do have the tools now. Requiring the needs of a few to push work onto
>   those already overloaded (ie CRAN) and changing the workflow of everybody
>   is a non-starter.
>
> o As Terry noted, Jeroen made some strong claims about exactly how flawed
>   the existing system is and keeps coming back to the example of 'a JSS
>   paper that cannot be re-run'. I would really like to see empirics on
>   this. Studies of reproducibility appear to be publishable these days, so
>   maybe some enterprising grad student wants to run with the idea of
>   actually _testing_ this. We may be above Terry's 0/30 and nearer to
>   Kevin's 'low'/30. But let's bring some data to the debate.
>
> o Overall, I would tend to think that our CRAN standards of releasing with
>   tests, examples, and checks on every build and release already do a much
>   better job of keeping things tidy and workable than in most if not all
>   other related / similar open source projects. I would of course welcome
>   contradictory examples.
>
> Dirk
>
> --
> Dirk Eddelbuettel | e...@debian.org | http://dirk.eddelbuettel.com
>
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel

[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
[Rd] Fwd: [RFC] A case for freezing CRAN
Interesting and strategic topic indeed. One other point is that reproducibility (and backwards compatibility) is also very important in industry. To gain acceptance, it really helps if you can easily reproduce results.

Concerning the arguments that I read in this discussion:

- "do it yourself":
The point is to discuss and find the best way for the community, and thinking collectively about these general problems can never hurt. Once a consensus is reached, we can think about the resources.

- "don't think the effort is worth it, instead install a specific version of a package" + "new sessionInfoPlus()":
This could work, meaning it achieves the same result, but not at the same price for users, because it would require each script writer to include their sessionInfo() and to store it along the scripts in repositories. And prior to running the scripts, you would have to install the snapshot of packages, not to mention install problems and so on.

- "versions automatically at package build time (in DESCRIPTION)":
This does not really solve the problem, because if package A is submitted with dependency B-1.0 and package C with dependency B-2.0, what do you do?

- "exact deps versions":
This will put a lot of burden on the developer.

- "I do not want to wait a year to get a new (or updated) package", "access to bug fixes":
Installed packages are already set up as libraries. By default you have the library inside the R installation, which contains the base packages plus those installed by install.packages() if you have the proper permissions, and the personal library otherwise. Why not organize these libraries so that:
  - normal CRAN versions associated with the R version get installed along the base packages
  - "critical updates", meaning important bug fixes for the normal CRAN versions, get installed in a critical/ library
  - additional packages and updated packages go in another library

This way, using the existing .libPaths() mechanism, or equivalently the lib.loc option of library(), one could easily switch between the library that ensures full compatibility and reproducibility with the R version, add the critical updates, or use the newer or updated packages (see the .libPaths() sketch following this message).

- new use case:
Here at Quartz Bio we have two architectures, so two R installations for each R version. It is quite cumbersome to keep them consistent, because the installed version depends on the moment you perform the install.packages(). So I second Jeroen's proposal to have a snapshot of package versions tied to a given R version, well tested altogether. This implies, as stated by Herve, keeping all package source versions, and it will solve the BioC reproducibility issue.

Best,
Karl Forner

On Tue, Mar 18, 2014 at 9:24 PM, Jeroen Ooms wrote:
> This came up again recently with an irreproducible paper. Below an
> attempt to make a case for extending the r-devel/r-release cycle to
> CRAN packages. These suggestions are not in any way intended as
> criticism on anyone or the status quo.
>
> The proposal described in [1] is to freeze a snapshot of CRAN along
> with every release of R. In this design, updates for contributed
> packages are treated the same as updates for base packages in the sense
> that they are only published to the r-devel branch of CRAN and do not
> affect users of "released" versions of R. Thereby all users, stacks
> and applications using a particular version of R will by default be
> using the identical version of each CRAN package. The bioconductor
> project uses similar policies.
>
> This system has several important advantages:
>
> ## Reproducibility
>
> Currently r/sweave/knitr scripts are unstable because of the ambiguity
> introduced by constantly changing CRAN packages. This causes scripts
> to break or change behavior when upstream packages are updated, which
> makes reproducing old results extremely difficult.
>
> A common counter-argument is that script authors should document the
> package versions used in the script using sessionInfo(). However even
> if authors would manually do this, reconstructing the author's
> environment from this information is cumbersome and often nearly
> impossible, because binary packages might no longer be available,
> dependency conflicts, etc. See [1] for a worked example. In practice,
> the current system causes many results or documents generated with R
> not to be reproducible, sometimes already after a few months.
>
> In a system where contributed packages inherit the r-base release
> cycle, scripts will behave the same across users/systems/time within a
> given version of R. This severely reduces the ambiguity of R behavior, and
> has the potential of making reproducibility a natural part of the
> language, rather than a tedious exercise.
>
> ## Repository Management
>
> Just like scripts suffer
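A sketch of the library-switching idea above; the paths for the extra libraries are hypothetical, and `somePkg` is a placeholder package name:

stable   <- .Library                   # base + CRAN versions tied to this R version
critical <- "/opt/R/libs/critical"     # critical bug-fix updates only
extra    <- "/opt/R/libs/extra"        # additional / newer packages

.libPaths(character())                 # reproducible mode: stable library only
.libPaths(critical)                    # stable + critical updates
.libPaths(c(extra, critical))          # everything, newest first

library(somePkg, lib.loc = critical)   # or select a library per package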
Re: [Rd] Fwd: [RFC] A case for freezing CRAN
> On Fri, Mar 21, 2014 at 12:08 PM, Karl Forner wrote:
> [...]
>> - "exact deps versions":
>> will put a lot of burden on the developer.
>
> Not really, in my opinion, if you have the proper tools. Most likely when
> you develop any given version of your package you'll use certain versions
> of other packages, probably the most recent at that time.
>
> If there is a build tool that just puts these version numbers into the
> DESCRIPTION file, you don't need to do anything extra.

I of course assumed that this part was automatic.

> In fact, it is easier for the developer, because if you work on your
> release for a month, at the end you don't have to make sure that your
> package works with packages that were updated in the meanwhile.

Hmm, what if your package depends on packages A and B, and A depends on C v1.0 while B depends on C v1.1? This is just an example, but I imagine that it will lead to a lot of complexity.

> Gabor
>
> [...]

[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] Fwd: [RFC] A case for freezing CRAN
On Fri, Mar 21, 2014 at 6:27 PM, Gábor Csárdi wrote:
> On Fri, Mar 21, 2014 at 12:40 PM, Karl Forner wrote:
> [...]
>> Hmm, what if your package depends on packages A and B, and A depends
>> on C v1.0 while B depends on C v1.1? This is just an example, but I imagine
>> that it will lead to a lot of complexity.
>
> You'll have to be able to load (but not attach, of course!) multiple
> versions of the same package at the same time. The search paths are set up
> so that A imports v1.0 of C, B imports v1.1. This is possible to support
> with R's namespaces and imports mechanisms, I believe.

Not really: I think there are still cases (unfortunately) where you have to use Depends, e.g. when defining S4 methods for classes implemented in other packages. But my point is that you would need really, really smart tools, AND the ability to install precise versions of packages.

> It requires quite some work, though, so I am obviously not saying to
> switch to it tomorrow. Having a CRAN-devel seems simpler.

Indeed.

[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] Rjulia: a package for R call Julia through Julia C API
Excellent. By any chance, are you aware of a Julia way to perform the opposite: call R from Julia?

Thanks

On Fri, Jun 6, 2014 at 7:23 AM, Yu Gong wrote:
> hello everyone, recently I wrote a package for R to call Julia through the Julia C
> API:
> https://github.com/armgong/RJulia
> now the package can do:
> 1 basic type mapping is finished: R int/boolean/double vectors to Julia
> 1d-arrays is ok, and Julia int32/int64/float64/bool 1D arrays to R vectors are
> also ok.
> 2 R STRSXP to Julia string 1D array and Julia string array to STRSXP is
> written, but I am not sure whether it is correct or not.
> 3 Julia gc can be disabled at initJulia.
> To build RJulia you need the git master branch of Julia, and R.
> The package for now only implements very basic functionality and needs more work to finish,
> so any comments and advice are welcome.
> Currently it can be used on unix and the windows console; on the windows gui it
> crashes.
>
> [[alternative HTML version deleted]]
>
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel

[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
[Rd] regression bug with getParseData and/or parse in R-3.1.0
Hi,

With R-3.1.0 I get:

> getParseData(parse(text = "{1}", keep.source = TRUE))
  line1 col1 line2 col2 id parent     token terminal text
7     1    1     1    3  7      9      expr    FALSE
1     1    1     1    1  1      7       '{'     TRUE    {
2     1    2     1    2  2      3 NUM_CONST     TRUE    1
3     1    2     1    2  3      5      expr    FALSE
4     1    3     1    3  4      7       '}'     TRUE    }

Which has two problems:
1) the parent of the first expression (id=7) should be 0
2) the parent of the expression with id=3 should be 7

For reference, with R-3.0.2:

> getParseData(parse(text = "{1}", keep.source = TRUE))
  line1 col1 line2 col2 id parent     token terminal text
7     1    1     1    3  7      0      expr    FALSE
1     1    1     1    1  1      7       '{'     TRUE    {
2     1    2     1    2  2      3 NUM_CONST     TRUE    1
3     1    2     1    2  3      7      expr    FALSE
4     1    3     1    3  4      7       '}'     TRUE    }

which is correct.

[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] regression bug with getParseData and/or parse in R-3.1.0
Thank you Duncan. I confirm:

R version 3.1.0 Patched (2014-06-11 r65921) -- "Spring Dance"

> getParseData(parse(text = "{1}", keep.source = TRUE))
  line1 col1 line2 col2 id parent     token terminal text
7     1    1     1    3  7      0      expr    FALSE
1     1    1     1    1  1      7       '{'     TRUE    {
2     1    2     1    2  2      3 NUM_CONST     TRUE    1
3     1    2     1    2  3      7      expr    FALSE
4     1    3     1    3  4      7       '}'     TRUE    }

Karl

On Thu, Jun 12, 2014 at 2:39 PM, Duncan Murdoch wrote:
> On 12/06/2014, 7:37 AM, Karl Forner wrote:
> > Hi,
> >
> > With R-3.1.0 I get:
> >> getParseData(parse(text = "{1}", keep.source = TRUE))
> >   line1 col1 line2 col2 id parent     token terminal text
> > 7     1    1     1    3  7      9      expr    FALSE
> > 1     1    1     1    1  1      7       '{'     TRUE    {
> > 2     1    2     1    2  2      3 NUM_CONST     TRUE    1
> > 3     1    2     1    2  3      5      expr    FALSE
> > 4     1    3     1    3  4      7       '}'     TRUE    }
> >
> > Which has two problems:
> > 1) the parent of the first expression (id=7) should be 0
> > 2) the parent of the expression with id=3 should be 7
>
> I believe this has been fixed in R-patched. Could you please check?
>
> The problem was due to an overly aggressive optimization introduced in
> R-devel in June, 2013. It assumed a vector was initialized to zeros,
> but in some fairly common circumstances it wasn't, so the parent
> calculation was wrong.
>
> Luckily 3.1.1 has been delayed by incompatible schedules of various
> people, or this fix might have missed that too. As with some other
> fixes in R-patched, this is a case of a bug that sat there for most of a
> year before being reported. Please people, test pre-release versions.
>
> Duncan Murdoch
>
> > For reference, with R-3.0.2:
> >
> >> getParseData(parse(text = "{1}", keep.source = TRUE))
> >   line1 col1 line2 col2 id parent     token terminal text
> > 7     1    1     1    3  7      0      expr    FALSE
> > 1     1    1     1    1  1      7       '{'     TRUE    {
> > 2     1    2     1    2  2      3 NUM_CONST     TRUE    1
> > 3     1    2     1    2  3      7      expr    FALSE
> > 4     1    3     1    3  4      7       '}'     TRUE    }
> >
> > which is correct.
> >
> > [[alternative HTML version deleted]]
> >
> > __
> > R-devel@r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-devel

[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
[Rd] isOpen() misbehaviour
Hello,

From the doc, it says:
"isOpen returns a logical value, whether the connection is currently open."

But actually it seems to die on closed connections:

> con <- file()
> isOpen(con)
[1] TRUE
> close(con)
> isOpen(con)
Error in isOpen(con) : invalid connection

Is this expected?
Tested on R-3.0.2 and R version 3.1.0 Patched (2014-06-11 r65921), on linux x86_64.

Karl

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] isOpen() misbehaviour
Thanks Joris, it makes sense now, though the doc is a bit misleading.

On Thu, Jun 19, 2014 at 3:22 PM, Joris Meys wrote:
> Hi Karl,
>
> that is expected. The moment you close a connection, it is destroyed as well
> (see ?close). A destroyed connection cannot be tested. In fact, I've used
> isOpen() only in combination with the argument rw:
>
>> con <- file("clipboard", open="r")
>> isOpen(con, "write")
> [1] FALSE
>
> cheers
>
> On Thu, Jun 19, 2014 at 3:10 PM, Karl Forner wrote:
>> Hello,
>>
>> From the doc, it says:
>> "isOpen returns a logical value, whether the connection is currently
>> open."
>>
>> But actually it seems to die on closed connections:
>> > con <- file()
>> > isOpen(con)
>> [1] TRUE
>> > close(con)
>> > isOpen(con)
>> Error in isOpen(con) : invalid connection
>>
>> Is this expected?
>> Tested on R-3.0.2 and R version 3.1.0 Patched (2014-06-11 r65921), on
>> linux x86_64.
>>
>> Karl
>>
>> __
>> R-devel@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel
>
> --
> Joris Meys
> Statistical consultant
>
> Ghent University
> Faculty of Bioscience Engineering
> Department of Mathematical Modelling, Statistics and Bio-Informatics
>
> tel : +32 9 264 59 87
> joris.m...@ugent.be
> ---
> Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
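Given that behaviour, a defensive wrapper that treats a destroyed connection as "not open" instead of raising an error; a small sketch:

isOpenSafe <- function(con, rw = "") {
  tryCatch(isOpen(con, rw), error = function(e) FALSE)
}

con <- file()
isOpenSafe(con)   # TRUE
close(con)
isOpenSafe(con)   # FALSE, instead of "invalid connection"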
[Rd] Patch for R to fix some buffer overruns and add a missing PROTECT().
This patch is against current svn and contains three classes of fix: - Ensure the result is properly terminated after calls to strncpy() - Replace calls of sprintf() with snprintf() - Added a PROTECT() call in do_while which could cause memory errors if evaluating the condition results in a warning. Thanks, Karl __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] Patch for R to fix some buffer overruns and add a missing PROTECT().
Bug submitted. Thanks. On Tue, Sep 23, 2014 at 12:42 PM, Duncan Murdoch wrote: > On 23/09/2014 3:20 PM, Karl Millar wrote: >> >> This patch is against current svn and contains three classes of fix: >> - Ensure the result is properly terminated after calls to strncpy() >> - Replace calls of sprintf() with snprintf() >> - Added a PROTECT() call in do_while which could cause memory >> errors if evaluating the condition results in a warning. > > > Nothing was attached. > > Generally fixes like this are best sent to bugs.r-project.org, and they > receive highest priority if accompanied by code demonstrating why they are > needed, i.e. crashes or incorrect results in current R. Those will likely > be incorporated as regression tests. > > Duncan Murdoch __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
[Rd] Making parent.env<- an error for package namespaces and package imports
I'd like to propose a change to the R language so that calling 'parent.env<-' on a package namespace or package imports is a runtime error. Currently the documentation warns that it's dangerous behaviour and might go away:

     The replacement function 'parent.env<-' is extremely dangerous as
     it can be used to destructively change environments in ways that
     violate assumptions made by the internal C code. It may be
     removed in the near future.

This change would both eliminate some potentially dangerous behaviours and make it significantly easier for runtime compilation systems to optimize symbol lookups for code in packages.

The following patch against current svn implements this functionality. It allows calls to 'parent.env<-' only until the namespace is locked, allowing the namespace to be built correctly while preventing user code from subsequently messing with it.

I'd also like to make calling parent.env<- on an environment on the call stack an error, for the same reasons, but it's not so obvious to me how to implement that efficiently right now. Could we at least document that as being 'undefined behaviour'?

Thanks,
Karl

Index: src/main/builtin.c
===================================================================
--- src/main/builtin.c	(revision 66783)
+++ src/main/builtin.c	(working copy)
@@ -356,6 +356,24 @@
     return( ENCLOS(arg) );
 }

+static Rboolean R_IsImportsEnv(SEXP env)
+{
+    if (isNull(env) || !isEnvironment(env))
+        return FALSE;
+    if (ENCLOS(env) != R_BaseNamespace)
+        return FALSE;
+    SEXP name = getAttrib(env, R_NameSymbol);
+    if (!isString(name) || length(name) != 1)
+        return FALSE;
+
+    const char *imports_prefix = "imports:";
+    const char *name_string = CHAR(STRING_ELT(name, 0));
+    if (!strncmp(name_string, imports_prefix, strlen(imports_prefix)))
+        return TRUE;
+    else
+        return FALSE;
+}
+
 SEXP attribute_hidden do_parentenvgets(SEXP call, SEXP op, SEXP args, SEXP rho)
 {
     SEXP env, parent;
@@ -371,6 +389,10 @@
 	error(_("argument is not an environment"));
     if( env == R_EmptyEnv )
 	error(_("can not set parent of the empty environment"));
+    if (R_EnvironmentIsLocked(env) && R_IsNamespaceEnv(env))
+	error(_("can not set the parent environment of a namespace"));
+    if (R_EnvironmentIsLocked(env) && R_IsImportsEnv(env))
+	error(_("can not set the parent environment of package imports"));
     parent = CADR(args);
     if (isNull(parent)) {
 	error(_("use of NULL environment is defunct"));

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
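At the R level, the patch would turn code like the following sketch into a runtime error; today it silently succeeds (and can corrupt the session, so don't actually run it in a session you care about):

ns <- asNamespace("stats")
environmentIsLocked(ns)             # TRUE once the namespace is loaded
try(parent.env(ns) <- globalenv())  # with the patch:
# Error: can not set the parent environment of a namespace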
[Rd] MAX_NUM_DLLS too low ?
Hello,

My problem is that I hit the hard-coded MAX_NUM_DLLS (100) limit on the number of loaded DLLs. I have a number of custom packages which interface and integrate a lot of CRAN and Bioconductor packages. For example, on my installation:

Rscript -e 'library(crlmm); print(length(getLoadedDLLs()))'

gives 28 loaded DLLs.

I am currently trying to work around that by putting external packages in Suggests: instead of Imports: and lazy-loading them, but I am still wondering whether that threshold value of 100 is still relevant nowadays, or whether it would be possible to increase it.

Thanks,
Karl Forner

[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
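Until the limit changes, the practical options are to monitor the count and to unload the DLLs of packages that are no longer needed; a sketch, with a hypothetical package name:

length(getLoadedDLLs())   # how close are we to the MAX_NUM_DLLS limit?

unloadNamespace("somePkg")   # may already unload the DLL via the package's .onUnload
library.dynam.unload("somePkg", system.file(package = "somePkg"))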
Re: [Rd] .Call in R
Hi,

A probably very naive remark, but I believe that the probability of sum(runif(10000)) >= 5000 is exactly 0.5. So why not just test that, and generate the uniform values only if needed?

Karl Forner

On Thu, Nov 17, 2011 at 6:09 PM, Raymond wrote:
> Hi R developers,
>
> I am new to this forum and hope someone can help me with .Call in R.
> Greatly appreciate any help!
>
> Say, I have a vector called "vecA" of length 10000. I generate a vector
> called "vecR" with elements randomly generated from Uniform[0,1]. Both vecA
> and vecR are of double type. I want to replace the elements of vecA by the elements of
> vecR only if the sum of the elements in vecR is greater than or equal to 5000.
> Otherwise, vecA remains unchanged. This is easy to do in R, which reads:
>
> vecA <- something
> vecR <- runif(10000)
> if (sum(vecR) >= 5000) {
>   vecA <- vecR
> }
>
> Now my question is, if I am going to do the same thing in R using .Call,
> how can I achieve it in a more efficient way (i.e. less computation time
> compared with the pure R code above)? My C code (called "change_vecA.c") using
> .Call is like this:
>
> SEXP change_vecA(SEXP vecA){
>   int i, vecA_len;
>   double sum, *res_ptr, *vecR_ptr, *vecA_ptr;
>
>   vecA_ptr = REAL(vecA);
>   vecA_len = length(vecA);
>   SEXP res_vec, vecR;
>
>   PROTECT(res_vec = allocVector(REALSXP, vecA_len));
>   PROTECT(vecR = allocVector(REALSXP, vecA_len));
>   res_ptr = REAL(res_vec);
>   vecR_ptr = REAL(vecR);
>   GetRNGstate();
>   sum = 0.0;
>   for (i = 0; i < vecA_len; i++){
>     vecR_ptr[i] = runif(0,1);
>     sum += vecR_ptr[i];
>   }
>   if (sum >= 5000){
>     /* copy vecR to the vector to be returned */
>     for (i = 0; i < vecA_len; i++)
>       res_ptr[i] = vecR_ptr[i];
>   }
>   else{
>     /* copy vecA to the vector to be returned */
>     for (i = 0; i < vecA_len; i++)
>       res_ptr[i] = vecA_ptr[i];
>   }
>
>   PutRNGstate();
>   UNPROTECT(2);
>   return res_vec;
> }
>
> My R wrapper function is:
>
> change_vecA <- function(vecA){
>   dyn.load("change_vecA.so")
>   .Call("change_vecA", vecA)
> }
>
> Now my question is, due to two loops (one generates the random
> vector and one determines the vector to be returned), can .Call still be
> faster than pure R code (only one loop to copy vecR to vecA given the condition
> is met)? Or, how can I improve my C code to avoid redundant loops, if any?
> My concern is that if vecA is large (say of length 100000 or even bigger), loops
> in C code can slow things down. Thanks for any help!
>
> --
> View this message in context:
> http://r.789695.n4.nabble.com/Call-in-R-tp4080721p4080721.html
> Sent from the R devel mailing list archive at Nabble.com.
>
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel

[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
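The 0.5 value follows from symmetry: sum(runif(n)) is symmetric around n/2, so P(sum >= n/2) = 0.5. A quick empirical check:

set.seed(1)
mean(replicate(1000, sum(runif(10000)) >= 5000))
# ~0.5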
Re: [Rd] .Call in R
Yes indeed. My mistake.

On Fri, Nov 18, 2011 at 4:45 PM, Joris Meys wrote:
> Because if you calculate the probability and then generate the uniform values,
> nothing guarantees that the sum of those uniform values actually is
> larger than 5000. You only have a 50% chance it is, in fact...
> Cheers
> Joris
>
> On Fri, Nov 18, 2011 at 4:08 PM, Karl Forner wrote:
> > Hi,
> >
> > A probably very naive remark, but I believe that the probability of
> > sum(runif(10000)) >= 5000 is exactly 0.5. So why not just test that, and
> > generate the uniform values only if needed?
> >
> > Karl Forner
> >
> > On Thu, Nov 17, 2011 at 6:09 PM, Raymond wrote:
> >> Hi R developers,
> >>
> >> I am new to this forum and hope someone can help me with .Call in R.
> >> Greatly appreciate any help!
> >>
> >> Say, I have a vector called "vecA" of length 10000. I generate a vector
> >> called "vecR" with elements randomly generated from Uniform[0,1]. Both vecA
> >> and vecR are of double type. I want to replace the elements of vecA by the
> >> elements of vecR only if the sum of the elements in vecR is greater than or
> >> equal to 5000. Otherwise, vecA remains unchanged. This is easy to do in R,
> >> which reads:
> >>
> >> vecA <- something
> >> vecR <- runif(10000)
> >> if (sum(vecR) >= 5000) {
> >>   vecA <- vecR
> >> }
> >>
> >> Now my question is, if I am going to do the same thing in R using .Call,
> >> how can I achieve it in a more efficient way (i.e. less computation time
> >> compared with the pure R code above)? My C code (called "change_vecA.c")
> >> using .Call is like this:
> >>
> >> SEXP change_vecA(SEXP vecA){
> >>   int i, vecA_len;
> >>   double sum, *res_ptr, *vecR_ptr, *vecA_ptr;
> >>
> >>   vecA_ptr = REAL(vecA);
> >>   vecA_len = length(vecA);
> >>   SEXP res_vec, vecR;
> >>
> >>   PROTECT(res_vec = allocVector(REALSXP, vecA_len));
> >>   PROTECT(vecR = allocVector(REALSXP, vecA_len));
> >>   res_ptr = REAL(res_vec);
> >>   vecR_ptr = REAL(vecR);
> >>   GetRNGstate();
> >>   sum = 0.0;
> >>   for (i = 0; i < vecA_len; i++){
> >>     vecR_ptr[i] = runif(0,1);
> >>     sum += vecR_ptr[i];
> >>   }
> >>   if (sum >= 5000){
> >>     /* copy vecR to the vector to be returned */
> >>     for (i = 0; i < vecA_len; i++)
> >>       res_ptr[i] = vecR_ptr[i];
> >>   }
> >>   else{
> >>     /* copy vecA to the vector to be returned */
> >>     for (i = 0; i < vecA_len; i++)
> >>       res_ptr[i] = vecA_ptr[i];
> >>   }
> >>
> >>   PutRNGstate();
> >>   UNPROTECT(2);
> >>   return res_vec;
> >> }
> >>
> >> My R wrapper function is:
> >>
> >> change_vecA <- function(vecA){
> >>   dyn.load("change_vecA.so")
> >>   .Call("change_vecA", vecA)
> >> }
> >>
> >> Now my question is, due to two loops (one generates the random
> >> vector and one determines the vector to be returned), can .Call still be
> >> faster than pure R code (only one loop to copy vecR to vecA given the
> >> condition is met)? Or, how can I improve my C code to avoid redundant
> >> loops, if any? My concern is that if vecA is large (say of length 100000
> >> or even bigger), loops in C code can slow things down. Thanks for any help!
> >>
> >> --
> >> View this message in context:
> >> http://r.789695.n4.nabble.com/Call-in-R-tp4080721p4080721.html
> >> Sent from the R devel mailing list archive at Nabble.com.
> >> > >> __ > >> R-devel@r-project.org mailing list > >> https://stat.ethz.ch/mailman/listinfo/r-devel > >> > > > >[[alternative HTML version deleted]] > > > > __ > > R-devel@r-project.org mailing list > > https://stat.ethz.ch/mailman/listinfo/r-devel > > > > > > -- > Joris Meys > Statistical consultant > > Ghent University > Faculty of Bioscience Engineering > Department of Mathematical Modelling, Statistics and Bio-Informatics > > tel : +32 9 264 59 87 > joris.m...@ugent.be > --- > Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php > [[alternative HTML version deleted]] __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] OpenMP and random number generation
Hello,

For your information, I plan to release "soon" a package with a fast, multithread-aware RNG for C++ code in R packages. It is currently part of one of my (not yet accepted) packages, and I want to extract it into its own package. I plan to do some quick benchmarks too. Of course I cannot say exactly when it will be ready.

Best,
Karl

On Wed, Feb 22, 2012 at 9:23 AM, Mathieu Ribatet wrote:
> Dear all,
>
> Now that R has OpenMP facilities, I'm trying to use it for my own package,
> but I'm still wondering if it is safe to use random number generation
> within an OpenMP block. I looked at the Writing R Extensions document, both
> the OpenMP and random number generation sections, but didn't find any information
> about that.
>
> Could someone tell me if it is safe or not, please?
>
> Best,
> Mathieu
>
> -
> I3M, UMR CNRS 5149
> Universite Montpellier II,
> 4 place Eugene Bataillon
> 34095 Montpellier cedex 5 France
> http://www.math.univ-montp2.fr/~ribatet
> Tel: + 33 (0)4 67 14 41 98
>
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel

[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
[Rd] RcppProgress: progress monitoring and interrupting c++ code, request for comments
Hello,

I just created a little package, RcppProgress, to display a progress bar to monitor the execution status of a C++ code loop, possibly multithreaded with OpenMP. I also implemented the possibility to check for user interruption, using the work-around by Simon Urbanek.

I just uploaded the package on my R-forge project, so you should be able to get the package from
https://r-forge.r-project.org/scm/viewvc.php/pkg/RcppProgress/?root=gwas-bin-tests

* The progress bar is displayed using REprintf, so that it also works in the eclipse StatET console, provided that you disable the scroll lock.
* You should be able to nicely interrupt the execution by typing CTRL+C in the R console, or by clicking "cancel current task" in the StatET console.
* I tried to write a small documentation, included in the package, but basically you use it like this:

The main loop:

Progress p(max, display_progress); // create the progress monitor
#pragma omp parallel for schedule(dynamic)
for (int i = 0; i < max; ++i) {
  if ( ! p.is_aborted() ) { // the only way to exit an OpenMP loop
    long_computation(nb);
    p.increment(); // update the progress
  }
}

and in your computation-intensive function:

void long_computation(int nb) {
  double sum = 0;
  for (int i = 0; i < nb; ++i) {
    if ( Progress::check_abort() )
      return;
    for (int j = 0; j < nb; ++j) {
      sum += Rf_dlnorm(i+j, 0.0, 1.0, 0);
    }
  }
}

I provided two small R test functions so that you can see how it looks; please see the doc.

I would be extremely grateful if you could give me comments, criticisms and other suggestions. I am releasing this in order to reuse this functionality in my other packages.

Best regards,
Karl Forner

[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] portable parallel seeds project: request for critiques
> Some of the random number generators allow as a seed a vector,
> not only a single number. This can simplify generating the seeds.
> There can be one seed for each of the 1000 runs and then,
> the rows of the seed matrix can be
>
> c(seed1, 1), c(seed1, 2), ...
> c(seed2, 1), c(seed2, 2), ...
> c(seed3, 1), c(seed3, 2), ...
> ...
>
> There could be even only one seed and the matrix can be generated as
>
> c(seed, 1, 1), c(seed, 1, 2), ...
> c(seed, 2, 1), c(seed, 2, 2), ...
> c(seed, 3, 1), c(seed, 3, 2), ...
>
> If the initialization using the vector c(seed, i, j) is done
> with a good quality hash function, the runs will be independent.
>
> What is your opinion on this?
>
> An advantage of seeding with a vector is also that there can
> be significantly more initial states of the generator among
> which we select by the seed than 2^32, which is the maximum
> for a single integer seed.

Hello,

I would also be in favor of using multiple seeds based on (seed, task_number), for convenience (i.e. avoiding storing the seeds) and for the possibility of having a dynamic number of tasks, but I am not sure it is theoretically correct.

I can however refer you to this article: http://www.agner.org/random/ran-instructions.pdf, section 6.1, where the author states:

  For example, if we make 100 streams of 10^10 random numbers each from an SFMT
  generator with cycle length ρ = 2^11213, we have a probability of overlap
  p ≈ 10^-3362.

What do you think? I am very concerned by the correctness of this approach, so I would appreciate any advice on that matter.

Thanks,
Karl

[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
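For comparison, base R's parallel package already provides independent per-task streams with the L'Ecuyer-CMRG generator; a sketch of pre-computing one stream per task:

library(parallel)
RNGkind("L'Ecuyer-CMRG")
set.seed(123)
streams <- vector("list", 10)
s <- .Random.seed
for (i in seq_along(streams)) {
  streams[[i]] <- s        # the stream that task i will use
  s <- nextRNGStream(s)    # advance to the next independent stream
}
# a worker for task i then starts with:
# assign(".Random.seed", streams[[i]], envir = globalenv())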
[Rd] c/c++ Random Number Generators Benchmarks using OpenMP
Dear R gurus,

I am interested in permutation-based CPU-intensive methods, so I had to pay a little attention to Random Number Generators (RNGs). For my needs, RNGs have to:

1) be fast. I profiled my algorithms, and for some the bottleneck was the RNG.
2) be scalable, meaning that I want the RNG to remain fast as I add threads.
3) offer a long cycle length. Some basic generators have a cycle length so low that you can exhaust it in a few seconds, making further computations useless and redundant.
4) be able to give reproducible results independent of the number of threads used, i.e. I want my program to give the very same exact results using one or 10 threads.
(and 5: "be good", of course)

I found an implementation that seems to meet my criteria and made a preliminary package to test it. In the meantime Petr Savicky contacted me, saying he was about to release a similar package called rngOpenMP. So I decided to perform some quick benchmarks.

The benchmark code is available as an R package "rngBenchmarks" here:
https://r-forge.r-project.org/scm/viewvc.php/pkg/?root=gwas-bin-tests
but it depends on some unpublished packages, like rngOpenMP and my preliminary package, also available from the same URL.

As a benchmark I implemented a Monte-Carlo computation of PI. I tried to use the exact same computation method, using a template argument for the RNG, and providing wrappers for the different available RNGs, except for rngOpenMP, which is not instantiable, so I adapted the code specifically for it.

I included in the benchmark:
- the C implementation used by the R package rlecuyer
- the (GNU) random_r RNG, available on GNU/linux systems, which is reentrant
- my RcppRandomSFMT, wrapping a modified version of the SIMD-oriented Fast Mersenne Twister (SFMT) Random Number Generator provided by the randomc library from http://www.agner.org/random
- rngOpenMP

I tried to include the rsprng RNG, but could not manage to use it in my code.

My conclusions:
- all the implementations work, meaning that the computed values converge towards PI with the number of iterations
- all the implementations are scalable
- RcppRandomSFMT and random_r are an order of magnitude faster than rlecuyer and rngOpenMP
- actually, RcppRandomSFMT and random_r have very similar performance.

The problem with random_r is that its cycle length, according to my manpage, is ~3e10, enabling for instance only 3 million permutations of a vector of 10,000 elements, which is far too short. This leaves RcppRandomSFMT as the best candidate.

This implementation also allows multiple seeds, solving my requisite number 4 (reproducible results independent of the number of threads) if I use the task identifier as the second seed.

Of course I am probably biased, so please tell me if you have some better ideas of benchmarks or tests of correctness, or if you'd like some other implementations to be included. People interested in this topic could contact me so that we can collaboratively propose an implementation suiting all needs.

Thanks,
Karl Forner

Annex: I ran the benchmarks on a linux Intel(R) Xeon(R) with 2 CPUs of 4 cores each (CPU E5520 @ 2.27GHz).

         type threads     n        error    time time_per_chunk
1     lecuyer       1 1e+07 2.105472e-04   1.538     0.00153800
2     lecuyer       1 1e+08 4.441492e-05  15.265     0.00152650
3     lecuyer       1 1e+09 2.026819e-05 153.209     0.00153209
4     lecuyer       2 1e+07 3.182633e-04   0.821     0.00082100
5     lecuyer       2 1e+08 7.375036e-05   7.751     0.00077510
6     lecuyer       2 1e+09 9.290323e-06  76.476     0.00076476
7     lecuyer       4 1e+07 9.630351e-05   0.401     0.00040100
8     lecuyer       4 1e+08 1.263486e-05   3.887     0.00038870
9     lecuyer       4 1e+09 1.151515e-06  38.618     0.00038618
10    lecuyer       8 1e+07 1.239703e-05   0.241     0.00024100
11    lecuyer       8 1e+08 7.894518e-05   2.133     0.00021330
12    lecuyer       8 1e+09 6.782041e-06  20.420     0.00020420
13   random_r       1 1e+07 7.898746e-05   0.137     0.00013700
14   random_r       1 1e+08 4.748343e-05   1.290     0.00012900
15   random_r       1 1e+09 1.685692e-05  12.844     0.00012844
16   random_r       2 1e+07 4.757590e-06   0.095     0.00009500
17   random_r       2 1e+08 7.389450e-05   0.663     0.00006630
18   random_r       2 1e+09 2.913732e-05   6.469     0.00006469
19   random_r       4 1e+07 1.664590e-04   0.037     0.00003700
20   random_r       4 1e+08 1.138106e-04   0.330     0.00003300
21   random_r       4 1e+09 3.734717e-05   3.209     0.00003209
22   random_r       8 1e+07 1.034678e-04   0.051     0.00005100
23   random_r       8 1e+08 4.733472e-05   0.167     0.00001670
24   random_r       8 1e+09 1.985413e-05   1.694     0.00001694
25 rng_openmp       1 1e+07 2.097492e-04   1.231     0.00123100
26 rng_openmp       1 1e+08 7.580436e-05  12.155     0.00121550
27 rng_openmp       1 1e+09 2.772810e-05 120.712     0.00120712
28 rng_openmp       2 1
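For reference, a plain R sketch of the classic Monte-Carlo PI estimator used as the benchmark workload (the C++ benchmark may differ in details): draw points uniformly in the unit square and count the fraction falling inside the quarter circle.

mc_pi <- function(n) {
  x <- runif(n)
  y <- runif(n)
  4 * mean(x * x + y * y <= 1)
}
mc_pi(1e6)   # ~3.14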
Re: [Rd] portable parallel seeds project: request for critiques
Thanks for your quick reply.

About the rngSetSeed package: is it usable at the C/C++ level?

> The same can be said about initializations. Initialization is a random
> number generator, whose output is used as the initial state of some
> other generator. There is no proof that a particular initialization cannot
> be distinguished from truly random numbers in a mathematical sense, for
> the same reason as above.
>
> A possible strategy is to use a cryptographically strong hash function
> for the initialization. This means to transform the seed to the initial
> state of the generator using a function for which we have a good
> guarantee that it produces output which is computationally hard to
> distinguish from truly random numbers. For this purpose, I suggest
> to use the package rngSetSeed provided currently at
>
> http://www.cs.cas.cz/~savicky/randomNumbers/
>
> It is based on AES and Fortuna similarly as "randaes", but these
> components are used only for the initialization of Mersenne-Twister.
> When the generator is initialized, then it runs at its usual speed.
>
> In the notation of
>
> http://www.agner.org/random/ran-instructions.pdf
>
> using rngSetSeed for initialization of Mersenne-Twister is Method 4
> in Section 6.1.

Hmm, I had not paid attention to the last paragraph:

  The seeding procedure used in the present software uses a separate random
  number generator of a different design in order to avoid any interference.
  An extra feature is the RandomInitByArray function which makes it possible
  to initialize the random number generator with multiple seeds. We can make
  sure that the streams have different starting points by using the thread id
  as one of the seeds.

So it means that I am already using this solution! (in the RcppRandomSFMT, see my other post), and that I should be reasonably safe.

> I appreciate comments.
>
> Petr Savicky.
>
> P.S. I included some more comments on the relationship of provably good
> random number generators and the P ?= NP question at the end of the page
>
> http://www.cs.cas.cz/~savicky/randomNumbers/

Sorry, but it's too involved for me.

[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
[Rd] weird bug with parallel, RSQlite and tcltk
Hello,

I spent a lot of time on a weird bug, and I just managed to narrow it down. In parallel code (here with parallel::mclapply, but I got it with doMC/multicore too), if the tcltk library is loaded, R hangs when trying to open a DB connection. I got the same behaviour on two different computers, one dual-core, and one with two quad-core Xeons.

Here's the code:

library(parallel)
library(RSQLite)
library(tcltk)
#unloadNamespace("tcltk")

res <- mclapply(1:2, function(x) {
  db <- DBI::dbConnect("SQLite", ":memory:")
}, mc.cores=2)
print("Done")

When I execute it (R --vanilla < test_parallel_db.R), it hangs forever, and I have to type CTRL+C several times to interrupt it. I then get this message:

Warning messages:
1: In selectChildren(ac, 1) : error 'Interrupted system call' in select
2: In selectChildren(ac, 1) : error 'Interrupted system call' in select

Then, just remove library(tcltk), or uncomment unloadNamespace("tcltk"), and it works fine again. I guess there's a bug somewhere, but where exactly?

Best,

Karl Forner

Further info:

R version 2.15.1 (2012-06-22) -- "Roasted Marshmallows"
Copyright (C) 2012 The R Foundation for Statistical Computing
ISBN 3-900051-07-0
Platform: x86_64-unknown-linux-gnu (64-bit)

ubuntu 12.04 and 12.10
ubuntu package tk8.5

__ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] weird bug with parallel, RSQlite and tcltk
Hello,

The point is that I do not use tcltk; it gets loaded, probably as a dependency of a dependency of a package. When I unload it, all works perfectly fine. I only found it because one of my computers did not have tk8.5 installed, and did not exhibit the mentioned bug. So I really think something should be done about this. Maybe the "gui loop" should not be run at the loading of the tcltk package, but at the first function call, or something like this. As you can see in my example code, the in-memory database is opened in the parallel code...

Best, Karl

On Mon, Dec 31, 2012 at 10:58 PM, Simon Urbanek wrote:
>
> On Dec 31, 2012, at 1:08 PM, Karl Forner wrote:
>
>> Hello,
>>
>> I spent a lot of time on a weird bug, and I just managed to narrow it down.
>>
>
> First, tcltk and multicore don't mix well, see the warning in the
> documentation (it mentions GUIs and AFAIR tcltk fires up a GUI event loop
> even if you don't actually create GUI elements). Second, using any kind of
> descriptors in parallel code is asking for trouble since those will be owned
> by multiple processes. If you use database files, etc. they must be opened
> in the parallel code, they cannot be shared by multiple workers. The latter
> is ok in your code so you're probably bitten by the former.
>
> Cheers,
> Simon
>
>> In parallel code (here with parallel::mclapply, but I got it with
>> doMC/multicore too), if the tcltk library is loaded, R hangs when
>> trying to open a DB connection.
>> I got the same behaviour on two different computers, one dual-core,
>> and one with two quad-core Xeons.
>>
>> Here's the code:
>>
>> library(parallel)
>> library(RSQLite)
>> library(tcltk)
>> #unloadNamespace("tcltk")
>>
>> res <- mclapply(1:2, function(x) {
>>   db <- DBI::dbConnect("SQLite", ":memory:")
>> }, mc.cores=2)
>> print("Done")
>>
>> When I execute it (R --vanilla < test_parallel_db.R), it hangs
>> forever, and I have to type CTRL+C several times to interrupt it. I
>> then get this message:
>>
>> Warning messages:
>> 1: In selectChildren(ac, 1) : error 'Interrupted system call' in select
>> 2: In selectChildren(ac, 1) : error 'Interrupted system call' in select
>>
>> Then, just remove library(tcltk), or uncomment
>> unloadNamespace("tcltk"), and it works fine again.
>>
>> I guess there's a bug somewhere, but where exactly?
>>
>> Best,
>>
>> Karl Forner
>>
>> Further info:
>>
>> R version 2.15.1 (2012-06-22) -- "Roasted Marshmallows"
>> Copyright (C) 2012 The R Foundation for Statistical Computing
>> ISBN 3-900051-07-0
>> Platform: x86_64-unknown-linux-gnu (64-bit)
>>
>> ubuntu 12.04 and 12.10
>> ubuntu package tk8.5

__ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
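As an aside, a sketch of the per-worker descriptor pattern Simon describes (each child opens and closes its own connection; RSQLite's SQLite() driver is assumed):

library(parallel)
res <- mclapply(1:2, function(x) {
  # open the descriptor inside the child, never share it across workers
  db <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
  on.exit(DBI::dbDisconnect(db))  # and release it in the same child
  DBI::dbGetQuery(db, "SELECT 1 AS ok")
}, mc.cores = 2)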
Re: [Rd] weird bug with parallel, RSQlite and tcltk
Hello and thank you. Indeed gsubfn is responsible for loading tcltk in my case. On Thu, Jan 3, 2013 at 12:14 PM, Gabor Grothendieck wrote: > options(gsubfn.engine = "R") __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
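For anyone else hitting this hang: the option has to be set before gsubfn is loaded (directly or as a dependency), so that it uses its plain-R engine instead of tcltk. A sketch:

# at the top of the script, before anything that loads gsubfn:
options(gsubfn.engine = "R")
library(gsubfn)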
[Rd] Problem using raw vectors with inline cfunction
Hello,

From what I understood from the documentation I found, when using the inline cfunction with convention=".C", R raw vectors should be given as unsigned char* to the C function. But consider the following script:

library(inline)
testRaw <- cfunction(signature(raw='raw', len='integer')
, body='
int l = *len;
int i = 0;
Rprintf("sizeof(raw[0])=%i\\n", sizeof(raw[0]));
for (i = 0; i < l; ++i) Rprintf("%i, ", (int)raw[i]);
for (i = 0; i < l; ++i) raw[i] = i*10;
'
, convention=".C", language='C', verbose=TRUE
)

tt <- as.raw(1:10)
testRaw(tt, length(tt))

When I execute it:

$ R --vanilla --quiet < work/inline_cfunction_raw_bug.R
sizeof(raw[0])=1
192, 216, 223, 0, 0, 0, 0, 0, 224, 214,
*** caught segfault ***
address (nil), cause 'unknown'
Traceback:
1: .Primitive(".C")(, raw = as.character(raw), len = as.integer(len))
2: testRaw(tt, length(tt))
aborting ...
Segmentation fault (core dumped)

I was expecting to get in the C function a pointer to a byte array with the values (1,2,3,4,5,6,7,8,9,10). Apparently that is not the case. I guess that the "raw = as.character(raw)" printed in the traceback is responsible for the observed behaviour. If this is expected behaviour, how can I get a pointer to my array of bytes?

Thanks. Karl

__ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
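A possible workaround (a sketch, not an official fix): with convention=".Call" the argument arrives as a SEXP, and RAW() yields the unsigned char pointer directly, so nothing gets coerced to character on the way in:

library(inline)
testRaw2 <- cfunction(signature(raw = 'raw'), body = '
  unsigned char *p = RAW(raw);  /* direct pointer, no coercion or copy */
  int n = LENGTH(raw);
  for (int i = 0; i < n; ++i)
      Rprintf("%d, ", (int) p[i]);
  Rprintf("\\n");
  return raw;
', convention = ".Call", language = 'C')
testRaw2(as.raw(1:10))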
[Rd] How to avoid using gridextra via Depends instead of Imports in a package ?
Hello,

I really need some insight on a problem we encountered using grid, lattice and gridExtra. I tried to reduce the problem, so the plots make no sense. We have a package gridextrabug with:

DESCRIPTION
--
Package: gridextrabug
Title: gridextrabug
Version: 0.1
Author: toto
Maintainer: toto
Description: gridextrabug
Imports: grid, gridExtra, lattice, latticeExtra, reshape,
Depends: R (>= 2.15), methods
Suggests: testthat, devtools
License: GPL (>= 3)
Collate: 'zzz.R' 'plotFDR.R'

R/plotFDR.R
--
plot_fdr <- function(dt, qvalue_col, pvalue_col, zoom_x=NULL, zoom_y=NULL,
                     fdrLimit=0, overview_plot=FALSE, ...)
{
  frm <- as.formula(paste(qvalue_col, "~ rank(", pvalue_col, ")"))
  plt <- xyplot(frm, data=dt,
    abline=list(h=fdrLimit, lty="dashed"),
    pch=16, cex=1, type="p",
    panel=panelinplot2,
    subscripts=TRUE,
  )
  return(plt)
}

panelinplot2 <- function(x, y, subscripts, cex, type, ...) {
  panel.xyplot(x, y, subscripts=subscripts, ylim=c(0,1), type=type, cex=cex, ...)
  pltoverview <- xyplot(y~x, xlab=NULL, ylab=NULL, type="l",
                        par.settings=qb_theme_nopadding(),
                        scales=list(draw=FALSE), cex=0.6, ...)
  gr <- grob(p=pltoverview, ..., cl="lattice")
  grid.draw(gr) # <--- problematic call
}

NAMESPACE
--
export(panelinplot2)
export(plot_fdr)
importFrom(grid,gpar)
importFrom(grid,grid.draw)
importFrom(grid,grid.rect)
importFrom(grid,grid.text)
importFrom(grid,grob)
importFrom(grid,popViewport)
importFrom(grid,pushViewport)
importFrom(grid,unit)
importFrom(grid,viewport)
importFrom(gridExtra,drawDetails.lattice)
importFrom(lattice,ltext)
importFrom(lattice,panel.segments)
importFrom(lattice,panel.xyplot)
importFrom(lattice,stripplot)
importFrom(lattice,xyplot)
importFrom(latticeExtra,as.layer)
importFrom(latticeExtra,layer)
importFrom(reshape,sort_df)

Then if you execute this script:

without_extra.R
--
library(gridextrabug)
p <- seq(10^-10, 1, 0.001)
p <- p[sample(1:length(p))]
q <- p.adjust(p, "BH")
df <- data.frame(p, q)
plt <- plot_fdr(df, qvalue_col="q", pvalue_col="p", zoom_x=c(0,20),
                fdrLimit=0.6, overview_plot=TRUE)
X11()
print(plt)

you will not have the second plot, corresponding to the call to panelinplot2. If you execute this one:

with_extra.R
--
library(gridextrabug)
p <- seq(10^-10, 1, 0.001)
p <- p[sample(1:length(p))]
q <- p.adjust(p, "BH")
df <- data.frame(p, q)
plt <- plot_fdr(df, qvalue_col="q", pvalue_col="p", zoom_x=c(0,20),
                fdrLimit=0.6, overview_plot=TRUE)
X11()
library(gridExtra)
print(plt)

you will have the second plot.

From what I understood, the last line of panelinplot2(), "grid.draw(gr)", dispatches to grid:::grid.draw.grob(), which in turn calls grid:::drawGrob(), which calls grid::drawDetails(), which is an S3 generic. The gridExtra package defines the method drawDetails.lattice(). When the package is attached to the search() path, the "grid.draw(gr)" call dispatches to gridExtra:::drawDetails.lattice(). We would rather avoid messing with the search path, which is a best practice if I'm not mistaken, so we tried hard to solve it using Imports. But I came to realize that the problem was in the grid namespace, not in our package namespace. I tested it with the following work-around:

parent.env(parent.env(getNamespace('grid'))) <- getNamespace('gridExtra')

which works. So my questions are:

* did we miss something obvious?
* what is the proper way to handle this situation?

Thanks in advance for your wisdom.

Karl Forner

__ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
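A possibly less invasive runtime workaround than re-parenting namespace environments (a sketch, assuming that registering into grid's S3 method table is acceptable, and noting the ":::" access to an unexported object):

# e.g. in the importing package's .onLoad():
registerS3method("drawDetails", "lattice",
                 gridExtra:::drawDetails.lattice,
                 envir = asNamespace("grid"))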
[Rd] parallel::mclapply does not return try-error objects with mc.preschedule=TRUE
Hello,

Consider this:

1)

library(parallel)
res <- mclapply(1:2, stop)
#Warning message:
#In mclapply(1:2, stop) :
#  all scheduled cores encountered errors in user code
is(res[[1]], 'try-error')
#[1] FALSE

2)

library(parallel)
res <- mclapply(1:2, stop, mc.preschedule=FALSE)
#Warning message:
#In mclapply(1:2, stop, mc.preschedule = FALSE) :
#  2 function calls resulted in an error
is(res[[1]], 'try-error')
#[1] TRUE

The documentation states that: 'Each forked process runs its job inside try(..., silent = TRUE) so if errors occur they will be stored as class "try-error" objects in the return value and a warning will be given.'

Is this a bug?

Thanks, Karl

> sessionInfo()
R version 2.15.3 (2013-03-01)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=C                 LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] parallel stats graphics grDevices utils datasets methods
[8] base

loaded via a namespace (and not attached):
[1] tools_2.15.3

__ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
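Until that's resolved, a hedged sketch of checking for failed jobs that matches the documented behaviour:

library(parallel)
res <- mclapply(1:2, function(i) if (i == 1) stop("boom") else i,
                mc.preschedule = FALSE)
# per the documentation, failures come back as "try-error" objects:
failed <- vapply(res, inherits, logical(1), what = "try-error")
which(failed)  # indices of jobs whose user code signalled an error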
Re: [Rd] parallel::mclapply does not return try-error objects with mc.preschedule=TRUE
> >> Is this a bug ? >> > > Not in parallel. Something else has changed, and I am about to commit a > different version that still works as documented. > > Thanks for replying. [[alternative HTML version deleted]] __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] Catch SIGINT from user in backend C++ code
Hello,

I once wrote a package called RcppProgress, which you can find here: https://r-forge.r-project.org/R/?group_id=1230

I have not tried it in a long time, but it was developed to solve this exact problem. You can have a look at its companion package, RcppProgressExample. Here's a link to the original announcement: http://tolstoy.newcastle.edu.au/R/e17/devel/12/02/0443.html

Hope it helps.

Karl Forner
Quartz Bio

On Thu, May 2, 2013 at 1:50 AM, Jewell, Chris wrote:
> Hi,
>
> I was wondering if anybody knew how to trap SIGINTs (ie Ctrl-C) in backend
> C++ code for R extensions? I'm writing a package that uses the GPU for some
> hefty matrix operations in a tightly coupled parallel algorithm implemented
> in CUDA.
>
> The problem is that once running, the C++ module cannot apparently be
> interrupted by a SIGINT, leaving the user sat waiting even if they realise
> they've launched the algorithm with incorrect settings. Occasionally, the
> SIGINT gets through and the C++ module stops. However, this leaves the CUDA
> context hanging, meaning that if the algorithm is launched again R dies. If
> I could trap the SIGINT, then I could make sure a) that the algorithm stops
> immediately, and b) that the CUDA context is destructed nicely.
>
> Is there a "R-standard" method of doing this?
>
> Thanks,
>
> Chris
>
> --
> Dr Chris Jewell
> Lecturer in Biostatistics
> Institute of Fundamental Sciences
> Massey University
> Private Bag 11222
> Palmerston North 4442
> New Zealand
> Tel: +64 (0) 6 350 5701 Extn: 3586
>
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel

__ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
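For reference, the widely circulated low-level trick is sketched below (hedged: R_CheckUserInterrupt() longjmps when an interrupt is pending, so wrapping it in R_ToplevelExec() turns it into a testable flag, letting the C code clean up before returning):

library(inline)
interruptible <- cfunction(signature(n = 'integer'), includes = '
  #include <R_ext/Utils.h>
  static void chkIntFn(void *dummy) { R_CheckUserInterrupt(); }
  /* TRUE iff a user interrupt is pending; R_ToplevelExec absorbs the longjmp */
  static int pendingInterrupt(void) {
    return !R_ToplevelExec(chkIntFn, NULL);
  }
', body = '
  int N = asInteger(n);
  for (int i = 0; i < N; ++i) {
    if (i % 10000 == 0 && pendingInterrupt()) {
      /* clean up here (e.g. tear down the CUDA context), then stop */
      break;
    }
    /* ... heavy work ... */
  }
  return R_NilValue;
', convention = ".Call", language = 'C')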
[Rd] umlaut in path name (PR#14119)
Full_Name: Karl Schilling Version: 2.10.0 patched OS: Win XP Submission from: (NULL) (131.220.251.8)

I am running R 2.10.0 patched under WinXP (German version). When I use the command file.choose() and try to navigate to a target with an umlaut (Ä, Ö, Ü) in the pathway, I get an error message "file not found". Also, in the path name reproduced in the error message, the umlauts are replaced by other character combinations. If I try to target files with no umlauts in the path name, everything is OK. Any suggestions? Thank you so much for your attention to this.

Karl Schilling

__ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
[Rd] match function causing bad performance when using table function on factors with multibyte characters on Windows
[I originally posted this on the R-help mailing list, and it was suggested that R-devel would be a better place to discuss it.]

Running 'table' on a factor with levels containing non-ASCII characters seems to result in extremely bad performance on Windows. Here's a simple example with benchmark results (I've reduced the number of replications to make the function finish within reasonable time):

library(rbenchmark)
x.num=sample(1:2, 10^5, replace=TRUE)
x.fac.ascii=factor(x.num, levels=1:2, labels=c("A","B"))
x.fac.nascii=factor(x.num, levels=1:2, labels=c("Æ","Ø"))
benchmark(
  table(x.num),
  table(x.fac.ascii),
  table(x.fac.nascii),
  table(unclass(x.fac.nascii)),
  replications=20
)

                          test replications elapsed   relative user.self sys.self user.child sys.child
4 table(unclass(x.fac.nascii))           20    1.53   4.636364      1.51     0.01         NA        NA
2           table(x.fac.ascii)           20    0.33   1.000000      0.33     0.00         NA        NA
3          table(x.fac.nascii)           20  146.67 444.454545     38.52    81.74         NA        NA
1                 table(x.num)           20    1.55   4.696970      1.53     0.01         NA        NA

sessionInfo()
R version 2.12.1 (2010-12-16)
Platform: i386-pc-mingw32/i386 (32-bit)

locale:
[1] LC_COLLATE=Norwegian-Nynorsk_Norway.1252 LC_CTYPE=Norwegian-Nynorsk_Norway.1252 LC_MONETARY=Norwegian-Nynorsk_Norway.1252
[4] LC_NUMERIC=C LC_TIME=Norwegian-Nynorsk_Norway.1252

attached base packages:
[1] stats graphics grDevices datasets utils methods base

other attached packages:
[1] rbenchmark_0.3

The timings are from R 2.12.1, but I also get comparable results on the latest prerelease (R 2.13.0 2011-01-18 r54032). Running the same test (100 replications) on a Linux system with R 2.12.1 Patched results in essentially no difference between the performance on ASCII factors and non-ASCII factors:

                          test replications elapsed relative user.self sys.self user.child sys.child
4 table(unclass(x.fac.nascii))          100   4.607 3.096102     4.455    0.092          0         0
2           table(x.fac.ascii)          100   1.488 1.000000     1.459    0.028          0         0
3          table(x.fac.nascii)          100   1.616 1.086022     1.560    0.051          0         0
1                 table(x.num)          100   4.504 3.026882     4.403    0.079          0         0

sessionInfo()
R version 2.12.1 Patched (2011-01-18 r54033)
Platform: i686-pc-linux-gnu (32-bit)

locale:
 [1] LC_CTYPE=nn_NO.UTF-8       LC_NUMERIC=C               LC_TIME=nn_NO.UTF-8
 [4] LC_COLLATE=nn_NO.UTF-8     LC_MONETARY=C              LC_MESSAGES=nn_NO.UTF-8
 [7] LC_PAPER=nn_NO.UTF-8       LC_NAME=C                  LC_ADDRESS=C
[10] LC_TELEPHONE=C             LC_MEASUREMENT=nn_NO.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] rbenchmark_0.3

Profiling the 'table' function indicates almost all the time is spent in the 'match' function, which is used when 'factor' is applied to a 'factor' inside 'table'. Indeed, 'x.fac.nascii = factor(x.fac.nascii)' by itself is extremely slow.

Is there any theoretical reason 'factor' on 'factor' with non-ASCII characters must be so slow? And why doesn't this happen on Linux?

Perhaps a fix for 'table' might be calculating the 'table' statistics *including* all levels (not using the 'factor' function anywhere), and then removing the 'exclude' levels in the end. For example, something along these lines:

res = table.modified.to.not.use.factor(...)
ind = lapply(dimnames(res), function(x) !(x %in% exclude))
do.call("[", c(list(res), ind, drop=FALSE))

(I haven't tested this very much, so there may be issues with this way of doing things.)

-- Karl Ove Hufthammer

__ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
[Rd] table on numeric vector with exclude argument containing value missing from vector causes warning + "NaN" levels incorrectly removed from factors
I *think* the following may be considered a bug or two, but would appreciate any comments before (not) filing an official bug report.

Possible bug 1: 'table' on a numeric vector with an 'exclude' argument containing a value missing from the vector causes a warning

Possible bug 2: 'table' incorrectly tries to remove "NaN" levels

The help page for 'table' says that the first argument is 'one or more objects which can be interpreted as factors (including character strings) […]'. Does this include numeric vectors? Numeric vectors seem to work fine. Example:

x = sample(1:3, 100, replace=TRUE)
table(x)

The 'exclude' argument explicitly mentions factor levels, but seems to work fine for other objects too. Example:

table(x, exclude=2)

It's actually not clear from the help page what is meant by 'levels to remove from all factors in ...', but it seems like a character vector is expected. And indeed the following also works:

table(x, exclude="2")

However, setting the 'exclude' argument to a value not contained in the vector to be tabulated,

table(x, exclude="foo")

causes the following warning:

In as.vector(exclude, typeof(x)) : NAs introduced by coercion

The correct result is produced, though. Note that all of the following do *not* cause any warning:

table(x, exclude=NA)
table(x, exclude=NaN)
table(factor(x), exclude="foo")
table(as.character(x), exclude="foo")

I also wonder about the inclusion of 'NaN' in the definition of 'table':

table(..., exclude = if (useNA == "no") c(NA, NaN),
      useNA = c("no", "ifany", "always"),
      dnn = list.names(...), deparse.level = 1)

A factor can't include a NaN level, as the level values are always strings or NA. And having the above definition causes "NaN" (string) levels to mysteriously disappear when run through 'table'. Example:

table(factor(c("NA", NA, "NcN", "NbN", "NaN")))

Result:

 NA NbN NcN
  1   1   1

(The missing NA is not a bug; it's caused by useNA="no".)

sessionInfo()
R version 2.12.1 Patched (2011-01-20 r54056)
Platform: i686-pc-linux-gnu (32-bit)

locale:
[1] C

attached base packages:
[1] stats graphics grDevices utils datasets methods base

-- Karl Ove Hufthammer

__ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] match function causing bad performance when using tablefunction on factors with multibyte characters on Windows
Matthew Dowle wrote: > I'm not sure, but note the difference in locale between > Linux (UTF-8) and Windows (non UTF-8). As far as I > understand it R much prefers UTF-8, which Windows doesn't > natively support. Otherwise you could just change your > Windows locale to a UTF-8 locale to make R happier. > [...] > > If anybody knows a way to trick R on Linux into thinking it has > an encoding similar to Windows then I may be able to take a > look if I can reproduce the problem in Linux. Changing the locale to an ISO 8859-1 locale, i.e.: export LC_ALL="en_US.ISO-8859-1" export LANG="en_US.ISO-8859-1" I could *not* reproduce it; that is, ‘table’ is as fast on the non-ASCII factor as it is on the ASCII factor. -- Karl Ove Hufthammer __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] match function causing bad performance when using tablefunction on factors with multibyte characters on Windows
Simon Urbanek wrote:

>> I could *not* reproduce it; that is, 'table' is as fast on the non-ASCII
>> factor as it is on the ASCII factor.
>
> Strange - are you sure you get the right locale names? Make sure it's
> listed in locale -a.

Yes, I managed to reproduce it now, using a locale listed in 'locale -a'. There is a performance hit, though *much* smaller than on Windows.

> FWIW if you care about speed you should use tabulate() instead - it's much
> faster and incurs no penalty:

Yes, that's the solution I ended up using:

res = tabulate(x, nbins=nlevels(x)) # nbins needed for levels that don't occur
names(res) = levels(x)
res

(Though I'm not sure it's *guaranteed* that factors are internally stored in a way that makes this work, i.e., as the numbers 1, 2, ... for level 1, 2, ...)

Anyway, do you think it's worth trying to change the 'table' function the way I outlined in my first post¹? This should eliminate the performance hit on all platforms. However, it will introduce a performance hit (CPU and memory use) if the elements of 'exclude' make up a large part of the factor(s).

¹ http://permalink.gmane.org/gmane.comp.lang.r.devel/26576

-- Karl Ove Hufthammer

__ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] match function causing bad performance when using tablefunction on factors with multibyte characters on Windows
Karl Ove Hufthammer wrote:

> Anyway, do you think it's worth trying to change the 'table' function the
> way I outlined in my first post¹? This should eliminate the performance
> hit on all platforms.

Some additional notes: 'table' uses 'factor' directly, but also indirectly, in 'addNA'. The definition of 'addNA' ends with:

if (!any(is.na(ll)))
    ll <- c(ll, NA)
factor(x, levels = ll, exclude = NULL)

which is slow for non-ASCII levels. One *could* fix this by changing the last line to

attr(x, "levels") = ll

but one soon ends up changing every function that uses 'factor' in this way, which seems like the wrong approach. The problem lies inside 'factor', and that's where it should be fixed, if feasible.

BTW, the definition of 'addNA' looks suboptimal in a different way. The last line is always executed, even if the factor *does* contain NA values (and of course NA levels). In that case it's basically doing nothing, just taking a very long time doing it (at least on Windows). Moving the last line inside the 'if' clause, and adding an 'else return(x)', would fix this (correct me if I'm wrong).

-- Karl Ove Hufthammer

__ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
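Concretely, the tail of 'addNA' would then read something like this (a sketch of the change proposed above, untested):

if (!any(is.na(ll))) {
    ll <- c(ll, NA)
    # only pay for the (slow) factor() call when a level must be added
    x <- factor(x, levels = ll, exclude = NULL)
}
x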
[Rd] How to get R to compile with PNG support
Dear R devel list,

Good morning; I'm with the Sage (http://www.sagemath.org) project. (Some of you might have seen my talk on this at last summer's useR conference.)

We have some rudimentary support for using R graphics in various cases, which has proved useful to many of our users who want to go back and forth between R and other capabilities within Sage. Unfortunately, the way we originally implemented this was using the png and plot functions in R itself, which perhaps isn't the best (i.e., everyone uses ggplot now? but I digress). That means that when people download a binary of ours, or compile their own, whether R's plot and png functions work depends heavily on the rather obscure (to users) issue of exactly what headers are present on the compiling machine.

Unfortunately, it is *very* unclear what actually needs to be present! There are innumerable places where this has come up for us, but http://trac.sagemath.org/sage_trac/ticket/8868 and http://ask.sagemath.org/question/192/compiling-r-with-png-support are two of the current places where people have compiled information. The FAQ says, "Unless you do not want to view graphs on-screen you need 'X11' installed, including its headers and client libraries. For recent Fedora distributions it means (at least) 'libX11', 'libX11-devel', 'libXt' and 'libXt-devel'. On Debian we recommend the meta-package 'xorg-dev'. If you really do not want these you will need to explicitly configure R without X11, using --with-x=no."

Well, we don't actually need to view graphs on-screen, but we do need to be able to generate them and save them (as pngs, for instance) to the correct directory in Sage for viewing. But we have people who've tried to do this in Ubuntu, with libpng and xorg-dev installed, and the file /usr/include/X11/Xwindows.h exists, but all to no avail. There are almost as many solutions people have found as there are computers out there, it seems - slight hyperbole, but that's what it feels like. We've posted more than once (I think) to the r-help list, but have gotten no useful feedback.

Is there *anywhere* documenting the *exact* requirements for capabilities("png") to come out TRUE instead of FALSE? Then, not only could we be smarter in how we compile R (currently somewhat naively searching for /usr/include/X11/Xwindows.h to determine whether we'll try for png support), but we would be able to tell users something very precise to do (e.g., apt-get foo) if they currently have R without PNG support in Sage. Again, I emphasize that apparently getting xorg-dev doesn't always do the trick.

We do realize that for most people wanting to use just R, it's best to download a binary, which will behave nicely; Sage's "batteries included" philosophy means that we are asking for more specialized info from upstream, and for that I apologize in advance. I also apologize if I said something silly above, because I don't actually know what all these files are - I've just looked into enough support requests to have a decent idea of what's required. We are trying not to have to parse the makefile to figure all this out, and possibly making some mistake there as well.

Thank you SO much for any help with this,
Karl-Dieter Crisman for the Sage team

__ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] How to get R to compile with PNG support
> Message: 12
> Date: Wed, 20 Apr 2011 02:09:23 -0700 (PDT)
> From: Sharpie
> To: r-devel@r-project.org
> Subject: Re: [Rd] How to get R to compile with PNG support
> Message-ID: <1303290563237-3462502.p...@n4.nabble.com>
> Content-Type: text/plain; charset=UTF-8
>
>> Dear R devel list,
>> Good morning; I'm with the Sage (http://www.sagemath.org) project.
>> (Some of you might have seen my talk on this at last summer's useR
>> conference).
>
> Thanks for stopping by, Karl! I have to say that I am a big fan of the Sage
> project---it is a very good idea and I really appreciate all the time you
> guys put into it. I may not be able to answer all of your questions
> concerning PNG support, but hopefully some of the following pointers will be
> useful.

Good morning, Charlie et al.,

Thanks for your words. We like R, too! We need to advertise it more, and this thread is part of making sure that happens in the long run.

To the issue at hand. Our main concern is just not to have to spend hours reading the configuration and makefile to figure out exactly where things happen.

>> We have some rudimentary support for using R graphics in various
>> cases, which has proved useful to many of our users who want to go
>> back and forth between R and other capabilities within Sage.
>> Unfortunately, the way we originally implemented this was using the
>> png and plot functions in R itself, which perhaps isn't the best
>> (i.e., everyone uses ggplot now? but I digress).
>
> One important distinction to make is between R graphics functions such as
> plot and ggplot, and R graphics *devices*, such as png. The devices provide
> back ends that take the R-level function calls and actually execute the
> low-level "draw line from a to b, clip to rectangle A, insert left-justified
> text at x,y" primitives that get written to an output format.

True. It's the device enabling that I'm talking about. We enable aqua on Mac, and png on Linux. We ignore Cairo, and ignore X11 on Mac because it is too touchy (at least, according to the FAQ on this - different weird instructions for each type, and of course not everyone has X on Mac).

> Bottom line for Sage is that as long as you implement at least one device
> function, such as png, your users should be able to call plot, ggplot, and
> the rest of R's graphics functions to their heart's content, they just won't
> have a wide selection of output formats.

Great. That is okay with us; we aren't expecting (yet) people to be able to save R graphics in various output formats. For our native (matplotlib) graphics, we do expect this.

>> Then, not only could we be smarter in how we compile R (currently
>> somewhat naively searching for /usr/include/X11/Xwindows.h to
>> determine whether we'll try for png support), but we would be able to
>> tell users something very precise to do (e.g., apt-get foo) if they
>> currently have R without PNG support in Sage. Again, I emphasize that
>> apparently getting xorg-dev doesn't always do the trick.
>
> In the trac ticket you linked, the configure output shows PNG is enabled
> (I.E. the library was found) but you may be ending up with no support for an
> actual png() graphics device due to one of the following
>
> - configure didn't find Xlib as X11 is not listed under Interfaces
> - configure didn't find cairo as it is not listed under Additional
>   capabilities
>
> So, although R has the PNG library, that is only useful for writing PNG
> files. R also needs the Xlib or Cairo libraries to provide drawing
> primitives that will create the figures those files will contain.

Gotcha. I suspect that the X11 not listed under Interfaces is the problem (again, we ignore Cairo). What is the *exact* file or directory that the R configure looks for in trying to list X11 under Interfaces? And is there any way around this at all? That is, is there any way for R to create but not display a graphic if it has (for instance) png support, like the one on the Trac ticket did? We can always just search for the png file and serve it up in our own viewers. Note that we already search for /usr/include/X11/Xwindows.h, and adding xorg-dev didn't help with the latest one (which may not be on the Trac ticket).

> In the ask.sagemath question the problem appears to be that the user had X11
> installed but not libpng.

Yes, I just referenced that for reference, as it were.

Thank you, and I hope we can get this resolved!

Karl-Dieter

__ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] How to get R to compile with PNG support
Followup with the specific issue in our most recent (non-posted, as of yet) attempts on a certain box. We now have xorg-dev, libcairo-dev, and Xwindows.h and libpng (as below) on this machine, but R is not compiling with support for any of these things. Once again, any help knowing *exactly* what to pass to the configuration script or anything else would be *greatly* appreciated. We are planning to use R in Sage on several occasions with this machine this summer if we can get this going (see http://www.maa.org/prep/2011/sage.html).

R is now configured for i686-pc-linux-gnu

  Source directory:          .
  Installation directory:    /home/sageserver/sage/local
  C compiler:                gcc -std=gnu99 -I/home/sageserver/sage/local/include -L/home/sageserver/sage/local/lib/
  Fortran 77 compiler:       sage_fortran -g -O2
  C++ compiler:              g++ -g -O2
  Fortran 90/95 compiler:    sage_fortran -g -O2
  Obj-C compiler:
  Interfaces supported:      X11
  External libraries:        readline, BLAS(ATLAS), LAPACK(generic)
  Additional capabilities:   PNG, NLS
  Options enabled:           shared R library, R profiling
  Recommended packages:      yes

However:

> capabilities()
    jpeg      png     tiff    tcltk      X11     aqua http/ftp  sockets
   FALSE    FALSE    FALSE    FALSE    FALSE    FALSE     TRUE     TRUE
  libxml     fifo   cledit    iconv      NLS  profmem    cairo
    TRUE     TRUE     TRUE     TRUE     TRUE    FALSE    FALSE

__ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] How to get R to compile with PNG support
Thanks for your replies, Dirk and Matt. On Thu, Apr 21, 2011 at 7:49 AM, Dirk Eddelbuettel wrote: > > On 20 April 2011 at 12:16, Karl-Dieter Crisman wrote: > | > | > | R is now configured for i686-pc-linux-gnu > | Source directory: . > | Installation directory: /home/sageserver/sage/local > | C compiler: gcc -std=gnu99 > | -I/home/sageserver/sage/local/include > | -L/home/sageserver/sage/local/lib/ Fortran 77 compiler: > | sage_fortran -g -O2 > | C++ compiler: g++ -g -O2 > | Fortran 90/95 compiler: sage_fortran -g -O2 Obj-C compiler: > | Interfaces supported: X11 > | External libraries: readline, BLAS(ATLAS), LAPACK(generic) > | Additional capabilities: PNG, NLS > | Options enabled: shared R library, R profiling > | Recommended packages: yes > | > | > | However: > | > | > | > capabilities() > | jpeg png tiff tcltk X11 aqua http/ftp sockets > | FALSE FALSE FALSE FALSE FALSE FALSE TRUE TRUE > | > | libxml fifo cledit iconv NLS profmem cairo > | TRUE TRUE TRUE TRUE TRUE FALSE FALSE > > Random guess: did you connect via ssh without x11 forwarding? Almost certainly, yes. (I am an interlocutor right now for someone who is actually doing this, my apologies.) But it's a machine we just ssh into, I'm pretty sure, though it does serve up web pages. > I cannot see how configure find png.h and libpng but the binary fails. As all > other X11 related formats are also shown false, methinks you are without a > valid DISPLAY. That is quite likely. So it sounds like for png() to be set to use the X11 device, there has to (somewhere) be a visual output - presumably that is the part LOGICAL(ans)[i++] = X11; in Matt's answer. > That is actually an issue related to your headless use---which is what Sage > may default too; see the R FAQ on this and the prior discussion on the > xvfb-run wrapper which 'simulates' an x11 environment (which you need for > png). So maybe you should revisit the Cairo devices---they allow you > plotting without an x11 device (and also give you SVG). > Yeah, and I saw your SO answer on this (after the fact) as well. In some sense, we are just trying to get graphics on one machine. Note that we have installed the cairo devel package on this very machine, but it's not being picked up - maybe it's looking in the wrong place? That is one of the reasons this is confusing. But in a larger sense, because of Sage's "batteries included" philosophy (which we know not everyone agrees with!), we would like to have a one-shot way so that *everyone* will see R graphics, not just people whose binary happens to have been compiled on a machine that has X and a display. If that means adding 22.5 MB to our tarball for Cairo... maybe, maybe not. I won't copy Matt's message here, but I appreciate the pointers to exactly where these things are defined very much - without knowing where to look, it would be a long slog. Hopefully we'll have some success! Thanks for the replies, and for any other ideas. Karl-Dieter __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] How to get R to compile with PNG support
Thanks for all the feedback. First, our update, then two responses.

From Jason Grout:

+++
I finally got it working. After mucking around in the R configure file a bit and trying out some of the different tests, as well as comparing a working system with our broken system, I realized that `pkg-config --exists pangocairo` was working on the good system and not working on the broken system. So I installed libpango1.0-dev, and now R picks up the cairo package, which in turn means that my capabilities is now:

> capabilities()
    jpeg      png     tiff    tcltk      X11     aqua http/ftp  sockets
    TRUE     TRUE    FALSE    FALSE    FALSE    FALSE     TRUE     TRUE
  libxml     fifo   cledit    iconv      NLS  profmem    cairo
    TRUE     TRUE     TRUE     TRUE     TRUE    FALSE     TRUE

So in short, I think what I did was install libcairo-dev and libpango1.0-dev. There might have been other stuff in there that was needed; I'm not sure. When I build a new system again, I'll try just installing those packages and see if it is sufficient. For the record, I had also installed xorg-dev as well.
+++

My comment: As someone who didn't know what configure scripts were a couple of years ago, this is maddening; I don't see anything about libpango or whatever in the FAQs. Luckily, Jason knows a lot more than I do!

@Dirk:

> | Note that we have installed the cairo devel package on this very
> | machine, but it's not being picked up - maybe it's looking in the
> | wrong place? That is one of the reasons this is confusing.
>
> You have to understand that even though this problem may seem urgent and
> novel to you and the Sage team,

Novel, yes; urgent only to us, certainly we don't assume it's urgent to you :)

> it is actually about as old as the web and R
> itself. In a nutshell, we all (in the people reading r-help and r-devel
> sense) have been explaining to folks since the late 1990s that in order to
> run png() to 'just create some charts for a webserver' ... you need an X11
> server because that is where the font metrics come from. Or else no png for

It's true this is findable, but the difference between having X11 on the system and having the display is arcane for those who just want to use R. But I understand your point.

> is life. Systems such as Sage become so large because having things like this
> around on all deployment systems implies (at least to some degree)
> replicating fundamental OS level features because they unfortunately have
> to supply things missing or broken across OSs.

Yes, that is true. We know of many people who download Sage because it's the easiest way to install Z, where Z is some specific mathematical program that is impossible to configure properly without special knowledge. Or, until fairly recently, to get Cython.

@Simon: That's new to me that X11 is installed by default now, but it looks like you are right. However, we don't rely on this for Mac; we make sure to configure for quartz when we build - which I assume is separate from the other stuff? But updating the FAQ about this would be really great for future users :) Also thanks for the hint on all the other (possibly) needed stuff. Yikes! AFAIK this is an Ubuntu machine we're talking about.

To all - if we come up with any more reliable way to make this work universally, i.e. with *exact* instructions for what to download, we will definitely pass that upstream.

Thank you.

__ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
[Rd] Invalid date-times and as.POSIXct problems (remotely related to DST issues)
I think this should be handled as a bug, but I'm not sure which platforms and versions it applies to, so I'm writing to this list. The problem is that as.POSIXct on character strings behaves in a strange way if one of the date-times is invalid; it converts all the date-times to dates (i.e., it discards the time part). Example, which I suspect only works in my locale, with the UTC+1/UTC+2 timezone:

$ dates=c("2003-10-13 00:15:00", "2008-06-03 14:45:00", "2003-03-30 02:00:00")

Note that the last date-time doesn't actually exist (due to daylight saving time): http://www.timeanddate.com/worldclock/meetingtime.html?day=30&month=3&year=2003&p1=187&iv=0

$ d12=as.POSIXct(dates)
$ d123=as.POSIXct(dates[1:2])
$ d12
[1] "2003-10-13 CEST" "2008-06-03 CEST" "2003-03-30 CET"
$ d123
[1] "2003-10-13 00:15:00 CEST" "2008-06-03 14:45:00 CEST"

When I include all values, they are all converted to (POSIXct) *dates*, but if I exclude the invalid one, the rest are properly converted to (POSIXct) date-times. Note that this is not just a display issue:

$ unclass(d12)
[1] 1065996000 1212444000 1048978800
attr(,"tzone")
[1] ""
$ unclass(d123)
[1] 1065996900 1212497100
attr(,"tzone")
[1] ""

I can only reproduce this on Windows; on Linux all the strings are converted to date-times (the last one to 2003-03-30 01:00:00 CET). However, if one specifies a completely invalid time, e.g., 25:00, the same thing does happen on Linux (2.14.2 Patched).

I think the right/best behaviour would be to convert the invalid date-time string to NA, convert the other ones to proper POSIXct date-times, and perhaps issue a warning about NAs being generated.

(I originally discovered this problem on data from an Oracle database, using sqlQuery() from the RODBC package, which automatically converts date-times to date-times in the current timezone (except if you specify as.is=TRUE), and was surprised that for some queries the date-times were truncated to dates. A warning that parts of the data were invalid would be very welcome.)

Version details (for Windows):

$ version
               _
platform       i386-pc-mingw32
arch           i386
os             mingw32
system         i386, mingw32
status
major          2
minor          14.2
year           2012
month          02
day            29
svn rev        58522
language       R
version.string R version 2.14.2 (2012-02-29)

$ sessionInfo()
R version 2.14.2 (2012-02-29)
Platform: i386-pc-mingw32/i386 (32-bit)

locale:
[1] LC_COLLATE=Norwegian-Nynorsk_Norway.1252 LC_CTYPE=Norwegian-Nynorsk_Norway.1252 LC_MONETARY=Norwegian-Nynorsk_Norway.1252
[4] LC_NUMERIC=C LC_TIME=Norwegian-Nynorsk_Norway.1252

attached base packages:
[1] stats graphics grDevices datasets utils methods base

-- Karl Ove Hufthammer

__ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
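One way to see that the format fallback happens at the vector level (a sketch; behaviour assumed to match the Windows session above): converting the strings one at a time makes the format choice per element, so only the problematic string loses its time part:

dates <- c("2003-10-13 00:15:00", "2008-06-03 14:45:00",
           "2003-03-30 02:00:00")
lapply(dates, as.POSIXct)  # the first two keep their times; the third does not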
Re: [Rd] Invalid date-times and as.POSIXct problems (remotely related to DST issues)
Karl Ove Hufthammer wrote: > I think this should be handled as a bug, but I’m not sure which > platforms and versions it applies to, so I’m writing to this list. No response, so I‘ve filed a bug at https://bugs.r-project.org/bugzilla3/show_bug.cgi?id=14845 (with some additional info). -- Karl Ove Hufthammer __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] Suggestion: Add links to NEWS and CHANGES on help.start() page
On Fri, 13 Nov 2009 09:37:31 +0100 Henrik Bengtsson wrote: > I'd like to recommend that links to (local) NEWS and CHANGES are added > to the help.start() overview pages. help("NEWS")/help("CHANGE LOG") > and help("CHANGES") could display/refer to them as well. Are you talking of the NEWS and CHANGES for R itself, or for packages too? It would be very useful having a convenience function for this for packages too. Perhaps something like library(news=MASS) (or MASS as a character string) and library(changes=spdep) similar to library(help=MASS) Or have I overlooked something, and a function for this already exists? -- Karl Ove Hufthammer __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] Suggestion: Add links to NEWS and CHANGES on help.start() page
On Fri, 13 Nov 2009 14:31:10 +0100 Romain Francois wrote: > > Or have I overlooked something, and a function for this already exists? > > ?news I know about the 'news' function, but that doesn't *show* the NEWS or CHANGES file for a package, at least not in any useful format. The feature I'd prefer doesn't require any fancy parsing, just an ordinary listing of the contents of the text files NEWS/CHANGES (in a separate window, or perhaps opened in the user's browser). -- Karl Ove Hufthammer __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
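Something along these lines would already do what I want (a sketch; the file names tried are just guesses at common conventions):

showNews <- function(pkg) {
  for (f in c("NEWS", "CHANGES", "ChangeLog", "NEWS.md")) {
    path <- system.file(f, package = pkg)
    if (nzchar(path))
      return(file.show(path, title = paste(pkg, f)))
  }
  message("no NEWS or CHANGES file found for package ", sQuote(pkg))
}
showNews("MASS")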
Re: [Rd] R on Windows crashes when using certain characters in strings in data frames (PR#14125)
On Thu, 10 Dec 2009 10:20:09 +0100 (CET) k...@huftis.org wrote: > The following commands trigger the crash for me: > > n=1e5 > k=10 > x=sample(k,n,replace=TRUE) > y=sample(k,n,replace=TRUE) > xy=paste(x,y,sep=" × ") > z=sample(n) > d=data.frame(xy,z) Note: On the R Bug Tracking System Web site, the character causing the problem seems to be incorrectly displayed as a '.', though on the mailing list the correct character is used. The character should be the multiplication symbol, U+00D7, which looks similar to an 'x'. The character does exist in both ISO 8859-1 and Windows-1252. -- Karl Ove Hufthammer __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] PGF Device
jtxx000 wrote:

> PGF is a package for LaTeX which works with both ps
> and pdf output without any nasty hacks like pictex.
> Is there any technical reason why there could not be a
> PGF graphic device for R?

Not that I can think of. PGF is certainly powerful enough for this.

> If not, I'm going to try to throw one together.

Sounds wonderful. I am sure this will be useful for a lot of people.

-- Karl Ove Hufthammer

__ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] ** operator
Peter Dalgaard: > Not really, just transcribed during the lexical analysis phase: > > case '*': > if (nextchar('*')) > c='^'; > yytext[0] = c; > yytext[1] = '\0'; > yylval = install(yytext); > return c; > > (There's no "->" function either...) You can also use expression() to see what various expressions are parsed as: > expression(2**5) expression(2^5) > expression(3->x) expression(x <- 3) -- Karl Ove Hufthammer __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] significant digits (PR#9682)
Duncan Murdoch:

> The number 0.12345 is not exactly representable, but (I think) it is
> represented by something slightly closer to 0.1235 than to 0.1234.

I like using formatC for checking such things. On my (Linux) system, I get:

$ formatC(.12345, digits=50)
[1] "0.12345000000000000417443857259058859199285507202148"

So it looks as though Windows gets it right.

-- Karl Ove Hufthammer

__ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] digits in summary.default
Martin Maechler wrote:

> Since I've now seen the code of summary.default in S-plus 6.2,
> I'm not in a good position to propose a code change here ---
> unless Insightful ``donates'' their 3 lines of implementation to
> R {which I think would be quite fair given the recent flurry of
> things they've recently ported into S-plus 8.x}

It's also possible to be a bit smarter in specific cases. See for example the LaTeX table functions for regression summaries in the Dmisc package[1], which use the magnitude of the standard errors to determine the number of digits shown for estimates (so that the number of digits varies for each row/estimate).

[1] Not on CRAN. See http://www.menne-biomed.de/download/download.html

-- Karl Ove Hufthammer E-mail and Jabber: [EMAIL PROTECTED]

__ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
[Rd] [patch] Support many columns in model.matrix
Generating a model matrix with very large numbers of columns overflows the stack and/or runs very slowly, due to the implementation of TrimRepeats().

This patch modifies it to use Rf_duplicated() to find the duplicates. This makes the running time linear in the number of columns and eliminates the recursive function calls.

Thanks

Index: src/library/stats/src/model.c
===================================================================
--- src/library/stats/src/model.c	(revision 70230)
+++ src/library/stats/src/model.c	(working copy)
@@ -1259,11 +1259,12 @@
 static int TermZero(SEXP term)
 {
-    int i, val;
-    val = 1;
-    for (i = 0; i < nwords; i++)
-	val = val && (INTEGER(term)[i] == 0);
-    return val;
+    for (int i = 0; i < nwords; i++) {
+        if (INTEGER(term)[i] != 0) {
+            return 0;
+        }
+    }
+    return 1;
 }
 
@@ -1271,11 +1272,12 @@
 static int TermEqual(SEXP term1, SEXP term2)
 {
-    int i, val;
-    val = 1;
-    for (i = 0; i < nwords; i++)
-	val = val && (INTEGER(term1)[i] == INTEGER(term2)[i]);
-    return val;
+    for (int i = 0; i < nwords; i++) {
+        if (INTEGER(term1)[i] != INTEGER(term2)[i]) {
+            return 0;
+        }
+    }
+    return 1;
 }
 
@@ -1303,18 +1305,37 @@
 /* TrimRepeats removes duplicates of (bit string) terms
-   in a model formula by repeated use of ``StripTerm''.
+   in a model formula. Also drops zero terms.
 */
 
 static SEXP TrimRepeats(SEXP list)
 {
-    if (list == R_NilValue)
-	return R_NilValue;
-    /* Highly recursive */
-    R_CheckStack();
-    if (TermZero(CAR(list)))
-	return TrimRepeats(CDR(list));
-    SETCDR(list, TrimRepeats(StripTerm(CAR(list), CDR(list))));
+    // Drop zero terms at the start of the list.
+    while (list != R_NilValue && TermZero(CAR(list))) {
+	list = CDR(list);
+    }
+    if (list == R_NilValue || CDR(list) == R_NilValue)
+	return list;
+
+    // Find out which terms are duplicates.
+    SEXP all_terms = PROTECT(Rf_PairToVectorList(list));
+    SEXP duplicate_sexp = PROTECT(Rf_duplicated(all_terms, FALSE));
+    int* is_duplicate = LOGICAL(duplicate_sexp);
+    int i = 0;
+
+    // Remove the zero terms and duplicates from the list.
+    for (SEXP current = list; CDR(current) != R_NilValue; i++) {
+	SEXP next = CDR(current);
+
+	if (is_duplicate[i + 1] || TermZero(CAR(next))) {
+	    // Remove the node from the list.
+	    SETCDR(current, CDR(next));
+	} else {
+	    current = next;
+	}
+    }
+
+    UNPROTECT(2);
     return list;
 }

__ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] [patch] Support many columns in model.matrix
Thanks. Couldn't you implement model.matrix(..., sparse = TRUE) with a small amount of R code similar to MatrixModels::model.Matrix ? On Mon, Feb 29, 2016 at 10:01 AM, Martin Maechler wrote: >>>>>> Karl Millar via R-devel >>>>>> on Fri, 26 Feb 2016 15:58:20 -0800 writes: > > > Generating a model matrix with very large numbers of > > columns overflows the stack and/or runs very slowly, due > > to the implementation of TrimRepeats(). > > > This patch modifies it to use Rf_duplicated() to find the > > duplicates. This makes the running time linear in the > > number of columns and eliminates the recursive function > > calls. > > Thank you, Karl. > I've committed this (very slightly modified) to R-devel, > > (also after looking for a an example that runs on a non-huge > computer and shows the difference) : > > nF <- 11 ; set.seed(1) > lff <- setNames(replicate(nF, as.factor(rpois(128, 1/4)), simplify=FALSE), > letters[1:nF]) > str(dd <- as.data.frame(lff)); prod(sapply(dd, nlevels)) > ## 'data.frame':128 obs. of 11 variables: > ## $ a: Factor w/ 3 levels "0","1","2": 1 1 1 2 1 2 2 1 1 1 ... > ## $ b: Factor w/ 3 levels "0","1","2": 1 1 1 1 1 1 2 1 1 1 ... > ## $ c: Factor w/ 3 levels "0","1","2": 1 1 1 2 1 1 1 2 1 1 ... > ## $ d: Factor w/ 3 levels "0","1","2": 1 1 2 2 1 2 1 1 2 1 ... > ## $ e: Factor w/ 3 levels "0","1","2": 1 1 1 1 1 1 1 1 2 1 ... > ## $ f: Factor w/ 2 levels "0","1": 2 1 2 1 2 1 1 2 1 2 ... > ## $ g: Factor w/ 4 levels "0","1","2","3": 2 1 1 2 1 3 1 1 1 1 ... > ## $ h: Factor w/ 4 levels "0","1","2","4": 1 1 1 1 2 1 1 1 1 1 ... > ## $ i: Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 2 ... > ## $ j: Factor w/ 3 levels "0","1","2": 1 2 3 1 1 1 1 1 1 1 ... > ## $ k: Factor w/ 3 levels "0","1","2": 1 1 1 1 1 1 1 1 1 1 ... > ## > ## [1] 139968 > > system.time(mff <- model.matrix(~ . ^ 11, dd, contrasts = list(a = > "contr.helmert"))) > ## user system elapsed > ## 0.255 0.033 0.287 --- *with* the patch on my desktop (16 GB) > ## 1.489 0.031 1.522 --- for R-patched (i.e. w/o the patch) > >> dim(mff) > [1]128 139968 >> object.size(mff) > 154791504 bytes > > --- > > BTW: These example would gain tremendously if I finally got > around to provide > >model.matrix(, sparse = TRUE) > > which would then produce a Matrix-package sparse matrix. > > Even for this somewhat small case, a sparse matrix is a factor > of 13.5 x smaller : > >> s1 <- object.size(mff); s2 <- object.size(M <- Matrix::Matrix(mff)); >> as.vector( s1/s2 ) > [1] 13.47043 > > I'm happy to collaborate with you on adding such a (C level) > interface to sparse matrices for this case. > > Martin Maechler __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
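For reference, a sparse model matrix is already obtainable at the R level along the lines mentioned above (a hedged sketch, assuming the MatrixModels package and the 'dd' data frame from Martin's example):

library(MatrixModels)
# model.Matrix() mirrors model.matrix() but can return a sparse "Matrix"
sM <- model.Matrix(~ .^2, data = dd, sparse = TRUE)
print(object.size(sM))  # typically far smaller than the dense equivalent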
[Rd] Undocumented 'use.names' argument to c()
'c' has an undocumented 'use.names' argument. I'm not sure if this is a documentation or implementation bug. > c(a = 1) a 1 > c(a = 1, use.names = F) [1] 1 Karl __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
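For comparison, a hedged illustration: unlist() documents its 'use.names' argument, and c() appears to treat the name the same way:

unlist(list(a = 1, b = 2), use.names = FALSE)
# [1] 1 2
c(a = 1, b = 2, use.names = FALSE)
# [1] 1 2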
Re: [Rd] Undocumented 'use.names' argument to c()
I'd expect that a lot of the performance overhead could be eliminated by simply improving the underlying code. IMHO, we should ignore it in deciding the API that we want here.

On Fri, Sep 23, 2016 at 10:54 AM, Henrik Bengtsson wrote:
> I'd vote for it to stay. It could of course surprise someone who'd
> expect c(list(a=1), b=2, use.names = FALSE) to generate list(a=1, b=2,
> use.names=FALSE). On the upside is the performance gain from using
> use.names=FALSE. Below benchmarks show that the combining of the
> names attributes themselves takes ~20-25 times longer than the
> combining of the integers themselves. Also, at no surprise,
> use.names=FALSE avoids some memory allocations.
>
>> options(digits = 2)
>>
>> a <- b <- c <- d <- 1:1e4
>> names(c) <- c
>> names(d) <- d
>>
>> stats <- microbenchmark::microbenchmark(
> +   c(a, b, use.names=FALSE),
> +   c(c, d, use.names=FALSE),
> +   c(a, d, use.names=FALSE),
> +   c(a, b, use.names=TRUE),
> +   c(a, d, use.names=TRUE),
> +   c(c, d, use.names=TRUE),
> +   unit = "ms"
> + )
>>
>> stats
> Unit: milliseconds
>                        expr   min    lq  mean median    uq   max neval
>  c(a, b, use.names = FALSE) 0.031 0.032 0.049  0.034 0.036 1.474   100
>  c(c, d, use.names = FALSE) 0.031 0.031 0.035  0.034 0.035 0.064   100
>  c(a, d, use.names = FALSE) 0.031 0.031 0.049  0.034 0.035 1.452   100
>   c(a, b, use.names = TRUE) 0.031 0.031 0.055  0.034 0.036 2.094   100
>   c(a, d, use.names = TRUE) 0.510 0.526 0.588  0.549 0.617 1.998   100
>   c(c, d, use.names = TRUE) 0.780 0.815 0.886  0.841 0.944 1.430   100
>
>> profmem::profmem(c(c, d, use.names=FALSE))
> Rprofmem memory profiling of:
> c(c, d, use.names = FALSE)
>
> Memory allocations:
>        bytes calls
> 1      80040
> total  80040
>
>> profmem::profmem(c(c, d, use.names=TRUE))
> Rprofmem memory profiling of:
> c(c, d, use.names = TRUE)
>
> Memory allocations:
>        bytes calls
> 1      80040
> 2     160040
> total 240080
>
> /Henrik
>
> On Fri, Sep 23, 2016 at 10:25 AM, William Dunlap via R-devel wrote:
>> In Splus c() and unlist() called the same C code, but with a different
>> 'sys_index' code (the last argument to .Internal) and c() did not consider
>> an argument named 'use.names' special.
>>
>>> c
>> function(..., recursive = F)
>> .Internal(c(..., recursive = recursive), "S_unlist", TRUE, 1)
>>> unlist
>> function(data, recursive = T, use.names = T)
>> .Internal(unlist(data, recursive = recursive, use.names = use.names),
>> "S_unlist", TRUE, 2)
>>> c(A=1,B=2,use.names=FALSE)
>>  A B use.names
>>  1 2         0
>>
>> The C code used sys_index==2 to mean 'the last argument is the use.names
>> argument'; if sys_index==1 only the recursive argument was considered
>> special.
>>
>> Sys.funs.c:
>> 405 S_unlist(vector *ent, vector *arglist, s_evaluator *S_evaluator)
>> 406 {
>> 407     int which = sys_index; boolean named, recursive, names;
>> ...
>> 419     args = arglist->value.tree; n = arglist->length;
>> ...
>> 424     names = which==2 ? logical_value(args[--n], ent, S_evaluator)
>>             : (which == 1);
>>
>> Thus there is no historical reason for giving c() the use.names argument.
>>
>> Bill Dunlap
>> TIBCO Software
>> wdunlap tibco.com
>>
>> On Fri, Sep 23, 2016 at 9:37 AM, Suharto Anggono Suharto Anggono via
>> R-devel wrote:
>>
>>> In S-PLUS 3.4 help on 'c' (http://www.uni-muenster.de/ZIV.BennoSueselbeck/s-html/helpfiles/c.html), there is no 'use.names' argument.
>>>
>>> Because 'c' is a generic function, I don't think that changing formal
>>> arguments is good.
>>>
>>> In R devel r71344, 'use.names' is not an argument of functions 'c.Date',
>>> 'c.POSIXct' and 'c.difftime'.
>>>
>>> Could 'use.names' be documented to be accepted by the default method of
>>> 'c', but not listed as a formal argument of 'c'? Or, could the code that
>>> handles the argument name 'use.names' be removed?
>>>
>>>>> David Winsemius
>>>>> on Tue, 20 Sep 2016 23:46:48 -0700 writes:
>>> >> On Sep 20, 2016, at 7:18 PM, Karl Millar via
[Rd] Is importMethodsFrom actually needed?
IIUC, loading a namespace automatically registers all the exported methods as long as the generic can be found when the namespace gets loaded. Generics can be exported and imported as regular functions. In that case, code in a package should be able to simply import the generic and the methods will automatically work correctly without any need for importMethodsFrom. Is there something that I'm missing here? What breaks if you don't explicitly import methods? Thanks, Karl __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
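To make the question concrete, a hypothetical NAMESPACE fragment (pkgA, pkgB and myGeneric are made-up names) for the pattern described above: import only the generic, and rely on the method-providing package registering its methods when its namespace is loaded.

# NAMESPACE of a package calling myGeneric() on pkgB objects:
import(methods)
importFrom(pkgA, myGeneric)  # the generic, imported like a plain function
# note: no importMethodsFrom(pkgB, ...) here; whether pkgB's
# exportMethods() registration alone suffices is what the question asks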
Re: [Rd] Upgrading a package to which other packages are LinkingTo
A couple of points: - rebuilding dependent packages is needed if there is an ABI change, not just an API change. For packages like Rcpp which export inline functions or macros that might have changed, this is potentially any change to existing functions, but for packages like Matrix, it isn't really an issue at all IIUC. - If we're looking into a way to check if package APIs are compatible, then that's something that's relevant for all packages, since they all export an R API. I believe that CRAN only tests package compatibility with the most recent versions of packages on CRAN that import or depend on it. There's no guarantee that a package update won't contain API or behaviour changes that breaks older versions of packages, packages not on CRAN or any scripts that use the package, and these sorts of breakages do happen semi-regularly. - AFAICT, the only difference with packages like Rcpp is that you can potentially have all of your CRAN packages at the latest version, but some of them might have inlined code from an older version of Rcpp even after running update.packages(). While that is an issue, in my experience that's been a lot less trouble than the general case of backwards compatibility. Karl On Fri, Dec 16, 2016 at 8:19 AM, Dirk Eddelbuettel wrote: > > On 16 December 2016 at 11:00, Duncan Murdoch wrote: > | On 16/12/2016 10:40 AM, Dirk Eddelbuettel wrote: > | > On 16 December 2016 at 10:14, Duncan Murdoch wrote: > | > | On 16/12/2016 8:37 AM, Dirk Eddelbuettel wrote: > | > | > > | > | > On 16 December 2016 at 08:20, Duncan Murdoch wrote: > | > | > | Perhaps the solution is to recommend that packages which export their > | > | > | C-level entry points either guarantee them not to change or offer > | > | > | (require?) version checks by user code. So dplyr should start out by > | > | > | saying "I'm using Rcpp interface 0.12.8". If Rcpp has a new version > | > | > | with a compatible interface, it replies "that's fine". If Rcpp has > | > | > | changed its interface, it says "Sorry, I don't support that any more." > | > | > > | > | > We try. But it's hard, and I'd argue, likely impossible. > | > | > > | > | > For example I even added a "frozen" package [1] in the sources / unit tests > | > | > to test for just this. In practice you just cannot hit every possible access > | > | > point of the (rich, in our case) API so the tests pass too often. > | > | > > | > | > Which is why we relentlessly test against reverse-depends to _at least ensure > | > | > buildability_ from our releases. > | > > | > I meant to also add: "... against a large corpus of other packages." > | > The intent is to empirically answer this. > | > > | > | > As for seamless binary upgrade, I don't think in can work in practice. Ask > | > | > Uwe one day we he rebuilds everything every time on Windows. And for what it > | > | > is worth, we essentially do the same in Debian. > | > | > > | > | > Sometimes you just need to rebuild. That may be the price of admission for > | > | > using the convenience of rich C++ interfaces. > | > | > > | > | > | > | Okay, so would you say that Kirill's suggestion is not overkill? Every > | > | time package B uses LinkingTo: A, R should assume it needs to rebuild B > | > | when A is updated? > | > > | > Based on my experience is a "halting problem" -- i.e. cannot know ex ante. > | > > | > So "every time" would be overkill to me. Sometimes you know you must > | > recompile (but try to be very prudent with public-facing API). Many times > | > you do not.
It is hard to pin down. > | > > | > At work we have a bunch of servers with Rcpp and many packages against > them > | > (installed system-wide for all users). We _very rarely_ need rebuilds. > > | So that comes back to my suggestion: you should provide a way for a > | dependent package to ask if your API has changed. If you say it hasn't, > | the package is fine. If you say it has, the package should abort, > | telling the user they need to reinstall it. (Because it's a hard > | question to answer, you might get it wrong and say it's fine when it's > | not. But that's easy to fix: just make a new release that does require > > Sure. > > We have always increased the higher-order version number when that is needed. > > One problem with your proposal is that the testing code may run after the > package load, and in the case where it matters ... that very code may not get > reached because the package didn't load. > > Dirk > > -- > http://dirk.eddelbuettel.com | @eddelbuettel | e...@debian.org > > __ > R-devel@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-devel __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
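A rough sketch of the handshake Duncan proposes, as it might look in package B's .onLoad. The function A::interfaceCompatible is hypothetical (neither Rcpp nor any other package exports it), and the version string would have to be recorded when B was built:

    ## Hypothetical interface-version handshake in package B
    .onLoad <- function(libname, pkgname) {
      built_against <- "0.12.8"  # interface version of A recorded at B's build time
      if (!A::interfaceCompatible(built_against))
        stop("package '", pkgname, "' was built against A interface ",
             built_against, " and needs to be reinstalled")
    }

As Dirk points out in the thread, such a check can only run if B loads at all, which is precisely what an ABI mismatch may prevent.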
Re: [Rd] Request: Increasing MAX_NUM_DLLS in Rdynload.c
It's not always clear when it's safe to remove the DLL. The main problem that I'm aware of is that native objects with finalizers might still exist (created by R_RegisterCFinalizer etc). Even if there are no live references to such objects (which would be hard to verify), it still wouldn't be safe to unload the DLL until a full garbage collection has been done. If the DLL is unloaded, then the function pointer that was registered now becomes a pointer into the memory where the DLL was, leading to an almost certain crash when such objects get garbage collected. A better approach would be to just remove the limit on the number of DLLs, dynamically expanding the array if/when needed. On Tue, Dec 20, 2016 at 3:40 AM, Jeroen Ooms wrote: > On Tue, Dec 20, 2016 at 7:04 AM, Henrik Bengtsson > wrote: >> One reason for hitting the MAX_NUM_DLLS (= 100) limit is because some >> packages don't unload their DLLs when they are being unloaded themselves. > > I am surprised by this. Why does R not do this automatically? What is > the case for keeping the DLL loaded after the package has been > unloaded? What happens if you reload another version of the same > package from a different library after unloading? > > __ > R-devel@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-devel __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
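A minimal sketch of the failure mode Karl describes (not taken from any particular package): the finalizer is a function inside the DLL, so the pointer that R stores via R_RegisterCFinalizerEx is only valid while the DLL stays loaded.

    #include <stdlib.h>
    #include <R.h>
    #include <Rinternals.h>

    /* This function's code lives in the DLL's text segment. */
    static void my_finalizer(SEXP extptr)
    {
        void *p = R_ExternalPtrAddr(extptr);
        if (p) {
            free(p);
            R_ClearExternalPtr(extptr);
        }
    }

    SEXP make_handle(void)
    {
        void *p = malloc(64);
        SEXP extptr = PROTECT(R_MakeExternalPtr(p, R_NilValue, R_NilValue));
        /* The GC later calls my_finalizer through the stored pointer; if
           the DLL has been unloaded by then, the pointer dangles and R
           almost certainly crashes. */
        R_RegisterCFinalizerEx(extptr, my_finalizer, TRUE);
        UNPROTECT(1);
        return extptr;
    }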
Re: [Rd] Request: Increasing MAX_NUM_DLLS in Rdynload.c
It does, but you'd still be relying on the R code ensuring that all of these objects are dead prior to unloading the DLL, otherwise they'll survive the GC. Maybe if the package counted how many such objects exist, it could work out when it's safe to remove the DLL. I'm not sure that it can be done automatically. What could be done is to keep the DLL loaded, but remove it from R's table of loaded DLLs. That way, there's no risk of dangling function pointers and a new DLL of the same name could be loaded. You could still run into issues though as some DLLs assume that the associated namespace exists. Currently what I do is to never unload DLLs. If I need to replace one, then I just restart R. It's less convenient, but it's always correct. On Wed, Dec 21, 2016 at 9:10 AM, Henrik Bengtsson wrote: > On Tue, Dec 20, 2016 at 7:39 AM, Karl Millar wrote: >> It's not always clear when it's safe to remove the DLL. >> >> The main problem that I'm aware of is that native objects with >> finalizers might still exist (created by R_RegisterCFinalizer etc). >> Even if there are no live references to such objects (which would be >> hard to verify), it still wouldn't be safe to unload the DLL until a >> full garbage collection has been done. >> >> If the DLL is unloaded, then the function pointer that was registered >> now becomes a pointer into the memory where the DLL was, leading to an >> almost certain crash when such objects get garbage collected. > > Very good point. > > Does base::gc() perform such a *full* garbage collection and thereby > trigger all remaining finalizers to be called? In other words, do you > think an explicit call to base::gc() prior to cleaning out left-over > DLLs (e.g. R.utils::gcDLLs()) would be sufficient? > > /Henrik > >> >> A better approach would be to just remove the limit on the number of >> DLLs, dynamically expanding the array if/when needed. >> >> >> On Tue, Dec 20, 2016 at 3:40 AM, Jeroen Ooms >> wrote: >>> On Tue, Dec 20, 2016 at 7:04 AM, Henrik Bengtsson >>> wrote: >>>> One reason for hitting the MAX_NUM_DLLS (= 100) limit is because some >>>> packages don't unload their DLLs when they are being unloaded themselves. >>> >>> I am surprised by this. Why does R not do this automatically? What is >>> the case for keeping the DLL loaded after the package has been >>> unloaded? What happens if you reload another version of the same >>> package from a different library after unloading? >>> >>> __ >>> R-devel@r-project.org mailing list >>> https://stat.ethz.ch/mailman/listinfo/r-devel __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
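A sketch of the sequence Henrik asks about, with Karl's caveat built in: gc() runs finalizers that are already pending, but it cannot prove that no live object still holds a pointer into the DLL, so this is only safe when the package's objects are known to be dead. The helper below also assumes the DLL is named after the package and that the package's .onUnload does not already unload it.

    ## Sketch: run finalizers before removing a package's DLL.
    unload_pkg_dll <- function(pkg) {
      libpath <- find.package(pkg)
      unloadNamespace(pkg)               # detach the namespace, run .onUnload
      gc()                               # full collection; runs pending finalizers
      library.dynam.unload(pkg, libpath) # then remove the DLL itself
    }

    ## usage (hypothetical package name):
    ## unload_pkg_dll("somePkg")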
Re: [Rd] unlicense
Please don't use 'Unlimited' or 'Unlimited + ...'. Google's lawyers don't recognize 'Unlimited' as being open-source, so our policy doesn't allow us to use such packages due to lack of an acceptable license. To our lawyers, 'Unlimited + file LICENSE' means something very different than it presumably means to Uwe. Thanks, Karl On Sat, Jan 14, 2017 at 12:10 AM, Uwe Ligges wrote: > Dear all, > > from "Writing R Extensions": > > The string ‘Unlimited’, meaning that there are no restrictions on > distribution or use other than those imposed by relevant laws (including > copyright laws). > > If a package license restricts a base license (where permitted, e.g., using > GPL-3 or AGPL-3 with an attribution clause), the additional terms should be > placed in file LICENSE (or LICENCE), and the string ‘+ file LICENSE’ (or ‘+ > file LICENCE’, respectively) should be appended to the > corresponding individual license specification. > ... > Please note in particular that “Public domain” is not a valid license, since > it is not recognized in some jurisdictions." > > So perhaps you aim for > License: Unlimited > > Best, > Uwe Ligges > > > > > > On 14.01.2017 07:53, Deepayan Sarkar wrote: >> >> On Sat, Jan 14, 2017 at 5:49 AM, Duncan Murdoch >> wrote: >>> >>> On 13/01/2017 3:21 PM, Charles Geyer wrote: >>>> >>>> >>>> I would like the unlicense (http://unlicense.org/) added to R >>>> licenses. Does anyone else think that worthwhile? >>>> >>> >>> That's a question for you to answer, not to ask. Who besides you thinks >>> that it's a good license for open source software? >>> >>> If it is recognized by the OSF or FSF or some other authority as a FOSS >>> license, then CRAN would probably also recognize it. If not, then CRAN >>> doesn't have the resources to evaluate it and so is unlikely to recognize >>> it. >> >> >> Unlicense is listed in https://spdx.org/licenses/ >> >> Debian does include software "licensed" like this, and seems to think >> this is one way (not the only one) of declaring something to be >> "public domain". The first two examples I found: >> >> https://tracker.debian.org/media/packages/r/rasqal/copyright-0.9.29-1 >> >> https://tracker.debian.org/media/packages/w/wiredtiger/copyright-2.6.1%2Bds-1 >> >> This follows the format explained in >> >> https://www.debian.org/doc/packaging-manuals/copyright-format/1.0/#license-specification, >> which does not explicitly include Unlicense, but does include CC0, >> which AFAICT is meant to formally license something so that it is >> equivalent to being in the public domain. R does include CC0 as a >> shorthand (e.g., geoknife). >> >> https://www.debian.org/legal/licenses/ says that >> >> >> >> Licenses currently found in Debian main include: >> >> - ... >> - ... >> - public domain (not a license, strictly speaking) >> >> >> >> The equivalent for CRAN would probably be something like "License: >> public-domain + file LICENSE". >> >> -Deepayan >> >>> Duncan Murdoch >>> >>> >>> __ >>> R-devel@r-project.org mailing list >>> https://stat.ethz.ch/mailman/listinfo/r-devel >> >> >> __ >> R-devel@r-project.org mailing list >> https://stat.ethz.ch/mailman/listinfo/r-devel >> > > __ > R-devel@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-devel __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] unlicense
Unfortunately, our lawyers say that they can't give legal advice in this context. My question would be, what are people looking for that the MIT or 2-clause BSD licenses don't provide? They're short, clear, widely accepted and very permissive. Another possibility might be to dual-license packages with both an OSI-approved license and whatever-else-you-like, e.g. 'MIT | <other license>', but IIUC there's a bunch more complexity there than just using an OSI-approved license. Karl On Tue, Jan 17, 2017 at 3:35 PM, Uwe Ligges wrote: > > > On 18.01.2017 00:13, Karl Millar wrote: >> >> Please don't use 'Unlimited' or 'Unlimited + ...'. >> >> Google's lawyers don't recognize 'Unlimited' as being open-source, so >> our policy doesn't allow us to use such packages due to lack of an >> acceptable license. To our lawyers, 'Unlimited + file LICENSE' means >> something very different than it presumably means to Uwe. > > > > Karl, > > thanks for this comment. What we would like to hear now is a suggestion for what the > maintainer is supposed to do to get what he aims at, as we already know that > "freeware" does not work at all and it was hard enough to get to the > "Unlimited" option. > > We have many CRAN requests asking for what they should write for "freeware". > Can we get an opinion from your lawyers on which standard license comes closest > to what these maintainers probably aim at and will work more or less > globally, i.e. not only in the US? > > Best, > Uwe > > > > >> Thanks, >> >> Karl >> >> On Sat, Jan 14, 2017 at 12:10 AM, Uwe Ligges >> wrote: >>> >>> Dear all, >>> >>> from "Writing R Extensions": >>> >>> The string ‘Unlimited’, meaning that there are no restrictions on >>> distribution or use other than those imposed by relevant laws (including >>> copyright laws). >>> >>> If a package license restricts a base license (where permitted, e.g., >>> using >>> GPL-3 or AGPL-3 with an attribution clause), the additional terms should >>> be >>> placed in file LICENSE (or LICENCE), and the string ‘+ file LICENSE’ (or >>> ‘+ >>> file LICENCE’, respectively) should be appended to the >>> corresponding individual license specification. >>> ... >>> Please note in particular that “Public domain” is not a valid license, >>> since >>> it is not recognized in some jurisdictions." >>> >>> So perhaps you aim for >>> License: Unlimited >>> >>> Best, >>> Uwe Ligges >>> >>> >>> >>> >>> >>> On 14.01.2017 07:53, Deepayan Sarkar wrote: >>>> >>>> >>>> On Sat, Jan 14, 2017 at 5:49 AM, Duncan Murdoch >>>> wrote: >>>>> >>>>> >>>>> On 13/01/2017 3:21 PM, Charles Geyer wrote: >>>>>> >>>>>> >>>>>> >>>>>> I would like the unlicense (http://unlicense.org/) added to R >>>>>> licenses. Does anyone else think that worthwhile? >>>>>> >>>>> >>>>> That's a question for you to answer, not to ask. Who besides you >>>>> thinks >>>>> that it's a good license for open source software? >>>>> >>>>> If it is recognized by the OSF or FSF or some other authority as a FOSS >>>>> license, then CRAN would probably also recognize it. If not, then CRAN >>>>> doesn't have the resources to evaluate it and so is unlikely to >>>>> recognize >>>>> it. >>>> >>>> >>>> >>>> Unlicense is listed in https://spdx.org/licenses/ >>>> >>>> Debian does include software "licensed" like this, and seems to think >>>> this is one way (not the only one) of declaring something to be >>>> "public domain". 
The first two examples I found: >>>> >>>> https://tracker.debian.org/media/packages/r/rasqal/copyright-0.9.29-1 >>>> >>>> >>>> https://tracker.debian.org/media/packages/w/wiredtiger/copyright-2.6.1%2Bds-1 >>>> >>>> This follows the format explained in >>>> >>>> >>>> https://www.debian.org/doc/packaging-manuals/copyright-format/1.0/#license-specification, >>>> which does not explicitly include Unlicense, but does include CC0, >>>> which AFAICT is meant to formally license something so that it is >>>> equivalent to being in the public domain. R does include CC0 as a >>>> shorthand (e.g., geoknife). >>>> >>>> https://www.debian.org/legal/licenses/ says that >>>> >>>> >>>> >>>> Licenses currently found in Debian main include: >>>> >>>> - ... >>>> - ... >>>> - public domain (not a license, strictly speaking) >>>> >>>> >>>> >>>> The equivalent for CRAN would probably be something like "License: >>>> public-domain + file LICENSE". >>>> >>>> -Deepayan >>>> >>>>> Duncan Murdoch >>>>> >>>>> >>>>> __ >>>>> R-devel@r-project.org mailing list >>>>> https://stat.ethz.ch/mailman/listinfo/r-devel >>>> >>>> >>>> >>>> __ >>>> R-devel@r-project.org mailing list >>>> https://stat.ethz.ch/mailman/listinfo/r-devel >>>> >>> >>> __ >>> R-devel@r-project.org mailing list >>> https://stat.ethz.ch/mailman/listinfo/r-devel __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
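For concreteness, Karl's suggestion expressed as a DESCRIPTION field is the standard CRAN form. MIT is a template license on CRAN, so the license string must be accompanied by a two-line LICENSE file naming the year and copyright holder (the values below are placeholders):

    License: MIT + file LICENSE

with the LICENSE file containing:

    YEAR: 2017
    COPYRIGHT HOLDER: <package author>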
Re: [Rd] Control statements with condition of length greater than one should give error (not just warning) [PATCH]
Is there anything that actually requires R core members to manually do significant amounts of work here? IIUC, you can do a CRAN run to detect the broken packages, and a simple script can collect the emails of the affected maintainers, so you can send a single email to them all. If authors don't respond by fixing their packages, then those packages should be archived, since there's a high probability of those packages being buggy anyway. If you expect a non-trivial number of questions regarding this change from the affected package maintainers, then you can create a FAQ page for it, which you can fill in as questions arrive, so you don't get too many duplicated questions. Karl On Mon, Mar 6, 2017 at 4:51 AM, Martin Maechler wrote: > >>>>> Michael Lawrence > >>>>> on Sat, 4 Mar 2017 12:20:45 -0800 writes: > > > Is there really a need for these complications? Packages > > emitting this warning are broken by definition and should be fixed. > > I agree and probably Henrik, too. > > (Others may disagree to some extent .. and find it convenient > that R does translate 'if(x)' to 'if(x[1])' for them albeit > with a warning .. ) > > > Perhaps we could "flip the switch" in a test > > environment and see how much havoc is wreaked and whether > > authors are sufficiently responsive? > > > Michael > > As we have > 10'000 packages on CRAN alone, and people have > started (mis)using suppressWarnings(.) in many places, there > may be considerably more packages affected than we optimistically assume... > > As the R core member who would "flip the switch" I'd typically then > have to be the one sending an e-mail to all package maintainers > affected and in this case I'm very reluctant to volunteer > for that and so, I'd prefer the environment variable where R > core and others can decide how to use it .. for a while .. until > the switch is flipped for all. > > or have I overlooked an issue? > > Martin > > > On Sat, Mar 4, 2017 at 12:04 PM, Martin Maechler > > >> wrote: > > >> >>>>> Henrik Bengtsson >>>>> > >> on Fri, 3 Mar 2017 10:10:53 -0800 writes: > >> > >> > On Fri, Mar 3, 2017 at 9:55 AM, Hadley Wickham > > >> wrote: >>> But how do you propose a > >> warning-to-error transition >>> should be made without > >> wreaking havoc? Just flip the >>> switch in R-devel and > >> see CRAN and Bioconductor packages >>> break overnight? > >> Particularly Bioconductor devel might >>> become > >> non-functional (since at times it requires >>> R-devel). > >> For my own code / packages, I would be able >>> to handle > >> such a change, but I'm completely out of >>> control if > >> one of the packages I'm depending on does not >>> provide > >> a quick fix (with the only option to remove >>> package > >> tests for those dependencies). > >> >> > >> >> Generally, a package cannot be on CRAN if it has any > >> >> warnings, so I don't think this change would have any > >> >> impact on CRAN packages. Isn't this also true for >> > >> bioconductor? > >> > >> > Having a tests/warn.R file with: > >> > >> > warning("boom") > >> > >> > passes through R CMD check --as-cran unnoticed. > >> > >> Yes, indeed.. you are right Henrik that many/most R > >> warning()s would not produce R CMD check 'WARNING's .. > >> > >> I think Hadley and I fell into the same mental pit of > >> concluding that such warning()s from > >> if() ... would not currently happen > >> in CRAN / Bioc packages and hence turning them to errors > >> would not have a direct effect. 
> >> > >> With your 2nd e-mail, saying that you'd propose such an > >> option only for a few releases of R, you've indeed > >> clarified your intent to me. OTOH, I would prefer using > >> an environment variable (as you've proposed as an > >> alternative) which is turned "active" at the beginning > >> only manually or for the "CRAN incoming" checks of the > >> CRAN team (and bioconductor submission checks?) and > >> later for '--as-cran' etc until it eventually becomes the > >> unconditional behavior of R (and the env.variable is no > >> longer used). > >> > >> Martin > >> > >> __ > >> R-devel@r-project.org mailing list > >> https://stat.ethz.ch/mailman/listinfo/r-devel > >> > > __ > R-devel@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-devel __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
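For reference, the construct under discussion is plain base R: a condition of length greater than one currently warns and silently uses only the first element. A minimal illustration, together with the explicit forms that avoid the ambiguity:

    x <- c(TRUE, FALSE)
    if (x) "yes"        # warning: the condition has length > 1 and
                        #          only the first element will be used
    if (all(x)) "yes"   # explicit: proceed only if every element is TRUE
    if (any(x)) "yes"   # explicit: proceed if at least one element is TRUE

The proposal in this thread is to turn that warning into an error, with an environment variable controlling the behaviour during a transition period.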
Re: [Rd] segfault when trying to allocate a large vector
Hi Pierrick, You're storing largevec on the stack, which is probably causing a stack overflow. Allocate largevec on the heap with malloc or one of the R memory allocation routines instead and it should work fine. Karl On Thu, Dec 18, 2014 at 12:00 AM, Pierrick Bruneau wrote: > > Dear R contributors, > > I'm running into trouble when trying to allocate some large (but in > theory viable) vector in the context of C code bound to R through > .Call(). Here is some sample code summarizing the problem: > > SEXP test() { > > int size = 10000000; > double largevec[size]; > memset(largevec, 0, size*sizeof(double)); > return(R_NilValue); > > } > > If size is small enough (up to 10^6), everything is fine. When it > reaches 10^7 as above, I get a segfault. As far as I know, a double > value is represented with 8 bytes, which would make largevec above > approx. 80 MB -> this is certainly large for a single variable, but > should remain well below the limits of my machine... Also, doing a > calloc for the same vector size leads to the same outcome. > > In my package, I would use large vectors that cannot be assumed to be > sparse - so utilities for sparse matrices may not be considered. > > I run R on ubuntu 64-bit, with 8G RAM, and a 64-bit R build (3.1.2). > As my problem looks close to that seen in > http://r.789695.n4.nabble.com/allocMatrix-limits-td864864.html, > following what I have seen in ?"Memory-limits" I checked that ulimit > -v returns "unlimited". > > I guess I must be missing something, like contiguity issues, or other. Does > anyone have a clue for me? > > Thanks in advance, > Pierrick > > __ > R-devel@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-devel > __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
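A minimal sketch of the fix Karl suggests, using R's transient allocator: memory obtained from R_alloc lives on the heap and is reclaimed by R itself at the end of the .Call, so there is no stack limit to hit and no explicit free is needed. (Plain malloc/free would work equally, at the cost of manual cleanup on every exit path.)

    #include <string.h>
    #include <R.h>
    #include <Rinternals.h>

    SEXP test(void)
    {
        R_xlen_t size = 10000000;
        /* heap allocation, reclaimed automatically when .Call returns */
        double *largevec = (double *) R_alloc(size, sizeof(double));
        memset(largevec, 0, size * sizeof(double));
        return R_NilValue;
    }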
Re: [Rd] [PATCH] Makefile: add support for git svn clones
Fellipe, CXXR development has moved to github, and we haven't fixed up the build for using git yet. Could you send a pull request with your change to the repo at https://github.com/cxxr-devel/cxxr/? Also, this patch may be useful for pqR too. https://github.com/radfordneal/pqR Thanks On Mon, Jan 19, 2015 at 2:35 PM, Dirk Eddelbuettel wrote: > > On 19 January 2015 at 17:11, Duncan Murdoch wrote: > | The people who would have to maintain the patch can't test it. > > I don't understand this. > > The patch, as we may want to recall, was all of
>
> +GIT := $(shell if [ -d "$(top_builddir)/.git" ]; then \
> +echo "git"; fi)
> +
>
> and
>
> -  (cd $(srcdir); LC_ALL=C TZ=GMT svn info || $(ECHO) "Revision: -99") 2> /dev/null \
> +  (cd $(srcdir); LC_ALL=C TZ=GMT $(GIT) svn info || $(ECHO) "Revision: -99") 2> /dev/null \
>
> I believe you can test that builds work before applying the patch, and > afterwards---even when you do not have git, or in this case a git checkout. > The idiom of expanding a variable to "nothing" if not set is used all over > the R sources and can be assumed common. And if (hypothetically speaking) > the build failed when a .git directory was present? None of R Core's concern > either as git was never supported. > > I really do not understand the excitement over this. The patch is short, > clean, simple, and removes an entirely unnecessary element of friction. > > Dirk > > -- > http://dirk.eddelbuettel.com | @eddelbuettel | e...@debian.org > > __ > R-devel@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-devel > __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
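For readers unfamiliar with the idiom Dirk mentions, here is the guard in isolation: a generic Make sketch, not the actual R Makefile (note that the recipe line must be indented with a tab):

    # GIT expands to "git" inside a git checkout and to the empty string
    # otherwise, so the recipe below runs either "git svn info" or plain
    # "svn info".
    GIT := $(shell if [ -d .git ]; then echo git; fi)

    revision:
    	@$(GIT) svn info || echo "Revision: -99"

Because an unset or empty variable expands to nothing, the same command line serves both the git-svn and plain-svn cases, which is why the original two-line patch is all that was needed.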