[Rd] Recent and upcoming changes to R-devel
There was an R-core meeting the week before last, and various planned changes will appear in R-devel over the next few weeks.

These are changes planned for R 2.14.0, scheduled for Oct 31. As we are sick of people referring to R-devel as '2.14' or '2.14.0', that version number will not be used until we reach 2.14.0 alpha. You will be able to have a package depend on an svn version number when referring to R-devel, rather than using R (>= 2.14.0).

All packages are installed with lazy-loading (there were 72 CRAN packages and 8 BioC packages which opted out). This means that the code is always parsed at install time, which inter alia simplifies the descriptions. R 2.13.1 RC warns on installation about packages which ask not to be lazy-loaded, and R-devel ignores such requests (with a warning).

In the near future all packages will have a name space. If the sources do not contain one, a default NAMESPACE file will be added. This again will simplify the descriptions and also a lot of internal code. Maintainers of packages without name spaces (currently 42% of CRAN) are encouraged to add one themselves.

R-devel is installed with the base and recommended packages byte-compiled (the equivalent of 'make bytecode' in R 2.13.x, but done less inefficiently). There is a new option, R CMD INSTALL --byte-compile, to byte-compile contributed packages, but that remains optional. Byte-compilation is quite expensive (so you definitely want to do it at install time, which requires lazy-loading), and relatively few packages benefit appreciably from byte-compilation. A larger number of packages benefit from byte-compilation of R itself: for example, AER runs its checks 10% faster. The byte-compiler technology is thanks to Luke Tierney.

There is support for figures in Rd files: currently with a first-pass implementation (thanks to Duncan Murdoch).

--
Brian D. Ripley, rip...@stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK  Fax: +44 1865 272595

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
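For maintainers adding a name space by hand, a minimal NAMESPACE file can consist of a single export directive. This is a sketch: whether the auto-generated default uses exactly this pattern is an assumption on my part, but it reproduces the visibility rules of a package that has no name space at all.

```r
# NAMESPACE (sketch): export every object whose name does not begin with
# a dot, mimicking the behaviour of a package without a name space.
exportPattern("^[^\\.]")
```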
Re: [Rd] Recent and upcoming changes to R-devel
On 07/04/2011 05:08 AM, Prof Brian Ripley wrote:
> [...]
> R-devel is installed with the base and recommended packages
> byte-compiled (the equivalent of 'make bytecode' in R 2.13.x, but done
> less inefficiently). There is a new option R CMD INSTALL --byte-compile
> to byte-compile contributed packages, but that remains optional.

Anticipating the future, contributed package byte-compilation will have large effects on CRAN and especially Bioconductor build systems. For instance, a moderate-sized package like Biobase built without vignettes installs in about 19s with byte compilation, 9s with, while a more complicated package IRanges is 1m25s, vs. 29s.

For Bioconductor this will certainly require new hardware across all supported platforms, and almost certainly significant effort to improve build system efficiencies.

Martin

> Byte-compilation is quite expensive (so you definitely want to do it at
> install time, which requires lazy-loading), and relatively few packages
> benefit appreciably from byte-compilation. A larger number of packages
> benefit from byte-compilation of R itself: for example AER runs its
> checks 10% faster. The byte-compiler technology is thanks to Luke
> Tierney.
>
> There is support for figures in Rd files: currently with a first-pass
> implementation (thanks to Duncan Murdoch).

--
Computational Biology
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109
Location: M1-B861
Telephone: 206 667-2793
Re: [Rd] Recent and upcoming changes to R-devel
On Mon, 4 Jul 2011, Martin Morgan wrote:
> On 07/04/2011 05:08 AM, Prof Brian Ripley wrote:
>> [...]
>> There is a new option R CMD INSTALL --byte-compile to byte-compile
>> contributed packages, but that remains optional.
>
> Anticipating the future, contributed package byte-compilation will have
> large effects on CRAN and especially Bioconductor build systems. For
> instance, a moderate-sized package like Biobase built without vignettes
> installs in about 19s with byte compilation, 9s with, while a more
> complicated package IRanges is 1m25s, vs. 29s.

I presume the first is 'with' the second 'without'. Yes, as I did say 'byte compilation is quite expensive', and it is not clear if it will ever become the default for contributed packages.

> For Bioconductor this will certainly require new hardware across all
> supported platforms, and almost certainly significant effort to improve
> build system efficiencies.
>
> Martin

--
Brian D. Ripley, rip...@stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK  Fax: +44 1865 272595
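As a quick way to gauge whether a given function benefits from byte-compilation, one can compare interpreted and compiled versions directly with the compiler package. This is a sketch, not from the thread: the toy slow_sum function and the size 1e6 are illustrative, and the speedup observed will vary by R version and workload.

```r
library(compiler)

# A deliberately interpreter-bound toy function: a scalar loop over a vector.
slow_sum <- function(x) {
  s <- 0
  for (xi in x) s <- s + xi
  s
}

fast_sum <- cmpfun(slow_sum)  # byte-compiled copy of the same closure

x <- runif(1e6)
system.time(slow_sum(x))  # interpreted
system.time(fast_sum(x))  # byte-compiled; loop-heavy code tends to gain most
```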
Re: [Rd] speeding up perception
Hi --

It's my first post on this list; as a relatively new user with little knowledge of R internals, I am a bit intimidated by the depth of some of the discussions here, so please spare me if I say something incredibly silly.

I feel that someone at this point should mention Matthew Dowle's excellent data.table package (http://cran.r-project.org/web/packages/data.table/index.html), which seems to me to address many of the inefficiencies of data.frame. data.tables have no row names; and operations that only need data from one or two columns are (I believe) just as quick whether the total number of columns is 5 or 1000. This results in very quick operations (and, often, elegant code as well).

Regards
Timothee

On Mon, Jul 4, 2011 at 6:19 AM, ivo welch wrote:
> thank you, simon. this was very interesting indeed. I also now
> understand how far out of my depth I am here.
>
> fortunately, as an end user, obviously, *I* now know how to avoid the
> problem. I particularly like the as.list() transformation and back to
> as.data.frame() to speed things up without loss of (much)
> functionality.
>
> more broadly, I view the avoidance of individual access through the
> use of apply and vector operations as a mixed "IQ test" and "knowledge
> test" (which I often fail). However, even for the most clever, there
> are also situations where the KISS programming principle makes
> explicit loops still preferable. Personally, I would have preferred
> it if R had, in its standard "statistical data set" data structure,
> foregone the row names feature in exchange for retaining fast direct
> access. R could have reserved its current implementation "with row
> names but slow access" for a less common (possibly pseudo-inheriting)
> data structure.
>
> If end users commonly do iterations over a data frame, which I would
> guess to be the case, then the impression of R by (novice) end users
> could be greatly enhanced if the extreme penalties could be eliminated
> or at least flagged. For example, I wonder if modest special internal
> code could store data frames internally and transparently as lists of
> vectors UNTIL a row name is assigned to. Easier and uglier, a simple
> but specific warning message could be issued with a suggestion if
> there is an individual read/write into a data frame ("Warning: data
> frames are much slower than lists of vectors for individual element
> access").
>
> I would also suggest changing the "Introduction to R" 6.3 from "A
> data frame may for many purposes be regarded as a matrix with columns
> possibly of differing modes and attributes. It may be displayed in
> matrix form, and its rows and columns extracted using matrix indexing
> conventions." to "A data frame may for many purposes be regarded as a
> matrix with columns possibly of differing modes and attributes. It may
> be displayed in matrix form, and its rows and columns extracted using
> matrix indexing conventions. However, data frames can be much slower
> than matrices or even lists of vectors (which, like data frames, can
> contain different types of columns) when individual elements need to
> be accessed." Reading about it immediately upon introduction could
> flag the problem in a more visible manner.
>
> regards,
>
> /iaw
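The penalty ivo describes is easy to reproduce: element-wise assignment into a data frame is far slower than the same loop over a plain list of vectors. A sketch (the size n is illustrative; absolute timings depend on the machine and R version):

```r
n <- 1e4
df  <- data.frame(x = numeric(n))
lst <- list(x = numeric(n))

# Element-wise writes into a data frame dispatch to `[<-.data.frame`,
# which re-validates (and may copy) the whole object on every iteration.
system.time(for (i in seq_len(n)) df[i, "x"] <- i)

# The same writes into a list of vectors use fast internal assignment.
system.time(for (i in seq_len(n)) lst$x[i] <- i)
```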
Re: [Rd] speeding up perception
Timothée,

On Jul 4, 2011, at 2:47 AM, Timothée Carayol wrote:
> Hi --
>
> It's my first post on this list; as a relatively new user with little
> knowledge of R internals, I am a bit intimidated by the depth of some
> of the discussions here, so please spare me if I say something
> incredibly silly.
>
> I feel that someone at this point should mention Matthew Dowle's
> excellent data.table package
> (http://cran.r-project.org/web/packages/data.table/index.html) which
> seems to me to address many of the inefficiencies of data.frame.
> data.tables have no row names; and operations that only need data from
> one or two columns are (I believe) just as quick whether the total
> number of columns is 5 or 1000. This results in very quick operations
> (and, often, elegant code as well).

I agree that data.table is a very good alternative (for other reasons) that should be promoted more. The only slight snag is that it doesn't help with the issue at hand, since it simply does a pass-through for subassignments to data frame's methods and thus suffers from the same problems. In fact there is a rather stark asymmetry in how it handles subsetting vs subassignment, which is a bit surprising (if I read the code correctly you can't use the same indexing in both). I would propose that it should not do that, but handle the simple cases itself more efficiently without unneeded copies. That would make it indeed a very interesting alternative.

Cheers,
Simon

> On Mon, Jul 4, 2011 at 6:19 AM, ivo welch wrote:
>> [...]
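For readers unfamiliar with the package under discussion, the basic data.table idiom looks roughly like this. A sketch against the data.table API of the time; details of the syntax have evolved across versions, and the data values are invented.

```r
library(data.table)

dt <- data.table(id = 1:5, x = rnorm(5), grp = c("a", "b", "a", "b", "a"))

dt[grp == "a"]           # subset rows by expression, no row names involved
dt[, mean(x), by = grp]  # grouped computation touching only one column
```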
Re: [Rd] speeding up perception
I've written a "dataframe" package that replaces existing methods for data frame creation and subscripting with versions that use less memory. For example, as.data.frame(a vector) makes 4 copies of the data in R 2.9.2, and 1 copy with the package. There is a small speed gain.

I and others have been using it at Google for some years, and it is time to either put it on CRAN, or move it into R. R core folks - would you prefer that this be released to CRAN, or would you like to consider merging it directly into R?

I took existing functions, and did some hacks to reduce the number of times R copies objects. Some of it is ugly. This could be done more efficiently, and with cleaner code, with some changes or hooks in R internal code, but I'm not prepared to do that.

I often use lists instead of data frames. In another package I have a 'subscriptRows' function that subscripts a list as if it were a data frame. I could merge that into the dataframe package.

# Memory use - number of copies made
#                         R 2.9.2    library(dataframe)
# as.data.frame(y)        4          1
# data.frame(y)           8          3
# data.frame(y, z)        8          3
# as.data.frame(l)        10         3
# data.frame(l)           15         5
# d$z <- z                3,2        1,1
# d[["z"]] <- z           4,3        2,1
# d[, "z"] <- z           6,4,2      2,2,1
# d["z"] <- z             6,5,2      2,2,1
# d["z"] <- list(z=z)     6,3,2      2,2,1
# d["z"] <- Z  # list(z=z) 6,2,2     2,1,1
# a <- d["y"]             2          1
# a <- d[, "y", drop=F]   2          1
#
# y and z are vectors, Z and l are lists, and d a data frame.
# Where two numbers are given, they refer to:
#   (copies of the old data frame), (copies of the new column)
# A third number refers to numbers of
#   (copies made of an integer vector of row names)

# --- seconds (multiple repetitions) ---
#                       creation/column subscripting    row subscripting
# R 2.9.2:              34.2 43.9 43.3                  10.6 13.0
# library(dataframe):   22.5 21.8 21.8                  9.7 9.5 9.5

I reported one of the simpler hacks to this list earlier, and it was included in some version of R after 2.9.2, so the current version of R isn't as bad as 2.9.2.
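Copy counts like those above can be observed directly with tracemem(), which prints a message each time the traced object is duplicated. A sketch; the number of duplications reported depends heavily on the R version, and tracemem() requires a build with memory profiling enabled (the default on the CRAN binaries).

```r
y <- runif(10)
tracemem(y)            # start reporting duplications of y

d <- as.data.frame(y)  # each "tracemem[...]" line printed here is one copy

untracemem(y)          # stop tracing
```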
Re: [Rd] Recent and upcoming changes to R-devel
I may have misunderstood, but: please could we have an optional installation that does *not* byte-compile base and recommended?

Reason: it's not possible to debug byte-compiled code -- at least not with the 'debug' package, which is quite widely used. I quite often end up using 'mtrace' on functions in base/recommended packages to figure out what they are doing. And sometimes I (and others) experiment with changing functions in base/recommended to improve functionality. That seems to be harder with BC versions, and might even be impossible, as best I can tell from hints in the documentation of 'compile'.

Personally, if I had to choose only one, I'd rather live with the speed penalty from not byte-compiling. But of course, if both are available, I could install both.

Thanks
Mark

--
Mark Bravington
CSIRO Mathematical & Information Sciences
Marine Laboratory
Castray Esplanade
Hobart 7001 TAS
ph (+61) 3 6232 5118
fax (+61) 3 6232 5012
mob (+61) 438 315 623

Prof Brian Ripley wrote:
> [...]
> R-devel is installed with the base and recommended packages
> byte-compiled (the equivalent of 'make bytecode' in R 2.13.x, but
> done less inefficiently). There is a new option R CMD INSTALL
> --byte-compile to byte-compile contributed packages, but that remains
> optional.
>
> Byte-compilation is quite expensive (so you definitely want to do it
> at install time, which requires lazy-loading), and relatively few
> packages benefit appreciably from byte-compilation. A larger number
> of packages benefit from byte-compilation of R itself: for example
> AER runs its checks 10% faster. The byte-compiler technology is
> thanks to Luke Tierney.
>
> There is support for figures in Rd files: currently with a first-pass
> implementation (thanks to Duncan Murdoch).
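A quick way to see whether a particular function has been byte-compiled is simply to print it: compiled closures carry a <bytecode: ...> line in their printed representation. A sketch; the exact output format varies by R version.

```r
library(compiler)

f  <- function(x) x + 1
fc <- cmpfun(f)  # byte-compiled copy of f

f   # plain closure: body only, no bytecode reference
fc  # prints the body followed by a line like <bytecode: 0x...>
```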
[Rd] Circumventing code/documentation mismatches ('R CMD check')
Hello,

As prompted by B. Ripley (see below), I am transferring this over from R-User ...

For a package I am writing a function that looks like

test <- function(Argument1=NA){
  # Prerequisite testing
  if(!(is.na(Argument1))){
    if(!(is.character(Argument1))){
      stop("Wrong class.")
    }
  }
  # Function Body
  cat("Hello World\n")
}

Documentation of this is straightforward:

...
\usage{test(Argument1=NA)}
...

However, writing the function could be made more concise like so:

test2 <- function(Argument1=NA_character_){
  # Prerequisite testing
  if(!(is.character(Argument1))){
    stop("Wrong class.")
  }
  # Function Body
  cat("Hello World\n")
}

To prevent confusion I do not want to use 'NA_character_' in the user-exposed documentation, and using

...
\usage{test2(Argument1=NA)}
...

leads to a warning regarding a code/documentation mismatch. Is there any way to prevent that?

Sincerely,
Joh

Prof Brian Ripley wrote:
> On Mon, 4 Jul 2011, Johannes Graumann wrote:
>
>> Hello,
>>
>> I'm writing a package and am running 'R CMD check' on it.
>>
>> Is there any way to make 'R CMD check' not warn about a mismatch
>> between 'NA_character_' (in the function definition) and 'NA' (in the
>> documentation)?
>
> Be consistent. Why do you want incorrect documentation of your
> package? (It is not clear of the circumstances here: normally 1 vs 1L
> and similar are not reported if they are the only errors.)
>
> And please do note the posting guide
>
> - this is not really the correct list
> - you were asked to give an actual example with output.
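One way to keep code and documentation consistent without exposing NA_character_ is to keep the plain NA default and coerce inside the function body. This is a sketch of the idea, not a reply from the thread; test3 is a hypothetical name.

```r
# Default stays NA, so \usage{test3(Argument1 = NA)} matches the code exactly.
test3 <- function(Argument1 = NA) {
  # Prerequisite testing: allow the all-NA default, reject other non-character input
  if (!is.character(Argument1) && !all(is.na(Argument1))) {
    stop("Wrong class.")
  }
  Argument1 <- as.character(Argument1)  # logical NA default -> NA_character_
  # Function Body
  cat("Hello World\n")
}
```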