Re: [Rd] Question re: NA, NaNs in R

2014-02-10 Thread Tim Hesterberg
aggregate package is available at http://www.timhesterberg.net/r-packages Here is the inst/doc/missingValues.txt file from that package: -- Copyright 2012 Google Inc. All Rights Reserved. Author: Tim Hesterberg Distributed under GPL 2 or later.

[Rd] Suggest adding a "testing" keyword

2013-06-13 Thread Tim Hesterberg
I suggest adding this to R_HOME/doc/KEYWORDS.db: Programming|testing: Software testing and add a corresponding entry in R_HOME/doc/KEYWORDS. __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailma

Re: [Rd] R 3.0.0 memory use

2013-04-14 Thread Tim Hesterberg
n/src/R-3-0-branch/src/main/memory.c:2478 > #3 0x7790bedf in growData () at gram.y:3391 > >and the memory allocations are from these lines in the parser gram.y > > PROTECT( bigger = allocVector( INTSXP, data_size * DATA_ROWS ) ) ; > PROTECT( biggertext

[Rd] R 3.0.0 memory use

2013-04-14 Thread Tim Hesterberg
I did some benchmarking of data frame code, and it appears that R 3.0.0 is far worse than earlier versions of R in terms of how many large objects it allocates space for, for data frame operations - creation, subscripting, subscript replacement. For a data frame with n rows, it makes either 2 or 4

Re: [Rd] Suggest adding a 'pivot' argument to qr.R

2012-09-11 Thread Tim Hesterberg
>On Sep 11, 2012, at 16:02 , Warnes, Gregory wrote: > >> >> On 9/7/12 2:42 PM, "peter dalgaard" wrote: >> >>> >>> On Sep 7, 2012, at 17:16 , Tim Hesterberg wrote: >>> >>>> I suggest adding a 'pivot' argument to qr.

[Rd] Suggest adding a 'pivot' argument to qr.R

2012-09-07 Thread Tim Hesterberg
I suggest adding a 'pivot' argument to qr.R, to obtain columns in the same order as the original x, so that a <- qr(x) qr.Q(a) %*% qr.R(a, pivot=TRUE) returns x. -- # File src/library/base/R/qr.R qr.R <- function(qr, complete = FALSE, pivot = F
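A minimal sketch of the behaviour behind this suggestion, using only existing base R functions. Without a 'pivot' argument, qr.Q(a) %*% qr.R(a) reproduces the columns of x in pivoted order, and the original order can be restored by indexing with order(a$pivot). The matrix and the variable names QR and x_back are illustrative assumptions, not taken from the original post.

```r
# A rank-deficient matrix (columns a and b are identical), chosen so that
# qr() is likely to reorder columns.
x <- cbind(a = 1, b = 1, c = 1:4)
a <- qr(x)

# The product reproduces x with its columns in pivoted order, x[, a$pivot].
QR <- qr.Q(a) %*% qr.R(a)

# Undo the pivoting by hand to recover the original column order.
x_back <- QR[, order(a$pivot), drop = FALSE]
```

The proposed qr.R(a, pivot = TRUE) would presumably fold the last step into qr.R itself, so that the product returns x directly.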

[Rd] Need to tell R CMD check that a function qr.R is not a method

2012-09-07 Thread Tim Hesterberg
When creating a package, I would like a way to tell R that a function with a period in its name is not a method. I'm writing a package now with a modified version of qr.R. R CMD check gives warnings: * checking S3 generic/method consistency ... WARNING qr: function(x, ...) qr.R: function(qr,

[Rd] suggest that as.double( something double ) not make a copy

2012-06-06 Thread Tim Hesterberg
I've been playing with passing arguments to .C(), and found that replacing as.double(x) with if(is.double(x)) x else as.double(x) saves time and avoids one copy, in the case that x is already double. I suggest modifying as.double to avoid the extra copy and just return x, when x is already
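The replacement described above can be wrapped in a small helper. This is only a sketch: the name as_double_if_needed is hypothetical, and the two forms are equivalent only when x carries no attributes, because as.double() drops attributes while the is.double() branch returns x untouched.

```r
# Hypothetical helper (name is illustrative, not from the original post):
# convert only when x is not already of type double.
as_double_if_needed <- function(x) {
  if (is.double(x)) x else as.double(x)
}

ints <- 1:3
dbls <- c(1.5, 2.5)

converted <- as_double_if_needed(ints)  # integer input is converted
unchanged <- as_double_if_needed(dbls)  # double input is returned as is
```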

[Rd] Add DUP = FALSE when tabulate() calls .C("R_tabulate"

2012-04-08 Thread Tim Hesterberg
In base/R/tabulate.R, tabulate() calls .C("R_tabulate"; I suggest adding DUP = FALSE to that call. __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] Suggested improvement for src/library/base/man/qraux.Rd

2011-11-21 Thread Tim Hesterberg
information. Tim Hesterberg -- % File src/library/base/man/qraux.Rd % Part of the R package, http://www.R-project.org % Copyright 1995-2007 R Core Development Team % Distributed under GPL 2 or later \name{QR.Auxiliaries}

Re: [Rd] speeding up perception

2011-07-04 Thread Tim Hesterberg
I've written a "dataframe" package that replaces existing methods for data frame creation and subscripting with versions that use less memory. For example, as.data.frame(a vector) makes 4 copies of the data in R 2.9.2, and 1 copy with the package. There is a small speed gain. I and others have b

Re: [Rd] median and data frames

2011-04-30 Thread Tim Hesterberg
I also favor deprecating mean.data.frame. One possible exception would be for a single-column data frame. But even here I'd say no, lest people expect the same behavior for median, var, ... Pat's suggestion of using stop() would work nicely for mean. (but omit paste - stop handles t

Re: [Rd] matrixStats: Extend to arrays too (Was: Re: Suggestion: Adding quick rowMin and rowMax functions to base package)

2011-02-16 Thread Tim Hesterberg
For consistency with rowSums colSums rowMeans etc., the names should be colMins colMaxs rowMins rowMaxs This is also consistent with S+. FYI, the rowSums naming convention was chosen to avoid conflict with rowsum (which computes column sums!). Tim Hesterberg >> A well-de
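The naming conflict mentioned here can be seen in a small example. rowSums() adds across each row, while rowsum() adds rows together within groups and so returns one set of column sums per group. The matrix and grouping vector are illustrative assumptions.

```r
m <- matrix(1:6, nrow = 3)

# rowSums: one total per row.
per_row <- rowSums(m)

# rowsum: rows are summed within each group, giving column sums per group.
per_group <- rowsum(m, group = c("a", "a", "b"))
```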

Re: [Rd] aperm() should retain class of input object

2010-12-28 Thread Tim Hesterberg
Having aperm() return an object of the same class is dangerous, there are undoubtedly classes for which that is not appropriate, producing an illegal object for that class or quietly giving incorrect results. Three alternatives are to: * add the keep.class option but with default FALSE * make aper

Re: [Rd] Using sample() to sample one value from a single value?

2010-11-04 Thread Tim Hesterberg
erg/articles/JSM04-bootknife.pdf All three are undefined for samples of size 1. You need to go to some other bootstrap, e.g. a parametric bootstrap with variability estimated from other data. Tim Hesterberg

[Rd] suggest enhancement to segments and arrows to facilitate horizontal and vertical segments

2009-10-02 Thread Tim Hesterberg
ion (x0, y0, x1 = x0, y1 = y0, col = par("fg"), lty = par("lty"), --- > function (x0, y0, x1, y1, col = par("fg"), lty = par("lty"), Arrows: < function (x0, y0, x1 = x0, y1 = y0, length = 0.25, angle = 30, code = 2, --- > function (x0, y0, x1, y1,

[Rd] Faster as.data.frame & save copy by doing names(x) <- NULL only if needed

2009-07-14 Thread Tim Hesterberg
A number of as.data.frame methods do names(x) <- NULL Replacing that with if(!is.null(names(x))) names(x) <- NULL appears to save making one copy of the data (based on tracemem and Rprofmem in a copy of R compiled with --enable-memory-profiling) and gives a modest but consistent b
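A sketch of the guarded assignment, written as a standalone helper. The name strip_names is hypothetical; the original post applies the change inside the as.data.frame methods themselves. Whether a copy is actually saved depends on the R version and on how the object is referenced, so this shows only the pattern, not the measured gain.

```r
# Hypothetical helper: drop names only when there are names to drop, so that
# an unnamed object is never touched by the replacement function.
strip_names <- function(x) {
  if (!is.null(names(x))) names(x) <- NULL
  x
}

named   <- c(a = 1, b = 2)
unnamed <- c(1, 2)

from_named   <- strip_names(named)
from_unnamed <- strip_names(unnamed)
```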

[Rd] non-duplicate names in data frames

2009-02-01 Thread Tim Hesterberg
Any data frames with more than this have dup.row.names default to 2. The name 'dup.row.names' is for consistency with S+; there the options are NULL, F or T. Tim Hesterberg

Re: [Rd] (PR#8192) [ subscripting sometimes loses names

2009-02-01 Thread Tim Hesterberg
t this doesn't go far enough; subscripting and other operations sometimes convert the automatic names to real names, and check/enforce uniqueness, which is a big waste of time when working with large data frames. I'll comment more on this in a new thread. Tim Hesterberg

Re: [Rd] ifelse

2008-08-25 Thread Tim Hesterberg
Others have commented on why this holds. There is an alternative, 'ifelse1', part of the splus2R package, that does what you'd like here. Tim Hesterberg >I find it slightly surprising, that > ifelse(TRUE, character(0), "") >returns NA instead of char
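The difference can be sketched as follows. ifelse() returns a result shaped like its test, so a length-one test cannot yield a zero-length answer; a scalar conditional simply returns whichever argument is chosen. The helper ifelse_scalar below is a hypothetical stand-in written for illustration, not the splus2R implementation of ifelse1.

```r
# Hypothetical stand-in for a scalar conditional: 'test' must be a single
# TRUE or FALSE, and the chosen argument is returned unchanged.
ifelse_scalar <- function(test, yes, no) {
  if (test) yes else no
}

vectorized <- ifelse(TRUE, character(0), "")         # length 1, NA
scalar     <- ifelse_scalar(TRUE, character(0), "")  # character(0)
```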

Re: [Rd] [.data.frame speedup

2008-07-03 Thread Tim Hesterberg
length(i) < 2 || + (is.numeric(i) && min(i, 0, na.rm=TRUE) < 0) || + (!any(is.na(i)) && all(i[-length(i)] wrote: > >>>>> "TH" == Tim Hesterberg <[EMAIL PROTECTED]> > >>

Re: [Rd] [.data.frame speedup

2008-07-01 Thread Tim Hesterberg
ict, any(x[-1] >= x[-n]), any(x[-1] > x[-n])) } else { # check for sort in increasing order ifelse1(strict, any(x[-1] <= x[-n]), any(x[-1] < x[-n])) } } On Tue, Jul 1, 2008 at 3:23 PM, Tim Hesterberg <[EMAIL PROTECTED]>

Re: [Rd] [.data.frame speedup

2008-07-01 Thread Tim Hesterberg
ng=FALSE, check for sort in increasing order # If strict=TRUE, ties correspond to not being sorted n <- length(x) if(length(n) < 2) return(FALSE) if(!is.atomic(x) || (!na.rm && any(is.na(x)))) return(NA) if(na.rm && any(ii <- is.na(x))) x <- x[!ii]

[Rd] [.data.frame speedup

2008-07-01 Thread Tim Hesterberg
Below is a version of [.data.frame that is faster for subscripting rows of large data frames; it avoids calling duplicated(rows) if there is no need to check for duplicate row names, when: i is logical attr(x, "dup.row.names") is not NULL (S+ compatibility) i is numeric and negative

Re: [Rd] (PR#11537) help (using ?) does not handle trailing whitespace

2008-05-31 Thread Tim Hesterberg
By whitespace, I mean either a space or tab (preceding the newline). I'm using ESS: ess-version's value is "5.3.6" GNU Emacs 21.4.1 (i486-pc-linux-gnu, X toolkit, Xaw3d scroll bars) of 2007-08-28 on terranova, modified by Debian I have the following in my .emacs: (load "ess-5.3.6/lisp/ess-site")

Re: [Rd] Standard method for S4 object

2008-02-25 Thread Tim Hesterberg
Hi Oleg, If there is a class to inherit from, then my point about an S4 class requiring lots of methods is moot. I think it would come down then to whether one prefers flexibility (advantage S3) or a definite structure for use with C/C++ (advantage S4). Tim >well, I am not arguing that there ar

Re: [Rd] Standard method for S4 object

2008-02-25 Thread Tim Hesterberg
>Tim Hesterberg wrote: >> It depends on what the object is to be used for. >> >> If you want users to be able to operate with the object as if it >> were a normal vector, to do things like mean(x), cos(x), etc. >> then the list would be very long indeed; for exam

Re: [Rd] Standard method for S4 object

2008-02-25 Thread Tim Hesterberg
LUS), plus additional methods defined for inheriting classes. In cases like this you might prefer using an S3 class, using attributes rather than slots for auxiliary information, so that you don't need to write so many methods. Tim Hesterberg >I am defining a new class. Shortly, I wil

[Rd] Sampling with unequal probabilities

2008-02-07 Thread Tim Hesterberg
mVector <- (1:size - runif(1))/size * observation i is selected if cprob[i-1] < uniformVector[j] <= cprob[i] for any j In the case (size*max(prob) > 1), the number of times the observation is selected is the number of j's for which the inequalities hold. * the selected observat
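The selection rule quoted above can be sketched as a small function. Because the snippet is truncated, several details are assumptions: cprob is taken to be the cumulative sum of the normalized probabilities, the function name systematic_sample is hypothetical, and the last cumulative value is forced to 1 to guard against rounding.

```r
# Sketch of systematic sampling with unequal probabilities (assumed setup).
systematic_sample <- function(prob, size) {
  prob <- prob / sum(prob)
  cprob <- cumsum(prob)
  cprob[length(cprob)] <- 1  # guard against floating-point shortfall

  # Evenly spaced points with a single random offset, as in the post.
  uniformVector <- (seq_len(size) - runif(1)) / size

  # Observation i is selected when cprob[i-1] < uniformVector[j] <= cprob[i],
  # i.e. i is the first index whose cumulative probability reaches the point.
  vapply(uniformVector, function(u) which(u <= cprob)[1L], integer(1))
}

set.seed(42)
draws  <- systematic_sample(c(0.8, 0.2), size = 5)
counts <- tabulate(draws, nbins = 2)
```

In this example size * max(prob) is 4, so the first observation is selected more than once, which matches the case described in the post where the count is the number of j's satisfying the inequalities.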

[Rd] deprecate "freq" argument to hist

2008-01-23 Thread Tim Hesterberg
ut not (yet) "freq", including mean median ppoints tabulate. Other functions like lm have always had a weights argument. -- Tim Hesterberg Disclaimer - my own opinions, not Insightful's.

Re: [Rd] S3 vs S4 for a simple package

2008-01-07 Thread Tim Hesterberg
realized that in some cases we wanted to add a "call" attribute or component/slot so that update() would work. If it had been an S3 object we could have done so, but as an S4 object we would have broken existing objects of the class. Tim Hesterberg Disclaimer - this is my personal opini

Re: [Rd] is(x, "parent") returns FALSE when class(x) is c("child", "parent") (PR#10549)

2008-01-07 Thread Tim Hesterberg
In S-PLUS, is() does catch parent S3 classes. It does not require a setOldClass definition to do so. I would prefer that R work the same way, to make porting code easier. I use is() in S-PLUS for both S3 and S4 classes because it is faster than inherits(). I use inherits() only for testing a ve

Re: [Rd] hasNA() / anyNA()?

2007-08-14 Thread Tim Hesterberg
e, and methods for data frames and other classes. The code below seems to presume a list, and would be very slow for vectors. For reasons of consistency between S-PLUS and R, I would ask that an R function be called anyMissing rather than hasNA or anyNA. Tim Hesterberg >is there a hasNA() /
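For context, a short sketch of the kind of check under discussion. Base R later gained anyNA() (in R 3.1.0), which answers the question without building the full logical vector that any(is.na(x)) creates. The example vectors are illustrative assumptions.

```r
with_na    <- c(1, NA, 3)
without_na <- c(1, 2, 3)

# Classic idiom: builds a logical vector as long as the input, then reduces it.
slow_check <- any(is.na(with_na))

# Dedicated function: returns a single TRUE/FALSE directly.
fast_check  <- anyNA(with_na)
clean_check <- anyNA(without_na)
```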

Re: [Rd] comment causes browser() to exit (PR#9063)

2006-07-07 Thread Tim Hesterberg
ing thing I've found about using R. As I anticipate using R a lot in the future, I would appreciate very much if it is changed. I spent a fair amount of time trying to see if I could change it myself, but gave up. Tim Hesterberg Andy Liaw wrote: >If I'm not mistaken, this works as

Re: [Rd] Open .ssc .S ... files in R (PR#8690)

2006-03-17 Thread Tim Hesterberg
>>day 20 >>>svn rev 36812 >>>language R >> >> I responded: >>>You can open them in R. On Windows, File:Open Script, >>>change "Files of type" to "All Files", then open the .ssc file. >> >> So there is a workaroun