Re: [R] aggregate function - na.action

2011-02-07 Thread Matthew Dowle
Hadley, That's fine; please do. I'm happy to explain it offline where the documentation or comments in the code aren't sufficient. It's GPL code so you can take it and improve it, or depend on it. Whatever works for you. As long as (of course) you don't stand on its shoulders and then restric

Re: [R] aggregate function - na.action/ performance issues re structs and algorithms

2011-02-07 Thread Mike Marchywka
> From: had...@rice.edu > Date: Mon, 7 Feb 2011 11:00:59 -0600 > To: mdo...@mdowle.plus.com > CC: r-h...@stat.math.ethz.ch > Subject: Re: [R] aggregate function - na.action > > > Does FAQ 1.8 answer that ok ? > >

Re: [R] aggregate function - na.action

2011-02-07 Thread Hadley Wickham
> Does FAQ 1.8 answer that ok ? >   "Ok, I'm starting to see what data.table is about, but why didn't you > enhance data.frame in R? Why does it have to be a new package?" >   http://datatable.r-forge.r-project.org/datatable-faq.pdf Kind of. I think there are two sets of features data.table provi

Re: [R] aggregate function - na.action

2011-02-07 Thread Matthew Dowle
Hi Hadley, Does FAQ 1.8 answer that ok ? "Ok, I'm starting to see what data.table is about, but why didn't you enhance data.frame in R? Why does it have to be a new package?" http://datatable.r-forge.r-project.org/datatable-faq.pdf Matthew "Hadley Wickham" wrote in message news:AANLkT

Re: [R] aggregate function - na.action

2011-02-07 Thread Hadley Wickham
On Mon, Feb 7, 2011 at 5:54 AM, Matthew Dowle wrote: > Looking at the timings by each stage may help : > >>   system.time(dt <- data.table(dat)) >   user  system elapsed >   1.20    0.28    1.48 >>   system.time(setkey(dt, x1, x2, x3, x4, x5, x6, x7, x8))   # sort by the >> 8 columns (one-off) >  

Re: [R] aggregate function - na.action

2011-02-07 Thread Matthew Dowle
Looking at the timings by each stage may help: > system.time(dt <- data.table(dat)) user system elapsed 1.20 0.28 1.48 > system.time(setkey(dt, x1, x2, x3, x4, x5, x6, x7, x8)) # sort by the > 8 columns (one-off) user system elapsed 4.72 0.94 5.67 > system.time(

Re: [R] aggregate function - na.action

2011-02-06 Thread David Winsemius
On Feb 6, 2011, at 7:41 PM, Hadley Wickham wrote: There's definitely something amiss with aggregate() here since similar functions from other packages can reproduce your 'control' sum. I expect ddply() will have some timing issues because of all the subgrouping in your data frame, but data

Re: [R] aggregate function - na.action

2011-02-06 Thread Hadley Wickham
> There's definitely something amiss with aggregate() here since similar > functions from other packages can reproduce your 'control' sum. I expect > ddply() will have some timing issues because of all the subgrouping in your > data frame, but data.table did very well and the summaryBy() function i

Re: [R] aggregate function - na.action

2011-02-06 Thread Dennis Murphy
Hi: There's definitely something amiss with aggregate() here since similar functions from other packages can reproduce your 'control' sum. I expect ddply() will have some timing issues because of all the subgrouping in your data frame, but data.table did very well and the summaryBy() function in t

Re: [R] aggregate function - na.action

2011-02-06 Thread jim holtman
Try the 'data.table' package. It took 3 seconds to aggregate the 500K levels: Is this what you were after? > # note the characters are converted to factors that 'data.table' likes > dat=data.frame( + x1=sample(c(NA,'m','f'), 2e6, replace=TRUE), + x2=sample(c(NA, 1:10), 2e6, replace=TRU

Re: [R] aggregate function - na.action

2011-02-06 Thread Gene Leynes
By the way, thanks for sending that formula, it's quite thoughtful of you to send an answer with an actual working line of code! When I experimented with ddply earlier last week I couldn't figure out the syntax for a single line aggregation, so it's good to have this example. I will likely use it

Re: [R] aggregate function - na.action

2011-02-06 Thread Denis Kazakiewicz
Try using formula notation with na.action=na.pass. It is all described in help(aggregate). On Sun, 06/02/2011 at 14:54 -0600, Gene Leynes wrote: > On Fri, Feb 4, 2011 at 6:54 PM, Ista Zahn wrote: > > > > > > > However, I don't think you've told us what you're actually trying to > > > accompl
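
A minimal sketch of the suggestion above, using a small made-up data frame (the names dat, g, y and z are illustrative assumptions, not the poster's data). With the formula method, the default na.action=na.omit drops every row that has an NA in any variable of the formula before FUN is called, so values of y are lost when only z is missing. Passing na.action=na.pass keeps those rows and leaves NA handling to FUN via na.rm=TRUE:

```r
# Made-up data: y has no NAs, z has NAs in rows 1 and 4.
dat <- data.frame(
  g = c("a", "a", "b", "b"),
  y = c(1, 2, 3, 4),
  z = c(NA, 1, 1, NA)
)

# Default na.action = na.omit: rows 1 and 4 are removed before summing,
# so the y totals are too small.
res_omit <- aggregate(cbind(y, z) ~ g, data = dat, FUN = sum)
sum(res_omit$y)   # 5

# na.pass keeps all rows; na.rm = TRUE is passed on to sum().
res_pass <- aggregate(cbind(y, z) ~ g, data = dat, FUN = sum,
                      na.rm = TRUE, na.action = na.pass)
sum(res_pass$y)   # 10, same as sum(dat$y)
```

Comparing sum(res_pass$y) with sum(dat$y, na.rm=TRUE) is the same kind of 'control' sum check used elsewhere in this thread.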

Re: [R] aggregate function - na.action

2011-02-06 Thread Gene Leynes
On Fri, Feb 4, 2011 at 6:54 PM, Ista Zahn wrote: > > > > However, I don't think you've told us what you're actually trying to > > accomplish... > > > I'm trying to aggregate the y value of a big data set which has several x's and a y. I'm using an abstracted example for many reasons. Partially,

Re: [R] aggregate function - na.action

2011-02-04 Thread Gene Leynes
Ista, Thank you again. I had figured that out... and was crafting another message when you replied. The NAs do come through on the variable that is being aggregated. However, they do not come through on the categorical variable(s). The aggregate function must be converting the data frame variabl
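
A minimal sketch of the behaviour described above, again on made-up data (dat, g, g2 and y are illustrative names). help(aggregate) states that rows with missing values in any of the by variables are omitted from the result, which is why na.action=na.pass alone does not bring back the NA groups. One possible workaround, offered here as an assumption rather than anything proposed in this thread, is to turn NA into an explicit factor level with base R's addNA():

```r
# Made-up data: the grouping column g has NAs, y is complete.
dat <- data.frame(
  g = c("m", "f", NA, "m", NA),
  y = c(1, 2, 3, 4, 5)
)

# Even with na.pass, rows whose grouping value is NA are left out,
# so the rows with y = 3 and y = 5 are missing from the totals.
res_plain <- aggregate(y ~ g, data = dat, FUN = sum, na.action = na.pass)
sum(res_plain$y)   # 7

# addNA() adds NA as a regular factor level, so those rows form a group.
dat$g2 <- addNA(factor(dat$g))
res_na <- aggregate(y ~ g2, data = dat, FUN = sum, na.action = na.pass)
sum(res_na$y)      # 15, same as sum(dat$y)
```

The same conversion would have to be applied to every grouping column that can contain NA.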

Re: [R] aggregate function - na.action

2011-02-04 Thread Gene Leynes
Just to be clear: This works: > set.seed(100) > dat=data.frame( + x1=sample(c(NA,'m','f'), 100, replace=TRUE), + x2=sample(c(NA, 1:10), 100, replace=TRUE), + x3=sample(c(NA,letters[1:5]), 100, replace=TRUE), + x4=sample(c(NA,T,F), 100, replace=TRUE), + y=sam

Re: [R] aggregate function - na.action

2011-02-04 Thread Ista Zahn
oops. For clarity, that should have been sum(ddply(dat, .(x1,x2,x3,x4), function(x){data.frame(y.sum=sum(x$y, na.rm=TRUE))})$y.sum) -Ista On Fri, Feb 4, 2011 at 7:52 PM, Ista Zahn wrote: > Hi again, > > On Fri, Feb 4, 2011 at 7:18 PM, Gene Leynes wrote: >> Ista, >> >> Thank you again. >> >> I

Re: [R] aggregate function - na.action

2011-02-04 Thread Ista Zahn
Hi again, On Fri, Feb 4, 2011 at 7:18 PM, Gene Leynes wrote: > Ista, > > Thank you again. > > I had figured that out... and was crafting another message when you replied. > > The NAs do come through on the variable that is being aggregated. > However, they do not come through on the categorical va

Re: [R] aggregate function - na.action

2011-02-04 Thread Ista Zahn
Hi, On Fri, Feb 4, 2011 at 6:33 PM, Gene Leynes wrote: > Thank you both for the thoughtful (and funny) replies. > > I agree with both of you that sum is the one picking up aggregate.  Although > I didn't mention it, I did realize that in the first place. > Also, thank you Phil for pointing out th

Re: [R] aggregate function - na.action

2011-02-04 Thread Gene Leynes
Thank you both for the thoughtful (and funny) replies. I agree with both of you that sum is the one picking up aggregate. Although I didn't mention it, I did realize that in the first place. Also, thank you Phil for pointing out that aggregate only accepts a formula value in more recent versions!

Re: [R] aggregate function - na.action

2011-02-04 Thread Ista Zahn
Sorry, I didn't see Phil's reply, which is better than mine anyway. -Ista On Fri, Feb 4, 2011 at 5:16 PM, Ista Zahn wrote: > Hi, > > Please see ?na.action > > (just kidding!) > > So it seems to me the problem is that you are passing na.rm to the sum > function. So there is no missing data for th

Re: [R] aggregate function - na.action

2011-02-04 Thread Ista Zahn
Hi, Please see ?na.action (just kidding!) So it seems to me the problem is that you are passing na.rm to the sum function. So there is no missing data for the na.action argument to operate on! Compare sum(aggregate(y~x1+x2+x3+x4, data=dat, sum, na.action=na.fail)$y) sum(aggregate(y~x1+x2+x3+x4

Re: [R] aggregate function - na.action

2011-02-04 Thread Phil Spector
Gene - Let me try to address your concerns one at a time: The formula interface to aggregate was introduced pretty recently (I think R-2.11.1, but I might be wrong), so when you try to use it in R-2.10.1 it won't work. Now let's take a close look at the help page for aggregate. The

[R] aggregate function - na.action

2011-02-04 Thread Gene Leynes
Can someone please tell me what is up with na.action in aggregate? My (somewhat) reproducible example: (I say somewhat because some lines wouldn't run in a separate session, more below) set.seed(100) dat=data.frame( x1=sample(c(NA,'m','f'), 100, replace=TRUE), x2=sample(c(NA, 1:10