I wrote :
> (some may return vectors, others may return vectors)

It's been pointed out that there was a typo, and that it wasn't very clear anyway. It should read '(some may return vectors, others may return scalars)'. I've been asked for further explanation, so here goes ...

The point I was trying to make is that the following expression is very natural to write, although it takes a bit of getting used to. A reminder of the 2 column Dataset (containing a group of 4 rows and a group of 3 rows), then the R expression, and then the output :

LEAID     ratio
 6307 0.7200000
 6307 0.7623810
 6307 0.8600000
 6307 0.9200000
 8300 0.5678462
 8300 0.7700000
 8300 0.8300000

the syntax :

    Dataset = data.table(Dataset)
    Dataset[,DT(ratio,scaled=abs(ratio-median(ratio)),sum=sum(ratio)),by="LEAID"]

and the 4 column output :

LEAID     ratio    scaled      sum
 6307 0.7200000 0.0911905 3.262381
 6307 0.7623810 0.0488095 3.262381
 6307 0.8600000 0.0488095 3.262381
 6307 0.9200000 0.1088095 3.262381
 8300 0.5678462 0.2021538 2.167846
 8300 0.7700000 0.0000000 2.167846
 8300 0.8300000 0.0600000 2.167846

The 2nd argument (the call to DT()) contains 3 expressions, which are evaluated for each subset of the Dataset grouped by LEAID. The row order is maintained within each subset, and these expressions operate on ordered vectors as usual in R. We can use column names as variable names directly (like an implicit ?with). Note that Dataset doesn't have to be ordered by LEAID; it just happens to be in this example.

A comment on each of the 3 expressions (the 3 arguments passed to DT() above) is perhaps useful :

ratio : just repeats the ratio vector as is. You don't have to include this, but I wanted to keep the input data in the output to demonstrate.

abs(ratio-median(ratio)) : median() returns a scalar, which is subtracted from each element of ratio, giving a vector. abs() takes a vector and returns a vector. Standard R and basic stuff. Any R expression can be used, so it's more powerful than SQL in that sense, because SQL is restricted to a small set of functions (avg, min, max, etc). That has been said before and has been true of R for a long time. It's the overall syntax of the single 'query' that I'm trying to demonstrate.

sum(ratio) : returns a scalar aggregate on the vector input.
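
To see the three expressions at work outside data.table, here is a rough sketch in plain R. It is only an illustration under assumptions: data.frame() stands in for DT() (it applies the same silent repetition of length-one values), split()/lapply()/rbind() stand in for the grouping done by by="LEAID", and the names one_group, per_group and result are made up for this example. It is not meant to show how data.table works internally.

```r
# The 7-row example data from above
Dataset <- data.frame(
  LEAID = c(6307, 6307, 6307, 6307, 8300, 8300, 8300),
  ratio = c(0.7200000, 0.7623810, 0.8600000, 0.9200000,
            0.5678462, 0.7700000, 0.8300000)
)

# One group on its own (LEAID 6307).
# data.frame() is an assumed stand-in for DT(): it recycles the
# length-one sum against the length-four vectors.
ratio <- Dataset$ratio[Dataset$LEAID == 6307]
one_group <- data.frame(
  ratio  = ratio,                        # vector, length 4
  scaled = abs(ratio - median(ratio)),   # vector, length 4
  sum    = sum(ratio)                    # scalar, repeated 4 times
)
print(one_group)

# The same three expressions evaluated per group, then combined.
# split()/lapply()/rbind() are an assumed stand-in for by="LEAID".
per_group <- lapply(split(Dataset, Dataset$LEAID), function(g) {
  data.frame(
    LEAID  = g$LEAID,
    ratio  = g$ratio,
    scaled = abs(g$ratio - median(g$ratio)),
    sum    = sum(g$ratio)
  )
})
result <- do.call(rbind, per_group)
rownames(result) <- NULL   # drop the "6307.1"-style row names
print(result)
```

In both one_group and result, the sum column holds a single value per group, repeated to match the length of the other columns.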

That's what I meant by "others may return scalars". Notice that the value of sum(ratio) is repeated in the final column of the output. The reason is that at least one of the other expressions returns a vector, so the standard R silent repetition rules come into play inside DT(). Then the 2 data.tables (one for each of the 2 groups) are combined and a single data.table is returned. It's very similar to SQL really, and to some other ways to aggregate in R, but more compact, more natural, easier and more convenient (and therefore quicker) to write, debug and maintain.


"Matthew Dowle" <mdo...@mdowle.plus.com> wrote in message news:hgnjev$3h...@ger.gmane.org...
> or if Dataset is a data.table :
>
>> Dataset = data.table(Dataset)
>> Dataset[,abs(ratio-median(ratio)),by="LEAID"]
>      LEAID        V1
> [1,]  6307 0.0911905
> [2,]  6307 0.0488095
> [3,]  6307 0.0488095
> [4,]  6307 0.1088095
> [5,]  8300 0.2021538
> [6,]  8300 0.0000000
> [7,]  8300 0.0600000
> rather than :
>> Dataset$abs <- with(Dataset, ave(ratio, LEAID,
>> FUN=function(x)abs(x-median(x))))
>
> This is less code and more natural (to me anyway) e.g. it doesn't require
> use of function() or ave(). data.table knows that if the j expression
> returns a vector it should silently repeat the groups to match the length
> of the j result (which it is doing here). If the j expression returns a
> scalar you would just get 2 rows in this example. Note that the 'by'
> expression must evaluation to integer, or a list of integer vectors, so
> in this case LEAID must either be integer already or coerced to integer
> using by="as.integer(LEAID)".
>
> To give the aggregate expression a name, just wrap with the DT function.
> This is also how to return multiple aggregate functions from each subset
> (some may return vectors, others may return vectors) by listing them
> inside DT() :
>
>> Dataset[,DT(ratio,scaled=abs(ratio-median(ratio)),sum=sum(ratio)),by="LEAID"]
>      LEAID     ratio    scaled      sum
> [1,]  6307 0.7200000 0.0911905 3.262381
> [2,]  6307 0.7623810 0.0488095 3.262381
> [3,]  6307 0.8600000 0.0488095 3.262381
> [4,]  6307 0.9200000 0.1088095 3.262381
> [5,]  8300 0.5678462 0.2021538 2.167846
> [6,]  8300 0.7700000 0.0000000 2.167846
> [7,]  8300 0.8300000 0.0600000 2.167846
>
>
> "William Dunlap" <wdun...@tibco.com> wrote in message
> news:77eb52c6dd32ba4d87471dcd70c8d7000243c...@na-pa-vbe03.na.tibco.com...
>> -----Original Message-----
>> From: r-help-boun...@r-project.org
>> [mailto:r-help-boun...@r-project.org] On Behalf Of L.A.
>> Sent: Saturday, December 12, 2009 12:39 PM
>> To: r-help@r-project.org
>> Subject: Re: [R] by function ??
>>
>>
>>
>> Thanks for all the help, They all worked, But I'm stuck again.
>> I've tried searching, but I not sure how to word my search as
>> nothing came
>> up.
>> Here is my new hurdle, my data has 7 abservations and my
>> results have 2
>> answers:
>>
>>
>> Here is my data
>>
>>   LEAID     ratio
>> 3  6307 0.7200000
>> 1  6307 0.7623810
>> 2  6307 0.8600000
>> 4  6307 0.9200000
>> 5  8300 0.5678462
>> 7  8300 0.7700000
>> 6  8300 0.8300000
>>
>>
>> > median<-summaryBy(ratio ~ LEAID, data = Dataset, FUN = median)
>>
>> > print(median)
>>   LEAID ratio.median
>> 1  6307    0.8111905
>> 2  8300    0.7700000
>>
>> Now what I want is a way to compute
>> abs(ratio- median)by LEAID for each observation to produce
>> something like
>> this
>>
>>   LEAID     ratio   abs
>> 3  6307 0.7200000 .0912
>> 1  6307 0.7623810 .0488
>> 2  6307 0.8600000 .0488
>> 4  6307 0.9200000 .1088
>> 5  8300 0.5678462 .2022
>> 7  8300 0.7700000 .0000
>> 6  8300 0.8300000 .0600
>
> Try ave(), as in
>
> Dataset$abs <- with(Dataset, ave(ratio, LEAID,
> FUN=function(x)abs(x-median(x))))
>
> Dataset
>   LEAID     ratio       abs
> 3  6307 0.7200000 0.0911905
> 1  6307 0.7623810 0.0488095
> 2  6307 0.8600000 0.0488095
> 4  6307 0.9200000 0.1088095
> 5  8300 0.5678462 0.2021538
> 7  8300 0.7700000 0.0000000
> 6  8300 0.8300000 0.0600000
>
> Bill Dunlap
> Spotfire, TIBCO Software
> wdunlap tibco.com
>
>>
>> Thanks,
>> L.A.
>>
>>
>>
>>
>> Ista Zahn wrote:
>> >
>> > Hi,
>> > I think you want
>> >
>> > by(TestData[ , "RATIO"], LEAID, median)
>> >
>> > -Ista
>> >
>> > On Tue, Dec 8, 2009 at 8:36 PM, L.A. <ro...@millect.com> wrote:
>> >>
>> >> I'm just learning and this is probably very simple, but I'm stuck.
>> >> I'm trying to understand the by().
>> >> This works.
>> >> by(TestData, LEAID, summary)
>> >>
>> >> But, This doesn't.
>> >>
>> >> by(TestData, LEAID, median(RATIO))
>> >>
>> >>
>> >> ERROR: could not find function "FUN"
>> >>
>> >> HELP!
>> >> Thanks,
>> >> LA
>> >> --
>> >> View this message in context:
>> >> http://n4.nabble.com/by-function-tp955789p955789.html
>> >> Sent from the R help mailing list archive at Nabble.com.
>> >>
>> >> ______________________________________________
>> >> R-help@r-project.org mailing list
>> >> https://stat.ethz.ch/mailman/listinfo/r-help
>> >> PLEASE do read the posting guide
>> >> http://www.R-project.org/posting-guide.html
>> >> and provide commented, minimal, self-contained, reproducible code.
>> >>
>> >
>> >
>> >
>> > --
>> > Ista Zahn
>> > Graduate student
>> > University of Rochester
>> > Department of Clinical and Social Psychology
>> > http://yourpsyche.org
>> >
>>
>> --
>> View this message in context:
>> http://n4.nabble.com/by-function-tp955789p962666.html
>> Sent from the R help mailing list archive at Nabble.com.
>>