You could use S+. Its median function has a weights argument. E.g.,

   > median(c(1,2,3,4e4), weights=c(1e8,1e8,1,2e8))
   [1] 3
   > median(c(1,2,3,4e4), weights=c(1e8,1e8,1,2e8+10))
   [1] 40000
   > median(c(1,2,3,4e4), weights=c(1e8,1e8,1,2e8+1))
   [1] 20001.5
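If you need to stay in R, the same result can be had without rep() by sorting the values and walking the cumulative counts. What follows is only a minimal sketch, assuming the counts are non-negative frequencies; freq_median is a made-up name, not a base R function, and this is not how S+ implements its weights argument.

```r
# Hypothetical helper (not part of base R): median of the values in x,
# where counts[i] says how many times x[i] occurs. Intended to match
# median(rep(x, counts)) without building the expanded vector.
freq_median <- function(x, counts) {
  keep <- counts > 0
  x <- x[keep]
  counts <- counts[keep]
  if (length(x) == 0) return(NA_real_)

  ord <- order(x)
  x <- x[ord]
  # as.numeric() avoids integer overflow when the counts are large
  cum <- cumsum(as.numeric(counts[ord]))
  n <- cum[length(cum)]

  # Ranks of the lower and upper middle observations (equal when n is odd)
  lo <- x[which(cum >= floor((n + 1) / 2))[1]]
  hi <- x[which(cum >= ceiling((n + 1) / 2))[1]]
  (lo + hi) / 2
}

# Same inputs as the S+ calls above
freq_median(c(1, 2, 3, 4e4), c(1e8, 1e8, 1, 2e8))
freq_median(c(1, 2, 3, 4e4), c(1e8, 1e8, 1, 2e8 + 10))
freq_median(c(1, 2, 3, 4e4), c(1e8, 1e8, 1, 2e8 + 1))

# Per-group medians on a toy data frame (column names are illustrative)
toy <- data.frame(
  SIC    = c("A", "A", "A", "B", "B"),
  Paydex = c(10, 20, 30, 40, 50),
  Counts = c(1, 1, 3, 2, 2)
)
group_medians <- sapply(split(toy, toy$SIC),
                        function(d) freq_median(d$Paydex, d$Counts))
group_medians
```

The work per group is a sort plus a cumulative sum over the distinct rows, so memory use depends on the number of rows rather than on the sum of Counts. For two grouping variables, something like split(toy, list(toy$SIC, toy$Category), drop = TRUE) should work the same way, with Category standing in for your second grouping column.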

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com

> -----Original Message-----
> From: r-help-boun...@r-project.org
> [mailto:r-help-boun...@r-project.org] On Behalf Of Satsangi,
> Vivek (GE Capital)
> Sent: Wednesday, November 18, 2009 1:55 PM
> To: r-help@r-project.org
> Subject: [R] Median on Aggregated data
>
> Folks,
>
> I have the following code, that works fine on smaller data sets. For
> larger datasets, it runs out of memory and runs way too slow because we
> are essentially creating large vectors with rep() and then calling
> median() on it. (I learned this approach from a post on the web).
>
> Below that, I have written the corresponding SAS code. The SAS code
> works fast because I can just tell the proc summary (by the weights
> option) that the Counts variable is a frequency.
>
> So, the question is, is there a simple way to do the same thing in R? I
> have to run this on a large dataset -- for a small set it is not a
> problem.
>
>
> ---------------------- Begin R code ------------------------------------
> N <- 1005 * 14;
> myNorm <- data.frame(PaydexNormingCategory = numeric(N),
>                      SIC = numeric(N), CatMedian = numeric(N));
>
> k=1;
> #j = 7941; ## For testing only
> for (j in levels(SIC)){
>   for (i in levels(PaydexNormingCategory)){
>     myData <- dfpaydex[(Paydex==i) & (SIC==j),];
>     myMedian <- with(myData,
>                      levels(Paydex)[median(rep(as.numeric(Paydex),
>                                                Counts))]);
>     myNorm[k] <-c( as.numeric(i), as.numeric(j), as.numeric(myMedian) );
>     k <- k+1;
>   }
> }
>
> ---------------------- Begin SAS code ------------------------------------
>
> proc summary data=SASUser.PaydexNormfull nway;
>
>   class PaydexNormingCategory SIC ;
>   weight Counts;
>   var Paydex;
>
>   output out=outstat (drop=_type_ _freq_)
>          median= / autoname;
> run;
>
> ---------------------- End SAS code ------------------------------------
>
> Thanks for your guidance!
>
>
> Vivek Satsangi
> GE Capital
> Americas
>
> GE imagination at work
>
>
> [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.