> -----Original Message-----
> From: r-help-boun...@r-project.org
> [mailto:r-help-boun...@r-project.org] On Behalf Of Thomas Lumley
> Sent: Thursday, September 17, 2009 6:59 AM
> To: William Revelle
> Cc: r-h...@stat.math.ethz.ch
> Subject: Re: [R] Fastest Way to Divide Elements of Row With Its RowSum
>
> On Thu, 17 Sep 2009, William Revelle wrote:
>
> > At 2:40 PM +0900 9/17/09, Gundala Viswanath wrote:
> >> I have a data frame (dat). What I want to do is for each row,
> >> divide each row with the sum of its row.
> >>
> >> The number of row can be large > 1million.
> >> Is there a faster way than doing it this way?
> >>
> >> datnorm;
> >> for (rw in 1:length(dat)) {
> >>    tmp <- dat[rw,]/sum(dat[rw,])
> >>    datnorm <- rbind(datnorm, tmp);
> >> }
> >>
> >>
> >> - G.V.
> >
> >
> > datnorm <- dat/rowSums(dat)
> >
> > this will be faster if dat is a matrix rather than a data.frame.
> >
>
> Even if it's a data frame and he needs a data frame answer it
> might be faster to do
>    mat<-as.matrix(dat)
>    matnorm<-mat/rowSums(mat)
>    datnorm<-as.data.frame(matnorm)
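
For reference, here is a minimal sketch of the matrix route quoted above, run on a tiny made-up data frame. The values in dat are purely illustrative (they are not from the original post), and the sketch assumes every column is numeric and no row sums to zero.

```r
# Tiny illustrative data frame standing in for 'dat' (hypothetical values):
# 4 rows, 3 columns, all numeric
dat <- data.frame(a = c(1, 2, 3, 4),
                  b = c(1, 2, 9, 4),
                  c = c(2, 4, 12, 8))

# Convert once to a matrix, divide every row by its row sum, convert back.
# rowSums(mat) has one entry per row, so it is recycled down each column
# and element [i, j] is divided by the sum of row i.
mat     <- as.matrix(dat)
matnorm <- mat / rowSums(mat)
datnorm <- as.data.frame(matnorm)

# Each row of the result should now sum to 1
print(rowSums(datnorm))

# length() counts columns for a data frame and entries for a matrix,
# so it is not a substitute for nrow() when looping over rows
print(c(length_df = length(dat), length_mat = length(mat), rows = nrow(dat)))
```

The last line shows why a loop over 1:length(dat) does not visit each row exactly once: here the data frame has 4 rows, but length() gives 3 for the data frame and 12 for the matrix.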

If the data.frame has many more rows than columns and the number
of rows is large (e.g., dimensions 10^6 x 20), you may find that you
run out of space converting it to a matrix. You can use much less
space by looping over the columns, both to compute the row sums
and to do the division. E.g., the following should require only
1 (maybe 2) column's worth of scratch space:

   f2 <- function(x){
      stopifnot(is.data.frame(x), ncol(x)>=1)
      # accumulate the row sums one column at a time
      rowsum <- x[[1]]
      if(ncol(x)>1) for(i in 2:ncol(x))
         rowsum <- rowsum + x[[i]]
      # overwrite each column with its values divided by the row sums
      for(i in 1:ncol(x))
         x[[i]] <- x[[i]] / rowsum
      x
   }

For a 10^6 by 20 all numeric data.frame this runs in 13 seconds on my
machine but things like x/rowSums(x) run out of memory.

When working with data.frames it generally pays to think a column
at a time instead of a row at a time.

Bill Dunlap
TIBCO Software Inc - Spotfire Division
wdunlap tibco.com

>
> The other advantage, apart from speed, of doing it with
> dat/rowSums(dat) rather than the loop is he gets the right
> answer. The loop goes from 1 to the number of columns if dat
> is a data frame and 1 to the number of entries if dat is a
> matrix, not from 1 to the number of rows.
>
> -thomas
>
> Thomas Lumley                  Assoc. Professor, Biostatistics
> tlum...@u.washington.edu       University of Washington, Seattle
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>