Pre-compute the answer for each interval between distinct Pct values, then use findInterval() to look up the answer for each cutoff.
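A minimal sketch of the lookup on made-up numbers may help show why left.open = TRUE is the right setting. The names pct, pop, tail_sum and cutoff are illustrative only and are not part of the original data. With breakpoints c(-Inf, pct) and left-open intervals, index i means "pct[i] is the smallest value that is >= the cutoff", so tail_sum[i] is the sum of everything at or above the cutoff. Padding with a trailing 0 is an assumption added here to cover cutoffs above max(pct), which would otherwise index past the end and give NA.

```r
# Toy data (hypothetical, sorted by pct with no duplicates)
pct <- c(0.01, 0.03, 0.05)
pop <- c(10, 20, 30)

# Sum of pop from each position to the end: 60 50 30
tail_sum <- rev(cumsum(rev(pop)))

cutoff <- c(0, 0.01, 0.02, 0.03, 0.04, 0.05, 0.06)

# Intervals are (-Inf, 0.01], (0.01, 0.03], (0.03, 0.05], (0.05, Inf)
idx <- findInterval(cutoff, c(-Inf, pct), left.open = TRUE)
idx
# 1 1 2 2 3 3 4

# Trailing 0 handles a cutoff larger than every pct value
res <- c(tail_sum, 0)[idx]
res
# 60 60 50 50 30 30 0

# Direct (slower) computation for comparison
direct <- sapply(cutoff, function(p) sum(pop[pct >= p]))
stopifnot(all(res == direct))
```

Without left.open = TRUE the intervals are closed on the left, so a cutoff exactly equal to a pct value would fall into the next interval and that value would be excluded from the sum.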
dat <- read.table( text=
"Tract Pct Totpop
1 0.05 4000
2 0.03 3500
3 0.01 4500
4 0.12 4100
5 0.21 3900
6 0.04 4250
7 0.07 5100
8 0.09 4700
9 0.06 4950
10 0.03 4800
", header=TRUE )

dat2 <- aggregate( Totpop ~ Pct, dat, FUN = sum )
dat2$TotpopSum <- rev( cumsum( rev( dat2$Totpop ) ) )
Cutoff <- seq( 0, .15, .01 )
ans <- data.frame( Cutoff = Cutoff
                 , Pop = dat2$TotpopSum[ findInterval( Cutoff
                                                     , c( -Inf, dat2$Pct )
                                                     , left.open = TRUE ) ]
                 )
ans

On October 14, 2023 8:10:56 AM PDT, Bert Gunter <bgunter.4...@gmail.com> wrote:
>Well, here's one way to do it:
>(dat is your example data frame)
>
>Cutoff <- seq(0, .15, .01)
>Pop <- with(dat, sapply(Cutoff, \(p)sum(Totpop[Pct >= p])))
>
>I think there must be a more efficient way to do it with cumsum(), though.
>
>Cheers,
>Bert
>
>On Sat, Oct 14, 2023 at 12:53 AM Jason Stout, M.D. <jason.st...@duke.edu>
>wrote:
>>
>> This seems like it should be simple but I can't get it to work properly.
>> I'm starting with a data frame like this:
>>
>> Tract  Pct Totpop
>>     1 0.05   4000
>>     2 0.03   3500
>>     3 0.01   4500
>>     4 0.12   4100
>>     5 0.21   3900
>>     6 0.04   4250
>>     7 0.07   5100
>>     8 0.09   4700
>>     9 0.06   4950
>>    10 0.03   4800
>>
>> And I want to end up with a data frame with two columns, a "Cutoff" column
>> that is a simple sequence of equally spaced cutoffs (let's say in this case
>> from 0-0.15 by 0.01) and a "Pop" column which equals the sum of "Totpop" in
>> the prior data frame in which "Pct" is greater than or equal to "cutoff."
>> So in this toy example, this is what I want for a result:
>>
>>    Cutoff   Pop
>> 1    0.00 43800
>> 2    0.01 43800
>> 3    0.02 39300
>> 4    0.03 39300
>> 5    0.04 31000
>> 6    0.05 26750
>> 7    0.06 22750
>> 8    0.07 17800
>> 9    0.08 12700
>> 10   0.09 12700
>> 11   0.10  8000
>> 12   0.11  8000
>> 13   0.12  8000
>> 14   0.13  3900
>> 15   0.14  3900
>> 16   0.15  3900
>>
>> I can do this with a for loop but it seems there should be an easier,
>> vectorized way that would be more efficient.
>> Here is a reproducible example:
>>
>> dummydata<-data.frame(Tract=seq(1,10,by=1),
>>                       Pct=c(0.05,0.03,0.01,0.12,0.21,0.04,0.07,0.09,0.06,0.03),
>>                       Totpop=c(4000,3500,4500,4100,
>>                                3900,4250,5100,4700,
>>                                4950,4800))
>> dfrm<-data.frame(matrix(ncol=2,nrow=0,dimnames=list(NULL,c("Cutoff","Pop"))))
>> for (i in seq(0,0.15,by=0.01)) {
>>   temp<-sum(dummydata[dummydata$Pct>=i,"Totpop"])
>>   dfrm[nrow(dfrm)+1,]<-c(i,temp)
>> }
>>
>> Jason Stout, MD, MHS
>> Division of Infectious Diseases
>> Dept of Medicine
>> Duke University
>> Box 102359-DUMC
>> Durham, NC 27710
>> FAX 919-681-7494
>>
>> ______________________________________________
>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.

-- 
Sent from my phone. Please excuse my brevity.