This seems to work. There are a couple of fine points, including handling
duplicated Pct values correctly, which is easier with the reversed cumsum.
> dd2 <- dummydata[order(dummydata$Pct),]
> dd2$Cum <- rev(cumsum(rev(dd2$Totpop)))
> use <- !duplicated(dd2$Pct)
> approx(dd2$Pct[use], dd2$Cum[use], ctof,
Sorry, misstatements. It should (of course) read:
If one makes the reasonable assumption that Pct is much larger than
Cutoff, sorting Pct is the expensive part, e.g. O(n log2(n)) for
Quicksort (n = length of Pct). I believe looping is O(n^2).
etc.
On Mon, Oct 16, 2023 at 7:48 AM Bert Gunter wrote:
>
>
If one makes the reasonable assumption that Pct is much larger than
Cutoff, sorting Cutoff is the expensive part, e.g. O(n log2(n)) for
Quicksort (n = length of Cutoff). I believe looping is O(n^2). Jeff's
approach using findInterval may be faster. Of course, implementation
details matter.
-- Bert
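A rough sketch of how the sort-once-then-findInterval idea might look. This is an illustration only, not Jeff's actual code: the data frame is the toy example posted elsewhere in this thread, and the names ord, pct_sorted, cum_below, below, pop_fast and pop_slow are made up here.

```r
# Toy data from the thread (10 tracts)
dat <- data.frame(
  Tract  = 1:10,
  Pct    = c(0.05, 0.03, 0.01, 0.12, 0.21, 0.04, 0.07, 0.09, 0.06, 0.03),
  Totpop = c(4000, 3500, 4500, 4100, 3900, 4250, 5100, 4700, 4950, 4800)
)
Cutoff <- seq(0, 0.15, by = 0.01)

# Sort once by Pct; this is the O(n log n) step
ord <- order(dat$Pct)
pct_sorted <- dat$Pct[ord]

# Running population total over the sorted tracts, with a leading 0 so
# that "no tracts below the cutoff" maps to 0
cum_below <- c(0, cumsum(dat$Totpop[ord]))

# With left.open = TRUE, findInterval() returns, for each cutoff, the
# number of sorted Pct values strictly less than that cutoff
below <- findInterval(Cutoff, pct_sorted, left.open = TRUE)

# Population with Pct >= cutoff = total minus population below the cutoff
pop_fast <- sum(dat$Totpop) - cum_below[below + 1]

# Cross-check against the direct (slower) computation
pop_slow <- sapply(Cutoff, function(p) sum(dat$Totpop[dat$Pct >= p]))
stopifnot(isTRUE(all.equal(pop_fast, pop_slow)))
```

Tied Pct values need no special handling here, because findInterval only requires the sorted vector to be non-decreasing. Whether this beats the simple loop in practice would need timing on realistic data.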
On Mo
Dear Jason,
The code could look something like:
dummyData = data.frame(Tract=seq(1, 10, by=1),
Pct = c(0.05,0.03,0.01,0.12,0.21,0.04,0.07,0.09,0.06,0.03),
Totpop = c(4000,3500,4500,4100,3900,4250,5100,4700,4950,4800))
# Define the cutoffs
# - allow for duplicate entries;
by = 0.03; # by
Dear Jason,
I do not think that the solution based on aggregate offered by GPT was
correct. That quasi-solution only aggregates for every individual level.
As I understand, you want the cumulative sum. The idea was proposed by
Bert; you need only to sort first based on the cutoff (e.g. usin
After I sent this, a colleague referred me to the GPT-4 interface on Bing. I
entered the exact email query below and it provided the following solution,
which worked for the toy example and was successfully adapted to my application:
# Define the cutoffs
cutoffs <- seq(0, 0.15, by = 0.01)
# Cr
Pre-compute the per-interval answers and use findInterval to look up the
per-row answers...
dat <- read.table( text=
"Tract Pct Totpop
1 0.05 4000
2 0.03 3500
3 0.01 4500
4 0.12 4100
5 0.
Well, here's one way to do it:
(dat is your example data frame)
Cutoff <- seq(0, .15, .01)
Pop <- with(dat, sapply(Cutoff, \(p)sum(Totpop[Pct >= p])))
I think there must be a more efficient way to do it with cumsum(), though.
Cheers,
Bert
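One possible cumsum() route, sketched under assumptions: sort by Pct, take a reversed cumulative sum, drop tied Pct rows, and read the result off as a step function with approx(). The arguments method, f, yleft and yright are my own choices for this sketch and are not taken from any call posted in the thread; dat is rebuilt from the toy example so the block runs on its own.

```r
# Toy data from the thread (10 tracts)
dat <- data.frame(
  Tract  = 1:10,
  Pct    = c(0.05, 0.03, 0.01, 0.12, 0.21, 0.04, 0.07, 0.09, 0.06, 0.03),
  Totpop = c(4000, 3500, 4500, 4100, 3900, 4250, 5100, 4700, 4950, 4800)
)
Cutoff <- seq(0, 0.15, by = 0.01)

# Sort by Pct, then accumulate population from the highest Pct downward.
# At the first row of each run of tied Pct values, Cum is the population
# of all tracts with Pct >= that value.
dd <- dat[order(dat$Pct), ]
dd$Cum <- rev(cumsum(rev(dd$Totpop)))

# Keep only the first row of each tied Pct value (it holds the full total)
use <- !duplicated(dd$Pct)

# Step-function lookup: with method = "constant" and f = 1, a cutoff that
# falls between two Pct values takes the Cum of the larger one; a cutoff
# below every Pct gets the grand total, one above every Pct gets 0
Pop <- approx(dd$Pct[use], dd$Cum[use], xout = Cutoff,
              method = "constant", f = 1,
              yleft = dd$Cum[1], yright = 0)$y

# Cross-check against the sapply() version
Pop_direct <- with(dat, sapply(Cutoff, function(p) sum(Totpop[Pct >= p])))
stopifnot(isTRUE(all.equal(Pop, Pop_direct)))
```

The sort is done once, so each cutoff costs only a lookup rather than a full pass over the tracts.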
On Sat, Oct 14, 2023 at 12:53 AM Jason Stout, M.D. wrot