Try the 'data.table' package.  On the simulated data below, it takes about 0.1 seconds to normalize the values by group.

> x <- data.frame(id = sample(10000, 100000, TRUE), value = runif(100000))
> require(data.table)
Loading required package: data.table
data.table 1.8.2  For help type: help("data.table")
> system.time({
+     x <- data.table(x)
+     newX <- x[
+         , list(value = value  # keep original value
+             , normValue = value / sum(value)
+             )
+         , by = id
+         ]
+ })
   user  system elapsed
   0.03    0.01    0.11
>
> head(newX, 20)
      id     value   normValue
 1: 8094 0.6805425 0.101140797
 2: 8094 0.3154233 0.046877543
 3: 8094 0.8998646 0.133735993
 4: 8094 0.8858863 0.131658564
 5: 8094 0.1859526 0.027635892
 6: 8094 0.4694456 0.069768023
 7: 8094 0.9302886 0.138257544
 8: 8094 0.7482040 0.111196505
 9: 8094 0.9052426 0.134535255
10: 8094 0.4650028 0.069107739
11: 8094 0.2428116 0.036086145
12: 6287 0.1979209 0.037505820
13: 6287 0.5117723 0.096980353
14: 6287 0.6425769 0.121767688
15: 6287 0.0397795 0.007538177
16: 6287 0.1255722 0.023795811
17: 6287 0.5606742 0.106247214
18: 6287 0.4818579 0.091311594
19: 6287 0.3913614 0.074162596
20: 6287 0.4622984 0.087605098
>
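
If installing a package is not an option, the same per-group normalization can be sketched in base R with ave(), which applies a function within each group and returns a vector aligned with the original rows. This is only an illustrative alternative to the data.table call above, not a timed comparison; the seed and the tolerance in the check are arbitrary choices made for the example.

```r
# Base R sketch: normalize 'value' within each 'id' group using ave().
set.seed(1)  # arbitrary seed, only so the example is reproducible
x <- data.frame(id = sample(10000, 100000, TRUE), value = runif(100000))

# ave() splits x$value by x$id, applies FUN to each piece, and puts the
# results back in the original row order.
x$normValue <- ave(x$value, x$id, FUN = function(v) v / sum(v))

# Sanity check: the normalized values should sum to 1 within every group.
groupSums <- tapply(x$normValue, x$id, sum)
stopifnot(all(abs(groupSums - 1) < 1e-8))
```

Unlike the data.table result, which comes back ordered by group, this keeps the rows in their original order.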


On Thu, Nov 29, 2012 at 1:55 PM, Noah Silverman <[email protected]> wrote:
> Hi,
>
> I have a very large data set (approx. 100,000 rows).
>
> The data comes from around 10,000 "groups" with about 10 entries per group.
>
> The values are in one column, the group ID is an integer in the second column.
>
> I want to normalize the values by group:
>
> for (g in unique(group)) {
>         x[group == g] <- x[group == g] / sum(x[group == g])
> }
>
> This works fine in a loop, but is slow.  Is there a faster way to do this?
>
> Thanks!
> ______________________________________________
> [email protected] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.



-- 
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.
