On 29-11-2012, at 19:55, Noah Silverman wrote:
> Hi,
>
> I have a very large data set (approx. 100,000 rows.)
>
> The data comes from around 10,000 "groups" with about 10 entries per group.
>
> The values are in one column, the group ID is an integer in the second column.
>
> I want to normalize each value by dividing it by the sum of its group.
Close, but not quite what I need.
That very nicely gives me sums by group.
I need to take each value of X and divide it by the sum of the group it belongs
to.
With your example, I have 100,000 X values but only 10,000 groups. The "by" command
gives me 10,000 sums. I still have to loop over all 100,000 values to divide each one by its group's sum.
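Concretely, the operation is an elementwise division by group sums expanded back to row level. A toy sketch (data made up here; the real set has 100,000 rows):

```r
x     <- c(2, 4, 1, 3)   # values (toy data)
group <- c(1, 1, 2, 2)   # group IDs, one per value
sums  <- tapply(x, group, sum)               # one sum per group: 6 and 4
normalized <- x / sums[as.character(group)]  # expand sums to row level, then divide
# normalized is c(2/6, 4/6, 1/4, 3/4)
```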
try the 'data.table' package. Takes about 0.1 seconds to normalize the data.
> x <- data.frame(id = sample(1:10000, 100000, replace = TRUE), value = runif(100000))
> require(data.table)
Loading required package: data.table
data.table 1.8.2 For help type: help("data.table")
> system.time({
+   x <- data.table(x)
+   x[, norm := value/sum(value), by = id]
+ })
Hello,
If you want one result per group use tapply(); if you want one value per
element of x use ave().
tapply(x, group, FUN = function(.x) .x/sum(.x))
ave(x, group, FUN = function(.x) .x/sum(.x))
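To illustrate the difference between the two (toy data made up for this sketch):

```r
x <- c(2, 4, 1, 3)
group <- c("a", "a", "b", "b")

by.group <- tapply(x, group, FUN = function(.x) .x/sum(.x))
# by.group has one entry per group, each a normalized vector:
#   "a" -> c(2/6, 4/6),  "b" -> c(1/4, 3/4)

by.row <- ave(x, group, FUN = function(.x) .x/sum(.x))
# by.row has the same length and order as x: c(2/6, 4/6, 1/4, 3/4)
```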
Hope this helps,
Rui Barradas
On 29-11-2012 18:55, Noah Silverman wrote:
Hi,
I have a very large data set (approx. 100,000 rows.)
Yes, type in:
?by
for example:
data <- data.frame(fac=factor(c("A","A","B","B")), vec=c(1:4) )
by(data$vec,data$fac, FUN=sum)
Best,
Mikołaj Hnatiuk
2012/11/29 Noah Silverman
> Hi,
>
> I have a very large data set (approx. 100,000 rows.)
>
> The data comes from around 10,000 "groups" with about 10 entries per group.
Not tested but should work:
sums = tapply(x, group, sum)                  # one sum per group, named by group ID
sums.ext = sums[ match(group, names(sums))]   # expand to one sum per element of x
normalized = x/sums.ext
It may be that the tapply is just as slow as your loop though, I'm not sure.
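For what it's worth, a tiny check of those three lines on made-up data:

```r
x <- c(2, 4, 1, 3)
group <- c(10, 10, 20, 20)
sums = tapply(x, group, sum)                 # named sums: "10" -> 6, "20" -> 4
sums.ext = sums[ match(group, names(sums))]  # c(6, 6, 4, 4), one per element of x
normalized = x/sums.ext
# normalized is c(2/6, 4/6, 1/4, 3/4), with no explicit loop
```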
HTH,
Peter
On Thu, Nov 29, 2012 at 10:55 AM, Noah Silverman wrote:
> Hi,
>
> I have a very large data set (approx. 100,000 rows.)