Re: [R] Need help on dataframe

Mario Bourgoin Sat, 05 Jan 2013 13:42:47 -0800

Given the data:
dat1<-read.table(header=TRUE,text="
ID  V1  V2  V3  V4
1    6    5    3    2
2    3    2    2    1
3    6    5    3    2
4    12  15  3    2
5    6    8    3    2
6    3    2    4    1
7    6    5    3    3
8    12  15  3    1
9    6    5    3    3
10    3    2    7    5
11    6    5    8    2
12    12  19  3    2
13    6    5    3    2
14    3    4    2    1
15    6    5    6    2
16    12  15  5    2
17    6    5    5    2
18    3    2    8    1
19    6    5    3    9
20    12  15  3    10
21    6    5    3    2
22    3    2    2    11
23    6    5    3    4
24    12  15  9    2
25    6    5    3    2
26    3    2    2    1
27   6    5    3    2
28    12  15  3    2
29    6    8    3    2
30    3    2    4    1
31    6    5    3    3
32    12  15  3    1
33    6    5    3    3
34    3    2    7    5
35    6    5    8    2
36    12  19  3    2
37    6    5    3    2
38    3    4    2    1")


The following seems to be rather quick (0.394 ms per group of up to 12 rows,
timed on 1201 rows). It groups on ID, so it assumes ID equals the row number:
do.call(rbind,by(dat1,floor((dat1$ID-1)/12),colMeans))
    ID   V1       V2       V3       V4
0  6.5 6.75 7.333333 3.750000 2.166667
1 18.5 6.75 6.916667 4.333333 4.000000
2 30.5 6.75 7.333333 3.750000 2.166667
3 37.5 4.50 4.500000 2.500000 1.500000
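
A vectorized alternative (a sketch of a different technique, not timed against the above): base R's rowsum() sums every block of rows in one call, and dividing by the block sizes turns the sums into means. It groups by row position instead of ID, so it does not rely on ID matching the row number. The helper name block_means and its size argument are hypothetical, and it assumes every column is numeric.

```r
# Hypothetical helper (not from the thread): means of consecutive blocks of
# `size` rows, grouped by row position. Assumes all columns are numeric;
# the last block may hold fewer than `size` rows.
block_means <- function(df, size = 12) {
  m <- as.matrix(df)
  # Block index: rows 1..size -> 0, rows size+1..2*size -> 1, and so on
  g <- (seq_len(nrow(m)) - 1) %/% size
  sums <- rowsum(m, g)        # per-block column sums, rows sorted by g
  counts <- tabulate(g + 1)   # rows per block, in the same increasing order
  sums / counts               # recycles down columns: row i divided by counts[i]
}
```

For example, block_means(dat1) should give the same numbers as the by() call on dat1, with rows labelled 0 to 3.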

If you know the number of rows is a multiple of 12:
dat2<-read.table(header=TRUE,text="
ID  V1  V2  V3  V4
1    6    5    3    2
2    3    2    2    1
3    6    5    3    2
4    12  15  3    2
5    6    8    3    2
6    3    2    4    1
7    6    5    3    3
8    12  15  3    1
9    6    5    3    3
10    3    2    7    5
11    6    5    8    2
12    12  19  3    2
13    6    5    3    2
14    3    4    2    1
15    6    5    6    2
16    12  15  5    2
17    6    5    5    2
18    3    2    8    1
19    6    5    3    9
20    12  15  3    10
21    6    5    3    2
22    3    2    2    11
23    6    5    3    4
24    12  15  9    2
25    6    5    3    2
26    3    2    2    1
27   6    5    3    2
28    12  15  3    2
29    6    8    3    2
30    3    2    4    1
31    6    5    3    3
32    12  15  3    1
33    6    5    3    3
34    3    2    7    5
35    6    5    8    2
36    12  19  3    2")

This is marginally faster (0.378 ms per group of 12 rows, timed on 1212
rows, or about 4% less time):
do.call(rbind,by(dat2,rep(1:(NROW(dat2)/12),each=12),colMeans))
    ID   V1       V2       V3       V4
1  6.5 6.75 7.333333 3.750000 2.166667
2 18.5 6.75 6.916667 4.333333 4.000000
3 30.5 6.75 7.333333 3.750000 2.166667
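
If the row count is an exact multiple of 12, another option (again a sketch, not benchmarked) is to reshape the numeric matrix into a 12 x k x p array, where k is the number of blocks and p the number of columns, and let colMeans() collapse the first dimension. Because R stores a matrix column by column, a[r, g, j] is row 12 * (g - 1) + r of column j. The name block_means_exact is hypothetical.

```r
# Hypothetical helper: block means when nrow(df) is a multiple of `size`.
# Assumes all columns of `df` are numeric.
block_means_exact <- function(df, size = 12) {
  m <- as.matrix(df)
  n <- nrow(m)
  if (n %% size != 0) stop("number of rows must be a multiple of 'size'")
  k <- n %/% size
  # Column-major storage means a[r, g, j] == m[(g - 1) * size + r, j]
  a <- array(m, dim = c(size, k, ncol(m)))
  res <- colMeans(a)  # collapses the first dimension, leaving a k x p matrix
  dimnames(res) <- list(seq_len(k), colnames(m))
  res
}
```

Other statistics follow the same pattern: apply(a, c(2, 3), median) on the same array gives block medians, for instance.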

Best,
Mario


On Sat, Jan 5, 2013 at 8:33 AM, Simonas Kecorius <simolas2...@gmail.com> wrote:

> Dear R users, I ran into a problem when taking means (or other summary
> statistics) of a big data frame.
>
> Suppose we do have a dataframe:
>
>   ID   V1   V2   V3   V4  ...  V71
>    1    6    5    3    2  ...    3
>    2    3    2    2    1  ...    1
>    3    6    5    3    2  ...    3
>    4   12   15    3    2  ...  100
>  ...
>  288   10   20   30   30  ...  499
>
> I need a way to calculate the mean of every 12 rows, to get:
>
> V1                   V2           V3           V4           ...  V71
> mean of rows 1-12    same as V1   same as V1   same as V1   ...
> mean of rows 13-24   same as V1   same as V1   same as V1   ...
> etc.
>
> I can do it column by column using:
>
> y.ts <- ts(y$V1, frequency=12)
> aggregate(y.ts, FUN=mean)
>
> But this is tedious. Can anyone suggest a better way to process the whole
> data frame at once and get the result as a matrix?
>
> Thank you in advance!
>
> --
> Simonas Kecorius
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
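
The ts()/aggregate() route from the question also handles all columns at once, since the time-series method of aggregate() accepts a multivariate series and applies FUN to each column. A minimal sketch, assuming y is an all-numeric data frame; block_means_ts is a hypothetical name, and as far as I know aggregate.ts drops an incomplete final block rather than averaging it, so check that if the row count is not a multiple of 12.

```r
# Hypothetical helper: treat each column of `y` as a series with
# frequency `size`, then average each run of `size` consecutive rows.
# Assumes all columns of `y` are numeric.
block_means_ts <- function(y, size = 12) {
  y.ts <- ts(as.matrix(y), frequency = size)
  # nfrequency = 1 requests one output value per `size` input rows
  res <- aggregate(y.ts, nfrequency = 1, FUN = mean)
  # Drop the time-series attributes and return a plain matrix
  matrix(res, nrow = NROW(res), dimnames = list(NULL, colnames(y)))
}
```

With the poster's data this would be block_means_ts(y), replacing the per-column loop with a single call.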
