On 1/17/21 12:15 PM, Bernard McGarvey wrote:
I have a data frame that consists of several factor columns say A, B, C, D, and 
E and several columns containing numerical data, say X1, X2, .... X10. I would 
like to create statistics of some of the numerical columns by some of the 
factor columns. For example,

Calculate the mean, min, and max of variables X1 and X7, by factors A, and E. 
The results should look like the table below:

Factor A Factor E     mean(X1) min(x1) max(X1) mean(X7) min(x7) max(X7) 
mean(X10) min(x10) max(X10)
A1        E1
A1        E2
A1        E3
A2        E1
A2        E2
A2        E3

I would like the results to be returned to a data frame or other object that I 
can write out using the write.csv function. I have looked at the summarize and 
numSummary functions but they do not appear to be flexible enough to do the 
above.


The `aggregate` function will do the subsetting and function application.

> dfrm <- cbind(dfrm, matrix(rnorm(600), ncol=10 ) ); names(dfrm)[3:12] <- paste0("X", 1:10)
> str(dfrm)
'data.frame':    60 obs. of  12 variables:
 $ Factor_A: Factor w/ 2 levels "A1","A2": 1 1 1 2 2 2 1 1 1 2 ...
 $ Factor_B: Factor w/ 3 levels "E1","E2","E3": 1 2 3 1 2 3 1 2 3 1 ...
 $ X1      : num  -0.02116 -0.00049 0.12875 -0.05412 0.51886 ...
 $ X2      : num  1.6799 -0.0963 -0.5727 -0.3638 -0.322 ...
 $ X3      : num  -0.349 0.267 -0.666 -0.329 0.902 ...
 $ X4      : num  0.1125 -0.5384 0.0924 0.6849 -0.4194 ...
 $ X5      : num  -0.421 0.372 1.316 1.323 -0.03 ...
 $ X6      : num  -0.0767 1.4972 0.1967 -0.7092 -1.0943 ...
 $ X7      : num  0.1771 -0.2136 -1.0818 -0.0671 2.0015 ...
 $ X8      : num  1.456 -0.383 -0.47 0.965 0.569 ...
 $ X9      : num  -1.795 -0.4546 0.0069 1.2245 -0.395 ...
 $ X10     : num  -1.931 1.708 0.274 0.73 -0.995 ...



 aggregate(  dfrm[ ,  c("X1", "X7", "X10")],    # columns to analyze

                      dfrm[ c("Factor_A", "Factor_B")],  # classifying columns

                      FUN=function (x) c(mn =mean(x), min=min(x), max=max(x) ) )  # desired "summarizers"

#--- result----

  Factor_A Factor_B        X1.mn       X1.min       X1.max X7.mn      X7.min      X7.max 1       A1       E1  0.187513792 -0.866094155  2.310960164 0.22489729 -0.91442493  1.94095786 2       A2       E1  0.078361707 -1.515410191  1.382420050 -0.51309155 -1.67026123  0.70869034 3       A1       E2 -0.267416858 -1.995131138  1.392115793 -0.04772929 -2.45426692  2.02225946 4       A2       E2 -0.069807208 -0.703073589  1.879448658 -0.37770923 -2.66221239  2.00152154 5       A1       E3 -0.007800886 -1.297561250  1.216627848 -0.30395411 -1.08181218  1.09764895 6       A2       E3 -0.054466856 -1.577891927  1.674719118 0.35594015 -1.20865279  2.25765422
      X10.mn    X10.min    X10.max
1 -0.3458888 -2.0312811  1.1483179
2 -0.1021727 -1.3230372  0.8045472
3  0.3514645 -3.2334010  1.7075298
4 -0.4988984 -2.1091311  0.5857192
5  0.2297461 -1.1336967  0.8483935

6  0.3700621 -1.5609424  2.2792024


--

David


Any help would be appreciated,

Thanks

Bernard McGarvey
Director, Fort Myers Beach Lions Foundation, Inc.
Retired (Lilly Engineering Fellow).

______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Reply via email to