Well, if you know the column subscripts you need, just forget about the names!
I would just write a (one-liner) function to do it for any data frame: myfun <- function(dat)tapply(dat[,3], dat[,1:2], sum) ## dat[,1:2] is a list because it's a data frame and all data frames are lists myfun(data_original) ## then gives the result ## you could even make the indices arguments of the myfun if they change in different ## data sets -- Bert On Fri, Oct 21, 2022 at 5:14 PM Kelly Thompson <[email protected]> wrote: > > Bert, > Thanks! I'm pretty sure what you provided gets me to what I was > looking for, and is much simpler. I really appreciate your help. > > A follow-up question: > I adjusted the code to not use "hard-coded" column names. > > mat2 <- with(data_original, tapply( get(names(data_original)[3]), > list( get(names(data_original)[1]), get(names(data_original)[2])), sum > )) > > Is there any better way to write that? > > Thanks again! > ----- > > For clarity and to improve upon what I previously wrote, and so I can > practice writing questions like this and asking for help, here's a > recap of my question and "reproducible code", and the "better way" you > provided: > > I have data presented in a 3-column data frame as shown below in > "data_original". > > I want to aggregate the data in column 3, with the "by" argument using > the first and second columns of "data_original". > > I want the results of the aggregation in a matrix, as shown below in "mat1". > > As my end "result", I want a matrix with one row for each unique value > of column1 of data_original and one column for each unique value of > column2 of data_original. > > What I show below seems like one way this can be done. > > My question: Are there easier or better ways to do this, especially in > Base R, and also in R packages? > > > #create data > set.seed(1) > data_original <- data.frame(year = rep(1990:1999, length = 50), > category = sample(1:5, size = 50, replace = TRUE), sales = > sample(0:99999, size = 50 , replace = TRUE) ) > dim(data_original) > > #remove rows where data_original[,1] == 1990 & data_original[,2] == 5, > to ensure there is at least one NA in the desired matrix (this is an > "edge" case I want the code to "deal with" correctly.) > data_original <- data_original[ (data_original[,1] == 1990 & > data_original[,2] == 5) == FALSE, ] > dim(data_original) > > #aggregate data > data_aggregate_col3_by_col1_and_col2 <- aggregate(x = > data_original[3], by = list(data_original[,1], data_original[,2]), FUN > = sum) > colnames(data_aggregate_col3_by_col1_and_col2) <- colnames(data_original) > dim(data_aggregate_col3_by_col1_and_col2) > > data_expanded <- > expand.grid(unique(data_aggregate_col3_by_col1_and_col2[,1]), > unique(data_aggregate_col3_by_col1_and_col2[,2])) > colnames(data_expanded) <- colnames(data_aggregate_col3_by_col1_and_col2)[1:2] > dim(data_expanded) > > data_expanded <- merge(data_expanded, > data_aggregate_col3_by_col1_and_col2, all = TRUE) > dim(data_expanded) > > mat1 <- matrix(data = data_expanded[,3], nrow = > length(unique(data_expanded[,1])), ncol = > length(unique(data_expanded[,2])) , byrow = TRUE, dimnames = list( > unique(data_expanded[,1]), unique(data_expanded[,2]) ) ) > > #this is an easier way, using with and tapply > mat2 <- with(data_original, tapply( get(names(data_original)[3]), > list( get(names(data_original)[1]), get(names(data_original)[2])), sum > )) > #check that mat1 and mat 2 are "nearly equal" > all.equal(mat1, mat2) > > > > Gunter <[email protected]> wrote: > > > > "As my end result, I want a matrix or data frame, with one row for each > > year, and one column for each category." > > > > If I understand you correctly, no reshaping gymnastics are needed -- > > just use ?tapply: > > > > set.seed(1) > > do <- data.frame(year = rep(1990:1999, length = 50), > > category = sample(1:5, size = 50, replace = TRUE), > > sales = sample(0:99999, size = 50 , replace = TRUE) ) > > > > > > with(do, tapply(sales, list(year, category),sum)) > > ## which gives the matrix: > > > > 1 2 3 4 5 > > 1990 13283 NA 55083 87522 64877 > > 1991 NA 80963 NA 30100 28277 > > 1992 9391 202916 NA 55090 NA > > 1993 29696 167344 NA NA 17625 > > 1994 98015 99521 NA 70536 52252 > > 1995 157003 NA 26875 NA 11366 > > 1996 32986 88683 6562 79475 95282 > > 1997 13601 NA 134757 12398 NA > > 1998 30537 51117 31333 20204 NA > > 1999 39240 87845 62479 NA 98804 > > > > If this is not what you wanted, you may need to explain further or > > await a response from someone more insightful than I. > > > > Cheers, > > Bert > > > > > > On Fri, Oct 21, 2022 at 3:34 PM Kelly Thompson <[email protected]> wrote: > > > > > > As my end result, I want a matrix or data frame, with one row for each > > > year, and one column for each category. > > > > > > On Fri, Oct 21, 2022 at 6:23 PM Kelly Thompson <[email protected]> > > > wrote: > > > > > > > > # I think this might be a better example. > > > > > > > > # I have data presented in a "vertical" dataframe as shown below in > > > > data_original. > > > > # I want this data in a matrix or "grid", as shown below. > > > > # What I show below seems like one way this can be done. > > > > > > > > # My question: Are there easier or better ways to do this, especially > > > > in Base R, and also in R packages? > > > > > > > > #create data > > > > set.seed(1) > > > > data_original <- data.frame(year = rep(1990:1999, length = 50), > > > > category = sample(1:5, size = 50, replace = TRUE), sales = > > > > sample(0:99999, size = 50 , replace = TRUE) ) > > > > dim(data_original) > > > > > > > > #remove rows where data_original$year == 1990 & data_original$category > > > > == 5, to ensure there is at least one NA in the "grid" > > > > data_original <- data_original[ (data_original$year == 1990 & > > > > data_original$category == 5) == FALSE, ] > > > > dim(data_original) > > > > > > > > #aggregate data > > > > data_aggregate_sum_by_year_and_category <- aggregate(x = > > > > data_original$sales, by = list(year = data_original$year, category = > > > > data_original$category), FUN = sum) > > > > colnames(data_aggregate_sum_by_year_and_category) <- c('year', > > > > 'category', 'sum_of_sales') > > > > dim(data_aggregate_sum_by_year_and_category) > > > > > > > > data_expanded <- expand.grid(year = > > > > unique(data_aggregate_sum_by_year_and_category$year), category = > > > > unique(data_aggregate_sum_by_year_and_category$category)) > > > > dim(data_expanded) > > > > data_expanded <- merge(data_expanded, > > > > data_aggregate_sum_by_year_and_category, all = TRUE) > > > > dim(data_expanded) > > > > > > > > mat <- matrix(data = data_expanded$sum_of_sales, nrow = > > > > length(unique(data_expanded$year)), ncol = > > > > length(unique(data_expanded$category)) , byrow = TRUE, dimnames = > > > > list( unique(data_expanded$year), unique(data_expanded$category) ) ) > > > > > > > > > > > > data_original > > > > data_expanded > > > > mat > > > > > > > > On Fri, Oct 21, 2022 at 5:03 PM Kelly Thompson <[email protected]> > > > > wrote: > > > > > > > > > > ### > > > > > #I have data presented in a "vertical" data frame as shown below in > > > > > data_original. > > > > > #I want this data in a matrix or "grid", as shown below. > > > > > #What I show below seems like one way this can be done. > > > > > > > > > > #My question: Are there easier or better ways to do this, especially > > > > > in Base R, and also in R packages? > > > > > > > > > > #reproducible example > > > > > > > > > > data_original <- data.frame(year = c('1990', '1999', '1990', '1989'), > > > > > size = c('s', 'l', 'xl', 'xs'), n = c(99, 33, 3, 4) ) > > > > > > > > > > data_expanded <- expand.grid(unique(data_original$year), > > > > > unique(data_original$size), stringsAsFactors = FALSE ) > > > > > colnames(data_expanded) <- c('year', 'size') > > > > > data_expanded <- merge(data_expanded, data_original, all = TRUE) > > > > > > > > > > mat <- matrix(data = data_expanded $n, nrow = > > > > > length(unique(data_expanded $year)), ncol = > > > > > length(unique(data_expanded $size)) , byrow = TRUE, dimnames = list( > > > > > unique(data_expanded$year), unique(data_expanded$size) ) ) > > > > > > > > > > data_original > > > > > data_expanded > > > > > mat > > > > > > ______________________________________________ > > > [email protected] mailing list -- To UNSUBSCRIBE and more, see > > > https://stat.ethz.ch/mailman/listinfo/r-help > > > PLEASE do read the posting guide > > > http://www.R-project.org/posting-guide.html > > > and provide commented, minimal, self-contained, reproducible code. > > ______________________________________________ > [email protected] mailing list -- To UNSUBSCRIBE and more, see > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. ______________________________________________ [email protected] mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.

