My dataframe has 113K rows split by a factor into 58 separate data.frames, each with a different number of rows (see error output below).
I cannot think of a way of providing a sample of the data; if a sample for a MWE is desired, I would need advice on producing one using dput(). To summarize each group within this dataframe I'm using by() and getting an error because of the different numbers of rows:
by(rainfall_by_site, rainfall_by_site[, 'name'], function(x) {
+     mean.rain <- mean(rainfall_by_site[, 'prcp'])
+ })
Error in (function (..., row.names = NULL, check.rows = FALSE, check.names = TRUE,  :
  arguments imply differing number of rows: 4900, 1085, 1894, 2844, 3520, 647, 239, 3652, 3701, 3063, 176, 4713, 4887, 119, 165, 1221, 3358, 1457, 4896, 166, 690, 1110, 212, 1727, 227, 236, 1175, 1485, 186, 769, 139, 203, 2727, 4357, 1035, 1329, 1454, 973, 4536, 208, 350, 125, 3437, 731, 4894, 2598, 2419, 752, 427, 136, 685, 4849, 914, 171

My web searches have not found anything relevant; perhaps my search terms (such as 'R: apply by() with different factor row numbers') can be improved.

The help pages found using apropos('by') appear the same: ?by, ?by.data.frame, ?by.default and provide no hint on how to work with unequal rows per factor.

How can I apply by() on these data.frames?

Rich

______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
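
For reference, a minimal sketch of one possible cause and workaround. It assumes (this is not confirmed by the post) that rainfall_by_site is already a list produced by split(), in which case by() would try to coerce that list to a single data.frame and fail on the unequal row counts. The toy data, the object name rainfall, and the result names below are hypothetical stand-ins, not the real rainfall records.

```r
# Hypothetical toy data standing in for the real, unsplit rainfall records
rainfall <- data.frame(
  name = c("A", "A", "A", "B", "B", "C"),
  prcp = c(1, 2, 3, 4, 6, 10)
)

# Option 1: call by() on the unsplit data frame and let it do the grouping.
# Note that the function uses its argument x, not the whole data set.
mean_by <- by(rainfall, rainfall[, "name"], function(x) mean(x[, "prcp"]))

# Option 2: if the data are already split into a list of data frames,
# iterate over the list instead of calling by().
rainfall_by_site <- split(rainfall, rainfall$name)
mean_list <- sapply(rainfall_by_site, function(x) mean(x[, "prcp"]))

# Option 3: tapply() gives the same per-group means from the two columns.
mean_tapply <- tapply(rainfall$prcp, rainfall$name, mean)

print(mean_list)
```

Unequal group sizes are not a problem for any of the three forms; by() and tapply() expect the unsplit data plus a grouping factor, while sapply() works on the list that split() returns.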