This bug is fixed in the dev version of dplyr.

Hadley

On Sunday, November 23, 2014, John Posner <john.pos...@mjbiostat.com> wrote:
> Thanks to John Kane for an off-list consultation. As the following
> annotated transcript shows, it's the group_by() function that transforms a
> data frame into something else: a "grouped_df" object that *looks*
> identical to the original data frame (e.g. the rows are in the original
> order -- *not* grouped, as arrange() would do), but does not always act
> like a data frame.
>
> > library(dplyr)
> >
> > # set up data frame, and show its structure [ see below for clean copy of dput() code ]
> >
> > frm = structure(list(Id = structure(1:10, .Label = c("P01", "P02",
> + "P03", "P04", "P05", "P06", "P07", "P08", "P09", "P10"), class = "factor"),
> + Sex = structure(c(2L, 1L, 1L, 2L, 2L, 2L, 1L, 2L, 2L, 1L), .Label = c("Female",
> + "Male"), class = "factor"), Height = structure(c(1L, 1L,
> + 3L, 2L, 1L, 3L, 1L, 2L, 1L, 1L), .Label = c("Short", "Medium",
> + "Tall"), class = "factor"), Value = c(69.47, 64.61, 74.77,
> + 73.31, 64.76, 72.78, 64.64, 55.96, 60.45, 51.11)), .Names = c("Id",
> + "Sex", "Height", "Value"), row.names = c(NA, -10L), class = "data.frame")
> >
> > str(frm)
> 'data.frame': 10 obs. of 4 variables:
>  $ Id    : Factor w/ 10 levels "P01","P02","P03",..: 1 2 3 4 5 6 7 8 9 10
>  $ Sex   : Factor w/ 2 levels "Female","Male": 2 1 1 2 2 2 1 2 2 1
>  $ Height: Factor w/ 3 levels "Short","Medium",..: 1 1 3 2 1 3 1 2 1 1
>  $ Value : num 69.5 64.6 74.8 73.3 64.8 ...
> >
> > # run group_by() on data frame, and show resulting structure
> >
> > after.group_by = frm %>% group_by(Sex, Height)
> >
> > str(after.group_by)
> Classes 'grouped_df', 'tbl_df', 'tbl' and 'data.frame': 10 obs. of 4 variables:
>  $ Id    : Factor w/ 10 levels "P01","P02","P03",..: 1 2 3 4 5 6 7 8 9 10
>  $ Sex   : Factor w/ 2 levels "Female","Male": 2 1 1 2 2 2 1 2 2 1
>  $ Height: Factor w/ 3 levels "Short","Medium",..: 1 1 3 2 1 3 1 2 1 1
>  $ Value : num 69.5 64.6 74.8 73.3 64.8 ...
>  - attr(*, "vars")=List of 2
>   ..$ : symbol Sex
>   ..$ : symbol Height
>  - attr(*, "drop")= logi TRUE
>  - attr(*, "indices")=List of 5
>   ..$ : int 1 6 9
>   ..$ : int 2
>   ..$ : int 0 4 8
>   ..$ : int 3 7
>   ..$ : int 5
>  - attr(*, "group_sizes")= int 3 1 3 2 1
>  - attr(*, "biggest_group_size")= int 3
>  - attr(*, "labels")='data.frame': 5 obs. of 2 variables:
>   ..$ Sex   : Factor w/ 2 levels "Female","Male": 1 1 2 2 2
>   ..$ Height: Factor w/ 3 levels "Short","Medium",..: 1 3 1 2 3
>   ..- attr(*, "vars")=List of 2
>   .. ..$ : symbol Sex
>   .. ..$ : symbol Height
> >
> > # the two data structures *seem* to be the same ...
> >
> > frm == after.group_by
>        Id  Sex Height Value
>  [1,] TRUE TRUE   TRUE  TRUE
>  [2,] TRUE TRUE   TRUE  TRUE
>  [3,] TRUE TRUE   TRUE  TRUE
> ...etc.
> >
> > # ... but they're not
> >
> > frm[4]
>   Value
> 1 69.47
> 2 64.61
> ...etc.
> >
> > after.group_by[4]
> Error in eval(expr, envir, enclos) : index out of bounds
> >
> > # fortunately, we can convert back to a true data frame
> >
> > as.data.frame(after.group_by)[4]
>   Value
> 1 69.47
> 2 64.61
> ...etc.
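The frm definition at the top of this transcript is dput() output, the format John Kane recommends in his reply. Here is a minimal base-R sketch of why that format suits list posts. It uses a small hypothetical data frame, not the poster's data: dput() prints code that rebuilds the object, and evaluating that text gives back an identical copy, factor levels and classes included.

```r
# Hypothetical three-row data frame (not the poster's data) with the same
# kinds of columns as frm: factor Id and Sex, numeric Value.
frm <- data.frame(
  Id    = factor(c("P01", "P02", "P03")),
  Sex   = factor(c("Male", "Female", "Female")),
  Value = c(69.47, 64.61, 74.77)
)

# dput() prints R code that rebuilds the object, factor levels and classes
# included; this printed text is what gets pasted into a list post.
dput(frm)

# Simulate a reader copying that text from the post and running it.
posted  <- paste(capture.output(dput(frm)), collapse = "\n")
rebuilt <- eval(parse(text = posted))

# The rebuilt copy should match the original exactly.
identical(rebuilt, frm)
```

Unlike code that builds data with sample() or runif(), the dput() text carries the exact values, so every reader works with the same object.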
>
> ################################## dput() code below
>
> structure(list(Id = structure(1:10, .Label = c("P01", "P02",
> "P03", "P04", "P05", "P06", "P07", "P08", "P09", "P10"), class = "factor"),
> Sex = structure(c(2L, 1L, 1L, 2L, 2L, 2L, 1L, 2L, 2L, 1L), .Label = c("Female",
> "Male"), class = "factor"), Height = structure(c(1L, 1L,
> 3L, 2L, 1L, 3L, 1L, 2L, 1L, 1L), .Label = c("Short", "Medium",
> "Tall"), class = "factor"), Value = c(69.47, 64.61, 74.77,
> 73.31, 64.76, 72.78, 64.64, 55.96, 60.45, 51.11)), .Names = c("Id",
> "Sex", "Height", "Value"), row.names = c(NA, -10L), class = "data.frame")
>
> > -----Original Message-----
> > From: John Kane [mailto:jrkrid...@inbox.com]
> > Sent: Friday, November 21, 2014 12:33 PM
> > To: John Posner; 'r-help@r-project.org'
> > Subject: RE: [R] dplyr/summarize does not create a true data frame
> >
> > Your code in creating 'frm' is not working for me and it is complicated enough
> > that I don't want to work it out. See ?dput for a better way to supply data.
> > Also see:
> > https://github.com/hadley/devtools/wiki/Reproducibility
> > http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example
> >
> > That said, I don't see why 'my.output[4]' is not working. Try something like
> > str(frm) to see what you have there and/or resubmit the data in dput format.
> >
> > See simple example below:
> >
> > dat1 <- data.frame(aa = sample(1:20, 100, replace = TRUE), bb = 1:100 )
> > dat1[2]
> >
> > John Kane
> > Kingston ON Canada
> >
> > > -----Original Message-----
> > > From: john.pos...@mjbiostat.com
> > > Sent: Fri, 21 Nov 2014 17:10:16 +0000
> > > To: r-help@r-project.org
> > > Subject: [R] dplyr/summarize does not create a true data frame
> > >
> > > I got an error when trying to extract a 1-column subset of a data
> > > frame (called "my.output") created by dplyr/summarize.
> > > The ncol() function says that my.output has 4 columns, but "my.output[4]" fails.
> > > Note that converting my.output using as.data.frame() makes for a happy ending.
> > >
> > > Is this the intended behavior of dplyr?
> > >
> > > Tx,
> > > John
> > >
> > >> library(dplyr)
> > >
> > >> # set up data frame
> > >> rows = 100
> > >> repcnt = 50
> > >> sexes = c("Female", "Male")
> > >> heights = c("Med", "Short", "Tall")
> > >
> > >> frm = data.frame(
> > > + Id = paste("P", sprintf("%04d", 1:rows), sep=""),
> > > + Sex = sample(rep(sexes, repcnt), rows, replace=T),
> > > + Height = sample(rep(heights, repcnt), rows, replace=T),
> > > + V1 = round(runif(rows)*25, 2) + 50,
> > > + V2 = round(runif(rows)*1000, 2) + 50,
> > > + V3 = round(runif(rows)*350, 2) - 175
> > > + )
> > >>
> > >> # use dplyr/summarize to create data frame
> > >> my.output = frm %>%
> > > + group_by(Sex, Height) %>%
> > > + summarize(V1sum=sum(V1), V2sum=sum(V2))
> > >
> > >> # work with columns in the output data frame
> > >> ncol(my.output)
> > > [1] 4
> > >
> > >> my.output[1]
> > > Source: local data frame [6 x 1]
> > > Groups: Sex
> > >
> > >      Sex
> > > 1 Female
> > > 2 Female
> > > 3 Female
> > > 4   Male
> > > 5   Male
> > > 6   Male
> > >
> > >> my.output[4]
> > > Error in eval(expr, envir, enclos) : index out of bounds   ######## ERROR HERE
> > >
> > >> as.data.frame(my.output)[4]
> > >      V2sum
> > > 1 12427.97
> > > 2  8449.82
> > > 3  8610.97
> > > 4  7249.20
> > > 5 12616.91
> > > 6 10372.15
> > >>
> > >
> > > ______________________________________________
> > > R-help@r-project.org mailing list
> > > https://stat.ethz.ch/mailman/listinfo/r-help
> > > PLEASE do read the posting guide
> > > http://www.R-project.org/posting-guide.html
> > > and provide commented, minimal, self-contained, reproducible code.

--
http://had.co.nz/
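Hadley's reply says this bug is fixed in the dev version of dplyr. The following is a minimal sketch for readers on a current release. It assumes dplyr 1.0 or later (for the .groups argument) and uses a small hypothetical data set in place of the poster's random one. It shows that summarise() output can still carry grouping, as the "Groups: Sex" line in the original post indicates, and that ungroup() returns an ordinary table before column subsetting. It illustrates the workaround, not the fix itself.

```r
library(dplyr)

# Hypothetical five-row stand-in for the poster's randomly generated frm;
# only the column roles (Sex, Height, V1, V2) match the original.
frm <- data.frame(
  Sex    = c("Female", "Female", "Male", "Male", "Male"),
  Height = c("Med", "Tall", "Med", "Med", "Short"),
  V1     = c(60.1, 55.2, 70.3, 65.4, 58.5),
  V2     = c(500, 450, 800, 750, 300)
)

# .groups = "drop_last" (dplyr >= 1.0) drops the last grouping variable
# (Height), so the result stays grouped by Sex, like the "Groups: Sex" line
# in the thread; stating it explicitly also silences summarise()'s message.
my.output <- frm %>%
  group_by(Sex, Height) %>%
  summarise(V1sum = sum(V1), V2sum = sum(V2), .groups = "drop_last")

group_vars(my.output)  # the remaining grouping variable

# ungroup() returns an ordinary, ungrouped tibble ...
plain <- ungroup(my.output)
plain[4]               # one-column tibble holding V2sum

# ... and as.data.frame(), the workaround from the thread, gives a base data frame.
as.data.frame(my.output)[4]
```

Calling ungroup() right after a grouped summary keeps later base-R style indexing independent of any grouped-data methods.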