Nice example of the issue, Bill. Thank you. Is this a known issue? Are there plans to fix it?
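
In the meantime, one possible workaround (a sketch only, not something confirmed in this thread) is to have the binning function return character labels inside the grouped mutate() and build the factor once after ungrouping, so that no per-group factor levels need to be merged. The helper name create_bins_chr is made up for illustration; the data and nBins are the same as in my earlier example:

```r
library(dplyr)

set.seed(4)
df <- data.frame(pred = rnorm(100),
                 models = gl(2, 50, 100, labels = c("model1", "model2")))
nBins <- 10

# Hypothetical helper (not from the thread): same binning as create_bins2,
# but it returns character labels instead of a factor, so mutate() never
# has to reconcile different factor levels across groups.
create_bins_chr <- function(pred, nBins) {
  Breaks <- unique(quantile(pred, probs = seq(0, 1, 1 / nBins)))
  as.character(cut(pred, breaks = Breaks, include.lowest = TRUE))
}

res <- df %>%
  group_by(models) %>%
  mutate(bin = create_bins_chr(pred, nBins)) %>%
  ungroup()

# Build the factor once, after the groups have been recombined.
res$bin <- factor(res$bin)
```

Note that factor() sorts the labels as strings here, so the level order is alphabetical rather than by interval position; if the order matters, the levels would have to be set explicitly.
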
Thanks again,
Axel.

> On Nov 2, 2015, at 8:58 PM, William Dunlap <wdun...@tibco.com> wrote:
>
> dplyr::mutate does not collapse factor variables well. They seem to get
> their levels from the levels computed for the first group and mutate does
> not check for them having different levels.
>
> > data.frame(group=rep(c("A","B","C"),each=2), value=rep(c("X","Y","Z"),3:1)) %>% dplyr::group_by(group) %>% dplyr::mutate(fv=factor(value))
> Source: local data frame [6 x 3]
> Groups: group [3]
>
>    group  value     fv
>   (fctr) (fctr) (fctr)
> 1      A      X      X
> 2      A      X      X
> 3      B      X      X
> 4      B      Y     NA
> 5      C      Y      X
> 6      C      Z     NA
> > levels(.Last.value$fv)
> [1] "X"
>
> Bill Dunlap
> TIBCO Software
> wdunlap tibco.com
>
> On Mon, Nov 2, 2015 at 5:38 PM, Axel Urbiz <axel.ur...@gmail.com> wrote:
> Actually, the results are not the same. Looks like in the code below (see
> "using dplyr"), the function create_bins2 is not being applied separately to
> each "group_by" variable. That is surprising to me, or I'm misunderstanding
> dplyr.
>
> ### Create some data
>
> set.seed(4)
> df <- data.frame(pred = rnorm(100),
>                  models = gl(2, 50, 100, labels = c("model1", "model2")))
>
> ### This is the code using plyr, which I'd like to change using dplyr
>
> create_bins <- function(x, nBins) {
>   Breaks <- unique(quantile(x$pred, probs = seq(0, 1, 1/nBins)))
>   dfB <- data.frame(pred = x$pred,
>                     bin = cut(x$pred, breaks = Breaks,
>                               include.lowest = TRUE))
>   dfB
> }
>
> nBins = 10
> res_plyr <- plyr::ddply(df, plyr::.(models), create_bins, nBins)
> head(res_plyr)
>
> ### Attempt using dplyr
>
> create_bins2 <- function (pred, nBins) {
>   Breaks <- unique(quantile(pred, probs = seq(0, 1, 1/nBins)))
>   bin <- cut(pred, breaks = Breaks, include.lowest = TRUE)
>   bin
> }
>
> res_dplyr <- dplyr::mutate(dplyr::group_by(df, models),
>                            bin=create_bins2(pred, nBins))
>
> > identical(res_plyr, as.data.frame(res_dplyr))
> [1] FALSE
> #levels(res_dplyr$bin) == levels(res_plyr$bin)
>
> Thanks,
> Axel.
>
>> On Oct 30, 2015, at 12:19 PM, William Dunlap <wdun...@tibco.com> wrote:
>>
>> dplyr::mutate is probably what you want instead of dplyr::summarize:
>>
>> create_bins3 <- function (xpred, nBins)
>> {
>>   Breaks <- unique(quantile(xpred, probs = seq(0, 1, 1/nBins)))
>>   bin <- cut(xpred, breaks = Breaks, include.lowest = TRUE)
>>   bin
>> }
>> dplyr::group_by(df, models) %>% dplyr::mutate(Bin=create_bins3(pred,nBins))
>> #Source: local data frame [100 x 3]
>> #Groups: models [2]
>> #
>> #        pred models             Bin
>> #       (dbl) (fctr)          (fctr)
>> #1  0.2167549 model1   (0.167,0.577]
>> #2 -0.5424926 model1 (-0.869,-0.481]
>> ...
>>
>> Bill Dunlap
>> TIBCO Software
>> wdunlap tibco.com
>>
>> On Fri, Oct 30, 2015 at 9:06 AM, William Dunlap <wdun...@tibco.com> wrote:
>> The error message is not very helpful and the stack trace is pretty
>> inscrutable as well
>> > dplyr::group_by(df, models) %>% dplyr::summarize(create_bins)
>> Error: not a vector
>> > traceback()
>> 14: stop(list(message = "not a vector", call = NULL, cppstack = NULL))
>> 13: .Call("dplyr_summarise_impl", PACKAGE = "dplyr", df, dots)
>> 12: summarise_impl(.data, dots)
>> 11: summarise_.tbl_df(.data, .dots = lazyeval::lazy_dots(...))
>> 10: summarise_(.data, .dots = lazyeval::lazy_dots(...))
>> 9: dplyr::summarize(., create_bins)
>> 8: function_list[[k]](value)
>> 7: withVisible(function_list[[k]](value))
>> 6: freduce(value, `_function_list`)
>> 5: `_fseq`(`_lhs`)
>> 4: eval(expr, envir, enclos)
>> 3: eval(quote(`_fseq`(`_lhs`)), env, env)
>> 2: withVisible(eval(quote(`_fseq`(`_lhs`)), env, env))
>> 1: dplyr::group_by(df, models) %>% dplyr::summarize(create_bins)
>>
>> It does not mean that your function, create_bins, does not return a vector --
>> the sum function gives the same result. help(summarize,package="dplyr")
>> says:
>>      ...: Name-value pairs of summary functions like ‘min()’, ‘mean()’,
>>           ‘max()’ etc.
>> It apparently means calls to summary functions, not summary functions
>> themselves. The examples in the help file show the proper usage.
>>
>> Use a call to your function and you will see it works better
>> > dplyr::group_by(df, models) %>% dplyr::summarize(create_bins(pred,nBins))
>> Error: $ operator is invalid for atomic vectors
>> The traceback again is not very useful, because the call information was
>> stripped by dplyr (by the call=NULL in the call to stop()):
>> > traceback()
>> 14: stop(list(message = "$ operator is invalid for atomic vectors",
>>         call = NULL, cppstack = NULL))
>> 13: .Call("dplyr_summarise_impl", PACKAGE = "dplyr", df, dots)
>> However it is clear that the fault is in your function, which is expecting a
>> data.frame x with a column called pred but gets pred itself. Change x to
>> xpred in the argument list and x$pred to xpred in the body of the function.
>>
>> You will run into more problems because your function returns a vector
>> the length of its input but summarize expects a summary function - one
>> that returns a scalar for any size vector input.
>>
>> Bill Dunlap
>> TIBCO Software
>> wdunlap tibco.com
>>
>> On Fri, Oct 30, 2015 at 4:04 AM, Axel Urbiz <axel.ur...@gmail.com> wrote:
>> So in this case, "create_bins" returns a vector and I still get the same
>> error.
>>
>> create_bins <- function(x, nBins)
>> {
>>   Breaks <- unique(quantile(x$pred, probs = seq(0, 1, 1/nBins)))
>>   bin <- cut(x$pred, breaks = Breaks, include.lowest = TRUE)
>>   bin
>> }
>>
>> ### Using dplyr (fails)
>> nBins = 10
>> by_group <- dplyr::group_by(df, models)
>> res_dplyr <- dplyr::summarize(by_group, create_bins, nBins)
>> Error: not a vector
>>
>> On Thu, Oct 29, 2015 at 8:28 PM, Jeff Newmiller <jdnew...@dcn.davis.ca.us>
>> wrote:
>>
>> > You are jumping the gun (your other email did get through) and you are
>> > posting using HTML (which does not come through on the list).
>> > Some time (re)reading the Posting Guide mentioned at the bottom of all
>> > emails on this list seems to be in order.
>> >
>> > The error is actually quite clear. You should return a vector from your
>> > function, not a data frame.
>> > ---------------------------------------------------------------------------
>> > Jeff Newmiller                        The     .....       .....  Go Live...
>> > DCN:<jdnew...@dcn.davis.ca.us>        Basics: ##.#.       ##.#.  Live Go...
>> >                                       Live:   OO#.. Dead: OO#..  Playing
>> > Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
>> > /Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k
>> > ---------------------------------------------------------------------------
>> > Sent from my phone. Please excuse my brevity.
>> >
>> > On October 29, 2015 4:55:19 PM MST, Axel Urbiz <axel.ur...@gmail.com>
>> > wrote:
>> > >Hello,
>> > >
>> > >Sorry, resending this question as the prior was not sent properly.
>> > >
>> > >I’m using the plyr package below to add a variable named "bin" to my
>> > >original data frame "df" with the user-defined function "create_bins".
>> > >I'd like to get similar results using dplyr instead, but failing to do so.
>> > >
>> > >set.seed(4)
>> > >df <- data.frame(pred = rnorm(100),
>> > >                 models = gl(2, 50, 100, labels = c("model1", "model2")))
>> > >
>> > >### Using plyr (works fine)
>> > >create_bins <- function(x, nBins)
>> > >{
>> > >  Breaks <- unique(quantile(x$pred, probs = seq(0, 1, 1/nBins)))
>> > >  dfB <- data.frame(pred = x$pred,
>> > >                    bin = cut(x$pred, breaks = Breaks,
>> > >                              include.lowest = TRUE))
>> > >  dfB
>> > >}
>> > >
>> > >nBins = 10
>> > >res_plyr <- plyr::ddply(df, plyr::.(models), create_bins, nBins)
>> > >head(res_plyr)
>> > >
>> > >### Using dplyr (fails)
>> > >
>> > >by_group <- dplyr::group_by(df, models)
>> > >res_dplyr <- dplyr::summarize(by_group, create_bins, nBins)
>> > >Error: not a vector
>> > >
>> > >Any help would be much appreciated.
>> > >
>> > >Best,
>> > >Axel.
>> > >
>> > >        [[alternative HTML version deleted]]
>> > >
>> > >______________________________________________
>> > >R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> > >https://stat.ethz.ch/mailman/listinfo/r-help
>> > >PLEASE do read the posting guide
>> > >http://www.R-project.org/posting-guide.html
>> > >and provide commented, minimal, self-contained, reproducible code.