Actually, the results are not the same. It looks like, in the code below (see "Attempt using dplyr"), the function create_bins2 is not being applied separately to each group defined by the "group_by" variable. That is surprising to me, or I'm misunderstanding dplyr.

### Create some data
set.seed(4)
df <- data.frame(pred = rnorm(100), models = gl(2, 50, 100, labels = c("model1", "model2")))

### This is the code using plyr, which I'd like to change using dplyr
create_bins <- function(x, nBins)
{
  Breaks <- unique(quantile(x$pred, probs = seq(0, 1, 1/nBins)))
  dfB <- data.frame(pred = x$pred,
                    bin = cut(x$pred, breaks = Breaks, include.lowest = TRUE))
  dfB
}

nBins = 10
res_plyr <- plyr::ddply(df, plyr::.(models), create_bins, nBins)
head(res_plyr)

### Attempt using dplyr
create_bins2 <- function (pred, nBins)
{
  Breaks <- unique(quantile(pred, probs = seq(0, 1, 1/nBins)))
  bin <- cut(pred, breaks = Breaks, include.lowest = TRUE)
  bin
}

res_dplyr <- dplyr::mutate(dplyr::group_by(df, models), bin=create_bins2(pred, nBins))
identical(res_plyr, as.data.frame(res_dplyr))
[1] FALSE

#levels(res_dplyr$bin) == levels(res_plyr$bin)

Thanks,
Axel.

> On Oct 30, 2015, at 12:19 PM, William Dunlap <wdun...@tibco.com> wrote:
>
> dplyr::mutate is probably what you want instead of dplyr::summarize:
>
> create_bins3 <- function (xpred, nBins)
> {
>   Breaks <- unique(quantile(xpred, probs = seq(0, 1, 1/nBins)))
>   bin <- cut(xpred, breaks = Breaks, include.lowest = TRUE)
>   bin
> }
> dplyr::group_by(df, models) %>% dplyr::mutate(Bin=create_bins3(pred,nBins))
> #Source: local data frame [100 x 3]
> #Groups: models [2]
> #
> #         pred models             Bin
> #        (dbl) (fctr)          (fctr)
> #1   0.2167549 model1   (0.167,0.577]
> #2  -0.5424926 model1 (-0.869,-0.481]
> ...
>
> Bill Dunlap
> TIBCO Software
> wdunlap tibco.com
>
> On Fri, Oct 30, 2015 at 9:06 AM, William Dunlap <wdun...@tibco.com> wrote:
> The error message is not very helpful and the stack trace is pretty
> inscrutable as well
>    > dplyr::group_by(df, models) %>% dplyr::summarize(create_bins)
>    Error: not a vector
>    > traceback()
>    14: stop(list(message = "not a vector", call = NULL, cppstack = NULL))
>    13: .Call("dplyr_summarise_impl", PACKAGE = "dplyr", df, dots)
>    12: summarise_impl(.data, dots)
>    11: summarise_.tbl_df(.data, .dots = lazyeval::lazy_dots(...))
>    10: summarise_(.data, .dots = lazyeval::lazy_dots(...))
>    9: dplyr::summarize(., create_bins)
>    8: function_list[[k]](value)
>    7: withVisible(function_list[[k]](value))
>    6: freduce(value, `_function_list`)
>    5: `_fseq`(`_lhs`)
>    4: eval(expr, envir, enclos)
>    3: eval(quote(`_fseq`(`_lhs`)), env, env)
>    2: withVisible(eval(quote(`_fseq`(`_lhs`)), env, env))
>    1: dplyr::group_by(df, models) %>% dplyr::summarize(create_bins)
>
> It does not mean that your function, create_bins, does not return a vector --
> the sum function gives the same result.  help(summarize,package="dplyr")
> says:
>    ...: Name-value pairs of summary functions like ‘min()’, ‘mean()’,
>         ‘max()’ etc.
> It apparently means calls to summary functions, not summary functions
> themselves.  The examples in the help file show the proper usage.
>
> Use a call to your function and you will see it works better
>    > dplyr::group_by(df, models) %>% dplyr::summarize(create_bins(pred,nBins))
>    Error: $ operator is invalid for atomic vectors
> The traceback again is not very useful, because the call information was
> stripped by dplyr (by the call=NULL in the call to stop()):
>    > traceback()
>    14: stop(list(message = "$ operator is invalid for atomic vectors",
>            call = NULL, cppstack = NULL))
>    13: .Call("dplyr_summarise_impl", PACKAGE = "dplyr", df, dots)
> However it is clear that the fault is in your function, which is expecting a
> data.frame x with a column called pred but gets pred itself.  Change x to xpred
> in the argument list and x$pred to xpred in the body of the function.
>
> You will run into more problems because your function returns a vector
> the length of its input but summarize expects a summary function - one
> that returns a scalar for any size vector input.
>
> Bill Dunlap
> TIBCO Software
> wdunlap tibco.com
>
> On Fri, Oct 30, 2015 at 4:04 AM, Axel Urbiz <axel.ur...@gmail.com> wrote:
> So in this case, "create_bins" returns a vector and I still get the same
> error.
>
> create_bins <- function(x, nBins)
> {
>   Breaks <- unique(quantile(x$pred, probs = seq(0, 1, 1/nBins)))
>   bin <- cut(x$pred, breaks = Breaks, include.lowest = TRUE)
>   bin
> }
>
> ### Using dplyr (fails)
> nBins = 10
> by_group <- dplyr::group_by(df, models)
> res_dplyr <- dplyr::summarize(by_group, create_bins, nBins)
> Error: not a vector
>
> On Thu, Oct 29, 2015 at 8:28 PM, Jeff Newmiller <jdnew...@dcn.davis.ca.us> wrote:
>
> > You are jumping the gun (your other email did get through) and you are
> > posting using HTML (which does not come through on the list). Some time
> > (re)reading the Posting Guide mentioned at the bottom of all emails on this
> > list seems to be in order.
> >
> > The error is actually quite clear.
> > You should return a vector from your function, not a data frame.
> > ---------------------------------------------------------------------------
> > Jeff Newmiller                        The     .....       .....  Go Live...
> > DCN:<jdnew...@dcn.davis.ca.us>        Basics: ##.#.       ##.#.  Live Go...
> >                                       Live:   OO#.. Dead: OO#..  Playing
> > Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
> > /Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k
> > ---------------------------------------------------------------------------
> > Sent from my phone. Please excuse my brevity.
> >
> > On October 29, 2015 4:55:19 PM MST, Axel Urbiz <axel.ur...@gmail.com> wrote:
> > >Hello,
> > >
> > >Sorry, resending this question as the prior was not sent properly.
> > >
> > >I'm using the plyr package below to add a variable named "bin" to my
> > >original data frame "df" with the user-defined function "create_bins".
> > >I'd like to get similar results using dplyr instead, but failing to do so.
> > >
> > >set.seed(4)
> > >df <- data.frame(pred = rnorm(100), models = gl(2, 50, 100, labels = c("model1", "model2")))
> > >
> > >### Using plyr (works fine)
> > >create_bins <- function(x, nBins)
> > >{
> > >  Breaks <- unique(quantile(x$pred, probs = seq(0, 1, 1/nBins)))
> > >  dfB <- data.frame(pred = x$pred,
> > >                    bin = cut(x$pred, breaks = Breaks, include.lowest = TRUE))
> > >  dfB
> > >}
> > >
> > >nBins = 10
> > >res_plyr <- plyr::ddply(df, plyr::.(models), create_bins, nBins)
> > >head(res_plyr)
> > >
> > >### Using dplyr (fails)
> > >
> > >by_group <- dplyr::group_by(df, models)
> > >res_dplyr <- dplyr::summarize(by_group, create_bins, nBins)
> > >Error: not a vector
> > >
> > >Any help would be much appreciated.
> > >
> > >Best,
> > >Axel.
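
The question at the top of the thread, whether a grouped mutate really bins each model separately, can be checked against a base-R reference rather than with identical(), which also fails on differences in column order, class, or factor levels. What follows is only a minimal sketch under stated assumptions: the names ref_bin and pooled_bin are hypothetical helpers introduced here, not part of the thread, and the dplyr step is guarded so the base-R part runs even when dplyr is not installed.

```r
set.seed(4)
df <- data.frame(pred = rnorm(100),
                 models = gl(2, 50, 100, labels = c("model1", "model2")))
nBins <- 10

# Same binning logic as create_bins2 / create_bins3 in the thread
create_bins2 <- function(pred, nBins) {
  Breaks <- unique(quantile(pred, probs = seq(0, 1, 1/nBins)))
  cut(pred, breaks = Breaks, include.lowest = TRUE)
}

# Base-R reference (hypothetical helper): bin each model on its own rows.
# Labels are kept as character so that the two groups having different
# factor levels does not affect the comparison.
ref_bin <- character(nrow(df))
for (m in levels(df$models)) {
  idx <- df$models == m
  ref_bin[idx] <- as.character(create_bins2(df$pred[idx], nBins))
}

# For contrast (hypothetical helper): bins computed over all rows at once,
# i.e. what one would get if the groups were ignored.
pooled_bin <- as.character(create_bins2(df$pred, nBins))

if (requireNamespace("dplyr", quietly = TRUE)) {
  res_dplyr <- dplyr::mutate(dplyr::group_by(df, models),
                             bin = create_bins2(pred, nBins))
  # TRUE here would mean the grouped mutate binned each model separately
  print(all(as.character(res_dplyr$bin) == ref_bin))
  # TRUE here would instead mean the groups were ignored
  print(all(as.character(res_dplyr$bin) == pooled_bin))
}
```

Comparing the labels element by element as character strings separates the question "were the bins computed per group?" from the other reasons identical() can return FALSE.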
______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.