1. Generally this sort of thing (statistical issues) is OT here.
2. Have you tried googling? "recursive partitioning R".
3. Have you looked at the CRAN "Machine Learning" Task View?
https://cran.r-project.org/web/views/MachineLearning.html

-- Bert

Bert Gunter

"The trouble with having an open mind is that people keep coming along
and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip)

On Tue, May 30, 2017 at 4:51 AM, Ross Gayler <r.gay...@gmail.com> wrote:
> I am after R package recommendations.
>
> I have a data frame with ~5 million rows and ~50 columns. (I could do what
> I want with a sample of the rows, but ideally I would use all the rows.)
>
> (1) I want to recursively partition the rows of the data frame in a way
> that I manually specify. That is, I want to generate a tree structure such
> that each node of the tree represents a subset of the rows of the data
> frame and the child nodes of any parent node represent a partition of the
> rows represented by the parent node. This is the sort of thing that tree
> induction algorithms like CART and ID3 do, but I want to manually specify
> the tree structure rather than have some algorithm decide it for me.
>
> (2) I want the means for specifying the tree structure to be as simple as
> possible, because the users will be trying out different tree structures.
>
> (3) Each node (internal or terminal) of the tree represents a row subset of
> the root data frame. I want to be able to specify a function to be applied
> to each node that takes the node data frame as input and calculates a set
> of summary statistics. I will probably write this node summary function as
> a dplyr pipeline. I will want to be able to associate the summaries with
> the nodes so that I keep track of the summaries in terms of the tree
> structure.
>
> (4) I want to be able to print and plot the tree of summaries in a way that
> shows the summaries in the context of the tree structure.
> Inevitably, there will be fiddling with the formatting of the prints and
> plots, so I expect I will need user-definable print/plot formatting
> functions that are applied to each node of the tree.
>
> What I am looking for is an R package that provides the best starting point
> for me to implement this. I am not a particularly good programmer, so
> getting a package that minimises what I have to write is important to me.
>
> So far, the most likely packages appear to be:
>
> - partykit <http://partykit.r-forge.r-project.org/partykit/>
> - data.tree <https://github.com/gluc/data.tree>
>
> I would appreciate any recommendations for R packages that would serve as a
> good base; any comments on the relative merits of the packages for my
> purposes; and any pointers to example code of people doing similar things.
>
> Thanks
>
> Ross
>
> ______________________________________________
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
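Points (1) to (4) in the question describe a fairly simple data structure: a hand-specified tree whose nodes hold row subsets, a summary function applied at every node, and a user-supplied formatter for printing. As an illustration only (this is not partykit or data.tree code, and Python is used here purely to show the structure), a minimal sketch might look like the following. The nested-dict `spec` format and the names `build_tree`, `summarise` and `render` are hypothetical. Rows are plain dicts; each node keeps references to its rows rather than copies.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Optional

Row = Dict[str, Any]


@dataclass
class Node:
    label: str                    # path-style label, e.g. "root/region=north"
    rows: List[Row]               # the row subset this node represents
    summary: Any = None           # filled in by summarise()
    children: List["Node"] = field(default_factory=list)


def build_tree(rows: List[Row], spec: Optional[dict], label: str = "root") -> Node:
    """Partition rows as dictated by a hand-written (hypothetical) spec.

    spec = {"split": column, "children": {value: child_spec, ...}}
    A missing or None spec makes the node terminal.
    """
    node = Node(label, rows)
    if spec and spec.get("split"):
        col = spec["split"]
        groups: Dict[Any, List[Row]] = {}
        for r in rows:
            # Each distinct value of the split column becomes one child,
            # so the children form a partition of the parent's rows.
            groups.setdefault(r[col], []).append(r)
        for value in sorted(groups, key=str):
            child_spec = spec.get("children", {}).get(value)
            node.children.append(
                build_tree(groups[value], child_spec, f"{label}/{col}={value}")
            )
    return node


def summarise(node: Node, fn: Callable[[List[Row]], Any]) -> None:
    """Apply the user's summary function to every node, internal or terminal."""
    node.summary = fn(node.rows)
    for child in node.children:
        summarise(child, fn)


def render(node: Node, fmt: Callable[[Node], str], depth: int = 0) -> List[str]:
    """Return indented lines, one per node, using a user-defined formatter."""
    lines = ["  " * depth + fmt(node)]
    for child in node.children:
        lines.extend(render(child, fmt, depth + 1))
    return lines


if __name__ == "__main__":
    rows = [
        {"region": "north", "product": "a", "amount": 10.0},
        {"region": "north", "product": "b", "amount": 20.0},
        {"region": "south", "product": "a", "amount": 5.0},
    ]
    # Split everything by region, then split only "north" further by product.
    spec = {"split": "region", "children": {"north": {"split": "product"}}}
    tree = build_tree(rows, spec)
    summarise(tree, lambda rs: {"n": len(rs), "total": sum(r["amount"] for r in rs)})
    fmt = lambda n: f"{n.label}: n={n.summary['n']} total={n.summary['total']}"
    print("\n".join(render(tree, fmt)))
```

With the sample rows above this prints the root (n=3, total=35.0), then `root/region=north` with its two product children, then `root/region=south` as a terminal node. Keeping the tree specification as plain nested data is what makes point (2) cheap: users edit the spec, not the code.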