The example you gave had only one split. If your real situation has three
splits, you'll have to take a look at the testtree$csplit matrix and decide
how you want to define the new grouping variable. That matrix has one row
per split and one column per level of list_var. Here's one way to do it
...
Jean
library(rpart)
library(rpart.plot)
test_set <- data.frame(
  list_var=paste("A", (1:1000)%/%25, sep=''),
  list_val=c(runif(250, 1, 4), runif(250, 3, 5), runif(250, 4, 6),
    runif(250, 5, 7))
)
# a preliminary tree, to get the splits (not plotted)
testtree <- rpart(list_val ~ list_var, minbucket=100, data=test_set)
# a vector of the unique values of list_var, sorted
suvar <- sort(unique(test_set$list_var))
# define a new variable to represent all combinations of splits in testtree
# (one string per level of list_var, built from its column of csplit)
splitz <- apply(testtree$csplit, 2, paste, collapse="-")
groups <- factor(splitz, labels=seq_along(unique(splitz)))
# expand this new variable to the length of the original data frame
test_set$var_grp <- as.factor(groups[match(test_set$list_var, suvar)])
# fit another tree, using the grouping variable, for plotting purposes
testtree2 <- rpart(list_val ~ var_grp, data=test_set)
rpart.plot(testtree2, type=3)
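
To make the grouping step easier to follow, here is a small sketch on a
hand-made matrix instead of a fitted tree. The matrix cs, the level names
in lev, and the observations in obs are all made up for illustration; cs
only imitates the layout of testtree$csplit, where each column records 1 if
that level goes left at a split, 3 if it goes right, and 2 if the level is
not present at that node.

```r
# Hypothetical stand-in for testtree$csplit: 3 splits (rows) by 6 levels
# (columns). Coding as in rpart: 1 = left, 3 = right, 2 = not present.
cs <- matrix(c(1, 1, 1, 3, 3, 3,
               1, 1, 3, 2, 2, 2,
               2, 2, 2, 1, 3, 3),
             nrow = 3, byrow = TRUE)
lev <- paste0("A", 0:5)   # made-up sorted levels, playing the role of suvar

# one string per level: its pattern across all three splits
splitz <- apply(cs, 2, paste, collapse = "-")
# "1-1-2" "1-1-2" "1-3-2" "3-2-1" "3-2-3" "3-2-3"

# levels with identical patterns land in the same group
groups <- factor(splitz, labels = seq_along(unique(splitz)))
# 1 1 2 3 4 4

# expand to individual observations, as in the match() step above
obs <- c("A3", "A0", "A5", "A1")
obs_grp <- groups[match(obs, lev)]
# 3 1 4 1
```

Levels that follow the same pattern through every split share a group
number, so the second tree only has to label a handful of groups rather
than every level name.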
Mark Beauchene <[email protected]> wrote on 07/11/2012 02:34:52 PM:
> Thank you, it works very well.
>
> Could you help me out by explaining a little bit of how it works?
> In my actual plot I have 3 splits on the same long list class
> variable, and I don't completely follow your code.
>
> Mark Beauchene
>
> To: [email protected]
> CC: [email protected]
> Subject: Re: [R] Plotting rpart trees with long list of class members
> From: [email protected]
> Date: Tue, 10 Jul 2012 09:10:05 -0500
>
> Thanks. Very helpful.
>
> You can use the information from the splits in the first tree, to
> define a new grouping variable, which will simplify the plot:
> suvar <- sort(unique(test_set$list_var))
> test_set$var_grp <- as.factor(
>   testtree$csplit[match(test_set$list_var, suvar)])
> testtree2 <- rpart ( list_val ~ var_grp, data = test_set )
> rpart.plot(testtree2, type=3)
>
> Note to other readers: you will need to load these packages before
> running the code:
> library(rpart)
> library(rpart.plot)
>
> Jean
>
>
> MarkBeauchene <[email protected]> wrote on 07/09/2012 03:42:32 PM:
> > Here is some sample code. It generates a class (list_var) that is used in
> > rpart. list_val is the dependent variable.
> >
> > The plot shows all the values of the class, which is a mess and makes the
> > plot unusable. I'd like to either suppress the list entirely or replace it
> > with something like "Group 1", "Group 2", etc.
> >
> > list_var <- rep(NA,1000)
> > list_val <- rep(NA,1000)
> > for (i in 1:1000) {
> >   list_var[i] <- paste("A",i%/%25,sep='')
> >   list_val[i] <- runif(1,0,1)
> > }
> > test_set <- data.frame(list_var, list_val )
> >
> > testtree <- rpart ( list_val ~ list_var, data = test_set )
> > rpart.plot(testtree, type=3)