Dear r-help mailing list,

Is there a way to make the minsplit criterion in rpart take the case weights
into account? I could not find a way for the minsplit threshold to use the
weights, and this becomes a problem when the weights are uneven, as the
example below shows.
My current workaround is to expand the data so that each row is a single
observation, but that is wasteful in both time and memory (and I doubt the
real datasets I need to work with would fit in memory in their expanded
form anyway), hence this request for help.
Thanks in advance for your help,
-Saar

The following code shows the issue: the first three trees are the same, but
the next two (uneven weights, with minsplit set to 15 and to 45) turn out
differently. Only the last one, with minsplit=30, matches again:


## playing with rpart weights
library(rpart)  # fails loudly if rpart is not installed
dev.new()
par(mfrow=c(2,3), xpd=NA)
data(kyphosis)

fitOriginal <- rpart(Kyphosis ~ Age + Number + Start, data=kyphosis,
control=rpart.control(minsplit=15))
plot(fitOriginal)
text(fitOriginal, use.n=TRUE)

# this dataset is the original data repeated 3 times
kyphosisRepeated <- rbind(kyphosis, kyphosis, kyphosis)
fitRepeated <- rpart(Kyphosis ~ Age + Number + Start,
data=kyphosisRepeated, control=rpart.control(minsplit=45))
plot(fitRepeated)
text(fitRepeated, use.n=TRUE)

# instead of repeating, use weights
kyphosisWeighted <- kyphosis
kyphosisWeighted$myWeights <- 3
## minsplit has to be adjusted for weights...
fitWeighted <- rpart(Kyphosis ~ Age + Number + Start,
data=kyphosisWeighted, weights=myWeights,
    control=rpart.control(minsplit=15))
plot(fitWeighted)
text(fitWeighted, use.n=TRUE)

# uneven weights don't work the same way
kyphosisUnevenWeights <- rbind(kyphosis, kyphosis)
# first copy of the data gets weight 1, second copy gets weight 2
kyphosisUnevenWeights$myWeights <- rep(c(1, 2), each=nrow(kyphosis))

fitUneven15 <- rpart(Kyphosis ~ Age + Number + Start,
data=kyphosisUnevenWeights, weights=myWeights,
    control=rpart.control(minsplit=15))
plot(fitUneven15)
text(fitUneven15, use.n=TRUE)

fitUneven45 <- rpart(Kyphosis ~ Age + Number + Start,
data=kyphosisUnevenWeights, weights=myWeights,
    control=rpart.control(minsplit=45))
plot(fitUneven45)
text(fitUneven45, use.n=TRUE)

## 30 works, but seems like a special case
fitUneven30 <- rpart(Kyphosis ~ Age + Number + Start,
data=kyphosisUnevenWeights, weights=myWeights,
    control=rpart.control(minsplit=30))
plot(fitUneven30)
text(fitUneven30, use.n=TRUE)
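
For reference, the expansion workaround mentioned above can be sketched
roughly as follows. This is only a sketch: it assumes the weights are
non-negative integers, and expandByWeights is a hypothetical helper name of
my own, not an rpart function. It also prints the root node's row count next
to its summed weight, to show the two quantities that diverge.

```r
library(rpart)

# Hypothetical helper (not part of rpart): repeat each row as many times as
# its integer weight, then drop the weight column.
expandByWeights <- function(df, weightCol) {
  w <- df[[weightCol]]
  stopifnot(all(w >= 0), all(w == round(w)))  # integer weights assumed
  expanded <- df[rep(seq_len(nrow(df)), times = w), , drop = FALSE]
  expanded[[weightCol]] <- NULL
  rownames(expanded) <- NULL
  expanded
}

data(kyphosis)
kyphosisUnevenWeights <- rbind(kyphosis, kyphosis)
kyphosisUnevenWeights$myWeights <- rep(c(1, 2), each = nrow(kyphosis))

# Expanded data: one row per (weighted) observation
kyphosisExpanded <- expandByWeights(kyphosisUnevenWeights, "myWeights")
fitExpanded <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosisExpanded,
                     control = rpart.control(minsplit = 45))

# Weighted fit on the unexpanded data, for comparison
fitUneven45 <- rpart(Kyphosis ~ Age + Number + Start,
                     data = kyphosisUnevenWeights, weights = myWeights,
                     control = rpart.control(minsplit = 45))

# Root node: number of rows (n) versus sum of weights (wt)
rootCounts <- fitUneven45$frame[1, c("n", "wt")]
print(rootCounts)
```

The expanded fit is expected to match fitRepeated, but the expanded data has
three times as many rows as kyphosis, which is exactly the memory cost I
would like to avoid.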


______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
