I am trying to use tree to partition a data set. The data set has 3924 observations. Partitioning seems to work for small subsets of the data, but when I use the entire data set, no partitioning occurs. The variables are:
RESP      respondent to a survey (0 = not a respondent, 1 = respondent)
AGE_P     Age (continuous)
ORIGIN_I  Hispanic Ethnicity (1 = Hispanic, 2 = non-Hispanic)
RACRECI2  Race Recode (1 = White, 2 = Black, 3 = Other)
parents   Parent(s) present in the family (1 = Yes, 2 = No)
educ      Education Recode (1 = HS, GED, or less, 5 = some college, 6 = Bachelor's or AA degree, 9 = Master's & higher)

Here are 2 calls to tree and a snip of summary results:

### Use a sample of 100 ####
> set.seed(331)
> nsize = 100
> sam <- sample(1:nrow(nhis), nsize)
>
> t1 <- tree( RESP ~ AGE_P + ORIGIN_I + RACRECI2 + parents + educ,
+             method = "class",
+             control = tree.control(nobs = nsize, minsize = 10),
+             data = nhis[sam,])
> summary(t1)

Classification tree:
tree(formula = RESP ~ AGE_P + ORIGIN_I + RACRECI2 + parents + educ,
    data = nhis[sam, ], control = tree.control(nobs = nsize, minsize = 10),
    method = "class")
Number of terminal nodes:  13
##### All vars were used

#### Use entire data set ####
> nsize = 3924
> sam <- sample(1:nrow(nhis), nsize)
>
> t1 <- tree( RESP ~ AGE_P + ORIGIN_I + RACRECI2 + parents + educ,
+             method = "class",
+             control = tree.control(nobs = nsize, minsize = 10),
+             data = nhis[sam,])
> summary(t1)

Classification tree:
. . .
Variables actually used in tree construction:
character(0)
##### No vars were used
Number of terminal nodes:  1

It doesn't matter whether I use the categorical vars as factors or not; I still get the same results. As I increase the subsample from 100 incrementally up to 1200, fewer vars are used in tree construction. At 1200 the point is reached where none are used.

Is there a way to force tree to do something with the larger sample sizes and the whole data set?

Package tree version 1.0-26
R 2.6.0
Windows XP, v.5.1, service pack 2

Thanks
Richard Valliant
U of Maryland
US