Hi Amanda,

Sorry for the slow response (classes and research have been chaotic). Below are details on what I looked at, with a few suggestions at the end for what you can do.
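The short version is that `[` drops the dimensions of a one-row matrix by default, and the summary code then tries to index a plain vector by column. Here is a minimal sketch of just that behavior. The object yv is a hand-built stand-in (a made-up name, with values chosen to mimic a one-node classification tree), not something pulled out of rpart:

```r
## Hypothetical stand-in for the yval2 matrix of a one-node tree:
## fitted class, two class counts, two class probabilities.
yv <- matrix(c(1, 17, 3, 0.85, 0.15), nrow = 1)
rows <- 1

dropped <- yv[rows, , drop = TRUE]   # dimensions dropped: a plain numeric vector
kept    <- yv[rows, , drop = FALSE]  # dimensions kept: still a 1 x 5 matrix

is.null(dim(dropped))  # TRUE
dim(kept)              # 1 5

## Column indexing only works while the object is still a matrix
kept[, 1]

## On the dropped vector the same indexing fails; try() captures the error
res <- try(dropped[, 1], silent = TRUE)
inherits(res, "try-error")  # TRUE
```

With more than one row selected, both forms return a matrix, which is presumably why the problem only shows up for single-node trees.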
To the general R community: summary.rpart() makes explicit the default dropping behavior of `[`, which makes me think that it may be important, but it seems to cause problems in the case of only one node, because a 1 x k matrix is passed which, when the dimensions are dropped, results in a vector. Could this be changed to drop = FALSE (fixing the case for one node) without causing problems for other models?

Cheers,

Josh

## Read in example data
trial <- structure(list(ENROLL_YN = structure(c(1L, 1L, 1L, 1L, 2L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 1L, 1L, 2L, 1L), .Label = c("N",
"Y"), class = "factor"), MINORITY = c(0L, 0L, 1L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 0L)),
.Names = c("ENROLL_YN", "MINORITY"), class = "data.frame",
row.names = c(8566L, 7657L, 3155L, 6429L, 8651L, 7973L, 6L, 5865L,
5878L, 5037L, 6950L, 9139L, 960L, 3058L, 7979L, 2465L, 4231L, 1529L,
7500L, 8248L))

require(rpart)

## fit the model
## no errors, suggesting the problem is not here
m <- rpart(ENROLL_YN ~ MINORITY, data = trial, method = "class")

## this throws an error
## makes me think that either some summary information
## or the print/show methods are the cause
summary(m)

## look at the class of the model object
class(m)

## look at the methods for summary
methods(summary)

## poke in the source code for summary.rpart
## (note non exported function so using :::)
rpart:::summary.rpart

## we already know from your traceback() output the code to look for
## x$functions$summary
## looking at the summary.rpart source
## x is the model object
## so....
m$functions$summary

## yval, the first argument, evidently needs at least two dimensions
## and at least 2 columns

## back at the summary.rpart code, it looks like what is getting passed is
## else tprint <- x$functions$summary(ff$yval2[rows, , drop = TRUE],
##     ff$dev[rows], ff$wt[rows], ylevel, digits)

## so what is ff? it is defined earlier as x$frame (where x is the model object)
m$frame$yval2
## is a 1 x 5 matrix

## look what happens when we select all of it with drop = TRUE
m$frame$yval2[, , drop = TRUE]

## looking now at ?rpart.object where we learn that the frame element contains:
##   Extra response information is in 'yval2', which contains the
##   number of events at the node (poisson), or a matrix
##   containing the fitted class, the class counts for each node
##   and the class probabilities (classification).  Also included
##   in the frame are 'complexity', the complexity parameter at
##   which this split will collapse, 'ncompete', the number of
##   competitor splits retained, and 'nsurrogate', the number of
##   surrogate splits retained.

## basically, the issue is, your model (at least in the example data) only has 1 node
## so the matrix has 1 row, and when drop = TRUE, this reduces yval2 to a vector
## which causes problems for the summary methods

## I am not familiar enough with rpart to say if this is as it should be
## or if perhaps a modification is in order

## for here and now, you can either just not use summary(),
## find a way to get more nodes,
## or create a copy of rpart:::summary.rpart where you change drop = TRUE to drop = FALSE
## around line 57 of the function.
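If you would rather not edit the copy by hand, here is a rough sketch of making the same swap programmatically. It is only a sketch under two assumptions: that the installed rpart still has the literal text drop = TRUE in summary.rpart, and that deparse() keeps that text on a single line. If either fails, gsub() changes nothing and you simply get an unmodified copy. It also replaces every occurrence rather than only the one in question, so check the result before relying on it.

```r
library(rpart)

## Turn the unexported method into text, one element per line of code
src <- deparse(rpart:::summary.rpart)

## Swap the literal argument; fixed = TRUE treats the pattern as plain text
patched <- gsub("drop = TRUE", "drop = FALSE", src, fixed = TRUE)

## Re-create the function inside the rpart namespace so that the
## internal helpers it calls are still found
rpartSummary2 <- eval(parse(text = patched), envir = asNamespace("rpart"))
```

After this, rpartSummary2(m) can be called on the fitted model in place of summary(m).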
## Call it something new (like rpartSummary2)
## then rpartSummary2(m) and it will work
## I did this and got:
## > rpartSummary2(m)
## Call:
## rpart(formula = ENROLL_YN ~ MINORITY, data = trial, method = "class")
##   n= 20
##
##     CP nsplit rel error
## 1 0.01      0         1
##
## Node number 1: 20 observations
##   predicted class=N  expected loss=0.15
##     class counts:    17     3
##    probabilities: 0.850 0.150

On Wed, Jan 11, 2012 at 1:31 PM, Amanda Marie Elling <ell...@stolaf.edu> wrote:
> Hi Josh,
> Thanks for getting back to us so fast!!
> We created a subset of 20 cases and still ran into the same issue, I have
> copied the code below along with the dput() and traceback() outputs.
>
>> trial=accept.students.n08[sample(1:5000,20),]
>> dput(trial[, c("ENROLL_YN", "MINORITY")])
> structure(list(ENROLL_YN = structure(c(1L, 1L, 1L, 1L, 2L, 1L,
> 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 1L, 1L, 2L, 1L), .Label = c("N",
> "Y"), class = "factor"), MINORITY = c(0L, 0L, 1L, 0L, 0L, 0L,
> 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 0L)), .Names =
> c("ENROLL_YN",
> "MINORITY"), class = "data.frame", row.names = c(8566L, 7657L,
> 3155L, 6429L, 8651L, 7973L, 6L, 5865L, 5878L, 5037L, 6950L, 9139L,
> 960L, 3058L, 7979L, 2465L, 4231L, 1529L, 7500L, 8248L))
>> fit_rpart2=rpart(trial$ENROLL_YN~trial$MINORITY, method="class")
>> summary(fit_rpart2)
> Call:
> rpart(formula = trial$ENROLL_YN ~ trial$MINORITY, method = "class")
>   n= 20
>
>     CP nsplit rel error
> 1 0.01      0         1
>
> Error in yval[, 1] : incorrect number of dimensions
>> traceback()
> 3: x$functions$summary(ff$yval2[rows, , drop = TRUE], ff$dev[rows],
>        ff$wt[rows], ylevel, digits)
> 2: summary.rpart(fit_rpart2)
> 1: summary(fit_rpart2)
>
>> data.frame(trial$MINORITY,trial$ENROLL_YN)
>    trial.MINORITY trial.ENROLL_YN
> 1               0               N
> 2               0               N
> 3               1               N
> 4               0               N
> 5               0               Y
> 6               0               N
> 7               0               N
> 8               0               N
> 9               0               N
> 10              0               N
> 11              1               N
> 12              0               N
> 13              0               N
> 14              0               Y
> 15              0               N
> 16              0               N
> 17              0               N
> 18              1               N
> 19              1               Y
> 20              0               N
>
> We are still unsure what the error is referring to.
> Thoughts?? Let us know
> if you need anything else. Thanks so much for your help!
>
> Amanda
>
>
> On Sun, Jan 8, 2012 at 7:41 PM, Joshua Wiley <jwiley.ps...@gmail.com> wrote:
>>
>> Hi Amanda,
>>
>> Can you reproduce the error with a small subset of the data? If so,
>> could you send it to us? For instance if say 20 cases is sufficient,
>> you could send the output of dput() which pastes easily into the
>> console:
>>
>> dput(yourdata[, c("ENROLL_YN", "MINORITY")])
>>
>> You could also try calling traceback() after the error to get a bit
>> more diagnostics (and post those if they do not make any sense or help
>> you).
>>
>> Hope this helps,
>>
>> Josh
>>
>> On Sun, Jan 8, 2012 at 1:48 PM, Amanda Marie Elling <ell...@stolaf.edu>
>> wrote:
>> > We are trying to make a decision tree using rpart and we are continually
>> > running into the following error:
>> >
>> >> fit_rpart=rpart(ENROLL_YN~MINORITY,method="class")
>> >> summary(fit_rpart)
>> > Call:
>> > rpart(formula = ENROLL_YN ~ MINORITY, method = "class")
>> >   n= 5725
>> >
>> >   CP nsplit rel error
>> > 1  0      0         1
>> > Error in yval[, 1] : incorrect number of dimensions
>> >
>> > ENROLL_YN is a categorical variable with two options- yes or no.
>> > MINORITY is also a categorical variable with two options- 0 or 1.
>> >
>> > We have confirmed that all variables are the same length and there are
>> > no NAs.
>> >
>> > Does anyone have any ideas that might help?? All thoughts would be
>> > appreciated, thanks!
>> >
>> >         [[alternative HTML version deleted]]
>> >
>> > ______________________________________________
>> > R-help@r-project.org mailing list
>> > https://stat.ethz.ch/mailman/listinfo/r-help
>> > PLEASE do read the posting guide
>> > http://www.R-project.org/posting-guide.html
>> > and provide commented, minimal, self-contained, reproducible code.
>>
>>
>>
>> --
>> Joshua Wiley
>> Ph.D. Student, Health Psychology
>> Programmer Analyst II, Statistical Consulting Group
>> University of California, Los Angeles
>> https://joshuawiley.com/
>
>

--
Joshua Wiley
Ph.D. Student, Health Psychology
Programmer Analyst II, Statistical Consulting Group
University of California, Los Angeles
https://joshuawiley.com/