The index indicates which samples go into the training set for each resample. However, you are using out-of-bag sampling (method = "oob"), so train uses the whole training set and returns the OOB error instead of the error estimates that resampling via the index would produce.
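A minimal sketch of how an index list can drive the resampling, under stated assumptions: the `site` column and its three labels are made up for illustration (iris has no sites), and "cv" is just one resampling method that honours `index`, unlike "oob". The index itself is built in base R; the caret call only runs if caret and randomForest are installed.

```r
# Hypothetical data: iris with a made-up `site` label (A, B or C) per row.
set.seed(1)
dat <- iris
dat$site <- sample(c("A", "B", "C"), nrow(dat), replace = TRUE)

sites <- sort(unique(dat$site))

# One resample per held-out site: each element holds the row numbers
# of the two remaining sites, which are used for fitting.
site_index <- lapply(sites, function(s) which(dat$site != s))
names(site_index) <- paste0("holdout_", sites)

# Rows held out in each resample (the complement of the training rows).
site_holdout <- lapply(site_index, function(i) setdiff(seq_len(nrow(dat)), i))

if (requireNamespace("caret", quietly = TRUE) &&
    requireNamespace("randomForest", quietly = TRUE)) {
  # Any resampling method other than "oob" should use the index list.
  ctrl <- caret::trainControl(method = "cv", index = site_index)
  fit <- caret::train(x = dat[, 1:4], y = dat$Species, method = "rf",
                      ntree = 50, trControl = ctrl)
  print(fit$results)
}
```

With a setup like this, the reported performance is averaged over the three held-out sites. The final model is still refit on all rows, and randomForest draws its own bootstrap sample for each tree, so the inbag matrix is not expected to match the index list.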
Which do you want: OOB estimates or resampled estimates? Based on your previous email, I figured you would have an index list with three sets of sample indices, for sites A+B, sites A+C and sites B+C. That way you would do three resamples: the first fits using data from sites A and B, then predicts on C (and so on). The resampled error estimate would then be the average over the three hold-out sets (actually hold-out sites). OOB error doesn't sound like what you want.

Max

On Tue, Jul 27, 2010 at 2:46 PM, Coll <gbco...@gmail.com> wrote:
>
> Thanks for all the help.
>
> I had tried using the "index" in caret to dictate which rows of the
> sample would be used in building each tree in the RF (e.g. use all data
> from sites A and B for training, hold out all data from site C for testing, etc.)
>
> However, after running, when I cross-checked the "index" that goes to the train
> function against the "inbag" in the resulting randomForest object, I found the
> two didn't match.
>
> As shown below:
>
>> data(iris)
>> tmpIrisIndex <- createDataPartition(iris$Species, p=0.632, times = 10)
>> head(tmpIrisIndex,3)
> [[1]]
>  [1]   1   2   3   7  10  11  12  13  16  18  20  22  24  25  26  27  28  29  31
> [20]  34  35  36  37  38  39  40  41  43  46  47  48  50  52  53  55  56  57  58
> [39]  61  64  65  66  67  68  69  71  74  75  76  77  79  82  83  84  85  86  88
> [58]  90  91  92  94  96  98  99 102 103 104 106 108 109 111 112 113 114 115 116
> [77] 117 119 120 121 123 126 128 129 130 131 132 134 136 139 140 141 143 146 147
> [96] 150
>
> [[2]]
>  [1]   1   3   6   7   8  10  12  13  14  16  18  20  21  22  23  24  26  27  28
> [20]  29  30  32  34  35  36  38  42  44  46  47  48  50  51  53  54  55  58  60
> [39]  61  62  67  68  69  70  72  73  74  76  77  79  81  82  83  85  86  88  89
> [58]  90  92  93  95  97  99 100 103 104 105 107 108 109 111 112 113 114 117 119
> [77] 120 121 122 123 124 125 127 130 132 133 134 135 137 139 140 141 142 145 147
> [96] 149
>
> [[3]]
>  [1]   1   5   7   9  10  11  12  14  18  20  21  22  23  24  26  29  30  31  33
> [20]  34  35  36  37  38  39  40  44  45  46  47  48  49  51  52  53  54  56  58
> [39]  61  63  65  66  69  70  72  74  75  76  77  78  79  80  82  83  85  86  87
> [58]  90  91  92  93  94  98 100 102 103 105 106 107 109 110 113 114 115 116 117
> [77] 121 122 123 124 125 128 129 130 131 132 133 134 135 138 139 140 141 142 146
> [96] 150
>
>> irisTrControl <- trainControl(method = "oob", index = tmpIrisIndex)
>> rf.iris.obj <-train(Species~., data= iris, method = "rf", ntree = 10,
>> keep.inbag = TRUE, trControl = irisTrControl)
> Fitting: mtry=2
> Fitting: mtry=3
> Fitting: mtry=4
>> head(rf.iris.obj$finalModel$inbag,20)
>       [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
>  [1,]    1    0    1    0    0    0    1    0    1     1
>  [2,]    1    1    1    1    1    0    1    0    1     0
>  [3,]    1    1    1    0    0    1    1    0    0     0
>  [4,]    1    0    1    0    1    1    0    1    0     1
>  [5,]    0    1    1    1    1    1    0    1    0     1
>  [6,]    1    1    0    1    0    0    1    1    1     0
>  [7,]    1    1    0    0    1    1    0    0    0     0
>  [8,]    1    1    1    1    1    0    1    1    1     1
>  [9,]    1    1    0    1    0    1    0    1    1     0
> [10,]    1    1    1    0    1    1    0    0    0     1
> [11,]    1    1    1    1    1    1    1    0    1     0
> [12,]    1    1    1    1    1    0    1    0    1     1
> [13,]    1    0    1    1    1    1    1    1    0     1
> [14,]    0    1    1    1    0    1    0    0    0     0
> [15,]    1    1    1    1    1    1    1    1    1     0
> [16,]    1    1    0    0    0    0    1    0    1     1
> [17,]    1    0    1    0    0    0    1    1    0     1
> [18,]    1    0    1    1    1    1    1    1    1     1
> [19,]    1    0    1    0    1    1    1    0    1     1
> [20,]    1    0    1    0    1    1    1    0    1     0
>
> My understanding is that the 1st tree in the RF should be built with
> tmpIrisIndex[1], i.e. "1 2 3 7 10 11 12 13 ..."?
> But the inbag in the resulting forest shows it is using "1 2 3 4 6 7 8
> 9..." for the 1st tree?
>
> Why does the index passed to train not match what I got from inbag in the rf
> object? Or did I look in the wrong place to check this?
>
> Any help / comments would be appreciated. Thanks a lot.
>
> Regards,
> Coll
>
> --
> View this message in context:
> http://r.789695.n4.nabble.com/Random-Forest-Strata-tp2295731p2303958.html
> Sent from the R help mailing list archive at Nabble.com.
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

--
Max