Thanks Max and Dennis. Based on the syntax change I got the result for the PCA part also.

training2 <- training[,grepl("^IL",names(training))]
preProc <- preProcess(training2,method="pca",thresh=0.8)
test2 <- testing[,grepl("^IL",names(testing))]
trainpca <- predict(preProc, training2)
testpca <- predict(preProc, test2)
modelFitpca <- train(training1$diagnosis ~ .,method="glm",data=trainpca)
confusionMatrix(test1$diagnosis,predict(modelFitpca, testpca))

Mohan

On Thu, Sep 18, 2014 at 12:43 PM, Mohan Radhakrishnan <
radhakrishnan.mo...@gmail.com> wrote:

> Oh. I understand now. There is nothing wrong with the logic. It is the
> syntax.
>
> > library(AppliedPredictiveModeling)
> Warning message:
> package ‘AppliedPredictiveModeling’ was built under R version 3.1.1
> > set.seed(3433)
> > data(AlzheimerDisease)
> > adData = data.frame(diagnosis,predictors)
> > inTrain = createDataPartition(adData$diagnosis, p = 3/4)[[1]]
> > training = adData[ inTrain,]
> > testing = adData[-inTrain,]
> > training1 <- training[,grepl("^IL|^diagnosis",names(training))]
> > test1 <- testing[,grepl("^IL|^diagnosis",names(testing))]
> > modelFit <- train(diagnosis ~ .,method="glm",data=training1)
> > confusionMatrix(test1$diagnosis,predict(modelFit, test1))
>
> Confusion Matrix and Statistics
>
>           Reference
> Prediction Impaired Control
>   Impaired        2      20
>   Control         9      51
>
>                Accuracy : 0.6463
>                  95% CI : (0.533, 0.7488)
>     No Information Rate : 0.8659
>     P-Value [Acc > NIR] : 1.00000
>
>                   Kappa : -0.0702
>  Mcnemar's Test P-Value : 0.06332
>
>             Sensitivity : 0.18182
>             Specificity : 0.71831
>          Pos Pred Value : 0.09091
>          Neg Pred Value : 0.85000
>              Prevalence : 0.13415
>          Detection Rate : 0.02439
>    Detection Prevalence : 0.26829
>       Balanced Accuracy : 0.45006
>
>        'Positive' Class : Impaired
>
> Thanks,
> Mohan
>
> On Thu, Sep 18, 2014 at 12:21 AM, Max Kuhn <mxk...@gmail.com> wrote:
>
>> You have not shown all of your code and it is difficult to diagnose the
>> issue.
>>
>> I assume that you are using the data from:
>>
>>    library(AppliedPredictiveModeling)
>>    data(AlzheimerDisease)
>>
>> If so, there is example code to analyze these data in that package. See
>> ?scriptLocation.
>>
>> We have no idea how you got to the `training` object (package versions
>> would be nice too).
>>
>> I suspect that Dennis is correct. Try using more normal syntax without
>> the $ indexing in the formula. I wouldn't say it is (absolutely) wrong but
>> it doesn't look right either.
>>
>> Max
>>
>>
>> On Wed, Sep 17, 2014 at 2:04 PM, Mohan Radhakrishnan <
>> radhakrishnan.mo...@gmail.com> wrote:
>>
>>> Hi Dennis,
>>>
>>> Why is there that warning? I think my syntax is right. Isn't it?
>>> So can the warning be ignored?
>>>
>>> Thanks,
>>> Mohan
>>>
>>> On Wed, Sep 17, 2014 at 9:48 PM, Dennis Murphy <djmu...@gmail.com>
>>> wrote:
>>>
>>> > No reproducible example (i.e., no data) supplied, but the following
>>> > should work in general, so I'm presuming this maps to the caret
>>> > package as well. Thoroughly untested.
>>> >
>>> > library(caret)   # something you failed to mention
>>> >
>>> > ...
>>> > modelFit <- train(diagnosis ~ ., data = training1)   # presumably a logistic regression
>>> > confusionMatrix(test1$diagnosis,
>>> >                 predict(modelFit, newdata = test1, type = "response"))
>>> >
>>> > For GLMs, there are several types of possible predictions. The default
>>> > is 'link', which associates with the linear predictor. caret may have
>>> > a different syntax so you should check its help pages re the supported
>>> > predict methods.
>>> >
>>> > Hint: If a function takes a data = argument, you don't need to specify
>>> > the variables as components of the data frame - the variable names are
>>> > sufficient. You should also do some reading to understand why the
>>> > model formula I used is correct if you're modeling one variable as
>>> > response and all others in the data frame as covariates.
>>> >
>>> > Dennis
>>> >
>>> > On Tue, Sep 16, 2014 at 11:15 PM, Mohan Radhakrishnan
>>> > <radhakrishnan.mo...@gmail.com> wrote:
>>> > > I answered this question, which was part of the online course,
>>> > > correctly by executing some commands and guessing.
>>> > >
>>> > > But I didn't get the gist of this approach though my R code works.
>>> > >
>>> > > I have a training and test dataset.
>>> > >
>>> > >> nrow(training)
>>> > > [1] 251
>>> > >
>>> > >> nrow(testing)
>>> > > [1] 82
>>> > >
>>> > >> head(training1)
>>> > >    diagnosis    IL_11    IL_13    IL_16   IL_17E IL_1alpha      IL_3     IL_4
>>> > > 6   Impaired 6.103215 1.282549 2.671032 3.637051 -8.180721 -3.863233 1.208960
>>> > > 10  Impaired 4.593226 1.269463 3.476091 3.637051 -7.369791 -4.017384 1.808289
>>> > > 11  Impaired 6.919778 1.274133 2.154845 4.749337 -7.849364 -4.509860 1.568616
>>> > > 12  Impaired 3.218759 1.286356 3.593860 3.867347 -8.047190 -3.575551 1.916923
>>> > > 13  Impaired 4.102821 1.274133 2.876338 5.731246 -7.849364 -4.509860 1.808289
>>> > > 16  Impaired 4.360856 1.278484 2.776394 5.170380 -7.662778 -4.017384 1.547563
>>> > >          IL_5       IL_6 IL_6_Receptor     IL_7     IL_8
>>> > > 6  -0.4004776  0.1856864   -0.51727788 2.776394 1.708270
>>> > > 10  0.1823216 -1.5342758    0.09668586 2.154845 1.701858
>>> > > 11  0.1823216 -1.0965412    0.35404039 2.924466 1.719944
>>> > > 12  0.3364722 -0.3987186    0.09668586 2.924466 1.675557
>>> > > 13  0.0000000  0.4223589   -0.53219115 1.564217 1.691393
>>> > > 16  0.2623643  0.4223589    0.18739989 1.269636 1.705116
>>> > >
>>> > > The testing dataset is similar, with 13 columns. The number of rows varies.
>>> > >
>>> > > training1 <- training[,grepl("^IL|^diagnosis",names(training))]
>>> > >
>>> > > test1 <- testing[,grepl("^IL|^diagnosis",names(testing))]
>>> > >
>>> > > modelFit <- train(training1$diagnosis ~ training1$IL_11 + training1$IL_13 +
>>> > >     training1$IL_16 + training1$IL_17E + training1$IL_1alpha + training1$IL_3 +
>>> > >     training1$IL_4 + training1$IL_5 + training1$IL_6 + training1$IL_6_Receptor +
>>> > >     training1$IL_7 + training1$IL_8,method="glm",data=training1)
>>> > >
>>> > > confusionMatrix(test1$diagnosis,predict(modelFit, test1))
>>> > >
>>> > > I get this error when I run the above command to get the confusion
>>> > > matrix.
>>> > >
>>> > > 'newdata' had 82 rows but variables found have 251 rows
>>> > >
>>> > > I thought this was simple. I train a model using the training
>>> > > dataset and predict using the test dataset and get the accuracy.
>>> > >
>>> > > Am I missing the obvious here?
>>> > >
>>> > > Thanks,
>>> > >
>>> > > Mohan
>>> > >
>>> > >         [[alternative HTML version deleted]]
>>> > >
>>> > > ______________________________________________
>>> > > R-help@r-project.org mailing list
>>> > > https://stat.ethz.ch/mailman/listinfo/r-help
>>> > > PLEASE do read the posting guide
>>> > > http://www.R-project.org/posting-guide.html
>>> > > and provide commented, minimal, self-contained, reproducible code.
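
The 'newdata' message discussed in this thread can be reproduced without caret. What follows is only a minimal sketch, assuming base R's glm() behaves like the model fitted through train() in this respect; the data are made up, and the names train_df, test_df, bad, good, n_bad and n_good are hypothetical, not taken from the original posts. It contrasts a formula that hard-codes `train_df$...` with one that uses bare column names.

```r
# Hypothetical toy data; NOT the AlzheimerDisease data used in the thread.
set.seed(1)
train_df <- data.frame(y = rbinom(20, 1, 0.5), x = rnorm(20))
test_df  <- data.frame(x = rnorm(5))

# Formula hard-codes train_df$..., so predict() cannot substitute newdata:
# the term `train_df$x` is always looked up in the training data frame.
bad <- glm(train_df$y ~ train_df$x, family = binomial, data = train_df)

# Formula uses bare column names, resolved in whichever data frame is supplied.
good <- glm(y ~ x, family = binomial, data = train_df)

# The first predict() call warns that 'newdata' had 5 rows but the variables
# found have 20 rows; the warning is suppressed here to keep the output short.
n_bad  <- length(suppressWarnings(predict(bad, newdata = test_df)))
n_good <- length(predict(good, newdata = test_df))

c(bad = n_bad, good = n_good)
```

With the `$` formula the number of predictions matches nrow(train_df) rather than nrow(test_df), which is why comparing them against the test-set labels fails; the bare-name formula returns one prediction per test row.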