[R] How can I understand this sentence, and express it by means of a mathematical approach?
This topic concerns reducing the number of independent variables. As we know, many methods can do this; however, as a pre-processing step for the independent variables, a rule like the sentence below can already remove many of them. How should I understand it? What does "significant correlation at the 5% level" mean -- what is the criterion, a p-value or something else?

"Independent variables whose correlation with the response variable was not significant at 5% level were removed"

How can I calculate the correlation between them? Thank you!
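A minimal sketch (not from the original thread) of one common reading of that rule: for each independent variable, test its Pearson correlation with the response using cor.test(), which returns a p-value for H0: correlation = 0; "not significant at the 5% level" then means p >= 0.05, and such columns are dropped. The names x and y below are hypothetical stand-ins for the poster's data.

# Hypothetical data: x is a data frame of predictors, y the response
set.seed(1)
x <- as.data.frame(matrix(rnorm(60 * 5), 60, 5))
y <- rnorm(60)

# p-value of the test H0: cor(x[, j], y) == 0, for every column j
pvals <- sapply(x, function(col) cor.test(col, y)$p.value)

# keep only columns significantly correlated with y at the 5% level
x.kept <- x[, pvals < 0.05, drop = FALSE]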
[R] caret package, how can I deal with RFE+SVM wrong message?
Hello, I am learning the caret package, and I want to use RFE to reduce the number of features. I want to use RFE coupled with random forest (RFE+RF) for this task. As we know, there are a number of pre-defined sets of functions, like random forest (rfFuncs); however, I want to tune the parameter (mtry) during RFE, so I wrote the code below, but I get an error message. How can I deal with it?

> rfGrid <- expand.grid(.mtry=c(1:2))
> rfectrl <- rfeControl(functions=caretFuncs, method="cv", verbose=F, returnResamp="final", number=10)
> subsets <- c(3,4)
> set.seed(2)
> rf.RFE <- rfe(trx, try, sizes=subsets, rfeControl=rfectrl, method="rf", tuneGrid=rfGrid)
Loading required package: class
Attaching package: 'class'
The following object(s) are masked from package:reshape : condense
Fitting: mtry=1
Fitting: mtry=2
Error in varImp.randomForest(object$finalModel, ...) : subscript out of bounds
In addition: Warning message: package 'e1071' was built under R version 2.10.1

At the same time, if I want to use RFE+SVM, RFE+nnet, and so on, how can I do that? I have tried RFE+SVM and also get an error message:

> set.seed(1)
> svmProfile <- rfe(trx, try, sizes=c(1:3),
+   rfeControl=rfeControl(functions=caretFuncs, method="cv",
+   verbose=F, returnResamp="final", number=10),
+   method="svmRadial", tuneLength=5)
Fitting: sigma=0.009246713, C=0.1
Fitting: sigma=0.009246713, C=1
Fitting: sigma=0.009246713, C=10
Fitting: sigma=0.009246713, C=100
Fitting: sigma=0.009246713, C=1000
Error in rfeControl$functions$rank(fitObject, .x, y) : need importance columns for each class

Thank you!
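A minimal sketch (not from the original thread, and not a diagnosis of the errors above) of an RFE+RF baseline that uses caret's predefined rfFuncs helpers, whose ranking is tied to random forest's own variable importance. The objects trx and try below are hypothetical stand-ins for the poster's predictor data frame and response vector.

# Hypothetical stand-ins for the poster's data
library(caret)
library(randomForest)
set.seed(2)
trx <- data.frame(matrix(rnorm(50 * 6), 50, 6))
try <- rnorm(50)

ctrl <- rfeControl(functions = rfFuncs, method = "cv", number = 10)
rfProfile <- rfe(trx, try, sizes = c(3, 4), rfeControl = ctrl)
rfProfile               # resampled performance for each subset size
predictors(rfProfile)   # the variables retained in the selected subset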
[R] how can I plot the histogram like this using R?
I want to get a plot like this one, http://n4.nabble.com/file/n1839303/%25E9%25A2%2591%25E7%258E%2587%25E5%2588%2586%25E5%25B8%2583%25E5%259B%25BE%25E6%25A0%2587%25E5%2587%2586.jpg (the filename means "standard frequency-distribution plot"), not like this one: http://n4.nabble.com/file/n1839303/R.jpg. The data are here: http://n4.nabble.com/file/n1839303/y1.txt. Can R deal with this problem? How can I do it? Thank you!
Re: [R] how can I plot the histogram like this using R?
Thanks for your help. I will give it a try.
Re: [R] how can I plot the histogram like this using R?
Thank you, I will try the barplot function.
Re: [R] how can I plot the histogram like this using R?
Thanks for your reply. I just want to get a figure like y1.jpg using the data from y1.txt. Through the figure I want to find the split point, as in y1.jpg, and take 2.5 as the split point. That figure was drawn by someone else; I just want to draw it in R, but I cannot, so I hope friends here can help me. Best wishes! kevin

http://n4.nabble.com/file/n1965378/y1.jpg
http://n4.nabble.com/file/n1965378/y1.txt
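A minimal sketch (not from the original thread) of a frequency histogram with a marked split point. The y1.txt data are not reproduced here, so a made-up numeric vector stands in; only base R is used.

# Made-up stand-in for the y1.txt values
set.seed(1)
y1 <- c(rnorm(100, mean = 1.5, sd = 0.5), rnorm(40, mean = 3.5, sd = 0.5))

hist(y1, breaks = 20, col = "grey",
     main = "Frequency distribution", xlab = "y1")
abline(v = 2.5, lty = 2)   # mark a candidate split point between the two modes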
Re: [R] how can I plot the histogram like this using R?
Thanks, it works now!
[R] data frame is killing me! help
Usage: data(gasoline)
Format: A data frame with 60 observations on the following 2 variables.
octane: a numeric vector. The octane number.
NIR: a matrix with 401 columns. The NIR spectrum.

And when I look at the gasoline data I see this:

  NIR.1686 nm NIR.1688 nm NIR.1690 nm NIR.1692 nm NIR.1694 nm NIR.1696 nm NIR.1698 nm NIR.1700 nm
1 1.242645 1.250789 1.246626 1.250985 1.264189 1.244678 1.245913 1.221135
2 1.189116 1.223242 1.253306 1.282889 1.215065 1.225211 1.227985 1.198851
3 1.198287 1.237383 1.260979 1.276677 1.218871 1.223132 1.230321 1.208742
4 1.201066 1.233299 1.262966 1.272709 1.211068 1.215044 1.232655 1.206696
5 1.259616 1.273713 1.296524 1.299507 1.226448 1.230718 1.232864 1.202926
6 1.24109 1.262138 1.288401 1.291118 1.229769 1.227615 1.22763 1.207576
7 1.245143 1.265648 1.274731 1.292441 1.218317 1.218147 1.73 1.200446
8 1.222581 1.245782 1.26002 1.290305 1.221264 1.220265 1.227947 1.188174
9 1.234969 1.251559 1.272416 1.287405 1.211995 1.213263 1.215883 1.196102

Note the column names: NIR.1686 nm, NIR.1688 nm, NIR.1690 nm, and so on. How can I add the letters "NIR" to my own variables? My 600 independent variables do not have NIR as a prefix, but it is needed to fit the plsr model, for example aa = plsr(y ~ NIR, data = data, ...); the prefix NIR is necessary. How can I do this?
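A minimal sketch (not from the original thread) of the point that comes up later in this thread: gasoline$NIR is a single matrix column created with I(), so an analogous frame can be built from one's own spectra. The names x and y below are hypothetical stand-ins for the poster's 60 x 600 descriptor matrix and response.

# Hypothetical stand-ins for the poster's data
library(pls)
set.seed(1)
x <- matrix(rnorm(60 * 10), 60, 10)                 # pretend these are the descriptors
colnames(x) <- paste("NIR", seq_len(ncol(x)), sep = ".")
y <- rnorm(60)

mydata <- data.frame(octane = y, NIR = I(x))         # I() keeps x as one matrix column
fit <- plsr(octane ~ NIR, ncomp = 5, data = mydata, validation = "CV")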
Re: [R] data frame is killing me! help
Steve Lianoglou-6 wrote:
> Hi,
>
> I'm not really sure that I'm getting you, but if your problem is that
> the column names of your data.frame don't match the variable names
> you'd like to use in your formula, just change the colnames of your
> data.frame to match your formula.
>
> BTW - I have no idea where to get this gasoline data set, so I'm just
> imagining, e.g.:
>
> colnames(gasoline) <- c('put', 'the', 'variable', 'names', 'that',
>                         'you', 'want', 'here')
>
> -steve

Thanks. But the number of independent variables is so large that it is not easy to rename them one by one -- is there a better way?
Re: [R] data frame is killing me! help
I have read that one. I want to apply this method to my own data, but I do not know how to get my data into R in that form.

James W. MacDonald wrote:
> You don't need to identify anything. What you need to do is read the
> help page for the function you want to use, so you (at the very least)
> know how to use the function.
>
> > library(pls)
> > data(gasoline)
> > fit <- plsr(octane ~ NIR, data = gasoline, validation = "CV")
> > summary(fit)
> Data: X dimension: 60 401
> Y dimension: 60 1
> Fit method: kernelpls
> Number of components considered: 53
>
> VALIDATION: RMSEP
> Cross-validated using 10 random segments.
>        (Intercept)  1 comps  2 comps  3 comps  4 comps  5 comps  6 comps
> CV           1.543    1.372   0.3827   0.2522   0.2347   0.2455   0.2281
> adjCV        1.543    1.367   0.3740   0.2497   0.2360   0.2407   0.2243
>        7 comps  8 comps  9 comps  10 comps  11 comps  12 comps  13 comps
> CV      0.2311   0.2352   0.2455    0.2534    0.2737    0.2814    0.2832
> adjCV   0.2257   0.2303   0.2395    0.2473    0.26
Re: [R] data frame is killing me! help
I have tried it; paste() can build the wanted labels, but I cannot get them assigned as the column names. Maybe I need to study this harder.

Don MacQueen wrote:
> Perhaps using paste(). Maybe something like:
>
> paste('NIR', 1:600, sep='.')
> or
> paste('NIR', seq(1686,1700,2), sep='.')
>
> --
> Don MacQueen
> Lawrence Livermore National Laboratory
> Livermore, CA, USA
Re: [R] data frame is killing me! help
Thank you, Don MacQueen. I will try it.
[R] how can I know which package was added or updated?
There are many R packages: yesterday there were 2031, but today there are 2033. How can I know which packages were added or updated?
Re: [R] how can I know which package was added or updated?
That is really neat. Thank you, Gabor Grothendieck. I got it.

Gabor Grothendieck wrote:
> Google for
> CRANberries aggregates
> and check the first hit.
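A small sketch (not from the original thread) of a complementary, purely local check: base R's utils functions compare the installed library against CRAN, so they report what is missing or outdated locally rather than what changed on CRAN on a given day.

# Compare the local library against CRAN (requires internet access)
options(repos = c(CRAN = "https://cran.r-project.org"))
head(new.packages())   # CRAN packages not installed locally
old.packages()         # installed packages for which CRAN has a newer version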
Re: [R] data frame is killing me! help
Thank you, Petr. That is a good, clear answer. Thanks!

Petr Pikal wrote:
> Hi
>
> > data(gasoline)
> > str(gasoline)
> 'data.frame': 60 obs. of 2 variables:
>  $ octane: num 85.3 85.2 88.5 83.4 87.9 ...
>  $ NIR   : AsIs [1:60, 1:401] -0.050193 -0.044227 -0.046867 -0.046705 -0.050859 ...
>   ..- attr(*, "dimnames")=List of 2
>   .. ..$ : chr "1" "2" "3" "4" ...
>   .. ..$ : chr "900 nm" "902 nm" "904 nm" "906 nm" ...
> > str(gasoline$NIR)
>  AsIs [1:60, 1:401] -0.050193 -0.044227 -0.046867 -0.046705 -0.050859 ...
>  - attr(*, "dimnames")=List of 2
>   ..$ : chr [1:60] "1" "2" "3" "4" ...
>   ..$ : chr [1:401] "900 nm" "902 nm" "904 nm" "906 nm" ...
> > is.matrix(gasoline$NIR)
> [1] TRUE
>
> so the second element of the gasoline data frame is a matrix
>
> > ?AsIs
>
> > df <- data.frame(x=1:5, I(matrix(rnorm(10), 5, 2)))
> > df
>   x matrix.rnorm.10...5..2..1 matrix.rnorm.10...5..2..2
> 1 1                  0.187703                  0.213312
> 2 2                 -0.66264                  -0.47941
> 3 3                 -0.82334                  -0.04324
> 4 4                 -0.37255                   0.883027
> 5 5                 -0.28700                  -1.03431
> > str(df)
> 'data.frame': 5 obs. of 2 variables:
>  $ x                      : int 1 2 3 4 5
>  $ matrix.rnorm.10...5..2.: AsIs [1:5, 1:2] 0.187703 -0.66264 -0.82334 -0.37255 -0.28700 ...
>
> Regards
> Petr
[R] how can I convert .csv format to matrix???
On my disk C:/ I have a file a.csv. I want to read it into R; importantly, when I use x = read.csv("C:/a.csv"), x is a data.frame, but I want to turn it into a matrix. How can I do that? Thank you!
Re: [R] how can I convert .csv format to matrix???
Thank you for your help; that works well.

Steven Kang wrote:
> You can try
>
> matrix.x <- as.matrix(x)
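One caveat worth adding (not from the original thread): if the CSV contains any non-numeric column, as.matrix() on the data frame yields a character matrix, so data.matrix() may be closer to what is wanted. The path below is taken from the original post, so this only runs if that file exists.

# Two ways to get a matrix from read.csv() output
x  <- read.csv("C:/a.csv")   # path from the original post
m1 <- as.matrix(x)           # character matrix if any column is non-numeric
m2 <- data.matrix(x)         # coerces every column to numeric (factors become integer codes)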
[R] variable selection---reduce the number of initial variables
Hello, my problem is like this: after pre-processing the variables, 160 independent variables and one dependent y remain. When I use the PLS method with 10 components, a good r2 can be obtained, but I do not know how to express my equation with fewer variables and y. It would be better to use fewer independent variables; that is, how can I select my independent variables? Maybe a GA is a good method, but I have not mastered it yet. Can you suggest other good variable-selection methods? In R, which methods can be used to select the most useful variables, and then use the selected variables to build an equation with higher r2 and q2 and a smaller RMSEP? Thank you!
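A minimal sketch (not from the original thread) of one crude screening heuristic: fit the PLS model with cross-validation, pick a sensible number of components, and rank variables by the absolute size of their regression coefficients on standardized data. X and y below are hypothetical stand-ins for the poster's 160 predictors and response.

# Hypothetical stand-ins for the poster's data
library(pls)
set.seed(1)
X <- matrix(rnorm(72 * 160), 72, 160)
colnames(X) <- paste("x", 1:160, sep = "")
y <- X[, 1] - 2 * X[, 2] + rnorm(72)
dat <- data.frame(y = y, X = I(X))

fit <- plsr(y ~ X, ncomp = 10, data = dat, scale = TRUE, validation = "LOO")
plot(RMSEP(fit))                           # pick the number of components by LOO error
b <- drop(coef(fit, ncomp = 10))           # regression coefficients at 10 components
head(sort(abs(b), decreasing = TRUE), 20)  # a crude shortlist of influential variables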
Re: [R] variable selection---reduce the number of initial variables
Thank you, I can try the Bayesian approach. With the PCA method I have used before I can get some PCs, but I do not know how to bring the original variables back into that equation; maybe I should keep the ones with high weights and delete the ones with low weights. Is that right?

Ricardo Gonçalves Silva wrote:
> Hi,
>
> Nowadays there are a lot of new variable selection methods, especially using the
> Bayes paradigm. For your problem, I think you could try the Bayesian Model Averaging
> (BMA) package. Or, you can reduce your data dimension by PCA, which also lets you see
> the weight of each variable in the PCs.
>
> HTH
>
> Rick
Re: [R] variable selection---reduce the number of initial variables
he number of variables. I've had > a lot of success with it. > > Max > > > 2009/11/5 Ricardo Gonçalves Silva : >> Hi Guys, >> >> Of course, a backward, forward, or other methods can be used directly. >> But >> concerning BMA, the model interpretation is far simple: >> >> "Bayesian Model Averaging accounts for the model uncertainty inherent in >> the >> variable selection problem by averaging over the best models in the model >> class according to approximate posterior model probability." >> >> If you want to learn a few more before continue, that a look at the BMA >> homepage: >> >> http://www2.research.att.com/~volinsky/bma.html >> >> But of course, you must do what you think is better for your problem. >> By the way what is the dimension of your problem? >> >> HTH, >> >> Rick >> -- >> From: "Frank E Harrell Jr" >> Sent: Thursday, November 05, 2009 4:12 PM >> To: "Ricardo Gonçalves Silva" >> Cc: "bbslover" ; >> Subject: Re: [R] variable selectin---reduce the numbers of initial >> variable >> >>> Ricardo Gonçalves Silva wrote: >>>> >>>> Yes, right. But I still prefer using BMA. >>>> Best, >>>> >>>> Rick >>> >>> If you are entertaining only one model family, them BMA is a long, >>> tedious, complex way to obtain shrinkage and the resulting averaged >>> model is very difficult to interpret. Consider a more direct approach. >>> >>> Frank >>> >>>> >>>> -- >>>> From: "bbslover" >>>> Sent: Wednesday, November 04, 2009 11:28 PM >>>> To: >>>> Subject: Re: [R] variable selectin---reduce the numbers of initial >>>> variable >>>> >>>>> >>>>> thank you . I can try bayesian. PCA method that I used to is can get >>>>> some >>>>> pcs, but I donot know how can i use the original variables in that >>>>> equation, >>>>> maybe I should select those have high weight ones,and delete that less >>>>> weight ones. right? >>>>> >>>>> Ricardo Gonçalves Silva wrote: >>>>>> >>>>>> Hi, >>>>>> >>>>>> Nowdays there's a lot o new variable selection methods, specially >>>>>> using >>>>>> the >>>>>> Bayes Paradigm. >>>>>> For your problem, I think you could try the Bayesian Model Average >>>>>> BMA >>>>>> package. >>>>>> Or, you can reduce your data dimension by PCA, which also permits you >>>>>> see >>>>>> the weight of >>>>>> each variable in the PC. >>>>>> >>>>>> HTH >>>>>> >>>>>> Rick >>>>>> >>>>>> -- >>>>>> From: "bbslover" >>>>>> Sent: Wednesday, November 04, 2009 10:23 AM >>>>>> To: >>>>>> Subject: [R] variable selectin---reduce the numbers of initial >>>>>> variable >>>>>> >>>>>>> >>>>>>> hello, >>>>>>> >>>>>>> my problem is like this: now after processing the varibles, the >>>>>>> remaining >>>>>>> 160 varibles(independent) and a dependent y. when I used PLS method, >>>>>>> with >>>>>>> 10 >>>>>>> components, the good r2 can be obtained. but I donot know how can I >>>>>>> express >>>>>>> my equation with the less varibles and the y. It is better to use >>>>>>> less >>>>>>> indepent varibles. that is how can I select my indepent varibles. >>>>>>> Maybe >>>>>>> GA is good method, but now I donot gasp it. and can you give me >>>>>>> more >>>>>>> good >>>>>>> varibles selection's methods. and In R, which method can be used >>>>>>> to >>>>>>> select >>>>>>> the potent varibles . and using the selected varibles to model a >>>>>>>
[R] how can I delete those columns with the same element in every row?
e.g.

a =
  a b c d e
1 1 1 3 1 1
2 1 2 3 4 5
3 1 3 3 8 3
4 1 4 3 3 5
5 1 1 3 1 1

I want to delete column a and column c, because they have the same value in every row. Then I want to get this data.frame:

b =
  b d e
1 1 1 1
2 2 4 5
3 3 8 3
4 4 3 5
5 1 1 1

The following is my code, but it is wrong:

rm(list=ls())
a=c(1,1,1,1,1); b=c(1,2,3,4,1); c=c(3,3,3,3,3); d=c(1,4,8,3,1); e=c(1,5,3,5,1)
data.f=data.frame(a,b,c,d,e)
origin.data<-data.f
dim.frame=dim(data.f)
rn=dim.frame[1]
n<-0
for (k in 1:(dim.frame[2]-n)) {
  if (data.f[1,k]==data.f[rn,k]) {
    data.f<-data.f[,-k]
    n<-n+1
    k<-k-1
  }
}
origin.data
data.f

How can I modify it to obtain the wanted result? Thank you!
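A minimal sketch (not from the original thread) of one way to drop constant columns without an explicit loop, using the same example data as above.

# Keep only columns that take more than one distinct value
data.f <- data.frame(a=c(1,1,1,1,1), b=c(1,2,3,4,1), c=c(3,3,3,3,3),
                     d=c(1,4,8,3,1), e=c(1,5,3,5,1))
keep <- sapply(data.f, function(col) length(unique(col)) > 1)
b.frame <- data.f[, keep]
b.frame   # columns b, d, e remain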
[R] another question: how to delete one of two columns with high correlation (0.95)
My program is below:

a=c(1,2,1,1,1); b=c(1,2,3,4,1); c=c(3,4,3,3,3); d=c(1,2,3,5,1); e=c(1,5,3,5,1)
data.f=data.frame(a,b,c,d,e)
origin.data<-data.f
cor.matrix<-cor(origin.data)
origin.cor<-cor.matrix
m<-0
for(i in 1:(cor.matrix[1]-1)) {
  for(j in (i+1):(cor.matrix[2])) {
    if (cor.matrix[i,j]>=0.95) {
      data.f<-data.f[,-i];
      i<-i+1
    }
  }
}
origin.cor
data.f

The result does not seem to be right:

> origin.cor
           a          b          c          d         e
a  1.000     -0.0857493  1.000     -0.1336306 0.5590170
b -0.0857493  1.000     -0.0857493  0.9854509 0.7669650
c  1.000     -0.0857493  1.000     -0.1336306 0.5590170
d -0.1336306  0.9854509 -0.1336306  1.000     0.7470179
e  0.5590170  0.7669650  0.5590170  0.7470179 1.000

> data.f
  b c d e
1 1 3 1 1
2 2 4 2 5
3 3 3 3 3
4 4 3 5 5
5 1 3 1 1

Either column b or column d should also have been deleted, because they are highly correlated (0.9854509), but neither was. Why?
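A minimal sketch (not from the original thread) of one workable pattern: the loop above indexes the correlation matrix with cor.matrix[1] instead of its dimensions, and reassigning i inside a for loop has no effect in R, so it is easier to first flag which columns to drop and remove them afterwards. The example reuses the data from the post above.

# Flag, then drop, one member of each highly correlated pair
data.f <- data.frame(a=c(1,2,1,1,1), b=c(1,2,3,4,1), c=c(3,4,3,3,3),
                     d=c(1,2,3,5,1), e=c(1,5,3,5,1))
cm <- cor(data.f)
p  <- ncol(cm)
drop.col <- rep(FALSE, p)
for (i in 1:(p - 1)) {
  for (j in (i + 1):p) {
    if (!drop.col[i] && !drop.col[j] && abs(cm[i, j]) >= 0.95) {
      drop.col[i] <- TRUE   # drop one member of the pair (here: the first)
    }
  }
}
reduced <- data.f[, !drop.col]
reduced   # columns c, d, e remain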
Re: [R] another question: how to delete one of two columns with high correlation (0.95)
Thank you. I need to study this; after that, maybe I can understand it well. Thanks, Nikhil.

Nikhil Kaza-2 wrote:
> You need dim(cor.matrix)[1]
>
> Following might be better instead of a loop, to get the row ids of a matrix:
>
> (which(cor.matrix >= 0.95) %/% dim(cor.matrix)[1]) + 1
>
> for column ids use modulus instead of integer division:
>
> (which(cor.matrix >= 0.95) %% dim(cor.matrix)[1])
>
> There are probably better ways than this.
>
> Nikhil
[R] after PCA, the PC values are so large -- is something wrong?
rm(list=ls()) yx.df<-read.csv("c:/MK-2-72.csv",sep=',',header=T,dec='.') dim(yx.df) #get X matrix y<-yx.df[,1] x<-yx.df[,2:643] #conver to matrix mat<-as.matrix(x) #get row number rownum<-nrow(mat) #remove the constant parameters mat1<-mat[,apply(mat,2,function(.col)!(all(.col[1]==.col[2:rownum])))] dim(yx.df) dim(mat1) #remove columns with numbers of zero >0.95 mat2<-mat1[,apply(mat1,2,function(.col)!(sum(.col==0)/rownum>0.95))] dim(yx.df) dim(mat2) #remove colunms that sd<0.5 mat3<-mat2[,apply(mat2,2,function(.col)!all(sd(.col)<0.5))] dim(yx.df) dim(mat3) #PCA analysis mat3.pr<-prcomp(mat3,cor=T) summary(mat3.pr,loading=T) pre.cmp<-predict(mat3.pr) cmp<-pre.cmp[,1:3] cmp DF<-cbind(Y,cmp) DF<-as.data.frame(DF) names(DF)<-c('y','p1','p2','p3') DF summary(lm(y~p1+p2+p3,data=DF)) mat3.pr<-prcomp(DF,cor=T) summary(mat3.pr) pre<-predict(mat3.pr) pre1<-pre[,1:3] pre1 colnames(pre1)<-c("x1","x2","x3") pre1 pc<-cbind(y,pre1) pc<-as.data.frame(pc) lm.pc<-lm(y~x1+x2+x3,data=pc) summary(lm.pc) above, my code about pca, but after finishing it, the first three pcs are some large, why? and the fit value r2 are bad. belowe is my value on the firest 3 pcs. > pre1 PC1 PC2 PC3 [1,] -15181.5190 1944.392700 -1074.326182 [2,] -32152.4533 1007.113729 3201.361408 [3,] -15836.5362 2117.988273 -555.799383 [4,] -1618.5561 1481.020337 255.530132 [5,] -5407.5030 1975.779398 -84.646283 [6,] -9662.1949 2611.220928 -417.435782 [7,] -30488.2102 577.385588 1853.420297 [8,] -2135.2563 -4506.112873 1382.413284 [9,] -1584.2796 -4645.142062 929.146895 [10,] -668.7664 -4876.250486 177.691446 [11,] -2188.5914 -4495.203080 1432.428127 [12,] -19633.9581 2159.000138 -1598.710872 [13,] -26849.1088 -515.574085 -2683.552623 [14,] -9492.9503 -4868.648205 1236.986097 [15,] -13857.6517 -4810.228193 1296.342199 [16,] -11596.5097 -8181.631403 462.913210 [17,] -25948.6564 -746.442386 -3415.426682 [18,] 15386.4477 709.974524 555.160973 [19,] 21642.7516 1163.456075 -609.437740 [20,] 22236.7094 675.562564 -136.992578 [21,] 14354.9927 611.996274-4.867054 [22,] 12569.9493 .842240 585.540985 [23,] 20739.0219 3078.679745 1662.902248 [24,] 9472.0249 648.769910 381.487034 [25,] 17299.5307 1424.712428 1522.311676 [26,] 13231.2735 587.761915 170.448061 [27,] 10843.5590 705.485396 -79.931518 [28,] 9402.8803 -1978.216853 -1534.244078 [29,] 13094.9525 212.042937 -363.941664 [30,] 9337.3522 537.885230 189.558999 [31,] 7747.1347 -141.004825 -1664.082447 [32,] 4640.1161 -1489.652284 -3584.574135 [33,] 13241.5054 175.630689 -486.250927 [34,] 3867.2204 814.830143 1584.358007 [35,] 8614.5030 708.274447 814.295587 [36,] -18815.6774 -480.311541 1248.369916 [37,] -1860.0810 1195.557861 269.322703 [38,] 7172.0057 4.216905 -1191.448702 [39,] -7233.2271 -2361.951658 -235.293358 [40,] 1841.3548 1187.225488 632.116420 [41,] 12465.2336 367.822405 160.751014 [42,] -39021.7259 1972.333778 3167.504098 [43,] 13098.7736 -424.152058 -567.846037 [44,] 9793.7729 -559.084900 -210.696126 [45,] 13111.186122.772626 -318.242722 [46,] 13169.0604 7.808885 -363.995563 [47,] 3306.6293 -694.908211 -642.996604 [48,] 10779.8582 -989.175596 -1619.861931 [49,] 10872.6913 -747.979343 -1375.317959 [50,] -3057.5633 1838.449143 1454.886518 [51,] -6854.9316 2338.753165 1113.510561 [52,] -15077.1823 1917.776905 -1158.158633 [53,] -45862.8305 1173.157521 -1707.293955 [54,] -14294.1553 1716.708462 -1794.064434 [55,] 24645.0508 2519.904889 1424.233563 [56,] 23303.5998 2250.088386 839.587354 [57,] 18865.5231 897.56644636.240598 [58,]227.2659 -6582.661199 -712.892569 [59,] 15336.8371 722.953549 
593.903314 [60,] 13030.8715 228.509670 -312.933654 [61,] 5826.0388 331.077814 -53.417878 [62,] 13150.4446 -437.612023 -608.342969 [63,] 11728.3897 -83.151510 569.007995 [64,] 11021.5720 -869.425283 -1216.724017 [65,] 9625.3142 137.388994 138.735249 [66,] -15905.2704 3735.547166 421.846379 [67,] -15539.7628 3331.399648 104.886572 [68,] -2294.9924 1648.164750 822.075221 [69,] -10120.0153 1558.766306 -333.378256 [70,] -24241.4554 -533.700229 1516.603088 [71,] -1036.6022 -4782.136067 475.195011 [72,] -24575.2244 2655.599986 -1965.946921 the fit result below: Call: lm(formula = y ~ x1 + x2 + x3, data = pc) Residuals: Min 1Q Median 3Q Max -1.29638 -0.47622 0.01059 0.49268 1.69335 Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) 5.613e+00 8.143e-02 68.932 < 2e-16 *** x1 -3.089e-05 5.150e-06 -5.998 8.58e-08 *** x2 -4.095e-05 3.448e-05 -1.1880.239 x3 -8.106e-05 6.412e-05 -1.2640.210 --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 Residual standard error: 0.691 on 68 degrees of freedom Multiple R-squared: 0.
Re: [R] after PCA, the PC values are so large -- is something wrong?
OK, I understand what you mean; maybe PLS is better for my aim, but I have tried that as well and the result was also poor. The main question for me is how to select a smaller set of independent variables to fit the dependent one. A GA may be a good way, but I have not learned it well.

Ben Bolker wrote:
> Why is it necessary that the first few principal components
> should have significant relationships with some other response
> values? The strength, and weakness, of PCA is that it is
> calculated *without regard* to a response variable, so it
> does not constitute "data snooping" ...
> I may of course have misinterpreted your question, but at
> a quick look, I don't see anything obviously wrong here.
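A small sketch (not from the original thread) on the size of the scores: prcomp() has no cor argument (cor= belongs to princomp()), so the call in the earlier post runs on unscaled data, and columns with large variances then dominate the component scores. Scaling via prcomp's scale. argument keeps the scores on a comparable magnitude. The matrix mat below is a hypothetical stand-in for the poster's descriptor matrix.

# Hypothetical descriptor matrix: half the columns have a huge scale
set.seed(1)
mat <- matrix(rnorm(72 * 20, sd = rep(c(1, 1000), each = 72 * 10)), 72, 20)

pc.raw    <- prcomp(mat)                  # unscaled: scores dominated by large-variance columns
pc.scaled <- prcomp(mat, scale. = TRUE)   # correlation-matrix PCA

summary(pc.raw)$importance[, 1:3]
summary(pc.scaled)$importance[, 1:3]
head(pc.scaled$x[, 1:3])                  # first three component scores, modest scale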
[R] how to remove one of any two indices with a correlation greater than 0.90 in a matrix (or data.frame)
My code below is not right:

rm(list=ls())
# define data.frame
a=c(1,2,3,5,6); b=c(1,2,3,4,7); c=c(1,2,3,4,8); d=c(1,2,3,5,1); e=c(1,2,3,5,7)
data.f=data.frame(a,b,c,d,e)
# backup data.f
origin.data<-data.f
# get correlation matrix
cor.matrix<-cor(origin.data)
# backup correlation matrix
origin.cor<-cor.matrix
# get dim
dim.cor<-dim(origin.cor)
# perform loop
n<-0
for(i in 1:(dim.cor[1]-1)) {
  for(j in (i+1):(dim.cor[2])) {
    if (cor.matrix[i,j]>=0.95) {
      data.f<-data.f[,-(i-n)]
      n<-1
      break
    }
  }
}
origin.cor
origin.data
data.f
cor(data.f)

How should I write the code to do what I want? Is there a simpler way?
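A minimal sketch (not from the original thread) using the caret package: findCorrelation() returns the column indices it suggests removing so that no remaining pair exceeds the cutoff. The example reuses the data defined in the post above.

# caret's findCorrelation() on the example data
library(caret)
data.f <- data.frame(a=c(1,2,3,5,6), b=c(1,2,3,4,7), c=c(1,2,3,4,8),
                     d=c(1,2,3,5,1), e=c(1,2,3,5,7))
cm <- cor(data.f)
drop.idx <- findCorrelation(cm, cutoff = 0.90)   # columns suggested for removal
reduced  <- if (length(drop.idx)) data.f[, -drop.idx] else data.f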
[R] help, GA for variable selection ?!
Dear all,

I am learning the subselect package in R. I want to use a GA to select some informative variables, but a few questions puzzle me. What I want to solve: I have one dependent column y and 219 independent columns x, with 72 observations in the dataset. I want to select some variables to fit y. A GA is a good method for this, as many papers say, but they often test the model with criteria such as the leave-one-out r2 (q2), the standard deviation (sd), etc. Can the subselect package produce these indices (LOO r2, sd)?

The example in the subselect manual,

data(swiss)
genetic(cor(swiss), 3, 4, popsize=10, nger=5, criterion="Rv")

only concerns the X matrix, which for my data is a 72 x 219 matrix. How can I bring in the dependent y (a 72 x 1 matrix)? I wonder whether I can use the GA to select variables only from the X matrix (the independents), then use the best subset to fit y with another algorithm (outside the subselect package), and then calculate the LOO r2, sd, etc.

The last question: these 219 independents are correlated with each other. Do they need pre-processing before applying GA selection, e.g. removing one of any two variables with a correlation greater than 0.90?

I attach my data as a csv file; the first column is the dependent (y) and the remaining columns are the independents. I want to build an equation with good statistics (high LOO r2, low sd). How can I do this? Thank you!

Best wishes
Kevin

http://old.nabble.com/file/p26276493/mk1.df.csv mk1.df.csv
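A minimal sketch (not from the original thread) of how the leave-one-out q2 and prediction error can be computed directly with lm() for any chosen subset, whatever algorithm picked it. The objects X and y below are hypothetical stand-ins for a small subset of the columns in the poster's file.

# Hypothetical stand-ins: y and a 5-variable candidate subset X
set.seed(1)
X <- matrix(rnorm(72 * 5), 72, 5); colnames(X) <- paste("x", 1:5, sep = "")
y <- X[, 1] + 0.5 * X[, 3] + rnorm(72)
dat <- data.frame(y, X)

# PRESS: squared error of each observation predicted from a fit without it
press <- sum(sapply(seq_len(nrow(dat)), function(i) {
  fit.i <- lm(y ~ ., data = dat[-i, ])
  (dat$y[i] - predict(fit.i, newdata = dat[i, ]))^2
}))

q2   <- 1 - press / sum((y - mean(y))^2)   # leave-one-out r2 (q2)
sdep <- sqrt(press / nrow(dat))            # LOO standard error of prediction
c(q2 = q2, SDEP = sdep)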
[R] How can I randomly remove one of two variables that have a correlation coefficient above 0.95?
http://old.nabble.com/file/p26443595/Edragonr.txt Edragonr.txt

Hi all,

I have a 72*495 matrix; the first column is the response and the remaining columns are the independent variables. In the end I want to select some of the independents to fit y, but there are so many of them that the fit is not meaningful, so I first want to reduce the number of independents. Which method, R package, or algorithm can deal with this problem?

Next question: first I want to check the pairwise correlation coefficients and, for every pair of variables with a correlation coefficient above 0.95, remove one of the two at random. NOTE: this should be random. I earlier wrote a program that always deletes the first variable of the pair, which is not very sound, so I hope someone can help me write a program that randomly removes one of the two variables whose correlation exceeds 0.95.

At last, I will use the selected variables to fit y, and I hope the regression gives a correlation coefficient (r2) of at least 0.7.

n<-0
for(i in 1:(dim.cor[1]-1)) {
  for(j in (i+1):(dim.cor[2])) {
    if (mat3.cor[i,j]>=0.90) {
      mat3<-mat3[,-(i-n)]
      n<-n+1
      break
    }
  }
}

This is my code, but it is not sound, as I said above. I have uploaded my data file. I hope someone can help me.
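A minimal sketch (not from the original thread) of the flag-then-drop idea where the member of each offending pair that gets dropped is chosen with sample(). The matrix mat is a hypothetical stand-in for the poster's 72 x 494 descriptor block.

# Randomly drop one member of each pair with |r| > 0.95
set.seed(123)
mat <- matrix(rnorm(72 * 30), 72, 30)
mat[, 2] <- mat[, 1] + rnorm(72, sd = 0.01)   # create one highly correlated pair
cm <- cor(mat)
p  <- ncol(cm)
drop.col <- rep(FALSE, p)
for (i in 1:(p - 1)) {
  for (j in (i + 1):p) {
    if (!drop.col[i] && !drop.col[j] && abs(cm[i, j]) > 0.95) {
      drop.col[sample(c(i, j), 1)] <- TRUE    # random choice within the pair
    }
  }
}
mat.reduced <- mat[, !drop.col]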
[R] p.value OR F.value?
Hi, all friends,

Please help me understand these sentences:

"From this set, 858 columns not significantly correlated with the response variable TBG at the 5% level were removed, leaving a set of 390 columns."

and

"the F-test's value for the one-parameter correlation with the descriptor is below 1.0"

Are these two statements equivalent? I want to carry out the first sentence in R; how can I do it? My idea is that it means p-value < 0.05, and I wrote this code:

xmat4 <- xmat3[, apply(xmat3, 2, function(.col) !all(var.test(.col, y)$p.value < 0.05))]

Is this right? Does the sentence refer to a p-value or to an F value? I do not know -- please help me. And how can I get the F value?

About this sentence, "A further 367 columns with variance below 1.0 kcal/mol were removed as recommended, leaving 23 columns", my code is:

xmat3 <- xmat2[, apply(xmat2, 2, function(.col) !all(var(.col) < 1))]

Can I change var to sd? I have tried it and they give the same result. Generally speaking, which one should be used to describe the variation of the data?

Thank you!
kevin
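A minimal sketch (not from the original thread): var.test() is an F test comparing the variances of two samples, so its p-value does not test correlation. The quoted criterion is usually read as the p-value of the correlation test (cor.test); for a single predictor its t statistic squared is the corresponding F value with 1 and n-2 degrees of freedom. The names xmat and y below are hypothetical stand-ins.

# Hypothetical predictors and response
set.seed(1)
xmat <- matrix(rnorm(72 * 40), 72, 40)
y    <- xmat[, 1] + rnorm(72)

# keep columns whose correlation with y is significant at the 5% level
pvals <- apply(xmat, 2, function(.col) cor.test(.col, y)$p.value)
xmat.sig <- xmat[, pvals < 0.05, drop = FALSE]

# the F value for each one-predictor correlation (t statistic squared)
Fvals <- apply(xmat, 2, function(.col) unname(cor.test(.col, y)$statistic)^2)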
[R] Hello all, how can I get the cross-validation MSE of SVM in e1071?
As is known, an SVM needs some parameters tuned, like cost, gamma and epsilon, to get better performance. But one question arises: how can I monitor that performance? Generally speaking, we choose the cross-validation MSE on the training set, but it seems svm() does not return the cross-validation MSE value directly; we only see it in summary(model.svm). If I write a loop, I have no idea how to extract the cross-validation MSE, and so no way to monitor the performance. What can I do?
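A sketch (not from the original thread, based on my reading of the e1071 help pages): for regression, svm(..., cross = k) stores the per-fold MSEs in the fitted object (MSE and tot.MSE), and tune() returns the cross-validated error for every parameter combination in $performances, so either can drive a tuning loop. The data below are made up.

# Made-up regression data
library(e1071)
set.seed(1)
x <- matrix(rnorm(100 * 5), 100, 5)
y <- x[, 1] + sin(x[, 2]) + rnorm(100, sd = 0.2)

fit <- svm(x, y, cost = 10, gamma = 0.2, cross = 10)   # 10-fold CV during fitting
fit$MSE       # MSE of each fold (regression only)
fit$tot.MSE   # overall cross-validated MSE

tuned <- tune(svm, train.x = x, train.y = y,
              ranges = list(cost = c(1, 10, 100), gamma = c(0.1, 0.2)))
tuned$performances     # cross-validated error for every parameter combination
tuned$best.parameters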
Re: [R] Hello all, how can I get the cross-validation MSE of SVM in e1071?
Thank you for your help. The caret package is so powerful; it can do many things. Now I need to learn how to apply it to my problems.

Max Kuhn wrote:
> You can get this using the caret package. There are a few package
> vignettes that come with the package and a JSS article
>
> http://www.jstatsoft.org/v28/i05/paper
>
> about the package.
>
> Max
Re: [R] Hello all, How can I get cross-validation MSE of SVM in e1071?
Hello, Max, The caret package is very good and I am learning it, but one problem: the nearZeroVar function can be used to identify near zero-variance variables, yet it only identifies them. How can I remove the variables it identifies? I have many zero- or near-zero-variance ones, so it is not realistic to remove them by hand. Can this function identify and remove them automatically? Looking forward to your reply. kevin On 2009-12-19, "Max Kuhn [via R]" wrote: -----Original Message----- From: "Max Kuhn [via R]" Sent: Saturday, 19 December 2009 To: bbslover Subject: Re: [R] Hello all, How can I get cross-validation MSE of SVM in e1071? You can get this using the caret package. There are a few package vignettes that come with the package and a JSS article http://www.jstatsoft.org/v28/i05/paper about the package. Max -- View this message in context: http://n4.nabble.com/Hello-all-How-can-I-get-corss-validation-MSE-of-SVM-in-e1071-tp974942p975955.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
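For the removal step: nearZeroVar() returns the column indices it flags, so dropping them is a single subscripting step; a minimal sketch with x as the predictor matrix or data frame:

library(caret)
nzv <- nearZeroVar(x)                        # indices of zero- and near-zero-variance columns
if (length(nzv) > 0) x <- x[, -nzv, drop = FALSE]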
[R] what is the criterion for removing independent variables?
Hello, all, I have many independent variables and one dependent variable; in the end I want to build one model from them and predict the values of new samples, that is, regression. Before that, I must remove some independent variables according to certain criteria: 1. constant-valued variables; 2. variables with near-zero variance; 3. variables whose percentage of zero values exceeds... what?; 4. are there other criteria? For 3 I have no idea: generally speaking, at what percentage of zero values should the corresponding variable be removed (20%, 50%, or something else; is there a paper supporting a threshold)? For 4, are there statistical criteria that are used to remove independent variables? Please give me a hand. Actually, my question is about feature selection, which is so complex that I hope some friends can give me guidance: how can I keep the variables that correlate well with the dependent variable (i.e. high correlation coefficient R and small predictive error, etc.) and remove the "bad" ones? thank you! -- View this message in context: http://n4.nabble.com/what-is-criterion-of-removing-independence-tp975987p975987.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
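A hedged sketch of the three filters listed above; the 20% threshold for the proportion of zeros is only an illustrative choice, not an established rule, and x stands for the predictor data frame:

keep.constant <- apply(x, 2, function(v) length(unique(v)) > 1)   # drop constant columns
keep.var      <- apply(x, 2, var) > 1e-8                          # drop (near) zero-variance columns
keep.zeros    <- colMeans(x == 0) <= 0.20                         # drop columns that are mostly zeros
x.filtered    <- x[, keep.constant & keep.var & keep.zeros, drop = FALSE]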
[R] Help, suggest me some methods to identify training set and test set!!!
I want to split my whole dataset into a training set and a test set, build the model on the training set, and validate it using the test set. Now, how can I split my dataset into the two sets reasonably? Please give me a hand; it would be better to give me some R code. I have seen approaches that use a SOM to project all the independent variables onto 2 dimensions and then pick some samples as the training set and the rest as the test set, like the picture below; I would also like to do this. My data are in the attached xls file: a 218*47 matrix, where the 47 columns are the independent variables. I want to project it to 2D and label the corresponding samples as in the picture below. thank you! http://n4.nabble.com/file/n976245/SOM%2Btraining%2Bset%2Band%2Btest%2Bset.jpg SOM+training+set+and+test+set.jpg http://n4.nabble.com/file/n976245/matlab218x47.xls matlab218x47.xls -- View this message in context: http://n4.nabble.com/Help-Suggest-me-some-methods-to-identify-training-set-and-test-set-tp976245p976245.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Help, suggest me some methods to identify training set and test set!!!
Thank you for all the help; it is very helpful. Max Kuhn wrote: > >> I noticed Max already pointed you to the caret package. >> >> Load the library and look at the help for the createFolds function, eg: >> >> library(caret) >> ?createFolds > > I think that the createDataPartition function in caret might work > better for you. > > There are a number of other packages with similar functions. > > Max > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > > -- View this message in context: http://n4.nabble.com/Help-Suggest-me-some-methods-to-identify-training-set-and-test-set-tp976245p976641.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
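A minimal usage sketch of createDataPartition(), assuming y holds the response for the 218 samples and x the 218 x 47 predictor matrix; p = 0.75 is an arbitrary split ratio. For a numeric y, caret samples within percentile groups of y, so the test set follows the training-set distribution:

library(caret)
set.seed(1)
inTrain <- createDataPartition(y, p = 0.75, list = FALSE)
x.train <- x[inTrain, ];  y.train <- y[inTrain]
x.test  <- x[-inTrain, ]; y.test  <- y[-inTrain]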
[R] Please help me!!!! Error in `[.data.frame`(x, , retained, drop = FALSE) : undefined columns selected
I am learning the package "caret"; after I call the "rfe" function, I get this error: Error in `[.data.frame`(x, , retained, drop = FALSE) : undefined columns selected In addition: Warning message: In predict.lm(object, x) : prediction from a rank-deficient fit may be misleading I have tried the example from the manual and it works fine, so something must be wrong with my data, but I do not know the reason. My code is: subsets<-c(1:5,10,15,20,25) ctrl<-rfeControl(functions=lmFuncs, method = "cv", verbose=FALSE,returnResamp="final") lmProfile<-rfe(trainDescr,trainY,sizes=subsets,rfeControl=ctrl) Before this I did some pre-processing, and my data are in the attachment. Please help me. thank you! kevin http://n4.nabble.com/file/n996068/trainDescr.txt trainDescr.txt http://n4.nabble.com/file/n996068/trainY.txt trainY.txt -- View this message in context: http://n4.nabble.com/Please-help-me-Error-in-data-frame-x-retained-drop-FALSE-undefined-columns-selected-tp996068p996068.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Please help me!!!! Error in `[.data.frame`(x, , retained, drop = FALSE) : undefined columns selected
Thanks, I have reduced the number of descriptors and the error is gone. My field is QSAR, but what is the criterion for selecting descriptors, and how many should be selected? That is a problem. I calculate my descriptors through E-dragon and apply the wonderful caret package, but my results are poor; how can I improve the performance? Max is an expert in this field, I think; can you give me some suggestions on how to learn QSAR well and build good linear and nonlinear models? Here I do QSAR research on my own, and I have no software to calculate descriptors except free ones; I only know E-dragon. Are there other good tools for QSAR? thank you again. kevin! Max Kuhn wrote: > > Your data set has 217 predictors and 166 samples. If you read the > vignette on feature selection for this package, you'll see that the > default ranking mechanism that it uses for linear models requires a > linear model fit. The note that: > >> prediction from a rank-deficient fit may be misleading > > should tell you something. If it doesn't: the model fit is over > determined and there is no unique solution, so many of the parameter > estimates are NA. > > Either create a modified version of lmFuncs that suits your needs or > remove variables prior to modeling (or try some other method that > doesn't require more samples than predictors, such as the lasso or > elasticnet). > > Max > -- View this message in context: http://n4.nabble.com/Please-help-me-Error-in-data-frame-x-retained-drop-FALSE-undefined-columns-selected-tp996068p997526.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
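A hedged sketch of the "remove variables prior to modeling" route Max describes: drop near-zero-variance and highly inter-correlated descriptors before calling rfe(). The names trainDescr and trainY follow this thread, and the 0.90 cutoff is arbitrary:

library(caret)
nzv <- nearZeroVar(trainDescr)
if (length(nzv) > 0) trainDescr <- trainDescr[, -nzv]
highCor <- findCorrelation(cor(trainDescr), cutoff = 0.90)    # indices of highly correlated columns
if (length(highCor) > 0) trainDescr <- trainDescr[, -highCor]
dim(trainDescr)   # ideally far fewer columns than the 166 rows before using lmFuncs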
Re: [R] Please help me!!!! Error in `[.data.frame`(x, , retained, drop = FALSE) : undefined columns selected
Thanks, now I have reduced the number of descriptors and it is OK. kevin -- View this message in context: http://n4.nabble.com/Please-help-me-Error-in-data-frame-x-retained-drop-FALSE-undefined-columns-selected-tp996068p997622.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] help, how can a self-organizing map show a 2D picture and put all samples into the picture?
http://n4.nabble.com/file/n998182/pca.jpg pca.jpg http://n4.nabble.com/file/n998182/som.jpg som.jpg http://n4.nabble.com/file/n998182/all%2Bindepents.xls all+indepents.xls As we know, a SOM is a good tool for clustering high-dimensional data onto 2D and showing it as a 2D picture, just like in the attached picture. But I cannot get a picture like the attachment. Who can help me? That is, how can I place all the samples on the picture and see how they are distributed? PCA is similar to SOM in showing the distribution of the samples in the variable space in 2D or 3D; how can I get that kind of picture? thank you! -- View this message in context: http://n4.nabble.com/help-how-self-oganizing-map-show-2D-picture-and-put-all-samples-into-the-picture-tp998182p998182.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
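A hedged sketch with the kohonen package, assuming x is the 218 x 47 descriptor matrix and grp a vector marking which samples are training or test; the 6 x 6 grid size is arbitrary:

library(kohonen)
set.seed(1)
sm <- som(scale(as.matrix(x)), grid = somgrid(6, 6, "hexagonal"))
plot(sm, type = "mapping", labels = as.character(grp))     # samples placed on the 2-D map
# a PCA view of the same sample space:
pc <- prcomp(scale(as.matrix(x)))
plot(pc$x[, 1:2], col = as.integer(factor(grp)), pch = 19)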
[R] Help me! Using the randomForest package, how can I calculate the error rate on the training set?
Now I am learning random forests and using the randomForest package. I can get the OOB error rate and the test-set error rate; now I want to get the training-set error rate. How can I do that? pgp.rf<-randomForest(x.tr,y.tr,x.ts,y.ts,ntree=1e3,keep.forest=FALSE,do.trace=1e2) Using this code I get the OOB and test-set error rates. If I replace x.ts and y.ts with x.tr and y.tr, respectively, is the reported error rate the training-set error rate? pgp.rf<-randomForest(x.tr,y.tr,x.tr,y.tr,ntree=1e3,keep.forest=FALSE,do.trace=1e2) This time I get the OOB error rate and the training-set error rate; is that right? thank you! -- View this message in context: http://n4.nabble.com/Help-me-using-random-Forest-package-how-to-calculate-Error-Rates-in-the-training-set-tp1010987p1010987.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Help me! Using the randomForest package, how can I calculate the error rate on the training set?
Thank you, Andy. I just read a paper in which the authors compare the error rates of the OOB estimate, the test set, and the training set, and use a figure to show that random forest is not overfitting: the training-set error rate goes to zero while the OOB and test-set error rates do not increase. I am just a beginner, so I have a lot to learn. Thank you kevin On 2010-01-12, "Liaw, Andy [via R]" wrote: -----Original Message----- From: "Liaw, Andy [via R]" Sent: Tuesday, 12 January 2010 To: bbslover Subject: Re: [R] Help me! Using the randomForest package, how can I calculate the error rate on the training set? Yes, or if you used keep.forest=TRUE, feed predict() with your x.tr and compare that with y.tr. However, I really don't understand why people compute "training error rate": what useful information can you get from it? Andy -- View this message in context: http://n4.nabble.com/Help-me-using-random-Forest-package-how-to-calculate-Error-Rates-in-the-training-set-tp1010987p1011752.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
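Following Andy's suggestion above, a minimal sketch for a regression forest; x.tr / y.tr follow the names in this thread, and a classification forest would compare predicted and observed classes instead:

library(randomForest)
pgp.rf  <- randomForest(x.tr, y.tr, ntree = 1000, keep.forest = TRUE)
tr.pred <- predict(pgp.rf, x.tr)          # fitted values on the training samples
mean((tr.pred - y.tr)^2)                  # training-set MSE (usually optimistic)
pgp.rf$mse[pgp.rf$ntree]                  # OOB MSE at the final tree, for comparison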
[R] Help, how can I boxplot MSE against mtry using 20 runs of 5-fold cross-validation?
Hello, I am learning randomForest. Now I want to boxplot MSE against mtry using 20 runs of 5-fold cross-validation (using the median value), but I do not have a good way to do it. The randomForest package itself does not contain a cross-validation method, and the caret package does, but how can I get all the values of mtry and, at the same time, the corresponding MSE? -- View this message in context: http://n4.nabble.com/Help-How-can-I-boxplot-mse-and-mtry-using-20-5-fold-cross-validation-tp1013058p1013058.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Help, how can I boxplot MSE against mtry using 20 runs of 5-fold cross-validation?
Thank you, Max. You are so helpful; every time you give me a lot of help. On my learning road you are my guide, though we do not know each other. best wishes kevin On 2010-01-14, "Max Kuhn [via R]" wrote: -----Original Message----- From: "Max Kuhn [via R]" Sent: Thursday, 14 January 2010 To: bbslover Subject: Re: [R] Help, how can I boxplot MSE against mtry using 20 runs of 5-fold cross-validation? In caret, see ?trainControl. Use returnResamp = "all" Max -- View this message in context: http://n4.nabble.com/Help-How-can-I-boxplot-mse-and-mtry-using-20-5-fold-cross-validation-tp1013058p1013515.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
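A hedged sketch building on Max's pointer: ask train() to keep every resampling result and then boxplot RMSE against mtry. The x / y objects and the tuning length are placeholders, and the exact resampling options and column names can vary between caret versions:

library(caret)
set.seed(1)
ctrl <- trainControl(method = "repeatedcv", number = 5, repeats = 20, returnResamp = "all")
fit  <- train(x, y, method = "rf", tuneLength = 4, trControl = ctrl)
head(fit$resample)    # one row per resample and per mtry value
# the tuning-parameter column may be named mtry or .mtry depending on the caret version
boxplot(RMSE ~ mtry, data = fit$resample, xlab = "mtry", ylab = "cross-validated RMSE")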
[R] help! kennard-stone algorithm in the soil.spec package does not work for my dataset!!!
http://r.789695.n4.nabble.com/file/n3031344/RSV.Rdata RSV.Rdata I want to split my dataset into a training set and a test set using the Kennard-Stone (KS) algorithm; luckily there is an R package, soil.spec, that implements it. But when I apply it to my dataset it does not work. Who can help me find the reason? Below is my code, and my data are in the attachment. ks<-ken.sto(x,per="TRUE",per.n=0.3,va="FALSE",sav="FALSE") ks % results $`Chosen sample names` NULL $`Chosen row number` integer(0) $`Chosen calibration sample names` [1] "NULL" $`Chosen calibration row number` [1] "NULL" $`Chosen validation sample names` [1] "NULL" $`Chosen validation row number` [1] "NULL" attr(,"class") Why is it all NULL? And > ks<-ken.sto(x,per="TRUE",per.n=0.3,va="TRUE",sav="FALSE") Error in val.min[i] <- blub[sample(length(blub), 1)] : replacement has length zero In addition: Warning message: In min(prco[-cal.start.n, i]) : no non-missing arguments to min; returning Inf If I set va="TRUE", these errors appear. I hope some friends can help me! -- View this message in context: http://r.789695.n4.nabble.com/help-kennard-stone-algorithm-in-soil-spec-packages-does-not-work-for-my-dataset-tp3031344p3031344.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] help! kennard-stone algorithm in the soil.spec package does not work for my dataset!!!
http://r.789695.n4.nabble.com/file/n3032045/rsv1.txt rsv1.txt I am very grateful for David's suggestion; here I upload my dataset "rsv1.txt". The question is the same: ks<-ken.sto(rsv1,per="TRUE",per.n=0.3,va="FALSE",sav="FALSE") does not work, and all results are NULL. I do not know why. I hope friends can give me a hand! thanks kevin -- View this message in context: http://r.789695.n4.nabble.com/help-kennard-stone-algorithm-in-soil-spec-packages-does-not-work-for-my-dataset-tp3031344p3032045.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
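Since ken.sto() keeps returning NULL here, a small base-R Kennard-Stone sketch may serve as a cross-check; it is a straightforward implementation of the algorithm (pick the two most distant samples, then repeatedly add the sample farthest from the already selected ones), not the soil.spec code, and the 70/30 split mirrors per.n=0.3:

kennard.stone <- function(x, n.train) {
  x <- as.matrix(x)
  d <- as.matrix(dist(x))                                   # Euclidean distances between samples
  sel <- as.vector(which(d == max(d), arr.ind = TRUE)[1, ]) # start with the two most distant samples
  while (length(sel) < n.train) {
    rest  <- setdiff(seq_len(nrow(x)), sel)
    min.d <- apply(d[rest, sel, drop = FALSE], 1, min)      # distance of each candidate to the selected set
    sel   <- c(sel, rest[which.max(min.d)])                 # add the most remote remaining sample
  }
  sel
}
train.rows <- kennard.stone(rsv1, n.train = round(0.7 * nrow(rsv1)))
test.rows  <- setdiff(seq_len(nrow(rsv1)), train.rows)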
[R] how to get the plot like the attachment?
http://r.789695.n4.nabble.com/file/n3060425/fig_1.png fig. 1 http://r.789695.n4.nabble.com/file/n3060425/fig_2.png fig. 2 I want a picture like the first one, where the axes cross at the origin, while the second picture is what I get by default, with the origin detached; by dragging the window I can get something like fig. 1. Now I want to know how to use code to obtain the layout of fig. 1 directly (axes meeting at the origin)? thanks kevin -- View this message in context: http://r.789695.n4.nabble.com/how-to-get-the-plot-like-the-attachment-tp3060425p3060425.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
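If the gap in fig. 2 is R's default 4% axis padding, one hedged guess is to switch the axis style to "i" so the axes meet exactly at the corner, or to draw the axes through the origin explicitly; a small sketch with made-up data:

x <- 0:10; y <- x^2
plot(x, y, xaxs = "i", yaxs = "i", xlim = c(0, 10), ylim = c(0, 100), bty = "l")  # axes meet at the corner
# or draw both axes through the origin explicitly:
plot(x, y, axes = FALSE)
axis(1, pos = 0); axis(2, pos = 0)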
Re: [R] how to get the plot like the attachment?
thanks, I succeed. kevin -- View this message in context: http://r.789695.n4.nabble.com/how-to-get-the-plot-like-the-attachment-tp3060425p3061217.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] how to replace my double for loop, which is not very efficient!
Dear all, my double for loop is below, but it is not very efficient; I hope someone can give me a "vectorized" program to replace my code. thanks x is a 202*263 matrix, that is, 202 samples and 263 independent variables. num.compd<-nrow(x); # number of compounds diss.all<-0 for( i in 1:num.compd) for (j in 1:num.compd) if (i!=j) { S1<-sum(x[i,]*x[j,]) S2<-sum(x[i,]^2) S3<-sum(x[j,]^2) sim2<-S1/(S2+S3-S1) diss2<-1-sim2 diss.all<-diss.all+diss2} It takes a long time to finish this computation, so I really need faster code. thanks kevin -- View this message in context: http://r.789695.n4.nabble.com/how-to-replace-my-double-for-loop-which-is-little-efficient-tp3164222p3164222.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
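One vectorized equivalent of the double loop above: compute all pairwise Tanimoto similarities at once with matrix products and then sum the dissimilarities over i != j (x as in the post, 202 x 263):

x  <- as.matrix(x)                        # 202 x 263
S1 <- tcrossprod(x)                       # S1[i, j] = sum(x[i, ] * x[j, ])
S2 <- rowSums(x^2)
sim  <- S1 / (outer(S2, S2, "+") - S1)    # Tanimoto similarity matrix
diss <- 1 - sim
diag(diss) <- 0                           # the loop skips i == j
diss.all <- sum(diss)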
Re: [R] how to replace my double for loop, which is not very efficient!
Thanks for your help, it is great. In addition, in the beginning the format of x was a data frame and my code ran very slowly; after your help I changed x to a matrix and it is very quick. I am very grateful for your kind help, and your code is very good! kevin -- View this message in context: http://r.789695.n4.nabble.com/how-to-replace-my-double-for-loop-which-is-little-efficient-tp3164222p3164732.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] how to replace my double for loop, which is not very efficient!
Thanks for your help. I am sorry, I do not fully understand your code, so I cannot adapt it to my data. Here is the attachment with my data, and what I want to compute is the equation in the Word document in the attachment; the code from Berend gives the answer I want to get. http://r.789695.n4.nabble.com/file/n3164741/my_data.rar my_data.rar -- View this message in context: http://r.789695.n4.nabble.com/how-to-replace-my-double-for-loop-which-is-little-efficient-tp3164222p3164741.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] how to replace my double for loop, which is not very efficient!
Thank you, Berend. It seems it is better to attach a PDF file to avoid garbled text. Yes, what I want to obtain is the Tanimoto coefficient, and the Wikipedia page you pointed to is about this coefficient. I have also searched the R site for the Tanimoto coefficient to learn more about it. I have saved your code and am studying it. Thanks again Kevin -- View this message in context: http://r.789695.n4.nabble.com/how-to-replace-my-double-for-loop-which-is-little-efficient-tp3164222p3164920.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] help! complete the reviewer's suggestion: carry out GA+GP (Gaussian process)!
Hello, all experts, My field is computer-aided drug design (mainly QSAR). My paper now needs to be revised, and one reviewer asked me to apply a genetic algorithm coupled with a Gaussian process (GA+GP). My data: training set 191*106, test set 73*106. I need to use GA+GP for variable selection when building the model. In R there is no GA package exactly like the Matlab GA toolbox (http://www.sheffield.ac.uk/acse/research/ecrg/gat.html); at the moment I can use the Matlab GA toolbox, but I cannot use a GP toolbox in Matlab. Searching the internet, I found that the R package "genalg" can do GA, and an example is given for wavelength selection by GA+PLS, so I thought I could certainly do GA+GP. Unfortunately, in the genalg package I do not know how to extract the selected variables; it seems there is no such function. So I would like friends to help me address the reviewer's suggestion: do GA+GP, extract the optimal variables, and get some statistical parameters (i.e. cross-validation R2, predictive R2, etc.). Currently I can do GA+SVM for variable selection, build the models, and get the statistical parameters described above. GA: Matlab GA toolbox (http://www.sheffield.ac.uk/acse/research/ecrg/gat.html) svm: libsvm (http://www.csie.ntu.edu.tw/~cjlin/libsvm/) Now I want to know how to get the predicted values. In libsvm, for example: cmd = ['-v ',num2str(v),' -c',num2str(cgp(nind,1)), '-g ',num2str(cgp(nind,2)),' -p ',num2str(cgp(nind,3)),' -s 3']; model = svmtrain(train_y,train_data_best,cmd); train_pred = svmpredict(train_y,train_data_best,model); % get the predicted values for the training set I can get train_pred, and likewise test_pred (test_pred = svmpredict(test_y,test_data_best,model);). With the observed train_y and test_y and the predicted train_pred and test_pred, the statistical parameters can be calculated. But for GP, how can I get the predicted values? (From the GP website: http://www.gaussianprocess.org/gpml/code/matlab/doc/) prediction: [ymu ys2 fmu fs2 ] = gp(hyp, inf, mean, cov, lik, x, y, xs); here, is "ymu" the vector of predicted values, similar to "test_pred" in libsvm? I hope friends can give me a hand; there are only a few days left before I must upload my revised manuscript, but this question is still not solved. thanks for your help. kevin -- View this message in context: http://r.789695.n4.nabble.com/help-complete-the-reviewer-s-suggest-carry-out-GA-GP-gaussian-process-tp3229097p3229097.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
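A heavily hedged sketch of one way to combine GA and GP entirely in R, using genalg for the GA and kernlab's gausspr() for Gaussian process regression; x.tr, y.tr and x.ts are assumed names for the 191-sample training set and the 73-sample test set, and the fitness function, population size and iteration count are illustrative choices only:

library(genalg)
library(kernlab)
fitness <- function(chrom) {
  if (sum(chrom) < 2) return(1e6)                       # penalise near-empty subsets
  fit <- gausspr(as.matrix(x.tr[, chrom == 1]), y.tr, cross = 5)
  cross(fit)                                            # cross-validated error, to be minimised
}
set.seed(1)
ga <- rbga.bin(size = ncol(x.tr), popSize = 50, iters = 100,
               mutationChance = 0.05, zeroToOneRatio = 10, evalFunc = fitness)
best     <- ga$population[which.min(ga$evaluations), ]  # best chromosome in the final population
selected <- which(best == 1)                            # indices of the selected descriptors
gp        <- gausspr(as.matrix(x.tr[, selected]), y.tr) # refit the GP on the chosen variables
pred.test <- predict(gp, as.matrix(x.ts[, selected]))   # predictions for the external test set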
[R] How can I tell whether a model is overfitting?
1. Is there some criterion for judging overfitting, e.g. R2 and Q2 on the training set together with R2 on the test set, and at what values does it indicate overfitting? For example, in my data I have R2=0.94 for the training set and R2=0.70 for the test set; is that overfitting? 2. From this scatter plot, can one say the model is overfitting? 3. My results were obtained with an SVM; the samples are 156 and 52 for the training and test sets, and there are 96 predictors. In this case, can an SVM be used for prediction, or is the number of predictors too large? 4. From this picture, can you give me some suggestions to improve the model performance? Is the picture bad? 5. The picture and data are below. thank you! http://n4.nabble.com/file/n2164417/scatter.jpg scatter.jpg http://n4.nabble.com/file/n2164417/pkc-svm.txt pkc-svm.txt -- View this message in context: http://r.789695.n4.nabble.com/How-to-estimate-whether-overfitting-tp2164417p2164417.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
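A small sketch for quantifying the train/test gap mentioned in point 1: compute R-squared on both sets from the fitted SVM's predictions (svm.fit, x.train, y.train, x.test and y.test are assumed names). A much higher training value than test value, such as 0.94 versus 0.70, is the usual symptom of overfitting, although the test-set figure is what matters in the end:

r2 <- function(obs, pred) 1 - sum((obs - pred)^2) / sum((obs - mean(obs))^2)
r2(y.train, predict(svm.fit, x.train))   # training-set R-squared
r2(y.test,  predict(svm.fit, x.test))    # test-set R-squared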
[R] after updating to R 2.11.0 there is an error when using plot(); what can I do?
> a<-1:5 > b<-2:6 > plot(a,b) Error in function (width, height, pointsize, record, rescale, xpinch, : Graphics API version mismatch Before, with R 2.10, plot() was fine; now, with R 2.11.0, it does not work. -- View this message in context: http://r.789695.n4.nabble.com/update-R-2-11-0-there-is-error-when-using-plot-how-can-I-do-tp2164517p2164517.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] How can I tell whether a model is overfitting?
Thanks for your suggestion; there is indeed a lot I need to learn. I will buy that good book. kevin -- View this message in context: http://r.789695.n4.nabble.com/How-to-estimate-whether-overfitting-tp2164417p2164847.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] How can I tell whether a model is overfitting?
Thank you, I have downloaded it and am studying it. -- View this message in context: http://r.789695.n4.nabble.com/How-to-estimate-whether-overfitting-tp2164417p2164932.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] How can I tell whether a model is overfitting?
Many thanks. I can try to use a test set with 100 samples. Another question is how I can rationally split my data into a training set and a test set (training set with 108 samples and test set with 100 samples). As I understand it, the test set should follow the same distribution as the training set; what methods can be used to split rationally, and which R packages deal with rationally splitting into training/test sets? If the split is random, it seems that many repeated splits are needed and the averaged results are taken as the final results; however, I would like methods that give a fixed training set and test set instead of a random split. The training and test sets should be like this: ideally, the division is performed such that points representing both sets are distributed within the whole feature space occupied by the entire dataset, and each point of the test set is close to at least one point of the training set; this ensures that the similarity principle can be employed for predicting the output of the test set. Certainly, this condition cannot always be satisfied. So, in general, which algorithms are usually used to split, and which are more rational? Some papers just say they split the data set randomly; what does "randomly" mean exactly, just random selection, or is there some explicit method (e.g. ordering by the output)? Which package can split data rationally? Also, if one wants better results there are "tricks" that can be played: selecting the test set again and again, taking the test set with the best results as the final one, and then claiming the test set was selected randomly; but that is not truly random, it is false. Thank you, and sorry for so many questions, but this always puzzles me; up to now I have had no good method to rationally split my data into training and test sets. Finally, splitting into training and test sets should be done before modeling, and it seems this can be done from the features only (SOM), from the features and the output (the SPXY algorithm; paper: "A method for calibration and validation subset partitioning"), or from the output only (output order). But usually many features are calculated, and some features are zero or have a low standard deviation (sd<0.5); should we delete these features before the split, use the remaining features to split the data, then use only the training set to build the regression model, perform feature selection and cross-validation, and keep the independent test set only to test the built model? Maybe my thinking about the whole modeling process is not clear, but I think it is like this: 1) get samples; 2) calculate features; 3) preprocess the calculated features (e.g. remove zero ones); 4) rationally split the data into training and test sets (this always puzzles me: how should one split?); 5) build the model, tuning its parameters with resampling methods using only the training set, and obtain the final model; 6) test the model performance using the independent test set (unseen samples); 7) assess the model: good or bad? overfitting? (Generally, in what case is a model overfitting? Can you give me an example? As I understand it, a model is overfitting when the training-set fit is good but the independent test set is bad; but what counts as good and what counts as bad? With r2=0.94 on the training set and r2=0.70 on the test set, is the model overfitting, and can it be accepted? In general, what kind of model is well accepted?) 8) conclusions: how good is the model? That is my thinking, and many questions wait for answers.
thanks kevin -- View this message in context: http://r.789695.n4.nabble.com/How-to-estimate-whether-overfitting-tp2164417p2164960.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] after updating to R 2.11.0 there is an error when using plot(); what can I do?
Now it is OK. I uninstalled R 2.11.0, then deleted the old packages in the library, and installed R 2.11.0 again; it works. thank you! -- View this message in context: http://r.789695.n4.nabble.com/update-R-2-11-0-there-is-error-when-using-plot-how-can-I-do-tp2164517p2165235.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.