[R] How can I understand this sentence, and express it by means of a mathematical approach?
This topic concerns reducing the number of independent variables. As we know, many methods can do this; however, as a pre-processing step for the independent variables, a rule like the sentence below can already remove many of them. How should I understand it? What does "significant correlation at the 5% level" mean -- what is the criterion, a p-value or something else?

"Independent variables whose correlation with the response variable was not significant at 5% level were removed"

How can I calculate the correlation between them? Thank you!
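A minimal sketch (not from the original thread) of one common reading of that rule: for each independent variable, test its Pearson correlation with the response using cor.test(), which returns a p-value for H0: correlation = 0; "not significant at the 5% level" then means p >= 0.05, and such columns are dropped. The names x and y below are hypothetical stand-ins for the poster's data.

# Hypothetical data: x is a data frame of predictors, y the response
set.seed(1)
x <- as.data.frame(matrix(rnorm(60 * 5), 60, 5))
y <- rnorm(60)

# p-value of the test H0: cor(x[, j], y) == 0, for every column j
pvals <- sapply(x, function(col) cor.test(col, y)$p.value)

# keep only columns significantly correlated with y at the 5% level
x.kept <- x[, pvals < 0.05, drop = FALSE]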
[R] caret package, how can I deal with RFE+SVM wrong message?
Hello, I am learning the caret package, and I want to use RFE to reduce the number of features. I want to use RFE coupled with random forest (RFE+RF) for this task. As we know, there are a number of pre-defined sets of functions, like random forest (rfFuncs); however, I want to tune the parameter (mtry) during RFE, so I wrote the code below, but I get an error message. How can I deal with it?

> rfGrid <- expand.grid(.mtry=c(1:2))
> rfectrl <- rfeControl(functions=caretFuncs, method="cv", verbose=F, returnResamp="final", number=10)
> subsets <- c(3,4)
> set.seed(2)
> rf.RFE <- rfe(trx, try, sizes=subsets, rfeControl=rfectrl, method="rf", tuneGrid=rfGrid)
Loading required package: class
Attaching package: 'class'
The following object(s) are masked from package:reshape : condense
Fitting: mtry=1
Fitting: mtry=2
Error in varImp.randomForest(object$finalModel, ...) : subscript out of bounds
In addition: Warning message: package 'e1071' was built under R version 2.10.1

At the same time, if I want to use RFE+SVM, RFE+nnet, and so on, how can I do that? I have tried RFE+SVM and also get an error message:

> set.seed(1)
> svmProfile <- rfe(trx, try, sizes=c(1:3),
+   rfeControl=rfeControl(functions=caretFuncs, method="cv",
+   verbose=F, returnResamp="final", number=10),
+   method="svmRadial", tuneLength=5)
Fitting: sigma=0.009246713, C=0.1
Fitting: sigma=0.009246713, C=1
Fitting: sigma=0.009246713, C=10
Fitting: sigma=0.009246713, C=100
Fitting: sigma=0.009246713, C=1000
Error in rfeControl$functions$rank(fitObject, .x, y) : need importance columns for each class

Thank you!
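A minimal sketch (not from the original thread, and not a diagnosis of the errors above) of an RFE+RF baseline that uses caret's predefined rfFuncs helpers, whose ranking is tied to random forest's own variable importance. The objects trx and try below are hypothetical stand-ins for the poster's predictor data frame and response vector.

# Hypothetical stand-ins for the poster's data
library(caret)
library(randomForest)
set.seed(2)
trx <- data.frame(matrix(rnorm(50 * 6), 50, 6))
try <- rnorm(50)

ctrl <- rfeControl(functions = rfFuncs, method = "cv", number = 10)
rfProfile <- rfe(trx, try, sizes = c(3, 4), rfeControl = ctrl)
rfProfile               # resampled performance for each subset size
predictors(rfProfile)   # the variables retained in the selected subset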
[R] how can I plot the histogram like this using R?
I want to get a plot like this one, http://n4.nabble.com/file/n1839303/%25E9%25A2%2591%25E7%258E%2587%25E5%2588%2586%25E5%25B8%2583%25E5%259B%25BE%25E6%25A0%2587%25E5%2587%2586.jpg (the filename means "standard frequency-distribution plot"), not like this one: http://n4.nabble.com/file/n1839303/R.jpg. The data are here: http://n4.nabble.com/file/n1839303/y1.txt. Can R deal with this problem? How can I do it? Thank you!
Re: [R] how can I plot the histogram like this using R?
Thanks for your help. I will give it a try.
Re: [R] how can I plot the histogram like this using R?
Thank you, I will try the barplot function.
Re: [R] how can I plot the histogram like this using R?
Thanks for your reply. I just want to get a figure like y1.jpg using the data from y1.txt. Through the figure I want to find the split point, as in y1.jpg, and take 2.5 as the split point. That figure was drawn by someone else; I just want to draw it in R, but I cannot, so I hope friends here can help me. Best wishes! kevin

http://n4.nabble.com/file/n1965378/y1.jpg
http://n4.nabble.com/file/n1965378/y1.txt
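A minimal sketch (not from the original thread) of a frequency histogram with a marked split point. The y1.txt data are not reproduced here, so a made-up numeric vector stands in; only base R is used.

# Made-up stand-in for the y1.txt values
set.seed(1)
y1 <- c(rnorm(100, mean = 1.5, sd = 0.5), rnorm(40, mean = 3.5, sd = 0.5))

hist(y1, breaks = 20, col = "grey",
     main = "Frequency distribution", xlab = "y1")
abline(v = 2.5, lty = 2)   # mark a candidate split point between the two modes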
Re: [R] how can I plot the histogram like this using R?
Thanks, it works now!
[R] data frame is killing me! help
Usage: data(gasoline)
Format: A data frame with 60 observations on the following 2 variables.
octane: a numeric vector. The octane number.
NIR: a matrix with 401 columns. The NIR spectrum.

And when I look at the gasoline data I see this:

  NIR.1686 nm NIR.1688 nm NIR.1690 nm NIR.1692 nm NIR.1694 nm NIR.1696 nm NIR.1698 nm NIR.1700 nm
1 1.242645 1.250789 1.246626 1.250985 1.264189 1.244678 1.245913 1.221135
2 1.189116 1.223242 1.253306 1.282889 1.215065 1.225211 1.227985 1.198851
3 1.198287 1.237383 1.260979 1.276677 1.218871 1.223132 1.230321 1.208742
4 1.201066 1.233299 1.262966 1.272709 1.211068 1.215044 1.232655 1.206696
5 1.259616 1.273713 1.296524 1.299507 1.226448 1.230718 1.232864 1.202926
6 1.24109 1.262138 1.288401 1.291118 1.229769 1.227615 1.22763 1.207576
7 1.245143 1.265648 1.274731 1.292441 1.218317 1.218147 1.73 1.200446
8 1.222581 1.245782 1.26002 1.290305 1.221264 1.220265 1.227947 1.188174
9 1.234969 1.251559 1.272416 1.287405 1.211995 1.213263 1.215883 1.196102

Note the column names: NIR.1686 nm, NIR.1688 nm, NIR.1690 nm, and so on. How can I add the letters "NIR" to my own variables? My 600 independent variables do not have NIR as a prefix, but it is needed to fit the plsr model, for example aa = plsr(y ~ NIR, data = data, ...); the prefix NIR is necessary. How can I do this?
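A minimal sketch (not from the original thread) of the point that comes up later in this thread: gasoline$NIR is a single matrix column created with I(), so an analogous frame can be built from one's own spectra. The names x and y below are hypothetical stand-ins for the poster's 60 x 600 descriptor matrix and response.

# Hypothetical stand-ins for the poster's data
library(pls)
set.seed(1)
x <- matrix(rnorm(60 * 10), 60, 10)                 # pretend these are the descriptors
colnames(x) <- paste("NIR", seq_len(ncol(x)), sep = ".")
y <- rnorm(60)

mydata <- data.frame(octane = y, NIR = I(x))         # I() keeps x as one matrix column
fit <- plsr(octane ~ NIR, ncomp = 5, data = mydata, validation = "CV")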
Re: [R] data frame is killing me! help
Steve Lianoglou-6 wrote:
> Hi,
>
> I'm not really sure that I'm getting you, but if your problem is that
> the column names of your data.frame don't match the variable names
> you'd like to use in your formula, just change the colnames of your
> data.frame to match your formula.
>
> BTW - I have no idea where to get this gasoline data set, so I'm just
> imagining, e.g.:
>
> colnames(gasoline) <- c('put', 'the', 'variable', 'names', 'that',
>                         'you', 'want', 'here')
>
> -steve

Thanks. But the number of independent variables is so large that it is not easy to rename them one by one -- is there a better way?
Re: [R] data frame is killing me! help
I have read that one. I want to apply this method to my own data, but I do not know how to get my data into R in that form.

James W. MacDonald wrote:
> You don't need to identify anything. What you need to do is read the
> help page for the function you want to use, so you (at the very least)
> know how to use the function.
>
> > library(pls)
> > data(gasoline)
> > fit <- plsr(octane ~ NIR, data = gasoline, validation = "CV")
> > summary(fit)
> Data: X dimension: 60 401
> Y dimension: 60 1
> Fit method: kernelpls
> Number of components considered: 53
>
> VALIDATION: RMSEP
> Cross-validated using 10 random segments.
>        (Intercept)  1 comps  2 comps  3 comps  4 comps  5 comps  6 comps
> CV           1.543    1.372   0.3827   0.2522   0.2347   0.2455   0.2281
> adjCV        1.543    1.367   0.3740   0.2497   0.2360   0.2407   0.2243
>        7 comps  8 comps  9 comps  10 comps  11 comps  12 comps  13 comps
> CV      0.2311   0.2352   0.2455    0.2534    0.2737    0.2814    0.2832
> adjCV   0.2257   0.2303   0.2395    0.2473    0.26
Re: [R] data frame is killing me! help
I have tried it; paste() can build the wanted labels, but I cannot get them assigned as the column names. Maybe I need to study this harder.

Don MacQueen wrote:
> Perhaps using paste(). Maybe something like:
>
> paste('NIR', 1:600, sep='.')
> or
> paste('NIR', seq(1686,1700,2), sep='.')
>
> --
> Don MacQueen
> Lawrence Livermore National Laboratory
> Livermore, CA, USA
Re: [R] data frame is killing me! help
Thank you, Don MacQueen. I will try it.
[R] how can I know which package was added or updated?
There are many R packages: yesterday there were 2031, but today there are 2033. How can I know which packages were added or updated?
Re: [R] how can I know which package was added or updated?
That is really neat. Thank you, Gabor Grothendieck. I got it.

Gabor Grothendieck wrote:
> Google for
> CRANberries aggregates
> and check the first hit.
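A small sketch (not from the original thread) of a complementary, purely local check: base R's utils functions compare the installed library against CRAN, so they report what is missing or outdated locally rather than what changed on CRAN on a given day.

# Compare the local library against CRAN (requires internet access)
options(repos = c(CRAN = "https://cran.r-project.org"))
head(new.packages())   # CRAN packages not installed locally
old.packages()         # installed packages for which CRAN has a newer version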
Re: [R] data frame is killing me! help
Thank you, Petr. That is a good, clear answer. Thanks!

Petr Pikal wrote:
> Hi
>
> > data(gasoline)
> > str(gasoline)
> 'data.frame': 60 obs. of 2 variables:
>  $ octane: num 85.3 85.2 88.5 83.4 87.9 ...
>  $ NIR   : AsIs [1:60, 1:401] -0.050193 -0.044227 -0.046867 -0.046705 -0.050859 ...
>   ..- attr(*, "dimnames")=List of 2
>   .. ..$ : chr "1" "2" "3" "4" ...
>   .. ..$ : chr "900 nm" "902 nm" "904 nm" "906 nm" ...
> > str(gasoline$NIR)
>  AsIs [1:60, 1:401] -0.050193 -0.044227 -0.046867 -0.046705 -0.050859 ...
>  - attr(*, "dimnames")=List of 2
>   ..$ : chr [1:60] "1" "2" "3" "4" ...
>   ..$ : chr [1:401] "900 nm" "902 nm" "904 nm" "906 nm" ...
> > is.matrix(gasoline$NIR)
> [1] TRUE
>
> so the second element of the gasoline data frame is a matrix
>
> > ?AsIs
>
> > df <- data.frame(x=1:5, I(matrix(rnorm(10), 5, 2)))
> > df
>   x matrix.rnorm.10...5..2..1 matrix.rnorm.10...5..2..2
> 1 1                  0.187703                  0.213312
> 2 2                 -0.66264                  -0.47941
> 3 3                 -0.82334                  -0.04324
> 4 4                 -0.37255                   0.883027
> 5 5                 -0.28700                  -1.03431
> > str(df)
> 'data.frame': 5 obs. of 2 variables:
>  $ x                      : int 1 2 3 4 5
>  $ matrix.rnorm.10...5..2.: AsIs [1:5, 1:2] 0.187703 -0.66264 -0.82334 -0.37255 -0.28700 ...
>
> Regards
> Petr
[R] how can I convert .csv format to matrix???
On my disk C:/ I have a file a.csv. I want to read it into R; importantly, when I use x = read.csv("C:/a.csv"), x is a data.frame, but I want to turn it into a matrix. How can I do that? Thank you!
Re: [R] how can I convert .csv format to matrix???
Thank you for your help; that works well.

Steven Kang wrote:
> You can try
>
> matrix.x <- as.matrix(x)
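One caveat worth adding (not from the original thread): if the CSV contains any non-numeric column, as.matrix() on the data frame yields a character matrix, so data.matrix() may be closer to what is wanted. The path below is taken from the original post, so this only runs if that file exists.

# Two ways to get a matrix from read.csv() output
x  <- read.csv("C:/a.csv")   # path from the original post
m1 <- as.matrix(x)           # character matrix if any column is non-numeric
m2 <- data.matrix(x)         # coerces every column to numeric (factors become integer codes)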
[R] variable selection---reduce the number of initial variables
Hello, my problem is like this: after pre-processing the variables, 160 independent variables and one dependent y remain. When I use the PLS method with 10 components, a good r2 can be obtained, but I do not know how to express my equation with fewer variables and y. It would be better to use fewer independent variables; that is, how can I select my independent variables? Maybe a GA is a good method, but I have not mastered it yet. Can you suggest other good variable-selection methods? In R, which methods can be used to select the most useful variables, and then use the selected variables to build an equation with higher r2 and q2 and a smaller RMSEP? Thank you!
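A minimal sketch (not from the original thread) of one crude screening heuristic: fit the PLS model with cross-validation, pick a sensible number of components, and rank variables by the absolute size of their regression coefficients on standardized data. X and y below are hypothetical stand-ins for the poster's 160 predictors and response.

# Hypothetical stand-ins for the poster's data
library(pls)
set.seed(1)
X <- matrix(rnorm(72 * 160), 72, 160)
colnames(X) <- paste("x", 1:160, sep = "")
y <- X[, 1] - 2 * X[, 2] + rnorm(72)
dat <- data.frame(y = y, X = I(X))

fit <- plsr(y ~ X, ncomp = 10, data = dat, scale = TRUE, validation = "LOO")
plot(RMSEP(fit))                           # pick the number of components by LOO error
b <- drop(coef(fit, ncomp = 10))           # regression coefficients at 10 components
head(sort(abs(b), decreasing = TRUE), 20)  # a crude shortlist of influential variables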
Re: [R] variable selection---reduce the number of initial variables
Thank you, I can try the Bayesian approach. With the PCA method I have used before I can get some PCs, but I do not know how to bring the original variables back into that equation; maybe I should keep the ones with high weights and delete the ones with low weights. Is that right?

Ricardo Gonçalves Silva wrote:
> Hi,
>
> Nowadays there are a lot of new variable selection methods, especially using the
> Bayes paradigm. For your problem, I think you could try the Bayesian Model Averaging
> (BMA) package. Or, you can reduce your data dimension by PCA, which also lets you see
> the weight of each variable in the PCs.
>
> HTH
>
> Rick
Re: [R] variable selection---reduce the number of initial variables
he number of variables. I've had > a lot of success with it. > > Max > > > 2009/11/5 Ricardo Gonçalves Silva : >> Hi Guys, >> >> Of course, a backward, forward, or other methods can be used directly. >> But >> concerning BMA, the model interpretation is far simple: >> >> "Bayesian Model Averaging accounts for the model uncertainty inherent in >> the >> variable selection problem by averaging over the best models in the model >> class according to approximate posterior model probability." >> >> If you want to learn a few more before continue, that a look at the BMA >> homepage: >> >> http://www2.research.att.com/~volinsky/bma.html >> >> But of course, you must do what you think is better for your problem. >> By the way what is the dimension of your problem? >> >> HTH, >> >> Rick >> -- >> From: "Frank E Harrell Jr" >> Sent: Thursday, November 05, 2009 4:12 PM >> To: "Ricardo Gonçalves Silva" >> Cc: "bbslover" ; >> Subject: Re: [R] variable selectin---reduce the numbers of initial >> variable >> >>> Ricardo Gonçalves Silva wrote: >>>> >>>> Yes, right. But I still prefer using BMA. >>>> Best, >>>> >>>> Rick >>> >>> If you are entertaining only one model family, them BMA is a long, >>> tedious, complex way to obtain shrinkage and the resulting averaged >>> model is very difficult to interpret. Consider a more direct approach. >>> >>> Frank >>> >>>> >>>> -- >>>> From: "bbslover" >>>> Sent: Wednesday, November 04, 2009 11:28 PM >>>> To: >>>> Subject: Re: [R] variable selectin---reduce the numbers of initial >>>> variable >>>> >>>>> >>>>> thank you . I can try bayesian. PCA method that I used to is can get >>>>> some >>>>> pcs, but I donot know how can i use the original variables in that >>>>> equation, >>>>> maybe I should select those have high weight ones,and delete that less >>>>> weight ones. right? >>>>> >>>>> Ricardo Gonçalves Silva wrote: >>>>>> >>>>>> Hi, >>>>>> >>>>>> Nowdays there's a lot o new variable selection methods, specially >>>>>> using >>>>>> the >>>>>> Bayes Paradigm. >>>>>> For your problem, I think you could try the Bayesian Model Average >>>>>> BMA >>>>>> package. >>>>>> Or, you can reduce your data dimension by PCA, which also permits you >>>>>> see >>>>>> the weight of >>>>>> each variable in the PC. >>>>>> >>>>>> HTH >>>>>> >>>>>> Rick >>>>>> >>>>>> -- >>>>>> From: "bbslover" >>>>>> Sent: Wednesday, November 04, 2009 10:23 AM >>>>>> To: >>>>>> Subject: [R] variable selectin---reduce the numbers of initial >>>>>> variable >>>>>> >>>>>>> >>>>>>> hello, >>>>>>> >>>>>>> my problem is like this: now after processing the varibles, the >>>>>>> remaining >>>>>>> 160 varibles(independent) and a dependent y. when I used PLS method, >>>>>>> with >>>>>>> 10 >>>>>>> components, the good r2 can be obtained. but I donot know how can I >>>>>>> express >>>>>>> my equation with the less varibles and the y. It is better to use >>>>>>> less >>>>>>> indepent varibles. that is how can I select my indepent varibles. >>>>>>> Maybe >>>>>>> GA is good method, but now I donot gasp it. and can you give me >>>>>>> more >>>>>>> good >>>>>>> varibles selection's methods. and In R, which method can be used >>>>>>> to >>>>>>> select >>>>>>> the potent varibles . and using the selected varibles to model a >>>>>>>
[R] how can I delete those columns with the same element in every row?
e.g.

a =
  a b c d e
1 1 1 3 1 1
2 1 2 3 4 5
3 1 3 3 8 3
4 1 4 3 3 5
5 1 1 3 1 1

I want to delete column a and column c, because they have the same value in every row. Then I want to get this data.frame:

b =
  b d e
1 1 1 1
2 2 4 5
3 3 8 3
4 4 3 5
5 1 1 1

The following is my code, but it is wrong:

rm(list=ls())
a=c(1,1,1,1,1); b=c(1,2,3,4,1); c=c(3,3,3,3,3); d=c(1,4,8,3,1); e=c(1,5,3,5,1)
data.f=data.frame(a,b,c,d,e)
origin.data<-data.f
dim.frame=dim(data.f)
rn=dim.frame[1]
n<-0
for (k in 1:(dim.frame[2]-n)) {
  if (data.f[1,k]==data.f[rn,k]) {
    data.f<-data.f[,-k]
    n<-n+1
    k<-k-1
  }
}
origin.data
data.f

How can I modify it to obtain the wanted result? Thank you!
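A minimal sketch (not from the original thread) of one way to drop constant columns without an explicit loop, using the same example data as above.

# Keep only columns that take more than one distinct value
data.f <- data.frame(a=c(1,1,1,1,1), b=c(1,2,3,4,1), c=c(3,3,3,3,3),
                     d=c(1,4,8,3,1), e=c(1,5,3,5,1))
keep <- sapply(data.f, function(col) length(unique(col)) > 1)
b.frame <- data.f[, keep]
b.frame   # columns b, d, e remain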
[R] another question: how to delete one of two columns with high correlation (0.95)
My program is below:

a=c(1,2,1,1,1); b=c(1,2,3,4,1); c=c(3,4,3,3,3); d=c(1,2,3,5,1); e=c(1,5,3,5,1)
data.f=data.frame(a,b,c,d,e)
origin.data<-data.f
cor.matrix<-cor(origin.data)
origin.cor<-cor.matrix
m<-0
for(i in 1:(cor.matrix[1]-1)) {
  for(j in (i+1):(cor.matrix[2])) {
    if (cor.matrix[i,j]>=0.95) {
      data.f<-data.f[,-i];
      i<-i+1
    }
  }
}
origin.cor
data.f

The result does not seem to be right:

> origin.cor
           a          b          c          d         e
a  1.000     -0.0857493  1.000     -0.1336306 0.5590170
b -0.0857493  1.000     -0.0857493  0.9854509 0.7669650
c  1.000     -0.0857493  1.000     -0.1336306 0.5590170
d -0.1336306  0.9854509 -0.1336306  1.000     0.7470179
e  0.5590170  0.7669650  0.5590170  0.7470179 1.000

> data.f
  b c d e
1 1 3 1 1
2 2 4 2 5
3 3 3 3 3
4 4 3 5 5
5 1 3 1 1

Either column b or column d should also have been deleted, because they are highly correlated (0.9854509), but neither was. Why?
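A minimal sketch (not from the original thread) of one workable pattern: the loop above indexes the correlation matrix with cor.matrix[1] instead of its dimensions, and reassigning i inside a for loop has no effect in R, so it is easier to first flag which columns to drop and remove them afterwards. The example reuses the data from the post above.

# Flag, then drop, one member of each highly correlated pair
data.f <- data.frame(a=c(1,2,1,1,1), b=c(1,2,3,4,1), c=c(3,4,3,3,3),
                     d=c(1,2,3,5,1), e=c(1,5,3,5,1))
cm <- cor(data.f)
p  <- ncol(cm)
drop.col <- rep(FALSE, p)
for (i in 1:(p - 1)) {
  for (j in (i + 1):p) {
    if (!drop.col[i] && !drop.col[j] && abs(cm[i, j]) >= 0.95) {
      drop.col[i] <- TRUE   # drop one member of the pair (here: the first)
    }
  }
}
reduced <- data.f[, !drop.col]
reduced   # columns c, d, e remain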
Re: [R] another question: how to delete one of two columns with high correlation (0.95)
Thank you. I need to study this; after that, maybe I can understand it well. Thanks, Nikhil.

Nikhil Kaza-2 wrote:
> You need dim(cor.matrix)[1]
>
> Following might be better instead of a loop, to get the row ids of a matrix:
>
> (which(cor.matrix >= 0.95) %/% dim(cor.matrix)[1]) + 1
>
> for column ids use modulus instead of integer division:
>
> (which(cor.matrix >= 0.95) %% dim(cor.matrix)[1])
>
> There are probably better ways than this.
>
> Nikhil
[R] after PCA, the PC values are so large -- is something wrong?
rm(list=ls()) yx.df<-read.csv("c:/MK-2-72.csv",sep=',',header=T,dec='.') dim(yx.df) #get X matrix y<-yx.df[,1] x<-yx.df[,2:643] #conver to matrix mat<-as.matrix(x) #get row number rownum<-nrow(mat) #remove the constant parameters mat1<-mat[,apply(mat,2,function(.col)!(all(.col[1]==.col[2:rownum])))] dim(yx.df) dim(mat1) #remove columns with numbers of zero >0.95 mat2<-mat1[,apply(mat1,2,function(.col)!(sum(.col==0)/rownum>0.95))] dim(yx.df) dim(mat2) #remove colunms that sd<0.5 mat3<-mat2[,apply(mat2,2,function(.col)!all(sd(.col)<0.5))] dim(yx.df) dim(mat3) #PCA analysis mat3.pr<-prcomp(mat3,cor=T) summary(mat3.pr,loading=T) pre.cmp<-predict(mat3.pr) cmp<-pre.cmp[,1:3] cmp DF<-cbind(Y,cmp) DF<-as.data.frame(DF) names(DF)<-c('y','p1','p2','p3') DF summary(lm(y~p1+p2+p3,data=DF)) mat3.pr<-prcomp(DF,cor=T) summary(mat3.pr) pre<-predict(mat3.pr) pre1<-pre[,1:3] pre1 colnames(pre1)<-c("x1","x2","x3") pre1 pc<-cbind(y,pre1) pc<-as.data.frame(pc) lm.pc<-lm(y~x1+x2+x3,data=pc) summary(lm.pc) above, my code about pca, but after finishing it, the first three pcs are some large, why? and the fit value r2 are bad. belowe is my value on the firest 3 pcs. > pre1 PC1 PC2 PC3 [1,] -15181.5190 1944.392700 -1074.326182 [2,] -32152.4533 1007.113729 3201.361408 [3,] -15836.5362 2117.988273 -555.799383 [4,] -1618.5561 1481.020337 255.530132 [5,] -5407.5030 1975.779398 -84.646283 [6,] -9662.1949 2611.220928 -417.435782 [7,] -30488.2102 577.385588 1853.420297 [8,] -2135.2563 -4506.112873 1382.413284 [9,] -1584.2796 -4645.142062 929.146895 [10,] -668.7664 -4876.250486 177.691446 [11,] -2188.5914 -4495.203080 1432.428127 [12,] -19633.9581 2159.000138 -1598.710872 [13,] -26849.1088 -515.574085 -2683.552623 [14,] -9492.9503 -4868.648205 1236.986097 [15,] -13857.6517 -4810.228193 1296.342199 [16,] -11596.5097 -8181.631403 462.913210 [17,] -25948.6564 -746.442386 -3415.426682 [18,] 15386.4477 709.974524 555.160973 [19,] 21642.7516 1163.456075 -609.437740 [20,] 22236.7094 675.562564 -136.992578 [21,] 14354.9927 611.996274-4.867054 [22,] 12569.9493 .842240 585.540985 [23,] 20739.0219 3078.679745 1662.902248 [24,] 9472.0249 648.769910 381.487034 [25,] 17299.5307 1424.712428 1522.311676 [26,] 13231.2735 587.761915 170.448061 [27,] 10843.5590 705.485396 -79.931518 [28,] 9402.8803 -1978.216853 -1534.244078 [29,] 13094.9525 212.042937 -363.941664 [30,] 9337.3522 537.885230 189.558999 [31,] 7747.1347 -141.004825 -1664.082447 [32,] 4640.1161 -1489.652284 -3584.574135 [33,] 13241.5054 175.630689 -486.250927 [34,] 3867.2204 814.830143 1584.358007 [35,] 8614.5030 708.274447 814.295587 [36,] -18815.6774 -480.311541 1248.369916 [37,] -1860.0810 1195.557861 269.322703 [38,] 7172.0057 4.216905 -1191.448702 [39,] -7233.2271 -2361.951658 -235.293358 [40,] 1841.3548 1187.225488 632.116420 [41,] 12465.2336 367.822405 160.751014 [42,] -39021.7259 1972.333778 3167.504098 [43,] 13098.7736 -424.152058 -567.846037 [44,] 9793.7729 -559.084900 -210.696126 [45,] 13111.186122.772626 -318.242722 [46,] 13169.0604 7.808885 -363.995563 [47,] 3306.6293 -694.908211 -642.996604 [48,] 10779.8582 -989.175596 -1619.861931 [49,] 10872.6913 -747.979343 -1375.317959 [50,] -3057.5633 1838.449143 1454.886518 [51,] -6854.9316 2338.753165 1113.510561 [52,] -15077.1823 1917.776905 -1158.158633 [53,] -45862.8305 1173.157521 -1707.293955 [54,] -14294.1553 1716.708462 -1794.064434 [55,] 24645.0508 2519.904889 1424.233563 [56,] 23303.5998 2250.088386 839.587354 [57,] 18865.5231 897.56644636.240598 [58,]227.2659 -6582.661199 -712.892569 [59,] 15336.8371 722.953549 
593.903314 [60,] 13030.8715 228.509670 -312.933654 [61,] 5826.0388 331.077814 -53.417878 [62,] 13150.4446 -437.612023 -608.342969 [63,] 11728.3897 -83.151510 569.007995 [64,] 11021.5720 -869.425283 -1216.724017 [65,] 9625.3142 137.388994 138.735249 [66,] -15905.2704 3735.547166 421.846379 [67,] -15539.7628 3331.399648 104.886572 [68,] -2294.9924 1648.164750 822.075221 [69,] -10120.0153 1558.766306 -333.378256 [70,] -24241.4554 -533.700229 1516.603088 [71,] -1036.6022 -4782.136067 475.195011 [72,] -24575.2244 2655.599986 -1965.946921 the fit result below: Call: lm(formula = y ~ x1 + x2 + x3, data = pc) Residuals: Min 1Q Median 3Q Max -1.29638 -0.47622 0.01059 0.49268 1.69335 Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) 5.613e+00 8.143e-02 68.932 < 2e-16 *** x1 -3.089e-05 5.150e-06 -5.998 8.58e-08 *** x2 -4.095e-05 3.448e-05 -1.1880.239 x3 -8.106e-05 6.412e-05 -1.2640.210 --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 Residual standard error: 0.691 on 68 degrees of freedom Multiple R-squared: 0.
Re: [R] after PCA, the PC values are so large -- is something wrong?
OK, I understand what you mean; maybe PLS is better for my aim, but I have tried that as well and the result was also poor. The main question for me is how to select a smaller set of independent variables to fit the dependent one. A GA may be a good way, but I have not learned it well.

Ben Bolker wrote:
> Why is it necessary that the first few principal components
> should have significant relationships with some other response
> values? The strength, and weakness, of PCA is that it is
> calculated *without regard* to a response variable, so it
> does not constitute "data snooping" ...
> I may of course have misinterpreted your question, but at
> a quick look, I don't see anything obviously wrong here.
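A small sketch (not from the original thread) on the size of the scores: prcomp() has no cor argument (cor= belongs to princomp()), so the call in the earlier post runs on unscaled data, and columns with large variances then dominate the component scores. Scaling via prcomp's scale. argument keeps the scores on a comparable magnitude. The matrix mat below is a hypothetical stand-in for the poster's descriptor matrix.

# Hypothetical descriptor matrix: half the columns have a huge scale
set.seed(1)
mat <- matrix(rnorm(72 * 20, sd = rep(c(1, 1000), each = 72 * 10)), 72, 20)

pc.raw    <- prcomp(mat)                  # unscaled: scores dominated by large-variance columns
pc.scaled <- prcomp(mat, scale. = TRUE)   # correlation-matrix PCA

summary(pc.raw)$importance[, 1:3]
summary(pc.scaled)$importance[, 1:3]
head(pc.scaled$x[, 1:3])                  # first three component scores, modest scale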
[R] how to remove one of any two indices with a correlation greater than 0.90 in a matrix (or data.frame)
My code below is not right:

rm(list=ls())
# define data.frame
a=c(1,2,3,5,6); b=c(1,2,3,4,7); c=c(1,2,3,4,8); d=c(1,2,3,5,1); e=c(1,2,3,5,7)
data.f=data.frame(a,b,c,d,e)
# backup data.f
origin.data<-data.f
# get correlation matrix
cor.matrix<-cor(origin.data)
# backup correlation matrix
origin.cor<-cor.matrix
# get dim
dim.cor<-dim(origin.cor)
# perform loop
n<-0
for(i in 1:(dim.cor[1]-1)) {
  for(j in (i+1):(dim.cor[2])) {
    if (cor.matrix[i,j]>=0.95) {
      data.f<-data.f[,-(i-n)]
      n<-1
      break
    }
  }
}
origin.cor
origin.data
data.f
cor(data.f)

How should I write the code to do what I want? Is there a simpler way?
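A minimal sketch (not from the original thread) using the caret package: findCorrelation() returns the column indices it suggests removing so that no remaining pair exceeds the cutoff. The example reuses the data defined in the post above.

# caret's findCorrelation() on the example data
library(caret)
data.f <- data.frame(a=c(1,2,3,5,6), b=c(1,2,3,4,7), c=c(1,2,3,4,8),
                     d=c(1,2,3,5,1), e=c(1,2,3,5,7))
cm <- cor(data.f)
drop.idx <- findCorrelation(cm, cutoff = 0.90)   # columns suggested for removal
reduced  <- if (length(drop.idx)) data.f[, -drop.idx] else data.f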
[R] help, GA for variable selection ?!
Dear all,

I am learning the subselect package in R. I want to use a GA to select some informative variables, but a few questions puzzle me. What I want to solve: I have one dependent column y and 219 independent columns x, with 72 observations in the dataset. I want to select some variables to fit y. A GA is a good method for this, as many papers say, but they often test the model with criteria such as the leave-one-out r2 (q2), the standard deviation (sd), etc. Can the subselect package produce these indices (LOO r2, sd)?

The example in the subselect manual,

data(swiss)
genetic(cor(swiss), 3, 4, popsize=10, nger=5, criterion="Rv")

only concerns the X matrix, which for my data is a 72 x 219 matrix. How can I bring in the dependent y (a 72 x 1 matrix)? I wonder whether I can use the GA to select variables only from the X matrix (the independents), then use the best subset to fit y with another algorithm (outside the subselect package), and then calculate the LOO r2, sd, etc.

The last question: these 219 independents are correlated with each other. Do they need pre-processing before applying GA selection, e.g. removing one of any two variables with a correlation greater than 0.90?

I attach my data as a csv file; the first column is the dependent (y) and the remaining columns are the independents. I want to build an equation with good statistics (high LOO r2, low sd). How can I do this? Thank you!

Best wishes
Kevin

http://old.nabble.com/file/p26276493/mk1.df.csv mk1.df.csv
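A minimal sketch (not from the original thread) of how the leave-one-out q2 and prediction error can be computed directly with lm() for any chosen subset, whatever algorithm picked it. The objects X and y below are hypothetical stand-ins for a small subset of the columns in the poster's file.

# Hypothetical stand-ins: y and a 5-variable candidate subset X
set.seed(1)
X <- matrix(rnorm(72 * 5), 72, 5); colnames(X) <- paste("x", 1:5, sep = "")
y <- X[, 1] + 0.5 * X[, 3] + rnorm(72)
dat <- data.frame(y, X)

# PRESS: squared error of each observation predicted from a fit without it
press <- sum(sapply(seq_len(nrow(dat)), function(i) {
  fit.i <- lm(y ~ ., data = dat[-i, ])
  (dat$y[i] - predict(fit.i, newdata = dat[i, ]))^2
}))

q2   <- 1 - press / sum((y - mean(y))^2)   # leave-one-out r2 (q2)
sdep <- sqrt(press / nrow(dat))            # LOO standard error of prediction
c(q2 = q2, SDEP = sdep)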
[R] How can I randomly remove one of two variables that have a correlation coefficient above 0.95?
http://old.nabble.com/file/p26443595/Edragonr.txt Edragonr.txt

Hi all,

I have a 72*495 matrix; the first column is the response and the remaining columns are the independent variables. In the end I want to select some of the independents to fit y, but there are so many of them that the fit is not meaningful, so I first want to reduce the number of independents. Which method, R package, or algorithm can deal with this problem?

Next question: first I want to check the pairwise correlation coefficients and, for every pair of variables with a correlation coefficient above 0.95, remove one of the two at random. NOTE: this should be random. I earlier wrote a program that always deletes the first variable of the pair, which is not very sound, so I hope someone can help me write a program that randomly removes one of the two variables whose correlation exceeds 0.95.

At last, I will use the selected variables to fit y, and I hope the regression gives a correlation coefficient (r2) of at least 0.7.

n<-0
for(i in 1:(dim.cor[1]-1)) {
  for(j in (i+1):(dim.cor[2])) {
    if (mat3.cor[i,j]>=0.90) {
      mat3<-mat3[,-(i-n)]
      n<-n+1
      break
    }
  }
}

This is my code, but it is not sound, as I said above. I have uploaded my data file. I hope someone can help me.
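A minimal sketch (not from the original thread) of the flag-then-drop idea where the member of each offending pair that gets dropped is chosen with sample(). The matrix mat is a hypothetical stand-in for the poster's 72 x 494 descriptor block.

# Randomly drop one member of each pair with |r| > 0.95
set.seed(123)
mat <- matrix(rnorm(72 * 30), 72, 30)
mat[, 2] <- mat[, 1] + rnorm(72, sd = 0.01)   # create one highly correlated pair
cm <- cor(mat)
p  <- ncol(cm)
drop.col <- rep(FALSE, p)
for (i in 1:(p - 1)) {
  for (j in (i + 1):p) {
    if (!drop.col[i] && !drop.col[j] && abs(cm[i, j]) > 0.95) {
      drop.col[sample(c(i, j), 1)] <- TRUE    # random choice within the pair
    }
  }
}
mat.reduced <- mat[, !drop.col]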
[R] p.value OR F.value?
Hi, all friends,

Please help me understand these sentences:

"From this set, 858 columns not significantly correlated with the response variable TBG at the 5% level were removed, leaving a set of 390 columns."

and

"the F-test's value for the one-parameter correlation with the descriptor is below 1.0"

Are these two statements equivalent? I want to carry out the first sentence in R; how can I do it? My idea is that it means p-value < 0.05, and I wrote this code:

xmat4 <- xmat3[, apply(xmat3, 2, function(.col) !all(var.test(.col, y)$p.value < 0.05))]

Is this right? Does the sentence refer to a p-value or to an F value? I do not know -- please help me. And how can I get the F value?

About this sentence, "A further 367 columns with variance below 1.0 kcal/mol were removed as recommended, leaving 23 columns", my code is:

xmat3 <- xmat2[, apply(xmat2, 2, function(.col) !all(var(.col) < 1))]

Can I change var to sd? I have tried it and they give the same result. Generally speaking, which one should be used to describe the variation of the data?

Thank you!
kevin
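A minimal sketch (not from the original thread): var.test() is an F test comparing the variances of two samples, so its p-value does not test correlation. The quoted criterion is usually read as the p-value of the correlation test (cor.test); for a single predictor its t statistic squared is the corresponding F value with 1 and n-2 degrees of freedom. The names xmat and y below are hypothetical stand-ins.

# Hypothetical predictors and response
set.seed(1)
xmat <- matrix(rnorm(72 * 40), 72, 40)
y    <- xmat[, 1] + rnorm(72)

# keep columns whose correlation with y is significant at the 5% level
pvals <- apply(xmat, 2, function(.col) cor.test(.col, y)$p.value)
xmat.sig <- xmat[, pvals < 0.05, drop = FALSE]

# the F value for each one-predictor correlation (t statistic squared)
Fvals <- apply(xmat, 2, function(.col) unname(cor.test(.col, y)$statistic)^2)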
[R] Hello all, how can I get the cross-validation MSE of SVM in e1071?
As is known, an SVM needs some parameters tuned, like cost, gamma and epsilon, to get better performance. But one question arises: how can I monitor that performance? Generally speaking, we choose the cross-validation MSE on the training set, but it seems svm() does not return the cross-validation MSE value directly; we only see it in summary(model.svm). If I write a loop, I have no idea how to extract the cross-validation MSE, and so no way to monitor the performance. What can I do?
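A sketch (not from the original thread, based on my reading of the e1071 help pages): for regression, svm(..., cross = k) stores the per-fold MSEs in the fitted object (MSE and tot.MSE), and tune() returns the cross-validated error for every parameter combination in $performances, so either can drive a tuning loop. The data below are made up.

# Made-up regression data
library(e1071)
set.seed(1)
x <- matrix(rnorm(100 * 5), 100, 5)
y <- x[, 1] + sin(x[, 2]) + rnorm(100, sd = 0.2)

fit <- svm(x, y, cost = 10, gamma = 0.2, cross = 10)   # 10-fold CV during fitting
fit$MSE       # MSE of each fold (regression only)
fit$tot.MSE   # overall cross-validated MSE

tuned <- tune(svm, train.x = x, train.y = y,
              ranges = list(cost = c(1, 10, 100), gamma = c(0.1, 0.2)))
tuned$performances     # cross-validated error for every parameter combination
tuned$best.parameters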
Re: [R] Hello all, how can I get the cross-validation MSE of SVM in e1071?
Thank you for your help. The caret package is so powerful; it can do many things. Now I need to learn how to apply it to my problems.

Max Kuhn wrote:
> You can get this using the caret package. There are a few package
> vignettes that come with the package and a JSS article
>
> http://www.jstatsoft.org/v28/i05/paper
>
> about the package.
>
> Max
Re: [R] Hello all, How can I get cross-validation MSE of SVM in e1071?
Hello, Max, The caret package is very good and I am learning it, but one problem: the nearZeroVar function can be used to identify near zero-variance variables, yet it only identifies them. How can I remove the variables it identifies? I have many zero- or near-zero-variance ones, so it is not realistic to remove them by hand. Can this function identify and remove them automatically? Looking forward to your reply. kevin On 2009-12-19, "Max Kuhn [via R]" wrote: -----Original Message----- From: "Max Kuhn [via R]" Sent: Saturday, 19 December 2009 To: bbslover Subject: Re: [R] Hello all, How can I get cross-validation MSE of SVM in e1071? You can get this using the caret package. There are a few package vignettes that come with the package and a JSS article http://www.jstatsoft.org/v28/i05/paper about the package. Max -- View this message in context: http://n4.nabble.com/Hello-all-How-can-I-get-corss-validation-MSE-of-SVM-in-e1071-tp974942p975955.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
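For the removal step: nearZeroVar() returns the column indices it flags, so dropping them is a single subscripting step; a minimal sketch with x as the predictor matrix or data frame:

library(caret)
nzv <- nearZeroVar(x)                        # indices of zero- and near-zero-variance columns
if (length(nzv) > 0) x <- x[, -nzv, drop = FALSE]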
[R] what is the criterion for removing independent variables?
Hello, all, I have many independent variables and one dependent variable; in the end I want to build one model from them and predict the values of new samples, that is, regression. Before that, I must remove some independent variables according to certain criteria: 1. constant-valued variables; 2. variables with near-zero variance; 3. variables whose percentage of zero values exceeds... what?; 4. are there other criteria? For 3 I have no idea: generally speaking, at what percentage of zero values should the corresponding variable be removed (20%, 50%, or something else; is there a paper supporting a threshold)? For 4, are there statistical criteria that are used to remove independent variables? Please give me a hand. Actually, my question is about feature selection, which is so complex that I hope some friends can give me guidance: how can I keep the variables that correlate well with the dependent variable (i.e. high correlation coefficient R and small predictive error, etc.) and remove the "bad" ones? thank you! -- View this message in context: http://n4.nabble.com/what-is-criterion-of-removing-independence-tp975987p975987.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
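A hedged sketch of the three filters listed above; the 20% threshold for the proportion of zeros is only an illustrative choice, not an established rule, and x stands for the predictor data frame:

keep.constant <- apply(x, 2, function(v) length(unique(v)) > 1)   # drop constant columns
keep.var      <- apply(x, 2, var) > 1e-8                          # drop (near) zero-variance columns
keep.zeros    <- colMeans(x == 0) <= 0.20                         # drop columns that are mostly zeros
x.filtered    <- x[, keep.constant & keep.var & keep.zeros, drop = FALSE]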
[R] Help, suggest me some methods to identify training set and test set!!!
I want to split my whole dataset into a training set and a test set, build the model on the training set, and validate it using the test set. Now, how can I split my dataset into the two sets reasonably? Please give me a hand; it would be better to give me some R code. I have seen approaches that use a SOM to project all the independent variables onto 2 dimensions and then pick some samples as the training set and the rest as the test set, like the picture below; I would also like to do this. My data are in the attached xls file: a 218*47 matrix, where the 47 columns are the independent variables. I want to project it to 2D and label the corresponding samples as in the picture below. thank you! http://n4.nabble.com/file/n976245/SOM%2Btraining%2Bset%2Band%2Btest%2Bset.jpg SOM+training+set+and+test+set.jpg http://n4.nabble.com/file/n976245/matlab218x47.xls matlab218x47.xls -- View this message in context: http://n4.nabble.com/Help-Suggest-me-some-methods-to-identify-training-set-and-test-set-tp976245p976245.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Help, suggest me some methods to identify training set and test set!!!
Thank you for all the help; it is very helpful. Max Kuhn wrote: > >> I noticed Max already pointed you to the caret package. >> >> Load the library and look at the help for the createFolds function, eg: >> >> library(caret) >> ?createFolds > > I think that the createDataPartition function in caret might work > better for you. > > There are a number of other packages with similar functions. > > Max > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > > -- View this message in context: http://n4.nabble.com/Help-Suggest-me-some-methods-to-identify-training-set-and-test-set-tp976245p976641.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
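A minimal usage sketch of createDataPartition(), assuming y holds the response for the 218 samples and x the 218 x 47 predictor matrix; p = 0.75 is an arbitrary split ratio. For a numeric y, caret samples within percentile groups of y, so the test set follows the training-set distribution:

library(caret)
set.seed(1)
inTrain <- createDataPartition(y, p = 0.75, list = FALSE)
x.train <- x[inTrain, ];  y.train <- y[inTrain]
x.test  <- x[-inTrain, ]; y.test  <- y[-inTrain]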
[R] Please help me!!!! Error in `[.data.frame`(x, , retained, drop = FALSE) : undefined columns selected
I am learning the package "caret"; after I call the "rfe" function, I get this error: Error in `[.data.frame`(x, , retained, drop = FALSE) : undefined columns selected In addition: Warning message: In predict.lm(object, x) : prediction from a rank-deficient fit may be misleading I have tried the example from the manual and it works fine, so something must be wrong with my data, but I do not know the reason. My code is: subsets<-c(1:5,10,15,20,25) ctrl<-rfeControl(functions=lmFuncs, method = "cv", verbose=FALSE,returnResamp="final") lmProfile<-rfe(trainDescr,trainY,sizes=subsets,rfeControl=ctrl) Before this I did some pre-processing, and my data are in the attachment. Please help me. thank you! kevin http://n4.nabble.com/file/n996068/trainDescr.txt trainDescr.txt http://n4.nabble.com/file/n996068/trainY.txt trainY.txt -- View this message in context: http://n4.nabble.com/Please-help-me-Error-in-data-frame-x-retained-drop-FALSE-undefined-columns-selected-tp996068p996068.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Please help me!!!! Error in `[.data.frame`(x, , retained, drop = FALSE) : undefined columns selected
Thanks, I have reduced the number of descriptors and the error is gone. My field is QSAR, but what is the criterion for selecting descriptors, and how many should be selected? That is a problem. I calculate my descriptors through E-dragon and apply the wonderful caret package, but my results are poor; how can I improve the performance? Max is an expert in this field, I think; can you give me some suggestions on how to learn QSAR well and build good linear and nonlinear models? Here I do QSAR research on my own, and I have no software to calculate descriptors except free ones; I only know E-dragon. Are there other good tools for QSAR? thank you again. kevin! Max Kuhn wrote: > > Your data set has 217 predictors and 166 samples. If you read the > vignette on feature selection for this package, you'll see that the > default ranking mechanism that it uses for linear models requires a > linear model fit. The note that: > >> prediction from a rank-deficient fit may be misleading > > should tell you something. If it doesn't: the model fit is over > determined and there is no unique solution, so many of the parameter > estimates are NA. > > Either create a modified version of lmFuncs that suits your needs or > remove variables prior to modeling (or try some other method that > doesn't require more samples than predictors, such as the lasso or > elasticnet). > > Max > -- View this message in context: http://n4.nabble.com/Please-help-me-Error-in-data-frame-x-retained-drop-FALSE-undefined-columns-selected-tp996068p997526.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
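A hedged sketch of the "remove variables prior to modeling" route Max describes: drop near-zero-variance and highly inter-correlated descriptors before calling rfe(). The names trainDescr and trainY follow this thread, and the 0.90 cutoff is arbitrary:

library(caret)
nzv <- nearZeroVar(trainDescr)
if (length(nzv) > 0) trainDescr <- trainDescr[, -nzv]
highCor <- findCorrelation(cor(trainDescr), cutoff = 0.90)    # indices of highly correlated columns
if (length(highCor) > 0) trainDescr <- trainDescr[, -highCor]
dim(trainDescr)   # ideally far fewer columns than the 166 rows before using lmFuncs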
Re: [R] Please help me!!!! Error in `[.data.frame`(x, , retained, drop = FALSE) : undefined columns selected
Thanks, now I have reduced the number of descriptors and it is OK. kevin -- View this message in context: http://n4.nabble.com/Please-help-me-Error-in-data-frame-x-retained-drop-FALSE-undefined-columns-selected-tp996068p997622.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] help, how can a self-organizing map show a 2D picture and put all samples into the picture?
http://n4.nabble.com/file/n998182/pca.jpg pca.jpg http://n4.nabble.com/file/n998182/som.jpg som.jpg http://n4.nabble.com/file/n998182/all%2Bindepents.xls all+indepents.xls As we know, a SOM is a good tool for clustering high-dimensional data onto 2D and showing it as a 2D picture, just like in the attached picture. But I cannot get a picture like the attachment. Who can help me? That is, how can I place all the samples on the picture and see how they are distributed? PCA is similar to SOM in showing the distribution of the samples in the variable space in 2D or 3D; how can I get that kind of picture? thank you! -- View this message in context: http://n4.nabble.com/help-how-self-oganizing-map-show-2D-picture-and-put-all-samples-into-the-picture-tp998182p998182.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
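A hedged sketch with the kohonen package, assuming x is the 218 x 47 descriptor matrix and grp a vector marking which samples are training or test; the 6 x 6 grid size is arbitrary:

library(kohonen)
set.seed(1)
sm <- som(scale(as.matrix(x)), grid = somgrid(6, 6, "hexagonal"))
plot(sm, type = "mapping", labels = as.character(grp))     # samples placed on the 2-D map
# a PCA view of the same sample space:
pc <- prcomp(scale(as.matrix(x)))
plot(pc$x[, 1:2], col = as.integer(factor(grp)), pch = 19)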
[R] Help me! Using the randomForest package, how can I calculate the error rate on the training set?
Now I am learning random forests and using the randomForest package. I can get the OOB error rate and the test-set error rate; now I want to get the training-set error rate. How can I do that? pgp.rf<-randomForest(x.tr,y.tr,x.ts,y.ts,ntree=1e3,keep.forest=FALSE,do.trace=1e2) Using this code I get the OOB and test-set error rates. If I replace x.ts and y.ts with x.tr and y.tr, respectively, is the reported error rate the training-set error rate? pgp.rf<-randomForest(x.tr,y.tr,x.tr,y.tr,ntree=1e3,keep.forest=FALSE,do.trace=1e2) This time I get the OOB error rate and the training-set error rate; is that right? thank you! -- View this message in context: http://n4.nabble.com/Help-me-using-random-Forest-package-how-to-calculate-Error-Rates-in-the-training-set-tp1010987p1010987.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Help me! Using the randomForest package, how can I calculate the error rate on the training set?
Thank you, Andy. I just read a paper in which the authors compare the error rates of the OOB estimate, the test set, and the training set, and use a figure to show that random forest is not overfitting: the training-set error rate goes to zero while the OOB and test-set error rates do not increase. I am just a beginner, so I have a lot to learn. Thank you kevin On 2010-01-12, "Liaw, Andy [via R]" wrote: -----Original Message----- From: "Liaw, Andy [via R]" Sent: Tuesday, 12 January 2010 To: bbslover Subject: Re: [R] Help me! Using the randomForest package, how can I calculate the error rate on the training set? Yes, or if you used keep.forest=TRUE, feed predict() with your x.tr and compare that with y.tr. However, I really don't understand why people compute "training error rate": what useful information can you get from it? Andy -- View this message in context: http://n4.nabble.com/Help-me-using-random-Forest-package-how-to-calculate-Error-Rates-in-the-training-set-tp1010987p1011752.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
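Following Andy's suggestion above, a minimal sketch for a regression forest; x.tr / y.tr follow the names in this thread, and a classification forest would compare predicted and observed classes instead:

library(randomForest)
pgp.rf  <- randomForest(x.tr, y.tr, ntree = 1000, keep.forest = TRUE)
tr.pred <- predict(pgp.rf, x.tr)          # fitted values on the training samples
mean((tr.pred - y.tr)^2)                  # training-set MSE (usually optimistic)
pgp.rf$mse[pgp.rf$ntree]                  # OOB MSE at the final tree, for comparison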
[R] Help, how can I boxplot MSE against mtry using 20 runs of 5-fold cross-validation?
Hello, I am learning randomForest. Now I want to boxplot MSE against mtry using 20 runs of 5-fold cross-validation (using the median value), but I do not have a good way to do it. The randomForest package itself does not contain a cross-validation method, and the caret package does, but how can I get all the values of mtry and, at the same time, the corresponding MSE? -- View this message in context: http://n4.nabble.com/Help-How-can-I-boxplot-mse-and-mtry-using-20-5-fold-cross-validation-tp1013058p1013058.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Help, how can I boxplot MSE against mtry using 20 runs of 5-fold cross-validation?
Thank you, Max. You are so helpful; every time you give me a lot of help. On my learning road you are my guide, though we do not know each other. best wishes kevin On 2010-01-14, "Max Kuhn [via R]" wrote: -----Original Message----- From: "Max Kuhn [via R]" Sent: Thursday, 14 January 2010 To: bbslover Subject: Re: [R] Help, how can I boxplot MSE against mtry using 20 runs of 5-fold cross-validation? In caret, see ?trainControl. Use returnResamp = "all" Max -- View this message in context: http://n4.nabble.com/Help-How-can-I-boxplot-mse-and-mtry-using-20-5-fold-cross-validation-tp1013058p1013515.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
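A hedged sketch building on Max's pointer: ask train() to keep every resampling result and then boxplot RMSE against mtry. The x / y objects and the tuning length are placeholders, and the exact resampling options and column names can vary between caret versions:

library(caret)
set.seed(1)
ctrl <- trainControl(method = "repeatedcv", number = 5, repeats = 20, returnResamp = "all")
fit  <- train(x, y, method = "rf", tuneLength = 4, trControl = ctrl)
head(fit$resample)    # one row per resample and per mtry value
# the tuning-parameter column may be named mtry or .mtry depending on the caret version
boxplot(RMSE ~ mtry, data = fit$resample, xlab = "mtry", ylab = "cross-validated RMSE")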
[R] help! kennard-stone algorithm in the soil.spec package does not work for my dataset!!!
http://r.789695.n4.nabble.com/file/n3031344/RSV.Rdata RSV.Rdata I want to split my dataset into a training set and a test set using the Kennard-Stone (KS) algorithm; luckily there is an R package, soil.spec, that implements it. But when I apply it to my dataset it does not work. Who can help me find the reason? Below is my code, and my data are in the attachment. ks<-ken.sto(x,per="TRUE",per.n=0.3,va="FALSE",sav="FALSE") ks % results $`Chosen sample names` NULL $`Chosen row number` integer(0) $`Chosen calibration sample names` [1] "NULL" $`Chosen calibration row number` [1] "NULL" $`Chosen validation sample names` [1] "NULL" $`Chosen validation row number` [1] "NULL" attr(,"class") Why is it all NULL? And > ks<-ken.sto(x,per="TRUE",per.n=0.3,va="TRUE",sav="FALSE") Error in val.min[i] <- blub[sample(length(blub), 1)] : replacement has length zero In addition: Warning message: In min(prco[-cal.start.n, i]) : no non-missing arguments to min; returning Inf If I set va="TRUE", these errors appear. I hope some friends can help me! -- View this message in context: http://r.789695.n4.nabble.com/help-kennard-stone-algorithm-in-soil-spec-packages-does-not-work-for-my-dataset-tp3031344p3031344.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] help! kennard-stone algorithm in the soil.spec package does not work for my dataset!!!
http://r.789695.n4.nabble.com/file/n3032045/rsv1.txt rsv1.txt I am very grateful for David's suggestion; here I upload my dataset "rsv1.txt". The question is the same: ks<-ken.sto(rsv1,per="TRUE",per.n=0.3,va="FALSE",sav="FALSE") does not work, and all results are NULL. I do not know why. I hope friends can give me a hand! thanks kevin -- View this message in context: http://r.789695.n4.nabble.com/help-kennard-stone-algorithm-in-soil-spec-packages-does-not-work-for-my-dataset-tp3031344p3032045.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
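Since ken.sto() keeps returning NULL here, a small base-R Kennard-Stone sketch may serve as a cross-check; it is a straightforward implementation of the algorithm (pick the two most distant samples, then repeatedly add the sample farthest from the already selected ones), not the soil.spec code, and the 70/30 split mirrors per.n=0.3:

kennard.stone <- function(x, n.train) {
  x <- as.matrix(x)
  d <- as.matrix(dist(x))                                   # Euclidean distances between samples
  sel <- as.vector(which(d == max(d), arr.ind = TRUE)[1, ]) # start with the two most distant samples
  while (length(sel) < n.train) {
    rest  <- setdiff(seq_len(nrow(x)), sel)
    min.d <- apply(d[rest, sel, drop = FALSE], 1, min)      # distance of each candidate to the selected set
    sel   <- c(sel, rest[which.max(min.d)])                 # add the most remote remaining sample
  }
  sel
}
train.rows <- kennard.stone(rsv1, n.train = round(0.7 * nrow(rsv1)))
test.rows  <- setdiff(seq_len(nrow(rsv1)), train.rows)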
[R] how to get the plot like the attachment?
http://r.789695.n4.nabble.com/file/n3060425/fig_1.png fig. 1 http://r.789695.n4.nabble.com/file/n3060425/fig_2.png fig. 2 I want a picture like the first one, where the axes cross at the origin, while the second picture is what I get by default, with the origin detached; by dragging the window I can get something like fig. 1. Now I want to know how to use code to obtain the layout of fig. 1 directly (axes meeting at the origin)? thanks kevin -- View this message in context: http://r.789695.n4.nabble.com/how-to-get-the-plot-like-the-attachment-tp3060425p3060425.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
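If the gap in fig. 2 is R's default 4% axis padding, one hedged guess is to switch the axis style to "i" so the axes meet exactly at the corner, or to draw the axes through the origin explicitly; a small sketch with made-up data:

x <- 0:10; y <- x^2
plot(x, y, xaxs = "i", yaxs = "i", xlim = c(0, 10), ylim = c(0, 100), bty = "l")  # axes meet at the corner
# or draw both axes through the origin explicitly:
plot(x, y, axes = FALSE)
axis(1, pos = 0); axis(2, pos = 0)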
Re: [R] how to get the plot like the attachment?
thanks, I succeed. kevin -- View this message in context: http://r.789695.n4.nabble.com/how-to-get-the-plot-like-the-attachment-tp3060425p3061217.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] how to replace my double for loop, which is not very efficient!
Dear all, my double for loop is below, but it is not very efficient; I hope someone can give me a "vectorized" program to replace my code. thanks x is a 202*263 matrix, that is, 202 samples and 263 independent variables. num.compd<-nrow(x); # number of compounds diss.all<-0 for( i in 1:num.compd) for (j in 1:num.compd) if (i!=j) { S1<-sum(x[i,]*x[j,]) S2<-sum(x[i,]^2) S3<-sum(x[j,]^2) sim2<-S1/(S2+S3-S1) diss2<-1-sim2 diss.all<-diss.all+diss2} It takes a long time to finish this computation, so I really need faster code. thanks kevin -- View this message in context: http://r.789695.n4.nabble.com/how-to-replace-my-double-for-loop-which-is-little-efficient-tp3164222p3164222.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
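One vectorized equivalent of the double loop above: compute all pairwise Tanimoto similarities at once with matrix products and then sum the dissimilarities over i != j (x as in the post, 202 x 263):

x  <- as.matrix(x)                        # 202 x 263
S1 <- tcrossprod(x)                       # S1[i, j] = sum(x[i, ] * x[j, ])
S2 <- rowSums(x^2)
sim  <- S1 / (outer(S2, S2, "+") - S1)    # Tanimoto similarity matrix
diss <- 1 - sim
diag(diss) <- 0                           # the loop skips i == j
diss.all <- sum(diss)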
Re: [R] how to replace my double for loop, which is not very efficient!
Thanks for your help, it is great. In addition, in the beginning the format of x was a data frame and my code ran very slowly; after your help I changed x to a matrix and it is very quick. I am very grateful for your kind help, and your code is very good! kevin -- View this message in context: http://r.789695.n4.nabble.com/how-to-replace-my-double-for-loop-which-is-little-efficient-tp3164222p3164732.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] how to replace my double for loop, which is not very efficient!
Thanks for your help. I am sorry, I do not fully understand your code, so I cannot adapt it to my data. Here is the attachment with my data, and what I want to compute is the equation in the Word document in the attachment; the code from Berend gives the answer I want to get. http://r.789695.n4.nabble.com/file/n3164741/my_data.rar my_data.rar -- View this message in context: http://r.789695.n4.nabble.com/how-to-replace-my-double-for-loop-which-is-little-efficient-tp3164222p3164741.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] how to replace my double for loop, which is not very efficient!
Thank you, Berend. It seems it is better to attach a PDF file to avoid garbled text. Yes, what I want to obtain is the Tanimoto coefficient, and the Wikipedia page you pointed to is about this coefficient. I have also searched the R site for the Tanimoto coefficient to learn more about it. I have saved your code and am studying it. Thanks again Kevin -- View this message in context: http://r.789695.n4.nabble.com/how-to-replace-my-double-for-loop-which-is-little-efficient-tp3164222p3164920.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] help! complete the reviewer's suggestion: carry out GA+GP (Gaussian process)!
Hello, all experts, My field is computer-aided drug design (mainly QSAR). My paper now needs to be revised, and one reviewer asked me to apply a genetic algorithm coupled with a Gaussian process (GA+GP). My data: training set 191*106, test set 73*106. I need to use GA+GP for variable selection when building the model. In R there is no GA package exactly like the Matlab GA toolbox (http://www.sheffield.ac.uk/acse/research/ecrg/gat.html); at the moment I can use the Matlab GA toolbox, but I cannot use a GP toolbox in Matlab. Searching the internet, I found that the R package "genalg" can do GA, and an example is given for wavelength selection by GA+PLS, so I thought I could certainly do GA+GP. Unfortunately, in the genalg package I do not know how to extract the selected variables; it seems there is no such function. So I would like friends to help me address the reviewer's suggestion: do GA+GP, extract the optimal variables, and get some statistical parameters (i.e. cross-validation R2, predictive R2, etc.). Currently I can do GA+SVM for variable selection, build the models, and get the statistical parameters described above. GA: Matlab GA toolbox (http://www.sheffield.ac.uk/acse/research/ecrg/gat.html) svm: libsvm (http://www.csie.ntu.edu.tw/~cjlin/libsvm/) Now I want to know how to get the predicted values. In libsvm, for example: cmd = ['-v ',num2str(v),' -c',num2str(cgp(nind,1)), '-g ',num2str(cgp(nind,2)),' -p ',num2str(cgp(nind,3)),' -s 3']; model = svmtrain(train_y,train_data_best,cmd); train_pred = svmpredict(train_y,train_data_best,model); % get the predicted values for the training set I can get train_pred, and likewise test_pred (test_pred = svmpredict(test_y,test_data_best,model);). With the observed train_y and test_y and the predicted train_pred and test_pred, the statistical parameters can be calculated. But for GP, how can I get the predicted values? (From the GP website: http://www.gaussianprocess.org/gpml/code/matlab/doc/) prediction: [ymu ys2 fmu fs2 ] = gp(hyp, inf, mean, cov, lik, x, y, xs); here, is "ymu" the vector of predicted values, similar to "test_pred" in libsvm? I hope friends can give me a hand; there are only a few days left before I must upload my revised manuscript, but this question is still not solved. thanks for your help. kevin -- View this message in context: http://r.789695.n4.nabble.com/help-complete-the-reviewer-s-suggest-carry-out-GA-GP-gaussian-process-tp3229097p3229097.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
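A heavily hedged sketch of one way to combine GA and GP entirely in R, using genalg for the GA and kernlab's gausspr() for Gaussian process regression; x.tr, y.tr and x.ts are assumed names for the 191-sample training set and the 73-sample test set, and the fitness function, population size and iteration count are illustrative choices only:

library(genalg)
library(kernlab)
fitness <- function(chrom) {
  if (sum(chrom) < 2) return(1e6)                       # penalise near-empty subsets
  fit <- gausspr(as.matrix(x.tr[, chrom == 1]), y.tr, cross = 5)
  cross(fit)                                            # cross-validated error, to be minimised
}
set.seed(1)
ga <- rbga.bin(size = ncol(x.tr), popSize = 50, iters = 100,
               mutationChance = 0.05, zeroToOneRatio = 10, evalFunc = fitness)
best     <- ga$population[which.min(ga$evaluations), ]  # best chromosome in the final population
selected <- which(best == 1)                            # indices of the selected descriptors
gp        <- gausspr(as.matrix(x.tr[, selected]), y.tr) # refit the GP on the chosen variables
pred.test <- predict(gp, as.matrix(x.ts[, selected]))   # predictions for the external test set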
[R] How can I tell whether a model is overfitting?
1. Is there some criterion for judging overfitting, e.g. R2 and Q2 on the training set together with R2 on the test set, and at what values does it indicate overfitting? For example, in my data I have R2=0.94 for the training set and R2=0.70 for the test set; is that overfitting? 2. From this scatter plot, can one say the model is overfitting? 3. My results were obtained with an SVM; the samples are 156 and 52 for the training and test sets, and there are 96 predictors. In this case, can an SVM be used for prediction, or is the number of predictors too large? 4. From this picture, can you give me some suggestions to improve the model performance? Is the picture bad? 5. The picture and data are below. thank you! http://n4.nabble.com/file/n2164417/scatter.jpg scatter.jpg http://n4.nabble.com/file/n2164417/pkc-svm.txt pkc-svm.txt -- View this message in context: http://r.789695.n4.nabble.com/How-to-estimate-whether-overfitting-tp2164417p2164417.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
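A small sketch for quantifying the train/test gap mentioned in point 1: compute R-squared on both sets from the fitted SVM's predictions (svm.fit, x.train, y.train, x.test and y.test are assumed names). A much higher training value than test value, such as 0.94 versus 0.70, is the usual symptom of overfitting, although the test-set figure is what matters in the end:

r2 <- function(obs, pred) 1 - sum((obs - pred)^2) / sum((obs - mean(obs))^2)
r2(y.train, predict(svm.fit, x.train))   # training-set R-squared
r2(y.test,  predict(svm.fit, x.test))    # test-set R-squared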
[R] after updating to R 2.11.0 there is an error when using plot(); what can I do?
> a<-1:5 > b<-2:6 > plot(a,b) Error in function (width, height, pointsize, record, rescale, xpinch, : Graphics API version mismatch Before, with R 2.10, plot() was fine; now, with R 2.11.0, it does not work. -- View this message in context: http://r.789695.n4.nabble.com/update-R-2-11-0-there-is-error-when-using-plot-how-can-I-do-tp2164517p2164517.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] How can I tell whether a model is overfitting?
Thanks for your suggestion; there is indeed a lot I need to learn. I will buy that good book. kevin -- View this message in context: http://r.789695.n4.nabble.com/How-to-estimate-whether-overfitting-tp2164417p2164847.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] How can I tell whether a model is overfitting?
Thank you, I have downloaded it and am studying it. -- View this message in context: http://r.789695.n4.nabble.com/How-to-estimate-whether-overfitting-tp2164417p2164932.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] How can I tell whether a model is overfitting?
Many thanks. I can try to use a test set with 100 samples. Another question is how I can rationally split my data into a training set and a test set (training set with 108 samples and test set with 100 samples). As I understand it, the test set should follow the same distribution as the training set; what methods can be used to split rationally, and which R packages deal with rationally splitting into training/test sets? If the split is random, it seems that many repeated splits are needed and the averaged results are taken as the final results; however, I would like methods that give a fixed training set and test set instead of a random split. The training and test sets should be like this: ideally, the division is performed such that points representing both sets are distributed within the whole feature space occupied by the entire dataset, and each point of the test set is close to at least one point of the training set; this ensures that the similarity principle can be employed for predicting the output of the test set. Certainly, this condition cannot always be satisfied. So, in general, which algorithms are usually used to split, and which are more rational? Some papers just say they split the data set randomly; what does "randomly" mean exactly, just random selection, or is there some explicit method (e.g. ordering by the output)? Which package can split data rationally? Also, if one wants better results there are "tricks" that can be played: selecting the test set again and again, taking the test set with the best results as the final one, and then claiming the test set was selected randomly; but that is not truly random, it is false. Thank you, and sorry for so many questions, but this always puzzles me; up to now I have had no good method to rationally split my data into training and test sets. Finally, splitting into training and test sets should be done before modeling, and it seems this can be done from the features only (SOM), from the features and the output (the SPXY algorithm; paper: "A method for calibration and validation subset partitioning"), or from the output only (output order). But usually many features are calculated, and some features are zero or have a low standard deviation (sd<0.5); should we delete these features before the split, use the remaining features to split the data, then use only the training set to build the regression model, perform feature selection and cross-validation, and keep the independent test set only to test the built model? Maybe my thinking about the whole modeling process is not clear, but I think it is like this: 1) get samples; 2) calculate features; 3) preprocess the calculated features (e.g. remove zero ones); 4) rationally split the data into training and test sets (this always puzzles me: how should one split?); 5) build the model, tuning its parameters with resampling methods using only the training set, and obtain the final model; 6) test the model performance using the independent test set (unseen samples); 7) assess the model: good or bad? overfitting? (Generally, in what case is a model overfitting? Can you give me an example? As I understand it, a model is overfitting when the training-set fit is good but the independent test set is bad; but what counts as good and what counts as bad? With r2=0.94 on the training set and r2=0.70 on the test set, is the model overfitting, and can it be accepted? In general, what kind of model is well accepted?) 8) conclusions: how good is the model? That is my thinking, and many questions wait for answers.
thanks kevin -- View this message in context: http://r.789695.n4.nabble.com/How-to-estimate-whether-overfitting-tp2164417p2164960.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] after updating to R 2.11.0 there is an error when using plot(); what can I do?
Now it is OK. I uninstalled R 2.11.0, then deleted the old packages in the library, and installed R 2.11.0 again; it works. thank you! -- View this message in context: http://r.789695.n4.nabble.com/update-R-2-11-0-there-is-error-when-using-plot-how-can-I-do-tp2164517p2165235.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.