Hello, all experts,
My major is computer-aided drug design (mainly QSAR).
My paper now needs to be revised, and one reviewer asked me to apply a genetic
algorithm coupled with a Gaussian process method (GA+GP).
My data:
training set: 191*106
test set: 73*106
Here, I need to use GA+GP to do variable selection
Thanks, Berend,
It seems better to attach a PDF file to avoid messy
code.
Yes, what I want to obtain is the Tanimoto coefficient, and the Wikipedia page
you pointed to is about this coefficient. I also searched the R site for the
Tanimoto coefficient and learned more about it.
I have saved your code, and
thanks for your help. I am sorry that I do not fully understand your code, so I
cannot apply it correctly to my data. Here is my data as an attachment,
and what I want to compute is the equation in the Word document in the
attachment:
The code from Berend gives the answer I want.
http:
Thanks for your help, it is great. In addition, at the beginning x was a
data frame and my code ran very slowly; after your help, I
changed x to a matrix, and now it is very fast. I am very grateful for your
kind help, and your code is so good!
kevin
--
View this message in context:
http://r.
Dear all,
My double for loop is as follows, but it is not very efficient. I hope
someone can give me a "vectorized" program to replace my code. Thanks.
x is a 202*263 matrix, that is, 202 samples and 263 independent variables
num.compd<-nrow(x); # number of compounds
diss.all<-0
for( i in 1:nu
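A double loop over all pairs of rows can usually be replaced by matrix operations. The sketch below assumes the loop computes pairwise Tanimoto coefficients between the rows of x (the original loop body is cut off, so this is a guess based on the follow-up messages). The function name tanimoto_matrix and the toy fingerprint matrix fp are made up for illustration.

```r
# Pairwise Tanimoto similarity between the rows of a numeric matrix,
# computed without explicit loops (hypothetical helper, not Berend's code).
tanimoto_matrix <- function(x) {
  x <- as.matrix(x)                 # matrices are much faster than data frames here
  cross <- tcrossprod(x)            # cross[i, j] = sum(x[i, ] * x[j, ])
  sq <- diag(cross)                 # sq[i] = sum(x[i, ]^2)
  cross / (outer(sq, sq, "+") - cross)
}

# Toy binary fingerprints: 3 compounds, 4 bits
fp <- rbind(c(1, 1, 0, 0),
            c(1, 0, 1, 0),
            c(1, 1, 0, 0))
sim <- tanimoto_matrix(fp)
diss <- 1 - sim                     # dissimilarity, if that is what is needed
```

Rows that are all zero produce NaN (0/0), so such rows would need to be handled separately.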
Thanks, I succeeded.
kevin
--
View this message in context:
http://r.789695.n4.nabble.com/how-to-get-the-plot-like-the-attachment-tp3060425p3061217.html
Sent from the R help mailing list archive at Nabble.com.
__
R-help@r-project.org mailing list
http
http://r.789695.n4.nabble.com/file/n3060425/fig_1.png fig. 1
http://r.789695.n4.nabble.com/file/n3060425/fig_2.png fig. 2
I want a picture like the first one, where the axes cross at the origin,
while the second picture is what I get by default, with the axes detached at
the origin, but through pulli
http://r.789695.n4.nabble.com/file/n3032045/rsv1.txt rsv1.txt
I am very grateful for David's suggestion. Here I upload my dataset
"rsv1.txt", along with the question:
ks<-ken.sto(rsv1,per="TRUE",per.n=0.3,va="FALSE",sav="FALSE")
It does not work; all results are NULL, and I do not know why.
http://r.789695.n4.nabble.com/file/n3031344/RSV.Rdata RSV.Rdata
I want to split my dataset into a training set and a test set using the
Kennard-Stone (KS) algorithm; luckily, the R package soil.spec
implements it.
But when I applied it to my dataset, it did not work. Who can help me, how
reasons i
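For reference, the Kennard-Stone idea itself is short enough to sketch in base R: start from the two most distant samples, then repeatedly add the sample whose nearest already-selected neighbour is farthest away. This is only a minimal sketch, not the soil.spec implementation; the function name kennard_stone and the toy matrix are hypothetical.

```r
# Minimal Kennard-Stone selection (sketch, Euclidean distance).
# Returns the row indices of the k selected (training) samples.
kennard_stone <- function(x, k) {
  x <- as.matrix(x)
  stopifnot(k >= 2, k <= nrow(x))
  d <- as.matrix(dist(x))                       # pairwise distances
  # start with the two samples that are farthest apart
  start <- which(d == max(d), arr.ind = TRUE)[1, ]
  selected <- as.integer(start)
  remaining <- setdiff(seq_len(nrow(x)), selected)
  while (length(selected) < k) {
    # distance from each remaining sample to its nearest selected sample
    min_d <- apply(d[remaining, selected, drop = FALSE], 1, min)
    pick <- remaining[which.max(min_d)]         # the max-min candidate
    selected <- c(selected, pick)
    remaining <- setdiff(remaining, pick)
  }
  selected
}

# Toy data: four points on a line at 0, 10, 5 and 1
toy <- matrix(c(0, 0,
                10, 0,
                5, 0,
                1, 0), ncol = 2, byrow = TRUE)
train_idx <- kennard_stone(toy, 3)
test_idx <- setdiff(seq_len(nrow(toy)), train_idx)
```

With real data, k would be set to the desired training-set size, for example round(0.7 * nrow(x)).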
Now it is OK. I uninstalled R 2.11.0, deleted all packages in the library,
and installed R 2.11.0 again. Now it works.
Thank you!
--
View this message in context:
http://r.789695.n4.nabble.com/update-R-2-11-0-there-is-error-when-using-plot-how-can-I-do-tp2164517p2165235.html
Many thanks. I can try using a test set with 100 samples.
Another question: how can I rationally split my data into a training set
and a test set (training set with 108 samples, test set with 100 samples)?
As far as I know, the test set should have the same distribution as the
training set. and what met
Thank you, I have downloaded it and am studying it.
--
View this message in context:
http://r.789695.n4.nabble.com/How-to-estimate-whether-overfitting-tp2164417p2164932.html
Thanks for your suggestion.
I indeed have much to learn. I will buy that good book.
kevin
--
View this message in context:
http://r.789695.n4.nabble.com/How-to-estimate-whether-overfitting-tp2164417p2164847.html
> a<-1:5
> b<-2:6
> plot(a,b)
Error in function (width, height, pointsize, record, rescale, xpinch, :
Graphics API version mismatch
Before, with R 2.10, plot() was OK. Now, with R 2.11.0, it does not work.
--
View this message in context:
http://r.789695.n4.nabble.com/update-R-2-11-0-there-is-error-wh
1. Is there some criterion for detecting overfitting? E.g., which values of
R2 and Q2 in the training set, together with R2 in the test set, indicate
overfitting? For example, in my data I have R2=0.94 for the training set and
R2=0.70 for the test set; is this overfitting?
2. in this scatter, can one say this overfi
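To make the comparison of training and test R2 concrete, here is a small sketch on simulated data (not the poster's data): a linear model with many predictors relative to the number of samples fits the training set very well but predicts the held-out set poorly. The helper name r2 and all variable names are made up for illustration.

```r
# R2 = 1 - SS_res / SS_tot, usable on both training and test predictions
r2 <- function(obs, pred) {
  1 - sum((obs - pred)^2) / sum((obs - mean(obs))^2)
}

set.seed(42)
n <- 60; p <- 30
x <- matrix(rnorm(n * p), n, p)
y <- x[, 1] + rnorm(n)              # only the first predictor matters
dat <- data.frame(y = y, x)

train <- 1:40                        # 40 training samples, 20 test samples
fit <- lm(y ~ ., data = dat[train, ])

r2_train <- r2(dat$y[train],  predict(fit, dat[train, ]))
r2_test  <- r2(dat$y[-train], predict(fit, dat[-train, ]))
c(train = r2_train, test = r2_test)
```

A training R2 far above the test R2 is the usual symptom of overfitting; there is no universal threshold for the gap, so it is normally judged together with cross-validated Q2 and the number of descriptors per sample.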
thanks, it is ok!
--
View this message in context:
http://n4.nabble.com/how-can-I-plot-the-histogram-like-this-using-R-tp1839303p2013782.html
Thanks for your reply. I just want to get a figure like y1.jpg using the
data from y1.txt.
From the figure I want to obtain the split point as in y1.jpg, and
take 2.5 as the split point. This figure was drawn by someone else; I
just want to draw it using R, but I cannot, so I hope, frie
thank you, I will try this function barplot.
--
View this message in context:
http://n4.nabble.com/how-can-I-plot-the-histogram-like-this-using-R-tp1839303p1839541.html
Thanks for your help. I will give it a try.
--
View this message in context:
http://n4.nabble.com/how-can-I-plot-the-histogram-like-this-using-R-tp1839303p1839534.html
I want to get the plot like this,
http://n4.nabble.com/file/n1839303/%25E9%25A2%2591%25E7%258E%2587%25E5%2588%2586%25E5%25B8%2583%25E5%259B%25BE%25E6%25A0%2587%25E5%2587%2586.jpg
%E9%A2%91%E7%8E%87%E5%88%86%E5%B8%83%E5%9B%BE%E6%A0%87%E5%87%86.jpg
not this, http://n4.nabble.com/file/n1839303/R.jp
Hello,
I am learning the caret package, and I want to use RFE to reduce the number
of features. I want to use RFE coupled with Random Forest (RFE+RF) to
complete this task. As we know, there are a number of pre-defined sets of
functions, like random forest (rfFuncs); however, I want to tune the parameters (mtr
This topic concerns reducing the number of independent variables. As we
know, many methods can do this; however, for pre-processing the independent
variables, a method like the one in the sentence below can remove many
variables. How should I understand it?
What is a significant correlation at the 5% level, what is the criterion
Sent: Thursday, January 14, 2010
To: bbslover
Subject: Re: [R] Help, How can I boxplot mse and mtry using 20 5-fold
cross-validation?
In caret, see ?trainControl. Use returnResamp = "all"
Max
On Wed, Jan 13, 2010 at 9:47 AM, bbslover <[hidden email]> wrot
Hello,
I am learning randomForest, and now I want to make a boxplot of MSE against
mtry using 20 5-fold cross-validation (using the median value), but I have
no good method for doing it, only a clumsy one.
The randomForest package itself does not contain a cross-validation method,
and the caret package contain cross vali
late Error
Rates in the training set ?
From: bbslover
>
> now I am learining random forest and using random forest
> package, I can get
> the OOB error rates, and test set rate, now I want to get the
> training set
> error rate, how can I do?
>
> pgp.rf<-randomF
I am now learning random forests, and with the randomForest package I can get
the OOB error rate and the test set error rate. Now I want to get the
training set error rate; how can I do that?
pgp.rf<-randomForest(x.tr,y.tr,x.ts,y.ts,ntree=1e3,keep.forest=FALSE,do.trace=1e2)
using the code can get oob and t
http://n4.nabble.com/file/n998182/pca.jpg pca.jpg
http://n4.nabble.com/file/n998182/som.jpg som.jpg
http://n4.nabble.com/file/n998182/all%2Bindepents.xls all+indepents.xls
As we know, SOM is a good tool for clustering high-dimensional data into 2D
and showing it as a 2D picture, just like in the attachment pic
eds or
remove variables prior to modeling (or try some other method that
doesn't require more samples than predictors, such as the lasso or
elasticnet).
Max
On Fri, Jan 1, 2010 at 10:14 PM, bbslover <[hidden email]> wrote:
>
> I am learning the package "caret", a
o unique solution, so many of the parameter
> estimates are NA.
>
> Either create a modified version of lmFuncs that suits your needs or
> remove variables prior to modeling (or try some other method that
> doesn't require more samples than predictors, such as the lasso or
> elasticne
I am learning the package "caret". After I run the "rfe" function, I get the
following error:
Error in `[.data.frame`(x, , retained, drop = FALSE) :
undefined columns selected
In addition: Warning message:
In predict.lm(object, x) :
prediction from a rank-deficient fit may be misleading
I
Thank you for all your help. It is helpful to me.
Max Kuhn wrote:
>
>> I noticed Max already pointed you to the caret package.
>>
>> Load the library and look at the help for the createFolds function, eg:
>>
>> library(caret)
>> ?createFolds
>
> I think that the createDataPartition function in care
I want to split my whole dataset into a training set and a test set, build
the model on the training set, and validate it on the test set. How can I
split my dataset reasonably? Please give me a hand; some R code would be
best.
and I see some ways like using SOM to project whole ind
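One simple option, sketched below in base R, is a random split that is stratified on the response, so that the training and test sets cover a similar range of y values. This is only an illustration with simulated y; the function name stratified_split and its arguments p and groups are hypothetical, and it is not the caret implementation.

```r
# Stratified random split: bin y by quantiles, then sample a fraction p
# of each bin for the training set (sketch only).
stratified_split <- function(y, p = 0.7, groups = 4) {
  breaks <- unique(quantile(y, probs = seq(0, 1, length.out = groups + 1)))
  bins <- cut(y, breaks = breaks, include.lowest = TRUE)
  train <- unlist(lapply(split(seq_along(y), bins), function(idx) {
    idx[sample.int(length(idx), size = round(p * length(idx)))]
  }))
  sort(unname(train))
}

set.seed(123)
y <- rnorm(208)                          # stand-in for the real response
train_idx <- stratified_split(y, p = 0.5)
test_idx <- setdiff(seq_along(y), train_idx)

# Compare the distribution of y in both sets
summary(y[train_idx])
summary(y[test_idx])
```

The rows of the descriptor matrix would then be selected with x[train_idx, ] and x[test_idx, ].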
Hello, all
I have a lot of independent variables and one dependent variable. In the
end I want to build a model from them and predict the values of new
samples, that is, regression. Before that, I must remove some independent
variables according to these criteria:
1. independents with constant values. 2. variance near zero. 3
ec 18, 2009 at 12:26 PM, bbslover <[hidden email]> wrote:
>
> as known, svm need tune some parameters like cost,gamma and epsilon to get
> better performance,but one question appear, how can i monitor the
> performance . generally speaking ,we chose the cross-validation MSE in
>
> http://www.jstatsoft.org/v28/i05/paper
>
> about the package.
>
> Max
>
> On Fri, Dec 18, 2009 at 12:26 PM, bbslover wrote:
>>
>> as known, svm need tune some parameters like cost,gamma and epsilon to
>> get
>> better performance,but one question appear,
As is known, SVM needs some parameters tuned, such as cost, gamma and
epsilon, to get better performance, but one question arises: how can I
monitor the performance? Generally speaking, we choose the cross-validation
MSE in the training set, but it seems svm cannot return the cross-validation
MSE value, we
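If the fitting function does not report a cross-validated MSE, it can be computed by hand with a short loop. The sketch below is generic base R: lm stands in for the SVM so the example runs without extra packages, and the names cv_mse, fit_fun and predict_fun are hypothetical. For tuning, the same function would be called once per parameter combination.

```r
# k-fold cross-validated MSE for any model, given a fit and a predict function
cv_mse <- function(x, y, k = 5, fit_fun, predict_fun) {
  folds <- sample(rep(seq_len(k), length.out = length(y)))  # random fold labels
  fold_mse <- numeric(k)
  for (f in seq_len(k)) {
    test <- folds == f
    model <- fit_fun(x[!test, , drop = FALSE], y[!test])    # fit on k-1 folds
    pred <- predict_fun(model, x[test, , drop = FALSE])     # predict held-out fold
    fold_mse[f] <- mean((y[test] - pred)^2)
  }
  mean(fold_mse)
}

# Simulated data and an lm stand-in for the learner
set.seed(1)
x <- matrix(rnorm(200), ncol = 2, dimnames = list(NULL, c("x1", "x2")))
y <- 3 * x[, 1] - x[, 2] + rnorm(100, sd = 0.1)
fit_lm <- function(x, y) lm(y ~ ., data = data.frame(x, y = y))
pred_lm <- function(m, x) predict(m, newdata = data.frame(x))

mse <- cv_mse(x, y, k = 5, fit_fun = fit_lm, predict_fun = pred_lm)
```

Swapping in another learner only requires replacing fit_lm and pred_lm with wrappers around that learner's fit and predict calls.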
Hi,all friends,
Please help me understand this sentence below:
“From this set, 858 columns not significantly correlated with the
response variable TBG at the 5% level were removed, leaving a set of 390
columns.” and “ the F-test's value for the one-parameter correlation with
the descriptor i
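One common reading of that sentence is: for each descriptor column, test whether its correlation with the response differs from zero, and drop the column if the p-value is 0.05 or larger. For a regression on a single descriptor, the F-test and the correlation test give the same p-value. The sketch below uses simulated data and made-up names (x, y, p_values, keep); it is not the procedure from the quoted paper.

```r
set.seed(7)
n <- 72
x <- matrix(rnorm(n * 20), n, 20,
            dimnames = list(NULL, paste0("d", 1:20)))
y <- 2 * x[, 1] + rnorm(n)           # only descriptor d1 is related to y

# p-value of the Pearson correlation test for each descriptor against y
p_values <- apply(x, 2, function(col) cor.test(col, y)$p.value)

keep <- p_values < 0.05              # significant at the 5% level
x_kept <- x[, keep, drop = FALSE]
colnames(x_kept)
```

With many descriptors, about 5% of unrelated columns will pass this filter by chance, so it is a coarse pre-filter rather than a final selection.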
http://old.nabble.com/file/p26443595/Edragonr.txt Edragonr.txt
Hi all,
I have a 72*495 matrix; the first column is the response, and the
remaining columns are independent variables. In the end I want to select
some independent variables to fit y, but there are so many of them that the
fit result is not meaningful, so
Dear all,
I am learning the subselect package in R. I want to use GA to select
some useful variables, but some points puzzle me.
What I want to solve is this: I have one dependent column y and 219
independent columns x. The dataset contains 72 observations in
total. I want t
My code below is not right:
rm(list=ls())
#define data.frame
a=c(1,2,3,5,6); b=c(1,2,3,4,7); c=c(1,2,3,4,8); d=c(1,2,3,5,1);
e=c(1,2,3,5,7)
data.f=data.frame(a,b,c,d,e)
#backup data.f
origin.data<-data.f
#get correlation matrix
cor.matrix<-cor(origin.data)
#backup corre
OK, I understand what you mean; maybe PLS is better for my aim. But I have
tried that, and it is also bad. My main question is how to select fewer
variables from the independents to fit the dependent. GA may be a good way,
but I have not learned it well.
Ben Bolker wrote:
>
> bbslover yeah.net&g
rm(list=ls())
yx.df<-read.csv("c:/MK-2-72.csv",sep=',',header=T,dec='.')
dim(yx.df)
#get X matrix
y<-yx.df[,1]
x<-yx.df[,2:643]
#conver to matrix
mat<-as.matrix(x)
#get row number
rownum<-nrow(mat)
#remove the constant parameters
mat1<-mat[,apply(mat,2,function(.col)!(all(.col[1]==.col[2:rownum]))
r.matrix)[1])+1
>
> for column ids use modulus instead of integer divison.
>
> (which(cor.matrix >=0.95) %% dim(cor.matrix)[1])
>
> There are probably better ways than this.
>
> Nikhil
>
> but probably a better way to do this would be
>
> On 6 Nov 200
My program is below:
a=c(1,2,1,1,1); b=c(1,2,3,4,1); c=c(3,4,3,3,3); d=c(1,2,3,5,1);
e=c(1,5,3,5,1)
data.f=data.frame(a,b,c,d,e)
origin.data<-data.f
cor.matrix<-cor(origin.data)
origin.cor<-cor.matrix
m<-0
for(i in 1:(cor.matrix[1]-1))
{
for(j in (i+1):(cor.matrix[2]))
{
if (cor.matri
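The nested loop over the correlation matrix can be avoided by looking only at the upper triangle and asking which() for row and column indices. The sketch below uses a small made-up data frame in which x2 is an exact linear function of x1; the 0.95 cutoff matches the one quoted in the reply, and dropping the second member of each pair is just one possible rule.

```r
# Toy data: x2 is perfectly correlated with x1, x3 is not
x <- data.frame(x1 = 1:10,
                x2 = 2 * (1:10) + 1,
                x3 = c(5, 1, 4, 2, 3, 5, 1, 4, 2, 3))

cm <- abs(cor(x))
cm[!upper.tri(cm)] <- 0                      # keep each pair once, ignore the diagonal

# one row per highly correlated pair, with "row" and "col" indices
high <- which(cm >= 0.95, arr.ind = TRUE)

# drop the second variable of every highly correlated pair
drop_cols <- unique(colnames(x)[high[, "col"]])
x_reduced <- x[, !(colnames(x) %in% drop_cols), drop = FALSE]
```

Using arr.ind = TRUE gives row and column indices directly, so no integer division or modulus on the linear index is needed.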
e.g.
a=
  a b c d e
1 1 1 3 1 1
2 1 2 3 4 5
3 1 3 3 8 3
4 1 4 3 3 5
5 1 1 3 1 1
I want to delete column a and column c, because they
have the same value in every row; then I want to get this data frame:
b=
  b d e
1 1 1 1
2 2 4 5
3 3 8 3
4 4 3 5
5 1 1 1
the following i
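Dropping the constant columns from the example above can be sketched without a loop: test each column for having a single unique value and keep the rest. The data frame is the one shown in the message; the name is_constant is made up.

```r
a <- data.frame(a = c(1, 1, 1, 1, 1),
                b = c(1, 2, 3, 4, 1),
                c = c(3, 3, 3, 3, 3),
                d = c(1, 4, 8, 3, 1),
                e = c(1, 5, 3, 5, 1))

# TRUE for columns whose values are all identical
is_constant <- vapply(a, function(col) length(unique(col)) == 1, logical(1))

b <- a[, !is_constant, drop = FALSE]   # keeps columns b, d and e
```

The drop = FALSE argument keeps the result a data frame even if only one column survives.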
>>
>> http://www2.research.att.com/~volinsky/bma.html
>>
>> But of course, you must do what you think is better for your problem.
>> By the way what is the dimension of your problem?
>>
>> HTH,
>>
>> Rick
>> --
weight of
> each variable in the PC.
>
> HTH
>
> Rick
>
> --
> From: "bbslover"
> Sent: Wednesday, November 04, 2009 10:23 AM
> To:
> Subject: [R] variable selectin---reduce the numbers of initial variable
>
hello,
My problem is this: after processing the variables, 160 independent
variables and one dependent y remain. When I used the PLS method with 10
components, a good r2 was obtained, but I do not know how to express
my equation with fewer variables and y. It is better to
Thank you for your help, it is a good way.
Steven Kang wrote:
>
> can try
>
> matrix.x <- as.matrix(x)
>
> On Mon, Nov 2, 2009 at 8:38 PM, bbslover wrote:
>
>>
>> In my disk C:/ have a a.csv file, I want to read it to R, importantly,
>> when
>
On my C:/ drive I have a file a.csv that I want to read into R. When
I use x=read.csv("C:/a.csv"), x is a data.frame, but I want it to
be a matrix. How can I do that?
Thank you!
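A small self-contained sketch of the conversion: the CSV content is given inline through the text argument so the example runs without a file on disk; with a real file, read.csv("C:/a.csv") would take its place. The names csv_text, x and m are made up.

```r
csv_text <- "a,b,c\n1,2,3\n4,5,6"

x <- read.csv(text = csv_text)   # data.frame with columns a, b, c
m <- as.matrix(x)                # numeric matrix, column names preserved

class(x)
dim(m)
```

If any column is not numeric, as.matrix() returns a character matrix, so it is worth checking is.numeric(m) after the conversion.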
--
View this message in context:
http://old.nabble.com/how-can-I-convert-.csv-format-to
1
>> str(df)
> 'data.frame': 5 obs. of 2 variables:
> $ x : int 1 2 3 4 5
> $ matrix.rnorm.10...5..2.: AsIs [1:5, 1:2] 0.187703.... -0.66264
> -0.82334 -0.37255 -0.28700 ...
>>
>
> Regards
> Petr
>
>
That is impressive. Thanks, Gabor Grothendieck. I got it.
Gabor Grothendieck wrote:
>
> Google for
> CRANberries aggregates
> and check first hit.
>
> On Sat, Oct 24, 2009 at 4:44 AM, bbslover wrote:
>>
>> there are many R packages, yesterday, 2031 but tod
There are many R packages: yesterday there were 2031, but today there are
2033. How can I know which packages were added or updated?
--
View this message in context:
http://www.nabble.com/how-can-I-kown-which-package-is-added%2C-or-updated--tp26037150p26037150.html
thank you Don MacQueen , I will try it.
Don MacQueen wrote:
>
> At 4:57 AM -0700 10/23/09, bbslover wrote:
>>Steve Lianoglou-6 wrote:
>>>
>>> Hi,
>>>
>>> On Oct 22, 2009, at 2:35 PM, bbslover wrote:
>>>
>>>> Usage
>>
I have tried it; paste can add the wanted letters, but I cannot paste the
column names. Maybe I should study it harder.
Don MacQueen wrote:
>
> At 4:57 AM -0700 10/23/09, bbslover wrote:
>>Steve Lianoglou-6 wrote:
>>>
>>> Hi,
>>>
>>> On Oct 22, 2
I have read that one, and I want to apply this method to my data, but I do
not know how to put my data into R.
James W. MacDonald wrote:
>
>
>
> bbslover wrote:
>>
>>
>> Steve Lianoglou-6 wrote:
>>> Hi,
>>>
>>> On Oct 22, 2009, at 2
Steve Lianoglou-6 wrote:
>
> Hi,
>
> On Oct 22, 2009, at 2:35 PM, bbslover wrote:
>
>> Usage
>> data(gasoline)
>> Format
>> A data frame with 60 observations on the following 2 variables.
>> octane
>> a numeric vector. The octane number.
>
Usage
data(gasoline)
Format
A data frame with 60 observations on the following 2 variables.
octane
a numeric vector. The octane number.
NIR
a matrix with 401 columns. The NIR spectrum
and when I look at the gasoline data I see the following:
NIR.1686 nm NIR.1688 nm NIR.1690 nm NIR.1692 nm NIR.1694 nm NIR.1
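A data frame laid out like the gasoline data described above, with a whole matrix stored as a single column, can be built from one's own data with I(). The sketch below uses random numbers in place of real spectra; the names spectra, y and df are made up, and the column names only imitate the NIR.xxxx pattern.

```r
set.seed(10)
# 5 samples, 3 "wavelengths" standing in for a real descriptor matrix
spectra <- matrix(rnorm(15), nrow = 5,
                  dimnames = list(NULL, paste0("NIR.", 1:3)))
y <- c(85.3, 85.2, 88.4, 83.4, 87.9)        # made-up response values

# I() keeps the matrix as one column instead of splitting it into three
df <- data.frame(octane = y, NIR = I(spectra))

dim(df)        # 2 variables: octane and NIR
dim(df$NIR)    # the matrix column keeps its own dimensions
```

A model formula such as octane ~ NIR then treats the whole matrix as the predictor block, which is the layout that the help page quoted above describes.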