[R] Classification methods - which one?

Pedro Silva Tue, 20 Nov 2012 04:12:42 -0800

Dear Peter,

There are several packages that try to address this type of problem (although
the remarks made by Max are something that we should always keep in mind). I
would also recommend those which perform some form of regularized, penalized
or shrunken linear discriminant analysis with a preliminary variable selection
step.

You can take a look at the hda, rda, sda, SDDA, HDclassif or my own HiDimDA
packages for some of the most important alternatives.
Hope this helps.

Best,
Pedro

Pedro Duarte Silva
Associate Professor of Statistics and Operations Research
Faculdade de Economia e Gestão
Universidade Católica Portuguesa / Porto
www.feg.porto.ucp.pt

Date: Mon, 19 Nov 2012 20:53:10 +0100
From: Peter Kupfer <peter.kup...@me.com>
To: Max Kuhn <mxk...@gmail.com>
Cc: "r-help@r-project.org" <r-help@r-project.org>
Subject: Re: [R] Classification methods - which one?

Dear Max,
first: Thanks a lot for your suggestion and the open words about methods in
real life. I guess that is exactly my problem.
Regarding my analysis: yes, that is the problem. I am coerced into doing this
analysis, because there is no time to start something else or try other
methods.
So you suggest Linear Discriminant Analysis. Is there a particular package you
recommend? I checked Nearest Shrunken Centroids with the package PAMR
(http://www-stat.stanford.edu/~tibs/PAM/Rdist/doc/readme.html).
The example works fine, but I guess I have too many rows (or in this case
genes) for the analysis. My main problem is that I cannot reduce the number of
genes, because some of the bosses want to compare the output of the
classification methods with a rule-based algorithm which works with all genes
on the array (after P/A calls and an alternative CDF). So a reduction of the
17 000 genes is only possible in a limited way (to around 7000 genes after
some pre-processing steps).
I am more than happy about any tips and suggestions.
Best
Peter
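
[Editor's sketch] The nearest shrunken centroids idea mentioned above can be illustrated with a minimal, simplified sketch in plain Python. It is not the PAMR implementation: the function names, the toy data and the threshold value are all made up for illustration, and the scaling constant m_k follows one common textbook formulation, which may differ from what the package does. The point it tries to show is why the method has embedded feature selection: genes whose standardized class differences fall below the threshold delta are shrunk back to the overall centroid and stop influencing the classification.

```python
import math
import statistics

def soft_threshold(value, delta):
    """Move value toward zero by delta; anything within delta of zero becomes 0."""
    return math.copysign(max(abs(value) - delta, 0.0), value)

def fit_shrunken_centroids(samples, labels, delta):
    """samples: one list of gene values per patient; labels: one class per patient.

    Assumes at least two classes and at least one gene with non-zero variance.
    """
    n, p = len(samples), len(samples[0])
    classes = sorted(set(labels))
    rows_of = {k: [s for s, y in zip(samples, labels) if y == k] for k in classes}
    overall = [sum(s[i] for s in samples) / n for i in range(p)]
    centroid = {k: [sum(s[i] for s in rows) / len(rows) for i in range(p)]
                for k, rows in rows_of.items()}
    # Pooled within-class standard deviation of each gene.
    pooled = [math.sqrt(sum((s[i] - centroid[y][i]) ** 2
                            for s, y in zip(samples, labels)) / (n - len(classes)))
              for i in range(p)]
    # Offset s0 (median of the pooled deviations) guards against tiny variances.
    s0 = statistics.median(pooled)
    scale = [s + s0 for s in pooled]
    shrunken, selected = {}, set()
    for k in classes:
        m_k = math.sqrt(1.0 / len(rows_of[k]) - 1.0 / n)
        shrunken[k] = []
        for i in range(p):
            # Standardized difference between class centroid and overall centroid.
            d = (centroid[k][i] - overall[i]) / (m_k * scale[i])
            d_shrunk = soft_threshold(d, delta)
            if d_shrunk != 0.0:
                selected.add(i)  # gene i survives the threshold for class k
            shrunken[k].append(overall[i] + m_k * scale[i] * d_shrunk)
    prior = {k: len(rows_of[k]) / n for k in classes}
    return {"centroids": shrunken, "scale": scale, "prior": prior,
            "selected": sorted(selected)}

def predict(model, x):
    """Assign x to the class with the nearest shrunken centroid."""
    def score(k):
        dist = sum(((xi - ci) / si) ** 2
                   for xi, ci, si in zip(x, model["centroids"][k], model["scale"]))
        return dist - 2.0 * math.log(model["prior"][k])
    return min(model["centroids"], key=score)

# Made-up toy data: 9 patients, 3 classes, 4 genes.
# Only genes 0 and 1 differ between classes; genes 2 and 3 are noise.
samples = [
    [5.0, 0.1, 1.0, 2.0], [5.2, -0.1, 1.2, 1.9], [4.8, 0.0, 0.9, 2.1],
    [0.1, 5.1, 1.1, 2.1], [-0.1, 4.9, 0.9, 2.0], [0.0, 5.0, 1.1, 1.9],
    [0.0, 0.1, 1.0, 1.9], [0.1, -0.1, 1.2, 2.0], [-0.1, 0.0, 0.9, 2.1],
]
labels = ["A"] * 3 + ["B"] * 3 + ["C"] * 3

model = fit_shrunken_centroids(samples, labels, delta=2.0)
print("genes kept:", model["selected"])
print("predicted class:", predict(model, [4.9, 0.2, 1.0, 2.0]))
```

On this toy data the two noise genes are thresholded away and only genes 0 and 1 remain in the classifier. With real data the threshold would normally be chosen by cross-validation rather than fixed by hand, and with 33 patients any error estimate will be very uncertain, as Max points out.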

On 19.11.2012 at 16:36, Max Kuhn <mxk...@gmail.com> wrote:

> My suggestion is not to do any predictive modeling. Basically, the
> data doesn't support a sensible and reproducible model. Yes, the
> literature is saturated with this type of analysis but almost none of
> the examples have any utility in real life.
>
> Stick to differential expression analysis, investigate the results
> statistically and biologically then design a prospective experiment
> with a specific set of genes and a more refined measurement system.
>
> If you are doing this analysis to learn something from the data (as
> opposed to generating accurate predictions), a predictive model is one
> of the worst ways of going about it.
>
> If you are coerced to do this analysis, stick to linear methods
> (regularized LDA, nearest shrunken centroids, etc) that are less
> likely to over-fit and bias yourself towards those that have embedded
> feature selection.
>
> Max
>
>
> On Mon, Nov 19, 2012 at 10:16 AM, Peter Kupfer <peter.kup...@me.com> wrote:
>> Dear all,
>> I searched for some classification methods and I have no clue whether I
>> picked the right ones.
>> My problem: I have a matrix with 17000 rows and 33 columns (genes and
>> patients). The patients are grouped into 3 diseases.
>> Now I want to classify the patients, and of course I want to know which
>> rows are more helpful for the classification than others.
>>
>> I tried SVM and random forest. Do you think these are the right
>> classification methods? Maybe there are some hints you can give me. I am
>> more familiar with the Bioconductor packages. Furthermore: this is/was not
>> my field of study in the past, but I want to understand it and I am willing
>> to work my way into this field.
>> It would be amazing if one of the (more) mathematical people could give me
>> a hint.
>> Thanks and all the best
>>
>> Peter
>>
>>
>> PS: I can upload my underlying data if somebody is interested
>>
>> ______________________________________________
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
