Re: [R] Classification methods - which one?

Peter Kupfer Mon, 19 Nov 2012 11:54:44 -0800

Dear Max,
first: thanks a lot for your suggestion and for the frank words about how these
methods fare in real life. I guess that is exactly my problem.
Regarding my analysis: yes, that is the problem. I am being coerced into doing
this analysis, and I lack the time to start something else or try other methods.
So you suggest linear discriminant analysis. Is there a particular package you
recommend? I tried nearest shrunken centroids with the pamr package
(http://www-stat.stanford.edu/~tibs/PAM/Rdist/doc/readme.html).
The example works fine, but I guess I have too many rows (genes, in this case)
for the analysis. My main problem is that I cannot reduce the number of genes,
because some of my bosses want to compare the output of the classification
methods with a rule-based algorithm that works with all genes on the array
(after P/A calls and an alternative CDF). So the 17,000 genes can be reduced
only to a limited extent (to around 7,000 genes after some pre-processing
steps).
I would be more than happy about any tips and suggestions.
Best
Peter
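
For readers following the thread, the idea behind nearest shrunken centroids (class centroids are pulled toward the overall centroid, and genes whose shrunken differences reach zero for every class drop out) can be sketched in a few lines. This is a minimal pure-Python illustration, not the pamr implementation: the function names `fit_shrunken_centroids` and `predict`, the toy data, and the threshold `delta` are assumptions made for this example, and it uses the sqrt(1/n_k - 1/n) scaling variant. Unlike the genes-by-patients matrix described above, it expects one row per sample.

```python
import math
from statistics import median


def fit_shrunken_centroids(X, y, delta):
    """Simplified nearest-shrunken-centroids fit (illustrative sketch).

    X: list of samples, each a list of per-gene values (samples as rows).
    y: list of class labels, one per sample (at least two classes).
    delta: soft-threshold amount; larger values drop more genes.
    Assumes every gene has some within-class variation.
    """
    n, p = len(X), len(X[0])
    classes = sorted(set(y))
    idx = {k: [j for j, lab in enumerate(y) if lab == k] for k in classes}

    # Overall and per-class centroids for every gene.
    overall = [sum(row[i] for row in X) / n for i in range(p)]
    centroid = {
        k: [sum(X[j][i] for j in idx[k]) / len(idx[k]) for i in range(p)]
        for k in classes
    }

    # Pooled within-class standard deviation per gene, plus offset s0.
    s = []
    for i in range(p):
        ss = sum((X[j][i] - centroid[k][i]) ** 2 for k in classes for j in idx[k])
        s.append(math.sqrt(ss / (n - len(classes))))
    s0 = median(s)

    shrunken, selected = {}, set()
    for k in classes:
        m_k = math.sqrt(1.0 / len(idx[k]) - 1.0 / n)
        shrunken[k] = []
        for i in range(p):
            scale = m_k * (s[i] + s0)
            d = (centroid[k][i] - overall[i]) / scale
            # Soft thresholding: move d toward zero by delta, clip at zero.
            d_shrunk = math.copysign(max(abs(d) - delta, 0.0), d)
            if d_shrunk != 0.0:
                selected.add(i)  # gene still separates at least one class
            shrunken[k].append(overall[i] + scale * d_shrunk)

    return {
        "classes": classes,
        "shrunken": shrunken,
        "s": s,
        "s0": s0,
        "priors": {k: len(idx[k]) / n for k in classes},
        "selected": sorted(selected),
    }


def predict(model, x):
    """Assign x to the class with the nearest shrunken centroid."""
    best, best_score = None, math.inf
    for k in model["classes"]:
        score = sum(
            ((x[i] - c) / (model["s"][i] + model["s0"])) ** 2
            for i, c in enumerate(model["shrunken"][k])
        ) - 2.0 * math.log(model["priors"][k])
        if score < best_score:
            best, best_score = k, score
    return best


# Toy data (made up): gene 0 separates the classes, genes 1 and 2 do not.
X_toy = [
    [1.0, 2.0, 3.0], [1.2, 2.1, 3.2], [0.8, 1.9, 2.8],
    [5.0, 2.1, 3.1], [5.2, 1.9, 2.9], [4.8, 2.0, 3.0],
]
y_toy = ["A", "A", "A", "B", "B", "B"]

model = fit_shrunken_centroids(X_toy, y_toy, delta=2.0)
print(model["selected"])
print(predict(model, [1.1, 2.0, 3.0]), predict(model, [4.9, 2.0, 3.0]))
```

In this toy setup only gene 0 survives the thresholding, which is the "embedded feature selection" property mentioned in the thread. With real data, `delta` would normally be chosen by cross-validation rather than fixed by hand.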




On 19 Nov 2012, at 16:36, Max Kuhn <mxk...@gmail.com> wrote:

> My suggestion is not to do any predictive modeling. Basically, the
> data doesn't support a sensible and reproducible model. Yes, the
> literature is saturated with this type of analysis but almost none of
> the examples have any utility in real life.
> 
> Stick to differential expression analysis, investigate the results
> statistically and biologically then design a prospective experiment
> with a specific set of genes and a more refined measurement system.
> 
> If you are doing this analysis to learn something from the data (as
> opposed to generating accurate predictions), a predictive model is one
> of the worst ways of going about it.
> 
> If you are coerced into doing this analysis, stick to linear methods
> (regularized LDA, nearest shrunken centroids, etc.) that are less
> likely to over-fit, and favor those that have embedded
> feature selection.
> 
> Max
> 
> 
> On Mon, Nov 19, 2012 at 10:16 AM, Peter Kupfer <peter.kup...@me.com> wrote:
>> Dear all,
>> I searched for some classification methods and I have no clue whether I
>> picked the right ones.
>> My problem: I have a matrix with 17,000 rows and 33 columns (genes and
>> patients). The patients are grouped into 3 diseases.
>> Now I want to classify the patients and, of course, I want to know which
>> rows are more helpful for the classification than others.
>> 
>> I tried SVM and random forest. Do you think these are the right
>> classification methods? Maybe there are some hints you can give me. I am
>> more familiar with the Bioconductor packages. Furthermore: this is/was not
>> my field of study in the past, but I want to understand it and I am willing
>> to work my way into it.
>> It would be amazing if one of the (more) mathematical people could give me
>> a hint.
>> Thanks and all the best
>> 
>> Peter
>> 
>> 
>> PS: I can upload my underlying data if somebody is interested
>> 
>> ______________________________________________
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
> 
> 
> 
> -- 
> 
> Max

