Dear Max,

first of all, thanks a lot for your suggestion and for the frank words about how these methods hold up in real life. I guess that is exactly my problem.

Regarding my analysis: yes, I am being coerced into doing it, because there is not enough time to start something else or switch to other methods. So you suggest linear discriminant analysis. Is there a particular package you recommend?

I tried nearest shrunken centroids with the pamr package (http://www-stat.stanford.edu/~tibs/PAM/Rdist/doc/readme.html). The example works fine, but I guess I have too many rows (in this case, genes) for the analysis.

My main problem is that I cannot reduce the number of genes much, because some of the bosses want to compare the output of the classification methods with a rule-based algorithm that works with all genes on the array (after P/A calls and an alternative CDF). So the 17,000 genes can only be reduced to a limited extent (around 7,000 genes remain after some pre-processing steps).

I am more than happy about any tips and suggestions.

Best
Peter

On 19.11.2012 at 16:36, Max Kuhn <mxk...@gmail.com> wrote:

> My suggestion is not to do any predictive modeling. Basically, the
> data doesn't support a sensible and reproducible model. Yes, the
> literature is saturated with this type of analysis but almost none of
> the examples have any utility in real life.
>
> Stick to differential expression analysis, investigate the results
> statistically and biologically then design a prospective experiment
> with a specific set of genes and a more refined measurement system.
>
> If you are doing this analysis to learn something from the data (as
> opposed to generating accurate predictions), a predictive model is one
> of the worst ways of going about it.
>
> If you are coerced to do this analysis, stick to linear methods
> (regularized LDA, nearest shrunken centroids, etc) that are less
> likely to over-fit and bias yourself towards those that have embedded
> feature selection.
>
> Max
>
>
> On Mon, Nov 19, 2012 at 10:16 AM, Peter Kupfer <peter.kup...@me.com> wrote:
>> Dear all,
>> I searched for some classification methods and I have no clue whether I
>> picked the right ones.
>> My problem: I have a matrix with 17000 rows and 33 columns (genes and
>> patients). The patients are grouped into 3 diseases.
>> Now I want to classify the patients, and of course I want to know which rows
>> are more helpful for the classification than others.
>>
>> I tried SVM and random forest. Do you think these are the right
>> classification methods? Maybe there are some hints you can give me. I am
>> more familiar with the Bioconductor packages. Furthermore: This is/was not
>> my field of study in the past but I want to understand it and I am willing
>> to deal with this field.
>> It would be amazing if one of the (more) mathematical people could give me a hint.
>> Thanks and all the best
>>
>> Peter
>>
>>
>> PS: I can upload my underlying data if somebody is interested
>>
>> ______________________________________________
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>
> --
>
> Max
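
For readers following the thread, here is a minimal sketch of what "embedded feature selection" means for nearest shrunken centroids. It is written in plain Python for illustration only and is not the pamr implementation. The function names `fit_shrunken_centroids` and `predict`, the toy data, and the exact form of the class-size factor `m_k` are assumptions made for this example. The sketch expects patients as rows, so a genes-by-patients matrix like the one described in the thread would need to be transposed first.

```python
import math
from statistics import median


def fit_shrunken_centroids(samples, labels, delta):
    """Fit class centroids shrunk toward the overall centroid.

    samples: list of patients, each a list of expression values (one per gene)
    labels:  class label of each patient
    delta:   shrinkage threshold; larger values keep fewer genes
    """
    n, p = len(samples), len(samples[0])
    classes = sorted(set(labels))
    members = {k: [s for s, y in zip(samples, labels) if y == k] for k in classes}

    overall = [sum(s[i] for s in samples) / n for i in range(p)]
    centroids = {k: [sum(s[i] for s in rows) / len(rows) for i in range(p)]
                 for k, rows in members.items()}

    # Pooled within-class standard deviation per gene.
    pooled = []
    for i in range(p):
        ss = sum((s[i] - centroids[y][i]) ** 2 for s, y in zip(samples, labels))
        pooled.append(math.sqrt(ss / (n - len(classes))))
    s0 = median(pooled)  # offset that guards against very small variances
    scale = [sd + s0 for sd in pooled]

    shrunken, selected = {}, set()
    for k in classes:
        # Class-size factor; this particular form is an assumption of the sketch.
        m_k = math.sqrt(1.0 / len(members[k]) + 1.0 / n)
        row = []
        for i in range(p):
            denom = m_k * scale[i]
            d = (centroids[k][i] - overall[i]) / denom if denom > 0 else 0.0
            # Soft-threshold: move d toward zero by delta, clipping at zero.
            d_shrunk = math.copysign(max(abs(d) - delta, 0.0), d)
            if d_shrunk != 0.0:
                selected.add(i)  # gene still separates at least one class
            row.append(overall[i] + denom * d_shrunk)
        shrunken[k] = row

    priors = {k: len(members[k]) / n for k in classes}
    return {"centroids": shrunken, "scale": scale, "priors": priors,
            "selected": sorted(selected)}


def predict(model, sample):
    """Assign the class whose shrunken centroid is nearest in scaled distance."""
    best, best_score = None, float("inf")
    for k, centroid in model["centroids"].items():
        score = -2.0 * math.log(model["priors"][k])
        # Genes shrunk to zero in every class add the same amount to each
        # class score, so only the selected genes need to be visited.
        for i in model["selected"]:
            score += ((sample[i] - centroid[i]) / model["scale"][i]) ** 2
        if score < best_score:
            best, best_score = k, score
    return best


# Hypothetical toy data: 9 patients, 4 genes, 3 classes.
# Gene 0 marks class A, gene 1 marks class B, genes 2 and 3 are mostly noise.
samples = [
    [9, -1, -1, 0], [10, 0, 0, 1], [11, 1, 1, 2],      # class A
    [-1, 9, 1, -1], [0, 10, -1, 0], [1, 11, 0, 1],     # class B
    [-1, -1, 0, -1], [0, 0, 1, 0], [1, 1, -1, 1],      # class C
]
labels = ["A"] * 3 + ["B"] * 3 + ["C"] * 3

model = fit_shrunken_centroids(samples, labels, delta=1.0)
print("genes kept:", model["selected"])
print("predicted class:", predict(model, [9.5, 0.2, 0.3, 0.1]))
```

With this toy data and `delta=1.0`, only genes 0 and 1 survive the thresholding, which is the sense in which the gene selection is embedded in the classifier. In a real analysis `delta` would be chosen by cross-validation, and with 33 patients the resulting error estimates would still be very uncertain, in line with Max's caution above.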