Dear mailing list,

I'm still quite a newbie in the statistical analysis of genotype/allele data, or more generally in the analysis of categorical variables. Moreover, I'm currently totally confused by the many R packages available for this kind of analysis.
Here is my case: I've got a list of genes and a number of case-control population pairs, and for each population and gene, the various genotypes that have been found. I've got both aggregate data (e.g. gene1: homozygote wildtype: 201, heterozygote mutation carrier: 34, homozygote mutation carrier: 5) and per-gene data (i.e. for gene1 a list of e.g. "V/V", "V/I", "I/I" etc.).

The question is whether there is a difference in the mutation pattern between the case and the control groups that influences the outcome, both at the level of a single gene and at the level of the genes in combination. Moreover, I would like to check for linkage disequilibrium (LD), as I know that some of these genes are located quite close together on the chromosome.

So far I've been doing chi-square tests, McNemar's matched-pairs test, and Fisher's exact test when my numbers were too small. As for the LD question, if I have understood correctly, I have to use log-linear regression. I have to add that I'm also new to log-linear regression...

I have been trying several R packages, and I'm quite confused now, because I don't know which one is best suited to my problem:

- I've used "hwde" and read the paper on which it is based (see the hwde documentation), but the package leaves out certain output rows that are shown in the paper, and it doesn't show which of the output rows are significant, as the paper does. Is there any simple way to interpret "hwde" output (something like a p-value)?
- Then there are the "GeneticsBase", "genetics", "mapLD" and "HardyWeinberg" packages. Some work only for a single gene, some apply a thing called "MLE", some use "generalized linear models", etc.

I know these questions are as much basic statistics questions as R questions. But I'd be glad if you could help me find the best solution for my type of analysis, or point me to good resources that show me how to do this. The problem is that most resources show "how to" do the analysis, but they don't explain at all how to *interpret* the output.
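For concreteness, a genotype-by-group comparison of the kind described above could be sketched in base R roughly as follows. This is only an illustration: one row reuses the gene1 counts quoted above (assumed here to belong to the case group), while the control row is an invented placeholder, not real data. Only base R functions (matrix, chisq.test, fisher.test) are used.

```r
# Genotype counts: rows = group, columns = genotype.
# The "case" row reuses the gene1 counts quoted above (assumed to be cases);
# the "control" row is a made-up placeholder for illustration only.
geno <- matrix(
  c(201, 34,  5,    # case
    180, 50, 10),   # control (hypothetical numbers)
  nrow = 2, byrow = TRUE,
  dimnames = list(group    = c("case", "control"),
                  genotype = c("wt/wt", "wt/mut", "mut/mut"))
)

# Chi-square test of independence between group and genotype
chi <- chisq.test(geno)
chi$expected    # inspect expected counts; small values argue for Fisher
chi$p.value

# Fisher's exact test on the same table, for when counts are too small
fis <- fisher.test(geno)
fis$p.value
```

Both calls return an object whose p.value element tests the same null hypothesis, namely that the genotype distribution is the same in both groups; neither says anything about LD between genes.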
Thanks a lot in advance,
Anne-Marie

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.