Hadassa,
You may want to check out the snpMatrix package in Bioconductor

http://bioconductor.org/packages/2.3/bioc/html/snpMatrix.html
http://bioconductor.org/packages/2.4/bioc/html/snpMatrix.html

It contains classes that manage this type of information and should minimize your coding effort.


Patrick


Quoting Thomas Lumley <tlum...@u.washington.edu>:


The first step is to convert your data to all uppercase with toupper().

Then it depends on how tidy the data are: are there missing data, are
some SNPs monomorphic in your sample, etc.
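A minimal sketch of those checks, assuming the genotypes sit in a character matrix with one SNP per row and NA for missing calls. The helper name snp_summary and the small test matrix are hypothetical, made up for illustration:

```r
## Hypothetical helper: per-SNP (per-row) tidiness summary
snp_summary <- function(x) {
  x[] <- toupper(x)                         # "a" and "A" are the same allele; [] keeps the matrix shape
  n_missing <- rowSums(is.na(x))            # missing calls per SNP
  n_alleles <- apply(x, 1, function(r) length(unique(r[!is.na(r)])))
  data.frame(n_missing   = n_missing,
             n_alleles   = n_alleles,
             monomorphic = n_alleles == 1)  # only one allele observed
}

the_data <- matrix(c("a", "A", "a", "T", "A", "t",
                     "G", "G", "g", "g", "G", "g",
                     "C", NA,  "c", "T", "T", "t"),
                   nrow = 3, byrow = TRUE)
snp_summary(the_data)
```

Rows with n_missing > 0 or monomorphic == TRUE are the ones that need special handling before any 0/1 coding.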

If there are no missing data you can use

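the_data[] <- toupper(the_data)   # ignore case; [] keeps the matrix shape
N <- ncol(the_data)
halfN <- N/2

## Returns 1 where the row carries the allele seen in more than half
## of the columns, 0 elsewhere. Assumes no NAs (an NA makes if() stop
## with an error); a row with an exact 50/50 split comes back as all 0s.
maf_one_row <- function(arow) {
   rval <- numeric(N)
   if (sum(i <- arow == "A") > halfN) {
        rval[i] <- 1
   } else if (sum(i <- arow == "C") > halfN) {
        rval[i] <- 1
   } else if (sum(i <- arow == "T") > halfN) {
        rval[i] <- 1
   } else if (sum(i <- arow == "G") > halfN) {
        rval[i] <- 1
   }
   rval
}

## apply() returns one column per row of the_data, so transpose back
t(apply(the_data, 1, maf_one_row))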

You could also use table() to find the two alleles, but you have to
make sure that the code still works when there is only one allele.
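One way to follow that suggestion, sketched under assumptions that are not in the original reply: genotypes are in a character matrix with one SNP per row, NA marks a missing call, and ties go to whichever allele table() lists first (sorted order). The function name major_allele_code is hypothetical. Because the major allele is taken from table() rather than tested letter by letter, a monomorphic row simply comes out as all 1s:

```r
## Hypothetical helper: 1 for the major (most frequent) allele, 0 for the
## other allele, NA for missing calls; upper and lower case are merged.
major_allele_code <- function(x) {
  x[] <- toupper(x)                         # [] keeps the matrix shape
  code_row <- function(r) {
    counts <- table(r)                      # table() drops NA by default
    if (length(counts) == 0)                # row with no calls at all
      return(rep(NA_real_, length(r)))
    ## which.max() takes the first maximum, i.e. the allele that sorts
    ## first when two alleles are equally frequent
    major <- names(counts)[which.max(counts)]
    as.numeric(r == major)                  # one allele -> all 1s; NA stays NA
  }
  coded <- apply(x, 1, code_row)            # one column per SNP
  matrix(t(coded), nrow = nrow(x), dimnames = dimnames(x))
}

the_data <- matrix(c("a", "A", "a", "T", "A", "t",
                     "G", "g", "G", "g", "G", "g",   # monomorphic SNP
                     "C", NA,  "c", "T", "T", "t"),  # one missing call
                   nrow = 3, byrow = TRUE)
major_allele_code(the_data)
```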

     -thomas

On Thu, 29 Jan 2009, Hadassa Brunschwig wrote:

Hi

An example is as follows. Consider the character 3x6 matrix:

a A a T A t
G g t T T t
A a C C c c

For each row I would like to identify the most frequent letter and
assign a 1 to it and a 0 to the less frequent letter. That is, in row 1
the most frequent letter is A (I do not differentiate between capital
and lowercase letters), in row 2 T and in row 3 C. After the binary
conversion the resulting matrix would look like this:

1 1 1 0 1 0
0 0 1 1 1 1
0 0 1 1 1 1
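The example above can be entered as R objects, which makes it reproducible in the sense the posting guide asks for. The coding step below is a vectorized alternative, not something proposed in the thread: it finds each row's most frequent letter once and compares the whole matrix against that vector, relying on R recycling a length-nrow vector down the columns so that x[i, j] is compared with major[i]. It assumes no missing data and no ties:

```r
x <- matrix(c("a", "A", "a", "T", "A", "t",
              "G", "g", "t", "T", "T", "t",
              "A", "a", "C", "C", "c", "c"),
            nrow = 3, byrow = TRUE)
expected <- matrix(c(1, 1, 1, 0, 1, 0,
                     0, 0, 1, 1, 1, 1,
                     0, 0, 1, 1, 1, 1),
                   nrow = 3, byrow = TRUE)

x[] <- toupper(x)                          # ignore case, keep the matrix shape
## most frequent letter in each row (assumes no NA and no ties)
major <- apply(x, 1, function(r) names(which.max(table(r))))
## length(major) == nrow(x), so recycling compares x[i, j] with major[i]
coded <- (x == major) * 1
all.equal(coded, expected)                 # TRUE
```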

Any suggestions on how to do that? (I am sure I am not the first
one to try this.)

Thanks
Hadassa


On Thu, Jan 29, 2009 at 1:50 AM, Jorge Ivan Velez
<jorgeivanve...@gmail.com> wrote:

Hi Hadassa,
Do you have a sample of your data and the output you want? It would
help us give you a useful answer.
Regards,

Jorge


On Wed, Jan 28, 2009 at 8:36 AM, Hadassa Brunschwig
<hadassa.brunsch...@mail.huji.ac.il> wrote:

Hi

I am sure there is a function out there already, but I couldn't find it.
I have SNP data, that is, a matrix in which each row contains two
distinct characters (the pair of characters differs from row to row),
and I would like to convert this matrix to a binary one according to
the minor allele frequency. For non-geneticists: I want a binary matrix
in which, in each row, 0 stands for the less frequent character
and 1 for the more frequent character.
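To make the definition concrete: the minor allele frequency of a SNP is the proportion of calls carrying the less frequent of its two alleles. A minimal sketch that computes it per row, assuming a character matrix with one SNP per row, NA for missing calls and at most two alleles per SNP (the name row_maf is hypothetical):

```r
## Hypothetical helper: minor allele frequency for each row (SNP)
row_maf <- function(x) {
  x[] <- toupper(x)                        # case-insensitive, keep matrix shape
  apply(x, 1, function(r) {
    counts <- table(r)                     # NA calls are dropped
    if (length(counts) == 0) return(NA_real_)  # no calls: MAF undefined
    if (length(counts) == 1) return(0)         # monomorphic SNP
    min(counts) / sum(counts)              # assumes exactly two alleles
  })
}

x <- matrix(c("a", "A", "a", "T", "A", "t",
              "G", "g", "t", "T", "T", "t",
              "A", "a", "C", "C", "c", "c"),
            nrow = 3, byrow = TRUE)
row_maf(x)
```

For the 3x6 example matrix in this thread each row has 2 minor-allele calls out of 6, so every row gets a MAF of 1/3.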

Thanks for any suggestions.
Hadassa

--
Hadassa Brunschwig
PhD Student
Department of Statistics
The Hebrew University of Jerusalem
http://www.stat.huji.ac.il

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.







Thomas Lumley                   Assoc. Professor, Biostatistics
tlum...@u.washington.edu        University of Washington, Seattle
