[R] resampling from distributions

Grant Gillis Sat, 19 Apr 2008 13:30:07 -0700

Hello All,

Once again thanks for all of the help to date.  I am climbing my R learning
curve.  I've got a few more questions that I hope I can get some guidance on
though.   I am not sure whether the etiquette is to break up multiple
questions or not but I'll keep them together here for now as it may help put
the questions in context despite the fact that the post may get a little
long.



Question 1:


My first goal is to calculate the proportion of shared 1) behaviours and 2)
alleles between numerous individuals.  Pasted below ('propshared' function)
is what I have now, and it works very well for calculating the proportion of
shared behaviours where the data are formatted with each column as a
behaviour and each row as an individual.  Microsatellite genotypes are
formatted differently.  An example is below.  Each row is an individual and
each column is one allele from a single locus.  In the values below, L1
and L1.1 each give one copy of an allele for the same locus.  Occasionally
different loci will have the same value, although these are not actually
the same allele.

I would like the calculation of the proportion of shared values for alleles
to be restricted to the proportion of shared alleles within loci for all
individuals (pairs of columns L1 and L1.1, L2 and L2.1, ...).  What I have now
calculates the proportion of shared values for alleles across loci.  A
specific example is that I would like the value 2 for individual w at
L1 to be considered the same as the value 2 for individual y at
L1.1, but not the same as the value 2 for any other individual within any
other pair of columns.


genos<- data.frame(

    L1 = c(2,NA,1,3),
    L1 = c(1,NA,2,3),
    L2 = c(5,2,5,3),
    L2 = c(3,4,2,4),
    L3 = c(4,5,7,2),
    L3 = c(4,6,6,6) )

rownames(genos) = c("w","x","y","z")

> genos
  L1 L1.1 L2 L2.1 L3 L3.1
w  2    1  5    3  4    4
x NA   NA  2    4  5    6
y  1    2  5    2  7    6
z  3    3  3    4  2    6



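propshared <- function(genos) {
    # proportion of columns in which each pair of individuals has equal values
    x <- sapply(rownames(genos), function(ind1)
        sapply(rownames(genos), function(ind2)
            sum(genos[ind1, ] == genos[ind2, ], na.rm = TRUE)) / length(genos[1, ]))
    is.na(diag(x)) <- TRUE   # blank out self-comparisons
    x
}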

> propshared(genos)
          w         x         y         z
w        NA 0.0000000 0.1666667 0.1666667
x 0.0000000        NA 0.1666667 0.3333333
y 0.1666667 0.1666667        NA 0.3333333
z 0.1666667 0.3333333 0.3333333        NA


The matrix I would like to have would look like this.
          w         x         y         z
w        NA 0.0000000 0.3333333 0.1666667
x 0.0000000        NA 0.1666667 0.1666667
y 0.3333333 0.1666667        NA 0.1666667
z 0.1666667 0.1666667 0.1666667        NA
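
One possible reading of "shared within loci" can be sketched as follows.  This
is only an illustration under stated assumptions, not a settled definition:
the two columns of each locus are assumed to sit next to each other (columns
1-2, 3-4, ...), each allele copy is allowed to match at most one copy in the
other individual, and NA never matches.  The names shared_at_locus and
propshared_loci are made up for this example.

```r
# Example data from the post; data.frame() renames the duplicated
# column names to L1.1, L2.1, L3.1 (check.names = TRUE is the default).
genos <- data.frame(L1 = c(2, NA, 1, 3), L1 = c(1, NA, 2, 3),
                    L2 = c(5, 2, 5, 3),  L2 = c(3, 4, 2, 4),
                    L3 = c(4, 5, 7, 2),  L3 = c(4, 6, 6, 6))
rownames(genos) <- c("w", "x", "y", "z")

# Hypothetical helper: number of allele copies (0, 1 or 2) that two
# genotypes share at ONE locus, ignoring which column a copy sits in.
shared_at_locus <- function(a, b) {
    a <- a[!is.na(a)]
    b <- b[!is.na(b)]
    n <- 0
    for (allele in a) {
        hit <- match(allele, b)
        if (!is.na(hit)) {
            n <- n + 1
            b <- b[-hit]      # a copy in b can be matched only once
        }
    }
    n
}

# Hypothetical replacement for propshared(): sum the per-locus counts
# and divide by the total number of allele columns.
propshared_loci <- function(genos) {
    g     <- as.matrix(genos)
    first <- seq(1, ncol(g), by = 2)   # ASSUMES adjacent column pairs
    ids   <- rownames(g)
    out <- sapply(ids, function(i) sapply(ids, function(j) {
        per_locus <- sapply(first, function(k)
            shared_at_locus(g[i, k:(k + 1)], g[j, k:(k + 1)]))
        sum(per_locus) / ncol(g)
    }))
    diag(out) <- NA                    # blank out self-comparisons
    out
}

propshared_loci(genos)
```

Under this counting rule w and y share both L1 copies and one L2 copy, i.e.
three of six, so the result need not agree with a matrix worked out by hand
under a different rule.  Dividing by the number of non-missing copies instead
of ncol(g) would be another reasonable choice.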


Question 2:  Thanks if you have made it this far.  Next I would like
to calculate a randomized value of the mean proportion of shared alleles.
To do this I thought I would randomize the original data (genos above), say
1000 times, recalculate the proportion of shared alleles at each step and
then take the mean (my attempt below).   When I do this I get the same mean
proportion of shared alleles (or behaviours) as the original for every
randomization.  I assume that this is due to some property of permuting this
type of data that I am not aware of.  Does anyone have a recommendation as to
how I might get the proportion of shared alleles expected if alleles were
distributed at random (again within loci)?


randomize <- function(genos){
    x <- apply(genos, 2, sample)
    rownames(x) <- rownames(genos)
    x
}


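allele.permute <- function(genos, n) {
    # n randomized copies of the data, then propshared() on each;
    # 'perms' avoids masking the base function list()
    perms <- replicate(n, randomize(genos), simplify = FALSE)
    lapply(perms, propshared)
}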
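
A likely reason the mean never changes: propshared() compares individuals
column by column, and apply(genos, 2, sample) only reorders values within each
column.  The number of equal pairs in a column depends only on how many times
each value occurs there, so the total (and hence the mean over all pairs) is
fixed.  The sketch below checks this and tries one alternative, pooling the
two allele copies of each locus before reshuffling.  shuffle_loci is a made-up
name, adjacent column pairs are assumed, and NAs are shuffled like any other
value, which may or may not be what is wanted.

```r
genos <- data.frame(L1 = c(2, NA, 1, 3), L1 = c(1, NA, 2, 3),
                    L2 = c(5, 2, 5, 3),  L2 = c(3, 4, 2, 4),
                    L3 = c(4, 5, 7, 2),  L3 = c(4, 6, 6, 6))
rownames(genos) <- c("w", "x", "y", "z")

# propshared() and randomize() as in the post, lightly tidied
propshared <- function(genos) {
    ids <- rownames(genos)
    x <- sapply(ids, function(i) sapply(ids, function(j)
        sum(genos[i, ] == genos[j, ], na.rm = TRUE)) / ncol(genos))
    diag(x) <- NA
    x
}

randomize <- function(genos) {
    x <- apply(genos, 2, sample)       # permute each column separately
    rownames(x) <- rownames(genos)
    x
}

# Hypothetical alternative: pool both columns of a locus, shuffle the
# pooled copies, and deal them back out across individuals.
shuffle_loci <- function(genos) {
    g <- as.matrix(genos)
    for (k in seq(1, ncol(g), by = 2)) {   # ASSUMES adjacent column pairs
        cols <- k:(k + 1)
        g[, cols] <- sample(g[, cols])     # NAs are shuffled too
    }
    g
}

set.seed(1)
obs       <- mean(propshared(genos), na.rm = TRUE)
by_column <- replicate(200, mean(propshared(randomize(genos)), na.rm = TRUE))
by_locus  <- replicate(200, mean(propshared(shuffle_loci(genos)), na.rm = TRUE))

range(by_column - obs)   # zero apart from rounding error
range(by_locus)          # varies from one shuffle to the next
```

Any sharing function that takes the same input can be put in place of
propshared() in the last two replicate() calls, and mean(by_locus) then gives
a randomized mean under this particular shuffling scheme.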






I hope this is clear.  I appreciate all insights and input.
Thanks,

Grant

