Re: [R] resampling from distributions

Grant Gillis Sat, 19 Apr 2008 13:38:47 -0700

I am sorry for the incorrect subject.  My subject autofilled without my
noticing in time.  I suppose a better subject would be "Calculating
proportion of shared occurrences and randomizations".


Grant

2008/4/19 Grant Gillis <[EMAIL PROTECTED]>:

> Hello All,
>
> Once again, thanks for all of the help to date.  I am climbing my R
> learning curve, and I have a few more questions that I hope I can get some
> guidance on.  I am not sure whether the etiquette is to break up multiple
> questions or not, but I'll keep them together here for now as it may help
> put the questions in context, even though the post may get a little long.
>
>
> Question 1:
>
>
> My first goal is to calculate the proportion of shared 1) behaviours and
> 2) alleles between numerous individuals.  Pasted below (the 'propshared'
> function) is what I have now, and it works very well for calculating the
> proportion of shared behaviours, where the data are formatted with each
> column as a behaviour and each row as an individual.  Microsatellite
> genotypes are formatted differently.  An example is below.  Each row is an
> individual and each column is one allele from a single locus.  In the values
> below, L1 and L1.1 each give one copy of an allele for the same locus.
> Occasionally different loci will have the same value, although these are not
> actually the same allele.
>
> I would like the calculation of the proportion of shared alleles to be
> restricted to within loci for all individuals (pairs of columns L1 and
> L1.1, L2 and L2.1, ...).  What I have now calculates the proportion of
> shared values across loci.  A specific example is that I would like the
> value 2 for individual w at L1 to be considered the same as the value 2
> for individual y at L1.1, but not the same as the value 2 for any other
> individual within any other pair of columns.
>
>
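> # Column names are deliberately repeated: data.frame() renames the second
> # copy of each locus to L1.1, L2.1, L3.1 (check.names = TRUE by default).
> genos <- data.frame(
>     L1 = c(2, NA, 1, 3),
>     L1 = c(1, NA, 2, 3),
>     L2 = c(5, 2, 5, 3),
>     L2 = c(3, 4, 2, 4),
>     L3 = c(4, 5, 7, 2),
>     L3 = c(4, 6, 6, 6)
> )
>
> rownames(genos) <- c("w", "x", "y", "z")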
>
> > genos
>   L1 L1.1 L2 L2.1 L3 L3.1
> w  2    1  5    3  4    4
> x NA   NA  2    4  5    6
> y  1    2  5    2  7    6
> z  3    3  3    4  2    6
>
>
>
> propshared <- function(genos) {
>     # Position-wise proportion of equal values for every pair of rows
>     x <- sapply(rownames(genos), function(ind1)
>         sapply(rownames(genos), function(ind2)
>             sum(genos[ind1, ] == genos[ind2, ], na.rm = TRUE)) /
>             length(genos[1, ]))
>     is.na(diag(x)) <- TRUE
>     x
> }
>
> > propshared(genos)
>           w         x         y         z
> w        NA 0.0000000 0.1666667 0.1666667
> x 0.0000000        NA 0.1666667 0.3333333
> y 0.1666667 0.1666667        NA 0.3333333
> z 0.1666667 0.3333333 0.3333333        NA
>
>
> The matrix I would like to have would look like this.
>             w           x           y           z
> w          NA 0.000000000 0.333333333 0.166666667
> x 0.000000000          NA 0.166666667 0.166666667
> y 0.333333333 0.166666667          NA 0.166666667
> z 0.166666667 0.166666667 0.166666667          NA
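
One way to restrict the comparison to within loci, sketched under assumptions that are not stated in the post: columns come in consecutive pairs (one pair per locus), and the alleles two individuals share at a locus are counted as a multiset intersection, so 4/4 against 4/6 shares one allele. The names shared_at_locus and propshared_locus are hypothetical. Under this counting rule some pairs may differ from the hand-written target matrix (w and y, for example, share both L1 alleles plus the 5 at L2, giving 3/6), so the rule would need adjusting if a different definition is intended.

```r
# Toy genotypes copied from the post; data.frame() renames the duplicated
# column names to L1, L1.1, L2, L2.1, L3, L3.1.
genos <- data.frame(
  L1 = c(2, NA, 1, 3), L1 = c(1, NA, 2, 3),
  L2 = c(5, 2, 5, 3),  L2 = c(3, 4, 2, 4),
  L3 = c(4, 5, 7, 2),  L3 = c(4, 6, 6, 6)
)
rownames(genos) <- c("w", "x", "y", "z")

# Hypothetical helper: alleles shared by two individuals at ONE locus,
# counted as a multiset intersection after dropping missing values.
shared_at_locus <- function(a, b) {
  a <- a[!is.na(a)]
  b <- b[!is.na(b)]
  vals <- intersect(a, b)
  sum(vapply(vals, function(v) min(sum(a == v), sum(b == v)), numeric(1)))
}

# Hypothetical helper: pairwise proportion of alleles shared within loci.
# Assumption: columns 1-2 are locus 1, columns 3-4 are locus 2, and so on.
propshared_locus <- function(genos) {
  m <- as.matrix(genos)
  loci <- split(seq_len(ncol(m)), ceiling(seq_len(ncol(m)) / 2))
  ids <- rownames(m)
  out <- sapply(ids, function(i) sapply(ids, function(j) {
    per_locus <- vapply(loci, function(cols)
      shared_at_locus(m[i, cols], m[j, cols]), numeric(1))
    sum(per_locus) / ncol(m)   # same denominator as the original function
  }))
  diag(out) <- NA              # self-comparisons are not of interest
  out
}

res <- propshared_locus(genos)
print(round(res, 3))
```

Dividing by the total number of allele columns keeps the result on the same scale as the original propshared; dividing by the number of non-missing alleles instead would treat individual x differently.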
>
>
> Question 2:  Thanks if you have made it this far.  Next I would like to
> calculate a randomized value of the mean proportion of shared alleles.  To
> do this I thought I would randomize the original data (genos above), say
> 1000 times, recalculate the proportion of shared alleles at each step, and
> then take the mean (my attempt is below).  When I do this I get the same
> mean proportion of shared alleles (or behaviours) as the original for
> every randomization.  I assume that this is due to some property of
> permuting this type of data that I do not know.  Does anyone have a
> recommendation as to how I might get a value of the proportion of shared
> alleles if alleles were distributed (again within loci) at random?
>
>
> randomize <- function(genos){
>     x <- apply(genos, 2, sample)
>     rownames(x) <- rownames(genos)
>     x
> }
>
>
> allele.permute <- function(genos, n) {
>     # n independently shuffled copies of the data, then one matrix per copy
>     perms <- replicate(n, randomize(genos), simplify = FALSE)
>     lapply(perms, propshared)
> }
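
On why the mean never changes: the position-wise count of matches in a column depends only on how many times each value occurs in that column, and shuffling a column on its own leaves those counts untouched. Summed over all pairs, the total (and hence the mean) is therefore identical for every permutation, even though individual pairwise entries move around. A possible alternative, offered as an assumption and not as a recommendation from the thread, is to pool both allele columns of a locus and shuffle the pooled copies across individuals. In the sketch below shuffle_within_loci and mean_offdiag are hypothetical names, NAs are shuffled like any other value, and the position-wise propshared is reused only to keep the example short.

```r
set.seed(1)  # only to make the sketch repeatable

genos <- data.frame(
  L1 = c(2, NA, 1, 3), L1 = c(1, NA, 2, 3),
  L2 = c(5, 2, 5, 3),  L2 = c(3, 4, 2, 4),
  L3 = c(4, 5, 7, 2),  L3 = c(4, 6, 6, 6)
)
rownames(genos) <- c("w", "x", "y", "z")

# Position-wise measure and per-column shuffle, following the post
propshared <- function(g) {
  x <- sapply(rownames(g), function(i) sapply(rownames(g), function(j)
    sum(g[i, ] == g[j, ], na.rm = TRUE)) / length(g[1, ]))
  diag(x) <- NA
  x
}
randomize <- function(g) {
  x <- apply(g, 2, sample)
  rownames(x) <- rownames(g)
  x
}

# Hypothetical alternative: pool the two columns of each locus and
# redistribute the pooled allele copies across individuals.
# Assumption: columns 1-2 are locus 1, columns 3-4 are locus 2, and so on.
shuffle_within_loci <- function(g) {
  m <- as.matrix(g)
  for (k in seq(1, ncol(m), by = 2)) {
    cols <- c(k, k + 1)
    m[, cols] <- sample(m[, cols])
  }
  m
}

# Mean over the off-diagonal entries (the diagonal is NA)
mean_offdiag <- function(x) mean(x, na.rm = TRUE)

obs         <- mean_offdiag(propshared(genos))
col_means   <- replicate(200, mean_offdiag(propshared(randomize(genos))))
locus_means <- replicate(200,
  mean_offdiag(propshared(shuffle_within_loci(genos))))

print(range(col_means) - obs)   # no spread beyond rounding error
print(range(locus_means))       # varies from one shuffle to the next
```

Any other sharing measure can be substituted for propshared in the last two replicate() calls; the shuffling scheme is what determines whether the mean has a distribution at all.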
>
>
>
>
>
>
> I hope this is clear.  I appreciate all insights and input.
> Thanks
>
> Grant
>
>
>
>


______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
