[R] Combining multiple probability weights for the sample() function.

Benjamin Ward (ENV) Tue, 02 Jun 2015 06:08:11 -0700

Dear R-List,

I have a set of possibilities I want to sample from:


bases <- list(c('A', 'C'), c('A', 'G'), c('C', 'T'))
possibilities <- as.matrix(expand.grid(bases))

> possibilities
     Var1 Var2 Var3
[1,] "A"  "A"  "C"
[2,] "C"  "A"  "C"
[3,] "A"  "G"  "C"
[4,] "C"  "G"  "C"
[5,] "A"  "A"  "T"
[6,] "C"  "A"  "T"
[7,] "A"  "G"  "T"
[8,] "C"  "G"  "T"

Suppose I randomly sample one of these rows with equal weights. There is then a 
25% chance that my choice has identical first and second letters (e.g. 
[1,] "A"  "A"  "C"), and a 25% chance that it has identical first and third 
letters (e.g. [4,] "C"  "G"  "C"). The second and third letters can never be 
identical, because the second letter is always A or G and the third is always 
C or T.
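The 25% figures can be checked directly: with equal weights each of the 8 rows has probability 1/8, so the chance of an identity is just the fraction of rows that have it. A minimal sketch (the names same12, same13, same23 and uniformProbs are illustrative, not from the original code):

```r
bases <- list(c('A', 'C'), c('A', 'G'), c('C', 'T'))
possibilities <- as.matrix(expand.grid(bases))

# Logical vectors: does each row have identical letters at the given positions?
same12 <- possibilities[, 1] == possibilities[, 2]
same13 <- possibilities[, 1] == possibilities[, 3]
same23 <- possibilities[, 2] == possibilities[, 3]

# With equal weights, the probability of an identity is the share of rows with it
uniformProbs <- c("1==2" = mean(same12), "1==3" = mean(same13),
                  "2==3" = mean(same23))
uniformProbs  # 0.25 for 1==2, 0.25 for 1==3, 0 for 2==3
```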

What I would like to do is sample one of the rows under the constraint that the 
probability of drawing identical letters 1 and 2 is 50% (0.5), and at the same 
time the probability of drawing identical letters 1 and 3 is also 50%. I am 
unsure how to do this, but I know it involves coming up with a modified set of 
weights for the sample() function. My progress is below; any advice is much 
appreciated.

Best Wishes,

Ben Ward, UEA.


So far I have used the following code to build a matrix that contains one set 
of weights for each criterion:

possibilities <- as.matrix(expand.grid(bases))
# identities: one row per criterion (1==2, 1==3, 2==3), one column per possibility
identities <- apply(possibilities, 1, function(x)
  c(x[1] == x[2], x[1] == x[3], x[2] == x[3]))
prob <- matrix(0, nrow = nrow(identities), ncol = ncol(identities))
# Half the probability is shared equally by the possibilities meeting each
# criterion (Inf for 2==3, where none do, but then it is never assigned)...
consProb <- apply(identities, 1, function(x) 0.5 / sum(x))
# ...and the other half by the possibilities that do not
polProb <- apply(identities, 1, function(x) 0.5 / sum(!x))
for (i in 1:nrow(identities)) {
  prob[i, which(identities[i, ])] <- consProb[i]
  prob[i, which(!identities[i, ])] <- polProb[i]
}
rownames(prob) <- c("1==2", "1==3", "2==3")
colnames(prob) <- apply(possibilities, 1, function(x) paste(x, collapse = ", "))

This code gives the following matrix:

        A, A, C    C, A, C    A, G, C    C, G, C    A, A, T    C, A, T    A, G, T    C, G, T
1==2 0.25000000 0.08333333 0.08333333 0.08333333 0.25000000 0.08333333 0.08333333 0.08333333
1==3 0.08333333 0.25000000 0.08333333 0.25000000 0.08333333 0.08333333 0.08333333 0.08333333
2==3 0.06250000 0.06250000 0.06250000 0.06250000 0.06250000 0.06250000 0.06250000 0.06250000
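The same matrix can also be built without the explicit loop. This is a sketch of an equivalent construction (not from the original post; prob2 is a hypothetical name), relying on ifelse() keeping the dimensions of its logical test and on the length-3 weight vectors recycling down each column of the 3 x 8 identities matrix. It also prints the row totals:

```r
bases <- list(c('A', 'C'), c('A', 'G'), c('C', 'T'))
possibilities <- as.matrix(expand.grid(bases))

# 3 x 8 logical matrix: rows are the criteria, columns the possibilities
identities <- apply(possibilities, 1, function(x)
  c(x[1] == x[2], x[1] == x[3], x[2] == x[3]))

# Per-criterion weights for possibilities that do / do not meet the criterion
consProb <- 0.5 / rowSums(identities)   # Inf for 2==3, but never selected below
polProb  <- 0.5 / rowSums(!identities)

# The length-3 vectors recycle down each column, so entry [i, j] gets
# consProb[i] or polProb[i]; ifelse() keeps the dimensions of 'identities'.
prob2 <- ifelse(identities, consProb, polProb)
dimnames(prob2) <- list(c("1==2", "1==3", "2==3"),
                        apply(possibilities, 1, paste, collapse = ", "))

round(prob2, 4)
rowSums(prob2)   # 1, 1 and 0.5: no possibility meets 2==3
```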

Each column is one of the choices from 'possibilities', and each row gives a 
series of weights based on three different criteria:

Row 1: if letter 1 == letter 2 is possible among the choices, the choices with 
that property should have a combined probability of 50%.
Row 2: if letter 1 == letter 3 is possible among the choices, the choices with 
that property should have a combined probability of 50%.
Row 3: if letter 2 == letter 3 is possible among the choices, the choices with 
that property should have a combined probability of 50%.
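Written as equations (notation introduced here, not from the original post): let w_i >= 0 be the weight on row i of possibilities, and let S12 = {1, 5}, S13 = {2, 4} and S23 = {} (empty) be the rows with identical letters 1 and 2, 1 and 3, and 2 and 3. Since sample() rescales its weights, each criterion asks for one of these equations:

```latex
P_w(S) = \frac{\sum_{i \in S} w_i}{\sum_{i=1}^{8} w_i}, \qquad
P_w(S_{12}) = 0.5, \qquad P_w(S_{13}) = 0.5, \qquad P_w(S_{23}) = 0.5 .
```

Each row of prob satisfies one equation on its own; a combined weight vector would have to satisfy all of them at once, and the last one cannot hold here because S23 is empty.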

So:

If I used sample(x = 1:nrow(possibilities), size = 1, prob = prob[1,]) 
repeatedly, I would expect about half the choices to contain identical letters 
1 and 2.

If I used sample(x = 1:nrow(possibilities), size = 1, prob = prob[2,]) 
repeatedly, I would expect about half the choices to contain identical letters 
1 and 3.

If I used sample(x = 1:nrow(possibilities), size = 1, prob = prob[3,]) 
repeatedly, I would expect about half the choices to contain identical letters 
2 and 3, except that in this case it is not possible, as no choice has 
identical letters 2 and 3.
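The first two expectations can be checked both exactly and by simulation. In this sketch the weights w12 and w13 are typed in from the 1==2 and 1==3 rows of the printed matrix, and one call with size = 10000 and replace = TRUE stands in for 10000 repeated size = 1 draws (the draws are independent either way):

```r
bases <- list(c('A', 'C'), c('A', 'G'), c('C', 'T'))
possibilities <- as.matrix(expand.grid(bases))
same12 <- possibilities[, 1] == possibilities[, 2]
same13 <- possibilities[, 1] == possibilities[, 3]

# Rows 1==2 and 1==3 of the weight matrix
w12 <- c(0.25, 1/12, 1/12, 1/12, 0.25, 1/12, 1/12, 1/12)
w13 <- c(1/12, 0.25, 1/12, 0.25, 1/12, 1/12, 1/12, 1/12)

# Exact probability of the targeted identity under each set of weights
exact <- c(sum(w12[same12]) / sum(w12), sum(w13[same13]) / sum(w13))

# Simulated proportions; these should be close to 0.5
set.seed(1)
draws12 <- sample(x = 1:nrow(possibilities), size = 10000, replace = TRUE, prob = w12)
draws13 <- sample(x = 1:nrow(possibilities), size = 10000, replace = TRUE, prob = w13)
simulated <- c(mean(same12[draws12]), mean(same13[draws13]))

exact      # 0.5 0.5
simulated
```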

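Note that rows 1 and 2 each sum to 1, while row 3 sums to 0.5: no choice has 
letter 2 == letter 3, so only the other half of the probability is distributed. 
This does not matter for sample(), which rescales prob, so the weights need not 
sum to one.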

What I would like to do, if it is possible, is combine these three sets of 
weights into one set that, when used with
sample(x = 1:nrow(possibilities), size = 1, prob = MAGICPROB), will give me a 
list of choices where ~50% contain identical letters 1 and 2, AND ~50% contain 
identical letters 1 and 3, AND ~50% contain identical letters 2 and 3 (except 
that the last is not possible in this example).

Can multiple probability weightings be combined in such a manner?
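One way to look at this, offered as a sketch under the assumption that each target is read as an exact equation on the weights: simply averaging the 1==2 and 1==3 rows does not keep either target at 50%. In this example the rows with identical letters 1 and 2 (rows 1 and 5) and those with identical letters 1 and 3 (rows 2 and 4) do not overlap, so two 50% targets use up all the probability and every other row must get weight 0. Splitting each half evenly within its set is one hypothetical choice that meets both; the 2==3 target is left out because no row can meet it. The names identityProbs and combined are illustrative, not from the original post.

```r
bases <- list(c('A', 'C'), c('A', 'G'), c('C', 'T'))
possibilities <- as.matrix(expand.grid(bases))
same12 <- possibilities[, 1] == possibilities[, 2]
same13 <- possibilities[, 1] == possibilities[, 3]

# Hypothetical helper: probability of each identity when sampling with weights w
identityProbs <- function(w) {
  p <- w / sum(w)
  c("1==2" = sum(p[same12]), "1==3" = sum(p[same13]))
}

# Naive combination: average the 1==2 and 1==3 rows of weights
w12 <- c(0.25, 1/12, 1/12, 1/12, 0.25, 1/12, 1/12, 1/12)
w13 <- c(1/12, 0.25, 1/12, 0.25, 1/12, 1/12, 1/12, 1/12)
averaged <- identityProbs((w12 + w13) / 2)   # both come out at 1/3, not 1/2

# Disjoint target sets: give each set half the probability, split evenly,
# and give weight 0 to rows in neither set
combined <- ifelse(same12, 0.5 / sum(same12),
                   ifelse(same13, 0.5 / sum(same13), 0))
combinedProbs <- identityProbs(combined)     # 0.5 and 0.5

averaged
combinedProbs
possibilities[sample(x = 1:nrow(possibilities), size = 1, prob = combined), ]
```

More generally, each target is a linear equation in the weights, so a combined weight vector exists only if that system has a non-negative solution; with overlapping sets or more targets it may not.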





______________________________________________
[email protected] mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
