It is mysterious to me why the procedure proposed by Stefan Evert works.
It appears to work --- once you modify the call to binom.test() to have the
correct syntax. In a sequence of 1000 trials with random values of N, x,
and p0, the answers from Evert's procedure agreed with the answer given
by uniroot() to within +/- 3.045e-05.
However your question was (in effect) how to solve the equation
Pr(X <= x) = p0
for p, where X ~ Binom(N,p), with N and x known. What this has to do with
confidence intervals for p is, to my mind at least, completely opaque.
In contrast it is obvious why the procedure using uniroot() works.
I would suggest that you stick with the uniroot() procedure in that it is
readily comprehensible.
cheers,
Rolf Turner
On 10/11/13 03:56, Benjamin Ward (ENV) wrote:
Hi,
Thank you for your answers, I'm not completely sure if it's to bino.test I need
or the uniroot. Perhaps I should explain more the idea behind the code and the
actual task I'm trying to do. The idea is to calculate a confidence interval as
to the age of two DNA sequences which have diverged, where I know the number of
mutations that happened in them, and I know the mutation rate.
The binomial probability can be used since, mutations have a probability of
occurring or being observed so many times in a sequence. This is dependent on
the length of the DNA stretch (which equates to the number of trials since each
base is a possibility of observing a mutation), the probability of a single
mutation occurring which is p = t * u, since more time means a higher
probability a mutation may have occurred.
So my code, using pbinom, is supposed to calculate the probability that my DNA
stretches contain the number of mutations observed P(X = k), given their size
(trials) and the probability of a single mutation (p = t * u). However I'm
interested in finding t: t is what is unknown, so the loop repeatedly evaluates
the calculation, increasing t each time and checking P(X=k), when it is 0.05,
0.50 and 0.95, we record t.
Ideally I'd like to rearrange this so I can get the probability of a single
success (mutation) p, and then divide by the mutation rate to get my t. My
supervisor gave my the loopy code but I imagine there is a way to plug in
P(X=k) as 0.05 and 0.95 and get my upper and lower t estimates.
According to the R built in docs:
binom.test
Description:
Performs an exact test of a simple null hypothesis about the
probability of success in a Bernoulli experiment.
Perhaps this is the one I need rather than uniroot?
Best,
Ben.
________________________________________
From: Stefan Evert [stefa...@collocations.de]
Sent: 10 October 2013 09:37
To: R-help Mailing List
Cc: Benjamin Ward (ENV)
Subject: Re: [R] Small p from binomial probability function.
Sounds like you want a 95% binomial confidence interval:
binom.test(N, P)
will compute this for you, and you can get the bounds directly with
binom.test(N, P)$conf.int
Actually, binom.test computes a two-sided confidence interval, which
corresponds roughly to 2.5 and 97.5 percentages in your approach. It doesn't
give you the 50% point either, but I don't think that's a meaningful quantity
with a two-sided test.
Hope this helps,
Stefan
On 9 Oct 2013, at 15:53, Benjamin Ward (ENV) <b.w...@uea.ac.uk> wrote:
I got given some code that uses the R function pbionom:
p <- mut * t
sumprobs <- pbinom( N, B, p ) * 1000
Which gives the output of a probability as a percentage like 5, 50, 95.
What the code currently does is find me the values of t I need, by using the
above two code lines in a loop, each iteration it increaces t by one and runs
the two lines. When sumprobs equals 5, it records the value t, then again when
sumprobs is equal to 50, and again when sumprobs is equal to 95 - giving me
three t values. This is not an efficient way of doing this if t is large. Is it
possible to rearrange pbinom so it gives me the small p (made of mut*t) as the
result of plugging in the sumprobs instead, and is there an R function that
already does this?
Since pbinom is the binomial probability equation I suppose the question is -
in more mathematical terminology - can I change this code so that instead of
calculating the Probability of N successes given the number of trials and the
probability of a single success, can I instead calculate the probability of a
single success using the probability of N successes and number of trials, and
the number of successes? Can R do this for me. So instead I plug in 5, 50, and
95, and then get the small p out?
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.