Karl,
I strongly support Chuck's recommendations.
If you do still want to compute such probabilities 'by hand',
you could consider the lchoose() function which does work
for your example.
-Peter Ehlers
On 2010-03-30 9:55, Charles C. Berry wrote:
> On Tue, 30 Mar 2010, Karl Brand wrote:
>
> > Dear R Users,
> >
> > I employed the phyper() function to estimate the likelihood that the
> > number of genes overlapping between 2 different lists of genes is due
> > to chance. This appears to work appropriately.
> >
> > Now i want to try this with 3 lists of genes which phyper() does not
> > appear to support.
> >
> > Some googling suggests i can utilize the Multivariate hypergeometric
> > distribution to achieve this. eg.:
> >
> > http://en.wikipedia.org/wiki/Hypergeometric_distribution
> >
> > But when i try to do this manually using the choose() function (see
> > attempt below example with just two gene lists) i'm unable to perform
> > the calculations- the numbers hit infinity before getting an answer.
> >
> > Searching cran archives for "Multivariate hypergeometric" show this
> > term in the vignettes of package's ‘combinat’ and ‘forward’. But i'm
> > unable to make sense of the these pachakege functions in the context
> > of my aforementioned apllication.
> >
> > Can some one suggest a function, script or method to achieve my goal
> > of estimating the likelyhood of overlap between 3 lists of genes,
> > ideally using the multivariate hypergeometric, or anything else for
> > that matter?
>
>
> Two suggestions:
>
> 1) Don't! Likely the theory is unsuited for the application. In
> most applications that generate lists of genes, the genes are
> not iid realizations and the hypergeometric gives results that
> are astonishingly anticonservative. As an alternative , the
> block bootstrap may be suitable. See
> http://171.66.122.45/cgi/content/abstract/17/6/760
>
> and Google (scholar) 'genomic "block bootstrap"' for some
> starting points.
>
>
> 2) Take this thread to the bioconductor list. You are much
> more likely to get pointers to useful packages and functions
> for genomic statistical software there.
>
> HTH,
>
> Chuck
>
>
> >
> > cheers in advance,
> >
> > Karl
> >
> >
> >
> > #example attempt with two gene lists m & n
> > N <- 45101 # total number balls in urn
> > m <- 720 # number of 'white' or 'special' balls in urn, aka 'success'
> > n <- 801 # number balls drawn or number of samples
> > k <- 40 # number of 'white' or 'special' balls DRAWN
> >
> > a <- choose(m,k)
> > b <- choose((N-m),(n-k))
> > z <- choose(N,n)
> > prK <- (a*b)/z #'the answer'
> > print(prK)
> > [1] NaN
> >
> > > a
> > [1] 7.985852e+65
> > > b
> > [1] Inf
> > > z
> > [1] Inf
> >
> >
> > --
> > Karl Brand
> > Department of Genetics
> > Erasmus MC
> > Dr Molewaterplein 50
> > 3015 GE Rotterdam
> > T +31 (0)10 704 3457 | F +31 (0)10 704 4743 | M +31 (0)642 777 268
> >
> > ______________________________________________
> > R-help@r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> > http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> >
>
> Charles C. Berry (858) 534-2098
> Dept of Family/Preventive Medicine
> E mailto:cbe...@tajo.ucsd.edu UC San Diego
> http://famprevmed.ucsd.edu/faculty/cberry/ La Jolla, San Diego
> 92093-0901
>
>
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.