On 02/01/2009 6:37 AM, Allan Clark wrote:
hello all
happy new year and hope you r having a good holiday.
i would like to calculate the expectation of a particular random variable and would like to approximate it using a number of the functions contained in R. decided to do some experimentation on a trivial example.
example
========
suppose x(i)~N(0,s2) where s2 = the variance
the prior for s2 = p(s2)~IG(a,b)
so the posterior is IG(.5*n+a, b+.5*sum(x^2) )
and the posterior expectation of s2 = (b+sum(x^2))/(.5*n+a-1)
I want to calculate the value of this expectation using integrals.
ie
let L(X|s2) = the likelihood function
so E(s2|X) = (integral(s2*L(X|s2)*p(s2)))/(integral(L(X|s2)*p(s2))) . the
bounds on both of the integrals is (0,Inf)
=========================================================================
three methods are used to calculate E(s2|X) for different sample sizes:
Method 1
=========
(integral(s2*L(X|s2)*p(s2)))/(integral(L(X|s2)*p(s2))) where the integrals are
calculated using the integrate function
Method 2
=========
transform s2 to tau (from (0,Inf) to (-1,1))
f(s2) = tau = 4*arctan(s2)/pi-1
so
E(s2|X) = (integral(jac*finv(s2)*L(X|finv(s2))*p(finv(s2))))/(integral(L(X|finv(s2))*p(finv(s2)))) . the bounds on both integrals is (-1,1)
jac = the jacobian and finv is the inverse function of f(s2)
once again the integrals are calculated using the intergrate function
Method 3
=========
legendre gaussian quadriture is used to calculate each of the integrals using Method 2.
Some results and comments are below.
method 3 seems to give the correct solution for n<=343
differences between the different methods can be detected from as early as n=3
for method 1
IS THERE A WAY TO MODIFY THE PROBLEM SO THAT THE EXPECTATION CAN BE CALCULATED FOR LARGE n without using MC or MCMC methods?
For large n, the density is highly peaked near its mode. The
integrate() function doesn't work well on functions like that, but you
can improve it by breaking the range up into three parts: the centre,
to the left, and to the right. If you define the centre so that almost
all of the mass will be there, then it doesn't matter if the other two
integrals are inaccurate. I'd suggest using the MLE plus or minus 5
standard errors. This should cover the mode unless the prior is very
informative.
You'll still have a problem of overflow when evaluating it. To fix
that, just rescale the density, so that its max is 1. (Since the
density occurs in both numerator and denominator, this won't affect the
final answer). In general it might be hard to find the max of the
density, but it's probably good enough to find the max of the
likelihood, or some other rough approximation, e.g. the density at the
MLE. The value 1 isn't important, all you need is an upper bound that
doesn't cause overflow.
Duncan Murdoch
P(1)
$correct
[1] 2.201234
$App.2
[1] 2.201234
$App.trans
[1] 2.201234
$App.gq
[1] 2.201234
#=========================================
P(100)
$correct
[1] 18.85535
$App.2
[1] 15.65814
$App.trans
[1] 18.85028
$App.gq
[1] 18.85535
#=========================================
P(300)
$correct
[1] 22.59924
$App.2
[1] NaN
$App.trans
[1] 18.85405
$App.gq
[1] 22.59924
#=========================================
P(500)
Error in integrate(num., 0, Inf) : non-finite function value
The code is included below:
P=function(n=1,howmany=500)
{
#you do need to have stamod
#howmany is used to set the number of points used for the gaussian quadriture
set.seed(1)
sigma=5
x=rnorm(n)*sigma
a=5
b=5
#the correct expectation
#===========================================================
correct=(b+sum(x^2)*.5)/(.5*n+a-1)
#Method 1
#===========================================================
num.=function(s2)
{
L=((2*pi*s2)^(-n*.5))*exp(-.5*sum(x^2)/s2)
P=(s2^(-(a+1)))*exp(-b/s2)*(b^a)/gamma(a)
return(s2*L*P)
}
den.=function(s2)
{
L=((2*pi*s2)^(-n*.5))*exp(-.5*sum(x^2)/s2)
P=(s2^(-(a+1)))*exp(-b/s2)*(b^a)/gamma(a)
return(L*P)
}
App.2=integrate(num.,0,Inf)$value/integrate(den.,0,Inf)$value
#Method 2
#===========================================================
num.t=function(s2.)
{
s2=tan(.25*pi*(1+s2.))
jac=.25*pi*(((tan(.25*pi*(1+s2.)))^2)+1)
num.tv=jac*exp(-(a+.5*n)*log(s2)-(b+sum(x^2)*.5)/s2)
return(num.tv)
}
den.t=function(s2.)
{
s2=tan(.25*pi*(1+s2.))
jac=.25*pi*(((tan(.25*pi*(1+s2.)))^2)+1)
den.tv=jac*exp(-(a+.5*n+1)*log(s2)-(b+sum(x^2)*.5)/s2)
return(den.tv)
}
App.trans=integrate(num.t,-1,1)$value/ integrate(den.t,-1,1)$value
#Method 3
#===========================================================
library(statmod)
ad.=gauss.quad(n=howmany,kind="legendre",alpha=-1,beta=-1)
n.=ad.$nodes
w.=ad.$weights
App.gq=sum(w.*num.t(n.))/sum(w.*den.t(n.))
list(correct=correct,App.2=App.2,App.trans=App.trans, App.gq=App.gq)
}
Allan Clark
========
Lecturer in Statistical Sciences Department
University of Cape Town
7701 Rondebosch
South Africa
TEL (Office): +27-21-650-3228
FAX: +27-21-650-4773
http://web.uct.ac.za/depts/stats/aclark.htm
[[alternative HTML version deleted]]
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.