On Oct 18, 2013, at 1:12 AM, peter dalgaard <pda...@gmail.com> wrote:

> 
> On Oct 18, 2013, at 08:37 , David Winsemius wrote:
> 
>> 
>> On Oct 17, 2013, at 9:11 PM, Steven LeBlanc wrote:
>> 
>>> Greets,
>>> 
>>> I'm using nlminb() to estimate the parameters of a multivariate normal 
>>> random sample with missing values and ran into an unexpected result from my 
>>> call to dmvnorm()
>> 
>> There are at least 5 different versions of dmvnorm. None of them are in the 
>> default packages.
>> 
>>> within the likelihood function. Particular details are provided below.
>> 
>> Complete? Except for the name of the package that has `dmvnorm`.
>> 
> 
> More importantly, it gives no clue as to the connection between sigma and the 
> data set. It is not the covariance matrix:

u and sigma in my original post come from the parameter vector passed by 
nlminb(), which is why I mentioned nlminb() above and provided the u and sigma 
exactly as passed by nlminb(). Sorry for any confusion.

> 
>> s <- scan(what=list("",0,0))
> 1: [1,]  0.84761637  3.994261
> 2: [2,]  0.91487059  4.952595
> 3: [3,]  0.84527267  4.521837
> ....
> 40: [40,]  0.65938218  5.209301
> 41: 
> Read 40 records
>> cor(s[[2]],s[[3]])
> [1] 0.8812403
>> colMeans(cbind(s[[2]],s[[3]]))
> [1] 1.252108 5.540686
>> var(cbind(s[[2]],s[[3]]))
>         [,1]     [,2]
> [1,] 1.284475 2.536627
> [2,] 2.536627 6.450582
> 
> These are not the u and sigma stated.
> 
> Furthermore, the matrix given as sigma is not a valid covariance matrix. Try 
> working out the correlation coefficient:
> 
>> 2.2289513/sqrt(0.6464647*5.697834)
> [1] 1.161377
> 
> That should be enough to make any version of dmvnorm complain…

I hadn't thought to check the validity of the sigma constructed from the 
parameters nlminb() passes in. Thank you for pointing this out. So it appears 
the problem lies in the parameters being estimated by nlminb() and has nothing 
to do with dmvnorm(). Perhaps it is something in the likelihood function I 
wrote, in nlminb() itself, or in my use of nlminb()?
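Checking Peter's point directly against the fitted values (taken from 
nlminb()'s $par shown below) confirms that the estimate is not a valid 
covariance matrix, which is also what produces the log(eigen(...)) NaN 
warnings:

```r
# Fitted sigma assembled from nlminb()'s $par as
# c(theta[3], theta[5], theta[5], theta[4])
sigma.hat <- matrix(c(0.6461647, 2.2289513, 2.2289513, 5.6978339), 2, 2)

det(sigma.hat)  # negative, so sigma.hat cannot be a covariance matrix
eigen(sigma.hat, symmetric = TRUE, only.values = TRUE)$values
# one eigenvalue is negative, hence the NaNs from log(eigen(...)) in dmvnorm
```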

As I alluded to in my original post, there are two parts to the data set: 
'complete' (bivariate normal observations with no missing data) and 'deleted' 
(bivariate normal observations with one of the two components missing). I used 
mvrnorm() to generate a sample of size 50, then deleted 10 values and split 
the result into the 'complete' and 'deleted' sets you see below. The original 
undeleted data set can be generated with:

> library(MASS)   # mvrnorm() is in MASS
> mu<-c(1,5)
> sigma<-c(1,2,2,6)
> dim(sigma)<-c(2,2)
> set.seed(83165026)
> sample.full<-mvrnorm(50,mu,sigma)

Additional details are as follows. Pertinent facts about the output below: 
'theta.hat.em' is the result of an EM algorithm I wrote, and I was trying to 
use nlminb() to get a sense of whether or not my answer is reasonable. Thus, 
when nlminb() choked with a different starting value, I used something I 
thought to be close to the result. exact() is intended to be the full 
likelihood function for the 'complete' and 'deleted' data sets passed to it.

Perhaps also noteworthy: when I use the full 'undeleted' data set (n=50, 
bivariate normal) together with the missing-data portion ('deleted' below), 
nlminb() returns what appears to be a reasonable result.

Best Regards,
Steven

> theta.hat.em
[1] 1.243821 5.536775 1.125628 5.823366 2.228952
> exact
function(theta,complete,deleted){
    one.only<-deleted[!(is.na(deleted[,1])),1]
    two.only<-deleted[!(is.na(deleted[,2])),2]
    u<-c(theta[1],theta[2])
    sigma<-c(theta[3],theta[5],theta[5],theta[4])
    dim(sigma)<-c(2,2)
    -sum(log(dmvnorm(x=complete,mean=u,sigma=sigma)))-
        sum(log(dnorm(one.only,u[1],sigma[1,1])))-
            sum(log(dnorm(two.only,u[2],sigma[2,2])))
}
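Two things in exact() may be worth a look. First, dnorm()'s third argument is 
a standard deviation, while sigma[1,1] and sigma[2,2] are variances. Second, 
nothing constrains theta to yield a positive-definite sigma, so nlminb() is 
free to wander into invalid matrices. A common workaround is to optimize the 
entries of a Cholesky factor instead. The sketch below assumes mvtnorm's 
dmvnorm; exact.chol is a hypothetical rewrite for illustration, not the 
function whose output is shown below:

```r
library(mvtnorm)  # assuming this is the dmvnorm in use

# Sketch: parameterize sigma by a lower-triangular Cholesky factor L, so that
# sigma = L %*% t(L) is positive semi-definite for any real theta[3:5].
# dnorm() takes a standard deviation, hence the sqrt() on the variances.
exact.chol <- function(theta, complete, deleted) {
    one.only <- deleted[!is.na(deleted[, 1]), 1]
    two.only <- deleted[!is.na(deleted[, 2]), 2]
    u <- theta[1:2]
    L <- matrix(c(theta[3], theta[5], 0, theta[4]), 2, 2)  # lower triangular
    sigma <- L %*% t(L)
    -sum(dmvnorm(complete, mean = u, sigma = sigma, log = TRUE)) -
        sum(dnorm(one.only, u[1], sqrt(sigma[1, 1]), log = TRUE)) -
            sum(dnorm(two.only, u[2], sqrt(sigma[2, 2]), log = TRUE))
}
```

Starting values for theta[3:5] under this parameterization would be the 
entries of t(chol(sigma.start)) rather than the variances and covariance 
themselves.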
> complete
             [,1]      [,2]
 [1,]  0.84761637  3.994261
 [2,]  0.91487059  4.952595
 [3,]  0.84527267  4.521837
 [4,]  2.53821358  8.374880
 [5,]  1.16646209  6.255022
 [6,]  0.94706527  4.169510
 [7,]  0.48813564  3.349230
 [8,]  3.71828469  9.441518
 [9,]  0.08953357  1.651497
[10,]  0.68530515  5.498403
[11,]  1.52771645  8.484671
[12,]  1.55710697  5.231272
[13,]  1.89091603  4.152658
[14,]  1.08483541  5.401544
[15,]  0.58125385  5.340141
[16,]  0.24473250  2.965046
[17,]  1.59954401  8.095561
[18,]  1.57656436  5.335744
[19,]  2.73976992  8.572871
[20,]  0.87720252  6.067468
[21,]  1.18403087  3.526790
[22,] -1.03145244  1.776478
[23,]  2.88197343  7.720838
[24,]  0.60705218  4.406073
[25,]  0.58083464  3.374075
[26,]  0.87913427  5.247637
[27,]  1.10832692  3.534508
[28,]  2.92698371  8.682130
[29,]  4.04115277 11.827360
[30,] -0.57913297  1.476586
[31,]  0.84804365  7.009075
[32,]  0.79497940  3.671164
[33,]  1.58837762  5.535409
[34,]  0.63412821  3.932767
[35,]  3.14032433  9.271014
[36,] -0.18183869  1.666647
[37,]  0.57535770  6.881830
[38,]  3.21417723 10.901636
[39,]  0.29207932  4.120408
[40,]  0.65938218  5.209301
> deleted
           [,1]     [,2]
 [1,]        NA 9.688308
 [2,]        NA 2.663027
 [3,]        NA 5.468419
 [4,]        NA 6.392642
 [5,] 0.9426628       NA
 [6,] 0.9009366       NA
 [7,] 0.2946175       NA
 [8,] 1.6123423       NA
 [9,] 1.3166623       NA
[10,] 1.2737043       NA
> nlminb(start=theta.hat.em,objective=exact,complete=complete,deleted=deleted,control=list(trace=1))
  0:     142.76785:  1.24382  5.53677  1.12563  5.82337  2.22895
  1:     142.76785:  1.26720  5.47504 0.646165  5.69783  2.22895
$par
[1] 1.2671979 5.4750445 0.6461647 5.6978339 2.2289513

$objective
[1] 142.7678

$convergence
[1] 1

$iterations
[1] 1

$evaluations
function gradient 
       3        5 

$message
[1] "false convergence (8)"

Warning messages:
1: In log(eigen(sigma, symmetric = TRUE, only.values = TRUE)$values) :
  NaNs produced
2: In log(eigen(sigma, symmetric = TRUE, only.values = TRUE)$values) :
  NaNs produced
3: In log(eigen(sigma, symmetric = TRUE, only.values = TRUE)$values) :
  NaNs produced
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
