Hello all,

I am working on a longitudinal study of children in the UK and am trying the 
PAN package for imputing missing data, since it meets the critical criterion 
of taking into account each subject's individual trend over time as well as 
the population trend over time. To validate the procedure, I started by 
deleting some known values: we have 6 annual measures of height on 300 
children, and I imputed the deleted values using PAN and compared them with 
the real values. In most individuals the imputed values fit the individual 
trend extremely well. However, for a handful of individuals the imputed value 
was lower than the previous (real) height, or higher than the next (real) 
height, making it appear that height went down, which in reality it never 
does. So my question is: why does this happen when it works so well for the 
majority of individuals? Am I doing something wrong?
As a novice user of R (and new to this area of statistics), I wondered if 
anyone could point me in the right direction, since the mixed-effects design 
(plus the potential ease and speed) of the PAN procedure for longitudinal data 
imputation is very appealing.
I would very much appreciate any advice you could give me. Many thanks in 
advance.

Jo Hosking

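Code and a small sample of the data are shown below (I could supply more data 
to anyone willing). In the sample, ht is the measured height and htmiss is the 
same measurement with the test values deleted (left blank).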

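library(pan)  # provides pan()

impht.data <- read.delim("impht_long_trunc.dat", header = TRUE)
impht.data$sex   <- factor(impht.data$sex, labels = c("Boys", "Girls"))
impht.data$visit <- factor(impht.data$visit)
impht.data$code  <- factor(impht.data$code)

y    <- impht.data$htmiss   # height with the test values deleted (NA)
subj <- impht.data$code     # subject indicator, one entry per row of y
# cbind() turns the factors into their integer codes; no intercept column is included
pred <- cbind(impht.data$age, impht.data$sex, impht.data$visit)
xcol <- 1:3                 # fixed effects: age, sex, visit (columns of pred)
zcol <- 1                   # random effect: age (column 1 of pred)
prior <- list(a = 1, Binv = 1, c = 1, Dinv = 1)
ht1 <- pan(y, subj, pred, xcol, zcol, prior, seed = 13579, iter = 1000)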
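
For anyone wanting to reproduce the comparison of imputed against deleted values, here is a minimal sketch of how it could be scored. The helper name score_imputation and its arguments are hypothetical, and the "imputed" value below comes from simple linear interpolation on age, used purely as a stand-in so that the example runs without the pan package; it is not what pan does.

```r
# Sketch: score imputed values against the heights that were deliberately deleted.
# score_imputation(), truth, observed and imputed are hypothetical names.
score_imputation <- function(truth, observed, imputed) {
  deleted <- is.na(observed) & !is.na(truth)   # positions removed for validation
  err <- imputed[deleted] - truth[deleted]     # imputed minus real height
  list(n = sum(deleted), bias = mean(err), rmse = sqrt(mean(err^2)), error = err)
}

# Child 1 from the sample data: the visit 2 height (109.6) was deleted.
age      <- c(4.87, 5.86, 6.88, 7.72, 8.72, 9.71)
truth    <- c(105, 109.6, 116.4, 121.2, 126.7, 132.3)
observed <- c(105, NA,    116.4, 121.2, 126.7, 132.3)

# Stand-in imputation: linear interpolation on age (NOT the pan model).
miss <- is.na(observed)
imputed <- observed
imputed[miss] <- approx(age[!miss], observed[!miss], xout = age[miss])$y

res <- score_imputation(truth, observed, imputed)
print(res$error)
```

With the real fit, my understanding of the pan documentation is that the completed response from the final iteration is returned in the y component of the result, so something like score_imputation(impht.data$ht, impht.data$htmiss, as.vector(ht1$y)) would be the analogous call. That reading is an assumption worth checking against the package help.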

code    sex     visit   age     ht      htmiss
1       2       1       4.87    105     105
1       2       2       5.86    109.6
1       2       3       6.88    116.4   116.4
1       2       4       7.72    121.2   121.2
1       2       5       8.72    126.7   126.7
1       2       6       9.71    132.3   132.3
2       2       1       4.84    107.1   107.1
2       2       2       6       115.7   115.7
2       2       3       6.86    121.4   121.4
2       2       4       7.69    126.5   126.5
2       2       5       8.7     134.15  134.15
2       2       6       9.76    140
3       2       1       4.62    103     103
3       2       2       5.69    108.9   108.9
3       2       3       6.87    115.1
3       2       4       7.55    118.6   118.6
3       2       5       8.46    123.6   123.6
3       2       6       9.63    128.9   128.9
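
The "height went down" cases described above can be located with a small check like the following. This is only a sketch in base R: flag_decreases is a hypothetical helper, rows are assumed to be ordered by visit within each child, and the one altered value in `completed` is invented for illustration, not an actual pan result.

```r
# Sketch: flag visits where a completed height is lower than at the previous visit.
# flag_decreases() is a hypothetical helper; rows must be ordered by visit within child.
flag_decreases <- function(code, height) {
  # change from the previous visit within each child; the first visit gets NA
  step <- ave(height, code, FUN = function(h) c(NA, diff(h)))
  !is.na(step) & step < 0
}

# The ht column of the sample data above (3 children, 6 visits each).
code <- rep(1:3, each = 6)
ht <- c(105,   109.6, 116.4, 121.2, 126.7,  132.3,
        107.1, 115.7, 121.4, 126.5, 134.15, 140,
        103,   108.9, 115.1, 118.6, 123.6,  128.9)

# The real heights should never decrease.
print(any(flag_decreases(code, ht)))

# Invented example: child 3, visit 3 filled in above the next real height (118.6).
completed <- ht
completed[15] <- 119.5   # hypothetical value, NOT taken from pan output
print(which(flag_decreases(code, completed)))
```

The flagged row is the real visit that follows the too-high imputed value, which matches the second kind of case in the question (imputed value higher than the next real value).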
