Hi all, hope someone can help me out with this.
Background

I have a data set collected from a questionnaire that I wish to validate,
and I have chosen to use confirmatory factor analysis (CFA) to analyse it.
Instrument

The instrument consists of 11 subscales, with a total of 68 items across
them. Each item is scored on an integer scale from 1 to 4.
Confirmatory factor analysis (CFA) setup

I use the sem package to conduct the CFA. My code is below:

cov.mat <- as.matrix(read.table("http://dl.dropbox.com/u/1445171/cov.mat.csv";,
sep = ",", header = TRUE))
rownames(cov.mat) <- colnames(cov.mat)

model <- cfa(file = "http://dl.dropbox.com/u/1445171/cfa.model.txt";,
reference.indicators = FALSE)
cfa.output <- sem(model, cov.mat, N = 900, maxiter = 80000, optimizer
= optimizerOptim)

Warning message:
In eval(expr, envir, enclos) :
  Negative parameter variances. Model may be underidentified.
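
For reference, cfa.model.txt follows the specification format that cfa() in
the sem package expects: one factor per line, a colon, then the
comma-separated items that load on that factor. The factor and item names
below are just placeholders to show the layout, not my actual subscale or
item names (my real file lists all 11 subscales and all 68 items):

F1: item1, item2, item3, item4, item5, item6
F2: item7, item8, item9, item10
...
F11: item64, item65, item66, item67, item68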

Straight off you might notice a few anomalies; let me explain.

   - Why is the optimizer chosen to be optimizerOptim?

ANS: I originally stuck with the default optimizerSem, but no matter how
many iterations I allowed, either I ran out of memory first (8 GB RAM setup)
or it reported no convergence. Things "seemed" a little better when I
switched to optimizerOptim, whereby it would conclude successfully but throw
up the warning that the model may be underidentified. Upon closer
inspection, I realise that the output shows convergence as TRUE but
iterations as NA, so I am not sure what exactly is happening (a quick check
of these fields is sketched after these notes).

   - The maxiter is too high.

ANS: If I set it to a lower value, it refuses to converge, although as
mentioned above, I doubt real convergence actually occurred.
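
The fields I am referring to can be checked directly on the fitted object.
This is just a quick diagnostic sketch against cfa.output from above; I am
assuming the fit object exposes convergence and iterations (as my output
suggests it does) and that vcov() returns the estimated covariance matrix of
the parameter estimates, as it does for a standard sem fit:

summary(cfa.output)        # full parameter table, fit indices and the warning
cfa.output$convergence     # reports TRUE for me...
cfa.output$iterations      # ...while this is NA
# Negative entries here are what trigger the "Negative parameter variances"
# warning; the smallest values show which parameters are affected
head(sort(diag(vcov(cfa.output))))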
Problem

So by now I suspect that the model really is underidentified, so I looked
for resources to resolve this problem and found:

   - http://davidakenny.net/cm/identify_formal.htm
   - http://faculty.ucr.edu/~hanneman/soc203b/lectures/identify.html

I followed the 2nd link quite closely and applied the t-rule (the arithmetic
is repeated as a short R check after this list):

   - I have 68 observed variables, providing me with 68 variances and 2278
   covariances between variables = *2346 data points*.
   - I also have 68 loadings (regression coefficients), 68 error variances,
   11 factor variances and 55 factor covariances to estimate, for a total of
   202 parameters.
   - Since I will be fixing the variances of the 11 latent factors to 1 for
   scaling, I remove those 11 from the parameters to estimate, leaving *191
   free parameters*.
      - My degrees of freedom are therefore 2346 - 191 = 2155, making the
      model over-identified by the t-rule.
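
The same bookkeeping as a short R check (nothing here depends on the data;
it is just the t-rule arithmetic described above):

p <- 68                          # observed items
data.points <- p * (p + 1) / 2   # 68 variances + 2278 covariances = 2346
loadings <- 68                   # one free loading per item
error.vars <- 68                 # one error variance per item
k <- 11                          # latent factors
factor.covs <- k * (k - 1) / 2   # 55 factor covariances
# the 11 factor variances are fixed to 1 for scaling, so they are not free
free.params <- loadings + error.vars + factor.covs   # 191
data.points - free.params        # df = 2155 > 0, over-identified by the t-rule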

Questions

   1. Is the low variance of some of my items a possible cause of the
   underidentification? I was previously advised to remove items with zero
   variance, which led me to think about items whose variances are very close
   to zero. Should they be removed too? (How I am checking this is sketched
   after these questions.)
   2. After much reading, I think, but am not sure, that this might be a case
   of empirical underidentification. Is there a systematic way of diagnosing
   what kind of underidentification it is? And what are my options for
   proceeding with my analysis?
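
For context, this is roughly how I am screening for both issues, using the
cov.mat loaded above. The 0.05 cutoff is an arbitrary choice on my part, and
the eigenvalue/condition-number check is only my attempt at spotting
near-singularity (empirical underidentification), not a definitive test:

# Question 1: items whose variances are close to zero
item.vars <- diag(cov.mat)
sort(item.vars)[1:10]                 # the ten smallest item variances
names(item.vars)[item.vars < 0.05]    # items below an (arbitrary) cutoff

# Question 2: near-singularity of the covariance matrix, which can cause
# empirical underidentification even when the t-rule is satisfied
eig <- eigen(cov.mat, symmetric = TRUE, only.values = TRUE)$values
min(eig)          # eigenvalues near zero indicate (near) linear dependence
kappa(cov.mat)    # condition number; a very large value is another warning sign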

I have more questions, but let's leave it at these two for now. Thanks for
any help!

Regards,
Ruijie (RJ)

--------
He who has a why can endure any how.

~ Friedrich Nietzsche
