[R] PROBIT REGRESSION FOR GROUPED/CLUSTERED DATA

saurav pathak Thu, 16 Jul 2009 03:20:04 -0700

Hello all,

I have been trying to fix this for weeks now. It should be simple to fix.
Please help.


Let me explain what I am doing. I have a data set for 65 countries over a
period of 9 years (2000-2008). Each country has on average about 2000
interviews per year, so the full set has roughly 65*9*2000 observations
(of course there are missing values as well). Now let me explain how the
data are clustered or grouped. I use the variable "yearctry", which is
computed as year*10000 + the international phone code of the country. For
example, the USA, with calling code 001, has yearctry = 2000001 for the
year 2000. Under this value of yearctry there are roughly 2000
observations. For the same year, the UK has yearctry = 2000044 (again
roughly 2000 observations), and so on for the remaining 63 countries in
2000 and for all other years up to 2008. For the year 2001, the values of
yearctry for the USA and the UK are 2001001 and 2001044 respectively (again
roughly 2000 observations for each country), and so on for the other 63
countries. So the data set is *grouped/clustered using "yearctry"*.

I am trying to look for selection bias, if any, within each "yearctry"
value (i.e. roughly 2000 observations for one country in one year, for 9
years and 65 countries). Essentially, I wish to run the check for each of
the 65*9 values of "yearctry", each having roughly 2000 observations. I use
glm with a probit link to look into the selection bias, where my dependent
variable "s" is either 0 or 1. The call

myProbit <- glm(s ~ age + gender + gemedu + gemhinc + es_gdppc +
                imf_pop + estbbo_m,
                family = binomial(link = "probit"), data = adpopdata)

is the Heckman selection equation fitted to all observations at once,
without taking into account that each "yearctry" is a distinct group. I want
the selection equation to recognise each "yearctry" value: take one
"yearctry" at a time, estimate the probit, move to the next "yearctry",
repeat the probit regression, and then give me the results. At the moment
the call above does not accomplish that. It runs a single regression on the
pooled data, whereas I want it to distinguish one yearctry from another,
perform the regression for every yearctry value, and finally report the
results.
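
A minimal sketch of the "one probit per yearctry" idea described above, using only base R (split() and lapply()). The data frame toy is simulated and purely hypothetical; only the column names s, age, gender and yearctry are taken from the post, and the formula is shortened to two predictors to keep the example small. One caveat, assuming es_gdppc and imf_pop are country-year aggregates as their names suggest: such variables are constant within a single yearctry, so a within-group glm() would report NA for their coefficients.

```r
# Simulated stand-in for adpopdata (hypothetical values; only the column
# names s, age, gender and yearctry come from the post).
set.seed(1)
groups <- c(2000001, 2000044, 2001001, 2001044, 2002001, 2002044)
n_per  <- 300
toy <- data.frame(
  yearctry = rep(groups, each = n_per),
  age      = round(runif(length(groups) * n_per, 18, 64)),
  gender   = rbinom(length(groups) * n_per, 1, 0.5)
)
toy$s <- rbinom(nrow(toy), 1, pnorm(-1 + 0.02 * toy$age + 0.3 * toy$gender))

# Split the data by yearctry and fit a separate probit to each piece.
fits <- lapply(split(toy, toy$yearctry), function(d) {
  glm(s ~ age + gender, family = binomial(link = "probit"), data = d)
})

# Collect the estimates: one row per yearctry, one column per coefficient.
coef_table <- t(sapply(fits, coef))
print(round(coef_table, 3))
```

Each element of fits is an ordinary glm object, so summary(fits[["2000001"]]) shows the full output for a single yearctry.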

Is there any model other than glm that is recommended for this job? If yes,
please explain how to use it.

Let me give you the exact command that Stata uses, so that things become
very clear:

xtprobit s age gender gemeduc gemhinc es_gdppc imf_pop estbbo_m,
i(yearctry)

This does exactly what I wish to accomplish in R, i.e. it fits the Heckman
selection equation for the selection variables (seven in my case), taking
into account the grouping by "yearctry".
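
For comparison, xtprobit with i(yearctry) fits by default a random-effects probit, that is, one set of slopes shared by all groups plus a random intercept per yearctry, rather than a separate regression for each group. A rough R analogue, offered as a sketch under assumptions and not as a guaranteed numerical match to Stata, is glmer() from the lme4 package (an add-on package, not part of base R). The data below are simulated and hypothetical, and the formula is shortened to two predictors.

```r
library(lme4)  # assumed to be installed: install.packages("lme4")

# Simulated, hypothetical data: 30 groups with a group-level random intercept.
set.seed(2)
n_groups <- 30
n_per    <- 100
toy <- data.frame(
  yearctry = factor(rep(seq_len(n_groups), each = n_per)),
  age      = round(runif(n_groups * n_per, 18, 64)),
  gender   = rbinom(n_groups * n_per, 1, 0.5)
)
# Centre age and express it in decades; this only helps the optimiser
# on the toy data and is not something the post prescribes.
toy$age_dec <- (toy$age - 40) / 10

u   <- rnorm(n_groups, sd = 0.5)  # true random intercepts
eta <- -0.2 + 0.2 * toy$age_dec + 0.3 * toy$gender + u[as.integer(toy$yearctry)]
toy$s <- rbinom(nrow(toy), 1, pnorm(eta))

# Random-intercept probit: common slopes, one intercept shift per yearctry.
# nAGQ = 12 requests adaptive Gauss-Hermite quadrature with 12 points.
re_probit <- glmer(s ~ age_dec + gender + (1 | yearctry),
                   family = binomial(link = "probit"),
                   data = toy, nAGQ = 12)

print(fixef(re_probit))    # shared coefficient estimates
print(VarCorr(re_probit))  # estimated between-group standard deviation
```

With the real data the formula would carry all seven predictors and data = adpopdata. Unlike separate per-group regressions, this model can estimate coefficients for variables that do not vary within a yearctry.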

I have worked on this for weeks, so kindly help me. I think it needs only a
small change to the call, but since I am new to R, I do not know exactly
what will fix my problem. Any help will be highly appreciated.
Thanks

-- 
Dr.Saurav Pathak
PhD, Univ.of.Florida
Mechanical Engineering
Doctoral Student
Innovation and Entrepreneurship
Imperial College Business School
s.patha...@imperial.ac.uk
0044-7795321121


______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
