Hello all, I have been working to fix this for weeks now. It should be simple to fix. Please help.
Let me explain what I am doing. I have a data set for 65 countries over a period of 9 years (2000-2008). Each country has on average about 2000 interviews per year, so the full set has roughly 65*9*2000 (about 1.17 million) observations, although there are missing values as well.

Now let me explain how the data are grouped. I use the variable "yearctry", which is computed as year*1000 + the international calling code of the country. For example, the USA (calling code 001) in the year 2000 has yearctry = 2000001, and there are roughly 2000 observations under this value. For the same year the UK has yearctry = 2000044 (again roughly 2000 observations), and so on for the other 63 countries. For the year 2001 the values for the USA and the UK are 2001001 and 2001044 respectively, and so on for all years from 2000 to 2008. So the data set is grouped/clustered using "yearctry".

I am trying to check for selection bias, if any, within each "yearctry" value, so I need to cover 65*9 values of "yearctry", each with roughly 2000 observations. I use glm with a probit link for this; my dependent variable "s" is either 0 or 1. The formula

    myProbit <- glm(s ~ age + gender + gemedu + gemhinc + es_gdppc + imf_pop + estbbo_m, family = binomial(link = "probit"), data = adpopdata)

is the Heckman selection equation estimated on all observations, without taking into account that each "yearctry" is a distinct group. I want the selection equation to recognise each "yearctry" value: take one "yearctry" at a time, estimate the probit, go on to the next "yearctry", repeat the probit regression, and then give me the results.
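To make the setup concrete, here is a minimal sketch of the literal "one probit per yearctry" reading, using base R only. The data are simulated and purely hypothetical (3 countries, 2 years, 300 rows per group, and only the covariates age and gender); yearctry is built so that it reproduces the values quoted above (2000001, 2000044, ...). It is not meant as the definitive way to do this, only as an illustration of the grouping.

```r
set.seed(42)

# Hypothetical stand-in for adpopdata: 3 countries x 2 years, 300 rows per group
cells <- expand.grid(phone = c(1, 44, 49), year = 2000:2001)
adpopdata <- cells[rep(seq_len(nrow(cells)), each = 300), ]

# Group identifier, e.g. year 2000 and calling code 44 give 2000044
adpopdata$yearctry <- adpopdata$year * 1000 + adpopdata$phone

# Made-up individual-level covariates
adpopdata$age    <- round(runif(nrow(adpopdata), 18, 64))
adpopdata$gender <- rbinom(nrow(adpopdata), 1, 0.5)

# Selection indicator s drawn from a probit model (coefficients are arbitrary)
adpopdata$s <- rbinom(nrow(adpopdata), 1,
                      pnorm(-0.5 + 0.01 * adpopdata$age + 0.3 * adpopdata$gender))

# Pooled probit: ignores the grouping entirely
pooled <- glm(s ~ age + gender, family = binomial(link = "probit"),
              data = adpopdata)

# One probit per yearctry value: split the data, then fit within each piece
fits <- lapply(split(adpopdata, adpopdata$yearctry), function(d) {
  glm(s ~ age + gender, family = binomial(link = "probit"), data = d)
})

# Coefficient table with one row per yearctry value
coef_by_group <- t(sapply(fits, coef))
print(coef_by_group)
```

Note that any covariate that is constant within a single yearctry group is collinear with the intercept in that group's fit, so glm() reports NA for its coefficient there.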
At the moment the glm() call does not accomplish that. It runs the regression on all observations in bulk, whereas I want it to distinguish one yearctry from another, perform the regression for all yearctry values, and finally report the results. Is there another model, other than glm, that would do the job? If yes, please tell me how to use it.

Let me give you the exact command that Stata uses, so that things become very clear:

    xtprobit s age gender gemeduc gemhinc es_gdppc imf_pop estbbo_m, i(yearctry)

This does exactly what I wish to accomplish in R, i.e. it estimates the Heckman selection equation for the selection variables (seven in my case) while treating each "yearctry" value as its own group.

I have worked on this for weeks. I think it is a small thing to fix in the equation, but since I am new to R I do not know exactly what will fix my problem, so any help will be highly appreciated.

Thanks

--
Dr.Saurav Pathak
PhD, Univ.of.Florida
Mechanical Engineering
Doctoral Student
Innovation and Entrepreneurship
Imperial College Business School
s.patha...@imperial.ac.uk
0044-7795321121

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.