Rather than going to a penalized GLM, you might be better off
investigating the sources of quasi-perfect separation and simplifying
the model to avoid or reduce it. Your data set has several factors with
a large number of levels, which makes the data sparse across their
combinations. Like multicollinearity, near-perfect separation is a data
problem, and it is often better addressed by careful thought about the
model than by wrapping the data in a computationally intensive band-aid.
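For example (an illustrative sketch only, using the column names from the
str() output in your message below), cross-tabulating the outcome against
the high-cardinality factors will show which cells are empty or all one
outcome, and pooling rare levels is one simple way to thin them out:

## where are the empty / one-sided cells?
with(knowf3, table(know_fin, county))      # 80 levels vs. 3 outcome levels
with(knowf3, table(know_fin, comp_grp2))   # 16 levels vs. 3 outcome levels

## one option: pool sparsely populated counties into an "Other" level
cnt  <- table(knowf3$county)
rare <- names(cnt)[cnt < 20]               # threshold chosen arbitrarily
knowf3$county2 <- factor(ifelse(knowf3$county %in% rare,
                                "Other", as.character(knowf3$county)))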
-Michael
On 7/26/2017 10:14 AM, john polo wrote:
UseRs,
I have a data frame with 2547 rows and several hundred columns in R
3.1.3. I am trying to run a small logistic regression with a subset of
the data:

know_fin ~ comp_grp2 + age + gender + education + employment + income +
  ideol + home_lot + home + county
> str(knowf3)
'data.frame':   2033 obs. of  18 variables:
 $ userid    : Factor w/ 2542 levels "FNCNM1639","FNCNM1642",..: 1857 157 965 1967 164 315 849 1017 699 189 ...
 $ round_id  : Factor w/ 1 level "Round 11": 1 1 1 1 1 1 1 1 1 1 ...
 $ age       : int  67 66 44 27 32 67 36 76 70 66 ...
 $ county    : Factor w/ 80 levels "Adair","Alfalfa",..: 75 75 75 75 75 75 64 64 64 64 ...
 $ gender    : Factor w/ 2 levels "0","1": 1 2 1 1 2 1 2 1 2 2 ...
 $ education : Factor w/ 8 levels "1","2","3","4",..: 6 7 6 8 2 4 2 4 2 6 ...
 $ employment: Factor w/ 9 levels "1","2","3","4",..: 8 4 4 4 3 8 5 8 4 4 ...
 $ income    : num  550000 80000 90000 19000 42000 30000 18000 50000 800000 10000 ...
 $ home      : num  0 0 0 0 0 0 0 0 0 0 ...
 $ ideol     : Factor w/ 7 levels "1","2","3","4",..: 2 7 4 3 2 4 2 3 2 6 ...
 $ home_lot  : Factor w/ 3 levels "1","2","3": 2 2 2 2 2 2 3 3 1 2 ...
 $ hispanic  : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
 $ comp_grp2 : Factor w/ 16 levels "Cr_Gr","Cr_Ot",..: 13 13 13 13 13 13 10 10 10 10 ...
 $ know_fin  : Factor w/ 3 levels "0","1","2": 2 2 2 2 2 2 2 2 2 2 ...
With the regular glm() function, I get a warning about "perfect or
quasi-perfect separation" [1]. I looked for a way to deal with this, and
a penalized GLM is an accepted approach [2]; it is implemented in
logistf() from the logistf package. I used the default settings for the
function.
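For concreteness, the two calls are roughly as follows (a paraphrase of
what I ran, not pasted from my script; the object names are just for
illustration):

library(logistf)

f <- know_fin ~ comp_grp2 + age + gender + education + employment +
  income + ideol + home_lot + home + county

fit_glm <- glm(f, family = binomial, data = knowf3)  # finishes, with the separation warning
fit_fir <- logistf(f, data = knowf3)                 # default settings; this is the call that never returns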
Just before I run the model, memory.size() for my session is ~4500 (MB).
memory.limit() is ~25500. When I start the model, R immediately becomes
non-responsive. This is in a Windows environment and in Task Manager,
the instance of R is, and has been, using ~13% of CPU and ~4997 MB of
RAM. It's been ~24 hours now in that state, and I don't have any idea
how long this should take. If I run the same model in the same setting
with the base glm(), the model runs in about 60 seconds. Is there a way
to know if the process is going to produce something useful after all
this time or if it's hanging on some kind of problem?
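One thing I thought of trying, in case it helps gauge the scaling (just a
sketch, I have not run it yet): time logistf() on a small random subset
and see how quickly the cost grows with n, e.g.

library(logistf)
set.seed(1)
idx <- sample(nrow(knowf3), 500)  # subset size chosen arbitrarily
system.time(
  logistf(know_fin ~ comp_grp2 + age + gender + education + employment +
            income + ideol + home_lot + home + county,
          data = knowf3[idx, ])
)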
[1]: https://stats.stackexchange.com/questions/11109/how-to-deal-with-perfect-separation-in-logistic-regression#68917
[2]: https://academic.oup.com/biomet/article-abstract/80/1/27/228364/Bias-reduction-of-maximum-likelihood-estimates