Second! Bert Gunter
On Wed, Apr 3, 2019 at 9:35 AM Richard M. Heiberger <r...@temple.edu> wrote: > fortune nomination. > > > The lesson to me here is that if you fit a sufficiently unreasonable > model to data, the computations may break down. > > On Wed, Apr 3, 2019 at 10:18 AM Fox, John <j...@mcmaster.ca> wrote: > > > > Dear Eric, > > > > I'm afraid that your argument doesn't make sense to me. As you saw when > you tried > > > > fit3 <- update(fit, subset = !(Name %in% c("Jupiter ", "Saturn > "))) > > > > glm.nb() effectively wasn't able to estimate the theta parameter of the > negative binomial model. So why would it be better to base deletion > diagnostics on actually refitting the model? > > > > The lesson to me here is that if you fit a sufficiently unreasonable > model to data, the computations may break down. Other than drawing > attention to the NaN with an explicit warning, I don't see what more could > usefully be done. > > > > Best, > > John > > > > > On Apr 2, 2019, at 9:08 PM, Eric Bridgeford <ericw...@gmail.com> > wrote: > > > > > > Hey John, > > > > > > I am aware they are high leverage points, and that the model is not the > > > best for them. The purpose of this dataset was to explore high leverage > > > points, and diagnostic statistics through which one would identify > them. > > > > > > What I am saying is that the current behavior of the function seems a > > > little non-specific to me; the influence for this problem is > > > finite/computable manually by fitting n models to n-1 points (manually > > > holding out each point individually to obtain the loo-variance, and > > > computing the influence in the non-approximate way). > > > > > > I am just suggesting that it seems the function could be improved by, > say, > > > throwing specific warnings when NaNs may arise. Ie, "Your have points > that > > > are very high leverage. The approximation technique is not numerically > > > stable for these points and the results should be used with caution" > > > etc...; I am sure there are other also pre-hoc approaches to diagnose > other > > > ways in which this function could fail). The approximation technique > not > > > behaving well for points that are ultra high leverage just seems > peculiar > > > that that would return an NaN with no other > recommendations/advice/specific > > > warnings, especially since the influence is frequently used to > diagnosing > > > this specific issue. > > > > > > Alternatively, one could afford an optional argument type="manual" that > > > computes the held-out variance manually rather than the approximate > > > fashion, and add a comment to use this in the help menu when you have > high > > > leverage points (this is what I ended up doing to obtain the true > influence > > > and the externally studentized residual). > > > > > > I just think some more specificity could be of use for future users, to > > > make the R:stats community even better :) Does that make sense? > > > > > > Sincerely, > > > Eric > > > > > > On Tue, Apr 2, 2019 at 7:53 PM Fox, John <j...@mcmaster.ca> wrote: > > > > > >> Dear Eric, > > >> > > >> Have you looked at your data? -- for example: > > >> > > >> plot(log(Moons) ~ Volume, data = moon_data) > > >> text(log(Moons) ~ Volume, data = moon_data, labels=Name, adj=1, > > >> subset = Volume > 400) > > >> > > >> The negative-binomial model doesn't look reasonable, does it? > > >> > > >> After you eliminate Jupiter there's one very high leverage point left, > > >> Saturn. Computing studentized residuals entails an approximation to > > >> deleting that as well from the model, so try fitting > > >> > > >> fit3 <- update(fit, subset = !(Name %in% c("Jupiter ", "Saturn > "))) > > >> summary(fit3) > > >> > > >> which runs into numeric difficulties. > > >> > > >> Then look at: > > >> > > >> plot(log(Moons) ~ Volume, data = moon_data, subset = Volume < > 400) > > >> > > >> Finally, try > > >> > > >> plot(log(Moons) ~ log(Volume), data = moon_data) > > >> fit4 <- update(fit2, . ~ log(Volume)) > > >> rstudent(fit4) > > >> > > >> I hope this helps, > > >> John > > >> > > >> ----------------------------------------------------------------- > > >> John Fox > > >> Professor Emeritus > > >> McMaster University > > >> Hamilton, Ontario, Canada > > >> Web: https://socialsciences.mcmaster.ca/jfox/ > > >> > > >> > > >> > > >> > > >>> -----Original Message----- > > >>> From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Eric > > >>> Bridgeford > > >>> Sent: Tuesday, April 2, 2019 5:01 PM > > >>> To: Bert Gunter <bgunter.4...@gmail.com> > > >>> Cc: R-help <r-help@r-project.org> > > >>> Subject: Re: [R] Fwd: Potential Issue with lm.influence > > >>> > > >>> I agree the influence documentation suggests NaNs may result; > however, as > > >>> these can be manually computed and are, indeed, finite/existing (ie, > > >>> computing the held-out influence by manually training n models for n > > >> points > > >>> to obtain n leave one out influence measures), I don't possibly see > how > > >> the > > >>> function SHOULD return NaN, and given that it is returning NaN, that > > >>> suggests to me that there should be either a) Providing an > alternative > > >>> method to compute them that (may be slower) that returns the correct > > >>> results in the even that lm.influence does not return a good > > >> approximation > > >>> (ie, a command line argument for type="approx" that does the > > >>> approximation strategy employed currently, or an alternative > > >> type="direct" > > >>> or something like that that computes them manually), or b) a > heuristic to > > >>> suggest why NaNs might result from one's particular inputs/what can > be > > >>> done to fix it (if the approximation strategy is the source of the > > >> problem) or > > >>> what the issue is with the data that will cause NaNs. Hence I was > > >> looking to > > >>> start a discussion around the specific strategy employed to compute > the > > >>> elements. > > >>> > > >>> Below is the code: > > >>> moon_data <- structure(list(Name = structure(c(8L, 13L, 2L, 7L, 1L, > 5L, > > >> 11L, > > >>> 12L, 9L, 10L, 4L, 6L, > > >> 3L), .Label = c("Ceres ", "Earth", > > >>> "Eris ", > > >>> > > >>> "Haumea ", "Jupiter ", "Makemake ", "Mars ", "Mercury ", > > >> "Neptune ", > > >>> > > >>> "Pluto ", "Saturn ", "Uranus ", "Venus "), class = "factor"), > > >>> Distance = c(0.39, 0.72, 1, 1.52, 2.75, > 5.2, > > >> 9.54, 19.22, > > >>> 30.06, 39.5, 43.35, 45.8, > > >> 67.7), Diameter = c(0.382, 0.949, > > >>> > > >>> 1, 0.532, 0.08, 11.209, 9.449, 4.007, 3.883, 0.18, 0.15, > > >>> > > >>> 0.12, 0.19), Mass = c(0.06, 0.82, 1, 0.11, 2e-04, 317.8, > > >>> > > >>> 95.2, 14.6, 17.2, 0.0022, 7e-04, > 7e-04, > > >> 0.0025), Moons = c(0L, > > >>> > > >>> > > >>> 0L, 1L, 2L, 0L, 64L, 62L, 27L, 13L, 4L, 2L, 0L, 1L), > > >> Volume = > > >>> c(0.0291869497930152, > > >>> > > >>> > > >>> > > >>> 0.447504348276571, 0.523598775598299, 0.0788376225681443, > > >>> > > >>> > > >>> > > >>> 0.000268082573106329, 737.393372232996, 441.729261571372, > > >>> > > >>> > > >>> > > >>> 33.6865588825666, 30.6549628355953, 0.00305362805928928, > > >>> > > >>> > > >>> > > >>> 0.00176714586764426, 0.00090477868423386, 0.00359136400182873 > > >>> > > >>> > > >>> )), row.names = c(NA, -13L), class = "data.frame") > > >>> > > >>> fit <- glm.nb(Moons ~ Volume, data = moon_data) > > >>> rstudent(fit) > > >>> > > >>> fit2 <- update(fit, subset = Name != "Jupiter ") > > >>> rstudent(fit2) > > >>> > > >>> influence(fit2)$sigma > > >>> > > >>> # 1 2 3 4 5 7 8 > 9 > > >>> 10 11 12 13 > > >>> # 1.077945 1.077813 1.165025 1.181685 1.077954 NaN 1.044454 > 1.152110 > > >>> 1.187586 1.181696 1.077954 1.165147 > > >>> > > >>> Sincerely, > > >>> Eric > > >>> > > >>> On Tue, Apr 2, 2019 at 4:38 PM Bert Gunter <bgunter.4...@gmail.com> > > >>> wrote: > > >>> > > >>>> Also, I suggest you read ?influence which may explain the source of > > >>>> your NaN's . > > >>>> > > >>>> Bert Gunter > > >>>> > > >>>> "The trouble with having an open mind is that people keep coming > along > > >>>> and sticking things into it." > > >>>> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip ) > > >>>> > > >>>> > > >>>> On Tue, Apr 2, 2019 at 1:29 PM Bert Gunter <bgunter.4...@gmail.com> > > >>> wrote: > > >>>> > > >>>>> I told you already: **Include code inline ** > > >>>>> > > >>>>> See ?dput for how to include a text version of objects, such as > data > > >>>>> frames, inline. > > >>>>> > > >>>>> Otherwise, I believe .txt text files are not stripped if you insist > > >>>>> on > > >>>>> *attaching* data or code. Others may have better advice. > > >>>>> > > >>>>> > > >>>>> Bert Gunter > > >>>>> > > >>>>> "The trouble with having an open mind is that people keep coming > > >>>>> along and sticking things into it." > > >>>>> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip ) > > >>>>> > > >>>>> > > >>>>> On Tue, Apr 2, 2019 at 1:21 PM Eric Bridgeford <ericw...@gmail.com > > > > >>>>> wrote: > > >>>>> > > >>>>>> How can I add attachments? The following two files were attached > in > > >>>>>> the initial message > > >>>>>> > > >>>>>> On Tue, Apr 2, 2019 at 3:34 PM Bert Gunter < > bgunter.4...@gmail.com> > > >>>>>> wrote: > > >>>>>> > > >>>>>>> Nothing was attached. The r-help server strips most attachments. > > >>>>>>> Include your code inline. > > >>>>>>> > > >>>>>>> Also note that > > >>>>>>> > > >>>>>>>> 0/0 > > >>>>>>> [1] NaN > > >>>>>>> > > >>>>>>> so maybe something like that occurs in the course of your > > >> calculations. > > >>>>>>> But that's just a guess, so feel free to disregard. > > >>>>>>> > > >>>>>>> > > >>>>>>> Bert Gunter > > >>>>>>> > > >>>>>>> "The trouble with having an open mind is that people keep coming > > >>>>>>> along and sticking things into it." > > >>>>>>> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip > ) > > >>>>>>> > > >>>>>>> > > >>>>>>> On Tue, Apr 2, 2019 at 11:32 AM Eric Bridgeford > > >>>>>>> <ericw...@gmail.com> > > >>>>>>> wrote: > > >>>>>>> > > >>>>>>>> Hi R core team, > > >>>>>>>> > > >>>>>>>> I experienced the following issue with the attached data/code > > >>>>>>>> snippet, where the studentized residual for a single observation > > >>>>>>>> appears to be NaN given finite predictors/responses, which > appears > > >>>>>>>> to be driven by the glm.influence method in the stats package. I > > >>>>>>>> am curious to whether this is a consequence of the specific > > >>>>>>>> implementation used for computing the influence, which it would > > >>>>>>>> appear is the driving force for the NaN influence for the point, > > >>>>>>>> that I was ultimately able to trace back through the > lm.influence > > >>>>>>>> method to this specific line < > > >>>>>>>> https://github.com/SurajGupta/r- > > >>> source/blob/a28e609e72ed7c47f6ddfb > > >>>>>>>> b86c85279a0750f0b7/src/library/stats/R/lm.influence.R#L67 > > >>>>>>>>> > > >>>>>>>> which > > >>>>>>>> calls C code which calls iminfl.f > > >>>>>>>> < > > >>>>>>>> > https://github.com/SurajGupta/r-source/blob/master/src/library/sta > > >>>>>>>> ts/src/lminfl.f > > >>>>>>>>> > > >>>>>>>> (I > > >>>>>>>> don't know fortran so I can't debug further). My understanding > is > > >>>>>>>> that the specific issue would have to do with the leave-one-out > > >>>>>>>> variance estimate associated with this particular point, which > it > > >>>>>>>> seems based on my understanding should be finite given finite > > >>>>>>>> predictors/responses. Let me know. Thanks! > > >>>>>>>> > > >>>>>>>> Sincerely, > > >>>>>>>> > > >>>>>>>> -- > > >>>>>>>> Eric Bridgeford > > >>>>>>>> ericwb.me > > >>>>>>>> ______________________________________________ > > >>>>>>>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, > see > > >>>>>>>> https://stat.ethz.ch/mailman/listinfo/r-help > > >>>>>>>> PLEASE do read the posting guide > > >>>>>>>> http://www.R-project.org/posting-guide.html > > >>>>>>>> and provide commented, minimal, self-contained, reproducible > code. > > >>>>>>>> > > >>>>>>> > > >>>>>> > > >>>>>> -- > > >>>>>> Eric Bridgeford > > >>>>>> ericwb.me > > >>>>>> > > >>>>> > > >>> > > >>> -- > > >>> Eric Bridgeford > > >>> ericwb.me > > >>> > > >>> [[alternative HTML version deleted]] > > >>> > > >>> ______________________________________________ > > >>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see > > >>> https://stat.ethz.ch/mailman/listinfo/r-help > > >>> PLEASE do read the posting guide http://www.R-project.org/posting- > > >>> guide.html > > >>> and provide commented, minimal, self-contained, reproducible code. > > >> > > > > > > > > > -- > > > Eric Bridgeford > > > ericwb.me > > > > > > [[alternative HTML version deleted]] > > > > > > ______________________________________________ > > > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see > > > https://stat.ethz.ch/mailman/listinfo/r-help > > > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > > > and provide commented, minimal, self-contained, reproducible code. > > > > ______________________________________________ > > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see > > https://stat.ethz.ch/mailman/listinfo/r-help > > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > > and provide commented, minimal, self-contained, reproducible code. > > ______________________________________________ > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > [[alternative HTML version deleted]] ______________________________________________ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.