fortune nomination.
The lesson to me here is that if you fit a sufficiently unreasonable model to data, the computations may break down. On Wed, Apr 3, 2019 at 10:18 AM Fox, John <j...@mcmaster.ca> wrote: > > Dear Eric, > > I'm afraid that your argument doesn't make sense to me. As you saw when you > tried > > fit3 <- update(fit, subset = !(Name %in% c("Jupiter ", "Saturn "))) > > glm.nb() effectively wasn't able to estimate the theta parameter of the > negative binomial model. So why would it be better to base deletion > diagnostics on actually refitting the model? > > The lesson to me here is that if you fit a sufficiently unreasonable model to > data, the computations may break down. Other than drawing attention to the > NaN with an explicit warning, I don't see what more could usefully be done. > > Best, > John > > > On Apr 2, 2019, at 9:08 PM, Eric Bridgeford <ericw...@gmail.com> wrote: > > > > Hey John, > > > > I am aware they are high leverage points, and that the model is not the > > best for them. The purpose of this dataset was to explore high leverage > > points, and diagnostic statistics through which one would identify them. > > > > What I am saying is that the current behavior of the function seems a > > little non-specific to me; the influence for this problem is > > finite/computable manually by fitting n models to n-1 points (manually > > holding out each point individually to obtain the loo-variance, and > > computing the influence in the non-approximate way). > > > > I am just suggesting that it seems the function could be improved by, say, > > throwing specific warnings when NaNs may arise. Ie, "Your have points that > > are very high leverage. The approximation technique is not numerically > > stable for these points and the results should be used with caution" > > etc...; I am sure there are other also pre-hoc approaches to diagnose other > > ways in which this function could fail). The approximation technique not > > behaving well for points that are ultra high leverage just seems peculiar > > that that would return an NaN with no other recommendations/advice/specific > > warnings, especially since the influence is frequently used to diagnosing > > this specific issue. > > > > Alternatively, one could afford an optional argument type="manual" that > > computes the held-out variance manually rather than the approximate > > fashion, and add a comment to use this in the help menu when you have high > > leverage points (this is what I ended up doing to obtain the true influence > > and the externally studentized residual). > > > > I just think some more specificity could be of use for future users, to > > make the R:stats community even better :) Does that make sense? > > > > Sincerely, > > Eric > > > > On Tue, Apr 2, 2019 at 7:53 PM Fox, John <j...@mcmaster.ca> wrote: > > > >> Dear Eric, > >> > >> Have you looked at your data? -- for example: > >> > >> plot(log(Moons) ~ Volume, data = moon_data) > >> text(log(Moons) ~ Volume, data = moon_data, labels=Name, adj=1, > >> subset = Volume > 400) > >> > >> The negative-binomial model doesn't look reasonable, does it? > >> > >> After you eliminate Jupiter there's one very high leverage point left, > >> Saturn. Computing studentized residuals entails an approximation to > >> deleting that as well from the model, so try fitting > >> > >> fit3 <- update(fit, subset = !(Name %in% c("Jupiter ", "Saturn "))) > >> summary(fit3) > >> > >> which runs into numeric difficulties. > >> > >> Then look at: > >> > >> plot(log(Moons) ~ Volume, data = moon_data, subset = Volume < 400) > >> > >> Finally, try > >> > >> plot(log(Moons) ~ log(Volume), data = moon_data) > >> fit4 <- update(fit2, . ~ log(Volume)) > >> rstudent(fit4) > >> > >> I hope this helps, > >> John > >> > >> ----------------------------------------------------------------- > >> John Fox > >> Professor Emeritus > >> McMaster University > >> Hamilton, Ontario, Canada > >> Web: https://socialsciences.mcmaster.ca/jfox/ > >> > >> > >> > >> > >>> -----Original Message----- > >>> From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Eric > >>> Bridgeford > >>> Sent: Tuesday, April 2, 2019 5:01 PM > >>> To: Bert Gunter <bgunter.4...@gmail.com> > >>> Cc: R-help <r-help@r-project.org> > >>> Subject: Re: [R] Fwd: Potential Issue with lm.influence > >>> > >>> I agree the influence documentation suggests NaNs may result; however, as > >>> these can be manually computed and are, indeed, finite/existing (ie, > >>> computing the held-out influence by manually training n models for n > >> points > >>> to obtain n leave one out influence measures), I don't possibly see how > >> the > >>> function SHOULD return NaN, and given that it is returning NaN, that > >>> suggests to me that there should be either a) Providing an alternative > >>> method to compute them that (may be slower) that returns the correct > >>> results in the even that lm.influence does not return a good > >> approximation > >>> (ie, a command line argument for type="approx" that does the > >>> approximation strategy employed currently, or an alternative > >> type="direct" > >>> or something like that that computes them manually), or b) a heuristic to > >>> suggest why NaNs might result from one's particular inputs/what can be > >>> done to fix it (if the approximation strategy is the source of the > >> problem) or > >>> what the issue is with the data that will cause NaNs. Hence I was > >> looking to > >>> start a discussion around the specific strategy employed to compute the > >>> elements. > >>> > >>> Below is the code: > >>> moon_data <- structure(list(Name = structure(c(8L, 13L, 2L, 7L, 1L, 5L, > >> 11L, > >>> 12L, 9L, 10L, 4L, 6L, > >> 3L), .Label = c("Ceres ", "Earth", > >>> "Eris ", > >>> > >>> "Haumea ", "Jupiter ", "Makemake ", "Mars ", "Mercury ", > >> "Neptune ", > >>> > >>> "Pluto ", "Saturn ", "Uranus ", "Venus "), class = "factor"), > >>> Distance = c(0.39, 0.72, 1, 1.52, 2.75, 5.2, > >> 9.54, 19.22, > >>> 30.06, 39.5, 43.35, 45.8, > >> 67.7), Diameter = c(0.382, 0.949, > >>> > >>> 1, 0.532, 0.08, 11.209, 9.449, 4.007, 3.883, 0.18, 0.15, > >>> > >>> 0.12, 0.19), Mass = c(0.06, 0.82, 1, 0.11, 2e-04, 317.8, > >>> > >>> 95.2, 14.6, 17.2, 0.0022, 7e-04, 7e-04, > >> 0.0025), Moons = c(0L, > >>> > >>> > >>> 0L, 1L, 2L, 0L, 64L, 62L, 27L, 13L, 4L, 2L, 0L, 1L), > >> Volume = > >>> c(0.0291869497930152, > >>> > >>> > >>> > >>> 0.447504348276571, 0.523598775598299, 0.0788376225681443, > >>> > >>> > >>> > >>> 0.000268082573106329, 737.393372232996, 441.729261571372, > >>> > >>> > >>> > >>> 33.6865588825666, 30.6549628355953, 0.00305362805928928, > >>> > >>> > >>> > >>> 0.00176714586764426, 0.00090477868423386, 0.00359136400182873 > >>> > >>> > >>> )), row.names = c(NA, -13L), class = "data.frame") > >>> > >>> fit <- glm.nb(Moons ~ Volume, data = moon_data) > >>> rstudent(fit) > >>> > >>> fit2 <- update(fit, subset = Name != "Jupiter ") > >>> rstudent(fit2) > >>> > >>> influence(fit2)$sigma > >>> > >>> # 1 2 3 4 5 7 8 9 > >>> 10 11 12 13 > >>> # 1.077945 1.077813 1.165025 1.181685 1.077954 NaN 1.044454 1.152110 > >>> 1.187586 1.181696 1.077954 1.165147 > >>> > >>> Sincerely, > >>> Eric > >>> > >>> On Tue, Apr 2, 2019 at 4:38 PM Bert Gunter <bgunter.4...@gmail.com> > >>> wrote: > >>> > >>>> Also, I suggest you read ?influence which may explain the source of > >>>> your NaN's . > >>>> > >>>> Bert Gunter > >>>> > >>>> "The trouble with having an open mind is that people keep coming along > >>>> and sticking things into it." > >>>> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip ) > >>>> > >>>> > >>>> On Tue, Apr 2, 2019 at 1:29 PM Bert Gunter <bgunter.4...@gmail.com> > >>> wrote: > >>>> > >>>>> I told you already: **Include code inline ** > >>>>> > >>>>> See ?dput for how to include a text version of objects, such as data > >>>>> frames, inline. > >>>>> > >>>>> Otherwise, I believe .txt text files are not stripped if you insist > >>>>> on > >>>>> *attaching* data or code. Others may have better advice. > >>>>> > >>>>> > >>>>> Bert Gunter > >>>>> > >>>>> "The trouble with having an open mind is that people keep coming > >>>>> along and sticking things into it." > >>>>> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip ) > >>>>> > >>>>> > >>>>> On Tue, Apr 2, 2019 at 1:21 PM Eric Bridgeford <ericw...@gmail.com> > >>>>> wrote: > >>>>> > >>>>>> How can I add attachments? The following two files were attached in > >>>>>> the initial message > >>>>>> > >>>>>> On Tue, Apr 2, 2019 at 3:34 PM Bert Gunter <bgunter.4...@gmail.com> > >>>>>> wrote: > >>>>>> > >>>>>>> Nothing was attached. The r-help server strips most attachments. > >>>>>>> Include your code inline. > >>>>>>> > >>>>>>> Also note that > >>>>>>> > >>>>>>>> 0/0 > >>>>>>> [1] NaN > >>>>>>> > >>>>>>> so maybe something like that occurs in the course of your > >> calculations. > >>>>>>> But that's just a guess, so feel free to disregard. > >>>>>>> > >>>>>>> > >>>>>>> Bert Gunter > >>>>>>> > >>>>>>> "The trouble with having an open mind is that people keep coming > >>>>>>> along and sticking things into it." > >>>>>>> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip ) > >>>>>>> > >>>>>>> > >>>>>>> On Tue, Apr 2, 2019 at 11:32 AM Eric Bridgeford > >>>>>>> <ericw...@gmail.com> > >>>>>>> wrote: > >>>>>>> > >>>>>>>> Hi R core team, > >>>>>>>> > >>>>>>>> I experienced the following issue with the attached data/code > >>>>>>>> snippet, where the studentized residual for a single observation > >>>>>>>> appears to be NaN given finite predictors/responses, which appears > >>>>>>>> to be driven by the glm.influence method in the stats package. I > >>>>>>>> am curious to whether this is a consequence of the specific > >>>>>>>> implementation used for computing the influence, which it would > >>>>>>>> appear is the driving force for the NaN influence for the point, > >>>>>>>> that I was ultimately able to trace back through the lm.influence > >>>>>>>> method to this specific line < > >>>>>>>> https://github.com/SurajGupta/r- > >>> source/blob/a28e609e72ed7c47f6ddfb > >>>>>>>> b86c85279a0750f0b7/src/library/stats/R/lm.influence.R#L67 > >>>>>>>>> > >>>>>>>> which > >>>>>>>> calls C code which calls iminfl.f > >>>>>>>> < > >>>>>>>> https://github.com/SurajGupta/r-source/blob/master/src/library/sta > >>>>>>>> ts/src/lminfl.f > >>>>>>>>> > >>>>>>>> (I > >>>>>>>> don't know fortran so I can't debug further). My understanding is > >>>>>>>> that the specific issue would have to do with the leave-one-out > >>>>>>>> variance estimate associated with this particular point, which it > >>>>>>>> seems based on my understanding should be finite given finite > >>>>>>>> predictors/responses. Let me know. Thanks! > >>>>>>>> > >>>>>>>> Sincerely, > >>>>>>>> > >>>>>>>> -- > >>>>>>>> Eric Bridgeford > >>>>>>>> ericwb.me > >>>>>>>> ______________________________________________ > >>>>>>>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see > >>>>>>>> https://stat.ethz.ch/mailman/listinfo/r-help > >>>>>>>> PLEASE do read the posting guide > >>>>>>>> http://www.R-project.org/posting-guide.html > >>>>>>>> and provide commented, minimal, self-contained, reproducible code. > >>>>>>>> > >>>>>>> > >>>>>> > >>>>>> -- > >>>>>> Eric Bridgeford > >>>>>> ericwb.me > >>>>>> > >>>>> > >>> > >>> -- > >>> Eric Bridgeford > >>> ericwb.me > >>> > >>> [[alternative HTML version deleted]] > >>> > >>> ______________________________________________ > >>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see > >>> https://stat.ethz.ch/mailman/listinfo/r-help > >>> PLEASE do read the posting guide http://www.R-project.org/posting- > >>> guide.html > >>> and provide commented, minimal, self-contained, reproducible code. > >> > > > > > > -- > > Eric Bridgeford > > ericwb.me > > > > [[alternative HTML version deleted]] > > > > ______________________________________________ > > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see > > https://stat.ethz.ch/mailman/listinfo/r-help > > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > > and provide commented, minimal, self-contained, reproducible code. > > ______________________________________________ > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. ______________________________________________ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.