Yes, also notice that > predict(fit3, new=moon_data, type="resp") 1 2 3 4 5 6 1.060694e+00 1.102008e+00 1.109695e+00 1.065515e+00 1.057896e+00 1.892312e+29 7 8 9 10 11 12 3.531271e+17 2.295015e+01 1.739889e+01 1.058165e+00 1.058041e+00 1.057957e+00 13 1.058217e+00
so the model of fit3 predicts that Jupiter and Saturn should have several bazillions of moons each! -pd > On 3 Apr 2019, at 01:53 , Fox, John <j...@mcmaster.ca> wrote: > > Dear Eric, > > Have you looked at your data? -- for example: > > plot(log(Moons) ~ Volume, data = moon_data) > text(log(Moons) ~ Volume, data = moon_data, labels=Name, adj=1, subset > = Volume > 400) > > The negative-binomial model doesn't look reasonable, does it? > > After you eliminate Jupiter there's one very high leverage point left, > Saturn. Computing studentized residuals entails an approximation to deleting > that as well from the model, so try fitting > > fit3 <- update(fit, subset = !(Name %in% c("Jupiter ", "Saturn "))) > summary(fit3) > > which runs into numeric difficulties. > > Then look at: > > plot(log(Moons) ~ Volume, data = moon_data, subset = Volume < 400) > > Finally, try > > plot(log(Moons) ~ log(Volume), data = moon_data) > fit4 <- update(fit2, . ~ log(Volume)) > rstudent(fit4) > > I hope this helps, > John > > ----------------------------------------------------------------- > John Fox > Professor Emeritus > McMaster University > Hamilton, Ontario, Canada > Web: https://socialsciences.mcmaster.ca/jfox/ > > > > >> -----Original Message----- >> From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Eric >> Bridgeford >> Sent: Tuesday, April 2, 2019 5:01 PM >> To: Bert Gunter <bgunter.4...@gmail.com> >> Cc: R-help <r-help@r-project.org> >> Subject: Re: [R] Fwd: Potential Issue with lm.influence >> >> I agree the influence documentation suggests NaNs may result; however, as >> these can be manually computed and are, indeed, finite/existing (ie, >> computing the held-out influence by manually training n models for n points >> to obtain n leave one out influence measures), I don't possibly see how the >> function SHOULD return NaN, and given that it is returning NaN, that >> suggests to me that there should be either a) Providing an alternative >> method to compute them that (may be slower) that returns the correct >> results in the even that lm.influence does not return a good approximation >> (ie, a command line argument for type="approx" that does the >> approximation strategy employed currently, or an alternative type="direct" >> or something like that that computes them manually), or b) a heuristic to >> suggest why NaNs might result from one's particular inputs/what can be >> done to fix it (if the approximation strategy is the source of the problem) >> or >> what the issue is with the data that will cause NaNs. Hence I was looking to >> start a discussion around the specific strategy employed to compute the >> elements. >> >> Below is the code: >> moon_data <- structure(list(Name = structure(c(8L, 13L, 2L, 7L, 1L, 5L, 11L, >> 12L, 9L, 10L, 4L, 6L, 3L), >> .Label = c("Ceres ", "Earth", >> "Eris ", >> >> "Haumea ", "Jupiter ", "Makemake ", "Mars ", "Mercury ", "Neptune ", >> >> "Pluto ", "Saturn ", "Uranus ", "Venus "), class = "factor"), >> Distance = c(0.39, 0.72, 1, 1.52, 2.75, 5.2, >> 9.54, 19.22, >> 30.06, 39.5, 43.35, 45.8, 67.7), >> Diameter = c(0.382, 0.949, >> >> 1, 0.532, 0.08, 11.209, 9.449, 4.007, 3.883, 0.18, 0.15, >> >> 0.12, 0.19), Mass = c(0.06, 0.82, 1, 0.11, 2e-04, 317.8, >> >> 95.2, 14.6, 17.2, 0.0022, 7e-04, 7e-04, >> 0.0025), Moons = c(0L, >> >> >> 0L, 1L, 2L, 0L, 64L, 62L, 27L, 13L, 4L, 2L, 0L, 1L), Volume = >> c(0.0291869497930152, >> >> >> >> 0.447504348276571, 0.523598775598299, 0.0788376225681443, >> >> >> >> 0.000268082573106329, 737.393372232996, 441.729261571372, >> >> >> >> 33.6865588825666, 30.6549628355953, 0.00305362805928928, >> >> >> >> 0.00176714586764426, 0.00090477868423386, 0.00359136400182873 >> >> >> )), row.names = c(NA, -13L), class = "data.frame") >> >> fit <- glm.nb(Moons ~ Volume, data = moon_data) >> rstudent(fit) >> >> fit2 <- update(fit, subset = Name != "Jupiter ") >> rstudent(fit2) >> >> influence(fit2)$sigma >> >> # 1 2 3 4 5 7 8 9 >> 10 11 12 13 >> # 1.077945 1.077813 1.165025 1.181685 1.077954 NaN 1.044454 1.152110 >> 1.187586 1.181696 1.077954 1.165147 >> >> Sincerely, >> Eric >> >> On Tue, Apr 2, 2019 at 4:38 PM Bert Gunter <bgunter.4...@gmail.com> >> wrote: >> >>> Also, I suggest you read ?influence which may explain the source of >>> your NaN's . >>> >>> Bert Gunter >>> >>> "The trouble with having an open mind is that people keep coming along >>> and sticking things into it." >>> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip ) >>> >>> >>> On Tue, Apr 2, 2019 at 1:29 PM Bert Gunter <bgunter.4...@gmail.com> >> wrote: >>> >>>> I told you already: **Include code inline ** >>>> >>>> See ?dput for how to include a text version of objects, such as data >>>> frames, inline. >>>> >>>> Otherwise, I believe .txt text files are not stripped if you insist >>>> on >>>> *attaching* data or code. Others may have better advice. >>>> >>>> >>>> Bert Gunter >>>> >>>> "The trouble with having an open mind is that people keep coming >>>> along and sticking things into it." >>>> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip ) >>>> >>>> >>>> On Tue, Apr 2, 2019 at 1:21 PM Eric Bridgeford <ericw...@gmail.com> >>>> wrote: >>>> >>>>> How can I add attachments? The following two files were attached in >>>>> the initial message >>>>> >>>>> On Tue, Apr 2, 2019 at 3:34 PM Bert Gunter <bgunter.4...@gmail.com> >>>>> wrote: >>>>> >>>>>> Nothing was attached. The r-help server strips most attachments. >>>>>> Include your code inline. >>>>>> >>>>>> Also note that >>>>>> >>>>>>> 0/0 >>>>>> [1] NaN >>>>>> >>>>>> so maybe something like that occurs in the course of your calculations. >>>>>> But that's just a guess, so feel free to disregard. >>>>>> >>>>>> >>>>>> Bert Gunter >>>>>> >>>>>> "The trouble with having an open mind is that people keep coming >>>>>> along and sticking things into it." >>>>>> -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip ) >>>>>> >>>>>> >>>>>> On Tue, Apr 2, 2019 at 11:32 AM Eric Bridgeford >>>>>> <ericw...@gmail.com> >>>>>> wrote: >>>>>> >>>>>>> Hi R core team, >>>>>>> >>>>>>> I experienced the following issue with the attached data/code >>>>>>> snippet, where the studentized residual for a single observation >>>>>>> appears to be NaN given finite predictors/responses, which appears >>>>>>> to be driven by the glm.influence method in the stats package. I >>>>>>> am curious to whether this is a consequence of the specific >>>>>>> implementation used for computing the influence, which it would >>>>>>> appear is the driving force for the NaN influence for the point, >>>>>>> that I was ultimately able to trace back through the lm.influence >>>>>>> method to this specific line < >>>>>>> https://github.com/SurajGupta/r- >> source/blob/a28e609e72ed7c47f6ddfb >>>>>>> b86c85279a0750f0b7/src/library/stats/R/lm.influence.R#L67 >>>>>>>> >>>>>>> which >>>>>>> calls C code which calls iminfl.f >>>>>>> < >>>>>>> https://github.com/SurajGupta/r-source/blob/master/src/library/sta >>>>>>> ts/src/lminfl.f >>>>>>>> >>>>>>> (I >>>>>>> don't know fortran so I can't debug further). My understanding is >>>>>>> that the specific issue would have to do with the leave-one-out >>>>>>> variance estimate associated with this particular point, which it >>>>>>> seems based on my understanding should be finite given finite >>>>>>> predictors/responses. Let me know. Thanks! >>>>>>> >>>>>>> Sincerely, >>>>>>> >>>>>>> -- >>>>>>> Eric Bridgeford >>>>>>> ericwb.me >>>>>>> ______________________________________________ >>>>>>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see >>>>>>> https://stat.ethz.ch/mailman/listinfo/r-help >>>>>>> PLEASE do read the posting guide >>>>>>> http://www.R-project.org/posting-guide.html >>>>>>> and provide commented, minimal, self-contained, reproducible code. >>>>>>> >>>>>> >>>>> >>>>> -- >>>>> Eric Bridgeford >>>>> ericwb.me >>>>> >>>> >> >> -- >> Eric Bridgeford >> ericwb.me >> >> [[alternative HTML version deleted]] >> >> ______________________________________________ >> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide http://www.R-project.org/posting- >> guide.html >> and provide commented, minimal, self-contained, reproducible code. > > ______________________________________________ > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. -- Peter Dalgaard, Professor, Center for Statistics, Copenhagen Business School Solbjerg Plads 3, 2000 Frederiksberg, Denmark Phone: (+45)38153501 Office: A 4.23 Email: pd....@cbs.dk Priv: pda...@gmail.com ______________________________________________ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.