Re: [Rd] Help useRs to use R's own Time/Date objects more efficiently

2020-04-05 Thread Mark Leeds
Hi All: I've been following this thread and just want to add one pointer.

For those who aren't interested in using new packages that try to make
dates-times easier but also find the base R tools confusing, below is a link
to an extremely well written document from over 15 years ago. It's probably
already known by quite a few people, but for people with less than 10 years
of R experience it could very well be unknown. It's the clearest explanation
of the base R tools for dates and times (note: one needs to consider chron a
base package, but who's counting) that I know of.

https://www.researchgate.net/publication/229087103_R_Help_Desk_Date_and_time_classes_in_R
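
For anyone who wants to try the base tools directly while reading it, here is a
minimal sketch of the three tasks discussed downthread (parsing character data,
formatting, and extracting pieces as text or as integers); the example values
are made up:

x <- c("2020-04-05 14:30:00", "2020-12-31 23:59:59")

# (1) parse character data into date-time objects
dt  <- as.POSIXct(x, tz = "UTC")          # compact numeric representation
dtl <- as.POSIXlt(x, tz = "UTC")          # list-like representation with named fields
d   <- as.Date(dt)

# (2) format back to character
format(dt, "%d %B %Y, %H:%M")

# (3) extract pieces, as text or as integers
weekdays(d); months(d)                    # text
dtl$mon + 1; dtl$mday; dtl$year + 1900    # integers (note the POSIXlt offsets)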

On Sun, Apr 5, 2020 at 4:58 PM Abby Spurdle  wrote:

> I think POSIXct and POSIXlt are badly-chosen names.
> The name "POSIX" implies UNIX.
> (i.e. XYZix operating system is mostly POSIX compliant... Woo-Hoo!).
> My assumption is that most people modelling industrial/econometric
> data etc, or data imported from databases, don't want system
> references everywhere.
>
> Historically, I've used the principle that:
> If programming language A uses functionality from programming language
> B, then bindings should be as close as possible to whatever is in
> programming language B. Any additional functionality in programming
> language A, should be distinct from the bindings.
> R hasn't done this here, where POSIX-bindings have added in additional
> R functionality and semantics.
> Possibly introducing problems at an early stage.
>
> The help file entitled DateTimeClasses, only covers a small subset of
> information on date and time classes, with no obvious information
> about how to construct date and time objects, except for what's in the
> examples. The Date class has a similar problem, omitting information
> about how to construct Date objects.
>
> The "convenience extraction functions" aren't necessarily convenient
> because they return text rather than integers, requiring many users to
> still use the POSIXlt class.
>
> I don't think your example is simple.
> And I suspect it may discourage some people from using base packages.
> Having the opposite effect to what's intended.
>
> It's probably too late to change the functions, but here's what I would
> suggest:
>
> (1) Create a top-level help page with a title like "Date and Time
> Classes" to give a brief but general overview. This would mean the
> existing DateTimeClasses would need a new title.
> (2) Create another function the same as as.POSIXlt, but with a more
> user-friendly name, which would increase readability.
> (3) If help files for describing classes are separate from the help
> files for creating/coercing objects (e.g. Date vs as.Date), then I
> think they should cross reference each other in the description field,
> not just the details or seealso fields.
> (4) Reference relevant extraction/formatting functions, in most
> date/time help files, even if there's some (small) duplication in the
> help files.
> (5) Focus on keeping the examples simple rather than comprehensive.
>
> Expanding on suggestion (4), if you read the help file for as.Date
> (which seems like an obvious starting point, because that's where I
> started reading...), there's no reference at all to getting the month,
> or the day of the week, etc. To make it worse it doesn't mention
> coercion to POSIXlt objects either (but does mention coercion from
> POSIXlt to Date objects). This could give the wrong impression to many
> readers...
>
> In its defense, it does link to Date, which links to weekdays, which
> links to as.POSIXlt.
>
> Of course the note and seealso fields are near the bottom, and there's
> an implicit (possibly false) assumption that the reader will read all
> the help file*s*, and follow the links at the bottom, at least three
> times over.
> And a new-ish R user is likely to have to read more than four help files.
> Unless they Google it, read stack exchange, or read some fancy
> (apparently modern) textbook on data science...
>
> Reinforcing the need for the help files to be clear about what the
> functions (collectively) can do and specifically what
> extraction/formatting functionality is available...
>
> My guess is that the most common tasks with date and time objects are:
> (1) Reading a character vector representing dates/times.
> (2) Formatting a date/time (i.e. Object to character vector, or
> character vector to another character vector).
> (3) Extracting information such as month, weekday, etc, either as an
> integer or as text.
>
> So, in short, these should be easy (to do, and to find out how to do)...
>
>
> On Sat, Apr 4, 2020 at 10:51 PM Martin Maechler
>  wrote:
> >
> > This is mostly an RFC [but *not* about the many extra packages,
> please..]:
> >
> > Noticing to my chagrin  how my students work in a project,
> > googling for R code and cut'n'pasting stuff together, accumulating
> > this and that package on the way  all just for simple daily time series
> > (though with partly missing

Re: [Rd] [External] Re: Help useRs to use R's own Time/Date objects more efficiently

2020-04-05 Thread Mark Leeds
I should have mentioned that it was in R News. My mistake. Thanks, Luke, for
the clarification.



On Sun, Apr 5, 2020 at 9:36 PM  wrote:

> On Mon, 6 Apr 2020, Abby Spurdle wrote:
>
> >> (1) Create a top-level help page with a title like "Date and Time
> >> Classes" to give a brief but general overview. This would mean the
> >> existing DateTimeClasses would need a new title.
> >
> > I wanted to modify my first suggestion.
> > Perhaps a better idea would be to reference an external document
> > giving an overview of the subject.
> > I couldn't find a discussion of POSIXct/POSIXlt objects in the R
> > manuals (unless I missed it somewhere), so perhaps "An Introduction to
> > R" could be updated to include this subject, and then the help files
> > could reference that?
> >
> > Mark Leeds has already mentioned one possible (unofficial) source.
> > And I suspect that there are others.
>
> Not entirely unofficial as it was published in R News:
>
> @article{grothendieck2004r,
>   title={R help desk: Date and time classes in R},
>   author={Grothendieck, Gabor and Petzoldt, Thomas},
>   journal={R News},
>   volume={4},
>   number={1},
>   pages={29--32},
>   year={2004}
> }
>
> Best,
>
> luke
>
> >
> > __
> > R-devel@r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-devel
> >
>
> --
> Luke Tierney
> Ralph E. Wareham Professor of Mathematical Sciences
> University of Iowa  Phone: 319-335-3386
> Department of Statistics and   Fax:   319-335-3017
> Actuarial Science
> 241 Schaeffer Hall  email:   luke-tier...@uiowa.edu
> Iowa City, IA 52242 WWW:  http://www.stat.uiowa.edu
>
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>


__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Trying to understand the search path and namespaces

2010-11-15 Thread Mark Leeds
Hi Duncan: Luke's article is in the June 2003 edition of R News.
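
Since the question below is about how a function inside the devtools namespace
can call testthat when testthat only sits on the search path, here is a minimal
sketch of the lookup chain (assuming a reasonably current devtools is installed;
devtools::test is just an example of a function living in that namespace):

library(devtools)
f <- devtools::test                  # any function defined in the devtools namespace
ns <- environment(f)                 # <environment: namespace:devtools>
imp <- parent.env(ns)                # the "imports:devtools" environment
environmentName(parent.env(imp))     # "base": the base namespace comes next
# After the base namespace, lookup continues with the global environment and then
# the whole search path, which is where an attached package like testthat is found.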

On Mon, Nov 15, 2010 at 8:43 PM, Duncan Murdoch wrote:

> Hadley Wickham wrote:
>
>> Hi all,
>>
>> I'm trying to understand how the search path and namespaces interact.
>> For example, take the devtools package which suggests the testthat
>> package.  Here's what the search path looks like after I load each of
>> those packages:
>>
>
> Luke Tierney wrote up a nice description of this a few years ago.  It's
> either on developer.r-project.org, or in an old issue of R News.
>
> Duncan Murdoch
>
>
>>  library(devtools)
>>> search()
>>>
>>  [1] ".GlobalEnv""package:devtools"  "package:stats"
>>  [4] "package:graphics"  "package:grDevices" "package:utils"
>>  [7] "package:datasets"  "package:methods"   "Autoloads"
>> [10] "package:base"
>>
>>> library(testthat)
>>> search()
>>>
>>  [1] ".GlobalEnv""package:testthat"  "package:devtools"
>>  [4] "package:stats" "package:graphics"  "package:grDevices"
>>  [7] "package:utils" "package:datasets"  "package:methods"
>> [10] "Autoloads" "package:base"
>>
>> My question is this: when I execute the test function in the devtools
>> package, it calls the test_package function in the testthat
>> package - but that function is located higher up the search path - how
>> does R find it?
>>
>> (I ask this question because I'm trying to simulate package loading
>> from within R to simplify the development cycle, but something is
>> missing in my knowledge of namespaces, and so the devel
>> versions of my packages can't access packages that are loaded after
>> they are)
>>
>> Hadley
>>
>>
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>


__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] optim(…, method=‘L-BFGS-B’) stops with an error message while violating the lower bound

2016-10-08 Thread Mark Leeds
Hi Spencer: See the link below about L-BFGS-B, because I had problems
with it a good while back (and I think the issue it describes is the cause,
but I can't prove it), so eventually I moved to the Rvmmin(b) package. It's
a package but really an algorithm. Rvmmin(b) uses a variable-metric
algorithm similar to that of L-BFGS-B but without the problem below. It's
not surprisingly a creation of John Nash and quite impressive based on my
experience. Just as L-BFGS gains box constraints from the -B suffix, Rvmmin
implements them by adding the b.

http://users.eecs.northwestern.edu/~morales/PSfiles/acm-remark.pdf
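
The Ecdat vignette itself isn't reproduced here, but for anyone who wants to see
the calling pattern under discussion, here is a minimal sketch on a toy bounded
problem (the objective is an assumption for illustration, not the model in the
vignette):

fn <- function(p) sum((p - c(2, -3))^2)   # unconstrained minimum at (2, -3)
gr <- function(p) 2 * (p - c(2, -3))
res <- optim(par = c(1, 1), fn = fn, gr = gr,
             method = "L-BFGS-B", lower = c(0, 0), upper = c(5, 5))
res$par           # should respect lower = c(0, 0): roughly c(2, 0) here
res$convergence   # 0 means optim reports normal convergence
res$message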

On Sat, Oct 8, 2016 at 2:50 PM, Spencer Graves 
wrote:

> Hello:
>
>
>   The development version of Ecdat on R-Forge contains a vignette in
> which optim(…, method=‘L-BFGS-B’) stops with an error message while
> violating the lower bound.
>
>
>   To see all the details, try the following:
>
>
> install.packages("Ecdat", repos="http://R-Forge.R-project.org";)
>
>
>   Then do "help(pac=Ecdat)" -> "User guides, package vignettes and
> other documentation" -> "Ecdat::AverageIncomeModels".
>
>
>   I've found other optimizers that will get around the problem in this
> case but none that performs as well as optim with many other problems.
>
>
>   Thanks,
>   Spencer Graves
>
>
> p.s.  I've also tested bobyqa{minqa} or nloptr{nloptr}, recommended in a
> vignette in the lme4 package.  These did better than optim in this example
> but worse in others I tried.
>
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel


__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] optim(…, method=‘L-BFGS-B’) stops with an error message while violating the lower bound

2016-10-08 Thread Mark Leeds
Hi Spencer:

1) I can't help much as far as your results go, but one thing you could do is
check what the convergence flag of Rvmmin is. There are different ones
depending on what happened during the optimization, and they're pretty helpful
IIRC. But that may require running Rvmmin directly rather than through optimx.
I did a lot of optimizations (hundreds) at one point and I never saw it
return NA ??

2) I wasn't sure about 2) either, so I figured it was safer to get away from
L-BFGS-B.
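
As a concrete version of point 1), here is a hedged sketch of running Rvmmin
directly (rather than through optimx) and reading its termination code. The
$convergence and $message fields follow the usual optim-style return value, but
that's an assumption on my part, so check ?Rvmmin for the exact codes:

# install.packages("Rvmmin")          # John Nash's variable-metric optimizer
library(Rvmmin)
fn <- function(p) (p[1] - 1)^2 + 10 * (p[2] + 2)^2
gr <- function(p) c(2 * (p[1] - 1), 20 * (p[2] + 2))
res <- Rvmmin(par = c(0, 0), fn = fn, gr = gr)
res$convergence   # 0 is a clean termination; other codes flag why it stopped
res$message       # text description of the stopping rule that fired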

On Sat, Oct 8, 2016 at 7:03 PM, Spencer Graves 
wrote:

> Hi, Mark et al.:
>
>
>   Thanks, Mark.
>
>
>   Three comments:
>
>
> 1.  Rvmmin was one of the methods I tried after Ravi directed
> me to optimx.  It returned NAs for essentially everything.  See my email of
> this subject stamped 4:43 PM Central time = 21:43 UTC.
>
>
> 2.  It would be interesting to know if the current algorithm
> behind optim and optimx with method='L-BFGS-B' incorporates Morales and
> Nocedal (2011) 'Remark on “Algorithm 778: L-BFGS-B: Fortran Subroutines for
> Large-Scale Bound Constrained Optimization”'.  I created this vignette and
> started this thread hoping that someone on the R Core team might decide
> it's worth checking things like that.
>
>
> 3.  The vignette mentioned below was extracted from a larger
> vignette fitting several models that seem to encounter convergence
> problems.  I should probably switch to optimx using all the methods that
> offers for constrained optimization, including nlminb.
>
>
>   Best Wishes,
>   Spencer Graves
>
>
>
> On 10/8/2016 5:00 PM, Mark Leeds wrote:
>
> Hi Spencer: See the link below about L-BFGS-B, because I had problems
> with it a good while back (and I think the issue it describes is the cause,
> but I can't prove it), so eventually I moved to the Rvmmin(b) package.
> It's a package but really an algorithm. Rvmmin(b) uses a variable-metric
> algorithm similar to that of L-BFGS-B but without the problem below. It's
> not surprisingly a creation of John Nash and quite impressive based on my
> experience. Just as L-BFGS gains box constraints from the -B suffix, Rvmmin
> implements them by adding the b.
>
> http://users.eecs.northwestern.edu/~morales/PSfiles/acm-remark.pdf
>
>
>
>
>
>
>
> On Sat, Oct 8, 2016 at 2:50 PM, Spencer Graves <
> spencer.gra...@prodsyse.com> wrote:
>
>> Hello:
>>
>>
>>   The development version of Ecdat on R-Forge contains a vignette in
>> which optim(…, method=‘L-BFGS-B’) stops with an error message while
>> violating the lower bound.
>>
>>
>>   To see all the details, try the following:
>>
>>
>> install.packages("Ecdat", repos="http://R-Forge.R-project.org")
>>
>>
>>   Then do "help(pac=Ecdat)" -> "User guides, package vignettes and
>> other documentation" -> "Ecdat::AverageIncomeModels".
>>
>>
>>   I've found other optimizers that will get around the problem in
>> this case but none that performs as well as optim with many other problems.
>>
>>
>>   Thanks,
>>   Spencer Graves
>>
>>
>> p.s.  I've also tested bobyqa{minqa} or nloptr{nloptr}, recommended in a
>> vignette in the lme4 package.  These did better than optim in this example
>> but worse in others I tried.
>>
>> __
>> R-devel@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel
>
>
>
>


__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] Wrongly converging glm()

2017-07-20 Thread Mark Leeds
Hi Harm-Jan. I've been following this thread to some degree and just want
to add that this issue is not specific to the GLM. It's a problem with
optimization of functions in general. I was using Rvmmin with constraints,
which is an extremely solid optimization package written by John Nash (it
uses a modified BFGS algorithm), and it took me two years to realize that,
although my optimization generally converged, there was an identifiability
issue with my model that basically meant that the results meant nothing. I
only eventually found this out because, in the econometrics literature, the
type of economic model I was estimating (rational expectations) is known to
have an identifiability issue. I guess if I were an economics expert, I
might have known this, but, in general, I think what you are asking
optimization code to do is EXTREMELY DIFFICULT.

John Nash can say more because he's THE optimization masteR, but it's much
harder to write optimization algorithms with convergence rules that are
able to identify when mathematical convergence (norm near zero, say) is not
necessarily model convergence. That I can tell you from experience!!!
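
To make this concrete, here is a minimal sketch (toy, perfectly separated data,
not Harm-Jan's dataset) of the behaviour being discussed: glm() reports that it
converged, yet one coefficient is essentially arbitrary because the likelihood
keeps improving as that coefficient grows without bound:

set.seed(1)
x <- c(rnorm(25, mean = -3), rnorm(25, mean = 3))
y <- rep(0:1, each = 25)               # outcome perfectly separated by x
fit <- glm(y ~ x, family = binomial)   # warns that fitted probabilities of 0 or 1 occurred
fit$converged                          # typically TRUE: the deviance-based criterion was met
coef(fit)                              # but the slope is huge and not really meaningful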

On Thu, Jul 20, 2017 at 2:32 PM, Harm-Jan Westra  wrote:

> My apologies if I seemed to ‘blame R’. This was in no way my intention. I
> get the feeling that you’re missing my point as well.
>
> I observed something that I thought was confusing, when comparing two more
> or less identical methods (when validating the C code), and wanted to make
> a suggestion as to how to help future R users. Note that I already
> acknowledged that my data was bad. Note that I also mention that the way R
> determines convergence is a valid approach.
>
> What strikes me as odd is that R would warn you when your data is faulty
> for a function such as cor(), but not for glm(). I don’t see why you
> wouldn’t want to check both convergence criteria if you know multiple of
> such criteria exist. It would make the software more user friendly in the
> end.
>
> It may be true that there are millions of edge cases causing issues with
> glm(), as you say, but here I am presenting an edge case that can be easily
> detected, by checking whether the difference in beta estimates between the
> current and previous iteration is bigger than a certain epsilon value.
>
> I agree ‘that everybody using R should first do the effort of learning
> what they're doing’, but it is a bit of a non-argument, because we all know
> that, the world just doesn’t work that way, plus this is one of the
> arguments that has held for example the Linux community back for quite a
> while (i.e. let’s not make the software more user friendly because the user
> should be more knowledgeable).
>
> Harm-Jan
>
>
> From: Joris Meys
> Sent: Thursday, July 20, 2017 13:16
> To: Harm-Jan Westra
> Cc: r-devel@r-project.org
> Subject: Re: [Rd] Wrongly converging glm()
>
>
>
> On Thu, Jul 20, 2017 at 6:21 PM, Harm-Jan Westra <
> westra.harm...@outlook.com> wrote:
> Dear Joris,
>
>
> I agree that such a covariate should not be used in the analysis, and
> fully agree with your assessment. However, your response assumes that
> everybody who uses R knows what they're doing, which is a dangerous
> assumption to make. I bet there are a lot of people who blindly trust the
> output from R, even when there's clearly something wrong with the estimates.
>
> You missed my point then. I don't assume that everybody who uses R knows
> what they're doing. Actually, I know for a fact quite a few people using R
> have absolutely no clue about what they are doing. My point is that
> everybody using R should first do the effort of learning what they're
> doing. And if they don't, one shouldn't blame R. There's a million
> different cases where both algorithms would converge and the resulting
> estimates are totally meaningless regardless. R cannot be held responsible
> for that.
>
>
>
> In terms of your conclusion that the C++ estimate corresponds to a value
> within the R estimated confidence interval: if I allow the C++ code to run
> for 1000 iterations, its estimate would be around -1000. It simply never
> converges.
>
> I didn't test that far, and you're right in the sense that -100 is indeed
> not the final estimate. After looking at the C code, it appears as if the
> author of that code combines a Newton-Raphson approach with a different
> convergence rule. And then it's quite understandable it doesn't converge.
> You can wildly vary that estimate, the effect it has on the jacobian, log
> likelihood or deviance will be insignificant. So the model won't improve,
> it would just move all over the parameter space.
>
>
>
> I think there's nothing wrong with letting the user know there might be
> something wrong with one of the estimates, especially if your code can
> easily figure it out for you, by addi

Re: [Rd] Wrongly converging glm()

2017-07-21 Thread Mark Leeds
Hi Ravi: Well said. In John's Rvmmin package, he has codes for explaining
the cause of the termination. The codes returned were fine. The problem was
that the model I was using could have multiple solutions (regardless of the
data sent in), so, even though the stopping criterion was reached, it turned
out that one of the parameters (there were two parameters) could have really
been anything and the same likelihood value would be returned. So, what I
learned the hard way was that termination due to a reasonable stopping
criterion DOES NOT NECESSARILY EQUAL OPTIMAL. But I lived in the dark about
this for a long time and only happened to notice it when playing around with
the likelihood by fixing the offending parameter to various values and
optimizing over the non-offending parameter. Thanks for the eloquent
explanation.


  Mark

On Fri, Jul 21, 2017 at 9:22 AM, Ravi Varadhan 
wrote:

> Please allow me to add my 3 cents.  Stopping an iterative optimization
> algorithm at an "appropriate" juncture is very tricky.  All one can say is
> that the algorithm terminated because it triggered a particular stopping
> criterion.  Good software will tell you why it stopped - i.e. the
> stopping criterion that was triggered.  It is extremely difficult to make a
> failsafe guarantee that the triggered stopping criterion is the correct one
> and that the answer obtained is trustworthy. It is up to the user to
> determine whether the answer makes sense.  In the case of maximizing a
> likelihood function, it is perfectly reasonable to stop when the algorithm
> has not made any progress in increasing the log likelihood.  In this case,
> the software should print out something like "algorithm terminated due to
> lack of improvement in log-likelihood."  Therefore, I don't see a need to
> issue any warning, but simply report the stopping criterion that was
> applied to terminate the algorithm.
>
> Best,
> Ravi
>
> -Original Message-
> From: R-devel [mailto:r-devel-boun...@r-project.org] On Behalf Of
> Therneau, Terry M., Ph.D.
> Sent: Friday, July 21, 2017 8:04 AM
> To: r-devel@r-project.org; Mark Leeds ;
> jorism...@gmail.com; westra.harm...@outlook.com
> Subject: Re: [Rd] Wrongly converging glm()
>
> I'm chiming in late since I read the news in digest form, and I won't copy
> the entire conversation to date.
>
> The issue raised comes up quite often in Cox models, so often that the
> Therneau and Grambsch book has a section on the issue (3.5, p 58).  After a
> few initial iterations the offending coefficient will increase by a
> constant at each iteration while the log-likelihood approaches an asymptote
> (essentially once the other coefficients "settle down").
>
> The coxph routine tries to detect this case and print a warning, and this
> turns out to be very hard to do accurately.  I worked hard at tuning the
> threshold(s) for the message several years ago and finally gave up; I am
> guessing that the warning misses > 5% of the cases when the issue is true,
> and that 5% of the warnings that do print are incorrect.
> (And these estimates may be too optimistic.)   Highly correlated
> predictors tend to trip
> it up, e.g., the truncated power spline basis used by the rcs function in
> Hmisc.
>
> All in all, I am not completely sure whether the message does more harm
> than good.  I'd be quite reluctant to go down the same path again with the
> glm function.
>
> Terry Therneau
>
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>


__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Wrongly converging glm()

2017-07-21 Thread Mark Leeds
Hi Ravi: Yes, IIRC that's EXACTLY what was going on in my case. Based on
the codes from Rvmmin, the objective function wasn't changing much and I
think the norm of the gradient was close to zero, so it was difficult for me
to detect the issue. I found it when I happened to notice that the objective
function seemed independent of one of the parameters.
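
For anyone who wants to see that kind of check in code, here is a minimal sketch
on a deliberately non-identified toy objective (an assumption for illustration,
not my actual model): fix the suspect parameter at several values, re-optimize
the rest, and look for a flat profile.

fn <- function(p) (p[1] * p[2] - 4)^2        # only the product p1 * p2 is identified
prof <- sapply(seq(0.5, 8, by = 0.5), function(p1_fixed) {
  optim(par = 1, fn = function(p2) fn(c(p1_fixed, p2)), method = "BFGS")$value
})
round(prof, 6)                               # essentially constant: p1 is not pinned down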

On Fri, Jul 21, 2017 at 4:36 PM, Ravi Varadhan 
wrote:

> “So, what I learned the hard way was  termination due to reasonable
> stopping  criteria DOES NOT NECESSARILY EQUAL OPTIMAL.”
>
>
>
> Yes, I agree, Mark.
>
>
>
> Let me add another observation.  In the “optimx” package, John Nash and I
> implemented a check for optimality conditions – first and second order KKT
> conditions.  This involves checking whether the gradient is sufficiently
> small and the Hessian is positive definite (for local minimum) at the final
> parameter values.  However, it can be quite time consuming to compute these
> quantities and in some problems checking KKT can take up more effort than
> finding the solution!  Furthermore, it is difficult to come up with good
> thresholds for determining “small” gradient and “positive definite”
> Hessian, since these can depend upon the scale of the objective function
> and the parameters.
>
>
>
> Ravi
>
>
>
> *From:* Mark Leeds [mailto:marklee...@gmail.com]
> *Sent:* Friday, July 21, 2017 3:09 PM
> *To:* Ravi Varadhan 
> *Cc:* Therneau, Terry M., Ph.D. ; r-devel@r-project.org;
> jorism...@gmail.com; westra.harm...@outlook.com
>
> *Subject:* Re: [Rd] Wrongly converging glm()
>
>
>
> Hi Ravi: Well said. In John's Rvmmin package, he has codes for explaining
> the cause
>
> of the termination. The codes returned were fine. The problem was that
>
> the model I was using could have multiple solutions ( regardless of the
> data
> sent in ) so, even though the stopping criteria was reached, it turned out
> that one of the parameters ( there were two parameters ) could have really
> been anything and the same likelihood value would  be returned. So, what I
> learned the hard way was  termination due to reasonable stopping  criteria
> DOES NOT NECESSARILY EQUAL OPTIMAL. But I lived in the dark about this for
> a long time and only happened to notice it when playing around with the
> likelihood by fixing the offending parameter to various values and
> optimizing over the
> non-offending parameter. Thanks for eloquent explanation.
>
>
> Mark
>
> On Fri, Jul 21, 2017 at 9:22 AM, Ravi Varadhan 
> wrote:
>
> Please allow me to add my 3 cents.  Stopping an iterative optimization
> algorithm at an "appropriate" juncture is very tricky.  All one can say is
> that the algorithm terminated because it triggered a particular stopping
> criterion.  Good software will tell you why it stopped - i.e. the
> stopping criterion that was triggered.  It is extremely difficult to make a
> failsafe guarantee that the triggered stopping criterion is the correct one
> and that the answer obtained is trustworthy. It is up to the user to
> determine whether the answer makes sense.  In the case of maximizing a
> likelihood function, it is perfectly reasonable to stop when the algorithm
> has not made any progress in increasing the log likelihood.  In this case,
> the software should print out something like "algorithm terminated due to
> lack of improvement in log-likelihood."  Therefore, I don't see a need to
> issue any warning, but simply report the stopping criterion that was
> applied to terminate the algorithm.
>
> Best,
> Ravi
>
> -Original Message-
> From: R-devel [mailto:r-devel-boun...@r-project.org] On Behalf Of
> Therneau, Terry M., Ph.D.
> Sent: Friday, July 21, 2017 8:04 AM
> To: r-devel@r-project.org; Mark Leeds ;
> jorism...@gmail.com; westra.harm...@outlook.com
> Subject: Re: [Rd] Wrongly converging glm()
>
> I'm chiming in late since I read the news in digest form, and I won't copy
> the entire conversation to date.
>
> The issue raised comes up quite often in Cox models, so often that the
> Therneau and Grambsch book has a section on the issue (3.5, p 58).  After a
> few initial iterations the offending coefficient will increase by a
> constant at each iteration while the log-likelihood approaches an asymptote
> (essentially once the other coefficients "settle down").
>
> The coxph routine tries to detect this case and print a warning, and this
> turns out to be very hard to do accurately.  I worked hard at tuning the
> threshold(s) for the message several years

Re: [Rd] The constant part of the log-likelihood in StructTS

2012-05-02 Thread Mark Leeds
Hi Ravi: As far as I know (well, really, have read), and Bert et al. can say
more, AIC does not depend on the models being nested as long as the sample
sizes used are the same when comparing. In some cases, say comparing an
MA(2) with an AR(1), you have to be careful with sample size usage, but
there is no nesting requirement for AIC at least, I'm pretty sure.

So, Jouni's worry I think should be the different likelihoods. Jouni: there
are ways of rewriting ARIMA models as StructTS-type models, which might be
easier than trying to make the likelihoods consistent across different
packages/base. StructTS is really a specific DLM as far as I understand it,
so you may be better off going to the DLM package. The DLM likelihoods still
will not necessarily be consistent with arima likelihoods, but there are
ways of transforming ARIMAs so that they can be written as DLMs, so that you
can use DLM for those also. My point is that, if you're comparing
likelihoods of different models, it's best, if possible, to use ONE
package/function so that you don't use different likelihoods by accident.
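
As a concrete version of that last point, here is a minimal sketch using the
Nile example from downthread; the ARIMA(0,1,1) is only a rough counterpart of
the local level model (its reduced form), but it is enough to show that the
reported log-likelihoods are not on the same footing:

data(Nile)
fit_sts <- StructTS(Nile, type = "level")
fit_sts$loglik       # about -367.5, as reported below
fit_arima <- arima(Nile, order = c(0, 1, 1))
logLik(fit_arima)    # includes the -0.5 * n * log(2 * pi) constant, and n differs
                     # by one because of the differencing, so the two numbers
                     # (and any AIC/BIC built on them) are not comparable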


Mark

Also, not sure why this is on R-dev ?

On Wed, May 2, 2012 at 11:19 AM, Ravi Varadhan  wrote:

> Comparing such disparate, non-nested models can be quite problematic.  I
> am not sure what AIC/BIC comparisons mean in such cases.  The issue of
> different constants should be the least of your worries.
>
> Ravi
>
> -Original Message-
> From: r-devel-boun...@r-project.org [mailto:r-devel-boun...@r-project.org]
> On Behalf Of Jouni Helske
> Sent: Tuesday, May 01, 2012 2:17 PM
> To: r-devel@r-project.org
> Subject: Re: [Rd] The constant part of the log-likelihood in StructTS
>
> Ok, it seems that R's AIC and BIC functions warn about different
> constants, so that's probably enough. The constants are not irrelevant
> though: if you compute the log-likelihood of one model using StructTS, then
> fit an alternative model using other functions such as arima(), which do
> take account of the constant term, and use those log-likelihoods for
> computing, for example, BIC, you get wrong results when checking which
> model gives the lower BIC value. Hadn't thought about it before; I have to
> be more careful in future when checking results from different packages etc.
>
> Jouni
>
>
> On Tue, May 1, 2012 at 4:51 PM, Ravi Varadhan  wrote:
>
> > This is not a problem at all.  The log likelihood function is a
> > function of the model parameters and the data, but it is defined up to
> > an additive arbitrary constant, i.e. L(\theta) and L(\theta) + k are
> > completely equivalent, for any k. This does not affect model
> > comparisons or hypothesis tests.
> >
> > Ravi
> > 
> > From: r-devel-boun...@r-project.org [r-devel-boun...@r-project.org] on
> > behalf of Jouni Helske [jounihel...@gmail.com]
> > Sent: Monday, April 30, 2012 7:37 AM
> > To: r-devel@r-project.org
> > Subject: [Rd] The constant part of the log-likelihood in StructTS
> >
> > Dear all,
> >
> > I'd like to discuss a possible bug in the function StructTS of the stats
> > package. It seems that the function returns a wrong value of the
> > log-likelihood, as the constant added to the relevant part of the
> > log-likelihood is misspecified. Here is a simple example:
> >
> > > data(Nile)
> > > fit <- StructTS(Nile, type = "level")
> > > fit$loglik
> > [1] -367.5194
> >
> > When computing the log-likelihood with other packages such as KFAS and
> > FKF, the loglikelihood value is around -645.
> >
> > For the local level model, the log-likelihood is defined by
> > -0.5*n*log(2*pi) -
> > 0.5*sum(log(F_t) + v_t^2/F_t) (see for example Durbin and
> > Koopman (2001, page 30)). But in StructTS, the likelihood is computed
> > like this:
> >
> > loglik <- -length(y) * res$value + length(y) * log(2 * pi),
> >
> > where the first part coincides with the last part of the definition,
> > but the constant part has the wrong sign and it is not multiplied by 0.5.
> > Also in case of missing observations, I think there should be
> > sum(!is.na(y)) instead of length(y) in the constant term, as the
> > likelihood is only computed for those y which are observed.
> >
> > This does not affect in estimation of model parameters, but it could
> > have effects in model comparison or some other cases.
> >
> > Is there some reason for this kind of constant, or is it just a bug?
> >
> > Best regards,
> >
> > Jouni Helske
> > PhD student in Statistics
> > University of Jyväskylä
> > Finland
> >
> 
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>


__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Recommended way to call/import functions from a Suggested package

2013-02-22 Thread Mark Leeds
Hi David: According to the description on CRAN, lattice imports grid.

I don't know if you've seen it, but www.obeautifulcode.com has a very nice
piece in its archives called "How R Searches and Finds Stuff" which is
related to your question and which I found really helpful. A lot of people
on this list (you know who they are as well as I do) know it all in their
heads, but, for me, sitting down and reading it was truly enlightening. I
think it should be required reading for anyone starting out with R.
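
For what it's worth, here is a compact sketch of the fully-qualified-reference
approach discussed below; "Bar" and "f" are the placeholder names from the
original question, and requireNamespace() is the load-without-attach idiom (it
may postdate the R version in this thread, so treat that part as an assumption):

use_bar <- function(x) {
  # load Bar's namespace without putting anything on the search path
  if (!requireNamespace("Bar", quietly = TRUE))
    stop("this function needs the suggested package 'Bar'")
  Bar::f(x)   # fully qualified call, as described in R-exts
}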

On Fri, Feb 22, 2013 at 9:50 PM, David Winsemius wrote:

>
> On Feb 22, 2013, at 6:39 PM, David Winsemius wrote:
>
> >
> > On Feb 22, 2013, at 6:13 PM, Hadley Wickham wrote:
> >
> >> Hi Davor,
> >>
> >> To the best of my knowledge, there's only one way to use functions
> >> from a suggested package: with require:
> >>
> >> if (require("suggested_package")) {
> >> function_from_suggested_package()
> >> } else {
> >> stop("suggested package not installed")
> >> }
> >>
> >> Unfortunately I don't think there's any way to use a suggested package
> >> without polluting the search path.
> >
> > I've always wondered: How does lattice manage to use grid functions
> without putting them on the search path?
>
> Maybe I am using the wrong terminology, so here is the behavior I'm
> referring to:
>
> > sessionInfo()
> snipped version and locale info
>
> attached base packages:
> [1] grDevices datasets  splines   graphics  utils stats methods
> base
>
> other attached packages:
> [1] rms_3.6-2   Hmisc_3.10-1survival_2.37-2 sos_1.3-5
> brew_1.0-6
> [6] lattice_0.20-10
>
> loaded via a namespace (and not attached):
> [1] cluster_1.14.3 grid_2.15.2
>
> Notice that lattice is loaded from my profile
>
> > require(ggplot2)
> Loading required package: ggplot2
> > sessionInfo()
>  snipped version and locale info
> attached base packages:
> [1] grDevices datasets  splines   graphics  utils stats methods
> base
>
> other attached packages:
> [1] ggplot2_0.9.3   rms_3.6-2   Hmisc_3.10-1survival_2.37-2
> sos_1.3-5
> [6] brew_1.0-6  lattice_0.20-10
>
> loaded via a namespace (and not attached):
>  [1] cluster_1.14.3 colorspace_1.2-1   dichromat_2.0-0digest_0.6.0
>  [5] grid_2.15.2gtable_0.1.2   labeling_0.1   MASS_7.3-22
>  [9] munsell_0.4plyr_1.8   proto_0.3-10
> RColorBrewer_1.0-5
> [13] reshape2_1.2.2 scales_0.2.3   stringr_0.6.2
> > ?grid.text
> No documentation for ‘grid.text’ in specified packages and libraries:
> you could try ‘??grid.text’
>
> So at least the help system cannot find a grid function.
>
>
> > ?grid::grid.text
> starting httpd help server ... done
> > grid.text
> Error: object 'grid.text' not found
>
> Neither can the R interpreter find it. But it's clearly available if you
> ask nicely:
>
> > grid::grid.text
> function (label, x = unit(0.5, "npc"), y = unit(0.5, "npc"),
> just = "centre", hjust = NULL, vjust = NULL, rot = 0, check.overlap =
> FALSE,
> default.units = "npc", name = NULL, gp = gpar(), draw = TRUE,
> vp = NULL)
> {
> tg <- textGrob(label = label, x = x, y = y, just = just,
> hjust = hjust, vjust = vjust, rot = rot, check.overlap =
> check.overlap,
> default.units = default.units, name = name, gp = gp,
> vp = vp)
> if (draw)
> grid.draw(tg)
> invisible(tg)
> }
> 
> 
>
> --
> David/
>
> >
> > --
> > David
> >>
> >> Hadley
> >>
> >> On Fri, Feb 22, 2013 at 6:26 PM, Davor Cubranic 
> wrote:
> >>> If in my package "Foo" I call a function from another package "Bar" if
> it's available, according to R-exts, this sounds like I should include
> "Suggests: Bar" in package Foo's description. But the manual is silent on
> how to treat Bar's namespace. Should I import it? If so, should this be
> conditional or unconditional? There is a thread from 2008 in which Duncan
> Murdoch suggests trying conditionally importing a package if it's
> installed, with the caveat "If this is allowed" (
> http://tolstoy.newcastle.edu.au/R/e5/devel/08/10/0488.html). This appears
> to work in current release of R, 2.15.2, but I'm still not clear if it's
> officially allowed, much less recommended.
> >>>
> >>> The manual also says:
> >>>
>  If a package only needs a few objects from another package it can use
> a fully qualified variable reference in the code instead of a formal
> import. A fully qualified reference to the function f in package foo is of
> the form foo::f. This is slightly less efficient than a formal import and
> also loses the advantage of recording all dependencies in the NAMESPACE
> file, so this approach is usually not recommended. Evaluating foo::f will
> cause package foo to be loaded, but not attached, if it was not loaded
> already—this can be an advantage in delaying the loading of a rarely used
> package.
> 
> >>>
> >>>
> >>> Would this be a better solution than importing when calling into a
> suggested package?
> >>>
> >>> Davor
> >>>
> >>>
>

Re: [Rd] Recommended way to call/import functions from a Suggested package

2013-02-22 Thread Mark Leeds
David: I was slightly misleading about the usefulness of that link for your
question, because it doesn't explicitly talk about the case where base
packages import packages. But it talks about the general case of importing,
so hopefully base handles importing just like a package that would be
installed from CRAN.
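
A quick way to check the lattice/grid situation David asks about (a small
sketch, assuming lattice is installed):

loadNamespace("lattice")                            # load lattice's namespace without attaching it
"grid" %in% names(getNamespaceImports("lattice"))   # TRUE: grid is a recorded import
"grid" %in% loadedNamespaces()                      # TRUE: grid's namespace got loaded as a result
"package:grid" %in% search()                        # FALSE: grid never appears on the search path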




On Fri, Feb 22, 2013 at 10:07 PM, Mark Leeds  wrote:

> Hi David: According to the description on cran, lattice imports grid.
>
> I don't know if you've seen it but www.obeautifulcode.com has a very nice
> topic in its archives called "How R Searches and Finds Stuff" which is
> relatde to your question that I found it to be really helpful. A lot of
> people  on this list ( you know who they are as well as I do ) know it all
> in their heads but, for me, sitting down and reading it was truly
> enlightening. I think it should be required reading for anyone starting out
> with R.
>
> On Fri, Feb 22, 2013 at 9:50 PM, David Winsemius 
> wrote:
>
>>
>> On Feb 22, 2013, at 6:39 PM, David Winsemius wrote:
>>
>> >
>> > On Feb 22, 2013, at 6:13 PM, Hadley Wickham wrote:
>> >
>> >> Hi Davor,
>> >>
>> >> To the best of my knowledge, there's only one way to use functions
>> >> from a suggested package: with require:
>> >>
>> >> if (require("suggested_package")) {
>> >> function_from_suggested_package()
>> >> } else {
>> >> stop("suggested package not installed")
>> >> }
>> >>
>> >> Unfortunately I don't think there's any way to use a suggested package
>> >> without polluting the search path.
>> >
>> > I've always wondered: How does lattice manage to use grid functions
>> without putting them on the search path?
>>
>> Maybe I am using the wrong terminology, so here is the behavior I'm
>> referring to:
>>
>> > sessionInfo()
>> snipped version and locale info
>>
>> attached base packages:
>> [1] grDevices datasets  splines   graphics  utils stats methods
>> base
>>
>> other attached packages:
>> [1] rms_3.6-2   Hmisc_3.10-1survival_2.37-2 sos_1.3-5
>> brew_1.0-6
>> [6] lattice_0.20-10
>>
>> loaded via a namespace (and not attached):
>> [1] cluster_1.14.3 grid_2.15.2
>>
>> Notice that lattice is loaded from my profile
>>
>> > require(ggplot2)
>> Loading required package: ggplot2
>> > sessionInfo()
>>  snipped version and locale info
>> attached base packages:
>> [1] grDevices datasets  splines   graphics  utils stats methods
>> base
>>
>> other attached packages:
>> [1] ggplot2_0.9.3   rms_3.6-2   Hmisc_3.10-1survival_2.37-2
>> sos_1.3-5
>> [6] brew_1.0-6  lattice_0.20-10
>>
>> loaded via a namespace (and not attached):
>>  [1] cluster_1.14.3 colorspace_1.2-1   dichromat_2.0-0digest_0.6.0
>>  [5] grid_2.15.2gtable_0.1.2   labeling_0.1   MASS_7.3-22
>>  [9] munsell_0.4plyr_1.8   proto_0.3-10
>> RColorBrewer_1.0-5
>> [13] reshape2_1.2.2 scales_0.2.3   stringr_0.6.2
>> > ?grid.text
>> No documentation for ‘grid.text’ in specified packages and libraries:
>> you could try ‘??grid.text’
>>
>> So at least the help system cannot find a grid function.
>>
>>
>> > ?grid::grid.text
>> starting httpd help server ... done
>> > grid.text
>> Error: object 'grid.text' not found
>>
>> Neither can the R interpreter find it. But it's clearly available if you
>> ask nicely:
>>
>> > grid::grid.text
>> function (label, x = unit(0.5, "npc"), y = unit(0.5, "npc"),
>> just = "centre", hjust = NULL, vjust = NULL, rot = 0, check.overlap =
>> FALSE,
>> default.units = "npc", name = NULL, gp = gpar(), draw = TRUE,
>> vp = NULL)
>> {
>> tg <- textGrob(label = label, x = x, y = y, just = just,
>> hjust = hjust, vjust = vjust, rot = rot, check.overlap =
>> check.overlap,
>> default.units = default.units, name = name, gp = gp,
>> vp = vp)
>> if (draw)
>> grid.draw(tg)
>> invisible(tg)
>> }
>> 
>> 
>>
>> --
>> David/
>>
>> >
>> > --
>> > David
>> >>
>> >> Hadley
>> >>
>> >> On Fri, Feb 22, 2013 at 6:26 PM, Davor Cub