Re: [Rd] [PATCH] Fix missing break

2017-07-21 Thread Martin Morgan

On 07/20/2017 05:02 PM, Steve Grubb wrote:

Hello,

There appears to be a break missing in the switch/case for the LISTSXP case.
If this is supposed to fall through, I'd suggest a comment so that others
know it's by design.

Signed-off-by: Steve Grubb 


An example is

$ R --vanilla -e "pl = pairlist(1, 2); length(pl) = 1; pl"
> pl = pairlist(1, 2); length(pl) = 1; pl
Error in length(pl) = 1 :
  SET_VECTOR_ELT() can only be applied to a 'list', not a 'pairlist'
Execution halted

fixed in r72936 (R-devel) / 72937 (R-3-4-branch).

Martin Morgan



Index: src/main/builtin.c
===
--- src/main/builtin.c  (revision 72935)
+++ src/main/builtin.c  (working copy)
@@ -888,6 +888,7 @@
SETCAR(t, CAR(x));
SET_TAG(t, TAG(x));
}
+   break;
  case VECSXP:
for (i = 0; i < len; i++)
if (i < lenx) {

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel






Re: [Rd] [PATCH] Fix status in main

2017-07-21 Thread Martin Morgan

On 07/20/2017 05:31 PM, Steve Grubb wrote:

Hello,

This is a patch to fix what appears to be a simple typo. The warning says
"invalid status assuming 0", but then instead sets runLast to 0.

Signed-off-by: Steve Grubb 


fixed in r72938 / 72939.

This seemed not to have consequence, since exit() reports NA & 0377 
(i.e., 0) and the incorrect assignment to runLast is immediately 
over-written by the correct value.


Martin Morgan



Index: src/main/main.c
===
--- src/main/main.c (revision 72935)
+++ src/main/main.c (working copy)
@@ -1341,7 +1341,7 @@
  status = asInteger(CADR(args));
  if (status == NA_INTEGER) {
warning(_("invalid 'status', 0 assumed"));
-   runLast = 0;
+   status = 0;
  }
  runLast = asLogical(CADDR(args));
  if (runLast == NA_LOGICAL) {

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel






Re: [Rd] [PATCH] Fix missing break

2017-07-21 Thread Martin Maechler
> Steve Grubb 
> on Thu, 20 Jul 2017 22:20:33 -0400 writes:

> On Thursday, July 20, 2017 7:41:00 PM EDT Duncan Murdoch wrote:
>> Thanks for posting this series of patches.  Unfortunately, there's a
>> good chance they'll get lost in all the traffic on R-devel.  If you
>> don't hear that they've been fixed in the next couple of weeks, could
>> you post them to bugs.r-project.org, and post future patches there as well?

> That was my first inclination. But there is no way to create an account,
> unlike most open source projects I work with. And I work with quite a lot.

It used to be easily possible, until someone urged students or
paid people to create such accounts and create crap bug reports.
For a while, we tried a few measures to deal with that but we
found no measure which was both fast and relatively foolproof.

--> See https://www.r-project.org/bugs.html and look for  "abuse".

I have now created an account for you.

>> In examples like the one below, if you have R code that shows symptoms,
>> it would really help in the bug report. 


> I am hoping that we can look at the code as seasoned programmers and say
> yeah, that is a bug.

I agree in this case.
OTOH, it is exactly one of the cases where the bug is not
triggerable currently:

  al <- formals(ls); length(al) <- 3

would trigger the bug... but you get an error message ".. vector .."
and as I now found that is from a slightly misguided check:
isVectorizable()  is not appropriate here and should really be
replaced by isList().
  
So .. indeed, your report will have triggered an improvement in
the code, which I'm about to commit.

Thank you very much Steve!

> I ran the code through Coverity and have quite a lot of
> problems to tell you about.

I'm not the expert on static code analysis, but as a seasoned
statistician (*and* from experience with other such analyses) I
know that you always get false positives.

> I ran these 5 out as tests to see how this community works. I am new to
> this community but not necessarily R and just want to contribute back to
> something I am using. But believe me, I have a bunch more that seasoned
> programmers can eyeball and say yep - that's a bug.

Good, looking forward to see them.

>> Otherwise, someone else will have to analyze the code to decide whether
>> it's a bug or missing comment.  That takes time, and if there are no known
>> symptoms, it's likely to be assigned a low priority.  The sad truth is
>> that very few members of R Core are currently actively fixing bugs.

> That's a shame. I'd be happy to give the scan to people in core so they
> can see what the lay of the land looks like.

 (hmm... the above does look a teeny tiny bit arrogant in my
  eyes; but then I'm not a native English (nor "American" :-)
  speaker ...)


> R works amazingly well. So much so I decided to dig
> deeper. I'd recommend to the core developers that they ask
> to get on Coverity's open source scan list.

> https://scan.coverity.com/

> It's free to open source projects like this. :-)

> -Steve


>> On 20/07/2017 5:02 PM, Steve Grubb wrote:
>> > Hello,
>> > 
>> > There appears to be a break missing in the switch/case for the LISTSXP
>> > case. If this is supposed to fall through, I'd suggest a comment so
>> > that others know it's by design.
>> > 
>> > Signed-off-by: Steve Grubb 
>> > 
>> > Index: src/main/builtin.c
>> > ===
>> > --- src/main/builtin.c (revision 72935)
>> > +++ src/main/builtin.c (working copy)
>> > @@ -888,6 +888,7 @@
>> > 
>> >SETCAR(t, CAR(x));
>> >SET_TAG(t, TAG(x));
>> >
>> >}
>> > 
>> > +  break;
>> > 
>> >  case VECSXP:
>> >for (i = 0; i < len; i++)
>> >
>> >if (i < lenx) {
>> > 
>> > __
>> > R-devel@r-project.org mailing list
>> > https://stat.ethz.ch/mailman/listinfo/r-devel

> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] [PATCH] Fix status in main

2017-07-21 Thread Martin Maechler

> Hello,
> This is a patch to fix what appears to be a simple typo. The warning says
> "invalid status assuming 0", but then instead sets runLast to 0.

> Signed-off-by: Steve Grubb 

> Index: src/main/main.c
> ===
> --- src/main/main.c   (revision 72935)
> +++ src/main/main.c   (working copy)
> @@ -1341,7 +1341,7 @@
>  status = asInteger(CADR(args));
>  if (status == NA_INTEGER) {
>   warning(_("invalid 'status', 0 assumed"));
> - runLast = 0;
> + status = 0;
>  }
>  runLast = asLogical(CADDR(args));
>  if (runLast == NA_LOGICAL) {

Yes, thank you!

Martin

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] [PATCH] Fix missing break

2017-07-21 Thread Martin Maechler
> Martin Morgan 
> on Fri, 21 Jul 2017 03:43:48 -0400 writes:

> On 07/20/2017 05:02 PM, Steve Grubb wrote:
>> Hello,
>> 
>> There appears to be a break missing in the switch/case for the LISTSXP case.
>> If this is supposed to fall through, I'd suggest a comment so that others
>> know it's by design.
>> 
>> Signed-off-by: Steve Grubb 

> An example is

> $ R --vanilla -e "pl = pairlist(1, 2); length(pl) = 1; pl"
>> pl = pairlist(1, 2); length(pl) = 1; pl
> Error in length(pl) = 1 :
> SET_VECTOR_ELT() can only be applied to a 'list', not a 'pairlist'
> Execution halted

> fixed in r72936 (R-devel) / 72937 (R-3-4-branch).

> Martin Morgan

Cool: The two  Martin M*'s  in the R core team have been "running in parallel".

The example which I meant where the missing break was hidden by
an earlier (wrong) error message is this one:

> al <- formals(ls); length(al) <- 9
Error in length(al) <- 9 : invalid argument

which my extra changes will fix.

--
The other Martin M* - from the R Core Team

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] [PATCH] Fix bad free in connections

2017-07-21 Thread Martin Morgan

On 07/20/2017 05:04 PM, Steve Grubb wrote:

Hello,

There are times when b points to buf which is a stack variable. This
leads to a bad free. The current test actually guarantees the stack
will try to get freed. Simplest to just drop the variable and directly
test if b should get freed.


Signed-off-by: Steve Grubb 


Index: src/main/connections.c
===
--- src/main/connections.c  (revision 72935)
+++ src/main/connections.c  (working copy)
@@ -421,7 +421,6 @@
  char buf[BUFSIZE], *b = buf;
  int res;
  const void *vmax = NULL; /* -Wall*/
-int usedVasprintf = FALSE;
  va_list aq;
  
  va_copy(aq, ap);

@@ -434,7 +433,7 @@
b = buf;
buf[BUFSIZE-1] = '\0';
warning(_("printing of extremely long output is truncated"));
-   } else usedVasprintf = TRUE;
+   }
  }
  #else
  if(res >= BUFSIZE) { /* res is the desired output length */
@@ -481,7 +480,7 @@
  } else
con->write(b, 1, res, con);
  if(vmax) vmaxset(vmax);
-if(usedVasprintf) free(b);
+if(b != buf) free(b);


The code can be exercised with

  z = paste(rep("a", 11000), collapse="")
  f = fifo("foo", "w+")
  writeLines(z, f)

If the macro HAVE_VASPRINTF is not defined, then b is the result of 
R_alloc(), and it is not appropriate to free(b).


If the macro is defined we go through

res = vasprintf(&b, format, ap);
if (res < 0) {
b = buf;
buf[BUFSIZE-1] = '\0';
warning(_("printing of extremely long output is truncated"));
} else usedVasprintf = TRUE;

b gets reallocated when

res = vasprintf(&b, format, ap);

is successful and res >= 0. usedVasprintf is then set to TRUE, and
free(b) is called.


It seems like the code is correct as written?

Martin Morgan (the real other Martin M*)


  return res;
  }

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel






[Rd] 'gsub' not perl compatible?

2017-07-21 Thread Lipatz Jean-Luc
Hi all,

Working on some SAS program conversions, I was testing this (3.4.0 Windows,
but also 2.10.1 Mac OS X):
gsub("b?","!","abc",perl=T)

which returns
[1] "!a!c!"

that I didn't understand.

Unfortunately, asked for the same thing, SAS 9.4 replies "!a!!c!", and so does
Perl (Strawberry 5.26), a more logical answer for me.
Is there some problem with PCRE or some subtlety that I didn't catch?

Results are similar with * instead of ?,
and there is a similar issue with the lazy operator:
gsub("b??","!","abc",perl=T) gives "!a!b!c!", while the other programs
give "!a!!!c!".


Thanks

Jean-Luc LIPATZ





__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Wrongly converging glm()

2017-07-21 Thread Therneau, Terry M., Ph.D.
I'm chiming in late since I read the news in digest form, and I won't copy
the entire conversation to date.


The issue raised comes up quite often in Cox models, so often that the
Therneau and Grambsch book has a section on the issue (3.5, p 58).  After a
few initial iterations the offending coefficient will increase by a constant
at each iteration while the log-likelihood approaches an asymptote
(essentially once the other coefficients "settle down").


The coxph routine tries to detect this case and print a warning, and this
turns out to be very hard to do accurately.  I worked hard at tuning the
threshold(s) for the message several years ago and finally gave up; I am
guessing that the warning misses > 5% of the cases when the issue is true,
and that 5% of the warnings that do print are incorrect.  (And these
estimates may be too optimistic.)  Highly correlated predictors tend to
trip it up, e.g., the truncated power spline basis used by the rcs function
in Hmisc.


All in all, I am not completely sure whether the message does more harm
than good.  I'd be quite reluctant to go down the same path again with the
glm function.


Terry Therneau

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Wrongly converging glm()

2017-07-21 Thread Ravi Varadhan
Please allow me to add my 3 cents.  Stopping an iterative optimization 
algorithm at an "appropriate" juncture is very tricky.  All one can say is that 
the algorithm terminated because it triggered a particular stopping criterion.  
A good software will tell you why it stopped - i.e. the stopping criterion that 
was triggered.  It is extremely difficult to make a failsafe guarantee that the 
triggered stopping criterion is the correct one and that the answer obtained is 
trustworthy. It is up to the user to determine whether the answer makes sense.  
In the case of maximizing a likelihood function, it is perfectly reasonable to 
stop when the algorithm has not made any progress in increasing the log 
likelihood.  In this case, the software should print out something like 
"algorithm terminated due to lack of improvement in log-likelihood."  
Therefore, I don't see a need to issue any warning, but simply report the 
stopping criterion that was applied to terminate the algorithm.

Best,
Ravi


__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] [PATCH] Fix memory leak in PicTeXDeviceDriver

2017-07-21 Thread Martin Maechler
> Steve Grubb 
> on Thu, 20 Jul 2017 17:06:52 -0400 writes:

> Hello,
> This patch fixes a memory leak due to ptd going out of scope
> before it's assigned to dd.

Hmm, I'm not an expert here, but I tend to say 
that it may not be a memory leak because the corresponding
function is static and only called from inside PicTeX() and that has 

const void *vmax = vmaxget();

at the beginning, and ends with

vmaxset(vmax);
return R_NilValue;

maybe because the code writer "saw" that it could not really
free everything correctly?

The following code - after install.packages("sfsmisc") -
seems to suggest a very small "leak" but only up to some point, and your
patch does not seem to change much about it:

if(!require("sfsmisc")) install.packages("sfsmisc") # for Sys.sizes()
## but that's not available on Windoze ...

monitor <- function(call, n = 1, every = n %/% 50, doGC = TRUE) {
for(i in seq_len(n)) {
if(i == 1 || i %% every == 0) {
if(doGC) gc()
ns <- names(ss <- Sys.sizes())
cat(sprintf("%5d: (%s=%7d, %s=%7d)\n",
i, ns[1], ss[[1]], ns[2], ss[[2]]))
}
eval(call)
}
}

monitor(quote({pictex(); dev.off()}))
##   ^\__needed, otherwise have too many open devices
## goes up to some and then stays

monitor(quote({pdf(); dev.off()}))
## ---
## goes up to some and then stays ...[almost, sometimes increases still a bit]
## is not much different to pictex()


But I've added a free(ptd), in a way where the code
is slightly easier for a human to parse.

Thank you, Steve!
Martin


> Signed-off-by: Steve Grubb 

> Index: src/library/grDevices/src/devPicTeX.c
> ===
> --- src/library/grDevices/src/devPicTeX.c (revision 72935)
> +++ src/library/grDevices/src/devPicTeX.c (working copy)
> @@ -665,8 +665,10 @@
> ptd-> width = width;
> ptd-> height = height;
 
> -if( ! PicTeX_Open(dd, ptd) ) 
> +if( ! PicTeX_Open(dd, ptd) ) {
> +free(ptd);
> return FALSE;
> +}
 
> /* Base Pointsize */
> /* Nominal Character Sizes in Pixels */
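The leak pattern and the fix, reduced to a standalone sketch (invented names, not the grDevices code):

```c
#include <stdlib.h>

/* If opening the device fails after the descriptor was malloc'd, an
 * early `return` without free() leaks the allocation. */
typedef struct { int width, height; } picTeXDesc;

int open_ok;   /* test knob standing in for PicTeX_Open()'s result */

picTeXDesc *device_create(void)
{
    picTeXDesc *ptd = malloc(sizeof *ptd);
    if (!ptd)
        return NULL;
    if (!open_ok) {      /* simulated PicTeX_Open() failure */
        free(ptd);       /* the fix: release before bailing out */
        return NULL;
    }
    ptd->width = 5;
    ptd->height = 4;
    return ptd;
}
```

Whether the original early return is a true leak in R depends on the vmaxget()/vmaxset() bracketing Martin mentions above; the free() is harmless and makes ownership explicit either way.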

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] [PATCH] Fix fscanf specifier in InIntegerAscii

2017-07-21 Thread Martin Maechler
> Steve Grubb 
> on Thu, 20 Jul 2017 17:28:53 -0400 writes:

> Hello,
> The SMBUF_SIZED_STRING format allows fscanf to read up to 511 bytes. The
> buffer at line 1382 is only 128 bytes. The fscanf format specifier ought
> to be resized to prevent a stack overrun.

Yes, you are right, thank you!

Fix committed as svn rev  72945



> Signed-off-by: Steve Grubb 

> Index: saveload.c
> ===
> --- src/main/saveload.c   (revision 72935)
> +++ src/main/saveload.c   (working copy)
> @@ -1379,7 +1379,7 @@
> {
> char buf[128];
> int x, res;
> -res = fscanf(fp, SMBUF_SIZED_STRING, buf);
> +res = fscanf(fp, "%127s", buf);
> if(res != 1) error(_("read error"));
> if (strcmp(buf, "NA") == 0)
> return NA_INTEGER;

> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] [PATCH] Fix bad free in connections

2017-07-21 Thread Steve Grubb
On Friday, July 21, 2017 5:03:09 AM EDT Martin Morgan wrote:
> b gets reallocated when
> 
>  res = vasprintf(&b, format, ap);
> 
> is successful and res >= 0. usedVasprintf is then set to TRUE, and
> free(b) called.
> 
> It seems like the code is correct as written?

Yes, I think I see the issue. vasprintf seems to not be tracked for memory
allocation. So, yes, I agree it's correct as written. Sorry for the noise.

-Steve

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] [PATCH] Fix missing break

2017-07-21 Thread Steve Grubb
Hello Martin,

On Friday, July 21, 2017 4:21:21 AM EDT Martin Maechler wrote:
> I have now created an account for you.

Thanks. Is that the preferred method of transferring these patches?


> >> In examples like the one below, if you have R code that shows symptoms,
> >> it would really help in the bug report.
> > 
> > I am hoping that we can look at the code as seasoned programmers and say
> > yeah, that is a bug.
> 
> I agree in this case.
> OTOH, it is exactly one of the cases where the bug is not
> triggerable currently:
> 
>   al <- formals(ls); length(al) <- 3
> 
> would trigger the bug... but you get an error message ".. vector .."
> and as I now found that is from a slightly misguided check:
> isVectorizable()  is not appropriate here and should really be
> replaced by isList().
> 
> So .. indeed, your report will have triggered an improvement in
> the code, which I'm about to commit.

That's what it's all about.  :-)
 
> Thank you very much Steve!
> 
> > I ran the code through Coverity and have quite a lot of
> > problems to tell you about.
> 
> I'm not the expert on static code analysis, but as a seasoned
> statistician (*and* from experience with other such analyses) I
> know that you always get false positives.

Absolutely. I weeded the report down to 15 issues to start with. There are
also ways to annotate the code so that checkers dismiss something they would
otherwise be inclined to report.

> >> Otherwise, someone else will have to analyze the code to decide whether
> >> it's a bug or missing comment.  That takes time, and if there are no
> >> known symptoms, it's likely to be assigned a low priority.  The sad
> >> truth is that very few members of R Core are currently actively fixing
> >> bugs.
> > 
> > That's a shame. I'd be happy to give the scan to people in core so they
> > can see what the lay of the land looks like.
> 
>  (hmm... the above does look a teeny tiny bit arrogant in my
>   eyes; but then I'm not a native English (nor "American" 
>   speaker ...)

I apologize if that is the way it came across. "That's a shame" can also mean
"That's unfortunate": I was thinking that I had spent some time fixing up
patches that might not be wanted. However, I see that you have looked at the
patches and I thank you for that.  :-)

The second sentence above is an honest offer. I'd be happy to send the output
of the report off list (in case anything sensitive is listed). In this and the
other patches I haven't sent, I'm just picking the low-hanging fruit.

-Steve

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Wrongly converging glm()

2017-07-21 Thread Mark Leeds
Hi Ravi: Well said. In John's Rvmmin package, he has codes for explaining
the cause of the termination. The codes returned were fine. The problem was
that the model I was using could have multiple solutions (regardless of the
data sent in), so even though the stopping criteria were reached, it turned
out that one of the parameters (there were two parameters) could have really
been anything and the same likelihood value would be returned. So, what I
learned the hard way was: termination due to reasonable stopping criteria
DOES NOT NECESSARILY EQUAL OPTIMAL. But I lived in the dark about this for a
long time and only happened to notice it when playing around with the
likelihood by fixing the offending parameter to various values and
optimizing over the non-offending parameter. Thanks for the eloquent
explanation.


  Mark


__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Wrongly converging glm()

2017-07-21 Thread Ravi Varadhan
“So, what I learned the hard way was  termination due to reasonable stopping  
criteria DOES NOT NECESSARILY EQUAL OPTIMAL.”

Yes, I agree, Mark.

Let me add another observation.  In the “optimx” package, John Nash and I 
implemented a check for optimality conditions – first and second order KKT 
conditions.  This involves checking whether the gradient is sufficiently small 
and the Hessian is positive definite (for local minimum) at the final parameter 
values.  However, it can be quite time consuming to compute these quantities 
and in some problems checking KKT can take up more effort than finding the 
solution!  Furthermore, it is difficult to come up with good thresholds for 
determining “small” gradient and “positive definite” Hessian, since these can 
depend upon the scale of the objective function and the parameters.

Ravi


Re: [Rd] Wrongly converging glm()

2017-07-21 Thread Mark Leeds
Hi Ravi: Yes, IIRC that's EXACTLY what was going on in my case. Based on
the codes from Rvmmin, the objective function wasn't changing much and I
think the norm of the gradient was close to zero, so it was difficult for
me to detect the issue. I found it when I happened to notice that the
objective function seemed independent of one of the parameters.