Re: [Rd] strtoi output of empty string inconsistent across platforms

2019-01-12 Thread Michael Chirico
Thanks Martin.

For what it's worth, this extremely representative, highly scientific
Twitter poll suggests the Mac/Linux split is pretty stark (NA on Mac, 0 on
Linux)

https://twitter.com/michael_chirico/status/1083649190117306369?s=17

On Sat, Jan 12, 2019, 2:00 AM Martin Maechler  > Martin Maechler
> > on Fri, 11 Jan 2019 09:44:14 +0100 writes:
>
> > Michael Chirico
> > on Fri, 11 Jan 2019 14:36:17 +0800 writes:
>
> >> Identified as root cause of a bug in data.table:
> >> https://github.com/Rdatatable/data.table/issues/3267
>
> >> On my machine, strtoi("", base = 2L) produces NA_integer_
> >> (which seems consistent with ?strtoi: "Values which
> >> cannot be interpreted as integers or would overflow are
> >> returned as NA_integer_").
>
> > indeed consistent with R's documentation on strtoi().
> > What machine would that be?
>
> >> But on all the other machines I've seen, 0L is
> >> returned. This seems to be consistent with the output of
> >> a simple C program using the underlying strtol function
> >> (see data.table link for this program, and for full
> >> sessionInfo() of some environments with differing
> >> output).
>
> >> So, what is the correct output of strtoi("", base = 2L)?
>
> >> Is the cross-platform inconsistency to be
> >> expected/documentable?
>
> > The inconsistency is certainly undesirable.  The relevant
> > utility function in R's source (/src/main/character.c)
> > is
>
> > static int strtoi(SEXP s, int base) { long int res; char
> > *endp;
>
> > /* strtol might return extreme values on error */
> > errno = 0;
>
> > if(s == NA_STRING) return(NA_INTEGER); res =
> > strtol(CHAR(s), &endp, base); /* ASCII */ if(errno ||
> > *endp != '\0') res = NA_INTEGER; if(res > INT_MAX || res <
> > INT_MIN) res = NA_INTEGER; return (int) res; }
>
> > and so it clearly is a platform-inconsistency in the
> > underlying C library's strtol().
>
> (corrected typos here: )
>
> > I think we should make this cross-platform consistent ...
> > and indeed it makes much sense to ensure the result of
>
> > strtoi("", base=2L)to become   NA_integer_
>
> > but chances are that would break code that has relied on
> > the current behavior {on "all but your computer" ;-)} ?
>
> I still think that such a change should be done.
>
> 'make check all' on the R source (+ Recommended packages) seems
> not to signal any error or warning with such a change, so I plan
> to commit that change to "the trunk" / "R-devel" soon, unless
> concerns are raised highly (and quickly enough).
>
> Martin
>

[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] strtoi output of empty string inconsistent across platforms

2019-01-12 Thread Martin Maechler
> Michael Chirico 
> on Sat, 12 Jan 2019 17:34:03 +0800 writes:

> Thanks Martin.  For what it's worth, this extremely
> representative, highly scientific Twitter poll suggests
> the Mac/Linux split is pretty stark (NA on Mac, 0 on
> Linux)

> https://twitter.com/michael_chirico/status/1083649190117306369?s=17

> On Sat, Jan 12, 2019, 2:00 AM Martin Maechler
> > > Martin Maechler > on Fri, 11 Jan 2019 09:44:14
>> +0100 writes:
>> 
>> > Michael Chirico > on Fri, 11 Jan 2019 14:36:17
>> +0800 writes:
>> 
>> >> Identified as root cause of a bug in data.table: >>
>> https://github.com/Rdatatable/data.table/issues/3267
>> 
>> >> On my machine, strtoi("", base = 2L) produces
>> NA_integer_ >> (which seems consistent with ?strtoi:
>> "Values which >> cannot be interpreted as integers or
>> would overflow are >> returned as NA_integer_").
>> 
>> > indeed consistent with R's documentation on strtoi().
>> > What machine would that be?
>> 
>> >> But on all the other machines I've seen, 0L is >>
>> returned. This seems to be consistent with the output of
>> >> a simple C program using the underlying strtol
>> function >> (see data.table link for this program, and
>> for full >> sessionInfo() of some environments with
>> differing >> output).
>> 
>> >> So, what is the correct output of strtoi("", base =
>> 2L)?
>> 
>> >> Is the cross-platform inconsistency to be >>
>> expected/documentable?
>> 
>> > The inconsistency is certainly undesirable.  The
>> relevant > utility function in R's source
>> (/src/main/character.c) > is
>> 
>> > static int strtoi(SEXP s, int base) { long int res;
>> char > *endp;
>> 
>> > /* strtol might return extreme values on error */ >
>> errno = 0;
>> 
>> > if(s == NA_STRING) return(NA_INTEGER); res = >
>> strtol(CHAR(s), &endp, base); /* ASCII */ if(errno || >
>> *endp != '\0') res = NA_INTEGER; if(res > INT_MAX || res
>> < > INT_MIN) res = NA_INTEGER; return (int) res; }
>> 
>> > and so it clearly is a platform-inconsistency in the >
>> underlying C library's strtol().
>> 
>> (corrected typos here: )
>> 
>> > I think we should make this cross-platform consistent
>> ...  > and indeed it makes much sense to ensure the
>> result of
>> 
>> > strtoi("", base=2L) to become NA_integer_
>> 
>> > but chances are that would break code that has relied
>> on > the current behavior {on "all but your computer"
>> ;-)} ?
>> 
>> I still think that such a change should be done.
>> 
>> 'make check all' on the R source (+ Recommended packages)
>> seems not to signal any error or warning with such a
>> change, so I plan to commit that change to "the trunk" /
>> "R-devel" soon, unless concerns are raised highly (and
>> quickly enough).

I've now committed  svn rev 75977  to R-devel  which does change
strtoi("", )  to return NA_integer_   (for all )
platform independently, not even calling lib C's strtol() in
that case.

Martin

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel