On 10/09/2021 09:12, Duncan Murdoch wrote:
On 10/09/2021 11:29 a.m., Hervé Pagès wrote:
Hi,
The first warning below is unexpected and confusing:
> as.raw(c(3e9, 5.1))
[1] 00 05
Warning messages:
1: NAs introduced by coercion to integer range
2: out-of-range values treated as 0 in coercion to raw
The reason we get it is that coercion from numeric to raw is currently
implemented on top of coercion from numeric to int (file
src/main/coerce.c, lines 700-710):
case REALSXP:
for (i = 0; i < n; i++) {
// if ((i+1) % NINTERRUPT == 0) R_CheckUserInterrupt();
tmp = IntegerFromReal(REAL_ELT(v, i), &warn);
if(tmp == NA_INTEGER || tmp < 0 || tmp > 255) {
tmp = 0;
warn |= WARN_RAW;
}
pa[i] = (Rbyte) tmp;
}
break;
The first warning comes from the call to IntegerFromReal().
The following code avoids the spurious warning and is also simpler and
slightly faster:
case REALSXP:
for (i = 0; i < n; i++) {
// if ((i+1) % NINTERRUPT == 0) R_CheckUserInterrupt();
double vi = REAL_ELT(v, i);
if(ISNAN(vi) || (tmp = (int) vi) < 0 || tmp > 255) {
tmp = 0;
warn |= WARN_RAW;
}
pa[i] = (Rbyte) tmp;
}
break;
Doesn't that give different results in case vi is so large that "(int)
vi" overflows? (I don't know what the C standard says, but some online
references say that behaviour is implementation dependent.)
For example, if
vi = 1.0 + INT_MAX;
wouldn't "(int) vi" be equal to a small integer?
Good catch, thanks!
Replacing
if(ISNAN(vi) || (tmp = (int) vi) < 0 || tmp > 255) {
tmp = 0;
warn |= WARN_RAW;
}
pa[i] = (Rbyte) tmp;
with
if(ISNAN(vi) || vi <= -1.0 || vi >= 256.0)
{
tmp = 0;
warn |= WARN_RAW;
} else {
tmp = (int) vi;
}
pa[i] = (Rbyte) tmp;
should address that.
FWIW IntegerFromReal() has a similar risk of int overflow
(src/main/coerce.c, lines 128-138):
int attribute_hidden
IntegerFromReal(double x, int *warn)
{
if (ISNAN(x))
return NA_INTEGER;
else if (x >= INT_MAX+1. || x <= INT_MIN ) {
*warn |= WARN_INT_NA;
return NA_INTEGER;
}
return (int) x;
}
The cast to int will also be an int overflow situation if x is > INT_MAX
and < INT_MAX+1 so the risk is small! There are other instances of this
situation in IntegerFromComplex() and IntegerFromString().
More below...
Duncan Murdoch
Coercion from complex to raw has the same problem:
> as.raw(c(3e9+0i, 5.1))
[1] 00 05
Warning messages:
1: NAs introduced by coercion to integer range
2: out-of-range values treated as 0 in coercion to raw
Current implementation (file src/main/coerce.c, lines 711-721):
case CPLXSXP:
for (i = 0; i < n; i++) {
// if ((i+1) % NINTERRUPT == 0) R_CheckUserInterrupt();
tmp = IntegerFromComplex(COMPLEX_ELT(v, i), &warn);
if(tmp == NA_INTEGER || tmp < 0 || tmp > 255) {
tmp = 0;
warn |= WARN_RAW;
}
pa[i] = (Rbyte) tmp;
}
break;
This implementation has the following additional problem when the
supplied complex has a nonzero imaginary part:
> as.raw(300+4i)
[1] 00
Warning messages:
1: imaginary parts discarded in coercion
2: out-of-range values treated as 0 in coercion to raw
> as.raw(3e9+4i)
[1] 00
Warning messages:
1: NAs introduced by coercion to integer range
2: out-of-range values treated as 0 in coercion to raw
In one case we get a warning about the discarding of the imaginary part
but not the other case, which is unexpected. We should see the exact
same warning (or warnings) in both cases.
With the following fix we only get the warning about the discarding of
the imaginary part if we are not in a "out-of-range values treated as 0
in coercion to raw" situation:
case CPLXSXP:
for (i = 0; i < n; i++) {
// if ((i+1) % NINTERRUPT == 0) R_CheckUserInterrupt();
Rcomplex vi = COMPLEX_ELT(v, i);
if(ISNAN(vi.r) || ISNAN(vi.i) || (tmp = (int) vi.r) < 0 ||
tmp > 255) {
tmp = 0;
warn |= WARN_RAW;
} else {
if(vi.i != 0.0)
warn |= WARN_IMAG;
}
pa[i] = (Rbyte) tmp;
}
break;
Corrected version:
if(ISNAN(vi.r) || ISNAN(vi.i) || vi.r <= -1.00 ||
vi.r >= 256.00) {
tmp = 0;
warn |= WARN_RAW;
} else {
tmp = (int) vi.r;
if(vi.i != 0.0)
warn |= WARN_IMAG;
}
pa[i] = (Rbyte) tmp;
Hopefully this time I got it right.
Best,
H.
Finally, coercion from character to raw has the same problem and its
code can be fixed in a similar manner:
> as.raw(c("3e9", 5.1))
[1] 00 05
Warning messages:
1: NAs introduced by coercion to integer range
2: out-of-range values treated as 0 in coercion to raw
Cheers,
H.
--
Hervé Pagès
Bioconductor Core Team
hpages.on.git...@gmail.com
______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel