[Rd] R 4.5.0 is released

2025-04-11 Thread peter dalgaard
The build system rolled up R-4.5.0.tar.gz and .xz (codename "How About a 
Twenty-Six") this morning.

This is a major release with a number of new features, API changes, and bug 
fixes.

The list below details the changes in this release. 

You can get the source code from

https://cran.r-project.org/src/base/R-4/R-4.5.0.tar.gz
https://cran.r-project.org/src/base/R-4/R-4.5.0.tar.xz

or wait for it to be mirrored at a CRAN site nearer to you.

Binaries for various platforms will appear in due course. 


For the R Core Team,

Peter Dalgaard


These are the checksums (md5 and SHA-256) for the freshly created files, in
case you wish to check that they are uncorrupted:

MD5 (AUTHORS) = 0ba932825aefae5566dc44822916b266
MD5 (build-dist.log) = 31a495e5d716faf011803973f977adbd
MD5 (COPYING) = eb723b61539feef013de476e68b5c50a
MD5 (COPYING.LIB) = a6f89e2100d9b6cdffcea4f398e37343
MD5 (FAQ) = cf1644761934816fb349f15d7956732e
MD5 (INSTALL) = 7893f754308ca31f1ccf62055090ad7b
MD5 (NEWS) = fda3e3633537ffb9c02f1278a7288db9
MD5 (NEWS.0) = bfcd7c147251b5474d96848c6f57e5a8
MD5 (NEWS.1) = f8466e418dec6b958b4ce484a13f9a9d
MD5 (NEWS.2) = 05e4a57b645e651ba13019c3cf5c495f
MD5 (NEWS.3) = 082abfc2fdc36912075e78b92fb2941e
MD5 (R-latest.tar.gz) = 2342b31a604631f8b130033d8582d547
MD5 (R-latest.tar.xz) = d379331fbe3f9bf19d3e53f547317114
MD5 (README) = e8e5ee38544d34409177cd479025fe66
MD5 (RESOURCES) = 5949c86e75c813f8f6ebc420aae46881
MD5 (THANKS) = 61d146aa6a2cf5999295b2fb340991c1
MD5 (VERSION-INFO.dcf) = 5ca3dfa954644258bfd0f83319c0377c
MD5 (R-4/R-4.5.0.tar.gz) = 2342b31a604631f8b130033d8582d547
MD5 (R-4/R-4.5.0.tar.xz) = d379331fbe3f9bf19d3e53f547317114

4cc9dcdfa46a2e2cff45c27df8f3a9f851ec97b44b8647ab8a9fbf844f37937f  AUTHORS
8f85b62440f992f6dd8e05e5bda84995b1a8d42c7d4d2a05927c6ae6e2fbad59  build-dist.log
e6d6a009505e345fe949e1310334fcb0747f28dae2856759de102ab66b722cb4  COPYING
6095e9ffa777dd22839f7801aa845b31c9ed07f3d6bf8a26dc5d2dec8ccc0ef3  COPYING.LIB
ec1eb421f6810ffb53162b9dfb371190de30ab490855ddfa49fc0bf39c7f11cf  FAQ
f87461be6cbaecc4dce44ac58e5bd52364b0491ccdadaf846cb9b452e9550f31  INSTALL
d03a80d9ab25ce50e0ec7923385729d9dfb7d4fab1f33041e62c143cbff5a4f9  NEWS
4e21b62f515b749f80997063fceab626d7258c7d650e81a662ba8e0640f12f62  NEWS.0
602f3a40ef759c7b2a6c485a33dc674af34249644ac5fb53b21283d4e12e808d  NEWS.1
7babb6d82a4479b2c3803f7dbfaab63125b0f0d1b6bb40b1389d3af65eaf83aa  NEWS.2
eb473efd365822e7ae64eb0f86028ea019815fdd273fe7daa9c6fe5e28fd2737  NEWS.3
3b33ea113e0d1ddc9793874d5949cec2c7386f66e4abfb1cef9aec22846c3ce1  R-latest.tar.gz
101766c3aefffcbacde39c8a0b9c3accf50a563955f66ff2f7b321d6bf07da8d  R-latest.tar.xz
f5aa875c23185cbfc3a50739d7295b0caba2cf0e38ba082850be338cc9541154  README
5e7ddf7349ada12c8142c42bac955835efd1768978cb476b61a3b53255442b24  RESOURCES
1d5064c86b6813865a033763f43212064c0a67ef05f5af13b13c4feb08264a33  THANKS
d3ced974014dc3da6ef3cf126d67e427172e9f3f77f801483f91acc881e2de38  VERSION-INFO.dcf
3b33ea113e0d1ddc9793874d5949cec2c7386f66e4abfb1cef9aec22846c3ce1  R-4/R-4.5.0.tar.gz
101766c3aefffcbacde39c8a0b9c3accf50a563955f66ff2f7b321d6bf07da8d  R-4/R-4.5.0.tar.xz

This is the relevant part of the NEWS file

CHANGES IN R 4.5.0:

  NEW FEATURES:

• as.integer(rl) and hence as.raw(rl) now work for a list of raw(1)
  elements, as proposed by Michael Chirico's PR#18696.

• graphics' grid() gains optional argument nintLog.

• New functions check_package_urls() and check_package_dois() in
  package tools for checking URLs and DOIs in package sources.

• New head() and tail() methods for class "ts" time series,
  proposed by Spencer Graves on R-devel.

• New qr.influence() function, a “bare bones” interface to the
  lm.influence() leave-one-out diagnostics computations; wished for
  in PR#18739.

• Package citation() results auto-generated from the package
  metadata now also provide package DOIs for CRAN and Bioconductor
  packages.

• New function grepv() identical to grep() except for the default
  value = TRUE.

• methods(<pkg>:::<genfun>) now does report methods when neither
  the generic nor the methods have been exported.

• pdf() gains an author argument to set the corresponding metadata
  field, and logical arguments timestamp and producer to optionally
  omit the respective metadata.  (Thanks to Edzer Pebesma.)

• grDevices::glyphInfo() gains a rot argument to allow per-glyph
  rotation.  (Thanks to Daniel Sabanes Bove.)

• Package tools now exports functions CRAN_current_db(),
  CRAN_aliases_db(), CRAN_rdxrefs_db(), CRAN_archive_db(), and
  CRAN_authors_db().

• Package tools now exports functions R() and
  parse_URI_reference().

• Package tools now exports functions base_aliases_db() and
  base_rdxrefs_db().

• It is now possible to set the background color for row and column
  names in the data editor on Windows (Rgui).

• Rterm on Windows now accepts input lines of unlimited length.

• file

Re: [Rd] table() and as.character() performance for logical values

2025-04-11 Thread Suharto Anggono Suharto Anggono via R-devel
 On second thought, I wonder if the caching in my changed 'StringFromLogical' 
in my previous message is safe. While 'ans' in the C function 'coerceToString' 
is protected, its element is also protected. If the object corresponding to 
'ans' is then no longer protected, is it possible for the cached object 
'TrueCh' or 'FalseCh' in 'StringFromLogical' to be garbage collected? If it is, 
I think of clearing the cache for each first filling. For example, by abusing 
'warn' argument, the following is added to my changed 'StringFromLogical'.

 if (*warn) TrueCh = FalseCh = NULL;

Correspondingly, in 'coerceToString',

 warn = i == 0;

is inserted before

 SET_STRING_ELT(ans, i, StringFromLogical(LOGICAL_ELT(v, i), &warn));

for LGLSXP case.

On Thursday, 10 April 2025 at 10:54:03 pm GMT+7, Martin Maechler wrote:


> Suharto Anggono Suharto Anggono via R-devel
>    on Thu, 10 Apr 2025 07:53:04 +0000 (UTC) writes:

    > Chain of calls of C functions in coerce.c for as.character() in 
R:
    >
    > do_asatomic
    > ascommon
    > coerceVector
    > coerceToString
    > StringFromLogical (for each element)
    >
    > The definition of 'StringFromLogical' in coerce.c :
    >
    > attribute_hidden SEXP StringFromLogical(int x, int *warn)
    > {
    >    int w;
    >    formatLogical(&x, 1, &w);
    >    if (x == NA_LOGICAL) return NA_STRING;
    >    else return mkChar(EncodeLogical(x, w));
    > }
    >
    > The definition of 'EncodeLogical' in printutils.c :
    >
    > const char *EncodeLogical(int x, int w)
    > {
    >    static char buff[NB];
    >    if(x == NA_LOGICAL) snprintf(buff, NB, "%*s", min(w, (NB-1)), CHAR(R_print.na_string));
    >    else if(x) snprintf(buff, NB, "%*s", min(w, (NB-1)), "TRUE");
    >    else snprintf(buff, NB, "%*s", min(w, (NB-1)), "FALSE");
    >    buff[NB-1] = '\0';
    >    return buff;
    > }
    >
    > > L <- sample(c(TRUE, FALSE), 10^7, replace = TRUE)
    > > system.time(as.character(L))
    >    user  system elapsed
    >    2.69    0.02    2.73
    > > system.time(c("FALSE", "TRUE")[L+1])
    >    user  system elapsed
    >    0.15    0.04    0.20
    > > system.time(c("FALSE", "TRUE")[L+1L])
    >    user  system elapsed
    >    0.08    0.05    0.13
    > > L <- rep(NA, 10^7)
    > > system.time(as.character(L))
    >    user  system elapsed
    >    0.11    0.00    0.11
    > > system.time(c("FALSE", "TRUE")[L+1])
    >    user  system elapsed
    >    0.16    0.06    0.22
    > > system.time(c("FALSE", "TRUE")[L+1L])
    >    user  system elapsed
    >    0.09    0.03    0.12
    >
    > `as.character` of a logical vector that is all NA is fast enough.
    > It appears that the call to 'formatLogical' inside the C function
    > 'StringFromLogical' does not introduce much slowdown.


    > I found that using a string literal inside the C function
    > 'StringFromLogical', by replacing
    > EncodeLogical(x, w)
    > with
    > x ? "TRUE" : "FALSE"
    > (so the call to 'formatLogical' is not needed anymore), makes it faster.

indeed! ... and we also notice that the 'w' argument is not needed
anymore either, and that makes sense: at this point, when you
know you have an R logical value, there are only three
possibilities and no reason ever to warn about the conversion.

    > Alternatively,
or in addition !


    > a "fast path" could be introduced in 'EncodeLogical', potentially also
    > benefiting format() in R.
    > For example, without replacing existing code, the following fragment
    > could be inserted.
    >
    >    if(x == NA_LOGICAL) {if(w == R_print.na_width) return CHAR(R_print.na_string);}
    >    else if(x) {if(w == 4) return "TRUE";}
    >    else {if(w == 5) return "FALSE";}
    >
    > However, with either of them, c("FALSE", "TRUE")[L+1L] is still faster
    > than as.character(L).
    >
    > Precomputing or caching possible results of the C function
    > 'StringFromLogical' allows as.character(L) to be as fast as
    > c("FALSE", "TRUE")[L+1L] in R. For example, 'StringFromLogical' could be
    > changed to
    >
    > attribute_hidden SEXP StringFromLogical(int x, int *warn)
    > {
    >    static SEXP TrueCh, FalseCh;
    >    if (x == NA_LOGICAL) return NA_STRING;
    >    else if (x) return TrueCh ? TrueCh : (TrueCh = mkChar("TRUE"));
    >    else return FalseCh ? FalseCh : (FalseCh = mkChar("FALSE"));
    > }

Indeed, and something along this line (storing the other two constant strings)
was also my thought when seeing the
  mkChar(x ? "TRUE" : "FALSE")
you implicitly proposed above.

I'm looking into applying both speedups;
thank you very much, Suharto!

Martin


--
Martin Maechler
ETH Zurich  and  R Core team
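
The two encodings discussed in this message can be compared in a standalone sketch. It is not R's source: the function names, the NA_LOGICAL_SKETCH and NB_SKETCH macros, and the fixed "NA" string (R uses R_print.na_string and R_print.na_width) are assumptions made so the code compiles without R's headers. The slow path formats through snprintf with a field width, as the quoted EncodeLogical does; the fast path returns a literal whenever the requested width equals the natural width of the string.

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>
#include <string.h>

/* Stand-ins for R internals; names and values are assumptions of this
   sketch, not R's actual definitions. */
#define NA_LOGICAL_SKETCH INT_MIN  /* logical NA modelled as INT_MIN */
#define NB_SKETCH 1000             /* size of the static format buffer */

/* Slow path: always format through snprintf with a field width. */
const char *encode_logical_slow(int x, int w)
{
    static char buff[NB_SKETCH];
    assert(w >= 0);
    int width = w < NB_SKETCH - 1 ? w : NB_SKETCH - 1;
    if (x == NA_LOGICAL_SKETCH) snprintf(buff, NB_SKETCH, "%*s", width, "NA");
    else if (x) snprintf(buff, NB_SKETCH, "%*s", width, "TRUE");
    else snprintf(buff, NB_SKETCH, "%*s", width, "FALSE");
    return buff;
}

/* Fast path: if no padding is needed, return the literal directly;
   otherwise fall back to the formatted version. */
const char *encode_logical_fast(int x, int w)
{
    if (x == NA_LOGICAL_SKETCH) { if (w == 2) return "NA"; }
    else if (x) { if (w == 4) return "TRUE"; }
    else { if (w == 5) return "FALSE"; }
    return encode_logical_slow(x, w);
}
```

Both functions return the same text for every width; the fast path only skips the formatting call in the unpadded case, which is the case as.character() hits.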
  

Re: [Rd] table() and as.character() performance for logical values

2025-04-11 Thread Suharto Anggono Suharto Anggono via R-devel
 Oh, with the abuse of 'warn' in my previous message, warning would be issued 
if the input 'v' of 'coerceToString' is a logical vector of length 1.

Revision:

Added to my changed 'StringFromLogical':
if (*warn) {TrueCh = FalseCh = NULL; *warn = 0;}

'coerceToString': insert
if (i == 0) warn = 1;
for LGLSXP case or initialize 'warn' to 16

'coerceToSymbol': insert
warn = 1;
for LGLSXP case or initialize 'warn' to 16


Another way is to follow the caching approach of 'StringFromInteger'.


Re: [Rd] table() and as.character() performance for logical values

2025-04-11 Thread Suharto Anggono Suharto Anggono via R-devel
 Alternative revision:

Added to my changed 'StringFromLogical':
#define CACHE 16
if (!(*warn & CACHE)) {TrueCh = FalseCh = NULL; *warn |= CACHE;}

No change to 'coerceToString' and 'coerceToSymbol'.
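
A standalone sketch of this flag-bit idea, under stated assumptions: plain C strings stand in for the cached CHARSXPs, NA handling is left out, the helper names string_from_logical_sketch and coerce_sketch and the cache_resets counter are hypothetical, and bit 16 is assumed to be unused by the existing coercion-warning flags.

```c
#include <assert.h>
#include <stddef.h>

#define CACHE 16  /* spare bit in *warn, as proposed in the message above */

static const char *TrueCh, *FalseCh;  /* stand-ins for the cached CHARSXPs */
int cache_resets;                     /* counts resets; illustration only */

/* Toy version of the changed StringFromLogical; x is assumed not NA. */
const char *string_from_logical_sketch(int x, int *warn)
{
    assert(warn != NULL);
    if (!(*warn & CACHE)) {       /* first call with this warn word */
        TrueCh = FalseCh = NULL;  /* drop pointers left by an earlier run */
        *warn |= CACHE;           /* mark the cache as valid for this run */
        cache_resets++;
    }
    if (x) return TrueCh ? TrueCh : (TrueCh = "TRUE");
    return FalseCh ? FalseCh : (FalseCh = "FALSE");
}

/* Mimics one coerceToString call: warn starts at 0 for each vector. */
int coerce_sketch(const int *v, size_t n, const char **out)
{
    int warn = 0;
    for (size_t i = 0; i < n; i++)
        out[i] = string_from_logical_sketch(v[i], &warn);
    return warn;
}
```

Because the caller's warn word starts at zero for every vector, the cache is reset exactly once per coercion and reused for the remaining elements. Any code that later inspects warn has to ignore the extra bit.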


[Rd] Check for protection (was: table() and as.character() performance for logical values)

2025-04-11 Thread Duncan Murdoch
On a tangent from the main topic of this thread:  sometimes (especially 
to non-experts) it's not obvious whether a variable is protected or not.


I don't think there's any easy way to determine that, but perhaps there 
should be.  Would it be possible to add a run-time test you could call 
in C code (e.g. is_protected(x)) that would do the same search the 
garbage collector does in order to determine if a particular pointer is 
protected?


This would be an expensive operation, similar in cost to actually doing 
a garbage collection.  You wouldn't want to do it routinely, but it 
would be really helpful in debugging.


Duncan Murdoch
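
A toy sketch of what such a check could look like, assuming a tiny fixed heap and an explicit root array in place of R's real protect stack, precious list and other roots. All names here (Obj, pool, roots, is_protected_sketch) are hypothetical; nothing below is part of R's API. The search is the marking phase of a collector that stops as soon as the target is found.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define POOL 8
#define MAX_KIDS 2

/* Toy heap object: a few outgoing references and a mark bit. */
typedef struct Obj {
    struct Obj *kids[MAX_KIDS];
    bool mark;
} Obj;

Obj pool[POOL];    /* the whole toy heap */
Obj *roots[POOL];  /* stand-in for the collector's root set */
size_t n_roots;

/* Depth-first search from `from`; true if `target` is reachable. */
static bool reaches(Obj *from, const Obj *target)
{
    if (from == NULL) return false;
    if (from == target) return true;
    if (from->mark) return false;  /* already visited: handles cycles */
    from->mark = true;
    for (size_t i = 0; i < MAX_KIDS; i++)
        if (reaches(from->kids[i], target)) return true;
    return false;
}

/* True if x is reachable from any root, i.e. would survive a collection. */
bool is_protected_sketch(const Obj *x)
{
    bool found = false;
    assert(n_roots <= POOL);
    for (size_t i = 0; i < POOL; i++) pool[i].mark = false;
    for (size_t i = 0; i < n_roots && !found; i++)
        found = reaches(roots[i], x);
    return found;
}
```

As noted above, the cost is comparable to the marking phase of a collection, so a check like this would only make sense as a debugging aid.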


Re: [Rd] Check for protection (was: table() and as.character() performance for logical values)

2025-04-11 Thread Paul McQuesten
For a long-term horizon, would it help R developers to use a naming
convention?
Perhaps, varName_PROT, or the inverse varName_UNPROT?
Eventually, teach some linter about that?


Re: [Rd] table() and as.character() performance for logical values

2025-04-11 Thread Tomas Kalibera



On 4/11/25 16:23, Suharto Anggono Suharto Anggono via R-devel wrote:

  Alternative revision:

Added to my changed 'StringFromLogical':
#define CACHE 16
if (!(*warn & CACHE)) {TrueCh = FalseCh = NULL; *warn |= CACHE;}

No change to 'coerceToString' and 'coerceToSymbol'.

--
On Friday, 11 April 2025 at 08:02:58 pm GMT+7, Suharto Anggono Suharto Anggono 
 wrote:


Oh, with the abuse of 'warn' in my previous message, warning would be issued if 
the input 'v' of 'coerceToString' is a logical vector of length 1.

Revision:

Added to my changed 'StringFromLogical':
if (*warn) {TrueCh = FalseCh = NULL; *warn = 0;}

'coerceToString': insert
if (i == 0) warn = 1;
for LGLSXP case or initialize 'warn' to 16

'coerceToSymbol': insert
warn = 1;
for LGLSXP case or initialize 'warn' to 16


Another way is to follow the caching approach of 'StringFromInteger'.

--
On Friday, 11 April 2025 at 05:05:30 pm GMT+7, Suharto Anggono Suharto Anggono 
 wrote:


On second thought, I wonder if the caching in my changed 'StringFromLogical' in 
my previous message is safe. While 'ans' in the C function 'coerceToString' is 
protected, its element is also protected. If the object corresponding to 'ans' 
is then no longer protected, is it possible for the cached object 'TrueCh' or 
'FalseCh' in 'StringFromLogical' to be garbage collected? If it is, I think of 
clearing the cache for each first filling. For example, by abusing 'warn' 
argument, the following is added to my changed 'StringFromLogical'.


If this is the caching you had in mind:

    > attribute_hidden SEXP StringFromLogical(int x, int *warn)
    > {
    >    static SEXP TrueCh, FalseCh;
    >    if (x == NA_LOGICAL) return NA_STRING;
    >    else if (x) return TrueCh ? TrueCh : (TrueCh = mkChar("TRUE"));
    >    else return FalseCh ? FalseCh : (FalseCh = mkChar("FALSE"));

that is really a protection error. StringFromLogical() should make sure 
that TrueCh and FalseCh stay protected for as long as they are recorded in 
the static fields. R_PreserveObject() would be a natural function for this.
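
To make the failure mode concrete, here is a toy model, assuming a tiny mark-and-sweep collector whose only roots are an explicit preserve list. It is not R's collector and uses none of R's API; ToyObj, toy_alloc(), toy_preserve(), toy_gc() and the two cache functions are hypothetical names. In R itself the registration step would be a call to R_PreserveObject().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_HEAP 8
#define TOY_ROOTS 8

typedef struct { bool in_use; bool marked; const char *text; } ToyObj;

static ToyObj toy_heap[TOY_HEAP];
static ToyObj *toy_roots[TOY_ROOTS];    /* stand-in for the preserved-object list */
static size_t toy_nroots = 0;

static ToyObj *toy_alloc(const char *text)
{
    for (size_t i = 0; i < TOY_HEAP; i++)
        if (!toy_heap[i].in_use) {
            toy_heap[i] = (ToyObj){ true, false, text };
            return &toy_heap[i];
        }
    return NULL;    /* heap full */
}

static void toy_preserve(ToyObj *o)
{
    assert(o != NULL && toy_nroots < TOY_ROOTS);
    toy_roots[toy_nroots++] = o;
}

/* Mark everything on the root list, then free whatever is unmarked. */
static void toy_gc(void)
{
    for (size_t i = 0; i < TOY_HEAP; i++) toy_heap[i].marked = false;
    for (size_t i = 0; i < toy_nroots; i++) toy_roots[i]->marked = true;
    for (size_t i = 0; i < TOY_HEAP; i++)
        if (toy_heap[i].in_use && !toy_heap[i].marked) toy_heap[i].in_use = false;
}

/* Cache in a static variable only: the collector cannot see it. */
static ToyObj *toy_true_unrooted(void)
{
    static ToyObj *cache = NULL;
    return cache ? cache : (cache = toy_alloc("TRUE"));
}

/* Cache in a static variable and register it as a root once. */
static ToyObj *toy_true_rooted(void)
{
    static ToyObj *cache = NULL;
    if (cache == NULL) {
        cache = toy_alloc("TRUE");
        toy_preserve(cache);
    }
    return cache;
}
```

After a toy_gc(), toy_true_unrooted() still hands back its old pointer although the slot has been freed, while the object from toy_true_rooted() stays live.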


Best
Tomas


if (*warn) TrueCh = FalseCh = NULL;

Correspondingly, in 'coerceToString',

warn = i == 0;

is inserted before

SET_STRING_ELT(ans, i, StringFromLogical(LOGICAL_ELT(v, i), &warn));

for LGLSXP case.

-
On Thursday, 10 April 2025 at 10:54:03 pm GMT+7, Martin Maechler wrote:



Suharto Anggono Suharto Anggono via R-devel
     on Thu, 10 Apr 2025 07:53:04 + (UTC) writes:

     > Chain of calls of C functions in coerce.c for as.character() in R:
     >
     > do_asatomic
     > ascommon
     > coerceVector
     > coerceToString
     > StringFromLogical (for each element)
     >
     > The definition of 'StringFromLogical' in coerce.c :
     >
     > attribute_hidden SEXP StringFromLogical(int x, int *warn)
     > {
     >    int w;
     >    formatLogical(&x, 1, &w);
     >    if (x == NA_LOGICAL) return NA_STRING;
     >    else return mkChar(EncodeLogical(x, w));
     > }
     >
     > The definition of 'EncodeLogical' in printutils.c :
     >
     > const char *EncodeLogical(int x, int w)
     > {
     >    static char buff[NB];
     >    if(x == NA_LOGICAL) snprintf(buff, NB, "%*s", min(w, (NB-1)), CHAR(R_print.na_string));
     >    else if(x) snprintf(buff, NB, "%*s", min(w, (NB-1)), "TRUE");
     >    else snprintf(buff, NB, "%*s", min(w, (NB-1)), "FALSE");
     >    buff[NB-1] = '\0';
     >    return buff;
     > }
     >
     > > L <- sample(c(TRUE, FALSE), 10^7, replace = TRUE)
     > > system.time(as.character(L))
     >    user  system elapsed
     >    2.69    0.02    2.73
     > > system.time(c("FALSE", "TRUE")[L+1])
     >    user  system elapsed
     >    0.15    0.04    0.20
     > > system.time(c("FALSE", "TRUE")[L+1L])
     >    user  system elapsed
     >    0.08    0.05    0.13
     > > L <- rep(NA, 10^7)
     > > system.time(as.character(L))
     >    user  system elapsed
     >    0.11    0.00    0.11
     > > system.time(c("FALSE", "TRUE")[L+1])
     >    user  system elapsed
     >    0.16    0.06    0.22
     > > system.time(c("FALSE", "TRUE")[L+1L])
     >    user  system elapsed
     >    0.09    0.03    0.12
     >
     > `as.character` of a logical vector that is all NA is fast enough.
     > It appears that the call to 'formatLogical' inside the C function
     > 'StringFromLogical' does not introduce much slowdown.


     > I found that using a string literal inside the C function 'StringFromLogical', by replacing
     > EncodeLogical(x, w)
     > with
     > x ? "TRUE" : "FALSE"
     > (so that the call to 'formatLogical' is no longer needed), makes it faster.
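
The equivalence of the two paths for a single element can be checked outside R. This is a stand-alone sketch under stated assumptions, not the code in coerce.c or printutils.c: all toy_* names are hypothetical, TOY_NB is an arbitrary buffer size, and the width passed to the old path is taken to be the length of the text itself, which is what a field width computed for one non-NA element amounts to.

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>
#include <string.h>

#define TOY_NA_LOGICAL INT_MIN  /* assumption: mirrors R's NA_LOGICAL */
#define TOY_NB 64               /* hypothetical buffer size, not R's NB */

/* Old path: format into a static buffer, right-justified to width w. */
static const char *toy_encode_logical(int x, int w)
{
    static char buff[TOY_NB];
    assert(x != TOY_NA_LOGICAL);        /* NA is handled by the caller */
    if (w > TOY_NB - 1) w = TOY_NB - 1;
    snprintf(buff, TOY_NB, "%*s", w, x ? "TRUE" : "FALSE");
    return buff;
}

/* For a single element the width equals the length of the text,
   so the padding never adds anything. */
static const char *toy_old_path(int x)
{
    int w = (int) strlen(x ? "TRUE" : "FALSE");
    return toy_encode_logical(x, w);
}

/* Proposed path: return the literal directly, with no formatting call. */
static const char *toy_new_path(int x)
{
    return x ? "TRUE" : "FALSE";
}
```

Both paths yield the same text for TRUE and FALSE; the old one just pays for a width computation and an snprintf() call per element. Padding only shows up when a wider field is requested explicitly.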

indeed! ... and we also notice that the 'w' argument is neither
needed anymore, 

Re: [Rd] Check for protection

2025-04-11 Thread Duncan Murdoch
That might help, but protecting things is a fairly cheap operation, so I 
don't know if people would bother with the naming convention.  It's just 
as easy to just protect things if you're not sure.


One way things can go wrong is when you think you protected something, 
but then the pointer changes and the new pointer is not protected. 
Maybe a linter could recognize that some code path assigned a new value 
to a variable without protecting it?  I guess it's easier to recognize 
that you made an assignment to varName_PROT without protecting it again 
than to look at the PROTECT calls, but it's not really that different.
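
The "pointer changes" case can be shown with a toy protect stack that records pointer values, not variables. This is a stand-alone sketch with hypothetical names (toy_protect_with_index(), toy_reprotect(), toy_on_stack()); in R's C API the corresponding macros are PROTECT_WITH_INDEX() and REPROTECT().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_STACK_MAX 16

static const void *toy_pstack[TOY_STACK_MAX];   /* stand-in for the protect stack */
static size_t toy_ptop = 0;

/* Push a pointer value and return the slot it occupies. */
static size_t toy_protect_with_index(const void *p)
{
    assert(toy_ptop < TOY_STACK_MAX);
    toy_pstack[toy_ptop] = p;
    return toy_ptop++;
}

/* Overwrite an existing slot with a new pointer value. */
static void toy_reprotect(const void *p, size_t idx)
{
    assert(idx < toy_ptop);
    toy_pstack[idx] = p;
}

/* True if this exact pointer value is recorded on the stack. */
static bool toy_on_stack(const void *p)
{
    for (size_t i = 0; i < toy_ptop; i++)
        if (toy_pstack[i] == p) return true;
    return false;
}
```

Assigning a new value to a variable after protecting it leaves the old value on the stack and the new one off it, until the slot is updated with toy_reprotect().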


Duncan Murdoch



On 2025-04-11 11:57 a.m., Paul McQuesten wrote:
For a long-term horizon, would it help R developers to use a naming 
convention?

Perhaps, varName_PROT, or the inverse varName_UNPROT?
Eventually, teach some linter about that?

On Fri, Apr 11, 2025 at 10:40 AM Duncan Murdoch wrote:


On a tangent from the main topic of this thread:  sometimes (especially
to non-experts) it's not obvious whether a variable is protected or not.

I don't think there's any easy way to determine that, but perhaps there
should be.  Would it be possible to add a run-time test you could call
in C code (e.g. is_protected(x)) that would do the same search the
garbage collector does in order to determine if a particular pointer is
protected?

This would be an expensive operation, similar in cost to actually doing
a garbage collection.  You wouldn't want to do it routinely, but it
would be really helpful in debugging.

Duncan Murdoch

On 2025-04-11 6:05 a.m., Suharto Anggono Suharto Anggono via R-devel
wrote:
 >   On second thought, I wonder if the caching in my changed
'StringFromLogical' in my previous message is safe. While 'ans' in
the C function 'coerceToString' is protected, its element is also
protected. If the object corresponding to 'ans' is then no longer
protected, is it possible for the cached object 'TrueCh' or
'FalseCh' in 'StringFromLogical' to be garbage collected? If it is,
I think of clearing the cache for each first filling. For example,
by abusing 'warn' argument, the following is added to my changed
'StringFromLogical'.
 >
 >   if (*warn) TrueCh = FalseCh = NULL;
 >
 > Correspondingly, in 'coerceToString',
 >
 >   warn = i == 0;
 >
 > is inserted before
 >
 >   SET_STRING_ELT(ans, i, StringFromLogical(LOGICAL_ELT(v, i),
&warn));
 >
 > for LGLSXP case.
 >
 > -
 > On Thursday, 10 April 2025 at 10:54:03 pm GMT+7, Martin Maechler wrote:
 >
 >
 >> Suharto Anggono Suharto Anggono via R-devel
 >>      on Thu, 10 Apr 2025 07:53:04 + (UTC) writes:
 >
 >      > Chain of calls of C functions in coerce.c for as.character() in R:
 >      >
 >      > do_asatomic
 >      > ascommon
 >      > coerceVector
 >      > coerceToString
 >      > StringFromLogical (for each element)
 >      >
 >      > The definition of 'StringFromLogical' in coerce.c :
 >      >
 >      > attribute_hidden SEXP StringFromLogical(int x, int *warn)
 >      > {
 >      >    int w;
 >      >    formatLogical(&x, 1, &w);
 >      >    if (x == NA_LOGICAL) return NA_STRING;
 >      >    else return mkChar(EncodeLogical(x, w));
 >      > }
 >      >
 >      > The definition of 'EncodeLogical' in printutils.c :
 >      >
 >      > const char *EncodeLogical(int x, int w)
 >      > {
 >      >    static char buff[NB];
 >      >    if(x == NA_LOGICAL) snprintf(buff, NB, "%*s", min(w, (NB-1)), CHAR(R_print.na_string));
 >      >    else if(x) snprintf(buff, NB, "%*s", min(w, (NB-1)), "TRUE");
 >      >    else snprintf(buff, NB, "%*s", min(w, (NB-1)), "FALSE");
 >      >    buff[NB-1] = '\0';
 >      >    return buff;
 >      > }
 >      >
 >      > > L <- sample(c(TRUE, FALSE), 10^7, replace = TRUE)
 >      > > system.time(as.character(L))
 >      >    user  system elapsed
 >      >    2.69    0.02    2.73
 >      > > system.time(c("FALSE", "TRUE")[L+1])
 >      >    user  system elapsed
 >      >    0.15    0.04    0.20
 >      > > system.time(c("FALSE", "TRUE")[L+1L])
 >      >    user  system elapsed
 >      >    0.08    0.05    0.13
 >      > > L <- rep(NA, 10^7)
 >      > > system.time(as.character(L))
 >      >    user  system elapsed
 >      

Re: [Rd] Check for protection

2025-04-11 Thread Tomas Kalibera

On 4/11/25 17:39, Duncan Murdoch wrote:
On a tangent from the main topic of this thread:  sometimes 
(especially to non-experts) it's not obvious whether a variable is 
protected or not.


I don't think there's any easy way to determine that, but perhaps 
there should be.  Would it be possible to add a run-time test you 
could call in C code (e.g. is_protected(x)) that would do the same 
search the garbage collector does in order to determine if a 
particular pointer is protected?


This would be an expensive operation, similar in cost to actually 
doing a garbage collection.  You wouldn't want to do it routinely, but 
it would be really helpful in debugging.


I've experimented with some things like that in the past and concluded 
they were not that useful.


Learning that a value is not protected at a certain point in the program 
doesn't necessarily mean this is a bug: it depends on whether that value 
will be exposed to a possible garbage collection. It is perfectly fine 
for an unprotected value to be returned from a C function (and this is how 
it should be). It is also fine for an unprotected value to exist before it 
is passed to, say, SET_VECTOR_ELT().


So, right, one might ask whether a specific value would later be exposed to 
a garbage collection while unprotected (leaving it to the tool to decide 
when such a collection would happen). But then, it may be ok, because by 
the time such a garbage collection happens, the value may no longer be 
needed. It only matters if such a value is used afterwards.


And then: a value may be protected by coincidence, by something that is 
not safe to rely on. Such as the example of the caching of a value in a 
global variable: when we ask whether it is protected, it may be that it 
happens to be protected by some inconsequential call on the stack, but 
we should not rely on that.
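
A toy model shows how such a check can answer "yes" by coincidence. This is only a sketch with hypothetical names (ToyNode, toy_protect(), toy_unprotect(), toy_is_protected()), assuming a protect stack as the only set of roots and objects that reference at most one other object; it is not how R's collector is implemented.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_PP_MAX 16

/* A toy object that can hold a reference to one other object. */
typedef struct ToyNode { struct ToyNode *child; } ToyNode;

static ToyNode *toy_pp_stack[TOY_PP_MAX];   /* stand-in for the protect stack */
static size_t toy_pp_top = 0;

static void toy_protect(ToyNode *n)
{
    assert(toy_pp_top < TOY_PP_MAX);
    toy_pp_stack[toy_pp_top++] = n;
}

static void toy_unprotect(size_t k)
{
    assert(k <= toy_pp_top);
    toy_pp_top -= k;
}

/* Hypothetical is_protected(): true if x is reachable from any stack entry.
   Assumes child chains are acyclic. */
static bool toy_is_protected(const ToyNode *x)
{
    for (size_t i = 0; i < toy_pp_top; i++)
        for (const ToyNode *n = toy_pp_stack[i]; n != NULL; n = n->child)
            if (n == x) return true;
    return false;
}
```

If a cached object happens to be referenced from a result that some caller has protected, toy_is_protected() reports it as protected; once that caller unprotects its result, the same query returns false although the cache still holds the pointer.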


We have gc torture with strict barrier checking, which makes it possible 
to detect the use of a value that has in fact been garbage collected. Also, 
one can use strict barrier checking and manually place calls to gc at 
certain points of interest (though the danger is that one places a call 
where a collection actually cannot happen). These runtime solutions can't 
find all possible problems, nor would they tell one what should actually be 
protected where.


And we have rchk, a static analysis tool, which can direct one close to 
where the problems occur and works from the rules for how protection 
should be done. It is faster, but it will produce false alarms.


The rules for how to protect objects in Writing R Extensions should be 
quite clear and easy to follow, and it is certainly fine and appropriate 
to ask for help on this list given a small C example. I think the bigger 
problem is when one knows the rules and tries to follow them, but simply 
forgets or makes a mistake at some point. And for that, we have the 
checking tools mentioned. UBSAN can also sometimes spot some of these 
problems.


Best
Tomas



Duncan Murdoch

On 2025-04-11 6:05 a.m., Suharto Anggono Suharto Anggono via R-devel 
wrote:
  On second thought, I wonder if the caching in my changed 
'StringFromLogical' in my previous message is safe. While 'ans' in 
the C function 'coerceToString' is protected, its element is also 
protected. If the object corresponding to 'ans' is then no longer 
protected, is it possible for the cached object 'TrueCh' or 'FalseCh' 
in 'StringFromLogical' to be garbage collected? If it is, I think of 
clearing the cache for each first filling. For example, by abusing 
'warn' argument, the following is added to my changed 
'StringFromLogical'.


  if (*warn) TrueCh = FalseCh = NULL;

Correspondingly, in 'coerceToString',

  warn = i == 0;

is inserted before

  SET_STRING_ELT(ans, i, StringFromLogical(LOGICAL_ELT(v, i), &warn));

for LGLSXP case.

-
On Thursday, 10 April 2025 at 10:54:03 pm GMT+7, Martin Maechler wrote:




Suharto Anggono Suharto Anggono via R-devel
     on Thu, 10 Apr 2025 07:53:04 + (UTC) writes:


     > Chain of calls of C functions in coerce.c for as.character() in R:
     >
     > do_asatomic
     > ascommon
     > coerceVector
     > coerceToString
     > StringFromLogical (for each element)
     >
     > The definition of 'StringFromLogical' in coerce.c :
     >
     > attribute_hidden SEXP StringFromLogical(int x, int *warn)
     > {
     >    int w;
     >    formatLogical(&x, 1, &w);
     >    if (x == NA_LOGICAL) return NA_STRING;
     >    else return mkChar(EncodeLogical(x, w));
     > }
     >
     > The definition of 'EncodeLogical' in printutils.c :
     >
     > const char *EncodeLogical(int x, int w)
     > {
     >    static char buff[NB];
     >    if(x == NA_LOGICAL) snprintf(buff, NB, "%*s", min(w, 
(NB-1)), CHAR(R_prin