[Rd] R 4.5.0 is released
The build system rolled up R-4.5.0.tar.gz and .xz (codename "How About a Twenty-Six") this morning.

This is a major release with a number of new features, API changes, and bug fixes.

The list below details the changes in this release.

You can get the source code from

https://cran.r-project.org/src/base/R-4/R-4.5.0.tar.gz
https://cran.r-project.org/src/base/R-4/R-4.5.0.tar.xz

or wait for it to be mirrored at a CRAN site nearer to you.

Binaries for various platforms will appear in due course.

For the R Core Team,

Peter Dalgaard

These are the checksums (md5 and SHA-256) for the freshly created files, in case you wish to check that they are uncorrupted:

MD5 (AUTHORS) = 0ba932825aefae5566dc44822916b266
MD5 (build-dist.log) = 31a495e5d716faf011803973f977adbd
MD5 (COPYING) = eb723b61539feef013de476e68b5c50a
MD5 (COPYING.LIB) = a6f89e2100d9b6cdffcea4f398e37343
MD5 (FAQ) = cf1644761934816fb349f15d7956732e
MD5 (INSTALL) = 7893f754308ca31f1ccf62055090ad7b
MD5 (NEWS) = fda3e3633537ffb9c02f1278a7288db9
MD5 (NEWS.0) = bfcd7c147251b5474d96848c6f57e5a8
MD5 (NEWS.1) = f8466e418dec6b958b4ce484a13f9a9d
MD5 (NEWS.2) = 05e4a57b645e651ba13019c3cf5c495f
MD5 (NEWS.3) = 082abfc2fdc36912075e78b92fb2941e
MD5 (R-latest.tar.gz) = 2342b31a604631f8b130033d8582d547
MD5 (R-latest.tar.xz) = d379331fbe3f9bf19d3e53f547317114
MD5 (README) = e8e5ee38544d34409177cd479025fe66
MD5 (RESOURCES) = 5949c86e75c813f8f6ebc420aae46881
MD5 (THANKS) = 61d146aa6a2cf5999295b2fb340991c1
MD5 (VERSION-INFO.dcf) = 5ca3dfa954644258bfd0f83319c0377c
MD5 (R-4/R-4.5.0.tar.gz) = 2342b31a604631f8b130033d8582d547
MD5 (R-4/R-4.5.0.tar.xz) = d379331fbe3f9bf19d3e53f547317114

4cc9dcdfa46a2e2cff45c27df8f3a9f851ec97b44b8647ab8a9fbf844f37937f  AUTHORS
8f85b62440f992f6dd8e05e5bda84995b1a8d42c7d4d2a05927c6ae6e2fbad59  build-dist.log
e6d6a009505e345fe949e1310334fcb0747f28dae2856759de102ab66b722cb4  COPYING
6095e9ffa777dd22839f7801aa845b31c9ed07f3d6bf8a26dc5d2dec8ccc0ef3  COPYING.LIB
ec1eb421f6810ffb53162b9dfb371190de30ab490855ddfa49fc0bf39c7f11cf  FAQ
f87461be6cbaecc4dce44ac58e5bd52364b0491ccdadaf846cb9b452e9550f31  INSTALL
d03a80d9ab25ce50e0ec7923385729d9dfb7d4fab1f33041e62c143cbff5a4f9  NEWS
4e21b62f515b749f80997063fceab626d7258c7d650e81a662ba8e0640f12f62  NEWS.0
602f3a40ef759c7b2a6c485a33dc674af34249644ac5fb53b21283d4e12e808d  NEWS.1
7babb6d82a4479b2c3803f7dbfaab63125b0f0d1b6bb40b1389d3af65eaf83aa  NEWS.2
eb473efd365822e7ae64eb0f86028ea019815fdd273fe7daa9c6fe5e28fd2737  NEWS.3
3b33ea113e0d1ddc9793874d5949cec2c7386f66e4abfb1cef9aec22846c3ce1  R-latest.tar.gz
101766c3aefffcbacde39c8a0b9c3accf50a563955f66ff2f7b321d6bf07da8d  R-latest.tar.xz
f5aa875c23185cbfc3a50739d7295b0caba2cf0e38ba082850be338cc9541154  README
5e7ddf7349ada12c8142c42bac955835efd1768978cb476b61a3b53255442b24  RESOURCES
1d5064c86b6813865a033763f43212064c0a67ef05f5af13b13c4feb08264a33  THANKS
d3ced974014dc3da6ef3cf126d67e427172e9f3f77f801483f91acc881e2de38  VERSION-INFO.dcf
3b33ea113e0d1ddc9793874d5949cec2c7386f66e4abfb1cef9aec22846c3ce1  R-4/R-4.5.0.tar.gz
101766c3aefffcbacde39c8a0b9c3accf50a563955f66ff2f7b321d6bf07da8d  R-4/R-4.5.0.tar.xz

This is the relevant part of the NEWS file

CHANGES IN R 4.5.0:

  NEW FEATURES:

    • as.integer(rl) and hence as.raw(rl) now work for a list of raw(1) elements, as proposed by Michael Chirico's PR#18696.

    • graphics' grid() gains optional argument nintLog.

    • New functions check_package_urls() and check_package_dois() in package tools for checking URLs and DOIs in package sources.

    • New head() and tail() methods for class "ts" time series, proposed by Spencer Graves on R-devel.

    • New qr.influence() function, a “bare bones” interface to the lm.influence() leave-one-out diagnostics computations; wished for in PR#18739.

    • Package citation() results auto-generated from the package metadata now also provide package DOIs for CRAN and Bioconductor packages.

    • New function grepv() identical to grep() except for the default value = TRUE.

    • methods(:::) now does report methods when neither the generic nor the methods have been exported.

    • pdf() gains an author argument to set the corresponding metadata field, and logical arguments timestamp and producer to optionally omit the respective metadata. (Thanks to Edzer Pebesma.)

    • grDevices::glyphInfo() gains a rot argument to allow per-glyph rotation. (Thanks to Daniel Sabanes Bove.)

    • Package tools now exports functions CRAN_current_db(), CRAN_aliases_db(), CRAN_rdxrefs_db(), CRAN_archive_db(), and CRAN_authors_db().

    • Package tools now exports functions R() and parse_URI_reference().

    • Package tools now exports functions base_aliases_db() and base_rdxrefs_db().

    • It is now possible to set the background color for row and column names in the data editor on Windows (Rgui).

    • Rterm on Windows now accepts input lines of unlimited length.

    • file
Re: [Rd] table() and as.character() performance for logical values
On second thought, I wonder if the caching in my changed 'StringFromLogical' in my previous message is safe. While 'ans' in the C function 'coerceToString' is protected, its element is also protected. If the object corresponding to 'ans' is then no longer protected, is it possible for the cached object 'TrueCh' or 'FalseCh' in 'StringFromLogical' to be garbage collected?

If it is, I think of clearing the cache for each first filling. For example, by abusing 'warn' argument, the following is added to my changed 'StringFromLogical'.

    if (*warn) TrueCh = FalseCh = NULL;

Correspondingly, in 'coerceToString',

    warn = i == 0;

is inserted before

    SET_STRING_ELT(ans, i, StringFromLogical(LOGICAL_ELT(v, i), &warn));

for LGLSXP case.

On Thursday, 10 April 2025 at 10:54:03 pm GMT+7, Martin Maechler wrote:

> Suharto Anggono Suharto Anggono via R-devel
>     on Thu, 10 Apr 2025 07:53:04 +0000 (UTC) writes:

> Chain of calls of C functions in coerce.c for as.character() in R:
>
> do_asatomic
> ascommon
> coerceVector
> coerceToString
> StringFromLogical (for each element)
>
> The definition of 'StringFromLogical' in coerce.c :
>
> attribute_hidden SEXP StringFromLogical(int x, int *warn)
> {
>     int w;
>     formatLogical(&x, 1, &w);
>     if (x == NA_LOGICAL) return NA_STRING;
>     else return mkChar(EncodeLogical(x, w));
> }
>
> The definition of 'EncodeLogical' in printutils.c :
>
> const char *EncodeLogical(int x, int w)
> {
>     static char buff[NB];
>     if(x == NA_LOGICAL) snprintf(buff, NB, "%*s", min(w, (NB-1)), CHAR(R_print.na_string));
>     else if(x) snprintf(buff, NB, "%*s", min(w, (NB-1)), "TRUE");
>     else snprintf(buff, NB, "%*s", min(w, (NB-1)), "FALSE");
>     buff[NB-1] = '\0';
>     return buff;
> }
>
> > L <- sample(c(TRUE, FALSE), 10^7, replace = TRUE)
> > system.time(as.character(L))
>    user  system elapsed
>    2.69    0.02    2.73
> > system.time(c("FALSE", "TRUE")[L+1])
>    user  system elapsed
>    0.15    0.04    0.20
> > system.time(c("FALSE", "TRUE")[L+1L])
>    user  system elapsed
>    0.08    0.05    0.13
> > L <- rep(NA, 10^7)
> > system.time(as.character(L))
>    user  system elapsed
>    0.11    0.00    0.11
> > system.time(c("FALSE", "TRUE")[L+1])
>    user  system elapsed
>    0.16    0.06    0.22
> > system.time(c("FALSE", "TRUE")[L+1L])
>    user  system elapsed
>    0.09    0.03    0.12
>
> `as.character` of a logical vector that is all NA is fast enough.
> It appears that the call to 'formatLogical' inside the C function
> 'StringFromLogical' does not introduce much slowdown.
>
> I found that using string literal inside the C function 'StringFromLogical', by replacing
>     EncodeLogical(x, w)
> with
>     x ? "TRUE" : "FALSE"
> (and the call to 'formatLogical' is not needed anymore), makes it faster.

indeed! ... and we also notice that the 'w' argument is not needed anymore either, and that makes sense: At this point when you know you have an R logical value there are only three possibilities and no reason ever to warn about the conversion.

> Alternatively,

or in addition !

> "fast path" could be introduced in 'EncodeLogical', potentially also benefiting format() in R.
> For example, without replacing existing code, the following fragment could be inserted.
>
>     if(x == NA_LOGICAL) {if(w == R_print.na_width) return CHAR(R_print.na_string);}
>     else if(x) {if(w == 4) return "TRUE";}
>     else {if(w == 5) return "FALSE";}
>
> However, with either of them, c("FALSE", "TRUE")[L+1L] is still faster than as.character(L) .
>
> Precomputing or caching possible results of the C function 'StringFromLogical' allows as.character(L) to be as fast as c("FALSE", "TRUE")[L+1L] in R. For example, 'StringFromLogical' could be changed to
>
> attribute_hidden SEXP StringFromLogical(int x, int *warn)
> {
>     static SEXP TrueCh, FalseCh;
>     if (x == NA_LOGICAL) return NA_STRING;
>     else if (x) return TrueCh ? TrueCh : (TrueCh = mkChar("TRUE"));
>     else return FalseCh ? FalseCh : (FalseCh = mkChar("FALSE"));
> }

Indeed, and something along this line (storing the other two constant strings) was also my thought when seeing the

    mkChar(x ? "TRUE" : "FALSE")

you implicitly proposed above.

I'm looking into applying both speedups; thank you very much, Suharto!

Martin

--
Martin Maechler
ETH Zurich and R Core team
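To make the 'EncodeLogical' fast-path suggestion quoted above concrete, here is a minimal standalone sketch in C. It is a model and not R's source: encode_logical_model, NA_LOGICAL_MODEL, na_string and na_width are names made up here as stand-ins for 'EncodeLogical', 'NA_LOGICAL', 'R_print.na_string' and 'R_print.na_width', and the buffer size NB is an assumed value. The idea is that when the requested width already equals the natural width of the result, no padding is needed and a constant string can be returned without going through snprintf().

```c
#include <limits.h>
#include <stdio.h>

#define NB 1000                   /* assumed buffer size, not taken from R */
#define NA_LOGICAL_MODEL INT_MIN  /* stand-in for R's NA_LOGICAL */

static const char *na_string = "NA";  /* stand-in for R_print.na_string */
static const int na_width = 2;        /* stand-in for R_print.na_width */

static int min_int(int a, int b) { return a < b ? a : b; }

/* Model of EncodeLogical with the proposed fast path placed in front of
   the existing formatting code. */
const char *encode_logical_model(int x, int w)
{
    static char buff[NB];

    /* Fast path: the width matches the natural width, so no padding. */
    if (x == NA_LOGICAL_MODEL) { if (w == na_width) return na_string; }
    else if (x) { if (w == 4) return "TRUE"; }
    else { if (w == 5) return "FALSE"; }

    /* Slow path: right-justify in a field of width w, as before. */
    const char *s = (x == NA_LOGICAL_MODEL) ? na_string
                                            : (x ? "TRUE" : "FALSE");
    snprintf(buff, NB, "%*s", min_int(w, NB - 1), s);
    buff[NB - 1] = '\0';
    return buff;
}
```

In this model, encode_logical_model(1, 4) returns the literal "TRUE" directly, while encode_logical_model(1, 6) still goes through the buffer and yields the padded string. Whether the same gain carries over to R itself depends on how often callers pass the natural width.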
Re: [Rd] table() and as.character() performance for logical values
Oh, with the abuse of 'warn' in my previous message, warning would be issued if the input 'v' of 'coerceToString' is a logical vector of length 1.

Revision:

Added to my changed 'StringFromLogical':

    if (*warn) {TrueCh = FalseCh = NULL; *warn = 0;}

'coerceToString': insert

    if (i == 0) warn = 1;

for LGLSXP case, or initialize 'warn' to 16

'coerceToSymbol': insert

    warn = 1;

for LGLSXP case, or initialize 'warn' to 16

Another way is following the approach of caching in 'StringFromInteger'.
Re: [Rd] table() and as.character() performance for logical values
Alternative revision:

Added to my changed 'StringFromLogical':

    #define CACHE 16
    if (!(*warn & CACHE)) {TrueCh = FalseCh = NULL; *warn |= CACHE;}

No change to 'coerceToString' and 'coerceToSymbol'.
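The flag-bit idea in the "Alternative revision" message above ('#define CACHE 16') can be tried out in isolation. The following is a minimal standalone sketch, not code from coerce.c: string_from_logical_model, true_ch, false_ch, cache_resets and has_real_warning are hypothetical names, plain C string literals stand in for CHARSXP objects, and the NA case is left out. It assumes, as the message appears to, that bit 16 is not used by any genuine coercion warning.

```c
#include <stddef.h>

#define CACHE 16  /* flag bit from the message; assumed unused by real warnings */

static const char *true_ch, *false_ch;  /* stand-ins for TrueCh / FalseCh */
static int cache_resets;                /* hypothetical counter, for illustration */

/* Model of the revised StringFromLogical: the first call made with a fresh
   'warn' word (CACHE bit clear) empties the cache and sets the bit, so the
   reset happens once per caller-owned 'warn' variable. */
const char *string_from_logical_model(int x, int *warn)
{
    if (!(*warn & CACHE)) {
        true_ch = false_ch = NULL;
        *warn |= CACHE;
        cache_resets++;
    }
    if (x) return true_ch ? true_ch : (true_ch = "TRUE");
    return false_ch ? false_ch : (false_ch = "FALSE");
}

/* The caller is assumed to look only at the bits other than CACHE. */
int has_real_warning(int warn) { return (warn & ~CACHE) != 0; }
```

With a caller that starts from warn = 0 and converts a whole vector, the cache is reset once on the first element and reused afterwards. This only models the bookkeeping; it says nothing about whether the cached objects are safe from garbage collection between the reset and their last use.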
[Rd] Check for protection (was: table() and as.character() performance for logical values)
On a tangent from the main topic of this thread: sometimes (especially to non-experts) it's not obvious whether a variable is protected or not.

I don't think there's any easy way to determine that, but perhaps there should be. Would it be possible to add a run-time test you could call in C code (e.g. is_protected(x)) that would do the same search the garbage collector does in order to determine if a particular pointer is protected?

This would be an expensive operation, similar in cost to actually doing a garbage collection. You wouldn't want to do it routinely, but it would be really helpful in debugging.

Duncan Murdoch
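As an illustration of the is_protected(x) idea raised in the message above, here is a toy model in standalone C. Nothing in it is R's garbage collector or R's API: Obj, new_obj, protect, unprotect and this is_protected are all hypothetical, the heap is a small fixed array, and the only roots are an explicit protect stack (R has further roots that are not modelled). It only shows the shape of the check: run the mark phase from the roots and see whether the pointer in question got marked.

```c
#include <stdbool.h>
#include <stddef.h>

#define HEAP_SIZE 32
#define MAX_ROOTS 16

/* A toy heap object with up to two references to other objects. */
typedef struct Obj {
    struct Obj *kids[2];
    bool mark;
} Obj;

static Obj heap[HEAP_SIZE];
static int n_alloc;
static Obj *roots[MAX_ROOTS];  /* toy protect stack */
static int n_roots;

Obj *new_obj(Obj *a, Obj *b)
{
    if (n_alloc == HEAP_SIZE) return NULL;
    Obj *o = &heap[n_alloc++];
    o->kids[0] = a;
    o->kids[1] = b;
    o->mark = false;
    return o;
}

void protect(Obj *o) { if (n_roots < MAX_ROOTS) roots[n_roots++] = o; }
void unprotect(int n) { n_roots = (n > n_roots) ? 0 : n_roots - n; }

/* Mark everything reachable from o; the mark bit also stops cycles. */
static void mark_from(Obj *o)
{
    if (o == NULL || o->mark) return;
    o->mark = true;
    mark_from(o->kids[0]);
    mark_from(o->kids[1]);
}

/* Same search a mark phase would do: about as costly as a collection. */
bool is_protected(const Obj *x)
{
    for (int i = 0; i < n_alloc; i++) heap[i].mark = false;
    for (int i = 0; i < n_roots; i++) mark_from(roots[i]);
    return x != NULL && x->mark;
}
```

In this model an element stored inside a protected container is reported as protected, and stops being so as soon as the container is unprotected, which is the situation asked about for 'ans' and its elements earlier in the thread.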
Re: [Rd] Check for protection (was: table() and as.character() performance for logical values)
For a long-term horizon, would it help R developers to use a naming convention? Perhaps, varName_PROT, or the inverse varName_UNPROT? Eventually, teach some linter about that?
Re: [Rd] table() and as.character() performance for logical values
On 4/11/25 16:23, Suharto Anggono Suharto Anggono via R-devel wrote:
> On Friday, 11 April 2025 at 05:05:30 pm GMT+7, Suharto Anggono Suharto Anggono wrote:
>> On second thought, I wonder if the caching in my changed 'StringFromLogical' in my previous message is safe. While 'ans' in the C function 'coerceToString' is protected, its element is also protected. If the object corresponding to 'ans' is then no longer protected, is it possible for the cached object 'TrueCh' or 'FalseCh' in 'StringFromLogical' to be garbage collected?

If this is the caching you had in mind:

> attribute_hidden SEXP StringFromLogical(int x, int *warn)
> {
>     static SEXP TrueCh, FalseCh;
>     if (x == NA_LOGICAL) return NA_STRING;
>     else if (x) return TrueCh ? TrueCh : (TrueCh = mkChar("TRUE"));
>     else return FalseCh ? FalseCh : (FalseCh = mkChar("FALSE"));
> }

that is really a protection error. StringFromLogical() should make sure that TrueCh, FalseCh will be protected as long as recorded in the static field. PreserveObject() would be a natural function for this.

Best
Tomas
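The point made above, that an object recorded in a static field must itself be kept alive, can be sketched with a toy model in standalone C. This is not R code: Str, mk_char_model, preserve_model, collect_model and string_from_logical_model are hypothetical stand-ins, the "collector" simply kills every object that was not preserved, and the NA case is omitted. In R's C API the corresponding call would be R_PreserveObject() from Rinternals.h; how it should be wired into coerce.c is left open here.

```c
#include <stdbool.h>
#include <stddef.h>

#define POOL_SIZE 8

/* Toy string object: 'live' is cleared when the toy collector frees it. */
typedef struct Str {
    const char *text;
    bool live;
    bool preserved;
} Str;

static Str pool[POOL_SIZE];
static int n_pool;

/* Stand-in for mkChar(): allocate a new, unpreserved string object. */
Str *mk_char_model(const char *text)
{
    if (n_pool == POOL_SIZE) return NULL;
    Str *s = &pool[n_pool++];
    s->text = text;
    s->live = true;
    s->preserved = false;
    return s;
}

/* Stand-in for preserving an object for as long as the cache holds it. */
void preserve_model(Str *s) { if (s != NULL) s->preserved = true; }

/* Toy collection: with no other roots, only preserved objects survive. */
void collect_model(void)
{
    for (int i = 0; i < n_pool; i++)
        if (!pool[i].preserved) pool[i].live = false;
}

/* Cached conversion: preserve each result once, when the slot is filled. */
Str *string_from_logical_model(int x)
{
    static Str *true_ch, *false_ch;
    Str **slot = x ? &true_ch : &false_ch;
    if (*slot == NULL) {
        *slot = mk_char_model(x ? "TRUE" : "FALSE");
        preserve_model(*slot);
    }
    return *slot;
}
```

Because each cached object is preserved exactly once, when its slot is first filled, later calls return the same pointer and a collection in between cannot invalidate it. Without the preserve_model() call the static pointer would be left dangling after collect_model() in this model.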
Re: [Rd] Check for protection
That might help, but protecting things is a fairly cheap operation, so I don't know if people would bother with the naming convention. It's just as easy to protect things if you're not sure.

One way things can go wrong is when you think you protected something, but then the pointer changes and the new pointer is not protected. Maybe a linter could recognize that some code path assigned a new value to a variable without protecting it? I guess it's easier to recognize that you made an assignment to varName_PROT without protecting it again than to look at the PROTECT calls, but it's not really that different.

Duncan Murdoch

On 2025-04-11 11:57 a.m., Paul McQuesten wrote:
> For a long-term horizon, would it help R developers to use a naming convention? Perhaps, varName_PROT, or the inverse varName_UNPROT? Eventually, teach some linter about that?
>
> On Fri, Apr 11, 2025 at 10:40 AM Duncan Murdoch <murdoch.dun...@gmail.com> wrote:
> > On a tangent from the main topic of this thread: sometimes (especially to non-experts) it's not obvious whether a variable is protected or not. I don't think there's any easy way to determine that, but perhaps there should be.
> >
> > Would it be possible to add a run-time test you could call in C code (e.g. is_protected(x)) that would do the same search the garbage collector does in order to determine if a particular pointer is protected?
> >
> > This would be an expensive operation, similar in cost to actually doing a garbage collection. You wouldn't want to do it routinely, but it would be really helpful in debugging.
> >
> > Duncan Murdoch
> >
> > On 2025-04-11 6:05 a.m., Suharto Anggono Suharto Anggono via R-devel wrote:
> > > On second thought, I wonder if the caching in my changed 'StringFromLogical' in my previous message is safe. While 'ans' in the C function 'coerceToString' is protected, its element is also protected. If the object corresponding to 'ans' is then no longer protected, is it possible for the cached object 'TrueCh' or 'FalseCh' in 'StringFromLogical' to be garbage collected? If it is, I think of clearing the cache for each first filling. For example, by abusing 'warn' argument, the following is added to my changed 'StringFromLogical'.
> > >
> > >     if (*warn) TrueCh = FalseCh = NULL;
> > >
> > > Correspondingly, in 'coerceToString',
> > >
> > >     warn = i == 0;
> > >
> > > is inserted before
> > >
> > >     SET_STRING_ELT(ans, i, StringFromLogical(LOGICAL_ELT(v, i), &warn));
> > >
> > > for LGLSXP case.
> > >
> > > On Thursday, 10 April 2025 at 10:54:03 pm GMT+7, Martin Maechler <maech...@stat.math.ethz.ch> wrote:
> > > > Suharto Anggono Suharto Anggono via R-devel on Thu, 10 Apr 2025 07:53:04 +0000 (UTC) writes:
> > > > > Chain of calls of C functions in coerce.c for as.character() in R:
> > > > >
> > > > >     do_asatomic
> > > > >     ascommon
> > > > >     coerceVector
> > > > >     coerceToString
> > > > >     StringFromLogical (for each element)
> > > > >
> > > > > The definition of 'StringFromLogical' in coerce.c :
> > > > >
> > > > >     attribute_hidden SEXP StringFromLogical(int x, int *warn)
> > > > >     {
> > > > >         int w;
> > > > >         formatLogical(&x, 1, &w);
> > > > >         if (x == NA_LOGICAL) return NA_STRING;
> > > > >         else return mkChar(EncodeLogical(x, w));
> > > > >     }
> > > > >
> > > > > The definition of 'EncodeLogical' in printutils.c :
> > > > >
> > > > >     const char *EncodeLogical(int x, int w)
> > > > >     {
> > > > >         static char buff[NB];
> > > > >         if(x == NA_LOGICAL) snprintf(buff, NB, "%*s", min(w, (NB-1)), CHAR(R_print.na_string));
> > > > >         else if(x) snprintf(buff, NB, "%*s", min(w, (NB-1)), "TRUE");
> > > > >         else snprintf(buff, NB, "%*s", min(w, (NB-1)), "FALSE");
> > > > >         buff[NB-1] = '\0';
> > > > >         return buff;
> > > > >     }
> > > > >
> > > > >     > L <- sample(c(TRUE, FALSE), 10^7, replace = TRUE)
> > > > >     > system.time(as.character(L))
> > > > >        user  system elapsed
> > > > >        2.69    0.02    2.73
> > > > >     > system.time(c("FALSE", "TRUE")[L+1])
> > > > >        user  system elapsed
> > > > >        0.15    0.04    0.20
> > > > >     > system.time(c("FALSE", "TRUE")[L+1L])
> > > > >        user  system elapsed
> > > > >        0.08    0.05    0.13
> > > > >     > L <- rep(NA, 10^7)
> > > > >     > system.time(as.character(L))
> > > > >        user  system elapsed
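The pointer-reassignment pitfall and the is_protected(x) idea raised in this message can be illustrated with a toy model. This is only a sketch under loud assumptions: toy_protect, toy_unprotect, toy_is_protected and toy_reassign_demo are hypothetical names, not part of R's C API, and the model checks only a small stack of raw pointers, whereas a real collector also treats everything reachable from protected objects as live.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: a fixed-size stack of protected pointers. */
#define TOY_STACK_MAX 64

static const void *toy_stack[TOY_STACK_MAX];
static size_t toy_top = 0;

/* Push a pointer; returns 0 on success, -1 if the stack is full. */
int toy_protect(const void *p)
{
    if (toy_top >= TOY_STACK_MAX) return -1;
    toy_stack[toy_top++] = p;
    return 0;
}

/* Pop the n most recent entries (clamped to the stack depth). */
void toy_unprotect(size_t n)
{
    toy_top = (n > toy_top) ? 0 : toy_top - n;
}

/* Linear search of the stack, standing in for the far more expensive
   walk a real collector would have to do. */
int toy_is_protected(const void *p)
{
    for (size_t i = 0; i < toy_top; i++)
        if (toy_stack[i] == p) return 1;
    return 0;
}

/* The pitfall: the variable is reassigned, but only its old value is on
   the stack. Returns whether the new value is protected (it is not). */
int toy_reassign_demo(void)
{
    int old_obj = 1, new_obj = 2;
    const void *x = &old_obj;
    int rc = toy_protect(x);
    assert(rc == 0);
    (void) rc;
    x = &new_obj;                      /* protection does not follow x */
    int still_protected = toy_is_protected(x);
    toy_unprotect(1);
    return still_protected;
}
```

In this model toy_reassign_demo() reports that the reassigned pointer is unprotected, which is the situation a linter or a run-time check would be trying to flag. The toy protects addresses rather than variables, which is also why the old value stays on the stack after the assignment.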
Re: [Rd] Check for protection
On 4/11/25 17:39, Duncan Murdoch wrote:
> On a tangent from the main topic of this thread: sometimes (especially to non-experts) it's not obvious whether a variable is protected or not. I don't think there's any easy way to determine that, but perhaps there should be.
>
> Would it be possible to add a run-time test you could call in C code (e.g. is_protected(x)) that would do the same search the garbage collector does in order to determine if a particular pointer is protected?
>
> This would be an expensive operation, similar in cost to actually doing a garbage collection. You wouldn't want to do it routinely, but it would be really helpful in debugging.

I've experimented with some things like that in the past and concluded they were not that useful. Learning that a value is not protected at a certain point in the program doesn't necessarily mean this is a bug: it depends on whether that value will be exposed to a possible garbage collection. It is perfectly fine that an unprotected value is returned from a C function (and this is how it should be). It is fine when an unprotected value exists before it is passed to, say, SET_VECTOR_ELT().

So, right, one might ask if a specific value would later be exposed to a garbage collection unprotected (leaving it to the tool to decide when such a collection would happen). But then, it may be ok, because when such a garbage collection happens, it would be clear the value cannot be used anymore. It only matters if such a value is then being used.

And then: a value may be protected by coincidence, by something that is not safe to rely on. Take the example of caching a value in a global variable: when we ask whether it is protected, it may be that it happens to be protected by some inconsequential call on the stack, but we should not rely on that.

We have gc torture with the strict barrier checking, which allows one to detect the use of a value that has in fact been garbage collected. Also, one can use the strict barrier checking and manually place calls to gc at certain points of interest (though the danger is that one places it where it actually cannot happen). These runtime solutions can't find all possible problems, nor would they tell one what should actually be protected where.

And we have rchk, a static analysis tool, which can direct one close to where the problems occur, and works based on the rules for how protection should be done. It is faster, but it will have false alarms.

The rules for how to protect objects in Writing R Extensions should be quite clear and easy to follow, and certainly it is fine and appropriate to ask for help on this list given a small C example. I think the bigger problem is when one knows the rules, tries to follow them, but simply forgets or makes a mistake at some point. And for that, we have the checking tools mentioned. UBSAN can also sometimes spot some of these problems.

Best
Tomas

> Duncan Murdoch
>
> On 2025-04-11 6:05 a.m., Suharto Anggono Suharto Anggono via R-devel wrote:
> > On second thought, I wonder if the caching in my changed 'StringFromLogical' in my previous message is safe. While 'ans' in the C function 'coerceToString' is protected, its element is also protected. If the object corresponding to 'ans' is then no longer protected, is it possible for the cached object 'TrueCh' or 'FalseCh' in 'StringFromLogical' to be garbage collected? If it is, I think of clearing the cache for each first filling. For example, by abusing 'warn' argument, the following is added to my changed 'StringFromLogical'.
> >
> >     if (*warn) TrueCh = FalseCh = NULL;
> >
> > Correspondingly, in 'coerceToString',
> >
> >     warn = i == 0;
> >
> > is inserted before
> >
> >     SET_STRING_ELT(ans, i, StringFromLogical(LOGICAL_ELT(v, i), &warn));
> >
> > for LGLSXP case.
> >
> > On Thursday, 10 April 2025 at 10:54:03 pm GMT+7, Martin Maechler wrote:
> > > Suharto Anggono Suharto Anggono via R-devel on Thu, 10 Apr 2025 07:53:04 +0000 (UTC) writes:
> > > > Chain of calls of C functions in coerce.c for as.character() in R:
> > > >
> > > >     do_asatomic
> > > >     ascommon
> > > >     coerceVector
> > > >     coerceToString
> > > >     StringFromLogical (for each element)
> > > >
> > > > The definition of 'StringFromLogical' in coerce.c :
> > > >
> > > >     attribute_hidden SEXP StringFromLogical(int x, int *warn)
> > > >     {
> > > >         int w;
> > > >         formatLogical(&x, 1, &w);
> > > >         if (x == NA_LOGICAL) return NA_STRING;
> > > >         else return mkChar(EncodeLogical(x, w));
> > > >     }
> > > >
> > > > The definition of 'EncodeLogical' in printutils.c :
> > > >
> > > >     const char *EncodeLogical(int x, int w)
> > > >     {
> > > >         static char buff[NB];
> > > >         if(x == NA_LOGICAL) snprintf(buff, NB, "%*s", min(w, (NB-1)), CHAR(R_prin