from:"Ivan Krylov"

Re: [Rd] [R] Open a file which name contains a tilde

2019-06-06 Thread Ivan Krylov

On Wed, 5 Jun 2019 18:07:15 +0200
Frank Schwidom  wrote:

> +> path.expand("a ~ b")  
> [1] "a /home/user b"

> How can I switch off any file crippling activity?

It doesn't seem to be possible if readline is enabled and works
correctly.

Calls to path.expand [1] end up [2] in R_ExpandFileName [3], which
calls R_ExpandFileName_readline [4], which uses libreadline function
tilde_expand [5]. tilde_expand seems to be designed to expand '~'
anywhere in the string it is handed, i.e. operate on whole command
lines, not file paths.

I am taking the liberty of Cc-ing R-devel in case this can be
considered a bug.

-- 
Best regards,
Ivan

[1]
https://github.com/wch/r-source/blob/12d1d2d232d84aa355e48b81180a0e2c6f2f/src/main/names.c#L807

[2]
https://github.com/wch/r-source/blob/12d1d2d232d84aa355e48b81180a0e2c6f2f/src/main/platform.c#L1915

[3]
https://github.com/wch/r-source/blob/12d1d2d232d84aa355e48b81180a0e2c6f2f/src/unix/sys-unix.c#L147

[4]
https://github.com/wch/r-source/blob/12d1d2d232d84aa355e48b81180a0e2c6f2f/src/unix/sys-std.c#L494

[5]
https://git.savannah.gnu.org/cgit/readline.git/tree/tilde.c?h=devel#n187

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

[Rd] tempdir() containing spaces breaks installing source packages

2019-12-13 Thread Ivan Krylov

Hello everyone!

Temp paths are used in system2() calls without shQuote() because
they are assumed not to contain spaces. On Windows,
GetShortPathName() is used to try to ensure that.

Unfortunately, sometimes GetShortPathName() silently fails to return a
8.3 file path and gives a full path instead (8.3 file names might be
disabled on newer Windows 10 installations, or there may be another
reason). This has been spotted in the wild [*]. When %USERPROFILE%
contains spaces, this results in tempdir() also containing spaces and
prevents the user from being able to install source packages.

As of ,

 - src/library/utils/R/packages2.R line 839 contains an unquoted
   temporary file path (fil) passed to system2(), which results in it
   being split and R CMD INSTALL not being able to find the package
   file. In other invocations of R CMD INSTALL in the same file, the
   path is properly quoted.

 - src/library/tools/R/check.R line 125 contains an unquoted temporary
   file path passed to system2, which results in Rterm.exe -f not being
   able to find the RtmpXX\Rin file, causing the attempt to
   run tools:::makeLazyLoading(...) to fail.

I can report these two problems (thanks to Martin Maechler for the
Bugzilla account and the advice) and attach the patches required to fix
them, but there might be more. The bug report [**] is somewhat relevant
here (though changing the default behaviour of system2() is obviously
not the right solution as it would break existing code).

Is there anything I should consider before creating the PR as
described above?

-- 
Best regards,
Ivan

[*] https://stat.ethz.ch/pipermail/r-help/2019-December/465075.html

[**] https://bugs.r-project.org/bugzilla/show_bug.cgi?id=16127

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

[Rd] system2 doesn't quote stdin on unix, unlike stdout, stderr & input and on Windows

2019-12-15 Thread Ivan Krylov

Hi again!

While investigating the bug report [*] I found out that on unix, system2
does not quote its `stdin` argument while preparing the command line to launch.

It does shQuote the `stdout` and `stderr` arguments, and also the `f <-
tmpfile()` variable (which is used if `input` argument is provided),
which seems to set a precedent. On Windows, stdin, stdout, and stderr are 
handled independently of the shell, so it also just works without the use of 
shQuote by the caller.

Have people been relying on system2 not quoting the `stdin` argument, but
quoting `stdout` and `stderr`? For what it's worth, neither R_runR in
src/library/tools/R/check.R, nor .system_with_capture in
src/library/tools/R/utils.R (the only callers of system2(..., stdin = ...)),
nor their callers seem to be shQuote'ing the `stdin` argument.

Nor the rare system2(..., stdin = ...) callers (or their callers) on CRAN
seem to be quoting the `stdin` argument (I did find one exception [**]), it 
usually being "" or tmpfile() passed across a few function calls.

Given the considerations above, would the following patch be a good idea?

Index: src/library/base/R/unix/system.unix.R
===
--- src/library/base/R/unix/system.unix.R   (revision 77566)
+++ src/library/base/R/unix/system.unix.R   (working copy)
@@ -102,7 +102,7 @@
 writeLines(input, f)
 ## here 'command' is a single command, unlike system()
 command <- paste(command, "<", shQuote(f))
-} else if (nzchar(stdin)) command <- paste(command, "<", stdin)
+} else if (nzchar(stdin)) command <- paste(command, "<", shQuote(stdin))
 if(!wait && !intern) command <- paste(command, "&")
 .Internal(system(command, intern, timeout))
 }

-- 
Best regards,
Ivan

[*] https://bugs.r-project.org/bugzilla/show_bug.cgi?id=17673

[**] https://github.com/search?q=org%3Acran+system2%28+stdin&type=Code
The search is probably computationally expensive and thus might only be 
available to GitHub account holders. The results are:

 - BatchJobs: R/WorkerLinux.R; stdin is "" by default, filenames generated by
   cfBrewTemplate aren't quoted
 - batchtools: callers of runOSCommand actually quote the stdin argument!
   
https://github.com/mllg/batchtools/commit/4a5818d70c82c8842c0b8bded224dbb423b79f33
 - nat: R/cmtk.R, R/xformpoints.R; stdin is tempfile(...)
 - credentials: R/credential-api.R; stdin is passed as-is, tempfile() or
   "" in default arguments
 - runjags: R/setup.jags.jagsfile.R; stdin is constant "script.cmd"
 - m2r: R/m2.R; stdin is constant ""
 - scriptexec: R/scriptexec.R; stdin is passed as-is or default value retained
 - BioInstaller: inst/extdata/shiny/global_var.R; stdin is "" by default and
   passed as-is, callers of the sql2sqlite function don't quote the sql.file
   argument
 - annovarR: R/build.R: same sql2sqlite function as above

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] source definition for R_NilValue/return from TYPEOF(R_NilValue)

2019-12-16 Thread Ivan Krylov

On Sun, 15 Dec 2019 10:47:11 +
Dan Kortschak  wrote:

> I'd like to see the source definition of R_NilValue, but after
> fair bit of searching I cannot find an obviously location for this.
> 
> Would someone please point me in the right direction?

As far as I understand, the assignment to R_NilValue happens in
src/main/memory.c:

https://github.com/wch/r-source/blob/776929704cb4f9398f52805f48f2c93582ec3d38/src/main/memory.c#L2186

(see the definition of GET_FREE_NODE(s) in the same file)

Besides declaring R_NulValue as "extern" and assignments in void
InitMemory(), there doesn't seem to be anything else.

-- 
Best regards,
Ivan

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] pipe(): input to, and output from, a single process

2020-03-16 Thread Ivan Krylov

On Fri, 13 Mar 2020 20:26:43 +0300
Greg Minshall  wrote:

> my sense from pipe and looking at the sources (sys-unix.c) is that is
> not possible.  is that true?  are there any thoughts of providing
> such a facility?

Pipes (including those created by popen(3), which R pipe() uses
internally) are uni-directional data channels. While it could be
possible to open two pipes for both stdin and stdout of the child
process, doing so correctly is complicated because of differences in
buffering provided by the runtime: when stdin/stdout is not a terminal,
buffering mode may be set to block-oriented instead of line-oriented,
resulting in both parent and child being dead-locked, waiting to fill
the buffer instead of returning from the blocking call after the first
newline. (Hence the -l flag to sed mentioned by Gábor Csárdi, which
avoids this problem for sed).

Programs designed to first read stdin until end-of-file, then process
the input and print results on the stdout are usually safe to use in
this way, but others may be not. Software specifically designed to
control other software (e.g. Expect [*]) gets around this limitation by
running the child processes inside pseudo-terminals and/or running in
event-driven manner, being ready to service the child process whether
it wants to read its stdin or write to stdout.

Since sed has its -l flag, it should be possible to safely drive it
line-by-line with the help of processx, but not via pipe().

-- 
Best regards,
Ivan

[*] https://core.tcl-lang.org/expect/index

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] ASAN

2020-04-08 Thread Ivan Krylov

On Tue, 07 Apr 2020 12:20:11 -0500
"Therneau, Terry M., Ph.D. via R-devel"  wrote:

> It's not quite clear to me what to do next.

A quick check: do you have anything in LD_PRELOAD environment variable?
Third-party software like gtk3-nocsd may be using it for its own
purposes, which breaks ASan. When I build software with -fsanitize=...,
I have to explicitly unset LD_PRELOAD to get it to work.

-- 
Best regards,
Ivan

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] detect ->

2020-04-15 Thread Ivan Krylov

On Wed, 15 Apr 2020 10:41:41 +0300
Adrian Dușa  wrote:

> Now, if I could find a way to define "=>" as a standalone operator,
> and convince the R parser to bypass that error, it would solve
> everything. If this is not possible, I am back to detecting "->".

Just to confirm, are you avoiding custom %operators% because of two
extra percent characters one would have to type per operator?

-- 
Best regards,
Ivan

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] How to find detritis rejected by "R CMD check" on Debian?

2020-04-17 Thread Ivan Krylov

This should probably have been addressed to R-pkg-devel, not Rd.

On Fri, 17 Apr 2020 09:14:44 -0500
Spencer Graves  wrote:

> Found the following files/directories:
>    ‘fdaMatlabPath.m’

This is not the "detritus in the temp directory"; the message is related
to the previous line in the log:

>> * checking for non-standard things in the check directory ... NOTE

I searched the repo for the file name in question and got a few hits
[1]. R CMD check runs the \examples{}, including those in
man/fdaMatlabPath.Rd. The function from R/fdaMatlabPath.R, called from
the examples, writes a file in the current directory [2], which
happens to be the check directory during R CMD check. In order to
conform to the CRAN policy, the function should receive the target file
name as a parameter instead, and the example should pass a file path
somewhere in tempdir().

-- 
Best regards,
Ivan

[1] https://github.com/JamesRamsay5/fda/search?q=fdaMatlabPath.m

[2]
https://github.com/JamesRamsay5/fda/blob/ca077f87f69efcfc434eeea39f4fff6136fcfaeb/R/fdaMatlabPath.R#L33

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] R 4.0.1-4.0.2 built with Intel Composer 19.0-19.1.1, error in "make check" on CentOS 7.7

2020-06-25 Thread Ivan Krylov

On Wed, 24 Jun 2020 18:56:06 +
Ryan Novosielski  wrote:

On my machine, getOption('expressions') is 5000 and the example from
the test correctly stops with length(traceback()) == 2500. (And the
simpler example of f <- function() f(); f() stops with
length(traceback()) == 5000).

> Traceback:

<...>

> 2718: foo()

This (traceback() being more than 2500 entries long) seems to imply
that the stack size check is somehow skipped. (Perhaps optimized away?)
The evaluation depth limit is checked in src/main/eval.c, line 705 [*],
followed by stack size check. Can you attach the debugger and take a
look at the values of R_EvalDepth and R_Expressions while executing the
text? What about R_CStackStart and R_CStackLimit? What is the stack
size limit (ulimit -s?) on the machine running this test?

-- 
Best regards,
Ivan

[*]
https://github.com/wch/r-source/blob/8d7ac4699fba640d030703fa010b66bf26054cbd/src/main/eval.c#L705

pgp_I_hPcZLJm.pgp
Description: Цифровая подпись OpenPGP
__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] Error in substring: invalid multibyte string

2020-06-27 Thread Ivan Krylov

On Fri, 26 Jun 2020 15:57:06 -0700
Toby Hocking  wrote:

>invalid multibyte string at 'gel-A<6b>iyoshi'

>https://stat.ethz.ch/pipermail/r-devel/1999-November/author.html

The server says that the text is UTF-8:

curl -sI \
 https://stat.ethz.ch/pipermail/r-devel/1999-November/author.html | \
 grep Content-Type
# Content-Type: text/html; charset=UTF-8

But it's not, at least not all of it. If you ask readLines to mark
the text as Latin-1, you get Jens Oehlschlägel-Akiyoshi without the
mojibake and invalid multi-byte characters:

x <- readLines(
 'https://stat.ethz.ch/pipermail/r-devel/1999-November/author.html',
 encoding = 'latin1'
)[28]
substr(x, 1, 100)
# [1] "Jens Oehlschlägel-Akiyoshi"

The behaviour we observe when encoding = 'latin1' is not specified
results from returned lines having "unknown" encoding. The substr()
implementation tries to interpret such strings according to multi-byte C
locale rules (using mbrtowc(3)). On my system (yours too, probably, if
it's GNU/Linux or macOS), the multi-byte C locale encoding is UTF-8,
and this Latin-1 string does not result in valid code points when
decoded as UTF-8.

-- 
Best regards,
Ivan

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] [External] Re: New pipe operator

2020-12-04 Thread Ivan Krylov

On Fri, 4 Dec 2020 20:11:17 -0600 (CST)
luke-tier...@uiowa.edu wrote:

> We did try a number of variations; the code is in the R-syntax branch.
> At the root of that branch are two .md files with some notes as of
> around useR20.

Thanks for the information!

Can I make a suggestion? If the variation of the pipe that allows
a symbol on the RHS to be interpreted as the name of a function to call
does get chosen, may it also allow fully-qualified symbols?

Index: src/main/gram.y
===
--- src/main/gram.y (revision 79567)
+++ src/main/gram.y (working copy)
@@ -1242,7 +1242,11 @@
 if (GenerateCode) {
/* allow for symbols or lambda expressions */
if (TYPEOF(rhs) == SYMSXP ||
-   TYPEOF(rhs) == LANGSXP && CAR(rhs) == R_FunctionSymbol)
+   TYPEOF(rhs) == LANGSXP && (
+   CAR(rhs) == R_FunctionSymbol ||
+   CAR(rhs) == R_DoubleColonSymbol ||
+   CAR(rhs) == R_TripleColonSymbol
+   ))
return lang2(rhs, lhs);

if (TYPEOF(rhs) != LANGSXP)

Or is this feature creep?

-- 
Best regards,
Ivan

P.S. This lambda function is waving at us: \ (O,o) `/`

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] From .Fortran to .Call?

2020-12-19 Thread Ivan Krylov

On Sat, 19 Dec 2020 17:04:59 +
"Koenker, Roger W"  wrote:

> There are comments in various places, including R-extensions §5.4
> suggesting that .Fortran is (nearly) deprecated and hinting that use
> of .Call is more efficient and now preferred for packages.

My understanding of §5.4 and 5.5 is that explicit routine registration
is what's important for efficiency, and your package already does that
(i.e. calls R_registerRoutines()). The only two things left to add
would be types (REALSXP/INTSXP/...) and styles (R_ARG_IN,
R_ARG_OUT/...) of the arguments of each subroutine.

Switching to .Call makes sense if you want to change the interface of
your native subroutines to accept arbitrary heavily structured SEXPs
(and switching to .External could be useful if you wanted to play with
evaluation of the arguments).

-- 
Best regards,
Ivan

P.S. I think that this question is better asked in the
r-package-de...@r-project.org mailing list instead of R-devel.

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] On Windows, need external access to the BLAS, LAPACK and LINPACK linear algebra functions included in R

2020-12-20 Thread Ivan Krylov

Hello Andre Mikulec!

On Mon, 21 Dec 2020 00:20:35 +
Andre Mikulec  wrote:

> ComputerUser@COMPUTER MINGW64 /c/APPLICATIONS/r-source-R-4-0-branch
>
> checking for BSD networking... configure: error: BSD networking
> functions are required
> 
> What do I need to do next?

MinGW64 seems to be not sufficiently Unix-alike for ./configure to
conclude that it has sockets API (WinSock header names and some type
definitions are different from its expectations). You seem to be on
Windows. What happens if you follow the Windows workflow from the next
section?

https://cran.r-project.org/doc/manuals/R-admin.html#Windows-standalone

Apologies if I'm missing some context, but it seems to me that the
Windows workflow is what's needed here.

-- 
Best regards,
Ivan

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

[Rd] WRE still hints at the "styles" field in R_FortranMethodDef

2020-12-23 Thread Ivan Krylov

The field has been removed in R 3.4.0 after being deprecated in R
3.3.3. Indeed, the paragraph describing it has been commented out
(lines 10144-10151 in R-exts.texi), but another paragraph above (lines
10101-10110) still mentions the field as if it exists. I would like to
suggest some rewording along the lines of the attached patch in order
to avoid confusing readers like me.

-- 
Best regards,
Ivan
Index: doc/manual/R-exts.texi
===
--- doc/manual/R-exts.texi  (revision 79675)
+++ doc/manual/R-exts.texi  (working copy)
@@ -10099,9 +10099,9 @@
 arguments which tells @R{} not to check the actual number passed.

 Routines for use with the @code{.C} and @code{.Fortran} interfaces are
-described with similar data structures, but which have two additional
-fields for describing the type and ``style'' of each argument.  Each of
-these can be omitted. However, if specified, each should be an array
+described with similar data structures, which have one additional
+field for describing the type of each argument. This field is optional,
+however, if specified, it should be an array
 with the same number of elements as the number of parameters for the
 routine.  The types array should contain the @code{SEXP} types
 describing the expected type of the argument. (Technically, the elements
__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

[Rd] r79833 src/library/tools/R/Rd2HTML.R minor typo

2021-01-16 Thread Ivan Krylov

On line 105, "&\\hellip;" should probably be "…":

Index: Rd2HTML.R
===
--- Rd2HTML.R   (revision 79833)
+++ Rd2HTML.R   (working copy)
@@ -102,7 +102,7 @@
 ## http://htmlhelp.com/reference/html40/entities/symbols.html
 if(inEqn) {
 x <- 
psub("(Alpha|Beta|Gamma|Delta|Epsilon|Zeta|Eta|Theta|Iota|Kappa|Lambda|Mu|Nu|Xi|Omicron|Pi|Rho|Sigma|Tau|Upsilon|Phi|Chi|Psi|Omega|alpha|beta|gamma|delta|epsilon|zeta|eta|theta|iota|kappa|lambda|mu|nu|xi|omicron|pi|rho|sigma|tau|upsilon|phi|chi|psi|omega|le|ge|sum|prod)",
 "&\\1;",
x)
-x <- psub("(dots|ldots)", "&\\hellip;", x)
+x <- psub("(dots|ldots)", "…", x)
 x <- fsub("\\infty", "∞", x)
 x <- fsub("\\sqrt", "√", x)
 }

The backslash is ignored by gsub(), so no actual bug happens as a
result of this.

Further checking with

w <- makeCodeWalker(
 call = function(e,w) {
  if (
   grepl('^[gpf]?sub$', as.character(e[[1]])) &&
   grepl('\\', as.character(e[[3]]), fixed=TRUE)
  ) print(e) else lapply(e, walkCode, w)
 }, leaf = function(.,..) invisible()
)
for (f in list.files(
 'R-devel/src/library/tools/R',
 full = TRUE, pattern = '\\.R$', ignore.case=TRUE
)) for (e in parse(f)) invisible(walkCode(e, w))

doesn't reveal any similar typos.

-- 
Best regards,
Ivan

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

[Rd] R.sh and argument escaping

2021-03-16 Thread Ivan Krylov

Hello R-devel!

The following sequence of commands results in an error message on a
POSIX system:

tab="`echo -ne "\t"`"
LC_ALL=C Rscript -e " $tab 1"
# ARGUMENT '~+~1' __ignored__

Tabs can sneak into the -e argument from indented multi-line arguments
in shell scripts:

Rscript -e '
foo()
bar()
...
'

R.sh does a good job of escaping spaces and newlines, but since shells
are also supposed to split on a tab [*], it's a good idea to escape
tabs too:

Index: src/scripts/R.sh.in
===
--- src/scripts/R.sh.in (revision 80090)
+++ src/scripts/R.sh.in (working copy)
@@ -192,7 +192,7 @@
 -e)
   if test -n "`echo ${2} | ${SED} 's/^-.*//'`"; then
a=`(echo "${2}" && echo) | ${SED} -e 's/ /~+~/g' | \
-  ${SED} -e :a -e N -e '$!ba' -e 's/\n/~n~/g' -e 's/~n~$//g'`
+  ${SED} -e :a -e N -e '$!ba' -e 's/\n/~n~/g' -e 's/~n~$//g' -e 
's/\t/~t~/g'`
 shift
   else
error "option '${1}' requires a non-empty argument"
Index: src/unix/system.c
===
--- src/unix/system.c   (revision 80090)
+++ src/unix/system.c   (working copy)
@@ -170,6 +170,9 @@
} else if(*q == '~' && *(q+1) == 'n' && *(q+2) == '~') {
q += 2;
*p++ = '\n';
+   } else if(*q == '~' && *(q+1) == 't' && *(q+2) == '~') {
+   q += 2;
+   *p++ = '\t';
} else *p++ = *q;
 }
 return p;

I have verified that with the patch above, Rscript -e " $tab 1" no
longer fails.

While we're at it, perhaps it could be a good idea to replace the magic
number 1 with a the size of the character array above it:

Index: src/unix/system.c
===
--- src/unix/system.c   (revision 80090)
+++ src/unix/system.c   (working copy)
@@ -429,7 +432,7 @@
} else if(!strcmp(*av, "-e")) {
ac--; av++;
Rp->R_Interactive = FALSE;
-   if(strlen(cmdlines) + strlen(*av) + 2 <= 1) {
+   if(strlen(cmdlines) + strlen(*av) + 2 <= sizeof(cmdlines)) {
char *p = cmdlines+strlen(cmdlines);
p = unescape_arg(p, *av);
*p++ = '\n'; *p = '\0';

It might also be a good idea to make it possible to represent the escape
sequences themselves in the unescaped stream in a fully reversible
transformation ('~' <-> '~~~', ' ' <-> '~+~', '\n' <-> '~n~',
'\t' <-> '~t~'), making it possible to round-trip character sequences
like '~+~' through the escaping and unescaping process (thankfully,
'~+~' is not frequently needed in R programs), though expressing that
as a sed command is beyond me. Right now, Rscript -e '"~+~"' doesn't
print "~+~".

Perhaps the bigger question to ask is whether this escaping is
unavoidable. Is it documented? Since the args variable is only appended
(not prepended), it is likely possible to rewrite the
'while test -n "${1}"; do' loop in terms of 'set -- "$@" ...', which is
POSIX-compatible and doesn't require any escaping:

set -- "${@}" dummy # append one argument to skip it later

for arg in "${@}"; do # it's safe to modify $@ in the for loop [**]
  # TODO: on first iteration only, empty the $@ and don't check $prev_arg
  case "${prev_arg}" in
# ...
-g|--gui)
  if test -n "`echo "${arg}" | ${SED} 's/^-.*//'`"; then
gui="${arg}"
set -- "${@}" "${prev_arg}" "${arg}"
  else
error "option '${prev_arg}' requires an argument"
  fi
  ;;
# ...
-e)
  if ! test -n "`echo "${arg}" | ${SED} 's/^-.*//'`"; then
error "option '${prev_arg}' requires a non-empty argument"
  fi
  set -- "${@}" -e "${arg}"
  ;;
# ...
  esac
  prev_arg="${arg}"
  # no shift needed
done

# Later: use "${@}" instead of ${args}

Or is it documented behaviour that arguments following an empty
argument are not escaped by the shell script but are passed to
"${R_HOME}/bin/exec${R_ARCH}/R"?

LC_ALL=C R -q -e " $tab 1"
# ARGUMENT '~+~1' __ignored__
# 
# >
# >
# >
LC_ALL=C R '' -q -e " $tab 1"
# ARGUMENT '' __ignored__
# 
# > 1
# [1] 1
# >
# >


-- 
Best regards,
Ivan

[*]
https://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_06_05

[**]
"First, the list of words following in shall be expanded to generate a
list of items..."
https://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_09_04_03

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] Possible x11 window manager window aggregation under one icon?

2021-03-22 Thread Ivan Krylov

On Sat, 20 Mar 2021 11:51:41 -0500
Dirk Eddelbuettel  wrote:

> R plots however all have one each. Needless to say I may also have
> more than one plot device open...  Would anyone know how we can force
> these to aggregate under just one?

Grouping seems to be achieved by setting the window_group component of
the WM_HINTS property of the window:

https://www.x.org/releases/X11R7.6/doc/xorg-docs/specs/ICCCM/icccm.html#wm_hints_property

ICCCM goes on to say that "The window group leader may be a window that
exists only for that purpose <...> [not] mapped either by the client or
by the window manager".

Is that enough of a direction? I have only ever written X11 code using
xcb, not Xlib, but I could try to help further if needed.

-- 
Best regards,
Ivan

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] Possible x11 window manager window aggregation under one icon?

2021-03-22 Thread Ivan Krylov

On Mon, 22 Mar 2021 15:42:04 -0400
Duncan Murdoch  wrote:

> I'd be happy to add something to rgl if you could point me to 
> an example

On Mon, 22 Mar 2021 14:54:43 -0500
Dirk Eddelbuettel  wrote:

> Just like Duncan I see now WM_HINTS yet, so maybe by
> just giving them we can improve?

The surrounding code and

proved to be enough of an example. The following patch makes it
possible to group x11() windows on my PC with Xfce running:

---8<---
Index: src/modules/X11/devX11.c
===
--- src/modules/X11/devX11.c(revision 80104)
+++ src/modules/X11/devX11.c(working copy)
@@ -105,7 +105,7 @@
 static Display *display;   /* Display */
 static char dspname[101]="";
 static int screen; /* Screen */
-static Window rootwin; /* Root Window */
+static Window rootwin, group_leader;   /* Root Window, Group leader */
 static Visual *visual; /* Visual */
 static int depth;  /* Pixmap depth */
 static int Vclass; /* Visual class */
@@ -1617,6 +1617,16 @@
 PropModeReplace,
 (const unsigned char*) rlogo_icon, 2 + 99*77);
 
+   /* set the window group leader */
+   XWMHints * hints;
+   hints = XAllocWMHints();
+   if (hints) {
+   hints->window_group = group_leader;
+   hints->flags |= WindowGroupHint;
+   XSetWMHints(display, xd->window, hints);
+   XFree(hints);
+   }
+
/* set up protocols so that window manager sends */
/* me an event when user "destroys" window */
_XA_WM_PROTOCOLS = XInternAtom(display, "WM_PROTOCOLS", 0);
@@ -2109,6 +2119,7 @@
 if (numX11Devices == 0)  {
int fd = ConnectionNumber(display);
/* Free Resources Here */
+   XDestroyWindow(display, group_leader);
while (nfonts--)
  R_XFreeFont(display, fontcache[nfonts].font);
nfonts = 0;
@@ -3133,6 +3144,9 @@
 #endif
 screen = DefaultScreen(display);
 rootwin = DefaultRootWindow(display);
+group_leader = XCreateSimpleWindow( /* never mapped or visible */
+   display, rootwin, 0, 0, 1, 1, 0, 0, 0
+);
 depth = DefaultDepth(display, screen);
 visual = DefaultVisual(display, screen);
 colormap = DefaultColormap(display, screen);
---8<---

Some very limited testing didn't seem to uncover any problems.

-- 
Best regards,
Ivan
__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] Possible x11 window manager window aggregation under one icon?

2021-03-23 Thread Ivan Krylov

On Mon, 22 Mar 2021 16:57:48 -0500
Dirk Eddelbuettel  wrote:

> Do you want to send a proper patch to bugzilla?

Would be glad to, especially if we manage to solve that problem you
uncovered while I was asleep.

On Mon, 22 Mar 2021 22:23:47 -0500
Dirk Eddelbuettel  wrote:

> Close, close, close but no cigar yet: For a given R process, x11()
> windows group for a that process. But we often run multiple R
> processes.  Have you seen anything for grouping under the
> "program" (in some sense) but not the concrete process from it?

Do windows from different Emacs processes group together the way you
want them to group? What other applications group together for you
despite running from different processes? Do they have the same window
id # of group leader in `xprop WM_HINTS`? I checked Firefox, but its
windows all seem to have the same _NET_WM_PID.

I decided to copy the way GVim sets its group leader ID (because I know
the windows are different processes _and_ that they group in Xfce) and
spent a while chasing this red herring before realising that (1) on my
PC, different x11() windows are still grouped together, even from
different R processes, even without the patch (I never used the "group
windows" option in xfce4-panel before) and (2) different GVim windows
actually have different group leader XIDs in their WM_HINTS properties.
Oops.

Apparently Xfce uses libwnck [*] which groups windows by WM_CLASS in
addition to WM_HINTS (as far as understand the code).

Here is what GNOME Shell does [**] besides looking at
WM_HINTS.window_group:

 - looks up the window's WM_CLASS in .desktop files known to it
 - looks up the window's _NET_WM_PID among running applications (?)
 - looks for an XDG startup notification matching the window
 - checks other things not likely applicable to R, such as sandbox IDs
   and GApplication IDs

Adding StartupWMClass=R_x11 to R.desktop (not part of R sources, but
part of the .deb package, I believe) should help GNOME Shell match all
x11() windows to a single application without any changes to devX11.c,
but I don't have GNOME installed to check it.

Alternatively, we can also add a _NET_WM_PID property to x11() windows
(in the hope that GNOME Shell matches the PIDs to the same binary), but
then we'd have to add the WM_CLIENT_MACHINE property too [***], which
is way more hacky than I would prefer it to be:

---8<---
Index: src/modules/X11/devX11.c
===
--- src/modules/X11/devX11.c(revision 80104)
+++ src/modules/X11/devX11.c(working copy)
@@ -52,6 +52,8 @@
 #endif
 #include 

+#include  /* for uname -> WM_CLIENT_MACHINE -> _NET_WM_PID */
+#include  /* getpid -> _NET_WM_PID */

 #define R_USE_PROTOTYPES 1
 #include 
@@ -105,7 +107,7 @@
 static Display *display;   /* Display */
 static char dspname[101]="";
 static int screen; /* Screen */
-static Window rootwin; /* Root Window */
+static Window rootwin, group_leader;   /* Root Window */
 static Visual *visual; /* Visual */
 static int depth;  /* Pixmap depth */
 static int Vclass; /* Visual class */
@@ -1617,6 +1619,39 @@
 PropModeReplace,
 (const unsigned char*) rlogo_icon, 2 + 99*77);

+   /* set the window group leader */
+   XWMHints * hints;
+   hints = XAllocWMHints();
+   if (hints) {
+   hints->window_group = group_leader;
+   hints->flags |= WindowGroupHint;
+   XSetWMHints(display, xd->window, hints);
+   XFree(hints);
+   }
+
+   /* Provide WM_CLIENT_MACHINE to set a valid _NET_WM_PID */
+   struct utsname unm;
+   if (uname(&unm)) goto no_wm_pid;
+   char * nodename = &unm.nodename[0];
+   XTextProperty hostname = {0}; /* initialise the value pointer */
+   if (Success != XmbTextListToTextProperty(
+   display, &nodename, 1, XStringStyle, &hostname
+   )) {
+   if (hostname.value) XFree(hostname.value);
+   goto no_wm_pid;
+   }
+   XSetWMClientMachine(display, xd->window, &hostname);
+   XFree(hostname.value);
+
+   /* set _NET_WM_PID */
+   uint32_t mypid = (uint32_t)getpid(); /* must be CARDINAL(32) */
+XChangeProperty(display, xd->window,
+XInternAtom(display, "_NET_WM_PID", False),
+XInternAtom(display, "CARDINAL", False), 32,
+PropModeReplace,
+(const unsigned char*) &mypid, 1);
+   no_wm_pid:
+
/* set up protocols so that window manager sends */
/* me an event when user "destroys" window */

Re: [Rd] Possible x11 window manager window aggregation under one icon?

2021-03-23 Thread Ivan Krylov

В Tue, 23 Mar 2021 08:58:49 -0500
Dirk Eddelbuettel  пишет:

> I still ship /usr/share/icons/hicolor/48x48/apps/rlogo_icon.png which
> is from the 2012 patch, and I vaguely recall the .desktop file being
> unhappy without it. Re-creating a 48x48 from the svg may do.

I think you could even link /usr/share/R/doc/html/Rlogo.svg
into /usr/share/icons/hicolor/scalable/apps/rlogo_icon.svg to let GNOME
and others draw an arbitrarily high-resolution icon if the display
permits that, but a 48x48 bitmap is still required:

https://specifications.freedesktop.org/icon-theme-spec/icon-theme-spec-latest.html#install_icons

-- 
Best regards,
Ivan

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] Possible x11 window manager window aggregation under one icon?

2021-03-23 Thread Ivan Krylov

On Tue, 23 Mar 2021 11:41:39 -0400
Duncan Murdoch  wrote:

> It would probably be nice  to have rgl windows and other R graphics
> windows in the same group, but I don't see a way for rgl to know the
> group_leader that R is using (and it's probably not worth adding this
> to the API to be able to request it).
> 
> Am I missing an easier solution?

Do you envision any problems stemming from setting the same WM_CLASS
("r_x11", "R_x11") for rgl windows as used by x11() windows? I think
that most DEs are able to group windows by WM_CLASS in addition to
WM_HINTS.window_group (Xfce does that by default, GNOME turned out to
just need a hint in the .application file).

-- 
Best regards,
Ivan

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] [Solved] Possible x11 window manager window aggregation under one icon?

2021-04-14 Thread Ivan Krylov



On Fri, 26 Mar 2021 14:49:56 +0100
Martin Maechler  wrote:

> I concluded I liked the first [patch] because it would achieve
> what's considered "uniformly better" in the sense that it makes
> R graphics behave like "all other" desktop applications *and* it
> would do so for all possible window manager scheme without any
> need of some desktop setting (which a typical user would not
> know about, nor know that s?he should/could change).

Martin, here is some information on how X11 (and rgl) windows are
grouped on different platforms, depending on the presence of the first
patch and the .desktop file:

 - GNOME [1] needs a .desktop file to group windows by their WM_CLASS
   and otherwise groups windows by their WM_HINTS.window_group only,
   which makes different configurations possible:
   * no window_group, no .desktop file: no grouping
   * window_group patch, no .desktop file: plot windows grouped
 per-process, x11 and rgl separately
   * StartupWMClass in .desktop file: x11 and rgl windows from all R
 processes grouped together
 - the XQuartz window manager apparently uses WM_HINTS.window_group but
   not WM_CLASS to group windows [2] (patch is only relevant for
   rgl, .desktop files aren't relevant)
 - Xfce groups windows by WM_CLASS [3], so adding WM_HINTS.window_group
   or changing the .desktop file doesn't group them any more
 - KDE seems to follow the .desktop file (if StartupWMClass is
   present) and group by WM_CLASS otherwise [4], but I don't have it
   installed to check. On the other hand, KDE might be ignoring
   WM_HINTS.window_group, judging by the absence of "WM2GroupLeader" in
   the source code of Plasma Workspace. This implies no changes in
   window grouping from patching window_group or changing the .desktop
   file. [5]

To summarise, the patch to x11() leads to visible window grouping in
GNOME-based environments. My impression is that most environments group
windows by WM_CLASS (which had been happening without the patch), but
there are far too many window managers to check that impression.

-- 
Best regards,
Ivan

[1] Including Cinammon:
https://github.com/linuxmint/cinnamon/blob/72732da43e971b83d926af56998c83ee8d000394/src/cinnamon-window-tracker.c#L482

Likely Unity forks too.

[2] Judging by
https://stat.ethz.ch/pipermail/r-devel/2021-March/080571.html

[3] With minor inconsistencies regarding x11 vs rgl windows between
Win+Tab "switch between application windows" and the window buttons on
the task bar, but let's not focus on that.

[4]
https://invent.kde.org/plasma/plasma-workspace/-/blob/7c49a0ae/libtaskmanager/xwindowtasksmodel.cpp#L439

[5] Strictly speaking, StartupWMClass= in the .desktop file lets the
desktop environment associate the plot windows with the R launcher
button if it's also present on the task bar.

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] R 4.0.1-4.0.5 built with Intel Composer 19.0-19.1.1, error in "make check" on CentOS 7.7-7.9

2021-04-15 Thread Ivan Krylov

Hello Ryan,

Sorry for not responding right away -- it took me a while to remember
what I meant back then :)

On Fri, 9 Apr 2021 23:34:13 +
Ryan Novosielski  wrote:

> Program received signal SIGSEGV, Segmentation fault.
> bcEval.R (body=0x3eb7748, rho=0x3f72770, useCache=TRUE)
> at /scratch/novosirj/install-files/R-4.0.5/src/main/eval.c:6478
> 6478  codebase = pc = BCCODE(body);
> (gdb) print R_EvalDepth
> $1 = 2729
> (gdb) print R_Expressions
> $2 = 5000
> (gdb) print R_CStackStart
> $3 = 140737488207872
> (gdb) print R_CStackLimit
> $4 = 7969177

> [novosirj@amarel-test2 bin]$ ulimit -s
> 8192

So far I can only say that you get the same ulimit -s of 8192 and
R_CStackLimit of 8192 * 1024 * .95 that I get on my current machine
(and now it's the stack size limit that's reached first, not expression
depth limit). Let's try to get more information.

When you get the SIGSEGV,

1) What does print $_siginfo._sifields._sigfault show? Try printing at
least $_siginfo if the first command gives you an error.

2) When you get the crash, is the body argument accessible? What does
print *body show?

3) What are the addresses of the local variables when the crash
happens? Specifically, what do the following commands show:

print &codebase
print &pc
print R_CStackDir * (R_CStackStart - (uintptr_t)&codebase)
print R_CStackDir * (R_CStackStart - (uintptr_t)&pc)

-- 
Best regards,
Ivan

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] R 4.0.1-4.0.5 built with Intel Composer 19.0-19.1.1, error in "make check" on CentOS 7.7-7.9

2021-04-16 Thread Ivan Krylov

On Thu, 15 Apr 2021 22:46:56 +
Ryan Novosielski  wrote:

> (gdb) print $_siginfo._sifields._sigfault
> $1 = {
>  si_addr = 0x7f7fecf8, _addr_lsb = 0,
>  _addr_bnd = {_lower = 0x9215f829ff58, _upper = 0x7f7fecf8}
> }

> (gdb) print R_CStackDir * (R_CStackStart - (uintptr_t)&codebase)
> $5 = 18446744073701307232

Okay, this is clearly a stack overflow: the faulting address is close
to addresses of other stack variables, and the stack usage, calculated
manually as 140737488207872 - 0x7f7ff360, is 8244392, which is
above the (7969177), but the value that gdb gives you looks really
strange. I could only get that value when I calculated
-1 * (140737488207872 - 0x7f7ff360) and reinterpreted it as
unsigned.

What is the value of R_CStackDir at the moment of crash? Could it have
somehow became -1 despite the stack growing down?

-- 
Best regards,
Ivan

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] R 4.0.1-4.0.5 built with Intel Composer 19.0-19.1.1, error in "make check" on CentOS 7.7-7.9

2021-04-16 Thread Ivan Krylov

On Fri, 16 Apr 2021 18:06:51 +
Ryan Novosielski  wrote:

> Well it definitely somehow could have, since it did

Wow! This is strange, but at least it should be easy to fix. Try editing
the config.site file in the root of the R source directory and setting
R_C_STACK_DIRECTION=down there. (Make sure to uncomment the line!)
Re-run .../configure, make sure that src/include/config.h contains
#define C_STACK_DIRECTION 1 and build R again.

Does the crash go away? If it does, are you interested in finding out
what went wrong in the configure test for stack direction?

There are lines in m4/R.m4 that I'm not sure I understand:

if test ${?} = 1; then
  r_cv_cstack_direction=down
elif test ${?} = 1; then
  r_cv_cstack_direction=up
fi

How can elif branch have the same condition as the if branch? Shouldn't
the second test had been for $? = 255? On the other hand, if the elif
branch was never taken, how did R_CStackDir become -1?

-- 
Best regards,
Ivan

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] R 4.0.1-4.0.5 built with Intel Composer 19.0-19.1.1, error in "make check" on CentOS 7.7-7.9

2021-04-17 Thread Ivan Krylov

On Fri, 16 Apr 2021 18:39:04 +
Ryan Novosielski  wrote:

> I guess there’s probably some mode of m4 I could run against that and
> see if there’s any indication?

Here's a shell script that should be doing the same thing that
R's .../configure does, but a bit more verbose:

---8<---
#!/bin/sh
cat > conftest1.c <
uintptr_t dummy_ii(void)
{
int ii;

/* This is intended to return a local address. We could just return
   (uintptr_t) &ii, but doing it indirectly through ii_addr avoids
   a compiler warning (-Wno-return-local-addr would do as well).
*/
volatile uintptr_t ii_addr = (uintptr_t) ⅈ
return ii_addr;
}
EOF
cat > conftest.c <
#include 
extern uintptr_t dummy_ii(void);

typedef uintptr_t (*dptr_type)(void);
volatile dptr_type dummy_ii_ptr;

int main(int ac, char **av)
{
int i;
dummy_ii_ptr = dummy_ii;

/* call dummy_ii via a volatile function pointer to prevent
   inlinining in case the tests are accidentally built with
   LTO */
uintptr_t ii = dummy_ii_ptr();
#ifndef EXACTLY_AS_R_CONFIGURE
printf(
"main = %p, sub = %p, main %c sub, ret = %d\n",
&i, (void*)ii,
((uintptr_t)&i > ii) ? '>' : ((uintptr_t)&i < ii) ? '<' : '=',
((uintptr_t)&i > ii) ? 1 : -1
);
#endif
/* 1 is downwards */
return ((uintptr_t)&i > ii) ? 1 : -1;
}
EOF
echo "${CC:=cc} ${CPPFLAGS} ${CFLAGS} ${LDFLAGS} ${MAIN_LDFLAGS}" \
"-o conftest conftest.c conftest1.c"
${CC} ${CPPFLAGS} ${CFLAGS} ${LDFLAGS} ${MAIN_LDFLAGS} -o conftest \
conftest.c conftest1.c || exit $?
echo ./conftest
./conftest
ret=$?
echo "./conftest exited with $ret"
if test ${ret} = 1; then
  echo r_cv_cstack_direction=down
elif test ${ret} = 1; then
  echo r_cv_cstack_direction=up
fi
exit 0
---8<---

Please run it similarly to the way you run .../configure:

export CC=icc
export CFLAGS="-O3 -ipo -qopenmp -axAVX,CORE-AVX2,CORE-AVX512"
sh .../runme.sh

Does it give the right answer, that is, r_cv_cstack_direction=down?
Does the answer change if you add -DEXACTLY_AS_R_CONFIGURE to CFLAGS?
If the answer is always "down" and you have reused the build directory
(keeping the config.site file between .../configure runs), this is going
to be hard to track down. If you manage to get the "up" answer, it
should be possible to tweak the test until ICC doesn't optimise it to
the point of confusing the stack growth direction.

Either way, I think that the elif branch in the R_STACK_DIRECTION test
should be testing for $? = 255, not 1. I'm almost convinced that the
behaviour would be incorrect on platforms where the test exits with -1.

-- 
Best regards,
Ivan

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] R 4.0.1-4.0.5 built with Intel Composer 19.0-19.1.1, errors in "make check" on CentOS 7.7-7.9

2021-04-17 Thread Ivan Krylov

On Sat, 17 Apr 2021 00:13:42 +
Ryan Novosielski  wrote:

> reg-tests-1d.Rout.fail:
> https://rutgersconnect-my.sharepoint.com/:u:/g/personal/novosirj_oarc_rutgers_edu/EYK2JHWQ1-9Dvu6gK9lrkRIBkEyA4QqkeH7C4gmbAYyBBQ?e=lfGJL7
> reg-packages.Rout.fail:
> https://rutgersconnect-my.sharepoint.com/:u:/g/personal/novosirj_oarc_rutgers_edu/EazCjI6fRnNKhQASFPeySBUBENVpCqCljFg3-sokBZJnAw?e=8lwywe

Sorry, these links seem to be asking me to log in. Could you try a
"paste bin" service like https://paste.debian.net/?

> These maybe seem like they’re OK, and if I don’t have pdf2latex,
> they’re expected?

I've never tried to build R without TeX Live installed. Is there
anything about LaTeX-less installation in 'R Installation and
Administration'?

> For the regression tests, these seem like some of them are actual
> problems, but maybe someone here knows if some are expected?

Anything that crashes (well, raises an R error, not crashes the R
process) inside tools::assertError(...) is meant to do that. In fact,
you get an error if it doesn't crash:

tools::assertError(stop('I will crash'))
tools::assertError(stop('I will crash'), verbose = TRUE)
# Asserted error: I will crash
tools::assertError(2+2)
# Error: Failed to get error in evaluating 2 + 2

-- 
Best regards,
Ivan

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] R 4.0.1-4.0.5 built with Intel Composer 19.0-19.1.1, errors in "make check" on CentOS 7.7-7.9

2021-04-19 Thread Ivan Krylov

On Sat, 17 Apr 2021 19:21:12 +
Ryan Novosielski  wrote:

> I tried actual Pastebin and it told me that the
> reg-tests-1d.Rout.fail was offensive. https://paste.debian.net says
> it’s too large. Let’s give that another whack; both are here:
> http://www.rnovosielski.ftml.net/r-project/

This link works fine for me, thanks!

For some reason, this build of R fails a test for LAPACK accidentally
ignoring NAs when computing Frobenius norms of matrices:

https://github.com/wch/r-source/commit/db10ee5237b1f9db83a693903c4293650a43244a
https://github.com/wch/r-source/commit/2f546cf778ae3bae8ef2e82c613658c72098a528

Does the following program print NaN on the last line of its output on
your machine?

program testdlange
use ieee_arithmetic, only: ieee_value, ieee_quiet_nan

intrinsic transpose

interface
double precision function dlange (NORM, M, N, A, LDA, WORK)
character :: NORM
integer :: M
integer :: N
double precision, dimension( lda, * ) :: A
integer :: LDA
double precision, dimension( * ) :: WORK
end function
end interface

double precision, dimension(2,3) :: A
double precision :: norm

A = reshape([0, 0, 0, 0, 0, -1], shape(A))
print '(3f5.1)', transpose(A)

norm = dlange('F', size(A, 1), size(A, 2), A, size(A, 1), [0D0])
print *, norm

A(1,2) = ieee_value(A(1,2), ieee_quiet_nan)
print '(3f5.1)', transpose(A)

norm = dlange('F', size(A, 1), size(A, 2), A, size(A, 1), [0D0])
print *, norm

end program

-- 
Best regards,
Ivan

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

[Rd] \Sexpr[results=hide] produces \verb{ newlines }

2021-07-29 Thread Ivan Krylov

Hello R-devel!

Here's an Rd file that produces a large empty area when converted to
HTML:

\name{repro}
\title{title}
\description{description}
\details{
  Hello
  \Sexpr[stage=build,results=hide]{
invisible(NULL)
invisible(NULL)
invisible(NULL)
invisible(NULL)
invisible(NULL)
invisible(NULL)
invisible(NULL)
invisible(NULL)
invisible(NULL)
invisible(NULL)
invisible(NULL)
"" # workaround: remove results=hide and use the return value
  }
}

This seems to happen because \Sexpr gets expanded to \verb{ as many
newlines as there were code lines } by processRdChunk, by first storing
a newline for each line of the code:

https://github.com/wch/r-source/blob/d7a4ed9aaeee1f57c3c165aefa08b8d69dfe59fa/src/library/tools/R/RdConv2.R#L257

...and then the newlines get translated to \verb because res is
not empty:

https://github.com/wch/r-source/blob/d7a4ed9aaeee1f57c3c165aefa08b8d69dfe59fa/src/library/tools/R/RdConv2.R#L332

As long as Rd above doesn't stem from my misuse of \Sexpr, I would like
to propose the following patch, which seems to fix the problem:

Index: src/library/tools/R/RdConv2.R
===
--- src/library/tools/R/RdConv2.R   (revision 80675)
+++ src/library/tools/R/RdConv2.R   (working copy)
@@ -329,6 +329,8 @@
}
} else if (options$results == "text")
res <- tagged(err, "TEXT")
+   else if (options$results == "hide")
+   res <- tagged("", "COMMENT")
else if (length(res)) {
res <- lapply(as.list(res), function(x) tagged(x, "VERB"))
res <- tagged(res, "\\verb")

There are probably other ways of fixing this problem, e.g. by only
populating res if options$results != "hide" or only appending newlines
if res is non-empty.

-- 
Best regards,
Ivan

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

[Rd] Rd2txt: frmt() overestimates width of multi-line arguments

2021-07-31 Thread Ivan Krylov

Hi!

I wanted to use the second argument of \figure{}{} to provide a
pseudo-graphic rendition of the image and noticed that it's not centered
despite the intent to do so is visible in the code.

I traced it to a line in the frmt() function. It seems to me that
frmt() doesn't expect its input to consist of more than one line
because it sums the lengths of its components. A similar problem could
be observed in a \deqn{} consisting of multiple relatively short lines:
it would not be centered despite the available space.

A fix would be to change the width calculation:

Index: src/library/tools/R/Rd2txt.R
===
--- src/library/tools/R/Rd2txt.R(revision 80675)
+++ src/library/tools/R/Rd2txt.R(working copy)
@@ -356,7 +356,7 @@
 ## Use display widths as used by cat not print.
 frmt <- function(x, justify="left", width = 0L) {
 justify <- match.arg(justify, c("left", "right", "centre", "none"))
-w <- sum(nchar(x, "width")) # copes with 0-length x
+w <- max(nchar(c("", x), "width")) # copes with 0-length x
 if(w < width && justify != "none") {
 excess <- width - w
 left <- right <- 0L

I would like to clarify: should such small problems be raised first in
R-devel for discussion (e.g. whether it should be considered a bug)? Or
should I just file a bug in the Bugzilla and attach a patch there?

-- 
Best regards,
Ivan

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] \Sexpr[results=hide] produces \verb{ newlines }

2021-08-05 Thread Ivan Krylov

Hello Martin,

On Sat, 31 Jul 2021 22:14:17 +0200
Martin Maechler  wrote:

> I have implemented a version of your patch in my local copy of
> R-devel and tested your example, also with  Rd2latex() ..
> interestingly   Rd2txt()  does not produce the extra new lines
> even without your patch.

That's interesting, indeed. I think that flushBuffer() is responsible
for collapsing multiple blank lines into one in that case, but they do
get preserved in the buffer up to before that.

> I plan to commit your proposal after the weekend unless has 
> reasons against that.

Thanks for the review!

-- 
Best regards,
Ivan

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] Fwd: Using existing envars in Renviron on friendly Windows

2021-10-15 Thread Ivan Krylov

On Fri, 15 Oct 2021 16:44:28 +0200
Michał Bojanowski  wrote:

> AVAR=${APPDATA}/foo/bar
> 
> Which is a documented way of referring to existing environment
> variables. Now, with that in R I'm getting:
> 
> Sys.getenv("APPDATA")# That works OK
> [1] "C:\\Users\\mbojanowski\\AppData\\Roaming"
> 
> so OK, but:
> 
> Sys.getenv("AVAR")
> [1] "C:UsersmbojanowskiAppDataRoaming/foo/bar"

Hmm, that

-- 
Best regards,
Ivan

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] Fwd: Using existing envars in Renviron on friendly Windows

2021-10-15 Thread Ivan Krylov

Sorry for the noise! I wasn't supposed to send my previous message.

On Fri, 15 Oct 2021 16:44:28 +0200
Michał Bojanowski  wrote:

> AVAR=${APPDATA}/foo/bar
> 
> Which is a documented way of referring to existing environment
> variables. Now, with that in R I'm getting:
> 
> Sys.getenv("APPDATA")# That works OK
> [1] "C:\\Users\\mbojanowski\\AppData\\Roaming"
> 
> so OK, but:
> 
> Sys.getenv("AVAR")
> [1] "C:UsersmbojanowskiAppDataRoaming/foo/bar"

Hmm, a function called by readRenviron does seem to remove backslashes,
but not if they are encountered inside quotes:

https://github.com/r-devel/r-svn/blob/3f8b75857fb1397f9f3ceab6c75554e1a5386adc/src/main/Renviron.c#L149

Would AVAR="${APPDATA}"/foo/bar work?

-- 
Best regards,
Ivan

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] Using R to wrap NREL's SSC library

2021-11-09 Thread Ivan Krylov

Sorry, this discussion is better continued at
https://stat.ethz.ch/mailman/listinfo/r-package-devel.

On Tue, 09 Nov 2021 16:19:13 +
Ezra Tucker  wrote:

> PKG_LIBS=-L/opt/SAM/2020.11.29/linux_64

There's no good answer, but search "Writing R Extensions" for "rpath"
for potential ways of solving this problem: you'll need to specify the
runtime dynamic library path in addition to the link-time library path.

-- 
Best regards,
Ivan

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] string concatenation operator (revisited)

2021-12-05 Thread Ivan Krylov

On Sat, 4 Dec 2021 21:26:05 -0500
Avi Gross via R-devel  wrote:

> In many languages, like PERL, this results in implicated conversion
> to make "text1" the result.

FWIW, Perl5 has a separate string concatenation operator (".") in order
to avoid potential confusion with addition. So do Lua (".."), SQL
("||", only some of the dialects) and Raku ("~", former Perl6). Some of
the potential concerns with string concatenation as an operator in R
could be alleviated by introducing a separate operator, just like matrix
multiplication ("%*%") is separate from elementwise multiplication
("*"), nowadays even in Python ("@" and "*", respectively).

-- 
Best regards,
Ivan

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] build failure: 'hashtab' is not an exported object from 'namespace:utils'

2021-12-16 Thread Ivan Krylov

On Thu, 16 Dec 2021 10:13:11 +0100
Stephen Berman  wrote:

> Is this a known issue and is there a fix?

For me, the fix was to remove the already-installed
$SVNROOT/library/utils (which didn't yet contain hashtab) and re-run
make, letting the R build process re-install it from scratch.

-- 
Best regards,
Ivan

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] hashtab address arg

2021-12-22 Thread Ivan Krylov

On Sat, 18 Dec 2021 11:50:54 +0100
Arnaud FELD  wrote:

> However, I'm a bit troubled about the "address" argument. What is it
> intended for since (as far as I know) "address equality" is until now
> something that isn't really let for the user to decide within R.

Using the words from "Extending R" by John M. Chambers, the concept of
address identity could be related to the question:

>> If some of the data in the object has changed, is this still the
>> same object?

Most objects in R are defined by their content. If you had a 100x100
matrix and changed an element at [50,50], it's now a different matrix,
even if it's stored in the same variable. If you create another 100x100
matrix in a different variable but fill it with the same numbers, it
should still compare equal to your original matrix.

Not all types of R objects are like that. Environments are good
candidates for pointer equality comparison. For example, the contents
of the global environment change every time you assign some variable in
the R command line, but it remains the same global environment. Indeed,
identical() for environments just compares their pointers: even if two
different environments only contain objects that compare equal, they
cannot be considered the same environment, because different closures
might be referring to them. Similar are data.tables: if you had a giant
dataset and, as part of cleaning it up, removed some outliers, perhaps
it should be considered the same dataset, even if the contents aren't
strictly the same any more. Same goes for reference class and R6
objects: unlike the pass-by-value semantics associated with most
objects in R, these are assumed to carry global state within them, and
modifications to them are reflected everywhere they are referenced, not
limited to the current function call.

I *think* that most (if not all) objects with reference semantics
already use pointer comparison when being compared by identical(), so
the default of "identical" is, as the help page says, almost always the
right choice, but if it matters to your code whether the objects are
actually stored in the same area in the memory, use hashes of type
"address".

(Perhaps this topic could be a better fit for R-help.)

-- 
Best regards,
Ivan

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

[Rd] Backslashes, braces, \Sexpr and user macros

2022-03-29 Thread Ivan Krylov



Hello R-devel,

I had been using an Rd trick that runs afoul of the changes in
r81965[*]. After the change, there seems to be a problem with any
user-defined macro that takes R code, forwards it to \Sexpr and gets
passed a string literal containing '{':

\newcommand{\RRd}{\Sexpr[stage=build,results=rd]{#1}}
\RRd{x <- 'a string with { inside it'; 'okay'}

R-4.1.3: okay
R-devel: Error: x.Rd: '\{' is an unrecognized escape in character string
starting "'a string with \{"
Execution halted

My actual use case is a macro that forwards its argument to
\Sexpr{tools::toRd(bibentry(...))}. I've been trying to pass \enc{}{}
as an argument to bibentry; previously, I had to use a workaround to
create the backslash, but now any braces (in addition to backslashes)
inside string literals also result in errors:

\newcommand{\bib}{\Sexpr[stage=build,results=rd]{bibentry(#1)}}
\bib{'Misc',
  author = person('J.', 'Doe'), year = 2022,
  title = paste0(
'A ', rawToChar(as.raw(0x5c)),
'enc{møøse}{moose} bit my sister once'
#   ^-^^-^--- in these four places
  )
}

R-4.1.3: Doe J (2022). “A møøse bit my sister once.”

R-devel: Error: x.Rd: '\{' is an unrecognized escape in character string
starting "'enc\{"
Execution halted

Should I have relied upon \Sexpr{#1} working inside user-defined
macros? Georgi N. Boshnakov's Rdpack does similar things (and has been
an inspiration for my tricks). I'll be glad to re-write some of my Rd
to achieve the same result using less crazy tricks, assuming there is a
right way to do that.

-- 
Best regards,
Ivan

[*]
https://github.com/r-devel/r-svn/commit/7ccd166735e6b978023b85fc9aa4845af8c0c241

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] R-devel Digest, Vol 229, Issue 21

2022-03-30 Thread Ivan Krylov

On Wed, 30 Mar 2022 17:45:15 +
Georgi Boshnakov  wrote:

> I am surprised that the above macro ever worked as intended.

I'm sorry, somehow I didn't pay enough attention when preparing that
part of the e-mail. The actual macro uses tools::toRd(bibentry(...)).

> I don't know if something was fixed between the R-devel you used and
> probably the most recent one I did but reproducible example and
> system information would be helpful.

I'll remember to provide more reproducible examples next time.

c81965 was reverted, and "take two" of the improvement in c82029
doesn't crash my macros (thanks, Kurt Hornik!), but I'm still willing
to rewrite my macros to give R maintainers more freedom to evolve Rd in
the future. Maybe it's better to use a per-package database of
bibentries and reference them with much simpler macro arguments, but
that would be better fit for R-pkg-devel :)

-- 
Best regards,
Ivan

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] Writing in R // R 4.2.0 on Windows doesn't work with Dasher

2022-05-15 Thread Ivan Krylov

On Sun, 15 May 2022 11:10:49 -0300
Paulo Barata  wrote:

> I noticed today that R 4.2.0 responds to the "Enter" character of
> Dasher version 5.0.0 beta in "Direct Entry" mode. An "Enter" in
> Dasher causes the R prompt to jump to the next line, as an "Enter"
> should do.
> 
> No other characters (letters, digits, special characters like + = & ~
> | # $) are accepted by R 4.2.0 through Dasher.

This seems to be very similar to the problem discussed here:
https://stat.ethz.ch/pipermail/r-devel/2022-May/081683.html

For the record, here's the part of Dasher responsible for "Direct
Entry" on Windows:
https://github.com/dasher-project/dasher/blob/0ec7d646dbaadf06b8eebfc11b784738a20e75af/Src/Win32/Widgets/Edit.cpp#L302

MSDN says:

>> If KEYEVENTF_UNICODE is specified, SendInput sends a WM_KEYDOWN or
>> WM_KEYUP message to the foreground thread's message queue with
>> wParam equal to VK_PACKET. Once GetMessage or PeekMessage obtains
>> this message, passing the message to TranslateMessage posts a
>> WM_CHAR message with the Unicode character originally specified by
>> wScan.



Previously, Tomas Kalibera has mentioned:

> For Unicode Windows, GraphApp uses WM_IME_COMPOSITION messages to
> read the keys instead of WM_CHAR, which it uses for non-Unicode
> windows.

I think that for a combination of R >= 4.2 and Dasher, this means an
impasse. Either Dasher should be taught to send WM_IME_COMPOSITION
messages to R (?), or R should learn to understand WM_CHAR messages in
its Unicode windows in addition to WM_IME_COMPOSITION it currently uses.

-- 
Best regards,
Ivan

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

[Rd] HTML documentation check: 'condition must be plain text'

2022-06-20 Thread Ivan Krylov

Hi,

Use of \Sexpr in an \if condition in R documentation results in a NOTE,
but only during HTML documentation check, not any of the previous Rd
checks:

  \if{\Sexpr{'TRUE'}}{The condition evaluates to true.}

* checking HTML version of manual ... NOTE
Encountered the following conversion/validation errors:
foo.Rd:10: condition must be plain text

Is this supported? "Writing R documentation" §2.11 seems to agree:

>> Also accepted [as a condition] are TRUE (matching all formats) and
>> FALSE (matching no formats). These could be the output of the \Sexpr
>> macro.

In order to check the HTML documentation, check_Rd2HTML() runs
tools::Rd2HTML() on the results of tools::Rd_db() [1]. The former runs
\Sexpr[stage=render] macros and the latter runs \Sexpr[stage=build]
macros, leaving \Sexpr[stage=install] macros unevaluated. The NOTE goes
away if I switch the "stage" argument to anything but "install".

Is there any downside to adding stages = c('build', 'install') to the
Rd_db call or stages = c('install', 'render') to the Rd2HTML call in
order to make this NOTE go away?

-- 
Best regards,
Ivan

[1]
https://github.com/r-devel/r-svn/blob/d25e77715164e39c96baae4c180d8f980ec93932/src/library/tools/R/check.R#L4911-L4939

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] patch about timezone name of China Standard Time on windows

2023-01-05 Thread Ivan Krylov

Hello Yu Gong!

I'm not an R developer, but I hope I'll be able to help you with my
advice.

On Fri, 6 Jan 2023 04:24:43 +
gong yu  wrote:

> Last week ,I report a issuse about  timezone name about" China
> Standard Time", R on windows will report to Asia/Taipei , but it
> should be Asia/Shanghai, Since  still now  no feedback (maybe because
> for my poor english). So resubmit a patch about this .

Sometimes, everyone is too busy to discuss a patch right now, but a
patch just left on R-devel may be eventually forgotten before there's
time for discussion. Try asking for a Bugzilla account (by following
the guide at ) and submitting the
patch there.

> and in current R implemention , it report " China Standard Time" is
> incorrect ,but "Taipei Standard Time" is correct, so this patch only
> need modify " China Standard Time" and will not affect other timezone.

Thank you for verifying this. As far as I can tell, the file has always
contained both entries referring to "Asia/Taipei":
https://github.com/r-devel/r-svn/blame/22bf3a2cb1dd6844e69c9214d990bf438a485db7/src/extra/tzone/registryTZ.c

-- 
Best regards,
Ivan

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] rhub vs. CRAN fedora-*-devel, using armadillo & slapack

2023-01-10 Thread Ivan Krylov

On Tue, 10 Jan 2023 16:27:53 +
RICHET Yann  wrote:

> In facts, 10 threads are asked by armadillo for some LinAlg, which
> backs to two threads as warned. 

I think you're right about your tests de-facto using two threads, but
it might be a good idea to _default_ to up to two threads in tests and
examples. This is especially valuable for third-party developers who
have to mass-test packages (one of which could be rlibkriging) in
parallel.

> - is there any reason that could explain that fedora-*-devel is so
> slow for this package or compilation of Rcpp/testthat ?

Compilation time is definitely not the reason. Something in tests/*
actually runs for 30 minutes by itself.

> - is there any chance that I can get a deeper log of what happened ?

If you split your tests into separate files under tests/*.R instead of
using a single tests/testthat.R calling the rest of the tests, R will
be able to show you the individual test file that hung and maybe the
line where it happened. (Also, you'll get per-file timing.) But that
is potentially a huge investment: you may have to rewrite the tests to
work outside the testthat harness, and you'd also have to prepare
another CRAN submission just to have those tests run. It's also against
CRAN policy to knowingly submit a package with unfixed ERRORs.

(Currently, R can only tell you that the tests hung in the
test_check('rlibkriging') call in the tests/testthat.R, which isn't
precise enough.)

These lines look a bit scary:

tests/testthat/bench-KrigingFit.R:

pack=list.files(file.path("bindings","R"),pattern = ".tar.gz",full.names = T) 
install.packages(pack,repos=NULL)

tests/testthat/notest-LinearRegression.R:

install.packages(pkgs="rlibkriging_0.7-2.tgz", type="source", repos=NULL)
library(rlibkriging)

I don't think that install.packages does anything besides raising a
warning and letting the test continue (there doesn't seem to be any
.tar.gz files inside the package on CRAN, so installation fails), but
it's probably not a good idea to install packages during testing [*].

This is also not quite right but harmless, because rlibkriging is
already loaded:

library(rlibkriging, lib.loc="bindings/R/Rlibs")

I'm afraid I don't know how to reproduce the timeouts you're seeing on
the CRAN Fedora machine.

-- 
Best regards,
Ivan

[*] I once wrote a test that launched multiple child R processes,
created a temporary library in each of them and installed the package
there. The test was designed to only run on the developer's machine in
order to test file format compatibility between different versions of
R. I ended up moving the test away from tests/*.R in order to make sure
it won't run by accident.

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] rhub vs. CRAN fedora-*-devel, using armadillo & slapack

2023-01-11 Thread Ivan Krylov

On Wed, 11 Jan 2023 13:09:25 -0500
Duncan Murdoch  wrote:

> I think Sebastian or my suggestion is easier than redoing all of your 
> tests.  They are each one line changes.

Yes, sorry about that. Definitely try a verbose reporter first instead
of redoing the tests. I'll remember not to give this advice next time.

-- 
Best regards,
Ivan

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] Sys.getenv(): Error in substring(x, m + 1L) : invalid multibyte string at '' if an environment variable contains \xFF

2023-01-31 Thread Ivan Krylov

Can we use the "bytes" encoding for such environment variables invalid
in the current locale? The following patch preserves CE_NATIVE for
strings valid in the current UTF-8 or multibyte locale (or
non-multibyte strings) but sets CE_BYTES for those that are invalid:

Index: src/main/sysutils.c
===
--- src/main/sysutils.c (revision 83731)
+++ src/main/sysutils.c (working copy)
@@ -393,8 +393,16 @@
char **e;
for (i = 0, e = environ; *e != NULL; i++, e++);
PROTECT(ans = allocVector(STRSXP, i));
-   for (i = 0, e = environ; *e != NULL; i++, e++)
-   SET_STRING_ELT(ans, i, mkChar(*e));
+   for (i = 0, e = environ; *e != NULL; i++, e++) {
+   cetype_t enc = known_to_be_latin1 ? CE_LATIN1 :
+   known_to_be_utf8 ? CE_UTF8 :
+   CE_NATIVE;
+   if (
+   (utf8locale && !utf8Valid(*e))
+   || (mbcslocale && !mbcsValid(*e))
+   ) enc = CE_BYTES;
+   SET_STRING_ELT(ans, i, mkCharCE(*e, enc));
+   }
 #endif
 } else {
PROTECT(ans = allocVector(STRSXP, i));
@@ -416,11 +424,14 @@
if (s == NULL)
SET_STRING_ELT(ans, j, STRING_ELT(CADR(args), 0));
else {
-   SEXP tmp;
-   if(known_to_be_latin1) tmp = mkCharCE(s, CE_LATIN1);
-   else if(known_to_be_utf8) tmp = mkCharCE(s, CE_UTF8);
-   else tmp = mkChar(s);
-   SET_STRING_ELT(ans, j, tmp);
+   cetype_t enc = known_to_be_latin1 ? CE_LATIN1 :
+   known_to_be_utf8 ? CE_UTF8 :
+   CE_NATIVE;
+   if (
+   (utf8locale && !utf8Valid(s))
+   || (mbcslocale && !mbcsValid(s))
+   ) enc = CE_BYTES;
+   SET_STRING_ELT(ans, j, mkCharCE(s, enc));
}
 #endif
}

Here are the potential problems with this approach:

 * I don't know whether known_to_be_utf8 can disagree with utf8locale.
   known_to_be_utf8 was the original condition for setting CE_UTF8 on
   the string. I also need to detect non-UTF-8 multibyte locales, so
   I'm checking for utf8locale and mbcslocale. Perhaps I should be more
   careful and test for (enc == CE_UTF8) || (utf8locale && enc ==
   CE_NATIVE) instead of just utf8locale.

 * I have verified that Sys.getenv() doesn't crash with UTF-8-invalid
   strings in the environment with this patch applied, but now
   print.Dlist does, because formatDL wants to compute the width of the
   string which has the 'bytes' encoding. If this is a good way to
   solve the problem, I can work on suggesting a fix for formatDL to
   avoid the error.

-- 
Best regards,
Ivan

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

[Rd] R2HTML doesn't split paragraphs originating from \Sexpr[results=rd]

2023-02-05 Thread Ivan Krylov

Hello,

Here's an example that renders correctly using Rd2txt / Rd2latex / R
CMD Rd2pdf, but has problems under Rd2HTML:

\name{foo}
\title{foo}
\section{foo}{
  This should be on a separate paragraph

  This should be on a separate paragraph

  This should be on a separate paragraph

  \Sexpr[stage=render,results=rd]{
paste(
  rep('Sexpr: This should be on a separate paragraph', 3),
  collapse = '\n\n'
)
  }
}

For the text I've typed manually, there are ... tags splitting
the text separated by empty lines into paragraphs. The \Sexpr return
value only has newlines, which joints the paragraphs together:

This should be on a separate paragraph

This should be on a separate paragraph

This should be on a separate paragraph

Sexpr: This should be on a separate paragraph

Sexpr: This should be on a separate paragraph

Sexpr: This should be on a separate paragraph


addParaBreaks() is prevented from closing the paragraph tag because
tools:::isBlankLineRd() returns FALSE for blank lines produced by a
\Sexpr. This happens because utils:::getSrcByte() not 1 for these blank
lines. That, in turn, is because the source reference for \Sexpr values
is the whole \Sexpr tag:

# blank line from a \Sexpr
Rd[[3]][[2]][[9]][[2]]
# [1] "\n"
# attr(,"Rd_tag")
# [1] "TEXT"
getSrcref(Rd[[3]][[2]][[9]][[2]])
# \Sexpr[stage=render,results=rd]{
#   paste(
# rep('Sexpr: This should be on a separate paragraph', 3),
# collapse = '\n\n'
#   )
# }

# artisanal hand-crafted blank line
Rd[[3]][[2]][[7]]
# [1] "\n"
# attr(,"Rd_tag")
# [1] "TEXT"
summary(getSrcref(Rd[[3]][[2]][[7]]))
# 

I think I understand that tools:::isBlankLineRd requires
utils:::getSrcByte(x) == 1L because it may be called on things like
"\\eqn{0}\n" where the terminal "\n" might otherwise be considered a
"blank line". How to reconcile isBlankLineRd with blank lines not
directly originating from Rd source?

Rd2latex might have a similar problem (its addParaBreaks() checks for
isBlankLineRd()), but then it works anyway because Rd rules for
paragraph breaks on blank lines are the same as in LaTeX.

-- 
Best regards,
Ivan

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] scan(..., skip=1e11): infinite loop; cannot interrupt

2023-02-11 Thread Ivan Krylov

On Fri, 10 Feb 2023 23:38:55 -0600
Spencer Graves  wrote:

> I have a 4.54 GB file that I'm trying to read in chunks using 
> "scan(..., skip=__)".  It works as expected for small values of
> "skip" but goes into an infinite loop for "skip=1e11" and similar
> large values of skip:  I cannot even interrupt it;  I must kill R.

Skipping lines is done by two nested loops. The outer loop counts the
lines to skip; the inner loop reads characters until it encounters a
newline or end of file. The outer loop doesn't check for EOF and keeps
asking for more characters until the inner loop runs at least once for
every line it wants to skip. The following patch should avoid the
wait in such cases:

--- src/main/scan.c (revision 83797)
+++ src/main/scan.c (working copy)
@@ -835,7 +835,7 @@
 attribute_hidden SEXP do_scan(SEXP call, SEXP op, SEXP args, SEXP rho)
 {
 SEXP ans, file, sep, what, stripwhite, dec, quotes, comstr;
-int c, flush, fill, blskip, multiline, escapes, skipNul;
+int c = 0, flush, fill, blskip, multiline, escapes, skipNul;
 R_xlen_t nmax, nlines, nskip;
 const char *p, *encoding;
 RCNTXT cntxt;
@@ -952,7 +952,7 @@
if(!data.con->canread)
error(_("cannot read from this connection"));
}
-   for (R_xlen_t i = 0; i < nskip; i++) /* MBCS-safe */
+   for (R_xlen_t i = 0; i < nskip && c != R_EOF; i++) /* MBCS-safe */
while ((c = scanchar(FALSE, &data)) != '\n' && c != R_EOF);
 }

Making it interruptible is a bit more work: we need to ensure that a
valid context is set up and check regularly for an interrupt.

--- src/main/scan.c (revision 83797)
+++ src/main/scan.c (working copy)
@@ -835,7 +835,7 @@
 attribute_hidden SEXP do_scan(SEXP call, SEXP op, SEXP args, SEXP rho)
 {
 SEXP ans, file, sep, what, stripwhite, dec, quotes, comstr;
-int c, flush, fill, blskip, multiline, escapes, skipNul;
+int c = 0, flush, fill, blskip, multiline, escapes, skipNul;
 R_xlen_t nmax, nlines, nskip;
 const char *p, *encoding;
 RCNTXT cntxt;
@@ -952,8 +952,6 @@
if(!data.con->canread)
error(_("cannot read from this connection"));
}
-   for (R_xlen_t i = 0; i < nskip; i++) /* MBCS-safe */
-   while ((c = scanchar(FALSE, &data)) != '\n' && c != R_EOF);
 }

 ans = R_NilValue;  /* -Wall */
@@ -966,6 +964,10 @@
 cntxt.cend = &scan_cleanup;
 cntxt.cenddata = &data;

+if (ii) for (R_xlen_t i = 0, j = 0; i < nskip && c != R_EOF; i++) /* 
MBCS-safe */
+   while ((c = scanchar(FALSE, &data)) != '\n' && c != R_EOF)
+   if (j++ % 1 == ) R_CheckUserInterrupt();
+
 switch (TYPEOF(what)) {
 case LGLSXP:
 case INTSXP:

This way, even if you pour a Decanter of Endless Lines (e.g. mkfifo
LINES; perl -E'print "A"x42 while 1;' > LINES) into scan(), it can
still be interrupted, even if neither newline nor EOF ever arrives.
(We never skip lines when reading from the console? I suppose it makes
sense. I think this needs to be documented and can write a
documentation patch.)

If you actually have 1e11 lines in your file and would like to read it
in chunks, it may help to use

f <- file('...')
chunk1 <- scan(f, n = n1, skip = nskip1)
# the following will continue reading where chunk1 had ended
chunk2 <- scan(f, n = n2, skip = nskip2)

...in order to avoid having to skip over chunks you have already read,
which otherwise makes the algorithm quadratic in number of lines
instead of linear. (I couldn't determine whether you're already doing
this, sorry.)

Skipping a fixed number of lines is hard: since they have variable
length, it's required to read every character in order to determine
whether it starts a new line. With byte ranges, it would have been
possible to use seek(), but not here.

-- 
Best regards,
Ivan

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] Question on non-blocking socket

2023-02-15 Thread Ivan Krylov

В Wed, 15 Feb 2023 01:24:26 +0100
Ben Engbers  пишет:

> where I can find R documentation on proper use of non-blocking
> sockets and on the proper use of the socketSelect function?

A useful guide to the Berkeley sockets API can be found at
. You'll have to translate between the C
idioms and the R idioms, but it's better than having no guide at all.
In particular, R spares you from having to figure out differently-sized
struct sockaddr objects by converting them to string representations of
the addresses (currently limited to IPv4).

-- 
Best regards,
Ivan

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

[Rd] Robustifying R_CleanTempDir a bit more

2023-02-16 Thread Ivan Krylov

Hello,

This is probably a very minor point, but R_CleanTempDir may still have
a shell injection in it. I couldn't find a way to shoot the user in the
foot in a significant way (by, say, accidentally removing ~), thanks to
R disallowing spaces in the path, but if Sys_TempDir somehow acquires a
value of "/tmp/';echo;'", R_CleanTempDir() will remove /tmp instead of
its aptly-named subdirectory.

While adding the single-quote symbol to the list of special symbols
should suffice (it and the backslash being the only allowed ways to
"un-quote" a single-quoted string), I would like to suggest solving the
problem without the use of quoting:

#include 

char ** argv = { "rm", "-Rf", Sys_TempDir, NULL };
posix_spawnp(NULL, "rm", NULL, NULL, argv, NULL);

Are there Unix-like platforms on which R is intended to work that don't
have posix_spawn()? Circa-2014 versions of both Solaris and OpenBSD
seem to have it. Spawning the process manually by means of [v]fork()
and exec() is probably not worth the maintainer effort required to
perform it correctly.

-- 
Best regards,
Ivan

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] Robustifying R_CleanTempDir a bit more

2023-02-16 Thread Ivan Krylov

Thanks for the quick reply!

On Thu, 16 Feb 2023 15:43:40 +0100
Tomas Kalibera  wrote:

> Please see 83851 from earlier today which does a bit more of 
> robustification, and if you find any problem in it, please let me
> know.

83851 is an improvement, but it does let single quotes through,
unfortunately, leading to my (contrived) example of "/tmp/';echo;'". 

Given what you say about the temporary nature of the current fix,
adding the single quote to the list of special symbols should be a good
solution for now:

--- src/main/platform.c (revision 83851)
+++ src/main/platform.c (working copy)
@@ -1634,7 +1634,7 @@
/* On Solaris the working directory must be outside this one */
chdir(R_HomeDir());
 #endif
-   char *special = "\\`$\"\n";
+   char *special = "\\`$\"\n'";
int hasspecial = 0;
for(int i = 0; special[i] != '\0'; i++)
if (strchr(Sys_TempDir, special[i])) {

At least I don't see a way out once you disallow single quotes in the
single-quoted string.

-- 
Best regards,
Ivan

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] R: determine if `suppressMessages()` has been invoked

2023-02-19 Thread Ivan Krylov

On Sun, 19 Feb 2023 15:37:33 +0100 (CET)
Nino Hardt  wrote:

> I would like to create a function that detects if suppressMessages
> has been invoked upon running that same function. 

Would you mind letting us know why? Just curious. Normally, I would
just use message() for everything and let the users decide whether they
want to see it.

> I was looking through [R Internals], but I haven't found an answer. I
> do not understand **how** suppressMessages works.

It works by cooperating with message().

message() itself works by trying to raise a "message" condition and
providing a "muffleMessage" restart that does nothing. If the condition
wasn't handled (the "muffleMessage" restart wasn't called by the
handler), the text of the message is printed.

In turn, suppressMessages() sets up a handler for conditions of class
"message" that invokes the "muffleMessage" restart provided by
message() itself above.

We can use the fact that the availability of the "muffleMessage"
restart is a documented detail and check whether signalling a "message"
condition will call this restart:

are_messages_suppressed <- function() withRestarts(
 {
  signalCondition(simpleMessage(''))
  # we stay here if restart is not invoked
  FALSE
 },
 muffleMessage = function()
  # we jump here if restart is invoked
  TRUE
)
are_messages_suppressed()
# [1] FALSE
suppressMessages(are_messages_suppressed())
# [1] TRUE

I don't think I understand handlers and restarts enough to explain them
well, but the following link seems to be one of the defining documents
for R's condition handling system:
https://homepage.stat.uiowa.edu/~luke/R/exceptions/simpcond.html

Hadley Wickham's Advanced R (first edition only) contains a good
explanation of R's condition system:
http://adv-r.had.co.nz/Exceptions-Debugging.html
http://adv-r.had.co.nz/beyond-exception-handling.html

(In my opinion, this could be a better question for R-help, since we
ought to be using documented R APIs here.)

-- 
Best regards,
Ivan

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] R: determine if `suppressMessages()` has been invoked

2023-02-19 Thread Ivan Krylov

On Sun, 19 Feb 2023 18:45:55 +0100 (CET)
Nino Hardt  wrote:

> When using C or C++ via Rinside or within a package, those functions
> do not listen to suppressMessages, e.g. `Rprintf` keeps printing to
> the console. 

True, Rprintf is really convenient, but it doesn't raise any conditions
and just prints straight to the console. While calling R message() from
C is possible, it doesn't format anything.

There's a wishlist item for a C entry point for message() (I would
argue that a "messagef" taking a format string would be even better),
but it's waiting for someone with free time to implement it:
https://bugs.r-project.org/show_bug.cgi?id=17612

-- 
Best regards,
Ivan

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] Possible NA Propagation Failure in RISC-V64 CPU?

2023-02-24 Thread Ivan Krylov

On Thu, 23 Feb 2023 15:39:08 -0800
Jane He  wrote:

> In short, according to my understanding of R's convention, any
> calculation involving NA but no NaN should result in NA (called NA
> propagation), and any calculation involving NaN but no NA should
> result in NaN. Calculations involving both NA and NaN can result in
> either value.

The ?NA help page already contains a warning that "future CPUs and/or
compilers" may prevent NA from resulting in computations with NA. A
similar problem has been encountered on Apple processors, but a
workaround was found there:
https://blog.r-project.org/2020/11/02/will-r-work-on-apple-silicon/#nanan-payload-propagation

> This failure in NA propagation may cause many R packages like mice to
> not work properly, and results in the `make check` test in the
> `stats` package to fail.

Perhaps the way forward is to update the tests.

Regarding R packages, since is.na() is already documented to return
TRUE for all NaNs, not only the NA value, it should be possible to make
them work even if they currently fail.

-- 
Best regards,
Ivan

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] tab-complete for non-syntactic names could attempt backtick-wrapping

2023-03-01 Thread Ivan Krylov

В Wed, 1 Mar 2023 01:36:02 -0800
Michael Chirico via R-devel  пишет:

> +comps[non_syntactic] <- paste0("`", comps[non_syntactic], "`")

There are a few more corner cases. For example, comps could contain
backticks (which should be escaped with backslashes) and backslashes
(which should also be escaped). Thankfully, \u-style Unicode escape
sequences are not currently supported inside backticks, and "escape the
backslash" rule already takes care of them.

The deparse() function already knows these rules:

name <- 'hello world ` \\uFF'
cat(deparse1(as.name(name), backtick=TRUE), '\n')
# `hello world \` \\uFF`
`hello world \` \\uFF` <- 'hello'
`hello world \` \\uFF`
# [1] "hello"

-- 
Best regards,
Ivan

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] tab-complete for non-syntactic names could attempt backtick-wrapping

2023-03-02 Thread Ivan Krylov

There turn out to be a few more things to fix.

One problem is easy to solve: vapply() needs a third argument
specifying the type of the return value. (Can we have unit tests for
tab completion?)

The other problem is harder: `comps` defaults to an empty string, and
you can't have a symbol consisting of an empty string, because this
value is internally reserved for missing function arguments. I think
you can return(paste0(prefix, op)) if length(comps) == 0L, but this is
still somewhat worrying. R tries to prevent empty names, so I wouldn't
expect specialOpCompletionsHelper() to return an empty string, but I
can't prove it right now.

On the other hand, x$'a string' is the same as x$`a string`. Could we
just drop as.name() and keep the return value being a string literal?
I'm not sure about this, either.

-- 
Best regards,
Ivan

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] `dendrapply` Enhancements

2023-03-02 Thread Ivan Krylov

Dear Aidan Lakshman,

To answer your implicit question, VECTOR_ELT() unclasses the nodes
because it doesn't go through the stats:::`[[.dendrogram` method,
instead dereferencing the data pointer directly.

Other *apply functions in base R create a call to the `[[` operator,
letting the language dispatch the generic call, allowing the method to
assign a class to the return value. The following example is taken from
src/main/apply.c:do_lapply():

// prepare a call to FUN(X[[i]], ...)

SEXP isym = install("i");
SEXP tmp = PROTECT(lang3(R_Bracket2Symbol, X, isym));
SEXP R_fcall = PROTECT(lang3(FUN, tmp, R_DotsSymbol));
MARK_NOT_MUTABLE(R_fcall);

// inside the loop: evaluate the call

tmp = R_forceAndCall(R_fcall, 1, rho);

Not sure which way is faster, but it may make sense to try, and it's
probably more correct in (contrived) cases where unclass(node)[[i]] is
invalid because it relies on a hypothetical `[[.subclass-of-dendrogram`
to restore some invariants.

Would you mind telling me more about the following case?

> if(!(inherits(res,c('dendrogram', 'list'{
>  res1 <- lapply(unclass(node), \(x) x)
> }

If you're looking to improve the performance, there might be a way to
avoid the wrapper and this lapply(unclass(node), identity) call in it.

-- 
Best regards,
Ivan

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] Multiple Assignment built into the R Interpreter?

2023-03-11 Thread Ivan Krylov

On Sat, 11 Mar 2023 11:11:06 -0500
Duncan Murdoch wrote:

> That's clear, but your proposal violates a very basic property of the
> language, i.e. that all statements are expressions and have a value.

How about reframing this feature request from multiple assignment
(which does go contrary to "everything has only one value, even if it's
sometimes invisible(NULL)") to "structured binding" / "destructuring
assignment" [*], which takes this single single value returned by the
expression and subsets it subject to certain rules? It may be easier to
make a decision on the semantics for destructuring assignment (e.g.
languages which have this feature typically allow throwing unneeded
parts of the return value away), and it doesn't seem to break as much
of the rest of the language if implemented.

I see you've already mentioned it ("JavaScript-like"). I think it would
fulfil Sebastian's requirements too, as long as it is considered "true
assignment" by the rest of the language.

The hard part is to propose the actual grammar of the new feature (in
terms of src/main/gram.y, preferably without introducing conflicts) and
its semantics (including the corner cases, some of which you have
already mentioned). I'm not sure I'm up to the task.

--
Best regards,
Ivan

[*]
https://en.cppreference.com/w/cpp/language/structured_binding
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Operators/Destructuring_assignment

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] Making headers self-contained for static analysis

2023-03-16 Thread Ivan Krylov

Hello Lionel,

Just letting you know off-list that the patch didn't make it through.
Unfortunately, I don't remember the exact rules regarding attachments
(does text/plain work?), but an external link is always an option,
especially for large patches.

-- 
Best regards,
Ivan

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] `dendrapply` Enhancements

2023-03-23 Thread Ivan Krylov

Hello Aidan,

Sorry for dropping this for a while.

В Thu, 2 Mar 2023 21:03:59 +
"Lakshman, Aidan H"  пишет:

> //after
> curnode = eval(lang3(R_Bracket2Symbol, parent->node, DEND_IND), env);

lang3() always constructs a new language object. If you do end up using
eval(), it may make sense to move lang3() out of the loop and reuse the
existing object by referring to the DEND_IND variable using its symbol,
like it's done in the lapply() implementation.

> The problem is, it seems like the returned value from `eval` is not
> protected, whereas the value from VECTOR_ELT is (if the source list
> is protected). My understanding is that this happens because
> VECTOR_ELT just copies the pointer, whereas eval(…) calls R code,
> which returns a copy of the object.

That's right, `[[.dendrogram` returns a new object which is not
protected, unlike the raw elements of node that you're keeping a
protected pointer to.

> This ends up being problematic, since it isn’t feasible to protect
> all the nodes until we’re done with them.

I see. It's not easy in R to unprotect arbitrary previously-allocated
objects in a safe way. Have you considered "precious multi-sets"
?
They have R_ReleaseFromMSet(), making it possible to free arbitrary
objects from the set, and they automatically unprotect everything they
contain on destruction.

> It’s also possible to use eval(…) to get the node, apply the function
> to it, save its class in the linked list, and then save the object
> using VECTOR_ELT. This way we get the benefits of `[[` dispatch,
> class preservation, and constant stack space. However, this ends up
> hurting performance significantly (about 4x slower than the current
> new version, making it around half the speed of the version in stats).

That's a clever solution! Can you profile the code to see if there are
visible sources of slowdown? Maybe this can be salvaged.

-- 
Best regards,
Ivan

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

[Rd] removeSource() vs. function literals

2023-03-30 Thread Ivan Krylov

Dear R-devel,

In a package of mine, I use removeSource on expression objects in order
to make expressions that are semantically the same serialize to the
same byte sequences:
https://github.com/cran/depcache/blob/854d68a/R/fixup.R#L8-L34

Today I learned that expressions containing function definitions also
contain the source references for the functions, not as an attribute,
but as a separate argument to the `function` call:

str(quote(function() NULL)[[4]])
# 'srcref' int [1:8] 1 11 1 25 11 25 1 1
# - attr(*, "srcfile")=Classes 'srcfilecopy', 'srcfile'
#   

This means that removeSource() on an expression that would define a
function when evaluated doesn't actually remove the source reference
from the object.

Do you think it would be appropriate to teach removeSource() to remove
such source references? What could be a good way to implement that?
if (is.call(fn) && identical(fn[[1]], 'function')) fn[[4]] <- NULL
sounds too arbitrary. if (inherits(fn, 'srcref')) return(NULL) sounds
too broad.

-- 
Best regards,
Ivan

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] Breaking Change in Rcomplex Layout?

2023-04-04 Thread Ivan Krylov

On Tue, 4 Apr 2023 09:31:33 +0200
Tomas Kalibera  wrote:

> it also matters how Rust can handle anonymous nested structures, or
> what is the right mapping of the involved C types to Rust

It's worth mentioning that Rust's RFC 2102 [1] defines unnamed structs
and unions for the purpose of C interoperability, but it's not yet
implemented.

Would it help Rust's bindgen (and, potentially, other C declaration
parsers) if R could optionally give a name to the anonymous union
member, like it's done in Windows API [2]?

-- 
Best regards,
Ivan

[1] https://github.com/rust-lang/rust/issues/49804

[2] https://devblogs.microsoft.com/oldnewthing/20170907-00/?p=96956

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] removeSource() vs. function literals

2023-04-04 Thread Ivan Krylov

Thanks for the comments and sorry I didn't reply sooner!

On Thu, 30 Mar 2023 12:38:24 -0400
Duncan Murdoch  wrote:

> You'd need to recurse through all expressions in the object.  Some of
> those expressions might be environments, so your changes could leak
> out of the function you're working on.

In my efforts to get arbitrary objects to hash consistently, I already
walk them recursively, but I do stop at environments. (In theory, it
could be possible to create mock environments with the same
relationships between each other and then "fix up" and hash their
contents, but it's hard to do right. Imagine environments e1 and e2
where e1$other <- e2 and e2$other <- e1.)

I think that removeSource() already walks language objects recursively,
it just doesn't remove source references from unevaluated function
expressions.

> Things are simpler if you know the expression is the unmodified
> result of parsing source code, but if you know that, wouldn't you
> usually be able to control things by setting keep.source = FALSE?

I receive the expression object from substitute(). The idea is to hash
the expression, locate and hash its dependencies and then see if
there's already a file named like the resulting hash. In theory, the
user could be constructing elaborate scary-looking expressions and then
calling my function on them, but I think I can be reasonably certain I
get the calls straight from the parser. Unfortunately, this doesn't put
me in a position to be controlling options(keep.source=...).

> Maybe a workable solution is something like parse(deparse(expr,
> control = "exact"), keep.source = FALSE).

Thanks for this idea! At some point I was considering hashing text
representations of objects, but then I got serialize()-hashing working
and forgot about it.

-- 
Best regards,
Ivan

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] removeSource() vs. function literals

2023-04-05 Thread Ivan Krylov

On Fri, 31 Mar 2023 08:49:53 +0200
Lionel Henry  wrote:

> If you can afford a dependency on rlang, `rlang::zap_srcref()` deals
> with this. It's recursive over expression vectors, calls (including
> calls to `function` and their hidden srcref arg), and function
> objects.

Thanks for the suggestion! I hope that the source reference argument in
the `function` calls is the last way a source reference could sneak by
in an expression obtained using substitute().

> It's implemented in C for efficiency as we found it to be a
> bottleneck in some applications (IIRC caching). I'd be happy to
> upstream this in base if R core is interested.

A removeSource() that handles all the corner cases would be a nice
improvement.

-- 
Best regards,
Ivan

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

[Rd] sum(), min(), max(), prod() vs. named arguments in ...

2023-04-14 Thread Ivan Krylov



Hello R-devel,

As mentioned on Fosstodon [1] and discussed during RCOH [2], named
arguments to sum(), min(), max() and prod() are currently processed the
same way as the un-named ones. Additionally, when specifying the na.rm
argument more than once, the last specified value is silently taken
into account.

Most of this behaviour is exactly as documented, but it's arguably a
source of mistakes when the na.rm=TRUE argument is mistyped and
converted to 1.

Internally, all these calls land in do_summary(), and the na.rm
argument is processed by fixup_NaRm(). mean() is safe because it's
dispatched via S3 lookup.

Do users normally name their arguments to sum()? If not, it could be
relatively safe to make it a warning to pass named arguments to sum()
that are not na.rm and later transition it to an error:

--- src/main/summary.c  (revision 84252)
+++ src/main/summary.c  (working copy)
@@ -419,6 +419,8 @@
na_value = CAR(a);
if(prev == R_NilValue) args = CDR(a);
else SETCDR(prev, CDR(a));
+   } else if (TAG(a) && TAG(a) != R_NilValue) {
+   warning("Named argument \"%s\" here is probably a mistake", 
CHAR(PRINTNAME(TAG(a;
}
prev = a;
 }

(Can TAG(a) ever be NULL or anything other than R_NilValue or a symbol
here? We'll probably need to translate this message, too.)

Passing na.rm more than once could be made into an error right away:

--- src/main/summary.c  (revision 84252)
+++ src/main/summary.c  (working copy)
@@ -409,6 +409,7 @@
 attribute_hidden
 SEXP fixup_NaRm(SEXP args)
 {
+Rboolean seen_NaRm = FALSE;
 SEXP t, na_value;
 
 /* Need to make sure na.rm is last and exists */
@@ -415,7 +416,9 @@
 na_value = ScalarLogical(FALSE);
 for(SEXP a = args, prev = R_NilValue; a != R_NilValue; a = CDR(a)) {
if(TAG(a) == R_NaRmSymbol) {
+   if(seen_NaRm) error("Please specify na.rm only once");
+   seen_NaRm = TRUE;
if(CDR(a) == R_NilValue) return args;
na_value = CAR(a);
if(prev == R_NilValue) args = CDR(a);
else SETCDR(prev, CDR(a));

(Can we use error(_("formal argument \"%s\" matched by multiple actual
arguments"), "na.rm") to emit an already translated error message?)

Additionally, is it desirable to have S3 methods for sum() and its
friends? Currently, they are dispatched, but something confusing
happens to the arguments: according to the method, the dots contain
what I would expect them to contain (only the numbers to sum), but when
calling NextMethod() to compute the actual sum, extra arguments get
processed:

sum.foo <- function(..., ExtraArg = 999, na.rm = FALSE) {
 print(ExtraArg)
 str(...)
 NextMethod(generic = 'sum', ..., na.rm = na.rm)
}
sum(structure(100, class = 'foo'))
# [1] 999
#  'foo' num 100
# [1] 100
sum(structure(100, class = 'foo'), ExtraArg = -1)
# [1] -1
#  'foo' num 100 # <-- ExtraArg = -1 missing from ...
# [1] 99 # <-- but still subtracted

Maybe I'm just misusing NextMethod(). This is probably not important in
practice due to how S3 dispatch works.

-- 
Best regards,
Ivan

[1]
https://fosstodon.org/@nibsalot/110105034507818285

[2]
https://github.com/r-devel/rcontribution/blob/main/office_hours/2023-04-13_EMEA-APAC.md

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] sum(), min(), max(), prod() vs. named arguments in ...

2023-04-17 Thread Ivan Krylov

В Sun, 16 Apr 2023 01:34:33 -0400
Mikael Jagan  пишет:

> Forbidding or warning against tagged arguments other than 'na.rm'
> would be quite intrusive, since it is not uncommon to see
> do.call(sum, ).

Thank you for providing this counter-example! I did suspect this to be
a non-starter, but didn't have a proof.

> str(...) only gives the structure of the first argument matching the
> dots.

Also, thanks for the detailed explanation of what I was observing in my
example sum() method. I should have paid attention to my str() calls.

-- 
Best regards,
Ivan

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] Warnings created during R_eval or R_tryEval not shown before R ending or R error.

2023-04-26 Thread Ivan Krylov

В Sun, 23 Apr 2023 13:33:16 -0400
Laurent Gautier  пишет:

> When tracing what happens during an error I found that
> verrorcall_dflt() in src/main/errors.c calls PrintWarnings(). That
> function is not part of R's C-API though.

I've tried reading the source code and came to a similar conclusion.

PrintWarnings() is required for warnings() to work because it creates
the last.warning variable for warnings() to access. When driving an
embedded R, R_ReplDLLdo1() will call it for you between expressions it
processes, but there doesn't seem to be a way to call it yourself.

Interestingly, there is .Internal(printDeferredWarnings()) which
eventually calls PrintWarnings(), but it's not exported as an API, only
used in a few places like try().

Would it help to run with options(warn = 1)? I think that a different
code path is taken in this case, which should emit warnings even
without a working REPL. warnings() would still be empty, unfortunately.

-- 
Best regards,
Ivan

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] save.image Non-responsive to Interrupt

2023-05-02 Thread Ivan Krylov

В Sat, 29 Apr 2023 00:00:02 +
Dario Strbenac via R-devel  пишет:

> Could save.image() be redesigned so that it promptly responds to
> Ctrl+C? It prevents the command line from being used for a number of
> hours if the contents of the workspace are large.

This is ultimately caused by serialize() being non-interruptible. A
relatively simple way to hang an R session for a long-ish time would
therefore be:

f <- xzfile(nullfile(), 'a+b')
x <- rep(0, 1e9) # approx. 8 gigabytes, adjust for your RAM size
serialize(x, f)
close(f)

This means that calling R_CheckUserInterrupt() between saving
individual objects is not enough: R also needs to check for interrupts
while saving sufficiently long vectors.

Since the serialize() infrastructure is carefully written to avoid
resource leaks on allocation failures, it looks relatively safe to
liberally sprinkle R_CheckUserInterrupt() where it makes sense to do
so, i.e. once per WriteItem() (which calls itself recursively and
non-recursively) and once per every downstream for loop iteration.
Valgrind doesn't show any new leaks if I apply the patch, interrupt
serialize() and then exit. R also passes make check after the applied
patch.

Do these changes make sense, or am I overlooking some other problem?

-- 
Best regards,
Ivan
--- src/main/serialize.c	(revision 84337)
+++ src/main/serialize.c	(working copy)
@@ -879,8 +879,10 @@
 R_xlen_t len = XLENGTH(s);
 OutInteger(stream, 0); /* place holder to allow names if we want to */
 WriteLENGTH(stream, s);
-for (R_xlen_t i = 0; i < len; i++)
+for (R_xlen_t i = 0; i < len; i++) {
+	R_CheckUserInterrupt();
 	WriteItem(STRING_ELT(s, i), ref_table, stream);
+}
 }
 
 #include 
@@ -901,6 +903,7 @@
 	R_xlen_t done, this;
 	XDR xdrs;
 	for (done = 0; done < length; done += this) {
+	R_CheckUserInterrupt();
 	this = min2(CHUNK_SIZE, length - done);
 	xdrmem_create(&xdrs, buf, (int)(this * sizeof(int)), XDR_ENCODE);
 	for(int cnt = 0; cnt < this; cnt++)
@@ -916,6 +919,7 @@
 	/* write in chunks to avoid overflowing ints */
 	R_xlen_t done, this;
 	for (done = 0; done < length; done += this) {
+	R_CheckUserInterrupt();
 	this = min2(CHUNK_SIZE, length - done);
 	stream->OutBytes(stream, INTEGER(s) + done,
 			 (int)(sizeof(int) * this));
@@ -923,8 +927,10 @@
 	break;
 }
 default:
-	for (R_xlen_t cnt = 0; cnt < length; cnt++)
+	for (R_xlen_t cnt = 0; cnt < length; cnt++) {
 	OutInteger(stream, INTEGER(s)[cnt]);
+	R_CheckUserInterrupt();
+	}
 }
 }
 
@@ -938,6 +944,7 @@
 	R_xlen_t done, this;
 	XDR xdrs;
 	for (done = 0; done < length; done += this) {
+	R_CheckUserInterrupt();
 	this = min2(CHUNK_SIZE, length - done);
 	xdrmem_create(&xdrs, buf, (int)(this * sizeof(double)), XDR_ENCODE);
 	for(int cnt = 0; cnt < this; cnt++)
@@ -952,6 +959,7 @@
 {
 	R_xlen_t done, this;
 	for (done = 0; done < length; done += this) {
+	R_CheckUserInterrupt();
 	this = min2(CHUNK_SIZE, length - done);
 	stream->OutBytes(stream, REAL(s) + done,
 			 (int)(sizeof(double) * this));
@@ -959,8 +967,10 @@
 	break;
 }
 default:
-	for (R_xlen_t cnt = 0; cnt < length; cnt++)
+	for (R_xlen_t cnt = 0; cnt < length; cnt++) {
+	R_CheckUserInterrupt();
 	OutReal(stream, REAL(s)[cnt]);
+	}
 }
 }
 
@@ -975,6 +985,7 @@
 	XDR xdrs;
 	Rcomplex *c = COMPLEX(s);
 	for (done = 0; done < length; done += this) {
+	R_CheckUserInterrupt();
 	this = min2(CHUNK_SIZE, length - done);
 	xdrmem_create(&xdrs, buf, (int)(this * sizeof(Rcomplex)), XDR_ENCODE);
 	for(int cnt = 0; cnt < this; cnt++) {
@@ -991,6 +1002,7 @@
 {
 	R_xlen_t done, this;
 	for (done = 0; done < length; done += this) {
+	R_CheckUserInterrupt();
 	this = min2(CHUNK_SIZE, length - done);
 	stream->OutBytes(stream, COMPLEX(s) + done,
 			 (int)(sizeof(Rcomplex) * this));
@@ -998,8 +1010,10 @@
 	break;
 }
 default:
-	for (R_xlen_t cnt = 0; cnt < length; cnt++)
+	for (R_xlen_t cnt = 0; cnt < length; cnt++) {
+	R_CheckUserInterrupt();
 	OutComplex(stream, COMPLEX(s)[cnt]);
+	}
 }
 }
 
@@ -1034,6 +1048,7 @@
 
  tailcall:
 R_CheckStack();
+R_CheckUserInterrupt();
 if (ALTREP(s) && stream->version >= 3) {
 	SEXP info = ALTREP_SERIALIZED_CLASS(s);
 	SEXP state = ALTREP_SERIALIZED_STATE(s);
@@ -1186,15 +1201,19 @@
 	case STRSXP:
 	len = XLENGTH(s);
 	WriteLENGTH(stream, s);
-	for (R_xlen_t ix = 0; ix < len; ix++)
+	for (R_xlen_t ix = 0; ix < len; ix++) {
+		R_CheckUserInterrupt();
 		WriteItem(STRING_ELT(s, ix), ref_table, stream);
+	}
 	break;
 	case VECSXP:
 	case EXPRSXP:
 	len = XLENGTH(s);
 	WriteLENGTH(stream, s);
-	for (R_xlen_t ix = 0; ix < len; ix++)
+	for (R_xlen_t ix = 0; ix < len; ix++) {
+		R_CheckUserInterrupt();
 		WriteItem(VECTOR_ELT(s, ix), ref_table, stream);
+	}
 	break;
 	case BCODESXP:
 	WriteBC(s, ref_table, stream);
@@ -1208,6 +1227,7 @@
 	{
 		R_xlen_t

Re: [Rd] Heap access across multiple calls from R to C++

2023-05-23 Thread Ivan Krylov

On Wed, 24 May 2023 02:08:25 +0200
 wrote:

>   initialize = function() {
> 
> .C("make_native_obj", self$native_obj)
> 
>   })

> extern "C" void make_native_obj(double *obj) {
> 
>   auto native_obj = new NativeObjStruct();
> 
>   memcpy(obj, &native_obj, sizeof(obj));  
> 
> }

> Is there a better way to do this?

The .Call() interface (where functions take an arbitrary number of
native R objects and return a native R object) combined with external
pointers is likely to be a better approach here:

https://cran.r-project.org/doc/manuals/R-exts.html#External-pointers-and-weak-references

On the R side, the returned SEXP will have class 'externalptr', with
not much to do about it programmatically. C code can additionally
register a finalizer on the object in order to release the memory
automatically after the it is discarded.

-- 
Best regards,
Ivan

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] Building R from source always fails on tools:::sysdata2LazyLoadDB

2023-05-30 Thread Ivan Krylov

On Tue, 30 May 2023 20:09:53 +
Martin Morgan  wrote:

> ~/bin/R-devel/src/library/utils $   R_DEFAULT_PACKAGES=NULL
> ../../../bin/R --vanilla
> > tools:::sysdata2LazyLoadDB("/Users/ma38727/src/R-devel/src/library/utils/R/sysdata.rda","../../../library/utils/R")
> >  
> zsh: killed R_DEFAULT_PACKAGES=NULL ../../../bin/R --vanilla

Can you add the -d lldb flag to the command line (substitute lldb for a
different debugger if that's what you prefer) and get a backtrace at
the moment where R gets killed?

That signal 9 is suspicious. On Linux, I would expect a SIGKILL sent by
the OOM-Killer to a runaway process. On a Mac, I'm not sure at all, but
could it be related to some of the security features? (Does the system
save a crash report somewhere?)

-- 
Best regards,
Ivan

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] bug in utils:::format.person

2023-06-02 Thread Ivan Krylov

On Fri, 2 Jun 2023 16:55:59 +0200
Thierry Onkelinx via R-devel  wrote:

> I think I found a bug in utils::format.person when using style = "R"
> with a vector of comments. The comment section is not parsed
> properly.

Good catch! This looks like another occasion of deparse() suddenly
returning a multi-element character vector where the caller expects a
single string:

format(person(c('J.', 'Random'), 'Hacker', comment = c(ORCID =
'---', foo = 'bar bar bar bar bar')), style = 'R')
# [1] "person(given = c(\"J.\", \"Random\"),"
# [2] "   family = \"Hacker\","
# [3] "   comment = c(ORCID = \"---\", foo = \"bar
# bar bar bar bar\"))"
format(person(c('J.', 'Random'), 'Hacker', comment = c(ORCID =
'---', foo = 'bar bar bar bar bar bar')), style = 'R')
# [1] "person(given = c(\"J.\", \"Random\"),"
# [2] "   family = \"Hacker\","
# [3] "   comment = c(\"c(ORCID = \\\"---\\\", foo =
# \\\"bar bar bar bar bar bar\\\"\", \")\"))"

The following seems to fix it:

--- src/library/utils/R/citation.R  (revision 84486)
+++ src/library/utils/R/citation.R  (working copy)
@@ -1014,7 +1014,7 @@
 function(e) {
 e <- e[!vapply(e, is.null, NA)]
 cargs <-
-sprintf("%s = %s", names(e), sapply(e, deparse))
+sprintf("%s = %s", names(e), sapply(e, deparse1))
 .format_call_RR("person", cargs)
 })
 if(length(s) > 1L)

A regression test could be along the lines of:

p <- person(
 'foo', 'bar', comment = c(
  comment = 'just enough to deparse into multiple lines',
  needs = 'multiple entries'
 )
)
stopifnot(all.equal(
 eval(parse(text = format(p, style = 'R')))$comment,
 p$comment
))

-- 
Best regards,
Ivan

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] readLines() fails on non-blocking connections when encoding="UTF-8" or encoding="ASCII"

2023-06-06 Thread Ivan Krylov

В Mon, 5 Jun 2023 21:34:37 -0700
Peter Meilstrup  пишет:

> socketSelect(list(incoming)) #TRUE
> readLines(incoming, 1) # I get character(0) (incorrect)

> readChar(incoming, 100)
> # "again\nagain\nagain\n", so readChar saw what readLines() did not

The difference turns out to be that readChar() uses con->read
in order to get data from the connection, which resolves to sock_read,
which does the right thing.

readLines(), on the other hand, uses Rconn_fgetc, which (naturally)
calls con->fgetc, which turns out to be dummy_fgetc for this connection.

The dummy_fgetc function checks whether the current connection has an
encoding translation layer active (a non-null iconv context in
con->inconv). If it does exist, a check for con->EOF_signalled is
eventually performed, returning R_EOF without trying to read more data
from the connection if the flag is set. This means that once a read
operation fails, Rconn_fgetc will keep returning EOF, even if some data
later appears on the wire.

As far as I can tell, con->EOF_signalled is only used by dummy_fgetc,
and it needs to be there in order to avoid an infinite loop where the
connection is actually at EOF (so con->navail will always be <= 0). But
should it be persistent? Can we make the flag local to a given
invocation of dummy_fgetc?

With the following patch, the problem seems to go away without causing
any `make check` failures:

--- src/main/connections.c  (revision 84506)
+++ src/main/connections.c  (working copy)
@@ -533,6 +533,7 @@
 Rboolean checkBOM = FALSE, checkBOM8 = FALSE;
 
 if(con->inconv) {
+   con->EOF_signalled = FALSE;
while(con->navail <= 0) {
/* Probably in all cases there will be at most one iteration
   of the loop. It could iterate multiple times only if
   the input

But in that case, it seems to be possible to move EOF_signalled out of
the connection structure:

--- src/include/R_ext/Connections.h (revision 84506)
+++ src/include/R_ext/Connections.h (working copy)
@@ -74,7 +74,6 @@
 /* The idea here is that no MBCS char will ever not fit */
 char iconvbuff[25], oconvbuff[50], *next, init_out[25];
 short navail, inavail;
-Rboolean EOF_signalled;
 Rboolean UTF8out;
 void *id;
 void *ex_ptr;
--- src/main/connections.c  (revision 84506)
+++ src/main/connections.c  (working copy)
@@ -400,7 +400,6 @@
tmp = Riconv_open(useUTF8 ? "UTF-8" : "", enc);
if(tmp != (void *)-1) con->inconv = tmp;
else set_iconv_error(con, con->encname, useUTF8 ? "UTF-8" : "");
-   con->EOF_signalled = FALSE;
/* initialize state, and prepare any initial bytes */
Riconv(tmp, NULL, NULL, &ob, &onb);
con->navail = (short)(50-onb); con->inavail = 0;
@@ -533,6 +532,7 @@
 Rboolean checkBOM = FALSE, checkBOM8 = FALSE;
 
 if(con->inconv) {
+   Rboolean EOF_signalled = FALSE;
while(con->navail <= 0) {
/* Probably in all cases there will be at most one iteration
   of the loop. It could iterate multiple times only if the input
@@ -544,7 +544,7 @@
const char *ib;
size_t inb, onb, res;
 
-   if(con->EOF_signalled) return R_EOF;
+   if(EOF_signalled) return R_EOF;
if(con->inavail == -2) {
con->inavail = 0;
checkBOM = TRUE;
@@ -559,7 +559,7 @@
c = buff_fgetc(con);
else
c = con->fgetc_internal(con);
-   if(c == R_EOF){ con->EOF_signalled = TRUE; break; }
+   if(c == R_EOF){ EOF_signalled = TRUE; break; }
*p++ = (char) c;
con->inavail++;
inew++;
@@ -600,7 +600,7 @@
con->description);
con->inavail = 0;
if (con->navail == 0) return R_EOF;
-   con->EOF_signalled = TRUE;
+   EOF_signalled = TRUE;
}
}
}

Again, no apparent `make check` failures. Am I introducing a
performance problem? A breaking API change?

-- 
Best regards,
Ivan

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] codetools wrongly complains about lazy evaluation in S4 methods

2023-06-13 Thread Ivan Krylov

On Sat, 3 Jun 2023 11:50:59 -0400
Mikael Jagan  wrote:

>  > setOldClass("qr")
>  > setMethod("qr.X", signature(qr = "qr"), function(qr, complete,
>  > ncol) NULL)  
> 
> The formals of the newly generic 'qr.X' are inherited from the
> non-generic function in the base namespace.  Notably, the inherited
> default value of formal argument 'ncol' relies on lazy evaluation:
> 
>  > formals(qr.X)[["ncol"]]  
>  if (complete) nrow(R) else min(dim(R))
> 
> where 'R' must be defined in the body of any method that might
> evaluate 'ncol'. To my surprise,
> tools:::.check_code_usage_in_package() complains about the undefined
> symbol:
> 
>  qr.X: no visible binding for global variable 'R'
>  qr.X,qr: no visible binding for global variable 'R'
>  Undefined global functions or variables:
>R

In other words, codetools::checkUsage(base::qr.X) says nothing while
codetools::checkUsage(TestPackage::qr.X) complains. I think the
difference is that codetools::findFuncLocals sees an assignment to `R`
in the body of base::qr.X:

codetools::findFuncLocals(formals(base::qr.X), body(base::qr.X))
# [1] "cmplx"   "cn"  "ip"  "p"   "pivoted" "R"   "res"
# [8] "tmp"

The problem, then, is that an S4 generic shouldn't be having such
assignments in its body. One way to fix this would be to modify
codetools::checkUsage to immediately return if inherits(fun,
'standardGeneric'), but I don't know enough about S4 to say whether
this is safe. (A more comprehensive fix would be to check every
encountered method against the formals of the generic, but that sounds
complicated.) Arguably, static analysis will always be wrong about
something, so we're trading a false positive for potential false
negatives.

-- 
Best regards,
Ivan

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] New behavior when running script in package directory?

2023-06-21 Thread Ivan Krylov

On Wed, 21 Jun 2023 11:49:24 -0400
Dominick Samperi  wrote:

> 1. Why does Emacs/ESS behave differently depending on the
> current working directory?
> 2. Why is the signal package loaded automatically?

There's a "package development mode" in ESS:
https://github.com/emacs-ess/ESS/blob/master/lisp/ess-r-package.el

I think that ESS automatically detects that you're inside a package
directory and adjusts its behaviour. This could be related to
"namespaced evaluation"

("evaluate code in the context of the package being developed", sounds
useful), but I'm not sure about that.

> 3. Why is that temporary file /tmp/foo.R!djSwRn created?

It's probably some part of ESS's mechanism of running R code. I don't
see where the temporary file name is constructed (maybe a built-in
Emacs function?), but it probably comes from
.
(Look for calls to .ess.source or .ess.ns_source.)

> 4. What is that function ss() in the error message referring to?

`ss` is a function defined by ESS in order to call R's source() in a
way portable between a very wide range of R versions. (A comment in the
same file says "Should work on *all* versions of R. Do not use _ in
names, nor :: , nor 1L", which includes versions of R considered very
old by now.):

https://github.com/emacs-ess/ESS/blob/master/etc/ESSR/R/.basic.R#L178

> 5. Could this be a virus? Under Ubuntu?

A virus has little reason to be doing this to you. There's much more
money to be made from malicious VSCode extensions than Emacs add-on
packages.

Since you've determined ESS to be causing the difference in the
behaviour, perhaps
 is the way
to go? Unfortunately, I'm not a user of ESS or Emacs, so I'm going off
what information I found on the Web, which may be outdated.

-- 
Best regards,
Ivan

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] dir.exists returns FALSE on Symlink directory

2023-06-26 Thread Ivan Krylov

В Mon, 26 Jun 2023 10:26:07 -0400
Dipterix Wang  пишет:

> If I symlink a directory, is symlink considered as directory in R?

It seems to work, at least on GNU/Linux:

# (on this system, /var/lock is a symbolic link pointing to /run/lock/)
system('ls -l /var/lock')
# lrwxrwxrwx 1 root root <...> /var/lock -> /run/lock
dir.exists('/var/lock')
# [1] TRUE

> If so, why `dir.exists` returns FALSE on directory?

Which operating system? Judging by your User-Agent, it must be some
version of macOS. Can you provide the output of `ls -l` on the symbolic
link?

-- 
Best regards,
Ivan

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] [R] Errors in "An introduction to R"

2023-07-10 Thread Ivan Krylov

Dear Jarkko,

Thank you for spotting these problems and suggesting fixes for them! I
am forwarding your e-mail to the R-devel mailing list (removing the
R-help list, which is for different kind of R problems), together with
a patch implementing your suggested changes.

В Thu, 6 Jul 2023 10:12:16 +0300
Jarkko Toivonen  пишет:
> I noticed a few errors in Version 4.3.1 (2023-06-16)

--- doc/manual/R-intro.texi (revision 84676)
+++ doc/manual/R-intro.texi (working copy)
@@ -2594,7 +2594,7 @@
 
 @noindent
 does not replace the component @code{u} of the data frame, but rather
-masks it with another variable @code{u} in the working directory at
+masks it with another variable @code{u} in the workspace at
 @w{position 1} on the search path.  To make a permanent change to the
 data frame itself, the simplest way is to resort once again to the
 @code{$} notation:
@@ -2631,7 +2631,7 @@
 @subsection Working with data frames
 
 A useful convention that allows you to work with many different problems
-comfortably together in the same working directory is
+comfortably together in the same workspace is
 
 @itemize @bullet
 @item
@@ -2639,7 +2639,7 @@
 in a data frame under a suitably informative name;
 @item
 when working with a problem attach the appropriate data frame at
-@w{position 2}, and use the working directory at @w{level 1} for
+@w{position 2}, and use the workspace at @w{level 1} for
 operational quantities and temporary variables;
 @item
 before leaving a problem, add any variables you wish to keep for future
@@ -2646,7 +2646,7 @@
 reference to the data frame using the @code{$} form of assignment, and
 then @code{detach()};
 @item
-finally remove all unwanted variables from the working directory and
+finally remove all unwanted variables from the workspace and
 keep it as clean of left-over temporary variables as possible.
 @end itemize
 
@@ -2718,7 +2718,7 @@
 rather inflexible.  There is a clear presumption by the designers of
 @R{} that you will be able to modify your input files using other tools,
 such as file editors or Perl@footnote{Under UNIX, the utilities
-@command{sed} or@command{awk} can be used.} to fit in with the
+@command{sed} or @command{awk} can be used.} to fit in with the
 requirements of @R{}.  Generally this is very simple.
 
 If variables are to be held mainly in data frames, as we strongly
@@ -3576,8 +3576,15 @@
 following to use it safely.
 
 Thus given a @math{n} by @math{1} vector @math{y} and an @math{n} by
-@math{p} matrix @math{X} then @math{X \ y} is defined as
+@math{p} matrix @math{X} then
 @ifnottex
+@math{X \ y}
+@end ifnottex
+@tex
+@math{X \\ y}
+@end tex
+is defined as
+@ifnottex
 (X'X)^@{-@}X'y, where (X'X)^@{-@}
 @end ifnottex
 @tex

I chose to replace the "working directory" metaphor with "workspace". I
such word use here is correct.

-- 
Best regards,
Ivan

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

[Rd] _R_CHECK_DEPENDS_ONLY_ vs. packages in .Library [was: Check package without suggests]

2023-07-19 Thread Ivan Krylov

(Moving this one idea to R-devel)

В Wed, 19 Jul 2023 09:21:46 +0200
Henrik Bengtsson  пишет:

> If you're on macOS, and have installed R the default way, it takes
> more work to test on that platform. It works out of the box on Linux
> and MS Windows.  See the '[R-SIG-Mac] CRAN installer for macOS -
> directory permissions' thread started in April 2022
> ,
> continued in May 2022
> , and
> June 2022
> . It
> was then renamed to 'System-wide site library [Was: CRAN installer
> for macOS - directory permissions]' in June 2022
> .

Currently, a check with _R_CHECK_DEPENDS_ONLY_=TRUE assumes that
.Library only has base and recommended packages. This assumption can be
broken on macOS, and also on other operating systems when R is
installed into a writeable directory or is running without installation
(e.g. R-devel from an SVN checkout) and the user doesn't pre-create a
separate library.

What would be the downsides to implementing _R_CHECK_DEPENDS_ONLY_ the
same way that _R_CHECK_NO_RECOMMENDED_ is already implemented? The
latter works by creating fake packages (with a DESCRIPTION and an empty
file called "dummy_for_check" but nothing else in them) in a temporary
library that take precedence over the ones in .Library and fail loading.

-- 
Best regards,
Ivan

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] tools::parseLatex() crashes on "\\verb{}"

2023-07-20 Thread Ivan Krylov

On Thu, 20 Jul 2023 21:41:44 +0200
Antoine Fabri  wrote:

> tools::parseLatex("\\verb{hello}")
> # crashes the session

Looking at the source [*], this seems to be happening because
parseLatex expects the \verb macro to use the same character as the
delimiter on both sides:

tools::parseLatex('\\verb!hello!')
# \verb!hello!

What the loop doesn't have is a check for EOF, which leads TEXT_PUSH()
to increase the temporary buffer exponentially until unsigned int
nstext overflows and results in a 0-byte allocation, which is then
overrun, corrupting the heap. Any other unterminated \verb!... would
have caused the same crash.

Here's a patch that prevents this particular crash:

--- src/library/tools/src/gramLatex.y   (revision 84714)
+++ src/library/tools/src/gramLatex.y   (working copy)
@@ -846,8 +846,8 @@

 TEXT_PUSH('\\'); TEXT_PUSH('v'); TEXT_PUSH('e'); TEXT_PUSH('r'); 
TEXT_PUSH('b');
 TEXT_PUSH(c);
-while ((c = xxgetc()) != delim) TEXT_PUSH(c);
-TEXT_PUSH(c);
+while (((c = xxgetc()) != delim) && c != R_EOF) TEXT_PUSH(c);
+if (c != R_EOF) TEXT_PUSH(c);

 PRESERVE_SV(yylval = mkString2(stext, bp - stext));
 if(stext != st0) free(stext);

This seems to have been the only remaining while loop in gramLatex.y
that didn't check for R_EOF.

More correctness work is needed: mkMarkup() should avoid calling
mkVerb(R_EOF) when running tools::parseLatex('\\verb'), since otherwise
0xFF becomes a part of the resulting text. All declarations of unsigned
int nstext should probably be replaced by size_t nstext... but then
we'd have an annoying visit from the OOM killer instead of a much faster
crash in case of a runaway TEXT_PUSH(), and nobody expects to parse 4
GB of LaTeX source anyway. TEXT_PUSH() probably needs an integer
overflow check and to free the temporary buffer before calling error().

-- 
Best regards,
Ivan

[*]
https://github.com/r-devel/r-svn/blob/f145419cc4dae162719206a61a29082adff2043d/src/library/tools/src/gramLatex.y#L845-L850

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] tools::parseLatex() crashes on "\\verb{}"

2023-07-21 Thread Ivan Krylov

В Fri, 21 Jul 2023 15:14:09 +0200
Antoine Fabri  пишет:

> On a closer look it seems like roxygen2 introduces those, when using
> markdown backtick quoting, if the quoted content is not syntactic. For
> instance:
> 
> #' `c(c(1)`
> #' `c(c(1))`
> 
> Will convert the first line to `\verb{c(c(1)}` and the second to
> `\code{c(c(1))}`.

roxygen2 tries to do the right thing here. As defined in "Parsing Rd
files" [*], \code{} blocks are supposed to contain syntactically valid
R code. When something that is not valid R is given in a Markdown code
block, roxygen2 should not output \code{}, so it outputs \verb{}.

Also, unlike in LaTeX as understood by tools::parseLatex(), \verb{}
blocks use the {} braces in R documentation, and are understood
correctly by tools::parse_Rd(). Perhaps you also need tools::parse_Rd()?

-- 
Best regards,
Ivan

[*] https://developer.r-project.org/parseRd.pdf

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] random network disconnects

2023-08-05 Thread Ivan Krylov

В Mon, 31 Jul 2023 16:43:17 +0100
Patrick Burns  пишет:

> At work we are getting lots of issues with 'permission denied' or 
> 'network not found' and so forth when reading and writing between our 
> machines and a file server.  This happens randomly so the following 
> function solves the problem for 'cat' commands

These sound like error codes returned from the operating system, which
R has no choice but forward to the user. (Well, in this case the error
turns out to be transient, but files are unfortunately fraught with
peril with no easy solutions [*].)

Even using system call tracing or setting a symbolic debugger breakpoint
on an error return from cat() will only tell you the OS error code and
the file on which the operation failed, both of which you already know.
Getting to the root of the problem will likely involve drilling down
into the details of the exact networked filesystem used, which R is
supposed to know nothing about. Maybe you could ask on ServerFault or
on #linux / #windows / # on Libera.Char?

-- 
Best regards,
Ivan

[*] https://danluu.com/deconstruct-files/

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

[Rd] HTML documentation check works best with Tidy >= 5.0.0

2023-08-05 Thread Ivan Krylov

Hello R-devel,

Old versions of HTML Tidy report false positive NOTEs for the HTML
verison of the manual where Tidy encounters HTML5 features it is not
ready for.

Conveniently, both HTML5 support and release version numbers officially
appeared in HTML Tidy version 5.0.0 [*]. For example, the last version
of the "old Tidy" I could find fails on an R help page:

 cvs -z3 -d:pserver:anonym...@tidy.cvs.sourceforge.net:/cvsroot/tidy \
  co -P tidy
 cd tidy
 make -C build/gmake
 bin/tidy -v
 # HTML Tidy for Linux released on 25 March 2009
 R-devel CMD Rdconv .../R-devel/src/library/stats/man/lm.Rd -t html | \
  bin/tidy >/dev/null
 # line 4 column 1 - Warning:  inserting "type" attribute
 # line 12 column 1 - Warning:

Re: [Rd] HTML documentation check works best with Tidy >= 5.0.0

2023-08-06 Thread Ivan Krylov

В Sun, 6 Aug 2023 12:18:09 +0200
Kurt Hornik  пишет:

> IIrc all Linux versions advertize themselves as something like
> 
>   HTML Tidy for Linux version 5.8.0
> 
> What about windows and macOS?

I've checked the "modern" Windows binaries of HTML Tidy, and they say
so too. Cannot check the macOS version easily.

I think that any released version older than 5.0.0 (before the
development moved from https://tidy.sf.net/ to
https://www.html-tidy.org/ in 2015) will only identify itself by the
release date. Judging by an R-SIG-Mac thread I've found, "Apple Inc.
build 2649" that may be bundled with some macOS versions must be of the
SourceForge vintage too, before they started using version numbers.

There are commits in the "modern" Tidy source tree using 4.x.x version
numbers, but I don't think they were considered to be formally
released. For example, Debian went from the SourceForge CVS snapshots
straight to 5.2.0 in 2016 without packaging the 4.x.x versions.

-- 
Best regards,
Ivan

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] A demonstrated shortcoming of the R package management system

2023-08-07 Thread Ivan Krylov

В Sun, 6 Aug 2023 16:05:03 -0500
Dirk Eddelbuettel  пишет:

> One possibility may be to add a new (versioned) field 'Breaks:'.
> Matrix could then have added 'Breaks: SeuratObject (<= 4.1.3)'
> preventing an installation of Matrix 1.6.0 when SeuratObject 4.1.3
> (or earlier) is present, but permitting an update to Matrix 1.6.0
> alongside a new version, say, 4.1.4 of SeuratObject which could
> itself have a versioned Depends: Matrix (>= 1.6.0).

I wouldn't entirely agree that Matrix 1.6.0 breaks SeuratObject 4.1.3,
given that it's still possible to install first Matrix 1.6.0 and then
SeuratObject 4.1.3. The breakage definitely exists, but not on the
source package level.

It may also not be easy for the package developer to notice breaking a
binary package while performing reverse dependency checks, in time to
add such a notice to their package. The recommended way to do that is
tools::check_packages_in_dir(), which works on source packages.

Would it help to reframe the problem in terms of binary packages
acquiring dependency constraints that are more strict than those of the
corresponding source packages? If a package that imports S4 classes
from another package and thus ends up caching their definitions, R
could compute a hash of the classes being imported, store it together
with the installed package and complain noisily if the hash doesn't
match later at load time. This could be used to detect such problems
automatically (but could also result in false positives!).

This is not the only way a binary package could accidentally depend on
internals of another binary package. I remember reading about (but
cannot find it now!) some packages "importing" a function from ggplot2
(I think) by assigning it into their namespace:

 foo <- ggplot2::useful_function

This worked for quite a while, but later broke because
ggplot2::useful_function called an internal function which ceased to
exist in a new version of ggplot2. This is arguably a bug and probably
even harder to track, but are there any other ways to catch a "binary
dependency" for a package?

-- 
Best regards,
Ivan

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] R-4.3 version list.files function could not work correctly in chinese

2023-08-11 Thread Ivan Krylov

Dear 叶月光,

Thank you for your message, but please follow the posting guide in your
future messages: https://www.r-project.org/posting-guide.html
https://www.r-project.org/bugs.html

I understand from your link that list.files() ends up skipping some
Chinese filenames in R-4.3.1 (but not R-4.2.2) on Windows, but would you
(or perhaps Yihui Xie who I see is also participating in the discussion)
mind translating the rest of your findings into English? Have you been
able to narrow down the problem to certain character ranges, for
example?

-- 
Best regards,
Ivan

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] R-4.3 version list.files function could not work correctly in chinese

2023-08-12 Thread Ivan Krylov

Dear Yihui,

Thanks a lot for your help!

Unfortunately, I was not able to reproduce this. I've tried creating
files with Chinese characters in their names and populating them
with valid UTF-8 and valid non-UTF-8 text, but R seems to be able to
list them all in my case.

I'm running a US English evaluation ISO image of a slightly newer build
of Windows 10, and I also compiled R-4.3.1 from source, anticipating
having to single-step through the list.files() implementation:

sessionInfo()
# R version 4.3.1 (2023-06-16 ucrt)
# Platform: x86_64-w64-mingw32/x64 (64-bit)
# Running under: Windows 10 x64 (build 19045)
# 
# Matrix products: default
# 
# 
# locale:
# [1] LC_COLLATE=English_United States.utf8  LC_CTYPE=English_United
# States.utf8
# [3] LC_MONETARY=English_United States.utf8 LC_NUMERIC=C
# [5] LC_TIME=English_United States.utf8
# 
# time zone: America/Los_Angeles
# tzcode source: internal
# 
# attached base packages:
# [1] stats graphics  grDevices utils datasets  methods   base
#
# loaded via a namespace (and not attached):
# [1] compiler_4.3.1
dir("测试文件")
# [1] "测试中文-non-utf8-Ъ.txt" "测试中文-utf-8.txt" 
system('cmd /c dir /s *.txt')
#  Volume in drive C has no label.
#  Volume Serial Number is A85A-AA74
# 
#  Directory of C:\R\R-4.3.1\bin\x64\
# 
# 08/12/2023  07:57 AM22 -non-utf8-?.txt
# 08/12/2023  07:56 AM18 -utf-8.txt
#2 File(s) 40 bytes
# 
#  Total Files Listed:
#2 File(s) 40 bytes
#0 Dir(s)  29,538,418,688 bytes free
# [1] 0

(The OEM codepage cannot represent the characters I used in the file
names, but all the files are present in both lists.)

In order to find out what's wrong, it will be needed to download the R
source code and compile it [*], install gdb using pacman (part of
Rtools), then set a breakpoint on the list_files function from
src/main/platform.c and step through it [**], paying attention to the
R_readdir calls. Do the missing file names not even come out from
FindNextFile()? Are they somehow skipped around the time of regex match?

(I could help with the details of this, maybe off-list, if there's
interest.)

Unless Tomas Kalibera is able to deduce the root cause from the
observed symptoms, someone who can reproduce the problem will have to
investigate further.

-- 
Best regards,
Ivan

[*] https://cran.r-project.org/bin/windows/base/howto-R-devel.html

[**] https://beej.us/guide/bggdb/

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] R-4.3 version list.files function could not work correctly in chinese

2023-08-13 Thread Ivan Krylov

Dear 叶月光,

I believe you that there's a problem with list.files() and file names
in Chinese. There is no need for additional proof. Unfortunately, it's
impossible to fix the problem unless its source is found:
https://www.chiark.greenend.org.uk/~sgtatham/bugs-cn.html

Can you give me more examples of file names, _as text_, that I could
_copy and paste_ into my computer in order to (hopefully) reproduce the
problem here?

Alternatively, can you use a debugger for programs written in C? Do you
know someone who does?

-- 
Best regards,
Ivan

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] R-4.3 version list.files function could not work correctly in chinese

2023-08-13 Thread Ivan Krylov

Found it! Looks like a buffer length problem. This isn't limited to
Chinese, just more likely to happen when a character takes three bytes
to represent in UTF-8. (Any filename containing characters which take
more than one byte to represent in UTF-8 may fail.)

If a directory contains a file with a sufficiently long name,
FindNextFile() fails with ERROR_MORE_DATA (0xEA, 234), making
R_readdir() return NULL, stopping list_files() prematurely:

# everything seems to work fine...

list.files("测试文件")
# [1] "测试中文-non-utf8-Ъ
测试中文测试中文测试中文测试中文测试中文测试中文测试中文测试中文测试中文测试中文测试中文测试中文测试中文.txt"
# [2] "测试中文-non-utf8-Ъ.txt"
# [3] "测试中文-utf-8.txt"

# now create a file with an even longer name

list.files("测试文件")
# [1] "测试中文-non-utf8-Ъ
测试中文测试中文测试中文测试中文测试中文测试中文测试中文测试中文测试中文测试中文测试中文测试中文测试中文.txt"

# the files are still there, but not visible to list.files():

system("cmd /c dir /s *.txt")
#  Volume in drive C has no label.
#  Volume Serial Number is A85A-AA74
#
#  Directory of C:\R\R-4.3.1\bin\x64\
#
# 08/12/2023  07:57 AM22 -non-utf8-?
.txt
# 08/12/2023 07:57 AM22 -non-utf8-?
.txt
# 08/12/2023  07:57 AM22 -non-utf8-?.txt
# 08/12/2023  07:56 AM18 -utf-8.txt
# 4 File(s) 84 bytes
# 
#   Total Files Listed:
#4 File(s) 84 bytes
#0 Dir(s)  29,281,538,048 bytes free
# [1] 0

Increasing the path length limits [*] doesn't help, since it's the
filename length limit that we're bumping against. While both
WIN32_FIND_DATAA and WIN32_FIND_DATAW contain fixed-size buffers, a
valid filename may take more than MAX_PATH bytes to represent in UTF-8
while still being under the limit of MAX_PATH wide characters. This may
mean having to rewrite list_files in terms of R_wopendir()/R_wreaddir()
for Windows. As a workaround, we may use the short filename (which
sometimes may not exist, alas) when FindNextFile() fails with
ERROR_MORE_DATA.

-- 
Best regards,
Ivan

[*]
https://learn.microsoft.com/en-us/windows/win32/fileio/maximum-file-path-limitation

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] R-4.3 version list.files function could not work correctly in chinese

2023-08-15 Thread Ivan Krylov

В Tue, 15 Aug 2023 08:38:11 +0200
Tomas Kalibera  пишет:

> As this was reported to be regression in 4.3, it is entirely possible 
> this change came with a regression (though a bit surprising we didn't 
> catch it earlier by testing), so it would be a great help if I could 
> have the example and debug it.

Sorry, let me try to be more clear.

The Windows filename length limit is 255(?) wide characters. The
WIN32_FIND_DATAA structure contains a 260-byte buffer for the filename
to be returned by FindFirstFileA()/FindNextFileA(). If a wide character
takes more than one byte to be represented in UTF-8, it may overflow
the 260 byte limit in the WIN32_FIND_DATAA structure despite being
below the 260 wide character limit. When such an overflow happens,
FindNextFile() returns FALSE with GetLastError() == ERROR_MORE_DATA,
which results in R_readdir() returning NULL and makes list_files() stop
before listing the rest of the directory.

This is easier to make happen by accident with Chinese characters,
because they take three UTF-8 bytes per character.

Take the ø (\uf8) letter. It takes two bytes to represent in UTF-8.
Create a file with a name consisting of this symbol repeated 140 times.
When you run list.files() on the resulting directory on Windows with a
UTF-8 locale, Windows tries to fit (0xc3 0xb8) times 140 into a
260-byte buffer, which doesn't work. I'm afraid the only way to avoid
such a failure is to rewrite R_readdir using the wide character API and
convert the file names on the fly. (Just like mingw readdir() did in
the past?)

stopifnot(.Platform$OS.type == 'windows', l10n_info()$`UTF-8`)
# any character for which nchar(enc2utf8(.), 'bytes') > 1 will do
# any number >260/2 should do
file.create(strrep('\uf8', 140))
list.files()

Does this work? I don't have access to a UTF-8 Windows machine right
now.

-- 
Best regards,
Ivan

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] R-4.3 version list.files function could not work correctly in chinese

2023-08-16 Thread Ivan Krylov

On Wed, 16 Aug 2023 09:42:09 +0200
Tomas Kalibera  wrote:

> Fixed in R-devel (84960). Please let me know if you see any problem
> with the fix.

Thank you for implementing the fix! I gave 叶月光 the link to the
GitHub Action build of the r84960 installer.

I'm worried that 叶月光 was seeing FindNextFileA fail for a different
reason (all the examples given at the Capital of Statistics forum
seemed to use less than 256/4 = 64 characters per file name...), but
maybe this won't reappear with the switch to FindNextFileW. If this
keeps happening, it might be worth producing a warning when
FindNextFileW() fails with an unexpected GetLastError() value.

fs::dir_fs() uses NtQueryDirectoryFile() and WideCharToMultiByte()
instead of FindNextFileW() and wcstombs(), but maybe this shouldn't
matter. In particular, both list.files() and fs::dir_fs() would fail
given a file name that cannot be represented in UTF-8 (invalid UTF-16
surrogate pairs?)

-- 
Best regards,
Ivan

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] Security assessment

2023-08-30 Thread Ivan Krylov

В Tue, 29 Aug 2023 15:43:24 +
"Jones, Jonathan D [US] (SP)"  пишет:

> Has any consideration or work been done to document or perform
> vulnerability testing for the R packages?

Is is specifically about third-party R packages or about R ecosystem as
a whole, including R itself?

This depends on your threat model, but generally, it's best not to
process untrusted data in R. As an example, see the stack overflow in
unserialize(), reported on R Bugzilla a few years ago. I am not aware
of any current vulnerabilities in R's built-in help server or the
network server packages hosted on CRAN, but I am not aware of them
having passed security audits, either, so it's best not to let R listen
on network ports on public networks.

> It would be a huge help to have a way to package whatever
> tools/libraries/etc into a adhoc package

CRAN packages are supposed to declare their third-party dependencies in
the SystemRequirements: field of their DESCRIPTION, but that's not the
only way a package could be bringing third-party code in the address
space of the R process. For example, some packages bundle their
dependencies inside the package archive without declaring anything,
which may mean falling behind in terms of security updates.

The CRAN team does their best to enforce the policy regarding the
third-party dependencies [*], but CRAN packages come with no warranty.

> or a breakdown of an R release contents

Would R Installation and Administration [**] help? R has a few
third-party dependencies, slightly different depending on the platform
(Windows/macOS/Unix-alikes), so make sure to check the sections for all
operating systems.

> Developers ask for specific library files and if I could map them to
> a package it would greatly reduce the amount of research.

By library files, do you mean external dependencies of a package, the
packages themselves, or something completely different?

-- 
Best regards,
Ivan

[*] https://cran.r-project.org/web/packages/policies.html

[**] https://cran.r-project.org/doc/manuals/r-release/R-admin.html

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] Updated `dendrapply`

2023-09-01 Thread Ivan Krylov

В Fri, 1 Sep 2023 09:17:57 +
"Lakshman, Aidan H"  пишет:

> - Ivan previously mentioned issues with user specific `[[.dendrogram`
> implementations, and it doesn't seem that you've fixed that.

> This is correct. I discovered during the R project sprint that
> `stats::dendrapply` does not respect user-specific implementations of
> `[[.dendrogram`. stats::`[[.dendrogram` has its own issues; if the
> user defines multiple classes for a dendrogram object, double bracket
> subsetting will remove the class (a bug report will be filed for this
> shortly).

True, my warning about not handing potential subclasses of dendrogram
was purely theoretical.

(A hypothetical subclass of dendrogram could work with the current
[[.dendrogram if it ensured that its own class name always precedes
'dendrogram' in the class vector, thus never being downstream from
stats:::`[[.dendrogram` in a chain of NextMethod() calls. But that's
still hypothetical.)

I see that your current implementation very nicely bounds the PROTECT()
stack usage and avoids the need to deallocate arbitrary SEXPs, which is
awkward to do with R's garbage collector API. Congratulations!

-- 
Best regards,
Ivan

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

[Rd] On PRINTNAME() encoding, EncodeChar(), and being painted into a corner

2023-09-18 Thread Ivan Krylov

Hello R-devel,

I have originally learned about this from the following GitHub issue:
. In short,
in various places of the R source code, symbol names are accessed using
translateChar(), EncodeChar(), and CHAR(), and it might help to unify
their use.

Currently, R is very careful to only create symbols with names in the
native encoding. I have verified this by tracing the ways a symbol can
be created (allocSExp) or have a name assigned (SET_PRINTNAME) using
static analysis (Coccinelle). While it's possible to create a symbol
with a name in an encoding different from the native encoding using
SET_PRINTNAME(symbol, mkCharCE(...)), neither R nor CRAN packages
invoke code like this for an arbitrary encoding; symbols are always
created using either install() or installTrChar(). (install("invalid
byte sequence") is, of course, still possible, but is a different
problem.)

This means that translateChar(PRINTNAME(...)) is currently unnecessary,
but it may be worth adding a check (opt-in, applicable only during R
CMD check, to avoid a performance hit?) to SET_PRINTNAME() to ensure
that only native-encoding (or ASCII) symbol names are used. I could also
suggest a patch for Writing R Extensions or R Internals to document this
assumption.

The following translateChar() doesn't hurt (it returns CHAR(x) right
away without allocating any memory), but it stands out against most
uses of CHAR(PRINTNAME(.)) and EncodeChar(PRINTNAME(.)):

--- src/main/subscript.c(revision 85160)
+++ src/main/subscript.c(working copy)
@@ -186,7 +186,7 @@
PROTECT(names);
for (i = 0; i < nx; i++)
if (streql(translateChar(STRING_ELT(names, i)),
-  translateChar(PRINTNAME(s {
+  CHAR(PRINTNAME(s {
indx = i;
break;
}

The following translateChar() can be safely replaced with EncodeChar(),
correctly printing funnily-named functions in tracemem() reports:

--- src/main/debug.c(revision 85160)
+++ src/main/debug.c(working copy)
@@ -203,7 +203,7 @@
&& TYPEOF(cptr->call) == LANGSXP) {
SEXP fun = CAR(cptr->call);
Rprintf("%s ",
-   TYPEOF(fun) == SYMSXP ? translateChar(PRINTNAME(fun)) :
+   TYPEOF(fun) == SYMSXP ? EncodeChar(PRINTNAME(fun)) : 
"");
}
 }

tracemem(a <- 1:10)
`\r\v\t\n` <- function(x) x[1] <- 0
`\r\v\t\n`(a)
# Now correctly prints:
# tracemem[0x55fd11e61e00 -> 0x55fd1081d2a8]: \r\v\t\n
# tracemem[0x55fd1081d2a8 -> 0x55fd113277e8]: \r\v\t\n

What about EncodeChar(PRINTNAME(.))? This is the intended way to report
symbols in error messages. Without EncodeChar(),
.Internal(`\r\v\t\n`()) actually prints the newlines to standard output
as part of the error message instead of escaping them. Unfortunately,
EncodeChar() uses a statically-allocated buffer for its return value,
*and* the comments say that it's unsafe to use together with
errorcall(): errorcall_cpy() must be used instead. I think that's
because EncodeChar() may be called while deparsing the call,
overwriting the statically-allocated buffer before the format arguments
(which also contain the return value of EncodeChar()) are processed. In
particular, this means that EncodeChar() is unsafe to use with any kind
of warnings. The following Coccinelle script locates uses of
CHAR(PRINTNAME(.)) inside errors and warnings:

@@
expression x;
expression list arg1, arg2;
identifier fun =~ "(Rf_)?(error|warning)(call)?(_cpy)?";
@@
 fun(
  arg1,
* CHAR(PRINTNAME(x)),
  arg2
 )

Some of these, which already use errorcall(), are trivial to fix by
replacing CHAR() with EncodeChar() and upgrading errorcall() to
errorcall_cpy():

--- src/main/names.c
+++ src/main/names.c
@@ -1367,7 +1367,7 @@ attribute_hidden SEXP do_internal(SEXP c
errorcall(call, _("invalid .Internal() argument"));
 if (INTERNAL(fun) == R_NilValue)
-   errorcall(call, _("there is no .Internal function '%s'"),
+   errorcall_cpy(call, _("there is no .Internal function '%s'"),
- CHAR(PRINTNAME(fun)));
+ EncodeChar(PRINTNAME(fun)));
 
 #ifdef CHECK_INTERNALS
 if(R_Is_Running > 1 && getenv("_R_CHECK_INTERNALS2_")) {

--- src/main/eval.c
+++ src/main/eval.c
@@ -1161,7 +1161,7 @@ SEXP eval(SEXP e, SEXP rho)
const char *n = CHAR(PRINTNAME(e));
-   if(*n) errorcall(getLexicalCall(rho),
+   if(*n) errorcall_cpy(getLexicalCall(rho),
 _("argument \"%s\" is missing, with no default"),
-CHAR(PRINTNAME(e)));
+EncodeChar(PRINTNAME(e)));
else errorcall(getLexicalCall(rho),
   _("argument is missing, with no default"));
}

--- src/main/match.c
+++ src/main/match.c
@@ -229,7 +229,7 @@ attribute_hidden SEXP matchArgs_NR(SEXP

Re: [Rd] proposal: 'dev.capabilities()' can also query Unicode capabilities of current graphics device

2023-09-23 Thread Ivan Krylov

On Wed, 20 Sep 2023 12:39:50 +0200
Martin Maechler  wrote:

> The problem is that some pdf *viewers*,
> notably `evince` on Fedora Linux, for several years now,
> do *not* show *some* of the UTF-8 glyphs because they do not use
> the correct fonts 

One more problem that makes it nontrivial to use Unicode with pdf() is
the graphics device not knowing some of the font metrics:

x <- '\u410\u411\u412'
pdf()
plot(1:10, main = x)
# Warning messages:
# 1: In title(...) : font width unknown for character 0xb0
# 2: In title(...) : font width unknown for character 0xe4
# 3: In title(...) : font width unknown for character 0xfc
# 4: In title(...) : font width unknown for character 0x7f
dev.off()

In the resulting PDF file, the three letters are visible, at least in
Evince 3.38.2, but they are all positioned in the same space.

I understand that this is strictly speaking not pdf()'s fault
(grDevices contains the font metrics for all standard Adobe fonts and a
few more), but I'm not sure what to do as a user. Should I call
pdfFonts(...), declaring a font with all symbols I need? Where does one
even get Type-1 Cyrillic Helvetica (or any other font) with separate
font metrics files for use with pdf()?

Actually, the wrong number of sometimes random character codes reminds
me of stack garbage. In src/library/grDevices/src/devPS.c, function
static double PostScriptStringWidth, there's this bit of code:

if(!strIsASCII((char *) str) &&
   /*
* Every fifth font is a symbol font:
* see postscriptFonts()
*/
   (face % 5) != 0) {
R_CheckStack2(strlen((char *)str)+1);
char buff[strlen((char *)str)+1];
/* Output string cannot be longer */
mbcsToSbcs((char *)str, buff, encoding, enc);
str1 = (unsigned char *)buff;
}

Later the characters in str1 are iterated over in order to calculate
the total width of the string. I didn't notice this myself until I saw
in the debugger that after a few iterations of the loop, the contents
of str1 are completely different from the result of mbcsToSbcs((char
*)str, buff, encoding, enc), and went to investigate. Only after the
debugger told me that there's no variable called "buff" I realised that
the VLA pointed to by str1 no longer exists.

--- src/library/grDevices/src/devPS.c   (revision 85214)
+++ src/library/grDevices/src/devPS.c   (working copy)
@@ -721,6 +721,8 @@
 unsigned char p1, p2;

 int status;
+/* May be about to allocate */
+void *alloc = vmaxget();
 if(!metrics && (face % 5) != 0) {
/* This is the CID font case, and should only happen for
   non-symbol fonts.  So we assume monospaced with multipliers.
@@ -755,9 +757,8 @@
* Every fifth font is a symbol font:
* see postscriptFonts()
*/
-  (face % 5) != 0) {
-   R_CheckStack2(strlen((char *)str)+1);
-   char buff[strlen((char *)str)+1];
+  (face % 5) != 0 && metrics) {
+   char *buff = R_alloc(strlen((char *)str)+1, 1);
/* Output string cannot be longer */
mbcsToSbcs((char *)str, buff, encoding, enc);
str1 = (unsigned char *)buff;
@@ -792,6 +793,7 @@
}
}
 }
+vmaxset(alloc);
 return 0.001 * sum;
 }

After this patch, I'm consistently getting the right character codes in
the warnings, but I still don't know how to set up the font metrics.

-- 
Best regards,
Ivan

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

[Rd] Crashes and other problems uncovered by static analysis

2023-09-26 Thread Ivan Krylov

Hello R-devel,

I've been trying to see if static analysers could have caught PR18602.
Surprisingly, the analysers I've tried don't seem to notice it, but
they do uncover other problems.

While most of these either stem from slightly too defensive programming
("condition is always true/false" warnings), are mostly harmless style
issues ("variable assigned twice without using it in between" /
"unreachable assignment of  after NORET Rf_error(...)"),
are unlikely to cause issues in practice (a few unchecked dereferences
of pointers that could in theory be null), or are just plain false
positives (e.g. it's a warning every time the code calls snprintf(...,
gettext(...), ...)), I now also have a small collection of stupid ways
to crash R.

For example, there are stack buffer overflows in contour() (caused by
unrealistically long labels) and the "concise traceback" generator
(caused by unrealistically long function names). (The runtime
terminates R immediately upon noticing the overflow; the SEGV handler
is not called.) Also, there's a mostly harmless integer overflow in
polyroot() that may end up asking for 134217728 Tb of memory given
gigabyte-sized input. Additionally, gzcon() (but *not* gzfile()) will
silently fail to work with .gz files containing embedded filenames that
have 0xff bytes in them (e.g. ÿ on Latin-1 systems or Ъ on KOI-8
systems).

I'm approximately halfway through the ~600 warnings, so there will
likely be a few more real problems. I can post crashes (and likely
fixes) to the Bugzilla, unlikely as they are to happen in real life,
but how serious should a non-crashing problem be to warrant a PR and a
suggested patch? I'd rather avoid making it like the SQLite forum where
anyone with a new fuzzer or a static analyser posts a crash (or just a
suspected warning) then goes off to celebrate and pad their CV.

-- 
Best regards,
Ivan

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] proposal: 'dev.capabilities()' can also query Unicode capabilities of current graphics device

2023-09-26 Thread Ivan Krylov

On Tue, 26 Sep 2023 12:39:57 +1300
Paul Murrell  wrote:

> Yes, you can set up your own font and TeX installations are a good 
> source of Type 1 fonts.  Here is an example (paths obviously specific
> to my [Ubuntu 20.04] OS and TeX installation) ...

Many thanks for the explanation! Your example works on my system
unchanged. It's notable that the PDF files produced with and without
the patch only differ in /CreationDate and /ModDate. This may mean that
the memory problem is much less serious than it looked.

I'll remember to use the fonts from my TeX installation together with
pdf().

-- 
Best regards,
Ivan

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] Minor bug with stats::isoreg

2023-09-27 Thread Ivan Krylov

В Wed, 27 Sep 2023 13:49:58 -0700
Travers Ching  пишет:

> Calling isoreg with an Inf value causes a segmentation fault, tested
> on R 4.3.1 and R 4.2. A reproducible example is: `isoreg(c(0,Inf))`

Indeed, the code in src/library/stats/src/isoreg.c contains the
following loop:

do {
slope = R_PosInf;
for (i = known + 1; i <= n; i++) {
tmp = (REAL(yc)[i] - REAL(yc)[known]) / (i - known);
// if `tmp` becomes +Inf or NaN...
// or both `tmp` and `slope` become -Inf...
if (tmp < slope) { // <-- then this is false
slope = tmp;
ip = i; // <-- so this assignment never happens
}
}/* tmp := max{i= kn+1,.., n} slope(p[kn] -> p[i])  and
  *  ip = argmax{...}... */
INTEGER(iKnots)[n_ip++] = ip; // <-- heap overflow and crash
// ...
} while ((known = ip) < n); // <-- this loop never terminates

I'm not quite sure how to fix this. Checking for tmp <= slope would
have been a one-character patch, but it changes the reference outputs
and doesn't handle isnan(tmp), so it's probably not correct. The
INTEGER(iKnots)[n_ip++] = ip; assignment should only be reached in case
of knots, but since the `ip` index never progresses past the
+/-infinity, the knot condition is triggered repeatedly.

Least squares methods don't handle infinities well anyway, so maybe
it's best to put the check in the R function instead:

--- src/library/stats/R/isoreg.R(revision 85226)
+++ src/library/stats/R/isoreg.R(working copy)
@@ -22,8 +22,8 @@
 {
 xy <- xy.coords(x,y)
 x <- xy$x
-if(anyNA(x) || anyNA(xy$y))
-   stop("missing values not allowed")
+if(!all(is.finite(x)) || !all(is.finite(xy$y)))
+   stop("missing and infinite values not allowed")
 isOrd <- ((!is.null(xy$xlab) && xy$xlab == "Index")
   || !is.unsorted(x, strictly = TRUE))
 if(!isOrd) {


-- 
Best regards,
Ivan

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] [R-pkg-devel] Problem with "compacting" pdf files.

2023-10-02 Thread Ivan Krylov

Dear Rolf,

(Moving this one to R-devel...)

On Sun,  1 Oct 2023 21:01:13 +
Rolf Turner  wrote:

> I *really* think that the instructions from CRAN could have been
> clearer!  Without your guidance I'd have been at a total loss.

Since the CRAN e-mails quote the R CMD check messages verbatim, would
it have been enough if R CMD check suggested using --compact-vignettes?

Index: src/library/tools/R/check.R
===
--- src/library/tools/R/check.R (revision 85249)
+++ src/library/tools/R/check.R (working copy)
@@ -3079,7 +3079,8 @@
  "  'qpdf' made some significant size reductions:\n",
  paste("  ", res, collapse = "\n"),
  "\n",
- "  consider running tools::compactPDF() on these 
files\n")
+ "  consider running tools::compactPDF() on these 
files,\n",
+ "  or build the source package with 
--compact-vignettes\n")
 }
 if (R_check_doc_sizes2) {
 gs_cmd <- find_gs_cmd()
@@ -3093,7 +3094,8 @@
  "  'gs+qpdf' made some significant size 
reductions:\n",
  paste("  ", res, collapse = "\n"),
  "\n",
- '  consider running 
tools::compactPDF(gs_quality = "ebook") on these files\n')
+ '  consider running 
tools::compactPDF(gs_quality = "ebook") on these files,\n',
+ '  or build the source package with 
--compact-vignettes=both\n')
 }
 } else {
 if (!any) noteLog(Log)

Or is there anything else you would prefer to be reworded? Should the
message link to Writing R Extensions, section 1.4? Recently there was a
project to improve the R CMD check messages [*], but I managed to miss
almost all of it.

-- 
Best regards,
Ivan

[*] https://github.com/r-devel/r-project-sprint-2023/issues/55

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] [External] On PRINTNAME() encoding, EncodeChar(), and being painted into a corner

2023-10-03 Thread Ivan Krylov

Dear Luke Tierney,

Thank you for the reply and apologies for not getting back to you
earlier.

On Fri, 22 Sep 2023 16:14:58 -0500 (CDT)
luke-tier...@uiowa.edu wrote:

> I think it would be best to modify errorcall so errorcall_cpy is not
> necessary. As things are now it is just too easy to forget that
> sometimes errorcall_cpy should be used (and this has lead to some bugs
> recently).

At the end of this e-mail is a large patch that makes errorcall() and
warningcall() safer by processing the format arguments before calling
any R APIs. It's fairly invasive because it rewires all error() /
warning() / errorcall() / warningcall() processing into two common
subroutines that call R_vsnprintf() as soon as possible and keep a
single message buffer afterwards. It passes make check-devel. I am open
to other, less invasive ways to implement this change.

Additionally, I fixed a problem detected by a static analyser: given
unsigned msg_len, max(msg_len - strlen(head), 0) can only be 0 if
msg_len == strlen(head). That's because both msg_len and size_t end up
being promoted to unsigned in order to compute the subtraction. I
couldn't come up with a sufficiently laconic alternative, so I left the
ternary operator in for now.

>> The only solution to the latter problem is an EncodeChar() variant
>> that allocates its memory dynamically. Would R_alloc() be
>> acceptable in this context? With errors, the allocation stack would
>> be quickly reset (except when withCallingHandlers() is in effect?),
>> but with warnings, the code would have to restore it manually every
>> time.  
> 
> Or allow/require a buffer to be provided. So replacing the calls like
> 
> CHAR(PRINTNAME(sym))
> 
> with
> 
> EncodeSymbol(sym, buf, buf_size)

I can also implement this. Does this mean replacing every occasion of
EncodeChar(PRINTNAME(sym)) with EncodeSymbol(sym, , sizeof
)?

Index: src/main/envir.c
===
--- src/main/envir.c(revision 85251)
+++ src/main/envir.c(working copy)
@@ -1582,7 +1582,7 @@
}
rho = ENCLOS(rho);
 }
-errorcall_cpy(call,
+errorcall(call,
   _("could not find function \"%s\""),
   EncodeChar(PRINTNAME(symbol)));
 /* NOT REACHED */
@@ -3924,7 +3924,7 @@
 SEXP nsname = PROTECT(callR1(R_getNamespaceNameSymbol, ns));
 if (TYPEOF(nsname) != STRSXP || LENGTH(nsname) != 1)
errorcall(call, "bad value returned by `getNamespaceName'");
-errorcall_cpy(call,
+errorcall(call,
  _("'%s' is not an exported object from 'namespace:%s'"),
  EncodeChar(PRINTNAME(name)),
  CHAR(STRING_ELT(nsname, 0)));
Index: src/main/errors.c
===
--- src/main/errors.c   (revision 85251)
+++ src/main/errors.c   (working copy)
@@ -58,7 +58,7 @@
 /*
 Different values of inError are used to indicate different places
 in the error handling:
-inError = 1: In internal error handling, e.g. `verrorcall_dflt`, others.
+inError = 1: In internal error handling, e.g. `errorcall_dflt`, others.
 inError = 2: Writing traceback
 inError = 3: In user error handler (i.e. options(error=handler))
 */
@@ -387,32 +387,45 @@
return c ? c->call : R_NilValue;
 }
 
-void warning(const char *format, ...)
+/* declarations for internal condition handling */
+
+static void signalError(SEXP call, const char *msg);
+static void signalWarning(SEXP call, const char *msg);
+NORET static void invokeRestart(SEXP, SEXP);
+
+static void impl_vwarning(SEXP call, Rboolean immediate, const char *format, 
va_list ap)
 {
 char buf[BUFSIZE], *p;
 
-va_list(ap);
-va_start(ap, format);
 size_t psize;
 int pval;
 
 psize = min(BUFSIZE, R_WarnLength+1);
 pval = Rvsnprintf_mbcs(buf, psize, format, ap);
-va_end(ap);
 p = buf + strlen(buf) - 1;
 if(strlen(buf) > 0 && *p == '\n') *p = '\0';
 RprintTrunc(buf, pval >= psize);
-SEXP call = PROTECT(getCurrentCall());
-warningcall(call, "%s", buf);
-UNPROTECT(1);
+
+// must not call into R before stringifying the format arguments
+int nprotect = 0;
+if (!call) {
+   call = PROTECT(getCurrentCall());
+   ++nprotect;
+}
+if (immediate) immediateWarning = 1;
+signalWarning(call, buf);
+if (immediate) immediateWarning = 0;
+UNPROTECT(nprotect);
 }
 
-/* declarations for internal condition handling */
+void warning(const char *format, ...)
+{
+va_list(ap);
+va_start(ap, format);
+impl_vwarning(NULL, FALSE, format, ap);
+va_end(ap);
+}
 
-static void vsignalError(SEXP call, const char *format, va_list ap);
-static void vsignalWarning(SEXP call, const char *format, va_list ap);
-NORET static void invokeRestart(SEXP, SEXP);
-
 static void reset_inWarning(void *data)
 {
 inWarning = 0;
@@ -437,12 +450,11 @@
 return nc;
 }
 
-static void vwarningcall_dflt(SEXP call, const char

Re: [Rd] as(, "dgTMatrix")' is deprecated.

2023-10-03 Thread Ivan Krylov

On Tue, 3 Oct 2023 16:50:55 +
"Koenker, Roger W"  wrote:

>  I thought it might come from Rmosek, but mosek folks don’t think so.

I downloaded the Rmosek source package using

download.packages(
 'Rmosek', '.',
 repos='https://download.mosek.com/R/10.1'
)

...and there are the deprecated calls:


Rmosek/R$ grep -C2 -r as\(.*dgT
toCSCMatrix.R-  }
toCSCMatrix.R-  else if (is(obj,"dgCMatrix")) {
toCSCMatrix.R:obj <- as(obj,"dgTMatrix")
toCSCMatrix.R-  }
toCSCMatrix.R-  else if (is(obj,"list") && 
setequal(names(obj),c("i","j","v","ncol","nrow"))) {
--
toCSCMatrix.R- x=obj[['v']],
toCSCMatrix.R- dims=c(obj[['nrow']], obj[['ncol']]) )
toCSCMatrix.R:obj <- as(tmp, "dgTMatrix")
toCSCMatrix.R-  }
toCSCMatrix.R-  else if (canCoerce(obj,"dgTMatrix")) {
toCSCMatrix.R-# Assume coercion is meaningful, and that
toCSCMatrix.R-# users are aware of computational overhead.
toCSCMatrix.R:obj <- as(obj,"dgTMatrix")
toCSCMatrix.R-  }
toCSCMatrix.R-  else {


-- 
Best regards,
Ivan

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] system()/system2() using short paths of commands on Windows?

2023-10-31 Thread Ivan Krylov

On Mon, 30 Oct 2023 15:36:50 -0500
Yihui Xie  wrote:

> I have read about "system() not using a shell on Windows" on the help
> page many times before but never understood what it means technically.

This means resolving the path to the executable, then calling
CreateProcess() to launch that executable with the given parameters and
other settings:
https://github.com/wch/r-source/blob/635a672151a18a0e475986af592fab59e7479a9b/src/gnuwin32/run.c#L378-L386
https://learn.microsoft.com/en-us/windows/win32/api/processthreadsapi/nf-processthreadsapi-createprocessa

As far as I know, CreateProcess() is the most direct officially
supported way to launch a process on Windows.

The alternative would be assembling the command line and giving it to
cmd.exe to interpret and launch, as done by shell(). This can be more
convenient in certain cases, as the shell can expand environment
variables from %STRINGS% or launch multiple commands separated by &,
but it also requires extreme care: quoting rules are subtle and have to
be understood on both the cmd.exe level *and* the C runtime level
(on Windows, the C runtime is responsible for parsing the string
returned from GetCommandLine() and creating the `argv` command line
argument array from it).

Working with shell command lines is like using string operations on
deparse() output instead of directly operating on language objects:
less complicated than rocket surgery, but better avoided if
alternatives exist.

-- 
Best regards,
Ivan

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

[Rd] svd() of a 30000 by 30000 matrix segfaults: 32-bit length overflow in LAPACK?

2023-11-03 Thread Ivan Krylov

Dear Dr. Robert M Flight, dear R-developers,

By an accident, I've noticed this problem reported on Mastodon [1].

On a computer with ≥32G of RAM, running the following code (which may
take 5 CPU-hours and allocate 27G of RAM!) results in a segfault:

n_val <- 3
tmp_matrix <- matrix(rnorm((n_val ^ 2) / 2), nrow = n_val, ncol = n_val)
svd_res <- svd(tmp_matrix)

Tim Taylor says [2] that this doesn't fail for a 2 by 2 matrix,
which suggests a 32-bit integer overflow: 3*3*8 is 7.2e9, which
is more than 4.3e9, which is more than 2^32, while 2*2*8 is
3.2e9, slightly below the threshold.

Here's the backtrace from the crash:

#0  0x7462ac71 in dlasd3 (nl=468, nr=467, sqre=1, k=921, d=..., q=..., 
ldq=921, dsigma=..., u=...,
ldu=3, u2=..., ldu2=936, vt=..., ldvt=3, vt2=..., ldvt2=937, 
idxc=..., ctot=..., z=..., info=0)
at ../../../../src/modules/lapack/dlapack.f:82312
#1  0x7464578b in dlasd1 (nl=468, nr=467, sqre=1, d=..., 
alpha=-0.58048560936028271,
beta=-0.40955467846435345, u=..., ldu=3, vt=..., ldvt=3, idxq=..., 
iwork=..., work=..., info=0)
at ../../../../src/modules/lapack/dlapack.f:81271
#2  0x74681bf2 in dlasd0 (n=29964, sqre=0, d=..., e=..., u=..., 
ldu=3, vt=..., ldvt=3, smlsiz=25,
iwork=..., work=..., info=0) at 
../../../../src/modules/lapack/dlapack.f:80956
#3  0x7468fd5e in dbdsdc (uplo=..., compq=..., n=3, d=..., e=..., 
u=..., ldu=3, vt=...,
ldvt=3, q=..., iq=..., work=..., iwork=..., info=0, _uplo=1, _compq=1)
at ../../../../src/modules/lapack/dlapack.f:1525
#4  0x746c1f81 in dgesdd (jobz=..., m=3, n=3, a=..., lda=3, 
s=..., u=..., ldu=3, vt=...,
ldvt=3, work=..., lwork=201, iwork=..., info=0, _jobz=1)
at ../../../../src/modules/lapack/dlapack.f:23312
#5  0x74c5550f in La_svd (jobu=, x=, 
s=0x57b88670, u=0x7ffbc379f010,
vt=0x7ffa1652a010) at ../../../../src/include/Rinlinedfuns.h:120

The place of the crash and the state of the registers suggests some
kind of buffer overrun:

(gdb) frame 0
#0  0x7462ac71 in dlasd3 (nl=468, nr=467, sqre=1, k=921, d=..., q=..., 
ldq=921, dsigma=..., u=...,
ldu=3, u2=..., ldu2=936, vt=..., ldvt=3, vt2=..., ldvt2=937, 
idxc=..., ctot=..., z=..., info=0)
at ../../../../src/modules/lapack/dlapack.f:82312
82312   Q( J, I ) = U( JC, I ) / TEMP
(gdb) disas
...
   0x7462ac60 <+1824>:  movslq -0x4(%r10,%rdx,4),%rsi
   0x7462ac65 <+1829>:  add%r12,%rsi
   0x7462ac68 <+1832>:  movsd  (%rcx,%rsi,8),%xmm1
   0x7462ac6d <+1837>:  divsd  %xmm0,%xmm1
=> 0x7462ac71 <+1841>:  movsd  %xmm1,(%rbx,%rdx,8)
...
(gdb) info reg
rbx0x745c38d8  140737293072600
rcx0x7ffbc379f040  140719293067328
rdx0xe5229
rsi0x517d2c5340460

0x745c38d8 looks precariously close to the end of the valid
userspace address range for processes typically running on my computer.
I was also able to reproduce this with OpenBLAS (0.3.5, LAPACK 3.8.0).

We seem to be overflowing the WORK array. The DGESDD documentation says
that for JOBZ = 'S', WORK must be of length >= 4*mn*mn + 7*mn, where
mn = min(M, N). For M = N = 3, this comes out to 360021, while
the actual length of the WORK array is only 201 (as returned from a
previous call to DGESDD with LWORK = -1).

I think this is caused by an overflow in the value of the BDSPAC
variable. At the point where MINWRK is assigned (MAX(MINWRK,MAXWRK) to
become LWORK later), the following can be observed:

22721   MINWRK = 3*N + MAX( M, BDSPAC )
0x77e9a2e0cmp%ecx,%esi
(gdb) p M
$48 = 3
(gdb) p BDSPAC
$49 = 
(gdb) p $ecx
$50 = -1594847296
(gdb) p $esi
$51 = 3

The debugger insists that the BDSPAC variable is optimised out for most
of the duration of its existence, but it can be proven that it
overflows:

22613  BDSPAC = 3*N*N + 4*N
(gdb) ptype BDSPAC
type = integer(kind=4)
(gdb) p BDSPAC
$60 = 
(gdb) p (int32_t)(3*M*M + 4*M)
$61 = -1594847296

Is there anything R can do at this point, given the required length of
LWORK being impossible to represent using a signed 32-bit integer?

-- 
Best regards,
Ivan

[1] https://mastodon.social/@rmflight/111341279750995911

[2] https://mastodon.social/@_timtay...@fosstodon.org/111341425112070441

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

1 2 >

1 - 100 of 192 matches

Mail list logo