I get an R error and no segfault:

> parse(textConnection(text), srcfile = srcfile)
Error in parse(textConnection(text), srcfile = srcfile) :
  test.r:1:1: unexpected $end
1: ×
   ^
This is R 4.3.0, so maybe the bug has been introduced since then...

Version and system info:

> version
               _
platform       x86_64-pc-linux-gnu
arch           x86_64
os             linux-gnu
system         x86_64, linux-gnu
status
major          4
minor          3.0
year           2023
month          04
day            21
svn rev        84292
language       R
version.string R version 4.3.0 (2023-04-21)
nickname       Already Tomorrow

> sessionInfo()
R version 4.3.0 (2023-04-21)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 22.04.4 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.20.so;  LAPACK version 3.10.0

locale:
 [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_GB.UTF-8        LC_COLLATE=en_GB.UTF-8
 [5] LC_MONETARY=en_GB.UTF-8    LC_MESSAGES=en_GB.UTF-8
 [7] LC_PAPER=en_GB.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C

time zone: Europe/London
tzcode source: system (glibc)

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

loaded via a namespace (and not attached):
[1] compiler_4.3.0

On Tue, May 28, 2024 at 7:42 PM Tomas Kalibera <tomas.kalib...@gmail.com> wrote:

> On 5/28/24 19:35, Hadley Wickham wrote:
> > Hi all,
> >
> > When I run the following code, R segfaults:
> >
> > text <- "×"
> > srcfile <- srcfilecopy("test.r", text)
> > parse(textConnection(text), srcfile = srcfile)
> >
> > It doesn't segfault if text is ASCII, or it's not wrapped in
> > textConnection, or srcfile isn't set.
>
> Thanks, this is because R parser doesn't support non-ASCII UTF-8 outside
> string literals and comments, plus a missing bounds check. The "correct"
> result should be an R error, which I get in a debug build.
>
> The tokenizer ends up with a negative token and then when the parse data
> are being finalized, creating a table of token names, there is an out of
> bounds access (yytname array). Probably the check should go right away
> into the tokenizer.
>
> Tomas
>
> > Hadley
> >
> ______________________________________________
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
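To make the failure mode Tomas describes concrete, here is a minimal C sketch of a bounds-checked lookup into a token-name table. R's parser is a bison grammar compiled to C, so the sketch uses C. The table contents and the names `token_names`, `N_TOKEN_NAMES` and `token_name()` are hypothetical stand-ins, not R's actual `yytname` or parser code. The point is only that an unchecked `table[code]` with a negative `code` reads outside the array. Checking the range first turns that into a failure the caller can report as an ordinary error.

```c
#include <stddef.h>

/* Hypothetical stand-in for a bison-generated token-name table such as
   yytname. The real token codes and names in R's parser differ. */
static const char *const token_names[] = {
    "$end", "error", "$undefined", "STR_CONST", "NUM_CONST", "SYMBOL"
};

#define N_TOKEN_NAMES (sizeof token_names / sizeof token_names[0])

/* Return the name for a token code, or NULL if the code is out of range.
   Without the range check, a negative code (like the negative token the
   tokenizer produced in the report above) would index before the start
   of the array, which is undefined behaviour and can segfault. */
const char *token_name(int code)
{
    if (code < 0 || (size_t) code >= N_TOKEN_NAMES)
        return NULL;
    return token_names[code];
}
```

A caller that gets `NULL` back can raise a normal parse error instead of crashing. As Tomas notes, the check probably belongs earlier, in the tokenizer itself, so that an invalid token is never produced in the first place. A lookup-side check like this one is only a last line of defence.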