On Wed, 15 Jan 2025 11:41:34 +0100 Martin Maechler wrote:
> >>>>> Jani Välimaa
> >>>>>     on Tue, 14 Jan 2025 20:39:19 +0200 writes:
>
> > Hello,
> > I don't know what's changed or how to figure out why as.roman() started
> > to work differently lately on Mageia Cauldron. Cauldron is the
> > latest development version of Mageia Linux.
>
> > Expected behavior:
> >> as.roman(strrep("I", 1:5))
> > [1] I II III IV V
>
> > Current behavior:
> >> as.roman(strrep("I", 1:5))
> > [1] I II III IV <NA>
> > Warning message:
> > In .roman2numeric(x) : invalid roman numeral: IIIII
>
> > as.roman() doesn't handle "IIIII" -> "V" anymore and thus 'make check'
> > fails when building any 4.3.x or 4.4.x versions from the sources.
>
> Not yet.
> For me, (on Linux Fedora 40),
> on current R-4.4.2, R-patched and R-devel I get the same good
> results from
>
>   (cc <- strrep("I", 1:5)); (rr <- as.roman(cc)); dput(rr)
>
> > (cc <- strrep("I", 1:5)); (rr <- as.roman(cc)); dput(rr)
> [1] "I" "II" "III" "IIII" "IIIII"
> [1] I II III IV V
> structure(1:5, class = "roman")
> >
>
> The code behind this uses grep() and grepl()
> and I assume this somehow does not work correctly on your
> platform?
>
> Digging a bit further, the crucial part in this case happens in
> the (namespace hidden) function utils:::.roman2numeric
> which you probably already know from the above warning.
> For me,
>
>   (cc <- strrep("I", 1:5)); (r2 <- utils:::.roman2numeric(cc)); dput(r2)
>
> gives
>
> > (cc <- strrep("I", 1:5)); (r2 <- utils:::.roman2numeric(cc))
> [1] "I" "II" "III" "IIII" "IIIII"
> [1] 1 2 3 4 5
> >
>
> this must be different in your case.
>
> You can use
>     debug(utils:::.roman2numeric)
> and
>     utils:::.roman2numeric(cc)
>
> to find out where the problem happens.
> This will show almost surely that the problem is indeed in a
> grepl() call.
>
> I'm close to sure it is this:
>
> > grepl("^M{,3}D?C{,4}L?X{,4}V?I{,4}$", cc)
> [1] TRUE TRUE TRUE TRUE TRUE
>
> where you don't get the same, but probably
>
> [1] TRUE TRUE TRUE TRUE FALSE
>
> which I *do* get, too if I use grepl(....., perl=TRUE)
> .. see also below.
>
>
> The code we use is our own tweaked version of 'TRE' (in <Rsrc>/extra/tre/ ),
> and I do think we've occasionally seen platform dependencies.
>
> Also, yes, in 2022 there have been several changes, related to
> fixing bugs, though several ones *before* releasing R 4.3.0.
>
> Last, but not (at all!) least:
>
> Actually, I *am* confused a bit why this ever worked (and still
> works for most of us):
>
> I'm using {,2} instead of {,4} to make things faster to grasp;
> I see
>
> > grepl("^I{,2}$", c("II", "III", "IIII"))
> [1] TRUE TRUE FALSE
> >
>
> and I wonder why 'I{,2}' matches 3 "I"s. ... I'd thought {,2} to
> mean " up to 2 occurrences (of the previous <entity>)"
> (where here <entity> = character).
>
> In our real example, I{,4} matched 5 "I"s
>
> and as I mentioned above, the somewhat more maintained
> perl=TRUE option does *not*.
>
> We could change the code to use I{,5} to make 5x"I", i.e. "IIIII"
> work for you .. but then that would also match
> "IIIIII" (6 x "I") for "everybody" else with our current TRE engine..
>
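
The engine comparison described above can be tried on any build with a minimal sketch like the following. It is not the code in utils; the names pat_open, pat_explicit and res are made up for illustration. It runs the thread's {,4} interval and an explicit {0,4} spelling through the default engine and through perl = TRUE. That the explicit lower bound gives the same answer everywhere is an assumption of this sketch, based on POSIX defining only the {m}, {m,} and {m,n} forms; the results for the {,4} form are deliberately not annotated because they depend on the TRE and PCRE versions in use.

```r
# Inputs from the thread: "I", "II", "III", "IIII", "IIIII"
cc <- strrep("I", 1:5)

# The same interval spelled two ways.
pat_open     <- "^I{,4}$"   # form discussed in the thread; engine-dependent
pat_explicit <- "^I{0,4}$"  # assumption: explicit bounds avoid the ambiguity

# One column per engine/pattern combination, for side-by-side comparison.
res <- data.frame(
  input         = cc,
  tre_open      = grepl(pat_open,     cc),
  pcre_open     = grepl(pat_open,     cc, perl = TRUE),
  tre_explicit  = grepl(pat_explicit, cc),
  pcre_explicit = grepl(pat_explicit, cc, perl = TRUE)
)
print(res)
```

If the tre_open column differs between two machines while tre_explicit does not, that points at how the regex library handles the missing lower bound rather than at as.roman() itself.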
Thanks for your insights.

Mageia builds R against the system TRE via the --with-system-tre
configure option. TRE was updated to version 0.9.0 some time ago, and it
looks like the 'issue' started at the same time. And indeed, as.roman()
works as before after I rebuilt R with the bundled TRE 0.8.0 using
--with-system-tre=no.

So something changed in TRE 0.9.0 that affects grepl().
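
To check which TRE a given R build actually uses, something along these lines may help. It is only a sketch: the variable names are illustrative, and the exact format of the version string is not assumed here, so no expected output is shown.

```r
# extSoftVersion() reports the external libraries this R build uses;
# its "TRE" element describes the regex library behind the default engine.
tre_version <- extSoftVersion()[["TRE"]]
cat("TRE in use:", tre_version, "\n")

# The single case that differs between the builds discussed in this thread.
matches_five <- grepl("^I{,4}$", "IIIII")
cat('"^I{,4}$" matches "IIIII":', matches_five, "\n")
```

Running this on a build configured with --with-system-tre and on one configured with --with-system-tre=no should show whether the change in as.roman() tracks the TRE version.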
______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide https://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.