On Sep 7, 2014, at 7:40 PM, Henrik Bengtsson wrote: > Thank you David - it took me awhile to get back to this and dig into > it. It's clever to imitate gtools::mixedorder() as far as possible. > A few comments: > > 1. It took me a while to understand why you picked 3899 in your > Roman-to-integer table; it's because roman(x) is NA for x > 3899. > (BTW, in 'utils', there's utils:::.roman2numeric() which could be > utilized, but it's currently internal.)
Yes, that was the reason. I didn't think I needed a Roman-to-numeric function because I discovered the roman numbers were actually simple numeric vectors to which a class had been assigned and that it was the class-facilities that did all the work. The standard Ops functions were just acting on numeric vectors. If one doesn't take care, their "romanity" can be lost: > R <- as.roman(10^(0:4)) > R [1] I X C M <NA> > unclass(R) [1] 1 10 100 1000 NA > sum(R, na.rm=TRUE) [1] 1111 > as.roman(sum(R, na.rm=TRUE)) [1] MCXI > > 2. I think you forgot D=500 and M=1000. Quite possible. I suspect Greg will have corrected the omission, but if not, this will be helpful to him. > > 3. There was a typo in your code; I think you meant rank.roman instead > of rank.numeric in one place. > I understood Greg's intention to wrap this into the mixedorder and mixed sort duo. Best; David. > 4. The idea behind nonnumeric() is to identify non-numeric substrings > by is.na(as.numeric()). Unfortunately, for romans that does not work. > Instead, we need to use is.na(numeric(x)) here, i.e. > > nonnumeric <- function(x) { > suppressWarnings(ifelse(is.na(numeric(x)), toupper(x), NA)) > } > > Actually, gtools::mixedorder() could use the same. > > 5. I undid your ".numeric" to ".roman" to minimize any differences to > gtools::mixedorder(). > > > With the above fixes, we now have: > > mixedorderRoman <- function (x) > { > if (length(x) < 1) > return(NULL) > else if (length(x) == 1) > return(1) > if (is.numeric(x)) > return(order(x)) > delim = "\\$\\@\\$" > # NOTE: Note that as.roman(x) is NA for x > 3899 > romanC <- as.character( as.roman(1:3899) ) > numeric <- function(x) { > suppressWarnings(match(x, romanC)) > } > nonnumeric <- function(x) { > suppressWarnings(ifelse(is.na(numeric(x)), toupper(x), > NA)) > } > x <- as.character(x) > which.nas <- which(is.na(x)) > which.blanks <- which(x == "") > if (length(which.blanks) > 0) > x[which.blanks] <- -Inf > if (length(which.nas) > 0) > x[which.nas] <- Inf > delimited <- gsub("([IVXCLM]+)", > paste(delim, "\\1", delim, sep = ""), x) > step1 <- strsplit(delimited, delim) > step1 <- lapply(step1, function(x) x[x > ""]) > step1.numeric <- lapply(step1, numeric) > step1.character <- lapply(step1, nonnumeric) > maxelem <- max(sapply(step1, length)) > step1.numeric.t <- lapply(1:maxelem, function(i) sapply(step1.numeric, > function(x) x[i])) > step1.character.t <- lapply(1:maxelem, function(i) sapply(step1.character, > function(x) x[i])) > rank.numeric <- sapply(step1.numeric.t, rank) > rank.character <- sapply(step1.character.t, function(x) > as.numeric(factor(x))) > rank.numeric[!is.na(rank.character)] <- 0 > rank.character <- t(t(rank.character) + apply(matrix(rank.numeric), > 2, max, na.rm = TRUE)) > rank.overall <- ifelse(is.na(rank.character), rank.numeric, > rank.character) > order.frame <- as.data.frame(rank.overall) > if (length(which.nas) > 0) > order.frame[which.nas, ] <- Inf > retval <- do.call("order", order.frame) > return(retval) > } > > > The difference to gtools::mixedorder() is minimal: > > < romanC <- as.character( as.roman(1:3899) ) > 21c11 > < suppressWarnings(match(x, romanC)) > --- >> suppressWarnings(as.numeric(x)) > 24c14 > < suppressWarnings(ifelse(is.na(numeric(x)), toupper(x), > --- >> suppressWarnings(ifelse(is.na(as.numeric(x)), toupper(x), > 34c24 > < delimited <- gsub("([IVXCLDM]+)", > --- >> delimited <- >> gsub("([+-]{0,1}[0-9]+\\.{0,1}[0-9]*([eE][\\+\\-]{0,1}[0-9]+\\.{0,1}[0-9]*){0,1})", > 59,62d48 > > This difference is so small that the above could now be an option to > mixedorder() with minimal overhead added, e.g. mixedorder(y, > type=c("decimal", "roman")). One could even imagine adding support > for "binary", "octal" and "hexadecimal" (not done). > > Greg (maintainer of gtools; cc:ed), is this something you would > consider adding to gtools? I've modified the gtools source code > available on CRAN (that's the only source I found), added package > tests, updated the Rd and verified it passes R CMD check. If > interested, please find the updates at: > > https://github.com/HenrikBengtsson/gtools/compare/cran:master...master > > Thanks > > Henrik > > On Tue, Aug 26, 2014 at 6:46 PM, David Winsemius <dwinsem...@comcast.net> > wrote: >> >> On Aug 26, 2014, at 5:24 PM, Henrik Bengtsson wrote: >> >>> Hi, >>> >>> does anyone know of an implementation/function that sorts strings that >>> *contain* roman numerals (I, II, III, IV, V, ...) which are treated as >>> numbers. In 'gtools' there is mixedsort() which does this for strings >>> that contains (decimal) numbers. I'm looking for a "mixedsortroman()" >>> function that does the same but with roman numbers, e.g. >> >> It's pretty easy to sort something you know to be congruent with the >> existing roman class: >> >> romanC <- as.character( as.roman(1:3899) ) >> match(c("I", "II", "III","X","V"), romanC) >> #[1] 1 2 3 10 5 >> >> But I guess you already know that, so you want a regex approach to parsing. >> Looking at the path taken by Warnes, it would involve doing something like >> his regex based insertion of a delimiter for "Roman numeral" but simpler >> because he needed to deal with decimal points and signs and exponent >> notation, none of which you appear to need. If you only need to consider >> character and Roman, then this hack of Warnes tools succeeds: >> >> mixedorderRoman <- function (x) >> { >> if (length(x) < 1) >> return(NULL) >> else if (length(x) == 1) >> return(1) >> if (is.numeric(x)) >> return(order(x)) >> delim = "\\$\\@\\$" >> roman <- function(x) { >> suppressWarnings(match(x, romanC)) >> } >> nonnumeric <- function(x) { >> suppressWarnings(ifelse(is.na(as.numeric(x)), toupper(x), >> NA)) >> } >> x <- as.character(x) >> which.nas <- which(is.na(x)) >> which.blanks <- which(x == "") >> if (length(which.blanks) > 0) >> x[which.blanks] <- -Inf >> if (length(which.nas) > 0) >> x[which.nas] <- Inf >> delimited <- gsub("([IVXCL]+)", >> paste(delim, "\\1", delim, sep = ""), x) >> step1 <- strsplit(delimited, delim) >> step1 <- lapply(step1, function(x) x[x > ""]) >> step1.roman <- lapply(step1, roman) >> step1.character <- lapply(step1, nonnumeric) >> maxelem <- max(sapply(step1, length)) >> step1.roman.t <- lapply(1:maxelem, function(i) sapply(step1.roman, >> function(x) x[i])) >> step1.character.t <- lapply(1:maxelem, function(i) sapply(step1.character, >> function(x) x[i])) >> rank.roman <- sapply(step1.roman.t, rank) >> rank.character <- sapply(step1.character.t, function(x) >> as.numeric(factor(x))) >> rank.roman[!is.na(rank.character)] <- 0 >> rank.character <- t(t(rank.character) + apply(matrix(rank.roman), >> 2, max, na.rm = TRUE)) >> rank.overall <- ifelse(is.na(rank.character), rank.numeric, >> rank.character) >> order.frame <- as.data.frame(rank.overall) >> if (length(which.nas) > 0) >> order.frame[which.nas, ] <- Inf >> retval <- do.call("order", order.frame) >> return(retval) >> } >> >> y[mixedorderRoman(y)] >> [1] "chr I" "chr II" "chr III" "chr IV" "chr IX" >> [6] "chr V" "chr VI" "chr VII" "chr VIII" "chr X" >> [11] "chr XI" "chr XII" >> >> >> -- >> David. >>> >>> ## DECIMAL NUMBERS >>>> x <- sprintf("chr %d", 12:1) >>>> x >>> [1] "chr 12" "chr 11" "chr 10" "chr 9" "chr 8" >>> [6] "chr 7" "chr 6" "chr 5" "chr 4" "chr 3" >>> [11] "chr 2" "chr 1" >>> >>>> sort(x) >>> [1] "chr 1" "chr 10" "chr 11" "chr 12" "chr 2" >>> [6] "chr 3" "chr 4" "chr 5" "chr 6" "chr 7" >>> [11] "chr 8" "chr 9" >>> >>>> gtools::mixedsort(x) >>> [1] "chr 1" "chr 2" "chr 3" "chr 4" "chr 5" >>> [6] "chr 6" "chr 7" "chr 8" "chr 9" "chr 10" >>> [11] "chr 11" "chr 12" >>> >>> >>> ## ROMAN NUMBERS >>>> y <- sprintf("chr %s", as.roman(12:1)) >>>> y >>> [1] "chr XII" "chr XI" "chr X" "chr IX" >>> [5] "chr VIII" "chr VII" "chr VI" "chr V" >>> [9] "chr IV" "chr III" "chr II" "chr I" >>> >>>> sort(y) >>> [1] "chr I" "chr II" "chr III" "chr IV" >>> [5] "chr IX" "chr V" "chr VI" "chr VII" >>> [9] "chr VIII" "chr X" "chr XI" "chr XII" >>> >>>> mixedsortroman(y) >>> [1] "chr I" "chr II" "chr III" "chr IV" >>> [5] "chr V" "chr VI" "chr VII" "chr VIII" >>> [9] "chr IX" "chr X" "chr XI" "chr XII" >>> >>> The latter is what I'm looking for. >>> >>> Before hacking together something myself (e.g. identify roman numerals >>> substrings, translate them to decimal numbers, use gtools::mixedsort() >>> to sort them and then translate them back to roman numbers), I'd like >>> to hear if someone already has this implemented/know of a package that >>> does this. >>> >>> Thanks, >>> >>> Henrik >>> >>> ______________________________________________ >>> R-help@r-project.org mailing list >>> https://stat.ethz.ch/mailman/listinfo/r-help >>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html >>> and provide commented, minimal, self-contained, reproducible code. >> >> David Winsemius >> Alameda, CA, USA >> David Winsemius Alameda, CA, USA ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.