Dear Gabor,

Thank you (again) for this second suggestion, which does exactly what I want. At the risk of appearing ungrateful, and although the judgment is admittedly subjective, I don't find it simpler than mysort().
Out of curiosity, I timed the two functions on the sample problems that I supplied:

> system.time(for (i in 1:100) mysort(s))
   user  system elapsed
  1.498   0.006   1.503
> system.time(for (i in 1:100) mysort2(s))
   user  system elapsed
  6.026   0.028   6.059
> system.time(for (i in 1:100) mysort(t))
   user  system elapsed
  0.858   0.003   0.874
> system.time(for (i in 1:100) mysort2(t))
   user  system elapsed
  2.736   0.014   2.757

This is on a 2.4 GHz Core 2 Duo MacBook. I don't know, of course, whether this generalizes to other problems. I suspect that the recursive solution will look worse as the number of "components" of the names increases, but names are unlikely to have a large number of components.

Best,
 John

On Sun, 21 Dec 2008 23:28:51 -0500
 "Gabor Grothendieck" <ggrothendi...@gmail.com> wrote:
> Another possibility is to use strapply in gsubfn, giving a solution
> that is non-recursive and shorter:
>
> library(gsubfn)
>
> mysort2 <- function(s) {
>   L <- strapply(s, "([0-9]+)|([^0-9]+)",
>     ~ if (nchar(x)) sprintf("%9d", as.numeric(x)) else y)
>   L2 <- t(do.call(cbind, lapply(L, ts)))
>   L3 <- replace(L2, is.na(L2), "")
>   ord <- do.call(order, as.data.frame(L3, stringsAsFactors = FALSE))
>   s[ord]
> }
>
> First strapply breaks up each string into a character vector of the
> numeric and non-numeric components. We pad each numeric component on
> the left with spaces using sprintf so they are all 9 wide. The next
> line turns that into a matrix L2, and then we replace the NAs, giving
> L3. Finally we order it and apply the ordering, ord, to get the
> sorted version.
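For anyone without gsubfn installed, here is a rough base-R sketch of the same tokenize-and-pad idea; it is not Gabor's code. The name natural_sort() and its width argument are hypothetical. It uses regmatches()/gregexpr() in place of strapply(), fills missing token positions with "" directly instead of going through ts() and cbind(), and pads numbers with zeros rather than spaces, on the assumption that some locales may ignore leading spaces when collating. It also assumes no NA inputs and digit runs that fit in an R integer.

```r
# Hypothetical base-R variant of the tokenize-and-pad approach (no gsubfn).
# Assumes no NAs in s and that every run of digits fits in an R integer.
natural_sort <- function(s, width = 9) {
  # Break each string into maximal runs of digits or of non-digits.
  tokens <- regmatches(s, gregexpr("[0-9]+|[^0-9]+", s))
  fmt <- paste0("%0", width, "d")  # e.g. "%09d": zero-pad to 'width' digits
  keys <- lapply(tokens, function(tok) {
    is_num <- grepl("^[0-9]+$", tok)
    tok[is_num] <- sprintf(fmt, as.integer(tok[is_num]))
    tok
  })
  # One sort key per token position; a missing position becomes "",
  # which sorts before any non-empty token.
  n <- max(lengths(keys))
  cols <- lapply(seq_len(n), function(j)
    vapply(keys, function(k) if (j <= length(k)) k[j] else "", character(1)))
  s[do.call(order, cols)]
}
```

Zero padding means the digit tokens compare correctly as ordinary strings, so a single call to order() over the key columns does the work that the recursion does in mysort().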
>
> The gsubfn home page is at:
> http://gsubfn.googlecode.com
>
> Here is some sample output:
>
> > mysort2(s)
> [1] "var2" "var10a2" "x1a" "x1b" "x02" "x02a" "x02b" "y1a1" "y2" "y10" "y10a1" "y10a2" "y10a10"
> > mysort(s)
> [1] "var2" "var10a2" "x1a" "x1b" "x02" "x02a" "x02b" "y1a1" "y2" "y10" "y10a1" "y10a2" "y10a10"
>
> > mysort2(t)
> [1] "q2.1.1" "q10.1.1" "q10.2.1" "q10.10.2"
> > mysort(t)
> [1] "q2.1.1" "q10.1.1" "q10.2.1" "q10.10.2"
>
> On Sun, Dec 21, 2008 at 9:57 PM, John Fox <j...@mcmaster.ca> wrote:
> > Dear Gabor,
> >
> > Thanks for this -- I was unaware of mixedsort(). As you point out,
> > however, mixedsort() doesn't cover all of the cases in which I'm
> > interested and which are handled by mysort().
> >
> > Regards,
> >  John
> >
> > On Sun, 21 Dec 2008 20:51:17 -0500
> >  "Gabor Grothendieck" <ggrothendi...@gmail.com> wrote:
> >> mixedsort in gtools will give the same result as mysort(s) but
> >> differs in the case of t.
> >>
> >> On Sun, Dec 21, 2008 at 8:33 PM, John Fox <j...@mcmaster.ca> wrote:
> >> > Dear r-helpers,
> >> >
> >> > I'm looking for a way of sorting variable names in a "natural"
> >> > order, when the names are composed of digits and other characters.
> >> > I know that this is a vague idea, and that sorting character
> >> > strings is a complex topic, but perhaps a couple of examples will
> >> > clarify what I mean:
> >> >
> >> >> s <- c("x1b", "x1a", "x02b", "x02a", "x02", "y1a1", "y10a2",
> >> > + "y10a10", "y10a1", "y2", "var10a2", "var2", "y10")
> >> >
> >> >> sort(s)
> >> > [1] "var10a2" "var2" "x02" "x02a" "x02b" "x1a"
> >> > [7] "x1b" "y10" "y10a1" "y10a10" "y10a2" "y1a1"
> >> > [13] "y2"
> >> >
> >> >> mysort(s)
> >> > [1] "var2" "var10a2" "x1a" "x1b" "x02" "x02a"
> >> > [7] "x02b" "y1a1" "y2" "y10" "y10a1" "y10a2"
> >> > [13] "y10a10"
> >> >
> >> >> t <- c("q10.1.1", "q10.2.1", "q2.1.1", "q10.10.2")
> >> >
> >> >> sort(t)
> >> > [1] "q10.1.1" "q10.10.2" "q10.2.1" "q2.1.1"
> >> >
> >> >> mysort(t)
> >> > [1] "q2.1.1" "q10.1.1" "q10.2.1" "q10.10.2"
> >> >
> >> > Here, sort() is the standard R function and mysort() is a
> >> > replacement, which sorts the names into the order that seems
> >> > natural to me, at least in the cases that I've tried:
> >> >
> >> > mysort <- function(x){
> >> >   sort.helper <- function(x){
> >> >     prefix <- strsplit(x, "[0-9]")
> >> >     prefix <- sapply(prefix, "[", 1)
> >> >     prefix[is.na(prefix)] <- ""
> >> >     suffix <- strsplit(x, "[^0-9]")
> >> >     suffix <- as.numeric(sapply(suffix, "[", 2))
> >> >     suffix[is.na(suffix)] <- -Inf
> >> >     remainder <- sub("[^0-9]+", "", x)
> >> >     remainder <- sub("[0-9]+", "", remainder)
> >> >     if (all (remainder == "")) list(prefix, suffix)
> >> >     else c(list(prefix, suffix), Recall(remainder))
> >> >   }
> >> >   ord <- do.call("order", sort.helper(x))
> >> >   x[ord]
> >> > }
> >> >
> >> > I have a couple of applications in mind, one of which is
> >> > recognizing repeated-measures variables in "wide" longitudinal
> >> > datasets, which are often named in the form x1, x2, ..., xn.
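The repeated-measures application mentioned here might be approached along the following lines. This is only a sketch: group_repeated() is a hypothetical name, and it assumes that each repeated measure is a shared non-digit stem followed by a trailing occasion number (x1, x2, ..., xn). Under that assumption, "x02" and "x2" would both be read as occasion 2.

```r
# Hypothetical helper: group variable names of the form <stem><number>
# into repeated-measures sets, ordered by the numeric suffix.
group_repeated <- function(vars) {
  vars  <- grep("[0-9]+$", vars, value = TRUE)                      # names ending in digits
  stem  <- sub("[0-9]+$", "", vars)                                 # e.g. "x" from "x12"
  index <- as.numeric(regmatches(vars, regexpr("[0-9]+$", vars)))   # e.g. 12
  ord   <- order(stem, index)                                       # numeric, not lexical, order
  groups <- split(vars[ord], stem[ord])
  groups[lengths(groups) > 1]                                       # drop stems seen only once
}
```

For example, group_repeated(c("id", "x10", "x2", "x1", "y1", "y2", "sex")) should return a list with components x = c("x1", "x2", "x10") and y = c("y1", "y2").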
> >> >
> >> > mysort(), which works by recursively slicing off pairs of non-digit
> >> > and digit strings, seems more complicated than it should have to
> >> > be, and I wonder whether anyone has a more elegant solution. I
> >> > don't think that efficiency is a serious issue for the
> >> > applications I'm considering, but of course a more efficient
> >> > solution would be of interest.
> >> >
> >> > Thanks,
> >> >  John
> >> >
> >> > ------------------------------
> >> > John Fox, Professor
> >> > Department of Sociology
> >> > McMaster University
> >> > Hamilton, Ontario, Canada
> >> > web: socserv.mcmaster.ca/jfox
> >> >
> >> > ______________________________________________
> >> > R-help@r-project.org mailing list
> >> > https://stat.ethz.ch/mailman/listinfo/r-help
> >> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> >> > and provide commented, minimal, self-contained, reproducible code.
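If I read sort.helper() correctly, each pass takes the number as the second field of strsplit(x, "[^0-9]"). That field is an empty string (hence -Inf) whenever the non-digit prefix is longer than one character, so mysort(c("ab10", "ab2")) would appear to leave that pair in input order, although the sample data above happen to sort as intended. The sketch below, under the hypothetical name mysort_loop(), peels off the same (prefix, number) pairs in a while loop instead of with Recall() and extracts each number with a regular expression. It assumes the input contains no NAs.

```r
# Hypothetical iterative variant of mysort(): same (prefix, number) keys,
# built in a loop rather than by recursion. Assumes no NA elements in x.
mysort_loop <- function(x) {
  keys <- list()
  rest <- x
  while (any(rest != "")) {
    prefix <- sub("^([^0-9]*).*$", "\\1", rest)   # leading non-digits, or ""
    rest   <- substring(rest, nchar(prefix) + 1)  # drop them
    digits <- sub("^([0-9]*).*$", "\\1", rest)    # leading digits, or ""
    rest   <- substring(rest, nchar(digits) + 1)  # drop them
    number <- as.numeric(digits)                  # "" becomes NA
    number[is.na(number)] <- -Inf                 # no number sorts first, as in mysort()
    keys <- c(keys, list(prefix, number))
  }
  if (length(keys) == 0) return(x)                # e.g. every element is ""
  x[do.call(order, keys)]
}
```

Each pass strips at least one character from every non-empty element, so the loop ends after roughly as many passes as the longest name has (prefix, number) pairs, the same depth the recursion would reach.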