Note that mysort2 is slightly more general, as it also handles strings that begin with digits:

> u <- c("51a2", "2a4")
> mysort(u)
[1] "51a2" "2a4"
> mysort2(u)
[1] "2a4" "51a2"

On Mon, Dec 22, 2008 at 12:32 AM, John Fox <j...@mcmaster.ca> wrote:
> Dear Gabor,
>
> Thank you (again) for this second suggestion, which does exactly what I
> want. At the risk of appearing ungrateful, and although the judgment is
> admittedly subjective, I don't find it simpler than mysort().
>
> For curiosity, I tried some timings of the two functions for the sample
> problems that I supplied:
>
> > system.time(for (i in 1:100) mysort(s))
>    user  system elapsed
>   1.498   0.006   1.503
>
> > system.time(for (i in 1:100) mysort2(s))
>    user  system elapsed
>   6.026   0.028   6.059
>
> > system.time(for (i in 1:100) mysort(t))
>    user  system elapsed
>   0.858   0.003   0.874
>
> > system.time(for (i in 1:100) mysort2(t))
>    user  system elapsed
>   2.736   0.014   2.757
>
> This is on a 2.4 GHz Core 2 Duo MacBook. I don't know of course
> whether this generalizes to other problems. I suspect that the
> recursive solution will look worse as the number of "components" of the
> names increases, but of course names are unlikely to have a large
> number of components.
>
> Best,
> John
>
> On Sun, 21 Dec 2008 23:28:51 -0500
> "Gabor Grothendieck" <ggrothendi...@gmail.com> wrote:
>> Another possibility is to use strapply in gsubfn giving a solution
>> that is non-recursive and shorter:
>>
>> library(gsubfn)
>>
>> mysort2 <- function(s) {
>>   L <- strapply(s, "([0-9]+)|([^0-9]+)",
>>     ~ if (nchar(x)) sprintf("%9d", as.numeric(x)) else y)
>>   L2 <- t(do.call(cbind, lapply(L, ts)))
>>   L3 <- replace(L2, is.na(L2), "")
>>   ord <- do.call(order, as.data.frame(L3, stringsAsFactors = FALSE))
>>   s[ord]
>> }
>>
>> First strapply breaks up each string into a character vector of the
>> numeric and non-numeric components. We pad each numeric component on
>> the left with spaces using sprintf so they are all 9 wide. The next
>> line turns that into a matrix L2 and then we replace the NAs giving
>> L3. Finally we order it and apply the ordering, ord, to get the
>> sorted version.
>>
>> The gsubfn home page is at:
>> http://gsubfn.googlecode.com
>>
>> Here is some sample output:
>>
>> > mysort2(s)
>> [1] "var2" "var10a2" "x1a" "x1b" "x02" "x02a"
>> "x02b" "y1a1" "y2" "y10" "y10a1" "y10a2" "y10a10"
>> > mysort(s)
>> [1] "var2" "var10a2" "x1a" "x1b" "x02" "x02a"
>> "x02b" "y1a1" "y2" "y10" "y10a1" "y10a2" "y10a10"
>>
>> > mysort2(t)
>> [1] "q2.1.1" "q10.1.1" "q10.2.1" "q10.10.2"
>> > mysort(t)
>> [1] "q2.1.1" "q10.1.1" "q10.2.1" "q10.10.2"
>>
>>
>> On Sun, Dec 21, 2008 at 9:57 PM, John Fox <j...@mcmaster.ca> wrote:
>> > Dear Gabor,
>> >
>> > Thanks for this -- I was unaware of mixedsort(). As you point out,
>> > however, mixedsort() doesn't cover all of the cases in which I'm
>> > interested and which are handled by mysort().
>> >
>> > Regards,
>> > John
>> >
>> > On Sun, 21 Dec 2008 20:51:17 -0500
>> > "Gabor Grothendieck" <ggrothendi...@gmail.com> wrote:
>> >> mixedsort in gtools will give the same result as mysort(s) but
>> >> differs in the case of t.
>> >>
>> >> On Sun, Dec 21, 2008 at 8:33 PM, John Fox <j...@mcmaster.ca> wrote:
>> >> > Dear r-helpers,
>> >> >
>> >> > I'm looking for a way of sorting variable names in a "natural"
>> >> > order, when the names are composed of digits and other
>> >> > characters. I know that this is a vague idea, and that sorting
>> >> > character strings is a complex topic, but perhaps a couple of
>> >> > examples will clarify what I mean:
>> >> >
>> >> > > s <- c("x1b", "x1a", "x02b", "x02a", "x02", "y1a1", "y10a2",
>> >> > +   "y10a10", "y10a1", "y2", "var10a2", "var2", "y10")
>> >> >
>> >> > > sort(s)
>> >> >  [1] "var10a2" "var2"    "x02"     "x02a"    "x02b"    "x1a"
>> >> >  [7] "x1b"     "y10"     "y10a1"   "y10a10"  "y10a2"   "y1a1"
>> >> > [13] "y2"
>> >> >
>> >> > > mysort(s)
>> >> >  [1] "var2"    "var10a2" "x1a"     "x1b"     "x02"     "x02a"
>> >> >  [7] "x02b"    "y1a1"    "y2"      "y10"     "y10a1"   "y10a2"
>> >> > [13] "y10a10"
>> >> >
>> >> > > t <- c("q10.1.1", "q10.2.1", "q2.1.1", "q10.10.2")
>> >> >
>> >> > > sort(t)
>> >> > [1] "q10.1.1"  "q10.10.2" "q10.2.1"  "q2.1.1"
>> >> >
>> >> > > mysort(t)
>> >> > [1] "q2.1.1"   "q10.1.1"  "q10.2.1"  "q10.10.2"
>> >> >
>> >> > Here, sort() is the standard R function and mysort() is a
>> >> > replacement, which sorts the names into the order that seems
>> >> > natural to me, at least in the cases that I've tried:
>> >> >
>> >> > mysort <- function(x){
>> >> >   sort.helper <- function(x){
>> >> >     prefix <- strsplit(x, "[0-9]")
>> >> >     prefix <- sapply(prefix, "[", 1)
>> >> >     prefix[is.na(prefix)] <- ""
>> >> >     suffix <- strsplit(x, "[^0-9]")
>> >> >     suffix <- as.numeric(sapply(suffix, "[", 2))
>> >> >     suffix[is.na(suffix)] <- -Inf
>> >> >     remainder <- sub("[^0-9]+", "", x)
>> >> >     remainder <- sub("[0-9]+", "", remainder)
>> >> >     if (all (remainder == "")) list(prefix, suffix)
>> >> >     else c(list(prefix, suffix), Recall(remainder))
>> >> >   }
>> >> >   ord <- do.call("order", sort.helper(x))
>> >> >   x[ord]
>> >> > }
>> >> >
>> >> > I have a couple of applications in mind, one of which is
>> >> > recognizing repeated-measures variables in "wide" longitudinal
>> >> > datasets, which often are named in the form x1, x2, ... , xn.
>> >> >
>> >> > mysort(), which works by recursively slicing off pairs of
>> >> > non-digit and digit strings, seems more complicated than it
>> >> > should have to be, and I wonder whether anyone has a more
>> >> > elegant solution. I don't think that efficiency is a serious
>> >> > issue for the applications I'm considering, but of course a
>> >> > more efficient solution would be of interest.
>> >> >
>> >> > Thanks,
>> >> > John
>> >> >
>> >> > ------------------------------
>> >> > John Fox, Professor
>> >> > Department of Sociology
>> >> > McMaster University
>> >> > Hamilton, Ontario, Canada
>> >> > web: socserv.mcmaster.ca/jfox
>> >> >
>> >> > ______________________________________________
>> >> > R-help@r-project.org mailing list
>> >> > https://stat.ethz.ch/mailman/listinfo/r-help
>> >> > PLEASE do read the posting guide
>> >> > http://www.R-project.org/posting-guide.html
>> >> > and provide commented, minimal, self-contained, reproducible
>> >> > code.
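
The split, pad and order steps described for mysort2 in this thread can probably also be done in base R, without gsubfn. What follows is only a rough sketch and not code from either poster: the name natural_sort() is made up for illustration, and it assumes there are no NA values and that no run of digits is longer than 9 characters. It passes method = "radix" to order() so that the comparison is done in the C locale, where a space sorts before any digit; in some other locales the space padding may not compare as intended.

```r
# natural_sort() is a hypothetical name, not a function from the thread.
# Assumptions: no NA values, and every digit run has at most 9 digits.
natural_sort <- function(s) {
  if (length(s) == 0) return(s)
  # Split each string into alternating runs of digits and non-digits.
  tokens <- regmatches(s, gregexpr("[0-9]+|[^0-9]+", s))
  # Left-pad digit runs with spaces to width 9, as mysort2 does with sprintf.
  tokens <- lapply(tokens, function(tok) {
    is_num <- grepl("^[0-9]+$", tok)
    tok[is_num] <- sprintf("%9d", as.integer(tok[is_num]))
    tok
  })
  # Pad short token vectors with "" so every string fills the same columns.
  n <- max(1L, lengths(tokens))
  m <- do.call(rbind, lapply(tokens, function(tok) c(tok, rep("", n - length(tok)))))
  # Order by the first column, then the second, and so on.
  # method = "radix" makes order() compare character keys in the C locale.
  cols <- lapply(seq_len(ncol(m)), function(j) m[, j])
  ord <- do.call(order, c(cols, list(method = "radix")))
  s[ord]
}

u <- c("51a2", "2a4")
natural_sort(u)
```

Because the comparison is in the C locale, upper-case letters sort before lower-case ones, which may differ from what sort() gives in your own locale.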