Re: [R] sorting variable names containing digits

John Fox Sun, 21 Dec 2008 18:59:32 -0800

Dear Gabor,

Thanks for this -- I was unaware of mixedsort(). As you point out,
however, mixedsort() doesn't cover all of the cases in which I'm
interested and which are handled by mysort().


Regards,
 John

On Sun, 21 Dec 2008 20:51:17 -0500
 "Gabor Grothendieck" <ggrothendi...@gmail.com> wrote:
> mixedsort in gtools will give the same result as mysort(s) but
> differs in the case of t.
> 
> On Sun, Dec 21, 2008 at 8:33 PM, John Fox <j...@mcmaster.ca> wrote:
> > Dear r-helpers,
> >
> > I'm looking for a way of sorting variable names in a "natural" order,
> > when the names are composed of digits and other characters. I know that
> > this is a vague idea, and that sorting character strings is a complex
> > topic, but perhaps a couple of examples will clarify what I mean:
> >
> >> s <- c("x1b", "x1a", "x02b", "x02a", "x02", "y1a1", "y10a2",
> > +   "y10a10", "y10a1", "y2", "var10a2", "var2", "y10")
> >
> >> sort(s)
> >  [1] "var10a2" "var2"    "x02"     "x02a"    "x02b"    "x1a"
> >  [7] "x1b"     "y10"     "y10a1"   "y10a10"  "y10a2"   "y1a1"
> > [13] "y2"
> >
> >> mysort(s)
> >  [1] "var2"    "var10a2" "x1a"     "x1b"     "x02"     "x02a"
> >  [7] "x02b"    "y1a1"    "y2"      "y10"     "y10a1"   "y10a2"
> > [13] "y10a10"
> >
> >> t <- c("q10.1.1", "q10.2.1", "q2.1.1", "q10.10.2")
> >
> >> sort(t)
> > [1] "q10.1.1"  "q10.10.2" "q10.2.1"  "q2.1.1"
> >
> >> mysort(t)
> > [1] "q2.1.1"   "q10.1.1"  "q10.2.1"  "q10.10.2"
> >
> > Here, sort() is the standard R function and mysort() is a replacement,
> > which sorts the names into the order that seems natural to me, at least
> > in the cases that I've tried:
> >
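> > mysort <- function(x){
> >  # sort.helper() returns a flat list of sort keys for order():
> >  # prefix 1, number 1, prefix 2, number 2, ...
> >  sort.helper <- function(x){
> >    # leading run of non-digits ("" if the string starts with a digit)
> >    prefix <- strsplit(x, "[0-9]")
> >    prefix <- sapply(prefix, "[", 1)
> >    prefix[is.na(prefix)] <- ""
> >    # run of digits that follows the prefix, compared as a number;
> >    # -Inf when there is none, so that such names sort first
> >    suffix <- as.numeric(sub("^[^0-9]*([0-9]*).*$", "\\1", x))
> >    suffix[is.na(suffix)] <- -Inf
> >    # drop the prefix/number pair just extracted
> >    remainder <- sub("^[^0-9]*", "", x)
> >    remainder <- sub("^[0-9]*", "", remainder)
> >    # stop once every name is used up; otherwise key the remainders too
> >    if (all (remainder == "")) list(prefix, suffix)
> >    else c(list(prefix, suffix), Recall(remainder))
> >    }
> >  ord <- do.call("order", sort.helper(x))
> >  x[ord]
> >   }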
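
One way to see what the helper is doing is to slice a single name into its (prefix, number) pairs. The function below is only an illustrative sketch and is not part of the posted code: slice_pairs() and its column names are hypothetical, and it assumes the same convention of using -Inf where a prefix has no number after it.

```r
# Hypothetical illustration: slice one name into (prefix, number) pairs.
slice_pairs <- function(name) {
  prefix <- character(0)
  number <- numeric(0)
  rest <- name
  while (nzchar(rest)) {
    # leading run of non-digits (may be empty)
    p <- sub("^([^0-9]*).*$", "\\1", rest)
    rest <- substring(rest, nchar(p) + 1)
    # run of digits that follows it (may be empty)
    d <- sub("^([0-9]*).*$", "\\1", rest)
    rest <- substring(rest, nchar(d) + 1)
    prefix <- c(prefix, p)
    number <- c(number, if (nzchar(d)) as.numeric(d) else -Inf)
  }
  data.frame(prefix = prefix, number = number, stringsAsFactors = FALSE)
}

pairs <- slice_pairs("y10a2")
print(pairs)
```

Each row of the result corresponds to one level of the recursion, and the columns are the keys that end up being passed to order().
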
> >
> > I have a couple of applications in mind, one of which is recognizing
> > repeated-measures variables in "wide" longitudinal datasets, which often
> > are named in the form x1, x2, ... , xn.
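
For the repeated-measures case, one possible sketch is to strip the trailing digits to get a stem and keep the stems that occur more than once. The variable names below, and the names stem, wave and repeated, are invented for illustration; real datasets may well need a stricter rule.

```r
# Hypothetical variable names from a "wide" dataset.
vars <- c("x1", "x10", "x2", "y1", "y2", "age")

# Stem: the name without its trailing digits.
stem <- sub("[0-9]+$", "", vars)

# Wave: the trailing digits as a number (NA when there are none).
wave <- as.numeric(sub("^.*[^0-9]", "", vars))

# Order within stem by wave number, then group the names by stem.
ord <- order(stem, wave)
groups <- split(vars[ord], stem[ord])

# Stems with more than one member are candidate repeated measures.
repeated <- groups[lengths(groups) > 1]
print(repeated)
```
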
> >
> > mysort(), which works by recursively slicing off pairs of non-digit and
> > digit strings, seems more complicated than it should have to be, and I
> > wonder whether anyone has a more elegant solution. I don't think that
> > efficiency is a serious issue for the applications I'm considering, but
> > of course a more efficient solution would be of interest.
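
One possible non-recursive variant, offered only as a sketch: split every name into alternating runs of digits and non-digits with regmatches() and gregexpr(), then build three order() keys per position (missing/number/text, the numeric value, the text). natural_sort() is a hypothetical name, the code uses base R only, and it has been checked only against the two examples from this thread.

```r
# Hypothetical helper, not from the thread: natural sort via chunking.
natural_sort <- function(x) {
  # "y10a2" -> "y", "10", "a", "2"
  chunks <- regmatches(x, gregexpr("[0-9]+|[^0-9]+", x))
  n <- max(c(0L, lengths(chunks)))
  keys <- list()
  for (i in seq_len(n)) {
    # i-th chunk of every name, NA where a name has run out of chunks
    piece <- vapply(chunks,
                    function(p) if (length(p) >= i) p[i] else NA_character_,
                    character(1))
    is_num <- !is.na(piece) & grepl("^[0-9]+$", piece)
    # 0 = no chunk (sorts first), 1 = number, 2 = text
    kind <- ifelse(is.na(piece), 0L, ifelse(is_num, 1L, 2L))
    num <- rep(0, length(x))
    num[is_num] <- as.numeric(piece[is_num])
    txt <- ifelse(kind == 2L, piece, "")
    keys <- c(keys, list(kind, num, txt))
  }
  if (length(keys) == 0) return(x)
  x[do.call(order, keys)]
}

s <- c("x1b", "x1a", "x02b", "x02a", "x02", "y1a1", "y10a2",
       "y10a10", "y10a1", "y2", "var10a2", "var2", "y10")
t <- c("q10.1.1", "q10.2.1", "q2.1.1", "q10.10.2")
print(natural_sort(s))
print(natural_sort(t))
```

Text chunks are still compared with the locale's collation, as in sort(), so names that mix case or punctuation may order differently from one system to another.
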
> >
> > Thanks,
> >  John
> >
> > ------------------------------
> > John Fox, Professor
> > Department of Sociology
> > McMaster University
> > Hamilton, Ontario, Canada
> > web: socserv.mcmaster.ca/jfox
> >
> > ______________________________________________
> > R-help@r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> >

--------------------------------
John Fox, Professor
Department of Sociology
McMaster University
Hamilton, Ontario, Canada
http://socserv.mcmaster.ca/jfox/

