Thank you all for the help and enlightening comments.

EC

On Sun, Nov 23, 2008 at 4:40 PM, jim holtman <[EMAIL PROTECTED]> wrote:
>
> You do have to read a little further on the help page to make sure
> that duplicates are removed if they appear after, and not before,
> others in the vector to see that the order is preserved:
>
> "Note that unlike the Unix command uniq this omits duplicated and not
> just repeated elements/rows.  That is, an element is omitted if it is
> identical to any previous element and not just if it is the same as
> the immediately previous one."
>
> This does make it clear that the original order is preserved since it
> is succeeding elements that are removed.  So from this, I assume that
> the use of
>
>     unique(c(x, y))
>
> does preserve the original ordering of the elements.
>
>
> On Sun, Nov 23, 2008 at 2:36 AM, Prof Brian Ripley
> <[EMAIL PROTECTED]> wrote:
> > On Sun, 23 Nov 2008, jim holtman wrote:
> >
> >> You are right.  union used 'unique(c(x,y))' and I am not sure if
> >> 'unique' preserves the order, but the help page seems to indicate that
> >> "an element is omitted if it is identical to any previous element";
> >> this might mean that the order is preserved.
> >
> > It says
> >
> >     'unique' returns a vector, data frame or array like 'x' but with
> >     duplicate elements/rows removed.
> >
> > Although it is a generic function, it is hard to see how that can be
> > interpreted to allow the order to be changed.
> >
> > The claim that union would be more efficiently implemented via sorting is
> > made with no evidence: do look up a basic computer science textbook for this
> > kind of thing, as well as how R actually does it.  (Also 'efficient' was not
> > defined: both speed and memory usage are potentially measures of
> > efficiency.)
> > But for example
> >
> >> x <- rnorm(1e7)
> >> system.time(unique(x))
> >
> >    user  system elapsed
> >   2.258   0.261   2.523
> >>
> >> system.time(sort(x))
> >
> >    user  system elapsed
> >   4.102   0.112   4.231
> >>
> >> system.time(sort(x, method="quick"))
> >
> >    user  system elapsed
> >   1.928   0.109   2.047
> >
> > will indicate that unique() is comparable in speed to sorting.
> >
> >
> >> On Sat, Nov 22, 2008 at 11:43 PM, Stavros Macrakis
> >> <[EMAIL PROTECTED]> wrote:
> >>>
> >>> On Sat, Nov 22, 2008 at 10:20 AM, jim holtman <[EMAIL PROTECTED]> wrote:
> >>>>
> >>>> c.Factor <-
> >>>> function (x, y)
> >>>> {
> >>>>     newlevels = union(levels(x), levels(y))
> >>>>     m = match(levels(y), newlevels)
> >>>>     ans = c(unclass(x), m[unclass(y)])
> >>>>     levels(ans) = newlevels
> >>>>     class(ans) = "factor"
> >>>>     ans
> >>>> }
> >>>
> >>> This algorithm depends crucially on union preserving the order of the
> >>> elements of its arguments.  As far as I can tell, the spec of union
> >>> does not require this.  If union were to (for example) sort its
> >>> arguments then merge them (generally a more efficient algorithm), this
> >>> function would no longer work.
> >>>
> >>> Fortunately, the fix is simple.  Instead of union, use:
> >>>
> >>>     newlevels <- c(levels(x), setdiff(levels(y), levels(x)))
> >>>
> >>> which is guaranteed to preserve the order of levels(x).
> >>>
> >>>            -s
> >>>
> >>
> >> --
> >> Jim Holtman
> >> Cincinnati, OH
> >> +1 513 646 9390
> >>
> >> What is the problem that you are trying to solve?
> >>
> >> ______________________________________________
> >> R-help@r-project.org mailing list
> >> https://stat.ethz.ch/mailman/listinfo/r-help
> >> PLEASE do read the posting guide
> >> http://www.R-project.org/posting-guide.html
> >> and provide commented, minimal, self-contained, reproducible code.
> >>
> >
> > --
> > Brian D. Ripley,                  [EMAIL PROTECTED]
> > Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
> > University of Oxford,             Tel:  +44 1865 272861 (self)
> > 1 South Parks Road,                     +44 1865 272866 (PA)
> > Oxford OX1 3TG, UK                Fax:  +44 1865 272595
> >
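
For anyone who wants to check the two points from this thread, here is a
small sketch in R. It is only an illustration: the names x, y, fx, fy and
combine_factors are made up for the example and do not come from the thread,
and the function rebuilds the result with factor() rather than assigning
levels and class by hand as the quoted c.Factor does. It shows unique()
keeping elements in order of first occurrence, and the setdiff()-based level
merge suggested above.

```r
# Illustrative data (hypothetical, not from the thread).
x <- c("b", "a", "b", "c")
y <- c("c", "d", "a")

# unique() omits an element if it is identical to any previous one,
# so the survivors stay in order of first occurrence.
u <- unique(c(x, y))

# The explicit form suggested in the thread: everything from x first,
# then whatever y adds.
v <- c(unique(x), setdiff(y, x))

# A hypothetical helper that concatenates two factors using the
# setdiff()-based level merge.
combine_factors <- function(x, y) {
    newlevels <- c(levels(x), setdiff(levels(y), levels(x)))
    # Position of each of y's levels within the combined level set.
    m <- match(levels(y), newlevels)
    # Integer codes of x, followed by y's codes remapped through m.
    codes <- c(unclass(x), m[unclass(y)])
    factor(newlevels[codes], levels = newlevels)
}

fx <- factor(c("lo", "hi", "lo"), levels = c("lo", "hi"))
fy <- factor(c("mid", "hi"), levels = c("hi", "mid"))
fz <- combine_factors(fx, fy)

print(u)
print(v)
print(levels(fz))
print(as.character(fz))
```

With this data u and v should come out identical, and the levels of fz
should start with the levels of fx in their original order.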