On Sun, 23 Nov 2008, jim holtman wrote:

You are right.  union uses 'unique(c(x,y))' and I am not sure if
'unique' preserves the order, but the help page seems to indicate that
"an element is omitted if it is identical to any previous element";
this might mean that the order is preserved.

It says

     'unique' returns a vector, data frame or array like 'x' but with
     duplicate elements/rows removed.

Although it is a generic function, it is hard to see how that can be interpreted to allow the order to be changed.
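The documented behaviour is easy to check directly. The vectors below are made up for illustration. unique() keeps the first occurrence of each element in its original position. For plain character vectors like these, union(x, y) returns the same vector as unique(c(x, y)).

```r
# Made-up example vectors (not from the thread).
x <- c("b", "a", "b")
y <- c("c", "a", "d")

# unique() drops later duplicates; first occurrences stay in input order.
u <- unique(c(x, y))
stopifnot(identical(u, c("b", "a", "c", "d")))

# union() gives the same, order-preserving result here.
stopifnot(identical(union(x, y), u))
```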

The claim that union would be more efficiently implemented via sorting is made with no evidence: do look up a basic computer science textbook for this kind of thing, and check how R actually does it. (Also, 'efficient' was not defined: both speed and memory usage are potential measures of efficiency.) But, for example,

x <- rnorm(1e7)
system.time(unique(x))
   user  system elapsed
  2.258   0.261   2.523
system.time(sort(x))
   user  system elapsed
  4.102   0.112   4.231
system.time(sort(x, method="quick"))
   user  system elapsed
  1.928   0.109   2.047

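indicates that unique() is comparable in speed to sorting: in this run it was faster than the default sort() and only slightly slower than sort(x, method="quick").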
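The textbook alternative to sorting is hashing. It makes one pass over the data in expected linear time rather than O(n log n), and the surviving elements naturally stay in input order. R's own unique() is implemented in C using hash tables. The sketch below is only an R-level illustration of that idea, not R's code. unique_hashed() is a hypothetical helper. To keep it simple, it assumes a character vector with no NA or empty strings, because such values would need special handling as environment keys.

```r
# Illustration only: order-preserving de-duplication with a hash table.
# unique_hashed() is hypothetical; R's unique() does this internally in C.
# Assumes a character vector without NA or "" (awkward as environment keys).
unique_hashed <- function(v) {
  stopifnot(is.character(v), !anyNA(v), all(nzchar(v)))
  seen <- new.env(hash = TRUE, parent = emptyenv())  # environment used as a hash set
  keep <- logical(length(v))                         # all FALSE to start
  for (i in seq_along(v)) {
    key <- v[[i]]
    if (!exists(key, envir = seen, inherits = FALSE)) {
      assign(key, TRUE, envir = seen)  # remember this element
      keep[i] <- TRUE                  # first occurrence: keep it
    }
  }
  v[keep]                              # survivors stay in input order
}

v <- c("b", "a", "b", "c", "a")
stopifnot(identical(unique_hashed(v), unique(v)))
```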



On Sat, Nov 22, 2008 at 11:43 PM, Stavros Macrakis
<[EMAIL PROTECTED]> wrote:
On Sat, Nov 22, 2008 at 10:20 AM, jim holtman <[EMAIL PROTECTED]> wrote:
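c.Factor <-
function (x, y)
{
   # levels of x first, then any new levels of y (relies on union keeping order)
   newlevels = union(levels(x), levels(y))
   # position of each of y's levels within newlevels
   m = match(levels(y), newlevels)
   # x's integer codes are reused as-is; y's codes are translated through m
   ans = c(unclass(x), m[unclass(y)])
   levels(ans) = newlevels
   class(ans) = "factor"
   ans
}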

This algorithm depends crucially on union preserving the order of the
elements of its arguments. As far as I can tell, the spec of union
does not require this.  If union were to (for example) sort its
arguments then merge them (generally a more efficient algorithm), this
function would no longer work.
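The failure mode can be reproduced with a hypothetical union that does not preserve order. In this sketch, c_factor_using() is Jim's algorithm with the union step passed in as an argument. sorted_union() is an assumed stand-in for a union that sorts. Neither function exists in R. Because x's integer codes are reused unchanged, they point at the wrong levels once newlevels has been reordered.

```r
# Jim's c.Factor algorithm, with the union step supplied as an argument
# (c_factor_using is a hypothetical name used only for this comparison).
c_factor_using <- function(x, y, union_fn) {
  newlevels <- union_fn(levels(x), levels(y))
  m <- match(levels(y), newlevels)     # recode y into newlevels
  ans <- c(unclass(x), m[unclass(y)])  # x's codes are NOT recoded
  levels(ans) <- newlevels
  class(ans) <- "factor"
  ans
}

# Hypothetical union that sorts its result instead of keeping input order.
sorted_union <- function(a, b) sort(unique(c(a, b)))

x <- factor(c("z", "y"), levels = c("z", "y"))
y <- factor("a")

ok  <- c_factor_using(x, y, union)         # levels z, y, a: codes still valid
bad <- c_factor_using(x, y, sorted_union)  # levels a, y, z: x's codes now wrong

stopifnot(identical(as.character(ok),  c("z", "y", "a")))
stopifnot(identical(as.character(bad), c("a", "y", "a")))  # "z" became "a"
```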

Fortunately, the fix is simple.  Instead of union, use:

    newlevels <- c(levels(x), setdiff(levels(y), levels(x)))

which is guaranteed to preserve the order of levels(x).
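Below is a sketch of c.Factor with the suggested fix applied. The name c_factor_fixed() is hypothetical. Because levels(x) come first and in their original order, x's integer codes remain valid without recoding. This holds whatever order setdiff() returns the extra levels in, since y is recoded through match() anyway.

```r
# c.Factor with the setdiff-based level merge (hypothetical name).
c_factor_fixed <- function(x, y) {
  # levels(x) unchanged and first; then y's levels that x lacks
  newlevels <- c(levels(x), setdiff(levels(y), levels(x)))
  m <- match(levels(y), newlevels)     # recode y into newlevels
  ans <- c(unclass(x), m[unclass(y)])  # x's codes are still correct
  levels(ans) <- newlevels
  class(ans) <- "factor"
  ans
}

f <- c_factor_fixed(factor(c("z", "y"), levels = c("z", "y")),
                    factor(c("a", "y")))
stopifnot(identical(as.character(f), c("z", "y", "a", "y")))
stopifnot(identical(levels(f), c("z", "y", "a")))
```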

            -s




--
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


--
Brian D. Ripley,                  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
