Re: [Rd] setequal: better readability, reduced memory footprint, and minor speedup

Michael Lawrence Thu, 08 Jan 2015 15:40:47 -0800

Currently unique() calls duplicated() internally and then extracts the
non-duplicated elements. One could write a countUnique() that simply counts,
rather than allocating the logical vector returned by duplicated(). However,
so much of the cost lies in the hash operation that it probably won't help
much, although that may depend on the sizes of things: the more unique
elements there are, the better it would perform.
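A minimal R sketch of what a countUnique() would compute. The name is hypothetical, since base R has no such function. Written at the R level it still allocates the logical vector from duplicated(), so this only pins down the semantics. The allocation saving described above would presumably need a C-level implementation that counts inside the hashing loop.

```r
# Hypothetical helper: number of distinct elements of x.
# NOTE: this R-level version still allocates duplicated()'s logical result;
# it illustrates the semantics, not the memory saving.
countUnique <- function(x) sum(!duplicated(x))

x <- c(3L, 1L, 3L, NA, 2L, NA)
countUnique(x)       # 4: the distinct values are 3, 1, NA, 2
length(unique(x))    # same count, but materializes the unique values first
```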



On Thu, Jan 8, 2015 at 2:06 PM, Peter Haverty <[email protected]> wrote:

> How about calling unique() on both and comparing the lengths? It's less
> work, especially in terms of allocation.
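Taken literally, comparing only length(unique(x)) and length(unique(y)) is not enough: x = 1:2 and y = 3:4 have the same number of distinct values but are different sets. A minimal sketch of the unique()-based idea, assuming a third count on the combined vector closes that gap (two finite sets are equal exactly when each has as many distinct elements as their union). The name setequal_unique is hypothetical.

```r
# Hypothetical unique()-based set equality:
# |X| == |Y| == |union(X, Y)| holds exactly when X and Y are the same set.
setequal_unique <- function(x, y) {
  x <- as.vector(x)
  y <- as.vector(y)
  nx <- length(unique(x))
  nx == length(unique(y)) && nx == length(unique(c(x, y)))
}

setequal_unique(c(1, 2, 2), c(2, 1))  # TRUE
setequal_unique(1:2, 3:4)             # FALSE, although the unique lengths match
```

Note that c(x, y) allocates a combined copy of both inputs, so whether this actually allocates less than the %in% versions is not established by this sketch.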
>
>
>
> Pete
>
> ____________________
> Peter M. Haverty, Ph.D.
> Genentech, Inc.
> [email protected]
>
> On Thu, Jan 8, 2015 at 1:30 PM, peter dalgaard <[email protected]> wrote:
>
> > If you look at the definition of %in%, you'll find that it is implemented
> > using match, so if we did as you suggest, I give it about three days before
> > someone suggests inlining the function call... Readability of source code
> > is not usually our prime concern.
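For reference, base R's %in% has long been defined as a thin wrapper around match(), along the lines of function(x, table) match(x, table, nomatch = 0L) > 0L. The check below uses a hypothetical local copy, in_via_match, to confirm that the two spellings agree, including on NA, which match() treats as matching NA.

```r
# Local copy of the match()-based definition of %in% (hypothetical name).
in_via_match <- function(x, table) match(x, table, nomatch = 0L) > 0L

x <- c(1, NA, 3, 5)
y <- c(5, NA, 2)
x %in% y             # FALSE TRUE FALSE TRUE
in_via_match(x, y)   # identical result
```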
> >
> > The && idea does have some merit, though.
> >
> > Apropos, why is there no setcontains()?
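There is indeed no setcontains() in base R. A minimal sketch of what one might look like, assuming it should answer whether every element of y also occurs in x; both the name and the argument order are assumptions, not an existing API.

```r
# Hypothetical: TRUE when every value in y also occurs in x (y is a subset of x).
setcontains <- function(x, y) {
  x <- as.vector(x)
  y <- as.vector(y)
  all(y %in% x)
}

setcontains(1:5, c(2, 4))  # TRUE
setcontains(1:5, c(2, 6))  # FALSE
```

With such a helper, setequal(x, y) could be expressed as setcontains(x, y) && setcontains(y, x).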
> >
> > -pd
> >
> > > On 06 Jan 2015, at 22:02 , Hervé Pagès <[email protected]> wrote:
> > >
> > > Hi,
> > >
> > > Current implementation:
> > >
> > > setequal <- function (x, y)
> > > {
> > >  x <- as.vector(x)
> > >  y <- as.vector(y)
> > >  all(c(match(x, y, 0L) > 0L, match(y, x, 0L) > 0L))
> > > }
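The quoted definition ignores both order and multiplicity, which is what makes it a set comparison. A quick check using a local copy of the code above (setequal_current is a hypothetical name, used so the result does not depend on how base R's setequal() has changed since 2015). Note that c() concatenates the two logical vectors into a third one before all() sees it.

```r
# Local copy of the implementation quoted above.
setequal_current <- function(x, y) {
  x <- as.vector(x)
  y <- as.vector(y)
  # c() allocates a new logical vector of length(x) + length(y)
  all(c(match(x, y, 0L) > 0L, match(y, x, 0L) > 0L))
}

setequal_current(c(1, 2, 2, 3), c(3, 1, 2))         # TRUE: order and duplicates ignored
setequal_current(c(1, 2), c(1, 2, 4))               # FALSE: 4 does not occur in x
setequal_current(factor(c("a", "b")), c("b", "a"))  # TRUE: as.vector() turns the factor into character
```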
> > >
> > > First, what about replacing 'match(x, y, 0L) > 0L' and
> > > 'match(y, x, 0L) > 0L' with 'x %in% y' and 'y %in% x', respectively?
> > > They're strictly equivalent, but the latter form is a lot more readable
> > > than the former (isn't this the "raison d'être" of %in%?):
> > >
> > > setequal <- function (x, y)
> > > {
> > >  x <- as.vector(x)
> > >  y <- as.vector(y)
> > >  all(c(x %in% y, y %in% x))
> > > }
> > >
> > > Furthermore, replacing 'all(c(x %in% y, y %in% x))' with
> > > 'all(x %in% y) && all(y %in% x)' improves readability even more and,
> > > more importantly, significantly reduces the memory footprint on big
> > > vectors (e.g. by 15% on integer vectors with 15M elements):
> > >
> > > setequal <- function (x, y)
> > > {
> > >  x <- as.vector(x)
> > >  y <- as.vector(y)
> > >  all(x %in% y) && all(y %in% x)
> > > }
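The saving comes from no longer concatenating the two logical vectors with c(). The && form also short-circuits: when some element of x is missing from y, the second scan, y %in% x, is never computed. A sketch that makes this visible (setequal_and is a hypothetical name for the proposed version):

```r
# Proposed version: no c(), and && skips the second %in% scan
# whenever the first all() is already FALSE.
setequal_and <- function(x, y) {
  x <- as.vector(x)
  y <- as.vector(y)
  all(x %in% y) && all(y %in% x)
}

# && does not evaluate its right-hand side once the left-hand side is FALSE:
FALSE && stop("not evaluated")            # FALSE, no error raised

setequal_and(c(1, 2, 2, 3), c(3, 1, 2))   # TRUE
setequal_and(c(1, 9), 1:3)                # FALSE after only the first scan
```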
> > >
> > > It also seems to speed things up a little (though not significantly).
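The size of the speedup, like the 15% memory figure, will depend on the machine and R version. A minimal sketch for repeating the timing comparison with base R's system.time(); n, the sample data and both function names are assumptions (the 15M-element case from the message would be n <- 15e6). Equal sets are used on purpose, since that is the worst case for the && version: no short-circuit happens.

```r
# The two candidate implementations (hypothetical names).
setequal_c <- function(x, y) {
  x <- as.vector(x)
  y <- as.vector(y)
  all(c(x %in% y, y %in% x))
}
setequal_and <- function(x, y) {
  x <- as.vector(x)
  y <- as.vector(y)
  all(x %in% y) && all(y %in% x)
}

n <- 1e6                 # assumed size; the message used 15M-element integer vectors
set.seed(1)
x <- sample.int(n)       # a permutation of 1:n
y <- rev(x)              # same set, different order: both scans must run

t_c   <- system.time(r_c   <- setequal_c(x, y))
t_and <- system.time(r_and <- setequal_and(x, y))
rbind(c = t_c[["elapsed"]], and = t_and[["elapsed"]])
```

For a rough peak-memory comparison, calling gc(reset = TRUE) before each call and reading the "max used" columns of gc() afterwards is one option.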
> > >
> > > Cheers,
> > > H.
> > >
> > > --
> > > Hervé Pagès
> > >
> > > Program in Computational Biology
> > > Division of Public Health Sciences
> > > Fred Hutchinson Cancer Research Center
> > > 1100 Fairview Ave. N, M1-B514
> > > P.O. Box 19024
> > > Seattle, WA 98109-1024
> > >
> > > E-mail: [email protected]
> > > Phone:  (206) 667-5791
> > > Fax:    (206) 667-1319
> > >
> > > ______________________________________________
> > > [email protected] mailing list
> > > https://stat.ethz.ch/mailman/listinfo/r-devel
> >
> > --
> > Peter Dalgaard, Professor,
> > Center for Statistics, Copenhagen Business School
> > Solbjerg Plads 3, 2000 Frederiksberg, Denmark
> > Phone: (+45)38153501
> > Email: [email protected]  Priv: [email protected]
> >
>
