Matthew,
Yes, the case I am thinking of is a 1-column key; sorry for the
overgeneralization. I haven't thought much about the multi-column key case.
-s
On Mon, Nov 7, 2011 at 12:48, Matthew Dowle wrote:
> Stavros Macrakis writes:
> >
> > data.table certainly has some us
On Sunday, 06 November 2011 at 19:00 -0500, Stavros Macrakis wrote:
> Milan, Jeff, Patrick,
>
> Thank you for your comments and suggestions.
>
> Milan,
>
> This is far from a "completely theoretical problem". I am performing
> text analytics on a corpus of about 2m documents. There
Stavros Macrakis writes:
>
> data.table certainly has some useful mechanisms, and I've been
> experimenting with it as an implementation mechanism, though it's not a
> drop-in substitute for factors. Also, though it is efficient for set
> operations between small sets and large set
Milan, Jeff, Patrick,
Thank you for your comments and suggestions.
Milan,
This is far from a "completely theoretical problem". I am performing text
analytics on a corpus of about 2m documents. There are tens of thousands
of distinct words (lemmata). It seems to me that the natural
representat
Perhaps 'data.table' would be a package
on CRAN that would be acceptable.
On 05/11/2011 16:45, Jeffrey Ryan wrote:
Or better still, extend R via the mechanisms in place. Something akin
to a fast factor package. Any change to R causes downstream issues in
(hundreds of?) millions of lines of dep
Or better still, extend R via the mechanisms in place. Something akin
to a fast factor package. Any change to R causes downstream issues in
(hundreds of?) millions of lines of deployed code.
It almost seems hard to fathom that a package for this doesn't already
exist. Have you searched CRAN?
Je
On Friday, 04 November 2011 at 19:19 -0400, Stavros Macrakis wrote:
> R factors are the natural way to represent factors -- and should be
> efficient since they use small integers. But in fact, for many (but
> not all) operations, R factors are considerably slower than integers,
> or even chara
R factors are the natural way to represent factors -- and should be
efficient since they use small integers. But in fact, for many (but
not all) operations, R factors are considerably slower than integers,
or even character strings. This appears to be because whenever a
factor vector is subsetted