See data.table:::duplist, which does that (or something very similar) in C,
for multiple columns too.
Matthew
http://datatable.r-forge.r-project.org/
"peter dalgaard" wrote in message
news:660991c3-b52b-4d58-b819-eadc95ecc...@gmail.com...
> On Sep 21, 2010, at 16:27, Joshua Wiley wrote:
On Sep 21, 2010, at 16:27, Joshua Wiley wrote:
> On Tue, Sep 21, 2010 at 3:09 AM, Matthew Dowle wrote:
>>
>> All the solutions in this thread so far use the lapply(split(...)) paradigm
>> either directly or indirectly. That paradigm doesn't scale. That's the likely
>> source of quite a
Probably true, that's cunning, but look at base::match. The
first thing it does is coerce factor to character (which needs an
allocation and a copy internally). data.table doesn't do that
either; see data.table:::sortedmatch.
I have made some first basic steps towards a proper reproducible test
suite (timings.Rnw).
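Here is a small base R check of the point about base::match(). Matching two factors gives the same positions as matching their character labels, which is consistent with match() converting factors to character first. If both factors are known to share identical levels (an assumption the caller has to verify, as the sketch does), matching the integer codes gives the same answer without creating character copies. This is only an illustration. It makes no claim about how data.table:::sortedmatch works internally, and the object names are hypothetical.

```r
# Assumption for this sketch: both factors have identical levels in the same order.
lev <- c("a", "b", "c")
x <- factor(c("c", "a", "b", "c"), levels = lev)
table_f <- factor(c("b", "c", "a"), levels = lev)

# match() on two factors compares their labels.
m_factor <- match(x, table_f)

# The same comparison done explicitly on character copies of the labels.
m_char <- match(as.character(x), as.character(table_f))

# Only valid because the levels are identical: the integer codes then identify
# the same values, so no character vectors need to be created.
stopifnot(identical(levels(x), levels(table_f)))
m_codes <- match(as.integer(x), as.integer(table_f))

m_factor  # 2 3 1 2
```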
On Tue, Sep 21, 2010 at 3:09 AM, Matthew Dowle wrote:
>
> All the solutions in this thread so far use the lapply(split(...)) paradigm
> either directly or indirectly. That paradigm doesn't scale. That's the likely
> source of quite a few 'out of memory' errors and performance issues in R.
Thi
All the solutions in this thread so far use the lapply(split(...)) paradigm
either directly or indirectly. That paradigm doesn't scale. That's the likely
source of quite a few 'out of memory' errors and performance issues in R.
data.table doesn't do that internally, and its syntax is pretty eas
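One way to pick the first rows of each group without building a data frame per group is to work purely with row indices. The sketch below is an illustration in base R, not data.table's internal algorithm, and the helper name first_n_per_group is hypothetical. It sorts integer group codes with a stable order(), numbers the rows within each run of equal codes, and then subsets the original data frame once.

```r
# Hypothetical helper: first n rows of each group, without split()-ting
# the data frame into one piece per group.
first_n_per_group <- function(df, g, n = 5L) {
  codes <- as.integer(factor(g))      # integer group codes
  o <- order(codes)                   # stable: ties keep original row order
  sorted <- codes[o]
  # Position within group: distance from the first row with the same code,
  # valid because equal codes are contiguous after sorting.
  within <- seq_along(sorted) - match(sorted, sorted) + 1L
  keep <- sort(o[within <= n])        # back to original row order
  df[keep, , drop = FALSE]
}

set.seed(1)  # simulated data standing in for the thread's tmp
tmp <- data.frame(index = gl(3, 20), foo = rnorm(60))
res <- first_n_per_group(tmp, tmp$index, 5)
```

Apart from the final subset, this creates only a few integer vectors of length nrow(df). By contrast, lapply(split(df, g), ...) builds a separate data frame for every group before rbind() copies them again.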
On 09/20/2010 08:01 PM, David Winsemius wrote:
>      index        foo
> 1.6      1 -3.0267759
> 1.7      1 -1.3725536
> 1.19     1 -1.1476048
> 1.16     1 -1.0963967
> 1.2      1 -1.0684793
> 2.29     2 -1.6601486
> 2.21     2 -1.2633632
> 2.22     2 -0.9875626
> 2.38     2 -0.9515301
> 2.30
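The listing quoted above appears to be tmp sorted by index and then by foo within each index, with the first rows of each group kept. Row names such as 1.6 are what rbind() produces from a named list of data frames: the group name, a dot, then the original row name. Here is a sketch of the sorting step with order() on two keys, using simulated data in place of the thread's tmp:

```r
set.seed(1)  # simulated data standing in for the thread's tmp
tmp <- data.frame(index = gl(3, 20), foo = rnorm(60))

# Sort by index first, then by foo (ascending) within each index.
tmp_sorted <- tmp[order(tmp$index, tmp$foo), ]

# The first five rows of each group are now its five smallest foo values.
top5 <- do.call(rbind, lapply(split(tmp_sorted, tmp_sorted$index), head, 5))
top5
```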
On Mon, Sep 20, 2010 at 11:15 AM, David Winsemius
wrote:
>
> On Sep 20, 2010, at 2:01 PM, David Winsemius wrote:
>
>>
>> On Sep 20, 2010, at 1:40 PM, Joshua Wiley wrote:
>>
>>> On Mon, Sep 20, 2010 at 10:27 AM, Phil Spector
>>> wrote:
Harold -
Two ways that come to mind:
Richard Tan asked a very similar question last week
('get top n rows group by a column from a dataframe').
You could use ave() to make a sequence-number-within-group
vector and choose rows with a small enough value there:
tmp[ave(integer(nrow(tmp)), tmp$index, FUN=seq_along)<=N, ]
If there are f
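Because ave() returns its result in the original row order, the ave() approach does not require tmp to be sorted by index. It keeps the first N rows of each group in their current order. Here is a small check on deliberately interleaved data (d and N are made-up values for illustration):

```r
# Made-up data: index values are interleaved, not sorted.
d <- data.frame(index = c(2, 1, 2, 1, 2, 1, 2),
                foo   = c(10, 20, 30, 40, 50, 60, 70))
N <- 2

# Running count within each index value, aligned with the rows of d.
pos <- ave(integer(nrow(d)), d$index, FUN = seq_along)
print(pos)  # 1 1 2 2 3 3 4

d[pos <= N, ]
```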
On Sep 20, 2010, at 2:01 PM, David Winsemius wrote:
On Sep 20, 2010, at 1:40 PM, Joshua Wiley wrote:
On Mon, Sep 20, 2010 at 10:27 AM, Phil Spector
wrote:
Harold -
Two ways that come to mind:
1) do.call(rbind,lapply(split(tmp,tmp$index),function(x)x[1:5,]))
2) subset(tmp,unlist(tapply(fo
On Sep 20, 2010, at 1:40 PM, Joshua Wiley wrote:
On Mon, Sep 20, 2010 at 10:27 AM, Phil Spector
wrote:
Harold -
Two ways that come to mind:
1) do.call(rbind,lapply(split(tmp,tmp$index),function(x)x[1:5,]))
2) subset(tmp,unlist(tapply(foo,index,seq))<=5)
3) do.call(rbind, by(tmp, tmp$index
On 09/20/2010 07:16 PM, Doran, Harold wrote:
> tmp1 <- tmp1[1:5,]
> tmp2 <- tmp2[1:5,]
> result <- rbind(tmp1, tmp2)
>
> Does anyone see a way to subset and subsequently bind without a loop?
>
> do.call(rbind,lapply(split(tmp,tmp$index),head,5))
     index        foo
1.11     1 -1.5124909
1.10
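The x[1:5, ] form suggested elsewhere in this thread and head(x, 5) behave differently when a group has fewer than five rows. Here is a made-up example where one group has only three rows:

```r
# Made-up data where group "2" has only 3 rows.
tmp <- data.frame(index = factor(c(rep(1, 7), rep(2, 3))), foo = 1:10)

by_index <- split(tmp, tmp$index)

# x[1:5, ] pads a short group with all-NA rows ...
padded <- do.call(rbind, lapply(by_index, function(x) x[1:5, ]))
# ... while head(x, 5) simply returns all rows of a short group.
trimmed <- do.call(rbind, lapply(by_index, head, 5))

nrow(padded)   # 10, including 2 all-NA rows
nrow(trimmed)  # 8
```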
On Mon, Sep 20, 2010 at 10:27 AM, Phil Spector
wrote:
> Harold -
> Two ways that come to mind:
>
> 1) do.call(rbind,lapply(split(tmp,tmp$index),function(x)x[1:5,]))
> 2) subset(tmp,unlist(tapply(foo,index,seq))<=5)
3) do.call(rbind, by(tmp, tmp$index, .Primitive("["), 1:5, 1:2))
Josh
>
>
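The by() version works because by() passes any extra arguments on to FUN. With FUN = .Primitive("["), each group's data frame x is indexed as x[1:5, 1:2] (the first five rows, both columns). The primitive `[` still dispatches to the data frame method. Here is a sketch that checks this against an explicit function, on simulated data:

```r
set.seed(1)  # simulated data standing in for the thread's tmp
tmp <- data.frame(index = gl(3, 20), foo = rnorm(60))

# by() calls FUN(subset, ...) for each group, so 1:5 and 1:2 become
# the row and column indices: x[1:5, 1:2].
r1 <- do.call(rbind, by(tmp, tmp$index, .Primitive("["), 1:5, 1:2))

# The same selection written with an explicit anonymous function.
r2 <- do.call(rbind, by(tmp, tmp$index, function(x) x[1:5, 1:2]))
```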
Hi Harold,
I thought of one way to do this, but maybe (probably) there is a faster way:
tmp <- data.frame(index = gl(3,20), foo = rnorm(60))
subset.first.x.elements <- function(INDEX, num.of.elements = 5)
{
t.INDEX <- table(factor(INDEX, levels = unique(INDEX)))
running.indexes <- unlist(sappl
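The truncated function above starts from per-group counts built with table(). The rest of it is not shown, so the following is not a reconstruction of it. It is a separate sketch of how counts like these can be turned into a within-group running index with sequence(). It assumes that each group's rows are contiguous, as they are for gl(3, 20). The names counts, running and first5 are hypothetical.

```r
set.seed(1)  # simulated data; groups are contiguous by construction
tmp <- data.frame(index = gl(3, 20), foo = rnorm(60))

# Group sizes in order of first appearance, as in the table() line above.
counts <- table(factor(tmp$index, levels = unique(tmp$index)))

# sequence() expands counts like c(20, 20, 20) into 1..20, 1..20, 1..20.
# This lines up with the rows only because each group's rows are contiguous.
running <- sequence(as.vector(counts))

first5 <- tmp[running <= 5, ]
```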
Very nice, Phil. Thank you.
-----Original Message-----
From: Phil Spector [mailto:spec...@stat.berkeley.edu]
Sent: Monday, September 20, 2010 1:28 PM
To: Doran, Harold
Cc: R-help
Subject: Re: [R] Sorting and subsetting
Harold -
Two ways that come to mind:
1) do.call(rbind,lapply(split(tmp
Harold -
Two ways that come to mind:
1) do.call(rbind,lapply(split(tmp,tmp$index),function(x)x[1:5,]))
2) subset(tmp,unlist(tapply(foo,index,seq))<=5)
- Phil Spector
Statistical Computing Facility
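The second suggestion relies on tmp already being sorted by index, because unlist(tapply(...)) returns the within-group positions group by group rather than in row order. It also uses seq(), which for a group holding a single numeric value x is interpreted as 1:x rather than returning 1. Here is a sketch of a variant that avoids both assumptions. It uses seq_along() and puts the positions back in row order with unsplit(), on made-up interleaved data:

```r
# Made-up data: index values are interleaved and group 3 has a single row.
d <- data.frame(index = c(2, 1, 2, 1, 2, 3),
                foo   = c(0.5, -1.2, 2.0, 0.3, -0.7, 4.2))

# Position of each row within its index group, returned in the original row order.
pos <- unsplit(lapply(split(d$foo, d$index), seq_along), d$index)

kept <- subset(d, pos <= 2)

# Why seq_along() rather than seq(): for a single numeric value x,
# seq(x) is interpreted as 1:x.
length(seq(4.2))        # 4
length(seq_along(4.2))  # 1
```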