Hi Harold,
Generally: you cannot beat data.table, unless you can represent your
data in a matrix (or array or vector). For some specific cases, Hervé's
suggestion might also be competitive.
Your problem is that you did not put any effort into reading at least
part of the very extensive documentation.
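For reference, a minimal sketch of the keyed data.table approach, using the
tmp/idList example from this thread (the setkey() step is an assumption; the
original messages never show it):

library(data.table)

tmp <- data.frame(id = rep(1:2, each = 10), foo = rnorm(20))
idList <- unique(tmp$id)

tmp2 <- as.data.table(tmp)
setkey(tmp2, id)   # sort once by id; later lookups use a binary search

system.time(replicate(500, tmp2[.(idList[1])]))   # keyed subset, no full scan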
On 09/28/2016 02:53 PM, Hervé Pagès wrote:
> Hi,
> I'm surprised nobody suggested split(). Splitting the data.frame
> upfront is faster than repeatedly subsetting it:
> tmp <- data.frame(id = rep(1:2, each = 10), foo = rnorm(20))
> idList <- unique(tmp$id)
> system.time(for (i in idList) tmp[which(tmp$id == i),])
"I'm surprised nobody suggested split(). "
I did.
by() is a data frame oriented version of tapply(), which uses split().
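A minimal illustration with the thread's toy data (a sketch, not code from
the original message):

tmp <- data.frame(id = rep(1:2, each = 10), foo = rnorm(20))

# by() splits tmp on id, then applies the function to each piece
by(tmp, tmp$id, function(d) mean(d$foo))

# the same computation via split() directly
sapply(split(tmp, tmp$id), function(d) mean(d$foo))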
Cheers,
Bert
Bert Gunter
"The trouble with having an open mind is that people keep coming along
and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip)
Hi,
I'm surprised nobody suggested split(). Splitting the data.frame
upfront is faster than repeatedly subsetting it:
tmp <- data.frame(id = rep(1:2, each = 10), foo = rnorm(20))
idList <- unique(tmp$id)
system.time(for (i in idList) tmp[which(tmp$id == i),])
# user system elapsed
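A sketch of the split() comparison the message describes (not Hervé's exact
code):

tmpList <- split(tmp, tmp$id)   # pay the grouping cost once
system.time(for (i in idList) tmpList[[as.character(i)]])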
I regularly crunch through this amount of data with tidyverse. You can also
try the data.table package. They are optimized for speed, as long as you
have the memory.
Dominik
On Wed, Sep 28, 2016 at 10:09 AM, Doran, Harold wrote:
> I have an extremely large data frame (~13 million rows) that resembles
> the structure of the object tmp below in the reproducible code.
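A sketch of the tidyverse version of the subset (dplyr's filter(); not code
from the original message):

library(dplyr)

tmp <- data.frame(id = rep(1:2, each = 10), foo = rnorm(20))
filter(tmp, id == 1)   # equivalent of tmp[tmp$id == 1, ]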
On Wed, 28 Sep 2016, "Doran, Harold" writes:
> I have an extremely large data frame (~13 million rows) that resembles
> the structure of the object tmp below in the reproducible code. In my
> real data, the variable, 'id' may or may not be ordered, but I think
> that is irrelevant.
>
> I have a process that requires subsetting the data by id and then
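A sketch reconstructing the reproducible code from the replies in this thread
(the definitions of tmp and idList appear verbatim in Hervé's message):

tmp <- data.frame(id = rep(1:2, each = 10), foo = rnorm(20))
idList <- unique(tmp$id)

# the indexing method the replies benchmark against
system.time(replicate(500, tmp[which(tmp$id == idList[1]), ]))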
compared to the indexing method.
Perhaps I'm using it incorrectly?
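A likely answer, sketched on the thread's tmp2 object (the setkey() step is
an assumption, not in the original messages): subsetting a data.table via
which() or subset() scans the whole id column on every call, forfeiting
data.table's main advantage, whereas a keyed lookup does a binary search:

setkey(tmp2, id)                                  # sort once by id
system.time(replicate(500, tmp2[.(idList[1])]))   # binary search per call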
-----Original Message-----
From: Constantin Weiser [mailto:constantin.wei...@hhu.de]
Sent: Wednesday, September 28, 2016 12:55 PM
To: r-help@r-project.org
Cc: Doran, Harold
Subject: Re: [R] Faster Subsetting
I just mod
tmp2 <- as.data.table(tmp) # data.table

system.time(replicate(500, tmp2[which(tmp$id == idList[1]),]))

system.time(replicate(500, subset(tmp2, id == idList[1])))

-----Original Message-----
From: Dominik Schneider [mailto:dosc3...@colorado.edu]
Sent: Wednesday, September 28, 2016 12:27 PM
To: Doran, Harold
Cc: r-help@r-project.org
Subject: Re: [R] Faster Subsetting
Hello,
If you work with a matrix instead of a data.frame, it usually runs
faster, but your column vectors must all be numeric.
### Fast, but not fast enough
system.time(replicate(500, tmp[which(tmp$id == idList[1]),]))
   user  system elapsed
   0.05    0.00    0.04
### Not fast at all, a
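A sketch of the matrix version being recommended (both columns of tmp are
numeric here, so as.matrix() yields a numeric matrix; this is not the
author's exact code):

tmpMat <- as.matrix(tmp)   # numeric matrix with columns "id" and "foo"
system.time(replicate(500, tmpMat[tmpMat[, "id"] == idList[1], ]))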