Answer to my own question:
library(data.table)

ush <- data.table(read.csv(...))
setkey(ush, product_id)            # key the table on the lookup column
s1 <- ush[J(product.id)]           # keyed (binary search) lookup
> user system elapsed
> 0.000 0.000 0.003
>
It seems like that's the method to use! Amazing.
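For what it's worth, the keyed lookup also handles several ids in one subset. A minimal sketch (the ids in 'wanted' are made-up values):

# one keyed (binary-search) subset for several product ids;
# nomatch = 0L drops ids that are not present in the table
wanted <- c("B00123", "B00456", "B00789")
s_many <- ush[J(wanted), nomatch = 0L]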
Update from email outside of this thread:
Justin Haynes writes:
> matrices will help, but two quick solutions:
>
> if you are looking for single items to go in the some_value space, use ==
> instead of %in% and you'll notice speedups. The second, more involved
> option is to take a look at the
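To make that first suggestion concrete, here is a quick sketch on a made-up data frame (column names, sizes, and the lookup value are invented), comparing %in% with == for a single lookup value:

set.seed(1)
df <- data.frame(product_id = sample(1e5, 2.5e6, replace = TRUE),
                 value      = rnorm(2.5e6))

target <- 12345
system.time(a <- df[df$product_id %in% target, ])  # match()-based membership test
system.time(b <- df[df$product_id == target, ])    # plain vectorized comparison
identical(a, b)                                    # same rows either way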
So, here is the resulting time from using the data.table package:
> user system elapsed
> 0.800 0.012 1.847
>
Here are the methods that I am using:
ush <- data.table(read.csv(...))
setkey(ush, product_id)
s1 <- subset(ush, product_id == product.id)
Seems like a minor improvement but not
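If it helps anyone reading later: my understanding is that subset() still scans the whole column even when the table is keyed, which would explain the modest gain here. Only the data.table bracket syntax with J() (the form in the answer above) uses the key as a binary search. A two-line sketch of the contrast:

s_scan <- subset(ush, product_id == product.id)  # full vector scan, key not used
s_key  <- ush[J(product.id)]                     # keyed binary-search lookup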
Wow, these specs are fantastic:
> user system elapsed
> 0.33 0.00 0.39
>
I wonder how much of that is because of the capacity of the box that you are
running R on. Can you post pertinent specs? This suggests to me that
hardware upgrades (RAM specifically) may also be in order.
take a look at using the 'data.table' package. Here are some times to
do the lookup using data frames, matrices and data.tables: data.tables
give the answer in less than 0.1 seconds.
> str(x.df)
'data.frame': 250 obs. of 4 variables:
$ x : Factor w/ 455063 levels "","AAAB",..: 200683
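A rough, self-contained sketch of that comparison on simulated data (row counts, column names, and id format are all invented), timing the same single-value lookup against a data frame, a character matrix, and a keyed data.table:

library(data.table)
set.seed(1)
n   <- 2.5e6
ids <- sprintf("ID%07d", sample(5e5, n, replace = TRUE))

x.df  <- data.frame(x = ids, y = rnorm(n), stringsAsFactors = FALSE)
x.mat <- cbind(x = ids)                        # one-column character matrix
x.dt  <- data.table(x = ids, y = x.df$y)
setkey(x.dt, x)                                # sort + key for binary search

target <- ids[1]
system.time(r1 <- x.df[x.df$x == target, ])                        # data frame scan
system.time(r2 <- x.mat[x.mat[, "x"] == target, , drop = FALSE])   # matrix scan
system.time(r3 <- x.dt[J(target)])                                 # keyed lookup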
Hey All,
So - I promise to write a blog post on this topic and post it somewhere on
the internet once I get to the bottom of this. Basically, the set-up to the
problem is like this:
1. I have a data frame with dim (2547290, 4)
2. I need to make SQL-like lookups on the dataframe. I have been u
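For context, an SQL-style query like SELECT * FROM ush WHERE product_id = 12345 translates to a whole-column scan on a plain data frame. A minimal sketch with a hypothetical id value:

# every lookup rescans all 2,547,290 rows, which is why repeated queries get slow
hit <- ush[ush$product_id == 12345, ]      # or: subset(ush, product_id == 12345)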