Re: [R] data frame subset too slow

2011-01-12 Thread Duke
Sorry for the late response. I was away on vacation and was unable to keep working on the code. Anyway, I was unable to provide the *str* output for that specific data, since it is all in a big package with lots of inputs/outputs. Quickly glancing through the code, I narrowed them down (and made a ba

Re: [R] data frame subset too slow

2010-12-30 Thread jim holtman
If you want the data in the first column of the dataframe, then you should be using '[['. Notice what comes back in each of these cases:
> str(dat)
'data.frame': 8 obs. of 5 variables:
 $ sample.1.200..n..TRUE.: int 25 199 70 124 93 157 49 137 192 57 ...
 $ runif.n. : num 0.
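As a minimal sketch of the distinction Jim is pointing at, the following compares what each subsetting form returns on a small data frame. The data frame and its column names (V1, V2) are made up for illustration and are not the poster's data.

```r
# Hypothetical toy data frame; the column names V1 and V2 are assumptions.
dat <- data.frame(V1 = c(10L, 20L, 30L), V2 = c(0.1, 0.2, 0.3))

one_bracket  <- dat[1]     # still a data.frame, with a single column
two_bracket  <- dat[[1]]   # the column itself, a plain integer vector
matrix_style <- dat[, 1]   # also the vector, because drop = TRUE by default
dollar       <- dat$V1     # the vector again, selected by column name

str(one_bracket)           # shows a data.frame structure
str(two_bracket)           # shows an integer vector

# Only the single-bracket form keeps the data.frame wrapper.
print(is.data.frame(one_bracket))
print(identical(two_bracket, matrix_style))
print(identical(two_bracket, dollar))
```

The last three forms hand back the same vector, so for a membership test such as %in% they are interchangeable; the single-bracket form is the odd one out because it passes a whole data frame rather than the column values.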

Re: [R] data frame subset too slow

2010-12-30 Thread Duke
Actually, there are different ways of doing subsetting: [1], [[1]], [,1], and $V1. Please let me know which one is the fastest (and most used) one. Thanks. D. On 12/30/10 11:28 AM, Duke wrote: Hi Jim, Is this really a problem for me to use [1] instead of [[1]]? Will this make it run slower? Also, if

Re: [R] data frame subset too slow

2010-12-30 Thread Duke
Hi Jim, Is this really a problem for me to use [1] instead of [[1]]? Will this make it run slower? Also, if I use dat$V1 %in% list$V1, will it be fine? Anyway, my data and list are basically gene lists (tab delimited): $ head test.txt Xkr4chr1-3204562366157932061023661

Re: [R] data frame subset too slow

2010-12-30 Thread jim holtman
You should be using dat[[1]]. Here is an example with 8 rows that take about 0.02 seconds to get the subset. Provide an 'str' of what your data looks like.
> n <- 8 # rows to create
> dat <- data.frame(sample(1:200, n, TRUE), runif(n), runif(n), runif(n), runif(n))
> lst <- data.frame
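The snippet above is cut off, so what follows is not Jim's code but a small self-contained sketch of the same idea: build a data frame, build a lookup table, and time a subset that compares the first columns as vectors. The row count, the column names (id, x), and the lookup values are all assumptions chosen for illustration.

```r
set.seed(1)                                   # make the toy data reproducible
n <- 80000                                    # assumed row count, for illustration only
dat <- data.frame(id = sample(1:200, n, TRUE), x = runif(n))
lst <- data.frame(id = c(5L, 17L, 123L))      # hypothetical lookup table

timing <- system.time({
  keep <- dat[[1]] %in% lst[[1]]              # vector-to-vector membership test
  sub  <- dat[keep, ]                         # keep rows whose id appears in lst
})

print(timing)                                 # elapsed time depends on the machine
print(nrow(sub))
```

Here `keep` is a logical vector with one entry per row of `dat`, which is what row indexing expects. Writing `dat$id %in% lst$id` would build the same vector by name instead of by position.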

[R] data frame subset too slow

2010-12-30 Thread Duke
Hi all, First, I don't have much experience with R, so be gentle. OK, I am dealing with a dataset (~ tens of thousands of lines, each line ~ 10 columns of data). I have to create some subsets of this data based on certain conditions (for example, same first column as another dataset, etc.). H