[R] data frame subset too slow

Duke Thu, 30 Dec 2010 07:24:49 -0800

Hi all,

First, I don't have much experience with R, so be gentle. OK, I am dealing with a dataset (tens of thousands of lines, each line with about 10 columns of data). I have to create some subsets of this data based on certain conditions (for example, rows whose first column matches the first column of another dataset, etc.). Here is how I did it:


# import data
dat <- read.table( "test.txt", header=TRUE, fill=TRUE, sep="\t" )
list <- read.table( "list.txt", header=TRUE, fill=TRUE, sep="\t" )
# create sub data
subdat <- dat[dat[1] %in% list[1],]

So the third line creates a new data frame containing the rows whose first-column value appears in both dat and list. There is no problem with the code, as it runs just fine with small test data. When I tried it with my real data (~80k lines, ~15 MB), it takes forever (a few hours). I don't know why it takes that long, but I think it shouldn't. I think even with a for loop in C++, I could get this done in, say, a few minutes.


Does anyone have any ideas, advice, or suggestions?

Thanks so much in advance and Happy New Year to all of you.

D.
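
For reference, here is a minimal sketch of the subsetting step described in the message, assuming the intent is to keep the rows of dat whose first-column value also occurs in the first column of the second table. It relies on one fact about R indexing: dat[1] returns a one-column data frame, whereas dat[[1]] returns the column itself as a vector, and %in% compares element by element only when given vectors. The sample values and the name ids (used in place of list, to avoid masking the base function list) are made up for illustration and do not come from the original files.

```r
# Made-up stand-ins for test.txt and list.txt (hypothetical values).
dat <- data.frame(id = c("a", "b", "c", "d"),
                  value = c(1, 2, 3, 4),
                  stringsAsFactors = FALSE)
ids <- data.frame(id = c("b", "d", "x"),
                  stringsAsFactors = FALSE)

# dat[[1]] and ids[[1]] are plain vectors, so %in% yields one
# TRUE/FALSE per row of dat.
keep <- dat[[1]] %in% ids[[1]]

# Keep only the matching rows, with all columns.
subdat <- dat[keep, ]
```

With vectors on both sides, the membership test is a single vectorised lookup, so it should not need an explicit loop even for tens of thousands of rows.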

