Hi all,
First, I don't have much experience with R, so be gentle. OK, I am dealing
with a dataset (tens of thousands of lines, each line ~10 columns of
data). I have to create subsets of this data based on certain
conditions (for example, rows whose first column also appears in another
dataset, etc.).
Here is how I did it:
# import data
dat <- read.table( "test.txt", header=TRUE, fill=TRUE, sep="\t" )
list <- read.table( "list.txt", header=TRUE, fill=TRUE, sep="\t" )
# create sub data
subdat <- dat[dat[1] %in% list[1],]
So the third line of code is meant to create a new data frame containing
the rows of dat whose first-column value also appears in the first column
of list. The code seems fine to me, as it runs without problems on small
test data. When I tried it with my real data (~80k lines, ~15 MB), it
took forever (a few hours). I don't know why it takes that long, but I
think it shouldn't. Even with a for loop in C++, I think I could get this
done in, say, a few minutes.
Does anyone have any ideas, advice, or suggestions?
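To make the intent concrete, here is a minimal sketch on made-up data of the row-by-row comparison I am after. It assumes the first columns should be matched value by value, so it extracts them as plain vectors with `[[1]]` instead of as one-column data frames with `[1]`. The toy data frames, the column names, and the name `lst` (used instead of `list` to avoid masking the base function) are illustrative only and are not from my real files. I am not certain this difference is what causes the slowdown.

```r
# Toy stand-ins for test.txt and list.txt (hypothetical data)
dat <- data.frame(id = c("a", "b", "c", "d"),
                  value = c(1, 2, 3, 4),
                  stringsAsFactors = FALSE)
lst <- data.frame(id = c("b", "d", "e"),
                  stringsAsFactors = FALSE)

# dat[1] is a one-column data frame, i.e. a list of length 1, so
# dat[1] %in% lst[1] gives a single logical value, not one per row.
n_flags <- length(dat[1] %in% lst[1])

# dat[[1]] is the column itself as a vector, so %in% is evaluated
# once per row and can be used directly as a row index.
keep <- dat[[1]] %in% lst[[1]]
subdat <- dat[keep, ]
print(subdat)
```

With real files, `dat` and `lst` would come from the same `read.table()` calls as above; only the indexing in the subsetting line changes.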
Thanks so much in advance and Happy New Year to all of you.
D.
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help