Is this what you are after?

> pos
   V1   V2 V3 V4 V5 V6
1 c22 1445  - CG  1  4
2 c22 1542  + CG  2  3
3 c22 1678  + CG 13 15
> reg
   V1   V2   V3   V4 V5 V6     V7
1 c22 1440 1500 cpg: 44 56 ......
2 c22 1520 1700 cpg: 56 87 ......
3 c22 1800 1900 cpg: 58 90 ......
> # iterate through 'reg', printing out the matching 'pos' entries
> result <- lapply(seq(nrow(reg)), function(i){
+     # get indices of match
+     indx <- (pos$V2 >= reg$V2[i]) & (pos$V2 <= reg$V3[i])
+     if (!any(indx)) return(NULL)  # no match
+     # create new dataframe
+     cbind(reg[rep(i, sum(indx)), 1:3], pos[indx, ])
+ })
> do.call(rbind, result)
     V1   V2   V3  V1   V2 V3 V4 V5 V6
1   c22 1440 1500 c22 1445  - CG  1  4
2   c22 1520 1700 c22 1542  + CG  2  3
2.1 c22 1520 1700 c22 1678  + CG 13 15
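A note on why the attempt in the question, if ((pos >= start) | (pos <= end)), does not work: comparing two whole columns of different lengths is what triggers the length complaint, if() expects a single TRUE/FALSE rather than a vector, and "|" (or) is true for almost any position, whereas "&" (and) is what "within start and end" needs. Below is a minimal self-contained sketch of the same idea, going position by position instead of region by region. The column names (chr, position, strand, context, n1, n2, start, end), the helper find_region, and the choice of which pos columns to print are my own assumptions for illustration, not something stated in the question.

```r
# Toy data copied from the example in this thread. In practice these would
# come from read.table() on pos.txt and reg.txt. Column names are assumed
# labels, chosen only for readability.
pos <- data.frame(chr = "c22", position = c(1445, 1542, 1678),
                  strand = c("-", "+", "+"), context = "CG",
                  n1 = c(1, 2, 13), n2 = c(4, 3, 15),
                  stringsAsFactors = FALSE)
reg <- data.frame(chr = "c22", start = c(1440, 1520, 1800),
                  end = c(1500, 1700, 1900), stringsAsFactors = FALSE)

# Hypothetical helper: compare ONE position (length 1) against ALL regions.
# The single value is recycled across the region vectors, so the lengths
# never clash. Note '&' (both conditions must hold), not '|'.
find_region <- function(i) {
  hit <- which(reg$chr == pos$chr[i] &
               reg$start <= pos$position[i] &
               pos$position[i] <= reg$end)
  if (length(hit) == 0) return(NULL)  # position falls in no region
  cbind(reg[hit, c("chr", "start", "end")],
        pos[rep(i, length(hit)), c("position", "strand", "n1", "n2")])
}

out <- do.call(rbind, lapply(seq_len(nrow(pos)), find_region))
rownames(out) <- NULL

# Print one match per line without quotes or headers; pass file = "some/path"
# to write.table() to send the same lines to a text file instead.
write.table(out, quote = FALSE, row.names = FALSE, col.names = FALSE)
```

For the toy data this should give three lines: position 1445 in region 1440-1500, and positions 1542 and 1678 both in region 1520-1700. With thousands of rows the loop is still fine; it does the same work as the nested loops in Java or Perl, just with the inner loop vectorized.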
On Mon, May 23, 2011 at 12:00 AM, ajn21 <aj...@case.edu> wrote:
> Hello,
>
> I was hoping that someone would be able to help me or at least point me in
> the right direction regarding a problem I am having. I am a new R user, and
> I've been trying to read tutorials but they haven't been much help to me so
> far.
>
> The problem is relatively simple as I've already created working solutions
> in Java and Perl, but I need a solution in R as well.
>
> I have two text files, say pos.txt and reg.txt. In pos.txt, the data is
> listed for example:
>
> c22 1445 - CG 1 4
> c22 1542 + CG 2 3
> c22 1678 + CG 13 15
> ...
>
> etc. for thousands of lines. The most important column is column 2, which
> lists "position" (e.g. 1445, 1542, 1678). In reg.txt, data is listed as:
>
> c22 1440 1500 cpg: 44 56 ......
> c22 1520 1700 cpg: 56 87 ......
> c22 1800 1900 cpg: 58 90 ......
> ...
>
> where the values in column 2 is the "start" position and values in column 3
> are the "end" position. There are 10 columns total but I just listed the
> first few. Also, the text files are different lengths.
>
>
> Essentially, my problem is trying to take the position listed in column 2 of
> pos.txt and try to find the region (based on start and end positions) listed
> in reg.txt. Then I need to print:
>
> c22 "start" "end" "position" + 1 5
>
> where the last 3 columns are from pos.txt as well (i.e. all of the lines
> don't end in + 1 5, but rather the values for the columns in pos.txt).
> Also, the position needs to be within the start and end position.
>
> So far I've been able to use read.table to create a data frame for each text
> file, and I've also named each column (e.g. reg.data$end) and I can output
> each column individually. However, the problem I keep facing is how to
> compare the numbers for "position" in pos.txt to the numbers for "start" and
> "end" in reg.txt. I tried to use:
>
> if ((pos >= start) | (pos <= end))..
>
> but an error comes up that says the files aren't the same length.
>
> In Java and Perl I used nested loops to cycle through each element in one
> file, and compare it to every element in the other file, and then printed to
> a new text file. As such, I was trying to learn a bit more about arrays in
> R, but if you know of a better way in R to do this then please let me know.
>
> Any help is greatly appreciated.
>
> Thank you,
> AJ
>
> --
> View this message in context:
> http://r.789695.n4.nabble.com/Help-with-isolating-and-comparing-data-from-two-files-tp3543170p3543170.html
> Sent from the R help mailing list archive at Nabble.com.
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

--
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?