Perhaps, if R FAQ 7.31 isn't a problem, this would work:

(df.1$AffyIds)[match(df.2$rMF, df.1$rMF)]

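To make the one-liner above concrete, here is a minimal sketch on made-up data. The rows, the extra ids and the 6-decimal rounding are assumptions for illustration only; only the column names and the value 0.3393211 / 1638273_at come from the thread. match() returns, for each df.2$rMF, the position of its first exact match in df.1$rMF (NA if there is none), so the row order of the two data frames does not need to line up.

```r
# Toy stand-ins for the real data; only the column names follow the thread.
df.1 <- data.frame(
  AffyIds = c("1638273_at", "1638274_at", "1638275_at", "1638276_at"),
  rMF     = c(0.3393211, 0.1250000, 0.9876543, 0.5000000),
  stringsAsFactors = FALSE
)

# df.2 keeps only some rows of df.1 (rows 1, 3 and 4 here).
df.2 <- df.1[c(1, 3, 4), "rMF", drop = FALSE]

# Exact lookup: position of each df.2$rMF value within df.1$rMF.
idx <- match(df.2$rMF, df.1$rMF)
df.2$Affy <- df.1$AffyIds[idx]

# Tolerant lookup, in case the values were recomputed or re-read from
# text and no longer compare exactly equal (R FAQ 7.31). Rounding to
# 6 decimal places is an arbitrary choice for this sketch.
idx.tol <- match(round(df.2$rMF, 6), round(df.1$rMF, 6))
df.2$Affy.tol <- df.1$AffyIds[idx.tol]

df.2
```

If df.2$rMF was copied unchanged from df.1, the exact lookup should be enough; the rounded variant is only a fallback and can itself mismatch when two distinct values round to the same number.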
Michael

On Wed, Nov 16, 2011 at 1:11 PM, Rob Griffin <robgriffin...@hotmail.com> wrote:
> As another potential route could I put something in to the original code
> that makes df.2 (maindata2) which picks one of the AffyIds at random for the
> duplicated FlybaseCG values (shown below)
>
> maindata2<-aggregate(maindata[,c(161,172,168,255,254,258,264,265,263,271)],
> by = maindata[,167, drop = F], mean)
>
> Rob
>
> -----Original Message----- From: Rob Griffin
> Sent: Wednesday, November 16, 2011 4:35 PM
> To: Dennis Murphy
> Cc: r-help@r-project.org
> Subject: Re: [R] create list of names where two df contain == values
>
> Ok, thanks for looking in to this so far, I seem to have confused you all a
> little though so I think I need to make this a bit clearer:
>
> in the real situation:
> df.1 is 271*13891, and contains (amongst others) columns with Flybase.CG,
> rMF, and Affyid values.
> df.2 is 14*12572 and is made from subset of df.1 which removed rows with
> duplicated Flybase.CG values, and df.2 also includes the rMF column
> because df.2 is made from the non-duplicated values it is shorter.
>
> I now need to put the Affyid column from df.1 in to df.2 -
>
> My idea is:
> to match a value on each row that is unique to that row (within column) but
> shared on both datasets - rMF contains such numbers
> then get R to copy the corresponding Affyid value (an alphanumeric id) from
> df.1 and place it in df.2$Affy (or at least in to a list which I could then
> put in to a column) with all "shared" rMF values and ignore all others
>
> for example df.1 and df.2 both contain the rMF value 0.3393211 which
> corresponds to the same data point which in df.1 has this Affyid: 1638273_at
>
> if you imagine the two rMF columns lined up next to each other they start
> the same and run in the same order, but df.2's has had "random" points
> removed as was the aim of making df.2, so as soon as you get to that point
> the rest of the list doesn't line up.
> What R needs to do is go down the df.2 rMF list one by one, and for each
> df.2 rMF check the entire df.1 rMF list for a match, then take the
> corresponding Affyid.
>
> for example df.1 and df.2 both contain the rMF value 0.3393211
> which corresponds to the same sample point which in df.1 has this
> Affyid: 1638273_at but they occur on different rows in the data frame.
>
> is that a bit clearer? I know this is pretty complex.
>
> David, your idea with ifelse worked for the first few lines then as soon as
> it got to a point where one of the Flybase.CG values had been removed during
> the process of making df.2 it got out of line between the data frames and
> just gave NA after there.
>
> Rob
>
> -----Original Message----- From: Dennis Murphy
> Sent: Wednesday, November 16, 2011 4:03 PM
> To: Rob Griffin
> Cc: r-help@r-project.org
> Subject: Re: [R] create list of names where two df contain == values
>
> Hi:
>
> I think you're overthinking this problem. As is usually the case in R,
> a vectorized solution is clearer and provides more easily understood
> code.
>
> It's not obvious to me exactly what you want, so we'll try a couple of
> variations on the same idea. Equality of floating point numbers is a
> difficult computational problem (see R FAQ 7.31), but if it makes
> sense to define a threshold difference between floating numbers that
> practically equates to zero, then you're in business. In your example,
> the difference in numb1 for letter h in the two data frames is far
> from zero, so define 'equal' to be a difference < 10^{-6}.
> Then:
>
> # Return the entire matching data frame
> df.1[abs(df.1$numb1 - df.2$numb1) < 0.000001, ]
>    Letters     numb1 extra.col    id
> 1        a 0.3735462         1 CG234
> 2        b 1.1836433         2 CG232
> 3        c 0.1643714         3 CG441
> 4        d 2.5952808         4 CG128
> 5        e 1.3295078         5 CG125
> 6        f 0.1795316         6 CG182
> 7        g 1.4874291         7 CG982
> 9        i 1.5757814         9 CG282
> 10       j 0.6946116        10 CG154
>
> # Return the matching letters only as a vector:
> df.1[abs(df.1$numb1 - df.2$numb1) < 0.000001, 'Letters' ]
>
> If you want the latter object to remain a data frame, use drop = FALSE
> as an extra argument after 'Letters'. If you want to create a list
> object such that each letter comprises a different list component,
> then the following will do - the as.character() part coerces the
> factor Letters into a character object:
>
> as.list(as.character(df.1[abs(df.1$numb1 - df.2$numb1) < 0.000001,
> 'Letters' ]))
>
> HTH,
> Dennis
>
>
> On Wed, Nov 16, 2011 at 5:03 AM, Rob Griffin <robgriffin...@hotmail.com>
> wrote:
>>
>> Hello again... sorry to be posting yet again, but I hadn't anticipated this
>> problem.
>>
>> I am trying to now put the names found in one column in data frame 1 (lets
>> call it df.1[,1]) in to a list from the rows where the values in df.1[,2]
>> match values in a column of another dataframe (df.2[3])
>> I tried to write this function so that it put the list of names (called
>> Iffy) where the 2 criteria (df.1[141] and df.2[21]) matched but I think its
>> too complex for a beginner R-enthusiast
>>
>> ify<-function(x,y,a,b,c) if(x[[,a]]==y[[,b]]) {list(x[[,c]])} else {NULL}
>> Iffy<-apply( df.1, 1, FUN=ify, x=df.1, y=df.2, a=2, b=3, c=1 )
>>
>> But this didn't work... Error in FUN(newX[, i], ...) : unused argument(s)
>> (newX[, i])
>>
>>
>> Here is a dataset that replicates the problem, you'll notice the "h"
>> criteria values are different between the two dataframes and therefore it
>> would produce a list of the 9 letters where the two criteria columns
>> matched (a,b,c,d,e,f,g,i,j):
>>
>>
>> df.1<-data.frame(rep(letters[1:10]))
>> colnames(df.1)[1]<-("Letters")
>> set.seed(1)
>> df.1$numb1<-rnorm(10,1,1)
>> df.1$extra.col<-c(1,2,3,4,5,6,7,8,9,10)
>>
>> df.1$id<-c("CG234","CG232","CG441","CG128","CG125","CG182","CG982","CG541","CG282","CG154")
>> df.1
>>
>> df.2<-data.frame(rep(letters[1:10]))
>> colnames(df.2)[1]<-("Letters")
>> set.seed(1)
>> df.2$extra.col<-c(1,2,3,4,5,6,7,8,9,10)
>> df.2$numb1<-rnorm(10,1,1)
>>
>> df.2$id<-c("CG234","CG232","CG441","CG128","CG125","CG182","CG982","CG541","CG282","CG154")
>> df.2[8,3]<-12
>>
>> df.1
>> df.2
>>
>>
>> Your patience is much appreciated,
>> Rob
>>
>> ______________________________________________
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
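
On the alternative route raised in the quoted thread (keeping one AffyId per duplicated Flybase.CG value while aggregating), a minimal sketch on made-up data might look like the following. The column names and values are hypothetical stand-ins for maindata, and the sketch keeps the first id per group rather than a random one, simply because that is easier to reproduce.

```r
# Toy stand-in for maindata; names and values are hypothetical.
maindata <- data.frame(
  Flybase.CG = c("CG234", "CG234", "CG441", "CG128", "CG128"),
  Affyid     = c("a_at", "b_at", "c_at", "d_at", "e_at"),
  rMF        = c(1, 3, 5, 2, 4),
  stringsAsFactors = FALSE
)

# Mean of the numeric column(s) per Flybase.CG, as in the aggregate() call.
means <- aggregate(maindata[, "rMF", drop = FALSE],
                   by = maindata[, "Flybase.CG", drop = FALSE], mean)

# One Affyid per Flybase.CG: the first one encountered in maindata.
first.id <- maindata[!duplicated(maindata$Flybase.CG),
                     c("Flybase.CG", "Affyid")]

# Attach the chosen id to the aggregated rows by the shared key.
maindata2 <- merge(means, first.id, by = "Flybase.CG")
maindata2
```

Joining on Flybase.CG avoids comparing floating point values at all, which may matter here because an averaged rMF need not equal the rMF of any single row in the original data.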