On Jun 3, 2012, at 3:22 PM, Kai Mx wrote:

Hi all,
probably really simple to solve, but having no background in programming I
haven't been able to figure this out: I have two dataframes like

df1 <- data.frame(names1=c('aa','ab', 'ac', 'ad'), var1=c(1,5,7,12))
df2 <- data.frame(names2=c('aa', 'ab', 'ac', 'ad', 'ae'),
var2=c(3,6,9,12,15))

Now I want to merge var1 to df2 by matching the dataframes by the 'names'
columns, i.e. something like

df3 <- merge (df2, df1, by.x='names2', by.y='names1', all.x=T)

However, the original dataframes have quite a lot of columns and I thought
that I should be able to address the var1 column by something like
df1$var[[df2$name2]].

Well, there is no column named `var` in df1 (though `$` does partial matching, so df1$var would in fact return var1), and df2 has no `name2` column at all. Even if you meant to type `var1`, the object df1$var[[df2$name2]] would not make much sense, since that would still be a failed attempt to access a named vector, and df1$var1 is not named.

> names( df1$var1)
NULL

The "[[" operation is different from the "[" operation. "[[" returns only one item. "[" can return multiple items. In the case of dataframes (of which df1$var1 is _not_ an example), "[[" returns one entire column as a vector. If you had been trying to access a named vector using the 'names' stored in df1$names1 and there were any matches, then you might have had some success with '['.
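To make the distinction concrete, here is a minimal sketch. The objects lst, dfx and nv are hypothetical and made up for illustration; they are not part of the original question.

```r
# Hypothetical objects, for illustration only
lst <- list(a = 1:3, b = letters[1:2])
lst["a"]      # "[" keeps the container: a list of length 1
lst[["a"]]    # "[[" extracts the single element: the integer vector 1:3

dfx <- data.frame(x = 1:3, y = c(2.5, 3.5, 4.5))
dfx["y"]      # a one-column data frame
dfx[["y"]]    # the column itself, as a plain numeric vector

nv <- c(aa = 3, ac = 4, cc = 12)   # a named vector
nv[c("aa", "cc")]                  # "[" can return several items, by name
nv[["aa"]]                         # "[[" returns exactly one item, name dropped
```

So indexing a named vector by a character vector of names does work with "[".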

Even then there are gators in the swamp.

vec1 <- c(aa=3, gx =10,  ac=4, cc = 12)
vec1[df1$names1]
aa gx ac cc
 3 10  4 12

WTF?

Well, by default R's dataframes construct factor variables for character arguments (that is the default in R versions before 4.0.0, where stringsAsFactors is TRUE), and factors have an underlying numeric representation, so by the time df1$names1 got coerced it ended up as 1,2,3,4 and referenced all of vec1. These other methods would return something appropriate:

vec1[as.character(df1$names1)]
  aa <NA>   ac <NA>
   3   NA    4   NA


vec1[which(names(vec1) %in% df1$names1)]
aa ac
 3  4

I happen to think that returning NA is unfortunate in the first instance, but I did not construct the language and there must have been some good reason to make it that way.
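As for the original goal of getting var1 into df2 without calling merge(), one approach that might work is match(), which returns positions that can then be used as a numeric index. This is only a sketch under the assumption that the names in df1$names1 are unique; the helper name idx is made up for illustration, and stringsAsFactors = FALSE is set explicitly so the names stay character.

```r
# Data from the question, with the name columns kept as character
df1 <- data.frame(names1 = c('aa', 'ab', 'ac', 'ad'), var1 = c(1, 5, 7, 12),
                  stringsAsFactors = FALSE)
df2 <- data.frame(names2 = c('aa', 'ab', 'ac', 'ad', 'ae'),
                  var2 = c(3, 6, 9, 12, 15), stringsAsFactors = FALSE)

# For each name in df2, find its row position in df1 (NA when there is no match)
idx <- match(df2$names2, df1$names1)

# Use those positions as a numeric index into the unnamed vector df1$var1
df2$var1 <- df1$var1[idx]
df2
```

The unmatched name 'ae' gets NA in the new column, which is the same result all.x=T gives in the merge() call.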

Could somebody please enlighten me and/or maybe
suggest a short tutorial for the extraction operator?

Arguments to "[" can be numeric, character, or logical. If numeric, it will return values at the sequence locations along the referenced object. If character, it will return the matched items with those names. If logical, the call will return those items for which the index is TRUE (and there will be argument recycling, so this will return every second item in df1$var1):

> df1$var1[c(FALSE, TRUE)]
[1]  5 12
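The three index types can be compared side by side on one object. The named vector x below is hypothetical, chosen only so that all three kinds of index apply to it.

```r
# Hypothetical named vector, for illustration only
x <- c(a = 10, b = 20, c = 30, d = 40)

x[c(2, 4)]          # numeric: items at positions 2 and 4
x[c("a", "d")]      # character: items whose names match
x[c(TRUE, FALSE)]   # logical: recycled to TRUE FALSE TRUE FALSE
x[x > 15]           # logical index built from a comparison
x[-1]               # negative numeric: everything except position 1
```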


Spend some time working through the examples on ?Extract and then re-reading that help page at least three times, although it probably took me ten or twenty times to get a pretty good grasp of it. The material there is accurate and precise, but the subtleties are numerous.

--

David Winsemius, MD
West Hartford, CT

