On Sep 6, 2012, at 6:42 AM, Sam Steingold wrote:

>> * David Winsemius <qjvafrz...@pbzpnfg.arg> [2012-09-05 21:02:16 -0700]:
>> 
>> On Sep 5, 2012, at 8:51 PM, Sam Steingold wrote:
>> 
>>> I have a list of data frames:
>>> 
>>>> str(data)
>>> List of 4
>>> $ :'data.frame':    700773 obs. of  3 variables:
>>> ..$ V1: chr [1:700773] "200130446465779" "200070050127778"
>>> "200030633708779" "200010587002779" ...
>>> ..$ V2: int [1:700773] 0 0 0 0 0 0 0 0 0 0 ...
>>> ..$ V3: num [1:700773] 1 1 1 1 1 ...
>>> $ :'data.frame':    700773 obs. of  3 variables:
>>> ..$ V1: chr [1:700773] "200130446465779" "200070050127778"
>>> "200030633708779" "200010587002779" ...
>>> ..$ V2: int [1:700773] 0 0 0 0 0 0 0 0 0 0 ...
>>> ..$ V3: num [1:700773] 1 1 1 1 1 ...
>>> $ :'data.frame':    700773 obs. of  3 variables:
>>> ..$ V1: chr [1:700773] "200130446465779" "200070050127778"
>>> "200030633708779" "200010587002779" ...
>>> ..$ V2: int [1:700773] 0 0 0 0 0 0 0 0 0 0 ...
>>> ..$ V3: num [1:700773] 1 1 1 1 1 ...
>>> $ :'data.frame':    700773 obs. of  3 variables:
>>> ..$ V1: chr [1:700773] "200160325893778" "200130647544079"
>>> "200130446465779" "200120186959078" ...
>>> ..$ V2: int [1:700773] 0 0 0 0 0 0 0 0 0 0 ...
>>> ..$ V3: num [1:700773] 1 1 1 1 1 1 1 1 1 1 ...
>>> 
>>> I want to merge them.
>> 
>> Why? What are you expecting?
> 
> these are the results of applying a model to the test data.
> the first column is the ID

In which case you should use the 'by' argument to `merge` and specify that only 
the first column is to be used. In the current situation `merge` will attempt 
to match on all of the columns, because all three columns have the same names 
in every data frame.

Notice the difference in these results:

> merge( data.frame(a=1:3, b=5:7), data.frame(a=1:3, b=10:12) )
[1] a b
<0 rows> (or 0-length row.names)

> merge( data.frame(a=1:3, b=5:7), data.frame(a=1:3, b=10:12) , by=1)
  a b.x b.y
1 1   5  10
2 2   6  11
3 3   7  12

(`merge` "by" arguments can be column numbers.)
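
A small sketch of the same idea using names rather than positions: "by" also 
accepts column names, and `suffixes` controls how the clashing non-key columns 
are labelled. The suffix values below are made up for illustration.

```r
x <- data.frame(a = 1:3, b = 5:7)
y <- data.frame(a = 1:3, b = 10:12)

# Match on the column named "a" only; label the two clashing "b" columns
# with illustrative suffixes instead of the default ".x" / ".y".
m <- merge(x, y, by = "a", suffixes = c(".m1", ".m2"))
print(m)
```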


> the second column is the actual value
> the third column is the model score
> 
> after I will merge the frames, I will
> 1. check that all the V2 columns are identical and drop all but one
> (I guess I could just merge on c("V1","V2") instead, right?)

Depends on what you want. I already suggested that you only want to merge on 
the id column.
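
To make the trade-off concrete, here is a toy sketch. The data frames `d1` and 
`d2` are invented, with one deliberately disagreeing V2 value. Merging on the 
id alone keeps both V2 columns so they can be compared; merging on both columns 
drops any id whose V2 values differ, without a warning.

```r
d1 <- data.frame(V1 = c("a", "b", "c"), V2 = c(0L, 1L, 0L),
                 V3 = c(0.25, 0.75, 0.50), stringsAsFactors = FALSE)
d2 <- data.frame(V1 = c("a", "b", "c"), V2 = c(0L, 1L, 1L),
                 V3 = c(0.75, 0.25, 1.00), stringsAsFactors = FALSE)

# Merge on the id only: both V2 columns survive as V2.x and V2.y.
by_id <- merge(d1, d2, by = "V1")
v2_agrees <- identical(by_id$V2.x, by_id$V2.y)   # FALSE here, id "c" differs

# Merge on id and actual value: a single V2 column, but id "c" is dropped
# because its V2 values do not match across the two frames.
by_both <- merge(d1, d2, by = c("V1", "V2"))

print(by_id)
print(by_both)
```

So merging on c("V1","V2") is only safe if you already know the V2 columns 
agree, or if you compare the row counts afterwards.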

> 
> 2. compute the sum (or the mean, whatever is easier) of all the V3
> columns

`aggregate` should do that without difficulty.
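
One possible reading of that suggestion, sketched on invented data: stack the 
frames with `rbind` and let `aggregate` average V3 within each id, which 
sidesteps the column-name clash altogether. The list `scores` and its values 
are assumptions for illustration. Unlike an inner merge, an id missing from 
some frames is still averaged over the frames where it does appear.

```r
# Hypothetical stand-in for the real list of score frames.
scores <- list(
  data.frame(V1 = c("a", "b", "c"), V2 = c(0L, 1L, 0L),
             V3 = c(0.25, 0.75, 0.50), stringsAsFactors = FALSE),
  data.frame(V1 = c("c", "a", "b"), V2 = c(0L, 0L, 1L),
             V3 = c(1.00, 0.75, 0.25), stringsAsFactors = FALSE)
)

# Stack all frames into one long data frame (same column names required).
stacked <- do.call(rbind, scores)

# Mean model score per id; use FUN = sum for the sum instead.
avg <- aggregate(V3 ~ V1, data = stacked, FUN = mean)

# Sort by the combined score, highest first.
avg <- avg[order(avg$V3, decreasing = TRUE), ]
print(avg)
```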
> 
> 3. sort by the sum/mean of the V3 columns and evaluate the combined
> model using the lift quality metric
> (http://dl.acm.org/citation.cfm?id=380995.381018)

That's going to require more background (or more money, since they want $15.00 
for a pdf).


> 
> I have many more score files (not just 4), so it is not practical for me
> to rename the column to something unique.

Which column?
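
If the worry is the V2 and V3 names clashing across many frames, the renaming 
need not be done by hand. A minimal sketch, assuming the frames sit in a list 
(here a made-up `scores`) and the id is always the first column: tag the non-id 
columns with the list position, then fold `merge` over the list with `Reduce`.

```r
# Hypothetical stand-in for the real list of score frames.
scores <- list(
  data.frame(V1 = c("a", "b"), V2 = c(0L, 1L), V3 = c(0.25, 0.75),
             stringsAsFactors = FALSE),
  data.frame(V1 = c("a", "b"), V2 = c(0L, 1L), V3 = c(0.75, 0.25),
             stringsAsFactors = FALSE),
  data.frame(V1 = c("b", "a"), V2 = c(1L, 0L), V3 = c(1.00, 0.50),
             stringsAsFactors = FALSE)
)

# Give every non-id column a unique name: V2.1, V3.1, V2.2, V3.2, ...
renamed <- lapply(seq_along(scores), function(i) {
  d <- scores[[i]]
  names(d)[-1] <- paste0(names(d)[-1], ".", i)
  d
})

# Merge the whole list pairwise on the id column only.
merged <- Reduce(function(x, y) merge(x, y, by = "V1"), renamed)

# Check that the actual values agree across frames, then average the scores.
v2_cols <- grep("^V2\\.", names(merged), value = TRUE)
v3_cols <- grep("^V3\\.", names(merged), value = TRUE)
v2_ok <- all(sapply(merged[v2_cols], identical, merged[[v2_cols[1]]]))
merged$mean_score <- rowMeans(merged[v3_cols])
print(merged)
```

The same loop works for any number of score files, since the suffix comes from 
the position in the list rather than from a hand-picked name.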

> 
> 
> 
> -- 
> Sam Steingold (http://sds.podval.org/) on Ubuntu 12.04 (precise) X 
> 11.0.11103000
> http://www.childpsy.net/ http://www.memritv.org http://truepeace.org
> http://jihadwatch.org http://mideasttruth.com http://americancensorship.org
> To be popular with ladies one has to be smart, handsome & rich. Or to be a 
> cat.

David Winsemius, MD
Alameda, CA, USA

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
