Hi

This question is far less simple than the title suggests, so please read it
carefully. Thanks.

I have 2 sets of data, both read into R

>data1<-read.table ("1.txt", header=T, sep="\t")
>data2<-read.table ("2.txt", header=T, sep="\t")

>data1

Taxon   stage1   stage2   stage3   stage4
T1          0          0          1          1
T2          0          1          1          0
T3          0          0          0          1
T4          1          0          0          0


>data2 # This is a library file. It contains all possible values of stage
       # (Col_1) that may appear in the data1 file (as the headers of each
       # column), and what they correspond to in Col_2, i.e. stages 1:2 == Group1

Col_1        Col_2
Stage1      Group1
Stage2      Group1
Stage3      Group2
Stage4      Group2

I want to get R to combine the columns in data1 based on the information in
data2 (Col_2), e.g. in this instance reduce the columns in data1 from 4 to 2,
summing, row by row, the values of the columns that belong to the same group
(and keeping the result as 0 or 1) to get the result below:

Taxon   group1   group2
T1          0          1
T2          1          1
T3          0          1
T4          1          0
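
In case it helps to reproduce this without the two text files, the example
tables can be built inline. This is only a sketch: the values are copied from
the listings above, and `expected` is a made-up name for the desired result.

```r
# Inline copies of the example tables, so the problem can be reproduced
# without 1.txt and 2.txt (values copied from the listings above).
data1 <- data.frame(
  Taxon  = c("T1", "T2", "T3", "T4"),
  stage1 = c(0, 0, 0, 1),
  stage2 = c(0, 1, 0, 0),
  stage3 = c(1, 1, 0, 0),
  stage4 = c(1, 0, 1, 0),
  stringsAsFactors = FALSE
)

# Library table: every possible stage (Col_1) and its group (Col_2)
data2 <- data.frame(
  Col_1 = c("Stage1", "Stage2", "Stage3", "Stage4"),
  Col_2 = c("Group1", "Group1", "Group2", "Group2"),
  stringsAsFactors = FALSE
)

# The desired result, kept for comparison ("expected" is a hypothetical name)
expected <- data.frame(
  Taxon  = c("T1", "T2", "T3", "T4"),
  group1 = c(0, 1, 0, 1),
  group2 = c(1, 1, 1, 0),
  stringsAsFactors = FALSE
)
```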

I have many datasets which have different numbers of stages, e.g. one dataset
will have stage1-10, another will have stage15-35 (data2 has all possible
stage values in Col_1, so Col_2 will say what group they correspond to).

So far I can isolate the rows of data2 which contain the stages in data1 with
this:

> # take the header names from data1 minus the 1st column (Taxon is not in the data2 library file)
> data1.names <- names(data1)[-1]
> # match the data1 column header names to those found in the library file data2
> row.numbers <- match(data1.names, data2[,1])
> # reduce data2 to only the stages found in data1 (note the comma: select rows, not columns)
> data2.small <- data2[row.numbers, ]

From here on I don't know what to do. Really I wanted to just be able to
change the header names of data1 to their corresponding names found in Col_2,
and then use some statement that could merge the columns in data1 which have
the same name, summing the values at each row and dividing each sum by its
own value if it is greater than 1 (so I only have 0 or 1 again), but I don't
know how to do that.
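
For what it is worth, the rename-then-merge idea above might be sketched
roughly as follows. This is not a tested solution for the real files. It
assumes the data1 headers match Col_1 of data2 once case is ignored (the
example shows stage1 against Stage1), and `collapse_stages` is a made-up
name. It uses rowsum() on the transposed matrix to add up the columns that
share a group, then turns anything above 0 back into 1.

```r
# Example tables copied from the listings above
data1 <- data.frame(
  Taxon  = c("T1", "T2", "T3", "T4"),
  stage1 = c(0, 0, 0, 1),
  stage2 = c(0, 1, 0, 0),
  stage3 = c(1, 1, 0, 0),
  stage4 = c(1, 0, 1, 0),
  stringsAsFactors = FALSE
)
data2 <- data.frame(
  Col_1 = c("Stage1", "Stage2", "Stage3", "Stage4"),
  Col_2 = c("Group1", "Group1", "Group2", "Group2"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: collapse the stage columns of data1 into the groups
# listed in the library table data2.
collapse_stages <- function(data1, data2) {
  stage.names <- names(data1)[-1]
  # Assumption: header case may differ from the library, so compare in lower case
  idx <- match(tolower(stage.names), tolower(as.character(data2[, 1])))
  if (any(is.na(idx))) {
    stop("stage(s) not in library: ",
         paste(stage.names[is.na(idx)], collapse = ", "))
  }
  groups <- as.character(data2[idx, 2])
  # rowsum() sums rows by group, so transpose to put stages in rows,
  # sum within each group, then transpose back to taxa in rows.
  # reorder = FALSE keeps the groups in order of first appearance.
  counts <- as.matrix(data1[, -1, drop = FALSE])
  sums <- t(rowsum(t(counts), group = groups, reorder = FALSE))
  # Any sum above 0 becomes 1, so the table is presence/absence again
  data.frame(Taxon = data1[, 1], (sums > 0) * 1,
             check.names = FALSE, stringsAsFactors = FALSE)
}

result <- collapse_stages(data1, data2)
print(result)
```

Note that the output columns take their names from Col_2 as written in the
library (Group1, Group2), and any data1 file with a different set of stages
would go through the same function as long as its stages appear in data2.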

Can someone help me to get the desired result (as in the example above) in a
way that does not require me to manually merge columns? That is, an automated
way that could take any version of the data1 file (i.e. with different stage
values) and, using the data2 file (the library file, the same in each
instance), produce output like the example above?


Thanks

Martin

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
