Hi Ana,
This seems to work. It shouldn't be too hard to do the renaming and
reordering of columns.
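For the renaming and reordering step, something along these lines may serve as a rough sketch. It assumes the merged data frame has the columns that merge() returns for the sample data in this message (match_col, CHR, SNP, A1, A2, MAF, NCHROBS, V5). The toy newout here is a hand-built stand-in so the snippet runs on its own, and the names final and pos are chosen only for illustration.

```r
# Toy stand-in for the merged result (assumption: same column layout as
# merge(output11.frq, marker_info[, c("V5", "match_col")], by = "match_col")).
# merge() sorts on the character key, so rows need not be in genomic order.
newout <- data.frame(
  match_col = c("1:1201155", "1:775852"),
  CHR       = c(1L, 1L),
  SNP       = c("1:1201155:C:T", "1:775852:T:C"),
  A1        = c("T", "T"),
  A2        = c("C", "C"),
  MAF       = c(0.07923, 0.1707),
  NCHROBS   = c(3496L, 3444L),
  V5        = c("rs4245756", "rs2980300"),
  stringsAsFactors = FALSE
)

# Overwrite the chr:pos:a1:a2 identifier with the rs ID from marker-info
newout$SNP <- newout$V5

# Position is the part of match_col after the first colon
pos <- as.numeric(sub("^[^:]*:", "", newout$match_col))

# Keep only the original .frq columns, in their original order
final <- newout[, c("CHR", "SNP", "A1", "A2", "MAF", "NCHROBS")]

# Restore chromosome/position order and reset the row names
final <- final[order(final$CHR, pos), ]
rownames(final) <- NULL
print(final)
```

If the result needs to go back to disk, write.table(final, "output11X.frq", quote=FALSE, row.names=FALSE) would be one way, with the file name taken from the original question.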

output11.frq<-read.table(text="CHR SNP A1 A2 MAF NCHROBS
1 1:775852:T:C T C 0.1707 3444
1 1:1120590:A:C C A 0.08753 3496
1 1:1145994:T:C C T 0.1765 3496
1 1:1148494:A:G A G 0.1059 3464
1 1:1201155:C:T T C 0.07923 3496",
 header=TRUE,stringsAsFactors=FALSE)
marker_info<-read.csv(text="1,742429,SNP_A-1909444,ss66079302,rs3094315,36.2,G,A,C,T,A,GCACAGCAAGAGAAAC[A/G]TTTGACAGAGAATACA,Sty,+,-,y,,,127,phs000018
1,769185,SNP_A-4303947,ss66273559,rs4040617,36.2,A,G,A,G,A,GCTGTGAGAGAGAACA[A/G]TGTCCCAATTTTGCCC,Sty,+,+,n,,,127,phs000018
1,775852,SNP_A-1886933,ss66317030,rs2980300,36.2,T,C,A,G,A,GAATGACTGTGTCTCT[C/T]TGAGTTAGTGAAGTCA,Nsp,-,+,y,,,127,phs000018
1,782343,SNP_A-2236359,ss66185183,rs2905036,36.2,C,T,C,T,A,CTCGATTTGTGTTCAA[C/T]ATATTTCATTTGTACC,Sty,-,-,n,,,127,phs000018
1,1201155,SNP_A-2205441,ss66174584,rs4245756,36.2,C,T,C,T,A,CCAGTGCTTTCAACCA[C/T]ACTCACTTTTCACTGT,Sty,+,+,n,,,127,phs000018",
 header=FALSE,stringsAsFactors=FALSE)
# create new columns for the merge
output11.frq$match_col<-unlist(lapply(lapply(strsplit(output11.frq$SNP,":"),"[",
 1:2),
 paste,collapse=":"))
marker_info$match_col<-apply(t(marker_info[,1:2]),2,paste,collapse=":")
# merge to get the result
newout<-merge(output11.frq,marker_info[,c("V5","match_col")],by="match_col")

Jim

On Tue, Mar 31, 2020 at 11:09 AM Ana Marija <sokovic.anamar...@gmail.com> wrote:
>
> I have a file like this: (has 308545 lines)
>
> head output11.frq
> CHR SNP A1 A2 MAF NCHROBS
> 1 1:775852:T:C T C 0.1707 3444
> 1 1:1120590:A:C C A 0.08753 3496
> 1 1:1145994:T:C C T 0.1765 3496
> 1 1:1148494:A:G A G 0.1059 3464
> 1 1:1201155:C:T T C 0.07923 3496
> ...
>
> And another file (marker-info) which has the first 24 commented lines
> and is comma separated that looks like this (has total of 500593
> lines):
>
> 1,742429,SNP_A-1909444,ss66079302,rs3094315,36.2,G,A,C,T,A,GCACAGCAAGAGAAAC[A/G]TTTGACAGAGAATACA,Sty,+,-,y,,,127,phs000018
> 1,769185,SNP_A-4303947,ss66273559,rs4040617,36.2,A,G,A,G,A,GCTGTGAGAGAGAACA[A/G]TGTCCCAATTTTGCCC,Sty,+,+,n,,,127,phs000018
> 1,775852,SNP_A-1886933,ss66317030,rs2980300,36.2,T,C,A,G,A,GAATGACTGTGTCTCT[C/T]TGAGTTAGTGAAGTCA,Nsp,-,+,y,,,127,phs000018
> 1,782343,SNP_A-2236359,ss66185183,rs2905036,36.2,C,T,C,T,A,CTCGATTTGTGTTCAA[C/T]ATATTTCATTTGTACC,Sty,-,-,n,,,127,phs000018
> 1,1201155,SNP_A-2205441,ss66174584,rs4245756,36.2,C,T,C,T,A,CCAGTGCTTTCAACCA[C/T]ACTCACTTTTCACTGT,Sty,+,+,n,,,127,phs000018
> ...
>
> I want to replace in output11.frq second column with the 5th column in
> marker-info that has the matching value in 1st and 2nd column so for
> this example the result of the output11.frq would look like this:
>
> 1 rs2980300 T C 0.1707 3444
> 1 rs4245756 T C 0.07923 3496
>
> I tried doing this in bash but I got empty file:
>
> vi tst.awk
> NR==FNR { map[$1,$2]=$5; next }
> ($1,$4) in map { $2=map[$1,$4]; print }
> awk -f tst.awk FS=',' marker-info FS='\t' output11.frq > output11X.frq
>
> Can this be done in R?
>
> Thanks
> Ana
>
> ______________________________________________
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.