Hello all, I am working on merging two text files with fields separated by commas. The files are in this format:
File ONE:
*Species, Protein ID, E value, Length*
Streptomyces sp. AA4, ZP_05482482, 2.8293600000000001e-140, 5256,
Streptomyces sp. AA4, ZP_05482482, 8.0333299999999997e-138, 5256,
Streptomyces sp. AA4, ZP_05482482, 1.08889e-124, 5256,
Streptomyces sp. AA4, ZP_07281899, 2.9253900000000001e-140, 5260,

File TWO:
*Protein ID, Locus Tag, Start/Stop*
ZP_05482482, StAA4_010100030484, complement(NZ_ACEV01000078.1:25146..40916)
ZP_07281899, SSMG_05939, complement(NZ_GG657746.1:6565974..6581756)

I looked around for other posts about merging text files and I have this program:

    one = open("final.txt",'r')
    two = open("final_gen.txt",'r')
    merge = open("merged.txt",'w')
    merge.write("Species, Locus_Tag, E_value, Length, Start/Stop\n")
    for line in one:
        print(line.rstrip() + two.readline().strip())
        merge.write(str([line.rstrip() + two.readline().strip()]))
        merge.write("\n")
    merge.close()

    inc = file("merged.txt","r")
    outc = open("final_merge.txt","w")
    for line in inc:
        line = line.replace('[','')
        line = line.replace(']','')
        line = line.replace('{','')
        line = line.replace('}','')
        outc.write(line)

    inc.close()
    outc.close()
    one.close()
    two.close()

This does merge the files:

Streptomyces sp. AA4, ZP_05482482, 2.8293600000000001e-140, 5256,ZP_05482482, StAA4_010100030484, complement(NZ_ACEV01000078.1:25146..40916)
Streptomyces sp. AA4, ZP_05482482, 8.0333299999999997e-138, 5256,ZP_05477599, StAA4_010100005861, NZ_ACEV01000013.1:86730..102047

But file ONE has multiple instances of the same Protein ID, such as ZP_05482482, so the data doesn't line up anymore. I would like the program to search for each Protein ID and write the matching entry from file TWO in each place, and then move on to the next ID.

Example of desired output:

Streptomyces sp. AA4, ZP_05482482, StAA4_010100030484, 2.8293600000000001e-140, 5256, complement(NZ_ACEV01000078.1:25146..40916)
Streptomyces sp. AA4, ZP_05482482, StAA4_010100030484, 8.0333299999999997e-138, 5256, complement(NZ_ACEV01000078.1:25146..40916)

I was thinking about reading the text files into a dictionary, then searching for each ID and inserting the content from file TWO wherever the IDs match. But I am not sure how to start. Is there a more Pythonic way to go about doing this?

Thank you for your time and help.

Regards,
Ara

--
Quis hic locus, quae regio, quae mundi plaga. Ubi sum. Sub ortu solis an sub cardine glacialis ursae.
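
The dictionary idea described above can be sketched roughly as follows. This is only a minimal illustration, not a definitive solution: the function names `load_annotations` and `merge_records` are hypothetical, it is written for Python 3, and it assumes that neither file contains a header row, that every row in file ONE has exactly four fields (with an optional trailing comma), and that a Protein ID missing from file TWO should simply produce empty Locus Tag and Start/Stop fields.

```python
def load_annotations(two_lines):
    """Build a dict mapping Protein ID -> (locus tag, start/stop) from file TWO rows."""
    lookup = {}
    for line in two_lines:
        # Split on the first two commas only, so a location that itself
        # contains commas stays in one piece.
        parts = [p.strip() for p in line.split(",", 2)]
        if len(parts) != 3 or not parts[0]:
            continue  # skip blank or malformed rows
        protein_id, locus_tag, location = parts
        lookup[protein_id] = (locus_tag, location)
    return lookup


def merge_records(one_lines, two_lines):
    """Return merged rows: one output row per row of file ONE."""
    lookup = load_annotations(two_lines)
    merged = []
    for line in one_lines:
        # Drop the newline and the trailing comma seen in the sample data.
        fields = [f.strip() for f in line.strip().rstrip(",").split(",")]
        if len(fields) != 4:
            continue  # skip blank or malformed rows
        species, protein_id, e_value, length = fields
        # Assumption: unknown IDs get empty annotation fields.
        locus_tag, location = lookup.get(protein_id, ("", ""))
        merged.append(", ".join(
            [species, protein_id, locus_tag, e_value, length, location]))
    return merged


# Small demonstration using rows copied from the post.
sample_one = [
    "Streptomyces sp. AA4, ZP_05482482, 2.8293600000000001e-140, 5256,\n",
    "Streptomyces sp. AA4, ZP_05482482, 8.0333299999999997e-138, 5256,\n",
    "Streptomyces sp. AA4, ZP_07281899, 2.9253900000000001e-140, 5260,\n",
]
sample_two = [
    "ZP_05482482, StAA4_010100030484, complement(NZ_ACEV01000078.1:25146..40916)\n",
    "ZP_07281899, SSMG_05939, complement(NZ_GG657746.1:6565974..6581756)\n",
]
for row in merge_records(sample_one, sample_two):
    print(row)
```

Because an open file object yields its lines when iterated, the same functions should also accept the files directly, for example `merge_records(open("final.txt"), open("final_gen.txt"))`. The key design choice is that file TWO is read once into the dictionary, so repeated Protein IDs in file ONE each look up the same entry instead of consuming the next line of file TWO.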