Hello all,
I am working on merging two text files with fields separated by
commas.
The files are in this format:
File ONE:
*Species, Protein ID, E value, Length*
Streptomyces sp. AA4, ZP_05482482, 2.8293600000000001e-140, 5256,
Streptomyces sp. AA4, ZP_05482482, 8.0333299999999997e-138, 5256,
Streptomyces sp. AA4, ZP_05482482, 1.08889e-124, 5256,
Streptomyces sp. AA4, ZP_07281899, 2.9253900000000001e-140, 5260,
File TWO:
*Protein ID, Locus Tag, Start/Stop*
ZP_05482482, StAA4_010100030484, complement(NZ_ACEV01000078.1:25146..40916)
ZP_07281899, SSMG_05939, complement(NZ_GG657746.1:6565974..6581756)
I looked around for other posts about merging text files and came up
with this program:
one = open("final.txt", 'r')
two = open("final_gen.txt", 'r')
merge = open("merged.txt", 'w')
merge.write("Species, Locus_Tag, E_value, Length, Start/Stop\n")
for line in one:
    # Read from file two only once per iteration; calling readline()
    # twice (once for print, once for write) would skip a line.
    combined = line.rstrip() + two.readline().strip()
    print(combined)
    merge.write(str([combined]))
    merge.write("\n")
merge.close()

# Second pass: remove the brackets left over from str([...]).
inc = open("merged.txt", "r")
outc = open("final_merge.txt", "w")
for line in inc:
    line = line.replace('[', '')
    line = line.replace(']', '')
    line = line.replace('{', '')
    line = line.replace('}', '')
    outc.write(line)
inc.close()
outc.close()
one.close()
two.close()
This does merge the files.
Streptomyces sp. AA4, ZP_05482482, 2.8293600000000001e-140, 5256,ZP_05482482, StAA4_010100030484, complement(NZ_ACEV01000078.1:25146..40916)
Streptomyces sp. AA4, ZP_05482482, 8.0333299999999997e-138, 5256,ZP_05477599, StAA4_010100005861, NZ_ACEV01000013.1:86730..102047
But file ONE has multiple rows with the same Protein ID, such as
ZP_05482482, so pairing the two files line by line no longer lines up.
I would like the program to look up the Protein ID of each row in file
ONE, write the matching entry from file TWO into that row, and then
move on to the next row.
Example of desired output:
Streptomyces sp. AA4, ZP_05482482, StAA4_010100030484, 2.8293600000000001e-140, 5256, complement(NZ_ACEV01000078.1:25146..40916)
Streptomyces sp. AA4, ZP_05482482, StAA4_010100030484, 8.0333299999999997e-138, 5256, complement(NZ_ACEV01000078.1:25146..40916)
I was thinking about reading file TWO into a dictionary keyed by
Protein ID, then looking up each ID from file ONE and inserting the
content from file TWO wherever the IDs match. But I am not sure how to
start. Is there a more Pythonic way to go about doing this?
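For reference, the dictionary idea might be sketched roughly as follows. This is only a sketch under some assumptions: the function names (build_lookup, merge_rows, merge_files) are made up for illustration, the files contain no header line, no field contains a comma of its own, and an ID that is missing from file TWO gets empty placeholders rather than raising an error.

```python
def split_fields(line):
    """Split one comma-separated line into stripped fields,
    ignoring a trailing comma such as the one in file ONE."""
    return [f.strip() for f in line.strip().rstrip(",").split(",")]


def build_lookup(lines_two):
    """Map each Protein ID in file TWO to (locus_tag, start_stop)."""
    lookup = {}
    for line in lines_two:
        fields = split_fields(line)
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        protein_id, locus_tag, start_stop = fields[:3]
        lookup[protein_id] = (locus_tag, start_stop)
    return lookup


def merge_rows(lines_one, lookup):
    """Yield one merged row per line of file ONE, matched on Protein ID."""
    for line in lines_one:
        fields = split_fields(line)
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        species, protein_id, e_value, length = fields[:4]
        # Assumption: IDs absent from file TWO get empty placeholders.
        locus_tag, start_stop = lookup.get(protein_id, ("", ""))
        yield ", ".join(
            [species, protein_id, locus_tag, e_value, length, start_stop]
        )


def merge_files(path_one, path_two, path_out):
    """Read both files, join on Protein ID, and write the merged rows."""
    with open(path_two) as two:
        lookup = build_lookup(two)
    with open(path_one) as one, open(path_out, "w") as out:
        out.write("Species, Protein_ID, Locus_Tag, E_value, Length, Start/Stop\n")
        for row in merge_rows(one, lookup):
            out.write(row + "\n")
```

With the file names from the program above, this would be called as merge_files("final.txt", "final_gen.txt", "final_merge.txt"). Because the lookup is by key rather than by line position, repeated Protein IDs in file ONE each pick up the same entry from file TWO.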
Thank you for your time and help.
Regards,
Ara