Dear Ara,

I have been working on something similar.

In the end I used a dictionary for each line in the file, and kept the data from each file in a separate collection. I then matched rows using one (or more) elements from each dictionary. This is really very close to doing a join in a database, though, and if you have the time you might want to explore that route (csv -> sqlite, manipulate using sqlobject/sqlalchemy/django/etc.). The csv module has some good facilities for reading and writing csv files. However, as yet I don't think it, or csvutilities, lets you do the sort of merging you describe.
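
Roughly, the dictionary matching looks like the sketch below. It assumes each row has already been parsed into a dict; the field names and the helper name merge_rows are made up for illustration, and IDs with no match in the second file just get empty strings.

```python
def merge_rows(rows_one, rows_two, key="protein_id"):
    """Attach the matching file-TWO fields to every file-ONE row."""
    # Index file TWO by the shared key so each lookup is a single dict access.
    index = {row[key]: row for row in rows_two}
    merged = []
    for row in rows_one:
        # Repeated IDs in file ONE all find the same file-TWO entry.
        match = index.get(row[key], {})
        merged.append({
            "species": row["species"],
            "protein_id": row[key],
            "locus_tag": match.get("locus_tag", ""),
            "e_value": row["e_value"],
            "length": row["length"],
            "start_stop": match.get("start_stop", ""),
        })
    return merged


# Sample rows taken from the files quoted in this thread.
rows_one = [
    {"species": "Streptomyces sp. AA4", "protein_id": "ZP_05482482",
     "e_value": "2.8293600000000001e-140", "length": "5256"},
    {"species": "Streptomyces sp. AA4", "protein_id": "ZP_05482482",
     "e_value": "8.0333299999999997e-138", "length": "5256"},
    {"species": "Streptomyces sp. AA4", "protein_id": "ZP_07281899",
     "e_value": "2.9253900000000001e-140", "length": "5260"},
]
rows_two = [
    {"protein_id": "ZP_05482482", "locus_tag": "StAA4_010100030484",
     "start_stop": "complement(NZ_ACEV01000078.1:25146..40916)"},
    {"protein_id": "ZP_07281899", "locus_tag": "SSMG_05939",
     "start_stop": "complement(NZ_GG657746.1:6565974..6581756)"},
]

merged = merge_rows(rows_one, rows_two)
for row in merged:
    print(", ".join(row.values()))
```

Because the second file is indexed by Protein ID, the order of the rows in either file no longer matters, which is what goes wrong when the files are paired up line by line.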

HTH,

Matt

Robert Jackiewicz wrote:
On Wed, 13 Oct 2010 14:16:21 -0600, Ara Kooser wrote:

Hello all,

  I am working on merging two text files with fields separated by
  commas.
The files are in this format:

File ONE:
*Species, Protein ID, E value, Length*
Streptomyces sp. AA4, ZP_05482482, 2.8293600000000001e-140, 5256,
Streptomyces sp. AA4, ZP_05482482, 8.0333299999999997e-138, 5256,
Streptomyces sp. AA4, ZP_05482482, 1.08889e-124, 5256,
Streptomyces sp. AA4, ZP_07281899, 2.9253900000000001e-140, 5260,

File TWO:
*Protein ID, Locus Tag, Start/Stop*
ZP_05482482, StAA4_010100030484, complement(NZ_ACEV01000078.1:25146..40916)
ZP_07281899, SSMG_05939, complement(NZ_GG657746.1:6565974..6581756)

I looked around for other posts about merging text files and I have this
program:
one = open("final.txt",'r')
two = open("final_gen.txt",'r')

merge = open("merged.txt",'w')
merge.write("Species,  Locus_Tag,  E_value,  Length, Start/Stop\n")

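for line in one:
    # Read the next line of file two only once per iteration; calling
    # readline() twice would consume two lines and skip every other record.
    merged = line.rstrip() + two.readline().strip()
    print(merged)
    merge.write(str([merged]))
    merge.write("\n")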
merge.close()

inc = open("merged.txt","r")
outc = open("final_merge.txt","w")
for line in inc:
    line = line.replace('[','')
    line = line.replace(']','')
    line = line.replace('{','')
    line = line.replace('}','')
    outc.write(line)

inc.close()
outc.close()
one.close()
two.close()

This does merge the files.
Streptomyces sp. AA4, ZP_05482482, 2.8293600000000001e-140, 5256,ZP_05482482, StAA4_010100030484, complement(NZ_ACEV01000078.1:25146..40916)
Streptomyces sp. AA4, ZP_05482482, 8.0333299999999997e-138, 5256,ZP_05477599, StAA4_010100005861, NZ_ACEV01000013.1:86730..102047

But file one has multiple instances of the same Protein ID such as
ZP_05482482. So the data doesn't line up anymore.  I would like the
program to search for each Protein ID number and write the entry from
file 2 in each place and then move on to the next ID number.

Example of desired output:
Streptomyces sp. AA4, ZP_05482482, StAA4_010100030484, 2.8293600000000001e-140, 5256, complement(NZ_ACEV01000078.1:25146..40916)
Streptomyces sp. AA4, ZP_05482482, StAA4_010100030484, 8.0333299999999997e-138, 5256, complement(NZ_ACEV01000078.1:25146..40916)

I was thinking about reading the text files into a dictionary, then
searching for each ID and inserting the content from file TWO wherever
the IDs match. But I am not sure how to start. Is there a more
Pythonic way to go about doing this?

Thank you for your time and help.

Regards,
Ara

Why don't you try using the csv library, which is part of the Python standard library, to parse your files? It allows simple and efficient manipulation of comma-separated value files.
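
As a rough sketch of that parsing step, the csv module can turn each line into a dict keyed by field name. The helper name read_rows and the field names are hypothetical, and the sample text is assumed to have one record per line with a plain header row, as in the files quoted above.

```python
import csv
import io


def read_rows(handle, fieldnames):
    """Parse comma-separated lines into one dict per record."""
    # skipinitialspace drops the blank that follows each comma.
    reader = csv.reader(handle, skipinitialspace=True)
    next(reader)  # skip the header line
    rows = []
    for fields in reader:
        if not fields:
            continue  # ignore blank lines
        # zip stops at the shortest input, so the empty field produced
        # by a trailing comma is discarded.
        rows.append(dict(zip(fieldnames, fields)))
    return rows


# io.StringIO stands in for a real file here; for a file on disk, pass
# open("final.txt", newline="") instead.
sample = io.StringIO(
    "Species, Protein ID, E value, Length\n"
    "Streptomyces sp. AA4, ZP_05482482, 2.8293600000000001e-140, 5256,\n"
    "Streptomyces sp. AA4, ZP_07281899, 2.9253900000000001e-140, 5260,\n"
)
rows = read_rows(sample, ["species", "protein_id", "e_value", "length"])
for row in rows:
    print(row["protein_id"], row["e_value"])
```

Once both files are read this way, the Protein ID field is available by name in every row, so matching entries between the two files becomes a dictionary lookup rather than string slicing.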

-Rob Jackiewicz

_______________________________________________
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor
