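Whoops:
1) dictionary.has_key() ??? (i.e. check that the key is actually in the dictionary before you look it up)
2) I don't know if it's a typo or an oversight, but there's a comma in your dictionary key. split() with no argument splits on whitespace, so each field keeps its trailing comma; split on the comma instead, e.g. line.split(',')[0], and strip() the surrounding spaces.
3) Forget the database if it's part of a larger workflow, unless your job is to adapt a biological workflow database for your lab.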
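Something along these lines might do the merge. It is a minimal Python 3 sketch (your script uses Python 2 print statements), assuming both files are comma-separated exactly like the samples you posted; the function names load_gen, merge and merge_files are made up for illustration. Also note that your second loop splits proteinVals, which still holds the last line of the gen file, instead of line. That is why the KeyError shows a locus tag with a trailing comma, and why the [0] version repeats the same locus tag and start/stop for every entry. As for %s: in "%s,%s" % (a, b), each %s is replaced by the string form of the matching value in the tuple.

```python
def load_gen(lines):
    """Build {protein_id: (locus_tag, start_stop)} from gen-file lines.

    Assumed line format (from the posted sample):
        ZP_05482482, StAA4_010100030484, complement(NZ_ACEV01000078.1:25146..40916)
    maxsplit=2 keeps any commas inside the Start/Stop field, e.g. join(...).
    """
    pids = {}
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue  # skip blank lines
        fields = [f.strip() for f in raw.split(",", 2)]
        if len(fields) < 3:
            continue  # not a data line
        protein_id, locus_tag, start_stop = fields
        pids[protein_id] = (locus_tag, start_stop)
    return pids


def merge(xml_lines, pids):
    """Yield one merged record per hit line.

    Assumed line format (from the posted sample):
        Streptomyces sp. AA4, ZP_05482482, 2.8293600000000001e-140, 5256,
    Splitting on ',' rather than whitespace keeps 'Streptomyces sp. AA4'
    in one field and leaves no trailing comma on the Protein ID.
    """
    for raw in xml_lines:
        fields = [f.strip() for f in raw.strip().split(",")]
        if len(fields) < 4:
            continue  # not a data line
        species, protein_id, e_value, length = fields[:4]
        # Python 3 spelling of pids.has_key(protein_id);
        # a bare pids[protein_id] would raise KeyError for unknown IDs.
        if protein_id not in pids:
            continue  # skip hits with no matching Protein ID
        locus_tag, start_stop = pids[protein_id]
        # Output order: Species, Protein ID, Locus Tag, E Value, Length, Start/Stop
        yield "%s, %s, %s, %s, %s, %s" % (
            species, protein_id, locus_tag, e_value, length, start_stop)


def merge_files(gen_path="final_gen.txt", xml_path="final.txt"):
    """Read both files (names taken from the original script) and print the merge."""
    with open(gen_path) as gen:
        pids = load_gen(gen)
    with open(xml_path) as xml:
        for record in merge(xml, pids):
            print(record)
```

Calling merge_files() prints one merged line per hit. Loading the (presumably smaller) gen file into a dictionary and streaming the hits file past it keeps memory use low. Hits whose Protein ID is not in the gen file are skipped silently here; you may prefer to print them so nothing goes missing unnoticed.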
On Thu, Oct 14, 2010 at 09:48, Ara Kooser <ghashsn...@gmail.com> wrote:
> Morning all,
>
> I took the pseudocode that Emile provided and tried to write a python
> program. I may have taken the pseudocode too literally.
>
> So what I wrote was this:
> xml = open("final.txt",'r')
> gen = open("final_gen.txt",'r')
>
> PIDS = {}
> for proteinVals in gen:
>
>     ID = proteinVals.split()[0]
>     PIDS[ID] = proteinVals
>
> print PIDS
>
> for line in xml:
>     ID = proteinVals.split()[1]
>     rslt = "%s,%s"% (line,PIDS[ID])
>     print rslt
>
> So the first part I get. I read in gen that has this format as a text file:
>
> *Protein ID, Locus Tag, Start/Stop*
> ZP_05482482, StAA4_010100030484, complement(NZ_ACEV01000078.1:25146..40916)
> ZP_07281899, SSMG_05939, complement(NZ_GG657746.1:6565974..6581756)
> ZP_05477599, StAA4_010100005861, NZ_ACEV01000013.1:86730..102047
> ...
> Put that into a dictionary with a key that is the Protein ID at position 0
> in the dictionary.
>
> The second part reads in the file xml which has this format:
>
> *Species, Protein ID, E Value, Length*
> Streptomyces sp. AA4, ZP_05482482, 2.8293600000000001e-140, 5256,
> Streptomyces sp. AA4, ZP_05482482, 8.0333299999999997e-138, 5256,
> Streptomyces sp. AA4, ZP_05482482, 1.08889e-124, 5256,
> Streptomyces sp. AA4, ZP_07281899, 2.9253900000000001e-140, 5260,
> Streptomyces sp. AA4, ZP_07281899, 8.2369599999999995e-138, 5260,
> ....
> *same protein id multiple entries
>
> The program splits the file and does something with the 1 position which is
> the protein id in the xml file. After that I am not really sure what is
> happening. I can't remember what the %s means. Something with a string?
>
> When this runs I get the following error:
> Traceback (most recent call last):
>   File "/Users/ara/Desktop/biopy_programs/merge2.py", line 18, in <module>
>     rslt = "%s,%s"% (line,PIDS[ID])
> KeyError: 'StAA4_010100017400,'
>
> From what I can tell it's not happy about the dictionary key.
>
> In the end I am looking for a way to merge these two files and for each
> protein ID add the locus tag and start/stop like this:
> *Species, Protein ID, Locus Tag, E Value, Length*, *Start/Stop*
>
> Streptomyces sp. AA4, ZP_05482482, StAA4_010100030484,
> 2.8293600000000001e-140, 5256, complement(NZ_ACEV01000078.1:25146..40916)
> Streptomyces sp. AA4, ZP_05482482, StAA4_010100030484,
> 8.0333299999999997e-138, 5256, complement(NZ_ACEV01000078.1:25146..40916)
> Streptomyces sp. AA4, ZP_05482482, StAA4_010100030484, 1.08889e-124, 5256,
> complement(NZ_ACEV01000078.1:25146..40916)
> Streptomyces sp. AA4, ZP_07281899, SSMG_05939, 2.9253900000000001e-140,
> 5260, complement(NZ_GG657746.1:6565974..6581756)
> Streptomyces sp. AA4, ZP_07281899, SSMG_05939, 8.2369599999999995e-138,
> 5260, complement(NZ_GG657746.1:6565974..6581756)
>
> Do you have any suggestions for how to proceed. It feels like I am getting
> closer. :)
>
>
> Note:
> When I change this part of the code to 0
> for line in xml:
>     ID = proteinVals.split()[0]
>     rslt = "%s,%s"% (line,PIDS[ID])
>     print rslt
>
> I get the following output:
> Streptomyces sp. AA4, ZP_05482482, 8.0333299999999997e-138, 5256,
> ,ZP_05479896, StAA4_010100017400, NZ_ACEV01000043.1:241968..>242983
>
>
> Streptomyces sp. AA4, ZP_05482482, 1.08889e-124, 5256,
> ,ZP_05479896, StAA4_010100017400, NZ_ACEV01000043.1:241968..>242983
>
>
> Streptomyces sp. AA4, ZP_07281899, 2.9253900000000001e-140, 5260,
> ,ZP_05479896, StAA4_010100017400, NZ_ACEV01000043.1:241968..>242983
>
> Which seems closer but all it's doing is repeating the same Locus Tag and
> Start/Stop for each entry.
>
> Thank you!
>
> Ara
>
>
> --
> Quis hic locus, quae regio, quae mundi plaga. Ubi sum. Sub ortu solis an
> sub cardine glacialis ursae.
>
> _______________________________________________
> Tutor maillist - Tutor@python.org
> To unsubscribe or change subscription options:
> http://mail.python.org/mailman/listinfo/tutor
>

--
Data is not information, information is not knowledge, knowledge is not
understanding, understanding is not wisdom.
--Clifford Stoll