I sent both emails and may have confused things, so to clarify:

1. PIDS.has_key(ID) returns True or False. Make sure the dictionary has the key before you fetch PIDS[ID]; looking up a key that isn't there raises a KeyError. (ID in PIDS performs the same test.)
2. line.split() splits at and removes whitespace, leaving the commas attached to the fields. line.split(",") splits at and removes the commas, leaving the whitespace.
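To make the two points concrete, here is a minimal sketch of the merge, not a definitive implementation. It assumes both files are comma-separated with the column order Ara described. The names split_gen_line and merge_records are made up for this example. It also reads the ID from line in the second loop rather than from proteinVals, which by then still holds the last record of the first file; that looks like the reason the same Locus Tag was repeated for every entry.

```python
# Sample record taken from the thread:
#   "ZP_05477599, StAA4_010100005861, NZ_ACEV01000013.1:86730..102047"
# .split()    -> ['ZP_05477599,', 'StAA4_010100005861,', 'NZ_ACEV01000013.1:86730..102047']
# .split(",") -> ['ZP_05477599', ' StAA4_010100005861', ' NZ_ACEV01000013.1:86730..102047']


def split_gen_line(line):
    """Split 'Protein ID, Locus Tag, Start/Stop' into three clean fields.

    Splitting at most twice keeps any commas inside the location field.
    """
    return [field.strip() for field in line.strip().split(",", 2)]


def merge_records(gen_lines, xml_lines):
    """Return merged records as strings (hypothetical helper for this sketch).

    Assumes gen_lines look like 'Protein ID, Locus Tag, Start/Stop' and
    xml_lines look like 'Species, Protein ID, E Value, Length,'.
    """
    # Build the lookup table: Protein ID -> (Locus Tag, Start/Stop)
    pids = {}
    for protein_vals in gen_lines:
        fields = split_gen_line(protein_vals)
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        pids[fields[0]] = (fields[1], fields[2])

    merged = []
    for line in xml_lines:
        # Use the current line here, not the loop variable of the first loop.
        fields = [field.strip() for field in line.strip().split(",")]
        if len(fields) < 4:
            continue
        species, pid, evalue, length = fields[:4]
        # Same test as PIDS.has_key(pid); 'in' works in Python 2 and 3.
        if pid in pids:
            locus_tag, location = pids[pid]
            merged.append(", ".join(
                [species, pid, locus_tag, evalue, length, location]))
    return merged
```

With the real files this would be called as merge_records(open("final_gen.txt"), open("final.txt")), printing each returned string. Records whose Protein ID has no entry in the dictionary are silently skipped; whether that is the right behaviour for the workflow is a separate decision.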
On Thu, Oct 14, 2010 at 13:43, Adam Lucas <ademloo...@gmail.com> wrote:

> Whoops:
>
> 1) dictionary.has_key() ???
> 2) I don't know if it's a typo or oversight, but there's a comma in your
> dictionary key, line.split(',')[0].
> 3) Forget the database if it's part of a larger workflow unless your job is
> to adapt a biological workflow database for your lab.
>
> On Thu, Oct 14, 2010 at 09:48, Ara Kooser <ghashsn...@gmail.com> wrote:
>
>> Morning all,
>>
>> I took the pseudocode that Emile provided and tried to write a python
>> program. I may have taken the pseudocode too literally.
>>
>> So what I wrote was this:
>> xml = open("final.txt",'r')
>> gen = open("final_gen.txt",'r')
>>
>> PIDS = {}
>> for proteinVals in gen:
>>
>>     ID = proteinVals.split()[0]
>>     PIDS[ID] = proteinVals
>>
>> print PIDS
>>
>> for line in xml:
>>     ID = proteinVals.split()[1]
>>     rslt = "%s,%s"% (line,PIDS[ID])
>>     print rslt
>>
>> So the first part I get. I read in gen that has this format as a text
>> file:
>>
>> *Protein ID, Locus Tag, Start/Stop*
>> ZP_05482482, StAA4_010100030484, complement(NZ_ACEV01000078.1:25146..40916)
>> ZP_07281899, SSMG_05939, complement(NZ_GG657746.1:6565974..6581756)
>> ZP_05477599, StAA4_010100005861, NZ_ACEV01000013.1:86730..102047
>> ...
>> Put that into a dictionary with a key that is the Protein ID at position 0
>> in the dictionary.
>>
>> The second part reads in the file xml which has this format:
>>
>> *Species, Protein ID, E Value, Length*
>> Streptomyces sp. AA4, ZP_05482482, 2.8293600000000001e-140, 5256,
>> Streptomyces sp. AA4, ZP_05482482, 8.0333299999999997e-138, 5256,
>> Streptomyces sp. AA4, ZP_05482482, 1.08889e-124, 5256,
>> Streptomyces sp. AA4, ZP_07281899, 2.9253900000000001e-140, 5260,
>> Streptomyces sp. AA4, ZP_07281899, 8.2369599999999995e-138, 5260,
>> ....
>> *same protein id multiple entries
>>
>> The program splits the file and does something with the 1 position which
>> is the protein id in the xml file. After that I am not really sure what is
>> happening. I can't remember what the %s means. Something with a string?
>>
>> When this runs I get the following error:
>> Traceback (most recent call last):
>>   File "/Users/ara/Desktop/biopy_programs/merge2.py", line 18, in <module>
>>     rslt = "%s,%s"% (line,PIDS[ID])
>> KeyError: 'StAA4_010100017400,'
>>
>> From what I can tell it's not happy about the dictionary key.
>>
>> In the end I am looking for a way to merge these two files and for each
>> protein ID add the locus tag and start/stop like this:
>> *Species, Protein ID, Locus Tag, E Value, Length*, *Start/Stop*
>>
>> Streptomyces sp. AA4, ZP_05482482, StAA4_010100030484, 2.8293600000000001e-140, 5256, complement(NZ_ACEV01000078.1:25146..40916)
>> Streptomyces sp. AA4, ZP_05482482, StAA4_010100030484, 8.0333299999999997e-138, 5256, complement(NZ_ACEV01000078.1:25146..40916)
>> Streptomyces sp. AA4, ZP_05482482, StAA4_010100030484, 1.08889e-124, 5256, complement(NZ_ACEV01000078.1:25146..40916)
>> Streptomyces sp. AA4, ZP_07281899, SSMG_05939, 2.9253900000000001e-140, 5260, complement(NZ_GG657746.1:6565974..6581756)
>> Streptomyces sp. AA4, ZP_07281899, SSMG_05939, 8.2369599999999995e-138, 5260, complement(NZ_GG657746.1:6565974..6581756)
>>
>> Do you have any suggestions for how to proceed? It feels like I am getting
>> closer. :)
>>
>> Note:
>> When I change this part of the code to 0
>> for line in xml:
>>     ID = proteinVals.split()[0]
>>     rslt = "%s,%s"% (line,PIDS[ID])
>>     print rslt
>>
>> I get the following output:
>> Streptomyces sp. AA4, ZP_05482482, 8.0333299999999997e-138, 5256,
>> ,ZP_05479896, StAA4_010100017400, NZ_ACEV01000043.1:241968..>242983
>>
>> Streptomyces sp. AA4, ZP_05482482, 1.08889e-124, 5256,
>> ,ZP_05479896, StAA4_010100017400, NZ_ACEV01000043.1:241968..>242983
>>
>> Streptomyces sp. AA4, ZP_07281899, 2.9253900000000001e-140, 5260,
>> ,ZP_05479896, StAA4_010100017400, NZ_ACEV01000043.1:241968..>242983
>>
>> Which seems closer but all it's doing is repeating the same Locus Tag and
>> Start/Stop for each entry.
>>
>> Thank you!
>>
>> Ara
>>
>> --
>> Quis hic locus, quae regio, quae mundi plaga. Ubi sum. Sub ortu solis an
>> sub cardine glacialis ursae.
>>
>> _______________________________________________
>> Tutor maillist - Tutor@python.org
>> To unsubscribe or change subscription options:
>> http://mail.python.org/mailman/listinfo/tutor
>
> --
> Data is not information, information is not knowledge, knowledge is not
> understanding, understanding is not wisdom.
> --Clifford Stoll

--
Data is not information, information is not knowledge, knowledge is not
understanding, understanding is not wisdom.
--Clifford Stoll