Hey all,
It's mostly solved. The program prints out to the screen just fine except
for the new line return. Here is what I ended up using:
#Merges two files into one using dictionaries
xml = open("final.txt",'r')
gen = open("final_gen.txt",'r')
PIDS = {}
for proteinVals in g
I sent both emails and may have confused things:
1. PIDS.has_key(ID) returns True/False. You need to make sure the dictionary
has the key before you fetch PIDS[NotAKey] and get a KeyError.
2. line.split() splits at and removes whitespace, leaving commas.
line.split(",") splits at and removes commas.
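To make the two points above concrete, here is a small sketch. The sample line follows the File ONE format quoted in this thread; the dictionary contents are made-up placeholders. Note that has_key() only exists in Python 2, while the `in` operator does the same check and works in both Python 2 and 3.

```python
# Sample line in the File ONE format from this thread.
line = "Streptomyces sp. AA4, ZP_05482482, 2.82936001e-140, 5256,"

# split() breaks on whitespace and leaves the commas attached.
by_space = line.split()
# split(",") breaks on commas; strip() removes the leftover spaces.
by_comma = [field.strip() for field in line.split(",")]

# Placeholder dictionary; the value is invented for illustration.
PIDS = {"ZP_05482482": "example annotation"}

for ID in (by_space[1], by_comma[1]):
    if ID in PIDS:  # same test as PIDS.has_key(ID) in Python 2
        print("%s -> %s" % (ID, PIDS[ID]))
    else:
        print("%r is not a key, so PIDS[ID] would raise KeyError" % ID)
```

With a whitespace split, index 1 is a piece of the species name rather than the protein ID, which is one plausible way to end up with a KeyError on this kind of data.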
Adam,
I am going to try and sort through the pseudocode you provided to see if
I can get things up and running that way as well. This is a part of a larger
workflow thing and needs to be in the format that I have. Sticking all this
into a database is down the road a ways.
*for every line in ONE:
Emile,
I modified the code to this:
for line in xml:
    ID = line.split()[1]
    rslt = "%s,%s"% (line,PIDS[ID])
    print rslt
and ended up with this error:
Traceback (most recent call last):
  File "/Users/ara/Desktop/biopy_programs/merge2.py", line 16, in <module>
    rslt = "%s,%s"% (line,PIDS[ID])
Ke
Whoops:
1) dictionary.has_key() ???
2) I don't know if it's a typo or oversight, but there's a comma in your
dictionary key, line.split(',')[0].
3) Forget the database if it's part of a larger workflow unless your job is
to adapt a biological workflow database for your lab.
On Thu, Oct 14, 2010
Either way; nest the for loops and index with protein IDs or dictionary one
file and write the other with matches to the dictionary:
non-python pseudocode:
for every line in TWO:
    get the first protein ID
    for every line in ONE:
        if the second protein ID is the same as the first:
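The dictionary variant of this idea could be sketched in Python roughly as follows. This is only an illustration under assumptions: the protein ID is taken to be the first comma-separated field in file TWO and the second field in file ONE (per the File ONE header quoted in this thread), and the names merge_lines and two_by_id, as well as the sample TWO line, are hypothetical.

```python
def merge_lines(one_lines, two_lines):
    """Append each matching TWO line to its ONE line, joined by a comma."""
    # Assumption: the protein ID is the first comma-separated field in TWO.
    two_by_id = {}
    for line in two_lines:
        line = line.strip()  # drop the trailing newline
        if line:
            two_by_id[line.split(",")[0].strip()] = line

    merged = []
    for line in one_lines:
        # Drop the newline and the trailing comma seen in File ONE.
        line = line.strip().rstrip(",")
        fields = [f.strip() for f in line.split(",")]
        # File ONE columns: Species, Protein ID, E value, Length
        if len(fields) > 1 and fields[1] in two_by_id:
            merged.append("%s,%s" % (line, two_by_id[fields[1]]))
    return merged


# Hypothetical sample data; the TWO line is invented for illustration.
one = ["Streptomyces sp. AA4, ZP_05482482, 2.82936001e-140, 5256,\n"]
two = ["ZP_05482482, example gene info\n"]
for row in merge_lines(one, two):
    print(row)
```

Building the dictionary once means each file is read a single time, instead of rescanning ONE for every line of TWO as the nested loops do. Stripping each line before joining also keeps stray newlines out of the merged output.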
On 10/14/2010 7:48 AM Ara Kooser said...
Morning all,
I took the pseudocode that Emile provided and tried to write a python
program. I may have taken the pseudocode too literally.
So what I wrote was this:
xml = open("final.txt",'r')
gen = open("final_gen.txt",'r')
PIDS = {}
for proteinVals
Morning all,
I took the pseudocode that Emile provided and tried to write a python
program. I may have taken the pseudocode too literally.
So what I wrote was this:
xml = open("final.txt",'r')
gen = open("final_gen.txt",'r')
PIDS = {}
for proteinVals in gen:
    ID = proteinVals.split()[0]
Thank you for all of the advice. I am going to try the dictionary route
first thing tomorrow.
This code is a part of larger code that: 1) queries the BLAST database using
BioPython 2) parses the data using BioPython, 3) dumps to text files 4) then
merges the text files and sorts them. Somewhere do
"Ara Kooser" wrote
I was thinking about writing the text files into a dictionary and then
searching for each ID and then insert the content from file TWO into where
the IDs match. But I am not sure how to start. Is there a more pythony way
to go about doing this?
That's exactly how I would
Dear Ara,
I have been working on something similar.
In the end I used a dictionary for each line in the file, and stored
data from each file in a different set. I then matched using one (or
more) element from each dictionary. This is really very close to doing a
join in a database, though, and i
On 10/13/2010 1:16 PM Ara Kooser said...
Hello all,
I am working on merging two text files with fields separated by commas.
The files are in this format:
File ONE:
*Species, Protein ID, E value, Length*
Streptomyces sp. AA4, ZP_05482482, 2.82936001e-140, 5256,
Streptomyces sp. AA4, Z
On Wed, 13 Oct 2010 14:16:21 -0600, Ara Kooser wrote:
> Hello all,
>
> I am working on merging two text files with fields separated by
> commas.
> The files are in this format:
>
> File ONE:
> *Species, Protein ID, E value, Length*
> Streptomyces sp. AA4, ZP_05482482, 2.82936001e-140