On 10/02/2012 17:08, Peter Otten wrote:
Spyros Charonis wrote:

Dear python community,

I have a file where I store sequences that each have a header. The
structure of the file is as such:

sp|(some code) =>1st header
ATTTTGGCGG
MNKPLOI
.....
.....

sp|(some code) =>  2nd header
AAAAAA
GGGG ...
.........

......

I am looking to implement a logical structure that would allow me to group
each of the sequences (spread on multiple lines) into a single string. So
instead of having the letters spread on multiple lines I would be able to
have 'ATTTTGGCGGMNKP....' as a single string that could be indexed.

This snipped is good for isolating the sequences (=stripping headers and
skipping blank lines) but how could I concatenate each sequence in order
to get one string per sequence?

for line in align_file:
...     if line.startswith('>sp'):
...             continue
...     elif not line.strip():
...             continue
...     else:
...             print line

(... is just OS X terminal notation, nothing programmatic)

Many thanks in advance.

Instead of printing the line directly collect it in a list (without trailing
"\n"). When you encounter a line starting with">sp" check if that list is
non-empty, and if so print "".join(parts), assuming the list is called
parts, and start with a fresh list. Don't forget to print any leftover data
in the list once the for loop has terminated.

_______________________________________________
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor


The advice from Peter is sound if the strings could grow very large but you can simply concatenate the parts if they are not. For the indexing simply store your data in a dict.

--
Cheers.

Mark Lawrence.

_______________________________________________
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor

Reply via email to