[Tutor] Script for Parsing string sequences from a file

2011-04-15 Thread Spyros Charonis
Hello,

I'm doing a biomedical degree and am taking a course on bioinformatics. We
were given a raw version of a public database in a file (the file is in
simple ASCII) and need to extract only certain lines containing important
information. I've made a script that does not work and I am having trouble
understanding why.

When I run it in the Python shell, it prompts for a protein name but then
reports that there is no such entry. The first while loop, nested inside a
for loop, is intended to pick up all lines beginning with "gc;", chop off the
"gc;" part and keep only the text after it (which is a protein name).
The script then scans the file, collects all such lines, chops off the "gc;"
and stores them in a tuple. This tuple is not built correctly: as I said,
when the program is run it reports that it cannot find my query in the tuple
I created, although the query is certainly in the database. Can you detect
what the mistake is? Thank you in advance!

Spyros


myParser.py
Description: Binary data
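
The attached script is not reproduced in the archive, so the following is
only a minimal sketch of the approach described above, not the original
code. The function name protein_names and the sample lines are assumptions
made for illustration. It slices off the "gc;" prefix and strips the
surrounding whitespace, including the trailing newline, which would otherwise
make a comparison with the typed query fail:

```python
def protein_names(lines, prefix="gc;"):
    """Return the text following `prefix` for every line that starts with it."""
    names = []
    for line in lines:
        if line.startswith(prefix):
            # Slice off the prefix, then drop spaces and the trailing newline.
            names.append(line[len(prefix):].strip())
    return tuple(names)


# Hypothetical sample in the shape described in the post.
sample = ["gc; OPSIN\n", "gx; ignored\n", "gc; GLOBIN\n"]
names = protein_names(sample)
found = "OPSIN" in names
```

If the stored names still carried their newline, a membership test such as
"OPSIN" in names would be false even though the entry is present.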
___
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor


[Tutor] Deleting strings from a line

2011-04-26 Thread Spyros Charonis
Hello,

I've written a script that scans a biological database and extracts some
information. A sample of output from my script is as follows:

LYLGILLSHAN  AA3R_SHEEP26331

 LYMGILLSHAN  AA3R_HUMAN26431

 MCLGILLSHAN  AA3R_RAT26631

 LLVGILLSHAN  AA3R_RABIT26531

The leftmost strings are the ones I want to keep, while I would like to get
rid of the ones to the right (AA3R_SHEEP, 263 61) which are just indicators
of where the sequence came from and genomic coordinates. Is there any way to
do this with a string processing command? The loop which builds my list goes
like this:

for line in query_lines:
    if line.startswith('fd;'):  # find motif sequences
        #print "Found an FD for your query!",
        line.rstrip().lstrip('fd;')
        print line.lstrip('fd;')
        motif.append(line.rstrip().lstrip('fd;'))

Is there a del command I can use to preserve only the actual sequences
themselves? Many thanks in advance!
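
One way to keep only the leftmost string is to split each line on whitespace
and take the first field. This is a sketch under the assumption that the
sequence itself never contains spaces; the helper name motif_from_line and
the sample lines are made up for illustration. Note that lstrip('fd;')
removes any leading run of the characters f, d and ; rather than the literal
prefix, so slicing the prefix off is the safer choice:

```python
def motif_from_line(line, prefix="fd;"):
    """Return only the sequence from a line such as 'fd; LYLGILLSHAN  AA3R_SHEEP ...'."""
    body = line[len(prefix):] if line.startswith(prefix) else line
    fields = body.split()          # split on any run of whitespace
    return fields[0] if fields else ""


# Sample lines modelled on the output shown above (column layout assumed).
query_lines = [
    "fd; LYLGILLSHAN  AA3R_SHEEP  263  31\n",
    "fd; LYMGILLSHAN  AA3R_HUMAN  264  31\n",
    "gc; SOMETHING ELSE\n",
]
motif = [motif_from_line(l) for l in query_lines if l.startswith("fd;")]
```

No del statement is needed: the unwanted fields are simply never copied into
the list.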


[Tutor] Filtering out unique list elements

2011-05-03 Thread Spyros Charonis
Dear All,

I have built a list with multiple occurrences of a string after some text
processing that goes something like this:

[cat, dog, cat, cat, cat, dog, dog, tree, tree, tree, bird, bird, woods,
woods]

I am wondering how to truncate this list so that I only print out the unique
elements, i.e. the same list but with one occurrence per element:

[cat, dog, tree, bird, woods]

Any help much appreciated!

Regards,
Spyros
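
A minimal sketch of one way to do this while keeping the order in which the
elements first appear. The function name unique_in_order is an assumption;
the list is the one from the question, written as strings:

```python
def unique_in_order(items):
    """Return the items with duplicates removed, keeping first-seen order."""
    seen = set()
    result = []
    for item in items:
        if item not in seen:
            seen.add(item)
            result.append(item)
    return result


words = ["cat", "dog", "cat", "cat", "cat", "dog", "dog", "tree", "tree",
         "tree", "bird", "bird", "woods", "woods"]
unique_words = unique_in_order(words)
```

Calling set(words) on its own also removes duplicates, but a set does not
promise to keep the original order.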


[Tutor] triple-nested for loop not working

2011-05-04 Thread Spyros Charonis
Hello everyone,

I have written a program, as part of a bioinformatics project, that extracts
motif sequences (programmatically just strings of letters) from a database
and writes them to a file.
I have written another script to annotate the database file (in plaintext
ASCII format) by replacing every match of a motif with a sequence of tildes
(~).  Primitive I know, but not much more can be done with ASCII files.  The
code goes as follows:


# => final motifs_11sglobulin contains the output of my first program
motif_file = open('myfolder/pythonfiles/final motifs_11SGLOBULIN', 'r')
# => 11sglobulin.seqs is the ASCII sequence alignment file which I want to
# "annotate" (modify)
align_file = open('myfolder/pythonfiles/11sglobulin.seqs', 'a+')

finalmotif_seqs = []
finalmotif_length = []  # store length of each motif
finalmotif_annot = []

for line in finalmotifs:
    finalmotif_seqs.append(line)
    mot_length = len(line)
    finalmotif_length.append(mot_length)

for item in finalmotif_length:
    annotation = '~' * item
    finalmotif_annot.append(annotation)

finalmotifs = motif_file.readlines()
seqalign = align_file.readlines()

for line in seqalign:
    for i in len(finalmotif_seqs):  # for item in finalmotif_seqs:
        for i in len(finalmotif_annot): # for item in finalmotif_annot:
            if finalmotif_seqs[i] in line:  # if item in line:
                newline = line.replace(finalmotif_seqs[i],
                                       finalmotif_annot[i])
                #sys.stdout.write(newline)   # => print the lines out on the shell
                align_file.writelines(newline)

motif_file.close()
align_file.close()


My coding issue is that although the script runs, there is a logic error
somewhere in the triple-nested for loop: when I check the file I'm
supposedly modifying, there is no change. All three lists are built correctly
(I've confirmed this in the Python shell). Any help would be much
appreciated!
I am running Python 2.6.5.
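
A possible restructuring, offered as a sketch rather than a diagnosis of the
original script. Two facts are worth keeping in mind: readlines() keeps the
trailing newline on every line, so a motif read that way only matches when it
sits at the very end of an alignment line, and "for i in len(...)" raises a
TypeError because an integer is not iterable. The function name
annotate_lines and the sample data are assumptions. Keeping the annotated
lines in a list and writing them to a separate output file also avoids
reading from a file opened in 'a+' mode, where the initial position may be
at the end of the file:

```python
def annotate_lines(lines, motifs, mark="~"):
    """Replace every occurrence of each motif in each line with a run of `mark`."""
    cleaned = [m.strip() for m in motifs if m.strip()]  # drop newlines and blanks
    annotated = []
    for line in lines:
        for motif in cleaned:
            line = line.replace(motif, mark * len(motif))
        annotated.append(line)
    return annotated


# Hypothetical stand-ins for the motif file and the alignment file.
motifs = ["GILLSHAN\n", "EAWLG\n"]
alignment = ["AAGILLSHANKK\n", "CCCCCC\n", "EAWLGTT\n"]
result = annotate_lines(alignment, motifs)
```

A single loop over the motifs is enough here, because the annotation for a
motif is derived from the motif itself instead of being looked up in a
second list.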


[Tutor] Printing output from Python program to HTML

2011-05-10 Thread Spyros Charonis
Hello everyone,

I have a Python script that extracts some text from a database file and
annotates another file, writing the results to a new file. Because the files
I am annotating are ASCII, I am very restricted as to how I can annotate the
text, and I would like instead to write the results to HTML so that I can
annotate my file in more visually effective ways, e.g. by changing text color
where appropriate. My program extracts text from a database, reads a file
that is to be annotated, and writes those annotations to a newly created
(.htm) file. I include the following headers at the beginning of my program:

print "Content-type:text/html\r\n\r\n"
print ''
print ''

The part of the program that finds the entry I want and produces the
annotation is about 80 lines down and goes as follows:

file_rmode = open('/myfolder/alignfiles/query1, 'r')
file_amode = open('/myfolder/alignfiles/query2, 'a+')

file1 = motif_file.readlines() # file has been created in code not shown
file2 = file_rmode.readlines()

for line in seqalign:
   for item in finalmotifs:
   item = item.strip().upper()
   if item in line:
  newline = line.replace(item, "  item
 ") # compiler complains here about the word "red"
  # sys.stdout.write(newline)
  align_file_amode.write(line)

print ''
print ''

motif_file.close()
align_file_rmode.close()
align_file_amode.close()

The Python compiler complains on the line where I try to change the font
color, saying "invalid syntax". Perhaps I need to import the cgi module to
make this a full CGI program? (I have configured my Apache server.)
Alternatively, my HTML code is messed up, but I am pretty sure this is more
or less a simple task.

I am working in Python 2.6.5. Many thanks in advance

Spyros
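
The markup inside the replace call above was lost when the message was
archived, so the exact line is unknown. One common cause of "invalid syntax"
at a word such as red is an attribute value in double quotes placed inside a
Python string that is itself delimited by double quotes: the inner quote ends
the string early. The sketch below assumes a font tag with color="red"
purely for illustration, and delimits the Python string with single quotes so
that the inner double quotes are ordinary characters. A syntax error is
raised before any code runs, so importing cgi would not change it:

```python
item = "GILLSHAN"
line = "AAGILLSHANKK"

# Single quotes delimit the Python string, so the double quotes that belong
# to the HTML attribute are ordinary characters inside it.
replacement = '<font color="red">' + item + '</font>'
newline = line.replace(item, replacement)
```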


Re: [Tutor] Printing output from Python program to HTML

2011-05-10 Thread Spyros Charonis
Thanks, very simple, but I missed that because it was supposed to be in HTML
code!



[Tutor] Problem with printing Python output to HTML Correctly

2011-05-10 Thread Spyros Charonis
Hello,

I know I posted the exact same topic a few hours ago, and I do apologize for
this, but my script had a careless error, and my real issue is somewhat
different. I have a Python script that extracts some text from a database
file and annotates another file, writing the results to a new file. Because
the files I am annotating are ASCII, I am very restricted as to how I can
annotate the text, and I would like instead to write the results to HTML so
that I can annotate my file in more visually effective ways, e.g. by changing
text color where appropriate. My program extracts text from a database, reads
a file that is to be annotated, and writes those annotations to a newly
created (.htm) file.

finalmotifs = motif_file.readlines()
seqalign = align_file_rmode.readlines()

# These two files have been created in code that I don't show here because
# it is not relevant to the issue

align_file_appmode.write('')
align_file_appmode.write('')

align_file_appmode.write(' \'query_\' Multiple Sequence Alignment ')

align_file_appmode.write('')
align_file_appmode.write('')

for line in seqalign:
    align_file_appmode.write(' \'line\' ')
    for item in finalmotifs:
        item = item.strip().upper()
        if item in line:
            newline = line.replace(item, '  \'item\' ')
            align_file_appmode.write(newline)

align_file_appmode.write('')
align_file_appmode.write('')

motif_file.close()
align_file_rmode.close()
align_file_appmode.close()

The .htm file that is created is not what I intend it to be: it has the word
"item" printed every couple of lines, I assume because I'm not correctly
passing the string sequence that I want to output.

QUESTION
Basically, HTML (or the way I wrote my code) does not understand that with
the escape character '\item\' I am trying to print a string and not the word
"item". Is there some way to correct that, or would I have to use something
like XML to create a markup system that specifically describes my data?

I am aware Python supports multiline strings (using the format ''' text '''),
but I do want my HTML (or XML?) to be correctly rendered before I consider
making this into a CGI program. Built in Python 2.6.5.


Re: [Tutor] Problem with printing Python output to HTML Correctly

2011-05-10 Thread Spyros Charonis
Hi all,

No need to post answers, I figured out where my mistake was.

Spyros



Re: [Tutor] Problem with printing Python output to HTML Correctly

2011-05-10 Thread Spyros Charonis
A SOLUTION TO THE PROBLEM I POSTED:

align_file_rmode = open(
    '/Users/spyros/folder1/python/printsmotifs/alignfiles/' + query1, 'r')
align_file_appmode = open(
    '/Users/spyros/folder1/python/printsmotifs/alignfiles/' + query2, 'a+')

finalmotifs = motif_file.readlines()
seqalign = align_file_rmode.readlines()

for line in seqalign:
    #align_file_appmode.write(' \'line\' ')
    for item in finalmotifs:
        item = item.strip().upper()
        annotation = ""+item+""
        if item in line:
            newline = line.replace(item, annotation)
            # sys.stdout.write(newline)
            align_file_appmode.write(newline)

motif_file.close()
align_file_rmode.close()
align_file_appmode.close()

The line

annotation = ""+item+""

added a span and set the color in CSS.
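
The markup in the annotation line above was stripped by the archive. Based
on the description (a span whose color is set in CSS), the idea can be
sketched as follows. The inline style attribute, the color red and the
function name highlight are assumptions, not the original code:

```python
def highlight(line, motifs, color="red"):
    """Wrap each motif found in `line` in a <span> with an inline CSS color."""
    for motif in motifs:
        motif = motif.strip().upper()
        if motif and motif in line:
            span = '<span style="color:' + color + '">' + motif + '</span>'
            line = line.replace(motif, span)
    return line


annotated = highlight("AAGILLSHANKK", ["gillshan\n"])
```

Because item here is a variable that is concatenated into the string, its
value ends up in the output, whereas the earlier '\'item\'' was a literal
that always produced the word item.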



[Tutor] String Processing Query

2011-05-16 Thread Spyros Charonis
I have a file with the following contents:

>from header1
abcdefghijkl
mnopqrs
tuvwxyz
*
>from header2
poiuytrewq
lkjhgfdsa
mnbvcxz
*

My string processing code goes as follows:

file1 = open('/myfolder/testfile.txt')
scan = file1.readlines()

string1 = ' '
for line in scan:
    if line.startswith('>from'):
        continue
    if line.startswith('*'):
        continue
    string1.join(line.rstrip('\n'))

This code produces the following output:

'abcdefghijkl'
'mnopqrs'
'tuvwxyz'
'poiuytrewq'
'lkjhgfdsa'
'mnbvcxz'

I would like to know if there is a way to get the following
output instead:

'abcdefghijklmnopqrstuvwxyz'

'poiuytrewqlkjhgfdsamnbvcxz'

I'm basically trying to concatenate the strings
in order to produce 2 separate lines.
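
A minimal sketch of one way to get one string per record, assuming every
record starts with a '>from' header and ends with a '*' line as in the sample
file. The function name join_records is made up. Note that str.join returns a
new string and never changes string1, so its result has to be stored
somewhere:

```python
def join_records(lines):
    """Join the lines between a '>from' header and the '*' terminator."""
    records = []
    parts = []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">from"):
            parts = []                      # a new record begins
        elif line.startswith("*"):
            records.append("".join(parts))  # the record is complete
            parts = []
        elif line:
            parts.append(line)
    return records


scan = [">from header1\n", "abcdefghijkl\n", "mnopqrs\n", "tuvwxyz\n", "*\n",
        ">from header2\n", "poiuytrewq\n", "lkjhgfdsa\n", "mnbvcxz\n", "*\n"]
sequences = join_records(scan)
```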


[Tutor] Indexing a List of Strings

2011-05-17 Thread Spyros Charonis
Greetings Python List,

I have a motif sequence (a list of characters, e.g. 'EAWLGHEYLHAMKGLLC')
whose index I would like to return. The list I search contains 20 strings,
each close to 1000 characters long, making it far too cumbersome to display
an example. I would like to know if there is a way to return a pair of
indices: one where my sequence begins (at 'E' in the above case) and one
where it ends (at 'C' in the above case). In short, if 'EAWLGHEYLHAMKGLLC'
spans 17 characters, is it possible to get something like 100 117, assuming
it begins at the 100th position and runs up to the 117th character of my
string? My loop goes as follows:

for item in finalmotifs:
    for line in my_list:
        if item in line:
            print line.index(item)

But this only returns a single number (e.g. 119), which is the index at which
my sequence begins.

Is it possible to get a pair of indices that indicate beginning and end of
substring?

Many thanks
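
Since the start index and the length of the motif are both known, the end
index follows by addition. A minimal sketch, with find_span as a made-up
name and a synthetic sequence standing in for the real data; indices are
zero-based and the end index is exclusive, matching Python slicing:

```python
def find_span(text, motif):
    """Return (start, end) of the first occurrence of motif, or None if absent."""
    start = text.find(motif)
    if start == -1:
        return None
    return start, start + len(motif)


sequence = "X" * 100 + "EAWLGHEYLHAMKGLLC" + "X" * 20
span = find_span(sequence, "EAWLGHEYLHAMKGLLC")
```

With this convention sequence[start:end] gives back the motif itself;
subtract one from the end if the index of the last character is wanted.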


[Tutor] STRING PROC

2011-05-20 Thread Spyros Charonis
Hello List,

A quick string processing query. If I have an entry in a list such as
['>NAME\n'],
is there a way to split it into two separate lines:

>
NAME
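
A minimal sketch, assuming the marker is always the single first character of
the entry; the variable names are made up:

```python
entry = [">NAME\n"]

text = entry[0].rstrip("\n")      # '>NAME'
marker, name = text[0], text[1:]  # '>' and 'NAME'
two_lines = marker + "\n" + name  # shows as two lines when printed
```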


[Tutor] Logical Structure of Snippet

2011-05-23 Thread Spyros Charonis
Hello List,

I'm trying to read some sequence files and modify them to a particular
format. These files are structured something like:

>P1; ICA1_HUMAN
AAEVDTG. (A very long sequence of letters)
>P1;ICA1_BOVIN
TRETG(A very long sequence of letters)
>P1;ICA2_HUMAN
WKH.(another sequence)

I read a database file which has information that I need to modify my
sequence files.
I must extract one of the data fields from the database (done this)
and place it in the sequence file (structure shown above). The relevant
database fields go like:

tt; ICA1_HUMAN   Description
tt; ICA1_BOVIN Description
tt; ICA2_HUMAN   Description

What I would like is to extract the tt; fields (I already have code for
that) and then to read through the sequence file and insert the TT field
corresponding to each >P1 header right underneath that header. Basically, I
need a new line every time >P1 occurs in the sequence file, and I need to
paste its corresponding TT field into that new line (for P1; ICA1_HUMAN, that
would be ICA1_HUMAN   Description, etc.).

The pseudocode would go like this:

for line in sequence file:
   if line.startswith('>P1; ICA'):
      make a newline
      go to list with extracted tt; fields*
      find the one with the same query (tt; ICA1 ...)*
      insert this field in the newline

The steps marked * are the ones I am not sure how to implement. What
logical structure would I need to make Python match a tt; field (I already
have the list of entries) whenever it finds a header with the same content?

Apologies for the verbosity, but I did want to be clear as it is quite
specific.

S.
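
The two starred steps can be sketched with a dictionary that maps each entry
name to its tt; field, so that the matching field is found by a direct
lookup. This is an illustration under assumptions: the function name
insert_descriptions is made up, every tt; line is assumed to start with
"tt;" followed by the entry name, and the sample data is abbreviated from
the post:

```python
def insert_descriptions(sequence_lines, tt_lines):
    """Insert, under each '>P1;' header, the tt; field with the same entry name."""
    # Build a lookup table: entry name -> text of the tt; field.
    lookup = {}
    for tt in tt_lines:
        body = tt[len("tt;"):].strip()
        if body:
            lookup[body.split()[0]] = body

    output = []
    for line in sequence_lines:
        output.append(line)
        if line.startswith(">P1;"):
            name = line[len(">P1;"):].strip()
            if name in lookup:
                output.append(lookup[name] + "\n")
    return output


tt_fields = ["tt; ICA1_HUMAN   Description\n", "tt; ICA1_BOVIN Description\n"]
seq_file = [">P1; ICA1_HUMAN\n", "AAEVDTG\n", ">P1;ICA1_BOVIN\n", "TRETG\n"]
merged = insert_descriptions(seq_file, tt_fields)
```

Stripping the name after the header makes ">P1; ICA1_HUMAN" and
">P1;ICA1_BOVIN" behave the same, whether or not a space follows the
semicolon.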


[Tutor] Concatenating multiple lines into one

2012-02-10 Thread Spyros Charonis
Dear python community,

I have a file where I store sequences that each have a header. The
structure of the file is as such:

>sp|(some code) =>1st header
AGGCGG
MNKPLOI
.
.

>sp|(some code) => 2nd header
AA
 ...
.

..

I am looking to implement a logical structure that would allow me to group
each of the sequences (spread on multiple lines) into a single string. So
instead of having the letters spread on multiple lines I would be able to
have 'AGGCGGMNKP' as a single string that could be indexed.

This snippet is good for isolating the sequences (= stripping headers and
skipping blank lines), but how could I concatenate each sequence in order to
get one string per sequence?

>>> for line in align_file:
...     if line.startswith('>sp'):
...         continue
...     elif not line.strip():
...         continue
...     else:
...         print line

(... is just OS X terminal notation, nothing programmatic)

Many thanks in advance.

S.


Re: [Tutor] Concatenating multiple lines into one

2012-02-12 Thread Spyros Charonis
Thanks for all the help, Peter's and Hugo's methods worked well in
concatenating multiple lines into a single data structure!

S

On Fri, Feb 10, 2012 at 5:30 PM, Mark Lawrence wrote:

> On 10/02/2012 17:08, Peter Otten wrote:
>
>> Instead of printing the line directly, collect it in a list (without the
>> trailing "\n"). When you encounter a line starting with ">sp", check if
>> that list is non-empty, and if so print "".join(parts), assuming the list
>> is called parts, and start with a fresh list. Don't forget to print any
>> leftover data in the list once the for loop has terminated.
>>
> The advice from Peter is sound if the strings could grow very large but
> you can simply concatenate the parts if they are not.  For the indexing
> simply store your data in a dict.
>
> --
> Cheers.
>
> Mark Lawrence.
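
The two suggestions quoted above (collect the lines of a record in a list,
join them at the next header, and keep the results in a dict) can be
combined as in the following sketch. It is an illustration, not the code
that was actually posted in the thread; the function name read_sequences and
the sample lines are assumptions:

```python
def read_sequences(lines):
    """Return {header: sequence}, joining each record's lines into one string."""
    sequences = {}
    header = None
    parts = []
    for line in lines:
        line = line.strip()
        if line.startswith(">sp"):
            if header is not None:
                sequences[header] = "".join(parts)  # flush the previous record
            header = line
            parts = []
        elif line:
            parts.append(line)
    if header is not None:
        sequences[header] = "".join(parts)          # leftover data after the loop
    return sequences


align_file = [">sp|P1 first\n", "AGGCGG\n", "MNKPLOI\n", "\n",
              ">sp|P2 second\n", "AA\n"]
seqs = read_sequences(align_file)
```

Each value is an ordinary string, so it can be indexed or sliced directly,
for example seqs[">sp|P1 first"][0:6].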


[Tutor] List Indexing Issue

2012-05-08 Thread Spyros Charonis
Hello python community,

I'm having a small issue with list indexing. I am extracting certain
information from a PDB (protein information) file and need certain fields
of the file to be copied into a list. The entries look like this:

ATOM   1512  N   VAL A 222   8.544  -7.133  25.697  1.00 48.89   N
ATOM   1513  CA  VAL A 222   8.251  -6.190  24.619  1.00 48.64   C
ATOM   1514  C   VAL A 222   9.528  -5.762  23.898  1.00 48.32   C

I am using the following syntax to parse these lines into a list:

atom_coord = []  # all ATOM records from the file
charged_res_coord = [] # store x,y,z of extracted charged residues
for line in pdb:
    if line.startswith('ATOM'):
        atom_coord.append(line)

for i in range(len(atom_coord)):
    for item in charged_res:
        if item in atom_coord[i]:
            charged_res_coord.append(atom_coord[i].split()[1:9])


The problem begins with entries such as the following.

ROW1)   ATOM   1572  NH2 ARG A 228   7.890 -13.328  16.363  1.00 59.63   N

ROW2)   ATOM   1617  N   GLU A1005  11.906  -2.722   7.994  1.00 44.02   N

Here, the code that I use to extract the third spatial coordinate (the last
of the three consecutive non-integer values) produces a problem. Because
'A1005' (second row) is treated as a single list entry, while 'A' and '228'
(first row) are two list entries, when I use a loop to index the 7th element
it extracts '16.363' (the entry I want) for the first row and '1.00' (not the
entry I want) for the second row.

>>> charged_res_coord[1]
['1572', 'NH2', 'ARG', 'A', '228', '7.890', '-13.328', '16.363']

>>> charged_res_coord[10]
['1617', 'N', 'GLU', 'A1005', '11.906', '-2.722', '7.994', '1.00']


The loop I use goes like this:

for i in range(len(lys_charged_group)):
    lys_charged_group[i][7] = float(lys_charged_group[i][7])

The [7] is the problem: in lines like ROW1 the code extracts the correct
value, but in lines like ROW2 it extracts the wrong value. Unfortunately, the
two row formats are interspersed, so I don't know whether I can solve this
using text processing routines. Would I have to use regular expressions?

Many thanks for your help!

Spyros
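
PDB ATOM records are fixed-width, so one option that needs no regular
expressions is to slice each record by column instead of splitting on
whitespace. In the PDB format the x, y and z coordinates occupy columns
31-38, 39-46 and 47-54 (counting from 1). The sketch below is an
illustration: parse_xyz is a made-up name, and make_atom is a hypothetical
helper used only to lay out sample records in the proper columns, since the
rows shown above lost their original spacing:

```python
def parse_xyz(record):
    """Return (x, y, z) from an ATOM record using the fixed PDB columns."""
    # Columns 31-38, 39-46 and 47-54 (1-based) hold x, y and z.
    return (float(record[30:38]), float(record[38:46]), float(record[46:54]))


def make_atom(serial, name, res, chain, resseq, x, y, z):
    """Hypothetical helper that lays out a record in PDB column positions."""
    return "ATOM  %5d %-4s %3s %1s%4d    %8.3f%8.3f%8.3f%6.2f%6.2f" % (
        serial, name, res, chain, resseq, x, y, z, 1.00, 0.00)


row1 = make_atom(1572, "NH2", "ARG", "A", 228, 7.890, -13.328, 16.363)
row2 = make_atom(1617, "N", "GLU", "A", 1005, 11.906, -2.722, 7.994)
coords = [parse_xyz(r) for r in (row1, row2)]
```

In row2 the chain identifier and the four-digit residue number still run
together as A1005, but the slices are unaffected because they do not depend
on where the whitespace falls.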


[Tutor] Parsing data from a set of files iteratively

2012-05-18 Thread Spyros Charonis
Dear Python community,

I have a set of ~500 files which I would like to run a script on. My script
extracts certain information and
generates several lists with items I need. For one of these lists, I need
to combine the information from all
500 files into one super-list. Is there a way in which I can iteratively
execute my script over all 500 files
and get them to write the list I need into a new file? Many thanks in
advance for your time.

Spyros


Re: [Tutor] Parsing data from a set of files iteratively

2012-05-19 Thread Spyros Charonis
I have tried the following two snippets, which both result in the same
error:

import os, glob
os.chdir('users/spyros/desktop/3NY8MODELSHUMAN/')
homology_models = glob.glob('*.pdb')
for i in range(len(homology_models)):
    python serialize_PIPELINE_models.py homology_models[i]

import os, sys
path = "/users/spyros/desktop/3NY8MODELSHUMAN/"
dirs = os.listdir(path)
for file in dirs:
    python serialize_PIPELINE_models.py

The errors for the two snippets, respectively, read:

File "", line 2
python serialize_PIPELINE_models.py homology_models[i]
   ^
SyntaxError: invalid syntax

 File "", line 2
python serialize_PIPELINE_models.py
   ^
SyntaxError: invalid syntax

In the first snippet, the final line reads:
'python' (calling the interpreter) 'serialize_PIPELINE_models.py' (calling
my python program) 'homology_models[i]' (the file to run it on)

The glob.glob routine returns a list of files, so maybe Python does not
allow the syntax "python (call interpreter)" "list entry"?

Many thanks.
Spyros
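
The line "python serialize_PIPELINE_models.py ..." is a shell command, not a
Python statement, which is why the interpreter reports a syntax error there.
If the script really has to be launched as a separate process for each file,
one option is the standard subprocess module. The sketch below assumes the
script takes the file name as its only command-line argument; the function
name run_on_files is made up:

```python
import subprocess
import sys


def run_on_files(script, filenames):
    """Run `script` once per file in a new interpreter; return the exit codes."""
    exit_codes = []
    for name in filenames:
        # Equivalent to typing "python script name" at the shell prompt.
        exit_codes.append(subprocess.call([sys.executable, script, name]))
    return exit_codes
```

It could be called as run_on_files('serialize_PIPELINE_models.py',
glob.glob('*.pdb')). Wrapping the work in a function and calling it once per
file, as suggested in the reply quoted below, avoids starting a new
interpreter 500 times.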



On Fri, May 18, 2012 at 7:57 PM, Alan Gauld wrote:

> On 18/05/12 19:23, Spyros Charonis wrote:
>
>> Dear Python community,
>>
>> I have a set of ~500 files which I would like to run a script on.
>>
> > ...Is there a way in which I can iteratively execute my script
> > over all 500 files
>
> Yes.
> You could use os.walk() or the glob module depending on whether
> the files are in a folder hierarchy or a single folder.
>
> That will give you access to each file.
> Put your functionality into a function taking a single file
> as input and a list to which you append the new data.
> Call that function for each file in turn.
>
> Try that and if you get stuck come back with a more specific question, the
> code you used and the full error text.
>
> --
> Alan G
> Author of the Learn to Program web site
> http://www.alan-g.me.uk/
>


Re: [Tutor] Parsing data from a set of files iteratively

2012-05-27 Thread Spyros Charonis
Returning to this original problem, I have modified my program from a
single long procedure to
3 functions which do the following:

serialize_pipeline_model(f): takes as input a file, reads it and parses
coordinate values
(numerical entries in the file) into a list

write_to_binary(): writes the generated list to a binary file (pickles it)

read_binary(): unpickles the aggregate of merged lists that should be one
large list.

The code goes like so:

**
z_coords1 = []

def serialize_pipeline_model(f):

    .
    #  z_coords1 = [] has been declared global
    global z_coords1
    charged_groups = (lys_charged_group + arg_charged_group + his_charged_group
                      + asp_charged_group + glu_charged_group)
    for i in range(len(charged_groups)):
        z_coords1.append(float(charged_groups[i][48:54]))

    #print z_coords1
    return z_coords1

import pickle, shelve
print '\nPickling z-coordinates list'

def write_to_binary():
    """ iteratively write successively generated z_coords1 to a binary file """
    f = open("z_coords1.dat", "ab")
    pickle.dump(z_coords1, f)
    f.close()
    return

def read_binary():
    """ read the binary list """
    print '\nUnpickling z-coordinates list'
    f = open("z_coords1.dat", "rb")
    z_coords1=pickle.load(f)
    print(z_coords1)
    f.close()
    return

### LOOP OVER DIRECTORY
for f in os.listdir('/Users/spyros/Desktop/3NY8MODELSHUMAN/HomologyModels/'):
    serialize_pipeline_model(f)
    write_to_binary()

read_binary()
print '\n Z-VALUES FOR ALL CHARGED RESIDUES'
print z_coords1
**

The problem is that the list (z_coords1) comes back empty. I know the code
works in a procedural format (it is too large to post here, but z_coords1
is generated correctly), so as a diagnostic I included a print statement
in the serialize function to see the list that is generated for each
of the 500 files.

Short of some intricacy with the scopes of the program that I may be
missing, I am not sure why this is happening. Does anybody have
any ideas? Many thanks for your time.

Best regards,
Spyros
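
A note on the pickling part of the code above, offered as a sketch rather
than a diagnosis of the empty list: opening the file in "ab" mode and
calling pickle.dump() once per input file stores a series of separate
pickles, and a single pickle.load() returns only the first of them. The
helper names dump_each and load_all are made up for illustration, and a
temporary file stands in for z_coords1.dat.

```python
import os
import pickle
import tempfile


def dump_each(path, chunks):
    """Append one pickle per chunk, the way write_to_binary() is called per file."""
    for chunk in chunks:
        with open(path, "ab") as f:
            pickle.dump(chunk, f)


def load_all(path):
    """Read pickles back until the file is exhausted and merge them into one list."""
    merged = []
    with open(path, "rb") as f:
        while True:
            try:
                merged.extend(pickle.load(f))
            except EOFError:      # no more pickles in the file
                break
    return merged


# Hypothetical data: two small lists standing in for two files' z-coordinates.
fd, path = tempfile.mkstemp(suffix=".dat")
os.close(fd)
dump_each(path, [[1.0, 2.0], [3.0]])

with open(path, "rb") as f:
    first_only = pickle.load(f)   # a single load sees only the first pickle

everything = load_all(path)
os.remove(path)
```

Separately, the assignment z_coords1 = pickle.load(f) inside read_binary()
binds a local name, because that function has no global statement, so the
module-level z_coords1 is not changed by it.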


On Fri, May 18, 2012 at 7:23 PM, Spyros Charonis wrote:

> Dear Python community,
>
> I have a set of ~500 files which I would like to run a script on. My
> script extracts certain information and
> generates several lists with items I need. For one of these lists, I need
> to combine the information from all
> 500 files into one super-list. Is there a way in which I can iteratively
> execute my script over all 500 files
> and get them to write the list I need into a new file? Many thanks in
> advance for your time.
>
> Spyros
>


Re: [Tutor] Parsing data from a set of files iteratively

2012-05-29 Thread Spyros Charonis
FINAL SOLUTION:

### LOOP OVER DIRECTORY
location = '/Users/spyros/Desktop/3NY8MODELSHUMAN/HomologyModels'
zdata = []
for filename in os.listdir(location):
    filename = os.path.join(location, filename)
    try:
        zdata.extend(extract_zcoord(filename))
    except NameError:
        print "No such file!"
    except SyntaxError:
        print "Check Your Syntax!"
    except IOError:
        print "PDB file NOT FOUND!"
    else:
        continue

print 'Z-VALUES FOR ALL CHARGED RESIDUES'
print zdata #diagnostic

### WRITE Z-COORDINATE LIST TO A BINARY FILE
import pickle

f1 = open("z_coords1.dat", "wb")
pickle.dump(zdata, f1)
f1.close()

f2 = open("z_coords1.dat", "rb")
zdata1 = pickle.load(f2)
f2.close()

assert zdata == zdata1, "error in pickle/unpickle round trip!"

On Wed, May 30, 2012 at 1:09 AM, Steven D'Aprano wrote:

> Steven D'Aprano wrote:
>
>> location = '/Users/spyros/Desktop/3NY8MODELSHUMAN/HomologyModels/'
>> zdata = []
>> for filename in os.listdir(location):
>>     zdata.extend(get_zcoords(filename))
>>
>
I only had the filename and not its path, that's why the system was not
able to locate the file, so
filename = os.path.join(location, filename) was used to solve that.

Many thanks to everyone for their time and efforts!

Spyros
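
The path fix described above can be sketched as follows. This is only an
illustration: extract_zcoord here is a hypothetical stand-in that reads one
number per line (the real parser was never posted), and a throwaway
directory replaces the HomologyModels folder.

```python
import os
import shutil
import tempfile


def extract_zcoord(path):
    """Hypothetical stand-in parser: one float per non-empty line."""
    with open(path) as f:
        return [float(line) for line in f if line.strip()]


def collect(location):
    """Merge the values from every regular file in `location` into one list."""
    zdata = []
    for name in sorted(os.listdir(location)):   # listdir gives bare names only
        path = os.path.join(location, name)     # full path that open() can find
        if os.path.isfile(path):
            zdata.extend(extract_zcoord(path))
    return zdata


# Demo on a temporary directory holding two small files.
location = tempfile.mkdtemp()
for name, text in [("a.txt", "1.5\n2.5\n"), ("b.txt", "3.0\n")]:
    with open(os.path.join(location, name), "w") as f:
        f.write(text)

zdata = collect(location)
shutil.rmtree(location)
```

Sorting the names is my own addition so that the merged list comes out in
a predictable order; os.listdir() itself makes no promise about ordering.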

>
>
> Hah, that can't work. listdir returns the name of the file, but not the
> file's path, which means that Python will only look in the current
> directory. You need something like this:
>
>
> location = '/Users/spyros/Desktop/3NY8MODELSHUMAN/HomologyModels/'
> zdata = []
> for filename in os.listdir(location):
>     zdata.extend(get_zcoords(os.path.join(location, filename)))
>
>
> Sorry about that.
>
>
>
>
> --
> Steven


Re: [Tutor] Parsing data from a set of files iteratively

2012-05-30 Thread Spyros Charonis
On Wed, May 30, 2012 at 8:16 AM, Steven D'Aprano wrote:

> On Wed, May 30, 2012 at 07:00:30AM +0100, Spyros Charonis wrote:
> > FINAL SOLUTION:
>
> Not quite. You are making the mistake of many newbies to treat Python
> exceptions as a problem to be covered up and hidden, instead of as a
> useful source of information.
>
> To quote Chris Smith:
>
>"I find it amusing when novice programmers believe their main
>job is preventing programs from crashing. ... More experienced
>programmers realize that correct code is great, code that
>crashes could use improvement, but incorrect code that doesn't
>crash is a horrible nightmare."
>-- http://cdsmith.wordpress.com/2011/01/09/an-old-article-i-wrote/
> Ok, so basically wrong code beats useless code.
>
> There is little as painful as a program which prints "An error occurred"
> and then *keeps working*. What does this mean? Can I trust that the
> program's final result is correct? How can it be correct if an error
> occurred? What error occurred? How do I fix it?
>
My understanding is that an except clause will catch a relevant error and
raise an exception if there is one, discontinuing program execution.

>
> Exceptions are your friend, not your enemy. An exception tells you that
> there is a problem with your program that needs to be fixed. Don't
> cover-up exceptions unless you absolutely have to.


> Sadly, your indentation is still being broken when you post. Please
> ensure you include indentation, and disable HTML or "Rich Text" posting.
> I have tried to guess the correct indentation below, and fix it in
> place, but apologies if I get it wrong.
>
Yes, that is the way my code looks in a python interpreter

>
>
> > ### LOOP OVER DIRECTORY
> > location = '/Users/spyros/Desktop/3NY8MODELSHUMAN/HomologyModels'
> > zdata = []
> > for filename in os.listdir(location):
> >     filename = os.path.join(location, filename)
> >     try:
> >         zdata.extend(extract_zcoord(filename))
> >     except NameError:
> >         print "No such file!"
>
> Incorrect. When a file is missing, you do not get NameError. This
> except-clause merely disguises programming errors in favour of a
> misleading and incorrect error message.
>
> If you get a NameError, your program has a bug. Don't just hide the bug,
> fix it.
>
>
> >     except SyntaxError:
> >         print "Check Your Syntax!"
>
> This except-clause is even more useless. SyntaxErrors happen when the
> code is compiled, not run, so by the time the for-loop is entered, the
> code has already been compiled and cannot possibly raise SyntaxError.
>
What I meant was: check the syntax of my pathname specification, i.e. check
that I did not make a typo when writing the path of the directory I want to
scan. I realize syntax has a much more specific meaning in the context of
programming - code syntax!

>
> Even if it could, what is the point of this? Instead of a useful
> exception traceback, which tells you not only which line contains the
> error, but even highlights the point of the error with a ^ caret, you
> hide all the useful information and tease the user with a useless
> message "Check Your Syntax!".
>
Ok, I didn't realize I was being so reckless - thanks for pointing that
out.

>
> Again, if your program raises a SyntaxError, it has a bug. Don't hide
> the bug, fix it.
>
>
> >     except IOError:
> >         print "PDB file NOT FOUND!"
>
> This, at least, is somewhat less useless than the others. At least it is
> a valid exception, and if your intention is to skip missing files,
> catching IOError is a reasonable way to do it.
>
> But you don't just get IOError for *missing* files, but also for
> *unreadable* files, perhaps because you don't have permission to read
> them, or perhaps because the file is corrupt and can't be read.
>
Understood, but given that the files I am reading and processing are
standard ASCII text files, there is no good reason (that I can think of)
why they would be *unreadable*. I verified that I had read/write
permissions for all my files, which are the default access privileges
anyway (for the owner).

>
> In any case, as usual, imagine yourself as the recipient of this
> message: "PDB file NOT FOUND!" -- what do you expect to do about it?
> Which file is missing or unreadable? How can you tell? Is this a
> problem? Are your results still valid without that PDB file's data?
>
Perhaps because I was writing the program I didn't think that this message
would
be confusing to others, but it did help in making clear that there 
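
The advice in this exchange (catch only the exception you expect, and say
which file was affected) might look like the sketch below. The names
read_values and collect are invented for the example, and read_values is a
hypothetical stand-in for the real parser. Note that once an except clause
has handled an error, the loop simply carries on with the next file; it
does not stop the program.

```python
import os
import shutil
import tempfile


def read_values(path):
    """Hypothetical stand-in for extract_zcoord(): one float per line."""
    with open(path) as f:
        return [float(line) for line in f if line.strip()]


def collect(paths):
    """Return (values, skipped); only I/O failures are caught, and each is named."""
    values, skipped = [], []
    for path in paths:
        try:
            values.extend(read_values(path))
        except IOError as err:          # missing or unreadable file
            skipped.append((path, str(err)))
        # NameError, SyntaxError, ValueError and so on are deliberately not
        # caught: they point to bugs or bad data and should show a traceback.
    return values, skipped


folder = tempfile.mkdtemp()
good = os.path.join(folder, "model1.pdb")
with open(good, "w") as f:
    f.write("4.25\n")
missing = os.path.join(folder, "does_not_exist.pdb")

values, skipped = collect([good, missing])
shutil.rmtree(folder)
```

Keeping the skipped paths and their error text lets the caller decide
afterwards whether the merged result can still be trusted.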

[Tutor] indexing a list

2012-10-18 Thread Spyros Charonis
Hello pythoners,

I have a string that I want to read in fixed-length windows.

In [68]: SEQ
Out[68]:
'MKAAVLTLAVLFLTGSQARHFWQQDEPPQSPWDRVKDLATVYVDVLKDSGRDYVSQFEGSALGKQLNLKLLDNWDSVTSTFSKLREQLGPVTQEFWDNLEKETEGLRQEMSKDLEEVKAKVQPYLDDFQKKWQEEMELYRQKVEPLRAELQEGARQKLHELQEKLSPLGEEMRDRARAHVDALRTHLAPYSDELRQRLAARLEALKENGGARLAEYHAKATEHLSTLSEKAKPALEDLRQ'

I would like a function that reads the above string, 21 characters at a
time, and checks for certain conditions, i.e. whether characters co-occur
in other lists I have made. For example:

x = 21   # WINDOW LENGTH

In [70]: SEQ[0:x]
Out[70]: 'MKAAVLTLAVLFLTGSQARHF'

In [71]: SEQ[x:2*x]
Out[71]: 'WQQDEPPQSPWDRVKDLATVY'

In [72]: SEQ[2*x:3*x]
Out[72]: 'VDVLKDSGRDYVSQFEGSALG'

How could I write a function to automate this so that it does this from
SEQ[0] throughout the entire sequence, i.e. until len(SEQ)?

Many thanks for your time,
Spyros
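
One way to automate the slicing shown above is to step through the string
in strides of the window length. This is a minimal sketch; the function
name windows is my own, and the sample string is just the first 63
residues of SEQ.

```python
def windows(seq, size):
    """Return consecutive, non-overlapping slices of `seq`, each `size` long.

    The last slice is shorter when len(seq) is not a multiple of `size`.
    """
    return [seq[start:start + size] for start in range(0, len(seq), size)]


# First 63 residues of the sequence above, i.e. exactly three windows of 21.
SEQ = "MKAAVLTLAVLFLTGSQARHFWQQDEPPQSPWDRVKDLATVYVDVLKDSGRDYVSQFEGSALG"
chunks = windows(SEQ, 21)
```

Each element of chunks can then be passed to whatever per-window check is
needed, for example inside a for loop over the returned list.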


[Tutor] Understanding a linear runtime implementation of anagram detection

2015-12-10 Thread Spyros Charonis
Dear All,

I am learning about analysis of algorithms (python 2.7.6). I am reading a
book (Problem solving with Algorithms and Data Structures) where Python is
the language used for implementations. The author introduces algorithm
analysis in a clear and understandable way, and uses an anagram detection
program as a template to compare different runtime implementations
(quadratic, log linear, linear). In the linear, and most efficient
implementation, the code is as follows (comments added by me):

def anagram_test2(s1,s2):
    """ Checks if two strings are anagrams of each other
    Runs with O(n) linear complexity """
    if (not s1) or (not s2):
        raise TypeError, "Invalid input: input must be string"
        return None
    # Initialize two lists of counters
    c1 = [0] * 26
    c2 = [0] * 26
    # Iterate over each string
    # When a char is encountered,
    # increment the counter at
    # its corresponding position
    for i in range(len(s1)):
        pos = ord(s1[i]) - ord("a")
        c1[pos] += 1
    for i in range(len(s2)):
        pos = ord(s2[i]) - ord("a")
        c2[pos] += 1

    j = 0
    hit = True
    while j < 26 and hit:
        if c1[j] == c2[j]:
            j += 1
        else:
            hit = False
    return hit


My questions are:

1)
Is it computationally more/less/equally efficient to use an explicit while
loop as it is to just do "return c1 == c2" (replacing the final code block
following the two for loops)? I realize that this single line of code
performs an implicit loop over each index to test for equality. My
guess is that, because other languages may not offer this simple test, the
author wanted to present an example that could be adapted to them, unless
the explicit while loop is computationally less expensive.

2)
How could I go about adapting this algorithm for multiple strings (say I
had 20 strings and wanted to check if they are anagrams of one another)?

def are_anagrams(*args):
    """ Accepts a tuple of strings and checks if
    they are anagrams of each other """

    # Check that none of the strings are null
    for i in args:
        if not i:
            raise TypeError, "Invalid input"
            return None

    # Initialize a list of counters for each string
    c = ( [] for i in range(len(args) ) ???

Many thanks in advance!
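
A possible way to extend the counting approach to any number of strings is
sketched below. It assumes, like the book's version, that the strings
contain only lowercase a-z; letter_counts is a helper name invented here,
and raising ValueError for empty input is my choice rather than the
TypeError used above.

```python
def letter_counts(s):
    """Return a 26-slot list of counts; assumes only lowercase a-z in `s`."""
    counts = [0] * 26
    for ch in s:
        counts[ord(ch) - ord("a")] += 1
    return counts


def are_anagrams(*strings):
    """True if every string has the same letter counts as the first one."""
    if not strings or any(not s for s in strings):
        raise ValueError("need at least one string, none of them empty")
    reference = letter_counts(strings[0])
    # Each comparison is a plain list equality test on 26 counters.
    return all(letter_counts(s) == reference for s in strings[1:])


same = are_anagrams("listen", "silent", "enlist")
different = are_anagrams("listen", "silent", "listed")
```

On the first question: comparing the two lists with == checks them element
by element and stops at the first mismatch, just as the explicit while loop
does. With 26 counters either form is a constant amount of work, so the
overall cost stays linear in the length of the strings.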


[Tutor] pickle.dump yielding awkward output

2013-02-03 Thread Spyros Charonis
Hello Pythoners,

I am experiencing a strange result with the pickle module when using it to
write certain results to a separate file.

In short, I have a program that reads a file, finds lines which satisfy
some criteria, and extracts those lines, storing them in a list. I am
trying to write this list to a separate file.

The list of extracted lines looks like this:

ATOM  1  N   GLN A   1  29.872  13.384  54.754  1.00 60.40   N

ATOM  2  CA  GLN A   1  29.809  11.972  54.274  1.00 58.51   C

ATOM  3  C   GLN A   1  28.376  11.536  54.029  1.00 55.13   C

The output stored from the call to the pickle.dump method, however, looks
like this:

(lp0
S'ATOM  1  N   GLN A   1  29.872  13.384  54.754  1.00 60.40   N  \r\n'
p1
aS'ATOM  2  CA  GLN A   1  29.809  11.972  54.274  1.00 58.51   C  \r\n'
p2
aS'ATOM  3  C   GLN A   1  28.376  11.536  54.029  1.00 55.13   C  \r\n'

The code I am using to write the output to an external file goes as follows:

def export_antibody_chains():
    ''' EXPORT LIST OF EXTRACTED CHAINS TO FILE '''
    chains_file = open(query + '_Chains', 'wb')
    pickle.dump(ab_chains, chains_file)  # ab_chains is global
    chains_file.close()
    return

Does anyone know why the strings lp0, S', aS' are showing up?


Re: [Tutor] pickle.dump yielding awkward output

2013-02-04 Thread Spyros Charonis
Thank you Alan, Steven,

I don't care about the characters from the pickle operation per se, I just
want the list to be stored in its native format.

What I am trying to do is basically the Unix shell equivalent of: "Unix
command" > newfile.txt

I am trying to store the list that I get from my code in a separate file,
in human-readable format.
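
For a human-readable file, the list of strings can be written out directly
instead of being pickled. The following is a rough sketch: export_lines
and the output file name are invented for the example, and the two sample
strings only imitate the ATOM lines shown earlier.

```python
import os
import shutil
import tempfile

# Two sample lines in the style of the list above (trailing "\r\n" included).
ab_chains = [
    "ATOM  1  N   GLN A   1  29.872  13.384  54.754  1.00 60.40   N  \r\n",
    "ATOM  2  CA  GLN A   1  29.809  11.972  54.274  1.00 58.51   C  \r\n",
]


def export_lines(lines, path):
    """Write each string as one line of plain text, normalising line endings."""
    with open(path, "w") as out:
        for line in lines:
            out.write(line.rstrip("\r\n") + "\n")


folder = tempfile.mkdtemp()
path = os.path.join(folder, "query_Chains.txt")
export_lines(ab_chains, path)

# Read the file back to confirm it holds exactly the original text lines.
with open(path) as f:
    written = f.read().splitlines()
shutil.rmtree(folder)
```

The resulting file can be opened in any text editor, which is the closest
match to redirecting a command's output with ">" in a Unix shell.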


On Mon, Feb 4, 2013 at 1:03 AM, Alan Gauld wrote:

> On 03/02/13 19:26, Spyros Charonis wrote:
>
>> I am experiencing a strange result with the pickle module when using it
>> to write certain results to a separate file.
>>
>
> The only strange result using pickle would be if the unpickle failed to
> bring back that which was pickled.
> Pickle is a storage format not a display format.
>
>
>  In short, I have a program that reads a file, finds lines which satisfy
>> some criteria, and extracts those lines, storing them in a list.
>>
>
> Extracting them with pickle I hope? That's the only thing that should be
> used to unpickle a pickled file.
>
>
>  The list of extracted lines looks like this:
>>
>> ATOM  1  N   GLN A   1  29.872  13.384  54.754  1.00 60.40
>>  N
>>
>> The output stored from the call to the pickle.dump method, however,
>> looks like this:
>>
>> (lp0
>> S'ATOM  1  N   GLN A   1  29.872  13.384  54.754  1.00 60.40
>>N  \r\n'
>>
>
> Yep, I'm sure pickle can make sense of it.
>
>
>  Does anyone know why the strings lp0, S', aS' are showing up?
>>
>
> Because that's what pickle puts in there to help it unpickle it later.
>
> Why do you care? You shouldn't be looking at it (unless you want to
> understand how pickle works).
>
> pickle, as the name suggests, is intended for storing python objects
> for later use. This is often called object persistence in programming
> parlance. It is not designed for anything else.
>
> If you want cleanly formatted data in a file that you can read in a text
> editor or similar you need to do the formatting yourself or use another
> recognised format such as CSV or configparser (aka ini file).
>
> --
> Alan G
> Author of the Learn to Program web site
> http://www.alan-g.me.uk/
>
>


[Tutor] Text Processing Query

2013-03-14 Thread Spyros Charonis
Hello Pythoners,

I am trying to extract certain fields from a file whose text looks
like this:

COMPND   2 MOLECULE: POTASSIUM CHANNEL SUBFAMILY K MEMBER 4;

COMPND   3 CHAIN: A, B;

COMPND  10 MOL_ID: 2;

COMPND  11 MOLECULE: ANTIBODY FAB FRAGMENT LIGHT CHAIN;

COMPND  12 CHAIN: D, F;

COMPND  13 ENGINEERED: YES;

COMPND  14 MOL_ID: 3;

COMPND  15 MOLECULE: ANTIBODY FAB FRAGMENT HEAVY CHAIN;

COMPND  16 CHAIN: E, G;

I would like the chain IDs, but only those following the text heading
"ANTIBODY FAB FRAGMENT", i.e. I need to create a list with D,F,E,G  which
excludes A,B which have a non-antibody text heading. I am using the
following syntax:

with open(filename) as file:
    scanfile=file.readlines()
    for line in scanfile:
        if line[0:6]=='COMPND' and 'FAB FRAGMENT' in line: continue
        elif line[0:6]=='COMPND' and 'CHAIN' in line:
            print line

But this yields:

COMPND   3 CHAIN: A, B;

COMPND  12 CHAIN: D, F;

COMPND  16 CHAIN: E, G;

I would like to ignore the first line since A,B correspond to non-antibody
text headings, and instead want to extract only D,F & E,G whose text
headings are specified as antibody fragments.

Many thanks,
Spyros


Re: [Tutor] Text Processing Query

2013-03-14 Thread Spyros Charonis
Yes, the elif line needs to have **flag_FAB == 1** as its condition instead
of **flag_FAB = 1**. So:


for line in scanfile:
    if line[0:6]=='COMPND' and 'FAB' in line: flag_FAB = 1
    elif line[0:6]=='COMPND' and 'CHAIN' in line and flag_FAB == 1:
        print line
        flag_FAB = 0
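
To end up with the list D, F, E, G rather than printed lines, the same flag
idea can be combined with a little string splitting. This is a sketch
under the assumption that every CHAIN line looks like the samples in this
thread ("CHAIN: X, Y;"); the function name antibody_chain_ids is made up.

```python
def antibody_chain_ids(lines):
    """Collect chain IDs from the CHAIN line that follows each FAB FRAGMENT line."""
    chain_ids = []
    in_fab = False                      # set by a FAB FRAGMENT line, cleared after use
    for line in lines:
        if not line.startswith("COMPND"):
            continue
        if "FAB FRAGMENT" in line:
            in_fab = True
        elif "CHAIN:" in line and in_fab:
            ids = line.split("CHAIN:")[1].strip().rstrip(";")
            chain_ids.extend(part.strip() for part in ids.split(","))
            in_fab = False
    return chain_ids


sample = [
    "COMPND   2 MOLECULE: POTASSIUM CHANNEL SUBFAMILY K MEMBER 4;",
    "COMPND   3 CHAIN: A, B;",
    "COMPND  10 MOL_ID: 2;",
    "COMPND  11 MOLECULE: ANTIBODY FAB FRAGMENT LIGHT CHAIN;",
    "COMPND  12 CHAIN: D, F;",
    "COMPND  13 ENGINEERED: YES;",
    "COMPND  14 MOL_ID: 3;",
    "COMPND  15 MOLECULE: ANTIBODY FAB FRAGMENT HEAVY CHAIN;",
    "COMPND  16 CHAIN: E, G;",
]
chains = antibody_chain_ids(sample)
```

Initialising the flag before the loop also avoids a NameError if a CHAIN
line happens to appear before any FAB FRAGMENT line.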


On Thu, Mar 14, 2013 at 4:33 PM, Mark Lawrence wrote:

> On 14/03/2013 11:28, taserian wrote:
>
> Top posting fixed
>
>
>> On Thu, Mar 14, 2013 at 6:56 AM, Spyros Charonis wrote:
>>
>> Hello Pythoners,
>>
>> I am trying to extract certain fields from a file that whose text
>> looks like this:
>>
>> COMPND   2 MOLECULE: POTASSIUM CHANNEL SUBFAMILY K MEMBER 4;
>> COMPND   3 CHAIN: A, B;
>> COMPND  10 MOL_ID: 2;
>> COMPND  11 MOLECULE: ANTIBODY FAB FRAGMENT LIGHT CHAIN;
>> COMPND  12 CHAIN: D, F;
>> COMPND  13 ENGINEERED: YES;
>> COMPND  14 MOL_ID: 3;
>> COMPND  15 MOLECULE: ANTIBODY FAB FRAGMENT HEAVY CHAIN;
>> COMPND  16 CHAIN: E, G;
>>
>> I would like the chain IDs, but only those following the text
>> heading "ANTIBODY FAB FRAGMENT", i.e. I need to create a list with
>> D,F,E,G  which excludes A,B which have a non-antibody text heading.
>> I am using the following syntax:
>>
>> with open(filename) as file:
>>
>>  scanfile=file.readlines()
>>
>>  for line in scanfile:
>>
>>  if line[0:6]=='COMPND' and 'FAB FRAGMENT' in line: continue
>>
>>  elif line[0:6]=='COMPND' and 'CHAIN' in line:
>>
>>  print line
>>
>>
>> But this yields:
>>
>> COMPND   3 CHAIN: A, B;
>> COMPND  12 CHAIN: D, F;
>> COMPND  16 CHAIN: E, G;
>>
>> I would like to ignore the first line since A,B correspond to
>> non-antibody text headings, and instead want to extract only D,F &
>> E,G whose text headings are specified as antibody fragments.
>>
>> Many thanks,
>> Spyros
>>
>> Since the identifier and the item that you want to keep are on different
>> lines, you'll need to set a "flag".
>>
>> with open(filename) as file:
>>
>>  scanfile=file.readlines()
>>
>>  flag = 0
>>
>>  for line in scanfile:
>>
>>  if line[0:6]=='COMPND' and 'FAB FRAGMENT' in line: flag = 1
>>
>>  elif line[0:6]=='COMPND' and 'CHAIN' in line and flag = 1:
>>
>>  print line
>>
>>  flag = 0
>>
>>
>> Notice that the flag is set to 1 only on "FAB FRAGMENT", and it's reset
>> to 0 after the next "CHAIN" line that follows the "FAB FRAGMENT" line.
>>
>>
>> AR
>>
>>
>>
> Notice that this code won't run due to a syntax error.
>
> --
> Cheers.
>
> Mark Lawrence
>
>


[Tutor] Arbitrary-argument set function

2013-10-01 Thread Spyros Charonis
Dear Pythoners,


I am trying to extract from a set of about 20 sequences, the characters
which are unique to each sequence. For simplicity, imagine I have only 3
"sequences" (words in this example) such as:


s1='spam'; s2='scam'; s3='slam'


I would like the character that is unique to each sequence, i.e. I need my
function to return the list [ 'p', 'c', 'l' ]. The function I am using is
as follows:


def uniq(*args):
    """ FIND UNIQUE ELEMENTS OF AN ARBITRARY NUMBER OF SEQUENCES"""
    unique = []
    for i in args[0]:
        if i not in args[1:]:
            unique.append(i)
    return unique


and is returning the list [ 's', 'p', 'a', 'm' ]. Any help much appreciated,


Best,

Spyros
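
The function above returns every letter of 'spam' because args[1:] is a
tuple of whole strings, so "i not in args[1:]" asks whether a single
character equals 'scam' or 'slam', which is never the case. One possible
alternative, sketched here with collections.Counter, counts in how many
sequences each character occurs and keeps those seen in exactly one.

```python
from collections import Counter


def uniq(*seqs):
    """For each sequence, list the characters that occur in no other sequence."""
    # Count in how many sequences each character appears; set() ignores
    # repeats within a single sequence.
    seen_in = Counter(ch for seq in seqs for ch in set(seq))
    return [[ch for ch in seq if seen_in[ch] == 1] for seq in seqs]


per_sequence = uniq("spam", "scam", "slam")
flat = [ch for chars in per_sequence for ch in chars]
```

Returning one sub-list per sequence is a design choice of this sketch; it
keeps track of which sequence each unique character came from, and the
flat list is easy to derive from it.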


[Tutor] bubble sort function

2014-11-15 Thread Spyros Charonis
Dear group,


I'm having a bit of trouble with understanding why my bubble sort
implementation doesn't work. I've got the following function to perform a
bubble sort operation on a list of numbers:


def bubble_sort_ascending(unsorted):
    """ Sorts a list of numbers into ascending order """
    iterations = 0
    size = len(unsorted) - int(1)
    for i in range(0, size):
        unsorted[i] = float(unsorted[i])
        while unsorted[i] > unsorted[i+1]:
            # Use a tuple assignment in order to swap the value of two variables
            unsorted[i], unsorted[i+1] = unsorted[i+1], unsorted[i]
            iterations += 1
            sorted_vec = unsorted[:] # copy unsorted which is now sorted
            print "\nIterations completed: %s\n" %(iterations)
    return sorted_vec


Example: mylist = [4, 1, 7, 19, 13, 22, 17, 14, 23, 21]


When I call it as such bubble_sort_ascending(mylist), it returns the list
only partially sorted with 5 iterations reported, i.e.


[1, 4.0, 7.0, 13, 19.0, 17, 14, 22.0, 21, 23.0]


and I have to call it again for the sorting operation to complete. Is
there something I am missing in my code? Why does it not sort the entire
list at once and just count all completed iterations?


Any help appreciated.


Many thanks,

Spyros
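
The partially sorted result is what a single left-to-right pass of bubble
sort produces: each pass only moves large values one step at a time, so
the pass has to be repeated until nothing is swapped. A small sketch (the
names one_pass and bubble_sort are mine, and the float conversion is left
out):

```python
def one_pass(values):
    """Do a single left-to-right bubble pass in place; return the swap count."""
    swaps = 0
    for i in range(len(values) - 1):
        if values[i] > values[i + 1]:
            values[i], values[i + 1] = values[i + 1], values[i]
            swaps += 1
    return swaps


def bubble_sort(values):
    """Repeat passes on a copy until a pass makes no swaps; return the sorted copy."""
    result = list(values)
    while one_pass(result):
        pass
    return result


mylist = [4, 1, 7, 19, 13, 22, 17, 14, 23, 21]
after_one = list(mylist)
swaps_in_first_pass = one_pass(after_one)
fully_sorted = bubble_sort(mylist)
```

On the example list, the first pass makes 5 swaps and leaves the values in
the same order as the partially sorted output reported above.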


Re: [Tutor] bubble sort function

2014-11-15 Thread Spyros Charonis
Thank you Alan,

When I initiated the loop with the condition:

for i in range(len(unsorted)):


Python raised an IndexError saying I had gone out of bounds. Hence the
change to:

for i in range(0, size)


Yes, actually the loop only consists of:


while unsorted[i] > unsorted[i+1]:
    # Use a tuple assignment in order to swap the value of two variables
    unsorted[i], unsorted[i+1] = unsorted[i+1], unsorted[i]
    iterations += 1


Sorry about that. The *iterations* update and sorted_vec assignment are
outside of the loop body.


This is indeed just a learning exercise, I am aware that lists have sort()
and reverse() methods. I'm in the process of learning a bit about data
structures & algorithms using Python as my implementation language.




On Sat, Nov 15, 2014 at 7:02 PM, Alan Gauld 
wrote:

> On 15/11/14 16:46, Spyros Charonis wrote:
>
>  def bubble_sort_ascending(unsorted):
>> iterations = 0
>> size = len(unsorted) - int(1)
>>
>
> Don't convert 1 to an int - it already is.
>
>  for i in range(0, size):
>>
>
> This will result in 'i' going from zero to len()-2.
> Is that what you want?
>
>   unsorted[i] = float(unsorted[i])
>>
>
> Comparing ints to floats or even comparing two floats
> is notoriously error prone due to the imprecision of
> floating point representation. You probably don't want
> to do the conversion.
>
> And if you must do it, why do you only do it once,
> outside the while loop?
>
>   while unsorted[i] > unsorted[i+1]:
>>unsorted[i], unsorted[i+1] = unsorted[i+1], unsorted[i]
>>iterations += 1
>>
>
> I assume you intended to end the loop body here?
> But the following lines are indented so are included
> in the loop.
>
> Also because you never change 'i' the loop can only
> ever run once. So really you could use an if
> statement instead of the while loop?
>
> Finally, iterations is really counting swaps. Is that what you want it to
> count or is it actually loop iterations? If so which? The for loop or the
> while loop or the sum of both?
>
> sorted_vec = unsorted[:]
>>print "\nIterations completed: %s\n" %(iterations)
>> return sorted_vec
>>
>
> Since you never alter sorted_vec there is no point in creating it.
> Just return unsorted - which is now sorted...
>
>
>  and I have to call it again for the sorting operation to complete.
>> Is there something I am missing in my code? Why does it not sort the
>> entire list at once and just count all completed iterations?
>>
>
> There are several things missing or broken, the few I've pointed
> out above will help but the algorithm seems suspect to me. You need
> to revisit the core algorithm I suspect.
>
> BTW I assume this is just a learning exercise since the default
> sorting algorithm will virtually always be better than bubble
> sort for any real work!
>
> --
> Alan G
> Author of the Learn to Program web site
> http://www.alan-g.me.uk/
> http://www.flickr.com/photos/alangauldphotos
>
>


Re: [Tutor] bubble sort function

2014-11-16 Thread Spyros Charonis
Many thanks for the link as well as for the pseudocode & code. I see what I
did wrong now. Here's the final version that works:


def bubbleSort_ascending(unsorted):
    """ Sorts a list of numbers in ascending order """
    n = len(unsorted)
    count = swaps = 0
    swapped = True
    ## Prompt user to choose if they want to see each sorting step
    option = raw_input("Show sorting steps? (Y/N):\n")
    while swapped:
        count += 1
        swapped = False
        for i in range(1, n):
            if unsorted[i-1] > unsorted[i]:
                ## Use a tuple assignment in order to swap the value of two variables
                unsorted[i-1], unsorted[i] = unsorted[i], unsorted[i-1]
                swaps += 1  # count the swap so the report below is accurate
                swapped = True
        ## Catch user input and either show or hide sorting steps accordingly
        if option in ("Y", "y"):
            print "\nIteration %d, %d swaps; list: %r\n" %(count, swaps, unsorted)
        elif option in ("N", "n"):
            pass
        else:
            print "\nYour input was invalid, type either Y/y or N/n"
    return unsorted

On Sun, Nov 16, 2014 at 4:50 AM, Steven D'Aprano 
wrote:

> On Sat, Nov 15, 2014 at 04:46:26PM +, Spyros Charonis wrote:
> > Dear group,
> >
> >
> > I'm having a bit of trouble with understanding why my bubble sort
> > implementation doesn't work. I've got the following function to perform a
> > bubble sort operation on a list of numbers:
>
> It doesn't work because it is completely wrong. Sorry to be harsh, but
> sometimes it is easier to throw broken code away and start again than it
> is to try to diagnose the problems with it.
>
> Let's start with the unoptimized version of bubblesort given by
> Wikipedia:
>
> https://en.wikipedia.org/wiki/Bubble_sort#Implementation
>
> procedure bubbleSort( A : list of sortable items )
>    n = length(A)
>    repeat
>      swapped = false
>      for i = 1 to n-1 inclusive do
>        /* if this pair is out of order */
>        if A[i-1] > A[i] then
>          /* swap them and remember something changed */
>          swap( A[i-1], A[i] )
>          swapped = true
>        end if
>      end for
>    until not swapped
> end procedure
>
>
> Let's translate that to Python:
>
> def bubbleSort(alist):
>     n = len(alist)
>     swapped = True
>     while swapped:
>         swapped = False
>         for i in range(1, n):
>             # if this pair is out of order
>             if alist[i-1] > alist[i]:
>                 # swap them and remember something changed
>                 alist[i-1], alist[i] = alist[i], alist[i-1]
>                 swapped = True
>
>
> Let's add something to print the partially sorted list each time we go
> through the loop:
>
>
> def bubbleSort(alist):
>     print("Unsorted: %r" % alist)
>     n = len(alist)
>     swapped = True
>     count = swaps = 0
>     while swapped:
>         count += 1
>         swapped = False
>         for i in range(1, n):
>             # if this pair is out of order
>             if alist[i-1] > alist[i]:
>                 # swap them and remember something changed
>                 swaps += 1
>                 alist[i-1], alist[i] = alist[i], alist[i-1]
>                 swapped = True
>         print("Iteration %d, %d swaps; list: %r" % (count, swaps, alist))
>
>
>
> And now let's try it:
>
> py> mylist = [2, 4, 6, 8, 1, 3, 5, 7, 9, 0]
> py> bubbleSort(mylist)
> Unsorted: [2, 4, 6, 8, 1, 3, 5, 7, 9, 0]
> Iteration 1, 5 swaps; list: [2, 4, 6, 1, 3, 5, 7, 8, 0, 9]
> Iteration 2, 9 swaps; list: [2, 4, 1, 3, 5, 6, 7, 0, 8, 9]
> Iteration 3, 12 swaps; list: [2, 1, 3, 4, 5, 6, 0, 7, 8, 9]
> Iteration 4, 14 swaps; list: [1, 2, 3, 4, 5, 0, 6, 7, 8, 9]
> Iteration 5, 15 swaps; list: [1, 2, 3, 4, 0, 5, 6, 7, 8, 9]
> Iteration 6, 16 swaps; list: [1, 2, 3, 0, 4, 5, 6, 7, 8, 9]
> Iteration 7, 17 swaps; list: [1, 2, 0, 3, 4, 5, 6, 7, 8, 9]
> Iteration 8, 18 swaps; list: [1, 0, 2, 3, 4, 5, 6, 7, 8, 9]
> Iteration 9, 19 swaps; list: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
> Iteration 10, 19 swaps; list: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
>
>
>
> Now you can inspect the working code and compare it to the non-working
> code below and see what is different:
>
>
> > def bubble_sort_ascending(unsorted):
> >   """ Sorts a list of numbers into ascending order """
> >iterations = 0
> > size = len(unsorted) - int(1)
> >for i in range(0, size):
> > unsorted[i] = float(unsorted[i])
> >