from:"kumar s"

[Tutor] Python regular expression

2004-12-03 Thread kumar s

Dear group, 

I have a file with 645,984 lines. This file is
composed completely of blocks.

For example:

[Unit111]
Name=NONE
Direction=2
NumAtoms=16
NumCells=32
UnitNumber=111
UnitType=3
NumberBlocks=1

[Unit111_Block1]
Name=31318_at
BlockNumber=1
NumAtoms=16
NumCells=32
StartPosition=0
StopPosition=15
CellHeader=X	Y	PROBE	FEAT	QUAL	EXPOS	POS	CBASE	PBASE	TBASE	ATOMINDEX	CODONIND	CODON	REGIONTYPE	REGION
Cell1=24	636	N	control	31318_at	0	13	A	A	A	0	407064	-1	-1	99
Cell2=24	635	N	control	31318_at	0	13	A	T	A	0	406424	-1	-1	99
Cell3=631	397	N	control	31318_at	1	13	T	A	T	1	254711	-1	-1	99



[Unit113]
Name=NONE
Direction=2
NumAtoms=16
NumCells=32
UnitNumber=113
UnitType=3
NumberBlocks=1

[Unit113_Block1]
Name=31320_at
BlockNumber=1
NumAtoms=16
NumCells=32
StartPosition=0
StopPosition=15
CellHeader=X	Y	PROBE	FEAT	QUAL	EXPOS	POS	CBASE	PBASE	TBASE	ATOMINDEX	CODONIND	CODON	REGIONTYPE	REGION
Cell1=68	63	N	control	31320_at	0	13	T	A	T	0	40388	-1	-1	99
Cell2=68	64	N	control	31320_at	0	13	T	T	T	0	41028	-1	-1	99
Cell3=99	194	N	control	31320_at	1	13	C	C	C	1	124259	-1	-1	99





I have another file with identifiers that also appear in the
first file, in the form:
Name=31320_at


I want to extract the blocks of the first file whose identifiers
are listed in that file, and write them to a new file.

I am searching with:

search = re.search("_at")


My question:
How can I tell Python to select the rows that match a
particular pattern, such as Name= or the [Unit] header?
Is there any way of doing this?
Please help me.
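A minimal sketch of one way to do this in Python 3 (the thread itself uses Python 2). It assumes every block starts with a `[Section]` header line and that the identifiers you want are the values after `Name=`; the names `select_blocks` and `block_name` are hypothetical. A plain `startswith` check is enough here, so no regular expression is needed.

```python
def block_name(block):
    """Return the value of the first 'Name=' line in a block, or None."""
    for line in block:
        if line.startswith("Name="):
            return line[len("Name="):].strip()
    return None


def select_blocks(lines, wanted_names):
    """Yield every [Section] block (a list of lines) whose Name= value is wanted.

    Works on any iterable of lines, including an open file, so the whole
    file never has to be held in memory at once.
    """
    block = []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("[") and block:
            # A new section header closes the previous block.
            if block_name(block) in wanted_names:
                yield block
            block = []
        if line.strip():            # skip blank separator lines
            block.append(line)
    if block and block_name(block) in wanted_names:
        yield block                 # the last block has no header after it


sample = [
    "[Unit113]", "Name=NONE", "UnitNumber=113", "",
    "[Unit113_Block1]", "Name=31320_at", "BlockNumber=1",
]
for blk in select_blocks(sample, {"31320_at"}):
    print("\n".join(blk))
```

With real files, `wanted_names` could be built as a set from the identifier file (for example `{line.split("=", 1)[1].strip() for line in id_file}`), and the open chip file passed as `lines`; each yielded block can then be written to the output file.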

thanks
kumar

__
Do You Yahoo!?
Tired of spam?  Yahoo! Mail has the best spam protection around 
http://mail.yahoo.com 
___
Tutor maillist  -  [EMAIL PROTECTED]
http://mail.python.org/mailman/listinfo/tutor

[Tutor] How to select particular lines from a text

2004-12-04 Thread kumar s

Dear group, 
This is a continuation of my previous email with the
subject line "Python regular expression". Although my text
file looks like an .ini file, it is not. It
is a chip definition file for a GeneChip array. It is a
huge file with over 340,000 lines.

I have a particular question that is more general and not
tied to that file:

Example text:

Name:
City:






Name:
City:



Characteristics of this text:
1. The text is divided into blocks, and every block
starts with 'Name'. The number of lines after this
identifier varies.

In this particular case, the logic I can think of
to extract some of these blocks is:
1. Write a regular expression to identify the Name identifier I
need.
2. Based on this, ask the program to select all
lines after it until it hits either a blank line OR
another Name identifier.

My question:

How can I tell my program these two conditions:

1. mark the identifier I need, and select all the lines
after that identifier until it hits a blank line or
another Name identifier.


Please enlighten me with your suggestions.
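The two conditions can be expressed with a small state flag: a regular expression recognises Name lines, and the loop stops at a blank line or at the next Name line. A Python 3 sketch; it assumes identifiers are written either as `Name: value` or `Name=value`, and `extract_record` and `NAME_RE` are hypothetical names.

```python
import re

# Matches both 'Name: Alice' and 'Name=31320_at'; group 1 is the identifier.
NAME_RE = re.compile(r"^Name\s*[:=]\s*(.*?)\s*$")


def extract_record(lines, wanted):
    """Return the lines after the Name line for `wanted`, stopping at a blank
    line or at the next Name line (whichever comes first)."""
    collected = []
    inside = False
    for line in lines:
        line = line.rstrip("\n")
        match = NAME_RE.match(line)
        if match:
            if inside:
                break               # another Name identifier ends the record
            inside = match.group(1) == wanted
            continue
        if inside:
            if not line.strip():
                break               # a blank line ends the record
            collected.append(line)
    return collected


lines = ["Name: Alice", "City: Baltimore", "",
         "Name: Bob", "City: Boston", "Zip: 02101",
         "Name: Carol", "City: Denver"]
print(extract_record(lines, "Bob"))
```

Because it only needs one line at a time, the same function also works when `lines` is an open file.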

thank you. 

kumar




[Tutor] Can i define anywhere on file object function for reading a range of lines?

2004-12-05 Thread kumar s

Dear group, 
 
For instance, I have a text that looks like the following:

Segment:Page 21
x
x
.
x

Segment:Page 22




Segment:Page 23




I have another file with Page numbers that looks like
this:

Page 1
Page 2
..
Page 22
Page 34
Page 200

I can see that Page 22 exists in my first file.
Now I am trying to locate the Page 22 segment in the first file
and ask my program to read STARTING from
Segment:Page 22 to the end of the Page 22 segment, which is either a
blank line (empty line) OR the start of another segment,
such as Segment:Page 23.

Question: 
Is there any built-in Python function to which I can specify
a range of lines to select (such as
starting from Segment:Page 22 TO the next blank line)
instead of reading all the lines until EOF?
e.g.:
a = readlines (From , TO )

I asked a similar question before and the experts explained
it well; however, I am still confused. Can
anyone please help me again?
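There is no built-in `readlines(from, to)`, but file objects are iterators, so the `itertools` functions `dropwhile` and `takewhile` can skip ahead to the start line and stop at the end condition. A Python 3 sketch, assuming the segment header is an exact line such as `Segment:Page 22`; `read_segment` is a hypothetical name. If you know line numbers instead, `itertools.islice(f, start, stop)` slices a file the same way.

```python
from itertools import dropwhile, takewhile


def read_segment(lines, header):
    """Return `header` plus the lines after it, up to (not including) the
    next blank line or the next 'Segment:' line."""
    rest = dropwhile(lambda line: line.rstrip("\n") != header, lines)
    first = next(rest, None)
    if first is None:
        return []                   # header never appeared
    body = takewhile(
        lambda line: line.strip() != "" and not line.startswith("Segment:"),
        rest,
    )
    return [first.rstrip("\n")] + [line.rstrip("\n") for line in body]


text = ["Segment:Page 21", "x", "x", "",
        "Segment:Page 22", "y1", "y2",
        "Segment:Page 23", "z"]
print(read_segment(text, "Segment:Page 22"))
```

With a real file, `with open(path) as f: read_segment(f, "Segment:Page 22")` reads only as far as the end of that segment.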
Thank you. 

Kumar






[Tutor] Finding a part of an element in a list

2004-12-06 Thread kumar s

Dear Group, 

I have a list that is:

List1 = ['Tyres', 'windshield', 'A\CUnit', 'Model=Toyota_Corolla']


In another list I have:
List2 = ['Corolla', 'Accord', 'Camry']


I want to see if 'Corolla' is in List1:

The code:
for i in range(len(List1)):
    if i in range(len(List2)):
        print i

If 'Corolla' were an element of both lists, it would
be easy to find. However, in List1 this element
appears as 'Model=Toyota_Corolla'.

How can I ask Python to match these two elements,
'Model=Toyota_Corolla' and 'Corolla', where only part of
an element matches?
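The loop above only compares integer indices (`range(len(...))` yields 0, 1, 2, ...), never the strings themselves. Iterating over the elements and using the substring test `kw in item` does the partial matching. A Python 3 sketch; `partial_matches` is a hypothetical helper name, and `'A\CUnit'` is written as a raw string so the backslash stays literal.

```python
def partial_matches(items, keywords):
    """Return (item, keyword) pairs where the keyword occurs inside the item."""
    return [(item, kw) for item in items for kw in keywords if kw in item]


List1 = ['Tyres', 'windshield', r'A\CUnit', 'Model=Toyota_Corolla']
List2 = ['Corolla', 'Accord', 'Camry']
print(partial_matches(List1, List2))
```

Substring matching can also match too much (for example, a hypothetical 'Model=Toyota_Corolla2' would match 'Corolla'). If the model name is always the last `_`-separated part, comparing `item.rsplit('_', 1)[-1] == kw` is stricter.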

please help.

thanks








[Tutor] Removing a row from a tab delimitted text

2004-12-06 Thread kumar s

Dear group, 
 I have a file in which each Name identifier is followed by
lines with two columns of numbers.


Here is how my file looks:

Name=3492_at
Cell1=481 13   (the space between 481 and 13 is a tab)
Cell1=481 13
Cell1=481 13
Name=1001_at
Cell1=481 13
Cell2=481 12
Cell1=481 13
Cell1=481 13
Cell2=481 12
Name=1002_at
Cell3=482 12
Cell1=481 13
Cell1=481 13
Cell2=481 12
Cell3=482 12
Cell4=482 13
Cell1=481 13

My question:

1. How can I remove the lines containing the Name identifier
and get the two columns of data?
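A Python 3 sketch that skips the `Name=` lines and splits the rest into two integer columns. It assumes, as in the example, that every other line looks like `CellN=<number><TAB><number>`; `data_columns` is a hypothetical name.

```python
def data_columns(lines):
    """Skip the 'Name=' lines and return the two numbers from each Cell line."""
    rows = []
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("Name="):
            continue                # drop blank lines and Name identifiers
        # 'Cell1=481\t13' -> drop the 'Cell1=' label, then split on the tab
        first, second = line.split("=", 1)[1].split("\t")
        rows.append((int(first), int(second)))
    return rows


lines = ["Name=3492_at\n", "Cell1=481\t13\n", "Cell2=481\t12\n",
         "Name=1001_at\n", "Cell3=482\t12\n"]
for first, second in data_columns(lines):
    print("%d\t%d" % (first, second))
```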

Thanks
kumar.




[Tutor] Printing two elements in a list

2004-12-07 Thread kumar s

Dear group, 
 I have two lists, named x and seq.

I am trying to find the elements of x inside the elements
of seq, and I can find them. However, I want to print the
element of seq that contains the element of x and also
the next element in seq.


So I tried this piece of code and got an error saying that
str and int cannot be concatenated:
>>> for ele1 in x:
    for ele2 in seq:
        if ele1 in ele2:
            print (seq[ele1+1])


Traceback (most recent call last):
  File "", line 4, in -toplevel-
    print (seq[ele1+1])
TypeError: cannot concatenate 'str' and 'int' objects


2. TRIAL TWO:

>>> for ele1 in x:
    for ele2 in seq:
        if ele2 in range(len(seq)):
            if ele1 in ele2:
                print seq[ele2+1]


This is taking forever and I am not getting an answer.


3. TRIAL 3:
I just asked it to print the element in seq that matched
ele1 in x. It prints only that element; however,
I want to print the next element too, and I cannot get
it.
>>> for ele1 in x:
    for ele2 in seq:
        if ele1 in ele2:
            print ele2


>probe:HG-U95Av2:31358_at:454:493; Interrogation_Position=132; Antisense;
>probe:HG-U95Av2:31358_at:319:607; Interrogation_Position=144; Antisense;




>>> len(x)
4504
>>> x[1:10]
['454:494', '319:607', '319:608', '322:289', '322:290', '183:330', '183:329', '364:95', '364:96']
>>> len(seq)
398169
>>> seq[0:4]
['>probe:HG-U95Av2:1000_at:399:559; Interrogation_Position=1367; Antisense;',
 'TCTCCTTTGCTGAGGCCTCCAGCTT',
 '>probe:HG-U95Av2:1000_at:544:185; Interrogation_Position=1379; Antisense;',
 'AGGCCTCCAGCTTCAGGCAGGCCAA']


What I WANT:

I want to get output like this:


>probe:HG-U95Av2:1000_at:399:559; Interrogation_Position=1367; Antisense;
TCTCCTTTGCTGAGGCCTCCAGCTT

>probe:HG-U95Av2:1000_at:544:185; Interrogation_Position=1379; Antisense;
AGGCCTCCAGCTTCAGGCAGGCCAA


Can anyone please suggest what is going wrong in my
statements and how I can get this?
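In the first trial `ele1` is a string, so `seq[ele1+1]` cannot work; `for i, ele2 in enumerate(seq):` supplies the integer index, and `seq[i + 1]` is then the next element. For 4504 × 398169 substring tests, though, a dictionary keyed on the probe coordinate is far faster, which is the approach Kent Johnson suggests later in the thread. A Python 3 sketch; it assumes `seq` strictly alternates header line and sequence line, as in `seq[0:4]` (note `zip` silently drops an unpaired last line, and 398169 is odd), and `index_probes` is a hypothetical name.

```python
def index_probes(seq):
    """Map each 'X:Y' probe coordinate to a list of (header, sequence) pairs."""
    index = {}
    for header, sequence in zip(seq[0::2], seq[1::2]):
        # '>probe:HG-U95Av2:1000_at:399:559; ...' -> '399:559'
        coord = ":".join(header.split(";")[0].split(":")[-2:])
        index.setdefault(coord, []).append((header, sequence))
    return index


seq = ['>probe:HG-U95Av2:1000_at:399:559; Interrogation_Position=1367; Antisense;',
       'TCTCCTTTGCTGAGGCCTCCAGCTT',
       '>probe:HG-U95Av2:1000_at:544:185; Interrogation_Position=1379; Antisense;',
       'AGGCCTCCAGCTTCAGGCAGGCCAA']
x = ['399:559', '544:185', '454:494']

probes = index_probes(seq)          # one pass over seq
for coord in x:                     # one dictionary lookup per coordinate
    for header, sequence in probes.get(coord, []):
        print(header)
        print(sequence)
        print()
```

An exact key lookup also avoids false hits such as '454:49' matching inside '454:493'.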

Thank you.
Kumar




Re: [Tutor] Printing two elements in a list

2004-12-07 Thread kumar s

Hello group,
 Thank you very much for your kind replies. In fact, I
managed to pull out what I needed by following
Kent's tip of using enumerate() on the iterator.

My problem is that I have suddenly embarked on
a big project, and I am getting through it piece by
piece by writing small pieces of code.

I have another question:
To be brief:

My list contains some elements that I do not want, and
I want to remove them from my list:

My TEXT file looks like this:

Name=32972_at
Cell1=xxx   xxx N   control 32972_at
Cell1=xxx   xxx N   control 32972_at
Cell1=xxx   xxx N   control 32972_at
Cell1=xxx   xxx N   control 32972_at
Name=3456_at
Cell1=xxx   xxx N   control 3456_at
Cell1=xxx   xxx N   control 3456_at
Cell1=xxx   xxx N   control 3456_at
Cell1=xxx   xxx N   control 3456_at
.   ... x   xxx
(34K lines)

I want to remove the Name=xxxx_at identifier lines.

My List:
['Name=32972_at',
'Cell1=432\t118\tN\tcontrol\t32972_at\t0\t13\tA\tA\tA\t0\t75952\t-1\t-1\t99\t',
'Cell2=432\t117\tN\tcontrol\t32972_at\t0\t13\tA\tT\tA\t0\t75312\t-1\t-1\t99\t',
'Cell3=499\t632\tN\tcontrol\t32972_at\t1\t13\tC\tC\tC\t1\t404979\t-1\t-1\t99\t']

I tried to solve it this way:

>>> pat = re.compile('Name')
>>> for i in range(len(cord)):
    x = pat.search(cord[i])
    cord.remove(x)

I know this is wrong, because I do not know how to
search for and remove an element in a list. Can anyone
please help me?

On page 98, in the chapter "Lists and Dictionaries" of Mark
Lutz's Learning Python, table 6-1 lists:
L2.append(4)  Methods: grow, sort, search, reverse, etc.

However, not much more is covered on this aspect in the
book, and I have failed to do more operations on lists.

Looking forward to help from the tutors.
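`pat.search()` returns a match object (or `None`), not the list element, so `cord.remove(x)` is asked to remove something that is not in the list and raises `ValueError`. Removing items while looping over indices would also shift the remaining items and skip some. Building a new list with a list comprehension avoids both problems. A Python 3 sketch; `drop_name_lines` and `drop_matching` are hypothetical names.

```python
import re


def drop_name_lines(cord):
    """Return a new list without the 'Name=...' identifier elements."""
    return [item for item in cord if not item.startswith("Name=")]


def drop_matching(cord, pattern):
    """Same idea with a regular expression: keep items the pattern does not match."""
    pat = re.compile(pattern)
    return [item for item in cord if not pat.match(item)]


cord = ['Name=32972_at',
        'Cell1=432\t118\tN\tcontrol\t32972_at',
        'Cell2=432\t117\tN\tcontrol\t32972_at']
cord = drop_name_lines(cord)        # rebind the name to the filtered list
print(cord)
```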

Thank you. 
Kumar.

--- Kent Johnson <[EMAIL PROTECTED]> wrote:

> kumar,
> 
> Looking at the quantity and structure of your data I think the
> search you are doing is going to be pretty slow - you will be
> doing 4504 * 398169 = 1,793,353,176 string searches.
> 
> Where does the seq data come from? Could you consolidate the
> pairs of lines into a single record? If you do that and extract
> the '399:559' portion, you could build a dict that maps
> '399:559' to the full record. Looking up '399:559' in the
> dictionary would be much, much faster than searching the entire
> list.
> 
> If you have multiple entries for '399:559' you could have the
> dict map to a list.
> 
> Kent
> 

[Tutor] Please help matching elements from two lists and printing them

2004-12-08 Thread kumar s

Dear group, 

 I have two tables:

First table: spot_cor:
432 117 
499 631 
10  0   
326 83  
62  197 
0   0   
37  551 



Second table: spot_int
0   0   98  
1   0   5470
2   0   113 
3   0   5240
4   0   82.5
5   0   92  
6   0   5012
7   0   111 
8   0   4612
9   0   115 
10  0   4676.5  



I stored these two tables as lists:

>>> spot_cor[0:5]
['432\t117', '499\t631', 10\t0', '326\t83', '62\t197']

>>> spot_int[0:5]
['  0\t  0\t18.9', '  1\t  0\t649.4', '  10\t  0\t37.3', '  3\t  0\t901.6', '  4\t  0\t14.9']


I want to take each element of spot_cor and search for it
in spot_int; if they match, I want to write out
all three columns of spot_int.



I did the following to see what happens when I
print ele1 and ele2 as tab-delimited text:

code:
>>> for ele1 in spot_cor:
    for ele2 in spot_int:
        if ele1 in ele2:
            print (ele1+'\t'+ele2)


432 117 432 117 17.3
432 117   7 432 117.9
432 117 554 432 117.7
499 631 499 631 23.1
12  185  12 185 19.6
12  185 112 185 42.6
12  185 212 185 26.3
12  185 312 185 111.9
12  185 412 185 193.1
12  185 512 185 21.9
12  185 612 185 22.0
326 83  169 326 83.7
62  197  62 197 18.9


The problem with this script is that it prints
unwanted elements of the spot_int list. This is simply
crap for me. I want to print the columns only if the first
two columns of both tables match.

The reason is simple: I asked it whether 12 and 185
are contained in the two columns, and Python tells me yes,
they are present in 112 and 185, which is the wrong
result.

Can you please suggest a better method for comparing
these two elements and then printing the third column.
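The false matches come from the substring test: `'12\t185' in '112\t185'` is true. Comparing parsed numbers exactly avoids that, and putting `spot_int` into a dictionary keyed on the `(x, y)` tuple replaces the nested loops with one pass over each list. A Python 3 sketch, assuming every `spot_int` line has exactly three whitespace-separated fields and every `spot_cor` line two; `join_on_coordinates` is a hypothetical name.

```python
def join_on_coordinates(spot_cor, spot_int):
    """For each (x, y) in spot_cor, return the spot_int rows whose first two
    columns are exactly x and y."""
    # Parse spot_int once; split() with no argument also strips the padding.
    intensity = {}
    for line in spot_int:
        x, y, value = line.split()
        key = (int(x), int(y))
        intensity.setdefault(key, []).append((key[0], key[1], float(value)))
    matches = []
    for line in spot_cor:
        x, y = line.split()
        matches.extend(intensity.get((int(x), int(y)), []))
    return matches


spot_cor = ['432\t117', '12\t185']
spot_int = ['  432\t117\t17.3', '  7\t432\t117.9',
            '  112\t185\t42.6', '  12\t185\t19.6']
for x, y, value in join_on_coordinates(spot_cor, spot_int):
    print("%d\t%d\t%s" % (x, y, value))
```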
 

thank you very much. 


Cheers
K




Re: [Tutor] Please help matching elements from two lists and printing them

2004-12-08 Thread kumar s

Hi, 
 thank you very much for suggesting a way. 

In fact, I tried it and found another way to do it.
Could you please tell me if something is wrong, because
I get false positives in the output. That
means I am getting more values than I have in
spot_cor. For example, I have 2500 elements in the spot_cor
list. I search for each element in
spot_int, and IF it is there, I write it to a
file. I expect to get 2500 elements; however,
I am getting 500 extra elements. I do not understand
how this is possible.

Code:

>>> out = open('sa_int_2.txt','w')
>>> for ele1 in range(len(spot_cor)):
    x = spot_cor[ele1]
    for ele2 in range(len(spot_int)):
        cols = split(spot_int[ele2],'\t')
        y = (cols[0]+'\t'+cols[1])
        if x == y:
            for ele3 in spot_int:
                if y in ele3:
                    out.write(ele3)
                    out.write('\n')

On top of this, the process is VERY SLOW, even on a
high-end server. I think that is just the way string
processing is.

As you asked: yes, I am parsing the pieces out of
tab-delimited text. I can get the values as CSV
instead of tab-delimited. But how would I use
CSV to deal with this situation?
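The `csv` module reads tab-delimited files as well as comma-separated ones if you pass `delimiter="\t"`, so there is no need to convert the data to CSV first. It only takes care of the splitting; the matching still needs the exact comparison or dictionary lookup. A Python 3 sketch; `read_table` is a hypothetical name, and `io.StringIO` stands in for a real file.

```python
import csv
import io


def read_table(fh):
    """Parse tab-delimited text into lists of fields with padding stripped."""
    reader = csv.reader(fh, delimiter="\t")
    return [[field.strip() for field in row] for row in reader if row]


# With a file on disk you would pass open(path, newline=""), as the csv
# documentation recommends.
sample = io.StringIO("  0\t  0\t18.9\n  1\t  0\t649.4\n")
rows = read_table(sample)
print(rows)
```

From these rows, `(int(r[0]), int(r[1]))` gives the numeric key Bob Gailer describes in his reply.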

thanks
Kumar

--- Bob Gailer <[EMAIL PROTECTED]> wrote:

> At 02:51 PM 12/8/2004, kumar s wrote:
> >[snip]
> >I stored these two tables as lists:
> >
> > >>> spot_cor[0:5]
> >['432\t117', '499\t631', 10\t0', '326\t83',
> '62\t197']
> 
> Note there is no ' before the 10. That won't fly'
> 
> > >>> spot_int[0:5]
> >['  0\t  0\t18.9', '  1\t  0\t649.4', '  10\t
> >0\t37.3', '  3\t  0\t901.6', '  4\t  0\t14.9']
> 
> It would be a lot easier to work with if the lists looked like
> (assumes all data are numeric):
> [(432,117), (499,631), (10,0), (326,83), (62,197)]
> [(0,0,18.9), (1,0,649.4), (10,0,37.3), (3,0,901.6), (4,0,14.9)]
> 
> What is the source for this data? Is it a tab-delimited file? If
> so the CSV module can help make this translation.
> 
> I also assume that you want the first 2 elements of a spot_int
> element to match a spot_cor element.
> 
> Then (for the subset of data you've provided):
> 
>  >>> for ele1 in spot_cor:
> ...     for ele2 in spot_int:
> ...         if ele1 == ele2[:2]:
> ...             print "%8s %8s %8s" % ele2
> ...
>       10        0     37.3
> 
> >I want to write all the three columns of spot_int.
> >[snip]
> 
> Hope that helps.
> 
> Bob Gailer
> [EMAIL PROTECTED]
> 303 442 2625 home
> 720 938 2625 cell 
> 
> 


[Tutor] Difference between for i in range(len(object)) and for i in object

2004-12-09 Thread kumar s

Dear group, 
  

My tab-delimited text looks like this:

HG-U95Av2	32972_at	432	117
HG-U95Av2	32972_at	499	631
HG-U95Av2	32972_at	12	185
HG-U95Av2	32972_at	326	83
HG-U95Av2	32972_at	62	197


I want to capture columns 2 and 3 as tab-delimited text:


Here is my code:
>>> spot_cor=[]
>>> for m in cor:
...     cols = split(cor,'\t')
...     spot_cor.append(cols[2]+'\t'+cols[3])
...
...
Traceback (most recent call last):
  File "", line 2, in ?
  File "/usr/local/lib/python2.3/string.py", line 121, in split
    return s.split(sep, maxsplit)
AttributeError: 'list' object has no attribute 'split'

Here is 2nd way:


>>> test_cor=[]
>>> for m in cor:
...     cols = split(cor,'\t')
...     x = (cols[2]+'\t'+cols[3])
...     test_cor.append(x)
...
Traceback (most recent call last):
  File "", line 2, in ?
  File "/usr/local/lib/python2.3/string.py", line 121, in split
    return s.split(sep, maxsplit)
AttributeError: 'list' object has no attribute 'split'



Here is my 3rd way of doing this thing:
>>> for m in range(len(cor)):
...     cols = split(cor[m],'\t')
...     spot_cor.append(cols[2]+'\t'+cols[3])
...
>>>
>>> len(spot_cor)
2252
>>>



My question:
Many people suggested that I avoid iterating over an
object by its index using range(len(...)), and instead use
'Python's power' by writing for i in object.

However, when I tried that on some data, as
demonstrated above, I got an error because the append method
does not work on a list. In method 2, I tried to append
an object instead of string elements. Both ways
failed because 'list' object has no
attribute 'split'.


Can you help make this dogma clear to me?
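In the first two attempts the loop variable `m` already holds one line, but `split(cor, '\t')` is called on the whole list `cor`; that is what raises "'list' object has no attribute 'split'", and `append` is not involved. Splitting `m` itself makes the `for line in cor` form work. A Python 3 sketch using the string method `line.split` instead of the old `string.split` function; `columns_2_and_3` is a hypothetical name.

```python
def columns_2_and_3(cor):
    """Return 'col2<TAB>col3' for each tab-delimited line (columns counted from 0)."""
    spot_cor = []
    for line in cor:                          # line is one string from the list...
        cols = line.rstrip("\n").split("\t")  # ...so split line, not the list cor
        spot_cor.append(cols[2] + "\t" + cols[3])
    return spot_cor


cor = ["HG-U95Av2\t32972_at\t432\t117\n", "HG-U95Av2\t32972_at\t499\t631\n"]
print(columns_2_and_3(cor))
```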


Thank you. 

Kumar.



--- Guillermo Fernandez Castellanos
<[EMAIL PROTECTED]> wrote:

> Cheers,
> 
> I think your mistake is here:
>     if x == y:
>         for ele3 in spot_int:
>             if y in ele3:
>                 out.write(ele3)
>                 out.write('\n')
> Each time you find an element that is the same
> (x==y), you don't write only y: you write *all* the
> elements of spot_int that contain y, instead of
> only the matching one! And that's not what you are
> looking for! :-)
> 
> I'll also change your code a bit to make it look
> more "pythonic" :-)
> 
> for ele1 in spot_cor:
>     for ele2 in spot_int:
>         cols = split(ele2,'\t')
>         y = (cols[0]+'\t'+cols[1])
>         if ele1 == y:
>             for ele3 in spot_int:
>                 if y in ele3:
>                     out.write(ele3)
>                     out.write('\n')
> 
> The changes I made:
> 
> for ele1 in range(len(spot_cor)):
>     x = spot_cor[ele1]
> 
> can be written like:
> for ele1 in spot_cor:
>     x = ele1
> 
> Furthermore, as you only use x once, I replaced:
>     if x == y:
> 
> with
>     if ele1 == y:
> 
> and deleted the line:
>     x = ele1
> 
> I also don't understand why you do this:
>     cols = split(ele2,'\t')
>     y = (cols[0]+'\t'+cols[1])
> 
> It seems to me that you are separating something only to
> put it together again. I don't really see why...
> 
> Enjoy,
> 
> Guille
> 





Re: [Tutor] Difference between for i in range(len(object)) and for i in object

2004-12-12 Thread kumar s

Thank you for clearing away some of the mist here. In fact, I
was depressed by that e-mail, because there are not
many tutorials that clearly explain the issues that
one faces while trying to code in Python. Also, due
to the lack of people proficient in Python around
our university campus in Baltimore, I rely very much
on the tutor mailing list. I am too poor to attend Mark
Lutz's Python training course (~$1000 for 2 days and
$3.5K for 5 days at a Python bootcamp), and I am helpless
given the fact that no one offers a Python course
on campus. I depend very much on this list,
and I cannot tell you how much I respect and
appreciate the help from the tutors. I cannot finish my
Ph.D. thesis without the tutors' help, and the tutors will
always be praised in my thesis acknowledgements.

Thank you again for a supportive e-mail, Mr. Gauld.

P.S.: My intention is not to disparage any tutor's opinion;
it is their right to express their opinion freely.

kumar.

--- Alan Gauld <[EMAIL PROTECTED]> wrote:

> > Personally I am getting weary of a lot of requests that to me
> > seem to come from a lack of understanding of Python..
> 
> To be fair that is what the tutor list is for - learning Python.
> 
> > Would you be willing to take a good tutorial so you understand
> > basic Python concepts and apply them to your code.
> 
> But as a tutor author I do agree that I am often tempted
> (and sometimes succumb) to just point at the relevant topic
> in my tutorial. Particularly since the latest version tries
> to answer all of the most common questions asked here, but
> still they come up...
> 
> > I also despair that you don't seem to benefit from some of
> > our suggestions.
> 
> And this too can be frustrating but sometimes it is the case
> that the "student" simply didn't fully appreciate the
> significance of what was offered. I'm feeling generous tonight!
> 
> :-)
> 
> Alan G
> Author of the Learn to Program web tutor
> http://www.freenetpages.co.uk/hp/alan.gauld
> 
> 


[Tutor] raw_input()

2010-03-15 Thread kumar s

Dear group:
I have a large (3 GB) file. Each line is tab-delimited.

Example lines:

585	chr1	433	433	rs56289060	0	+	-	-	-/C	genomic	insertion	unknown	0	0	unknown	between	1
585	chr1	491	492	rs55998931	0	+	C	C	C/T	genomic	single	unknown	0	0	unknown	exact	1
585	chr1	518	519	rs62636508	0	+	G	G	C/G	genomic	single	unknown	0	0	unknown	exact	1
585	chr1	582	583	rs58108140	0	+	G	G	A/G	genomic	single	unknown	0	0	unknown	exact	1

I don't want to load this entire file. I want to feed each line in as input
and print selected lines.

For example:

x1.py:

second = raw_input()
x = second.split('\t')
y = x[1:]
print '\t'.join(y)


%cat mybigfile.rod | python x1.py
chr1	433	433	rs56289060	0	+	-	-	-/C	genomic	insertion	unknown	0	0	unknown	between	1


My question:

This program prints only the first line. It does not process every line that
cat sends to x1.py.
How do I print every line?
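`raw_input()` reads a single line, so the script stops after one. Iterating over `sys.stdin` reads the piped input one line at a time until EOF, without ever loading the 3 GB file. A Python 3 sketch; `drop_first_column` and `main` are hypothetical names, and `x1.py` would end with a call to `main()`.

```python
import sys


def drop_first_column(lines):
    """Yield each tab-delimited line with its first column removed."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        yield "\t".join(fields[1:])


def main(stream=sys.stdin):
    # sys.stdin is read lazily, line by line, until the pipe is closed (EOF).
    for out in drop_first_column(stream):
        print(out)
```

Used as `cat mybigfile.rod | python3 x1.py`. For this particular task the shell command `cut -f2- mybigfile.rod` does the same thing.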

thanks
Kumar.



  

___
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor

Re: [Tutor] raw_input()

2010-03-15 Thread kumar s

This worked after I switched to a while loop, but it ends with an error:


x1.py:

while True:
    second = raw_input()
    x = second.split('\t')
    y = x[1:]
    print '\t'.join(y)




%cat mybigfile.rod | python x1.py
Traceback (most recent call last):
  File "x1.py", line 2, in 
second = raw_input()
EOFError: EOF when reading a line



How do I break out of the loop at EOF and suppress the exception?
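Catching `EOFError` around the read and breaking out of the loop, as Benno's reply suggests, ends the loop quietly. In Python 3, `raw_input()` is called `input()`. A Python 3 sketch; the `read_line` and `write` parameters of the hypothetical `process_until_eof` function exist only so it can be exercised without a real pipe.

```python
def process_until_eof(read_line=input, write=print):
    """Read lines with read_line() until it raises EOFError, writing each
    line without its first tab-delimited column."""
    while True:
        try:
            second = read_line()
        except EOFError:
            break                   # end of piped input: leave the loop quietly
        fields = second.split("\t")
        write("\t".join(fields[1:]))
```

In `x1.py` you would simply call `process_until_eof()`.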

thanks







Re: [Tutor] raw_input()

2010-03-15 Thread kumar s

Thanks, Benno.

Supplying a 3.6 GB file is overkill for the script. This is the reason I chose
to input lines on the fly.
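Passing the file name does not load the whole file either: iterating over a file object also reads it one line at a time. Benno's suggestion of accepting a file name or standard input is what the standard-library `fileinput` module provides. A Python 3 sketch; `drop_first_column_files` is a hypothetical name.

```python
import fileinput


def drop_first_column_files(paths=None):
    """Yield lines without their first tab-delimited column.

    FileInput reads the files listed in `paths` (sys.argv[1:] when paths is
    None) and falls back to standard input when that list is empty, so the
    same script works as `python3 x1.py mybigfile.rod` or with a pipe.
    """
    with fileinput.FileInput(files=paths) as stream:
        for line in stream:
            yield "\t".join(line.rstrip("\n").split("\t")[1:])
```

A script would loop over `drop_first_column_files()` and print each result.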

thanks
Kumar





- Original Message 
From: Benno Lang 
To: kumar s 
Cc: tutor@python.org
Sent: Mon, March 15, 2010 7:19:24 PM
Subject: Re: [Tutor] raw_input()

On 16 March 2010 08:04, kumar s  wrote:
> %cat mybigfile.rod | python x1.py
> Traceback (most recent call last):
>  File "x1.py", line 2, in 
>second = raw_input()
> EOFError: EOF when reading a line
>
> How to notify that at EOF break and suppress exception.

try:
    second = raw_input()
except EOFError:
    # handle error in some way

I would probably supply the file name as an argument rather than
piping into stdin (or allow both methods), but that's up to you.

HTH,
benno



  


[Tutor] help with loops

2010-03-25 Thread kumar s

Dear group:

I need some tips/help from experts. 

I have two tab-delimited files.
One file is 4K lines; the other file is 40K lines.

I want to search the contents of one file against the other and print the lines that satisfy a condition.


File 1:
chr X   Y
chr1	8337733	8337767	NM_001042682_cds_0_0_chr1_8337734_r	0	-	RERE
chr1	8338065	8338246	NM_001042682_cds_1_0_chr1_8338066_r	0	-	RERE
chr1	8338746	8338893	NM_001042682_cds_2_0_chr1_8338747_r	0	-	RERE
chr1	8340842	8341563	NM_001042682_cds_3_0_chr1_8340843_r	0	-	RERE
chr1	8342410	8342633	NM_001042682_cds_4_0_chr1_8342411_r	0	-	RERE


File 2:
Chr  X Y
chr1	871490	871491
chr1	925085	925086
chr1	980143	980144
chr1	1548655	1548656
chr1	1589675	1589676
chr1	1977853	1977854
chr1	3384899	3384900
chr1	3406309	3406310
chr1	3732274	3732275


I want to check whether X in file 2 is greater than X and less than Y in file 1, and if so, print the line of
file 2 and the last column of file 1:


for j in file2:
    col = j.split('\t')
    for k in file1:
        cols = k.split('\t')
        if col[1] > cols[1]:
            if col[1] < cols[2]:
                print j +'\t'+cols[6]


This prints a lot of duplicate lines and is slow. Is there any other way I can
make it fast?

For file 1, how can a dictionary be made? I mean one with unique keys that are common to
files 1 and 2.
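Two things make the loop slow and noisy: every file 2 line is compared with every file 1 line, and the comparisons are between strings, so `'871490' > '8337733'` is true even though 871490 < 8337733 numerically. A Python 3 sketch that converts coordinates with `int()`, groups the file 1 intervals by chromosome, sorts them by start, and uses the standard `bisect` module to jump straight to the candidates. It assumes the file 1 columns shown above (chrom, start, end, name, score, strand, gene) and keeps the strict `start < X < end` test of the original code; `build_index` and `genes_at` are hypothetical names.

```python
import bisect
from collections import defaultdict


def build_index(rows):
    """rows: iterable of (chrom, start, end, gene) with integer start/end.

    Returns {chrom: (starts, max_end, intervals)} with intervals sorted by start.
    """
    by_chrom = defaultdict(list)
    for chrom, start, end, gene in rows:
        by_chrom[chrom].append((start, end, gene))
    index = {}
    for chrom, intervals in by_chrom.items():
        intervals.sort()
        starts = [start for start, _, _ in intervals]
        max_end, running = [], float("-inf")
        for _, end, _ in intervals:
            running = max(running, end)
            max_end.append(running)      # largest end among intervals[0..i]
        index[chrom] = (starts, max_end, intervals)
    return index


def genes_at(index, chrom, pos):
    """Genes of the intervals on `chrom` with start < pos < end."""
    if chrom not in index:
        return []
    starts, max_end, intervals = index[chrom]
    i = bisect.bisect_left(starts, pos) - 1   # last interval with start < pos
    hits = []
    # Walk left only while some earlier interval could still reach past pos.
    while i >= 0 and max_end[i] > pos:
        _, end, gene = intervals[i]
        if end > pos:
            hits.append(gene)
        i -= 1
    return hits


file1_rows = [
    ("chr1", 8337733, 8337767, "RERE"),
    ("chr1", 8338065, 8338246, "RERE"),
]
index = build_index(file1_rows)
print(genes_at(index, "chr1", 8337740))   # inside the first interval
print(genes_at(index, "chr1", 871490))    # before every interval
```

If a position falls inside several intervals of the same gene, that gene is listed once per interval; wrapping the result in `set()` keeps each gene once, which also addresses the duplicate lines.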

thanks
Kumar.


  


[Tutor] counting elements in list

2010-09-15 Thread kumar s

Hi group:

I have a list:

 k = ['T', 'C', 'T', 'T', 'T', 'T', 'T', 'T', 'T', 'C', 'T', 'T', 'T', 'C', 'T',
      'T', 'T', 'C', 'C', 'T', 'T', 'T', 'C', 'T', 'T', 'T', 'T', 'T', 'T']

The allowed elements are A, T, G, or C. The list can contain any number of A, T,
G, or C.

My aim is to get a string output with the count of each type (A, T, G, and C):

A:0\tT:23\tG:0\tC:6

In the above example I could count T and C, but since there are no A or G, I
want to print 0 for them. I just don't know how this can be done.




>>> d = {}
>>> for i in set(k):
...     d[i] = k.count(i)
...
>>> d
{'C': 6, 'T': 23}


>>> for keys,values in d.items():
...     print keys+'\t'+str(d[keys])
...
C   6
T   23

the other way i tried is:
>>> k.count('A'),k.count('T'),k.count('G'),k.count('C')
(0, 23, 0, 6)


How can I get counts for the elements that are not present in the list, and print
them? I appreciate your help.
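The trick is to loop over the fixed alphabet rather than over the keys that happen to be present. A Python 3 sketch with `collections.Counter`, which returns 0 for a key it has never seen; `base_counts` is a hypothetical name. With the plain dict `d` above, `d.get(base, 0)` has the same effect.

```python
from collections import Counter


def base_counts(bases, alphabet="ATGC"):
    """Format counts as 'A:n<TAB>T:n<TAB>G:n<TAB>C:n', including zeros."""
    counts = Counter(bases)          # a missing key counts as 0
    return "\t".join(f"{base}:{counts[base]}" for base in alphabet)


k = ['T', 'C', 'T', 'T', 'T', 'T', 'T', 'T', 'T', 'C', 'T', 'T', 'T', 'C', 'T',
     'T', 'T', 'C', 'C', 'T', 'T', 'T', 'C', 'T', 'T', 'T', 'T', 'T', 'T']
print(base_counts(k))
```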


thanks
kumar



  


[Tutor] (no subject)

2010-01-07 Thread kumar s

Dear tutors:
I have two files. I want to take the coordinates of a row in fileA and find out whether they
fall within the range of coordinates in fileB. If they do, I want to map them;
otherwise, pass.
thanks
kumar

file a:
name loc  x   y
a	4	40811596	40811620
b	4	40811619	40811643
c	4	40811649	40811673
d	4	40811734	40811758
e	4	40811797	40811821
f	4	40811817	40811841
g	4	40811895	40811919
h	4	40811938	40811962



file b:

  zx   zy
z1	4	+	40810323	40812000
z2	4	+	40810323	40812000
z3	4	+	40810323	40812000
z4	4	+	40810323	40812000
z5	4	+	40810323	40812000
z6	4	+	40810323	40812000
z7	4	+	40810323	40812000
z8	4	+	40810323	40812000




I want to take coordinates x and y from each row in file a and check whether they
are within the range of zx and zy. If they are in range, I want to write
both matched rows as a single tab-delimited row.


my code:

f1 = open('fileA','r')
f2 = open('fileB','r')
da = f1.read().split('\n')
dat = da[:-1]
ba = f2.read().split('\n')
bat = ba[:-1]


for m in dat:
    col = m.split('\t')
    for j in bat:
        cols = j.split('\t')
        if col[1] == cols[1]:
            xc = int(cols[2])
            yc = int(cols[3])
            if int(col[2]) in xrange(xc,yc):
                if int(col[3]) in xrange(xc,yc):
                    print m+'\t'+j

output:
a	4	40811596	40811620	z1	4	+	40810323	40812000



This code is too slow. Could you experts help me make the script a lot faster?
Each file has over 50K rows, and the script runs very slowly.
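Much of the time goes into `int(col[2]) in xrange(xc, yc)`: in Python 2 that test walks through the range one number at a time, whereas `xc <= x < yc` is two comparisons with the same meaning. A Python 3 sketch that also parses each row only once and groups file b rows by `loc`. It assumes the five-column layout shown for file b (name, loc, strand, zx, zy), so the range is taken from the last two columns (in that layout `cols[2]` would be the strand); `map_rows` is a hypothetical name.

```python
from collections import defaultdict


def map_rows(rows_a, rows_b):
    """Pair each file-a row (name, loc, x, y) with every file-b row
    (name, loc, strand, zx, zy) on the same loc whose [zx, zy) range
    contains both x and y."""
    ranges = defaultdict(list)
    for row_b in rows_b:
        zx, zy = int(row_b[-2]), int(row_b[-1])
        ranges[row_b[1]].append((zx, zy, row_b))
    pairs = []
    for row_a in rows_a:
        loc, x, y = row_a[1], int(row_a[2]), int(row_a[3])
        for zx, zy, row_b in ranges.get(loc, ()):
            # Same condition as `in xrange(zx, zy)`, done with comparisons.
            if zx <= x < zy and zx <= y < zy:
                pairs.append(tuple(row_a) + tuple(row_b))
    return pairs


rows_a = [("a", "4", "40811596", "40811620"), ("x", "5", "1", "2")]
rows_b = [("z1", "4", "+", "40810323", "40812000")]
for pair in map_rows(rows_a, rows_b):
    print("\t".join(pair))
```

Rows can be read with `tuple(line.rstrip("\n").split("\t"))` after skipping the header. If every row shares the same `loc`, as in the sample, the pairs are still compared one by one; sorting file b by `zx` and searching it with the `bisect` module would cut that work further.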

Please help. 

thanks
Kumar


  


[Tutor] How to substitute an element of a list as a pattern for re.compile()

2004-12-29 Thread kumar s

Hi Group:

I have Question: 
How can I use an object (a variable) as the pattern when
compiling a regular expression?

>>> x = 30
>>> pattern = re.compile(x)




My situation:

I have a list of numbers that I have to match in
another list and write them to a new file:

List 1: range_cors 
>>> range_cors[1:5]
['161:378', '334:3', '334:4', '65:436']

List 2: seq
>>> seq[0:2]
['>probe:HG-U133A_2:1007_s_at:416:177; Interrogation_Position=3330; Antisense;',
 'CACCCAGCTGGTCCTGTGGATGGGA']


A slow method:
>>> sequences = []
>>> for elem1 in range_cors:
    for index,elem2 in enumerate(seq):
        if elem1 in elem2:
            sequences.append(elem2)
            sequences.append(seq[index+1])

This process is very slow and it is taking a lot of
time. I am not happy.



A faster method (probably):

>>> for i in range(len(range_cors)):
        for index,m in enumerate(seq):
            pat = re.compile(i)
            if re.search(pat,seq[m]):
                p.append(seq[m])
                p.append(seq[index+1])


I am getting errors because I am trying to use an
element as the pattern in re.compile(). 


Questions:

1. Is it possible to do this? If so, how can I do
this? 

Can anyone help correct my piece of code and
suggest where I went wrong? 
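A faster route, sketched in Python 3 under two assumptions: seq alternates header line and sequence line, and every header has the form '>probe:CHIP:PROBESET:X:Y; ...', so the coordinate pair is the last two ':'-separated fields before the first ';'. Putting range_cors into a set makes each lookup a single hash probe instead of a scan over seq. Note that this matches coordinates exactly, whereas `elem1 in elem2` is a substring test that would also match '16:17' inside '416:177'. The function name select_probes is hypothetical.

```python
def select_probes(range_cors, seq):
    """Keep each header/sequence pair whose header coordinates are in range_cors."""
    wanted = set(range_cors)          # set membership is a single hash lookup
    sequences = []
    # Assumed layout: seq alternates header line, sequence line.
    for i in range(0, len(seq) - 1, 2):
        header = seq[i]
        # '>probe:HG-U133A_2:1007_s_at:416:177; ...' -> '416:177'
        fields = header.split(';')[0].split(':')
        coord = fields[-2] + ':' + fields[-1]
        if coord in wanted:
            sequences.append(header)
            sequences.append(seq[i + 1])
    return sequences
```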

Thank you in advance. 


-K



Re: [Tutor] O.T.

2004-12-29 Thread kumar s

30, married, will soon be a dad, live in Baltimore,
U.S.A., and I am a Ph.D. student.

 I lived in Denmark and Israel in the past as a part
of my research life.

Will finish my Ph.D., in Bioinformatics.

Got introduced to computers at the age of 25 :-(
and I am happier that it was not 52 :-)

Programming lang: Python and R, Bioconductor, PHP
(I have not mastered but WANT TO)

DB: PostgreSQL 

I earn my bread by doing research and in a way I get
paid for my interests in life. 

-K

--- "Jacob S." <[EMAIL PROTECTED]> wrote:

> I hate to sound weird...
> 
> But who are you all, what are your ages, what do
> you do, marriage status,
> etc?
> You obviously don't have to answer, I'm just curious
> who I'm boldly sending
> emails to.
> 
> Jacob Schmidt
> 
> P.S.
> I'm a student. 14 years. Play the piano better than
> I write scripts. Single.
> etc.
> 
> 


[Tutor] Parsing a block of XML text

2004-12-31 Thread kumar s

Dear group:

I am trying to parse BLAST (Basic Local
Alignment Search Tool) output, which is more than
250 KB in size.

- 
  1 
  gi|43442325|emb|BX956931.1| 
  DKFZp781D1095_r1 781 (synonym: hlcc4) Homo
sapiens cDNA clone DKFZp781D1095 5', mRNA
sequence. 
  BX956931 
  693 
- 
- 
  1 
  1164.13 
  587 
  0 
  1 
  587 
  107 
  693 
  1 
  1 
  587 
  587 
  587 
 
GGACCTCTCCAGAATCCGGATTGCTGAATCTTCCCTGTTGCCTAGAAGGGCTCCAAACCACCTCTTGACAATGGGAAACTGGGTGGTTAACCACTGGCAGGTTTCTGGTTGTTTGGTTAGGGCTGAATGCCTGTTTGTGGATGCCTTCCTGAAATATGAGAAGGCCGACAAATACTACTACACAAGATCCTTGGGTCAACATTGGCCTGTGCCCGAGCGTCTGCTCTCTGCTTGAAAACAGCACGCTGATCCTGCTTCCTGTGTGTCGCAATCTGCTGTCCTTCCTGACACCTGCTCAGCAGCCGCACACTGAGAAAGCAATTGGATCACAACCTCACCTTCCACAAGCTGGTGGCCTATATGATCTGCCTACATACAGCTATTCACATCATTGCACACCTGTTTAACTTTGACTGCTATAGCAGAAGCCGACAGGCCACAGATGGCTCCCTTGCCTCCATTCTCTCCAGCCTATCTCATGATGAGAGGTTCTTGGCTAAATCCCATCCAGTCCCGAAACACGACAGTGGAGTATGTGACATTCACCAGCA

 
GGACCTCTCCAGAATCCGGATTGCTGAATCTTCCCTGTTGCCTAGAAGGGCTCCAAACCACCTCTTGACAATGGGAAACTGGGTGGTTAACCACTGGCAGGTTTCTGGTTGTTTGGTTAGGGCTGAATGCCTGTTTGTGGATGCCTTCCTGAAATATGAGAAGGCCGACAAATACTACTACACAAGATCCTTGGGTCAACATTGGCCTGTGCCCGAGCGTCTGCTCTCTGCTTGAAAACAGCACGCTGATCCTGCTTCCTGTGTGTCGCAATCTGCTGTCCTTCCTGACACCTGCTCAGCAGCCGCACACTGAGAAAGCAATTGGATCACAACCTCACCTTCCACAAGCTGGTGGCCTATATGATCTGCCTACATACAGCTATTCACATCATTGCACACCTGTTTAACTTTGACTGCTATAGCAGAAGCCGACAGGCCACAGATGGCTCCCTTGCCTCCATTCTCTCCAGCCTATCTCATGATGAGAGGTTCTTGGCTAAATCCCATCCAGTCCCGAAACACGACAGTGGAGTATGTGACATTCACCAGCA

 
|||

  
  
  
- 




I wanted to parse out :

  
   


I wrote a very small four-line piece of code to obtain it.

for bls in doc.getElementsByTagName('Hsp_num'):
    bls.normalize()
    if bls.firstChild.data >1:
        print bls.firstChild.data


This is not sufficient for me to get anything done. 
Could anyone point me toward how to get the
elements in that tag? 

Thanks.
-K
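A minimal sketch using xml.dom.minidom in Python 3. It walks every Hsp element and collects the text of the child tags you ask for, using a helper adapted from the get_text() function in Danny Yoo's reply. Because the tags were stripped from the sample above, the coordinate tag names used here (Hsp_query-from, Hsp_query-to) are an assumption based on NCBI's BLAST XML output; Hsp and Hsp_num appear in this thread. The values come back as strings, so convert them with int() before comparing them with numbers. The function name hsp_records and the small sample document are illustrative.

```python
from xml.dom import minidom


def get_text(node):
    """Concatenate the text-node children of an element."""
    return ''.join(c.data for c in node.childNodes if c.nodeType == c.TEXT_NODE)


def hsp_records(doc, tags):
    """Return one dict per <Hsp> element, mapping each requested child tag to its text."""
    records = []
    for hsp in doc.getElementsByTagName('Hsp'):
        record = {}
        for tag in tags:
            children = hsp.getElementsByTagName(tag)
            if children:
                record[tag] = get_text(children[0]).strip()
        records.append(record)
    return records


# Small illustrative document; minidom.parse(filename) works the same way for a file.
sample = ('<Hit><Hit_hsps>'
          '<Hsp><Hsp_num>1</Hsp_num><Hsp_query-from>1</Hsp_query-from>'
          '<Hsp_query-to>587</Hsp_query-to></Hsp>'
          '<Hsp><Hsp_num>2</Hsp_num></Hsp>'
          '</Hit_hsps></Hit>')
doc = minidom.parseString(sample)
records = hsp_records(doc, ['Hsp_num', 'Hsp_query-from', 'Hsp_query-to'])
# Numeric test done on integers, not strings.
later = [r for r in records if int(r['Hsp_num']) > 1]
```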





Re: [Tutor] Parsing a block of XML text

2004-12-31 Thread kumar s

AACATTAATTCATACCAGAGTGAATTTCTGCAAATGTGATGTGGGCAACTGCCCTTCAATCATCACTAAACATAAGAGAATTAATACTGGAGAGAAACCCTACACATGTGAAGAATGTGGCAAAGTCTTTAATTGGTCCTCACGCCTTACTACACATTTATACTAGATACAAACTCTACAAATGTGAAGAATGTGGCAAAGCAACAAGTCCTCAATCCTTACTACCCATAAGATAATTCGCACTGGAGAGAAATTCTACAAATGTAAAGAATGTGCCAAAGCAACCAATCCTCAAACCTTACTGAACATAAGTTCATCCTGGAGAGAAACCTTACAAATGTGAAGAATGTGGCAAAGCCTTTAACTGGCCCTCAACTCTTACTAAACATAAGAGAATTCATACTGGAGAGAAACCCTACACATGTGAAGAATGTGGCAAAGCAACCAGTTCTCAAACCTTACTACACATAAGAGAATCCATACTGCAGAGAAATTCTATAAATGTACAGAATGT-GGTGAAGC-AGCCGGTCCTCAAACCTTACTAAACAT-AAGTTCATACT--GGAAACCCTAC
text node:
  
Element node: Hsp_hseq
text
node:TGGATTTAACCAATGTTTGCCAGCTACCCAGAGCTATTTCTATTTGATAAATGTGTGAAAGCCTTTCATAAACAAATTCAAACAGACATAAGATAAGCCATACTGCCAAATGCAAAGAATGTGGCAAATCAGCATGCTTCCACATCTAGCTCAACATTAATTCATACCAGAGTGAATTTCTGCAAATGTGATGTGGGCAACTGCCCTTCAATCATCACTAAACATAAGAGAATTAATACTGGAGAGAAACCCTACACATGTGAAGAATGTGGCAAAGTCTTTAATTGGTCCTCACGCCTTACTACACATTTATACTAGATACAAACTCTACAAATGTGAAGAATGTGGCAAAGCAACAAGTCCTCAATCCTTACTACCCATAAGATAATTCGCACTGGAGAGAAATTCTACAAATGTAAAGAATGTGCCAAAGCAACCAATCCTCAAACCTTACTGAACATAAGTTCATCCTGGAGAGAAACCTTACAAATGTGAAGAATGTGGCAAAGCCTTTAACTGGCCCTCAACTCTTACTAAACATAAGAGAATTCATACTGGAGAGAAACCCTACACATGTGAAGAATGTGGCAAAGCCTTTAACCAGTTCTCAAACCTTACTACACATAAGAGAATCCATACTGCAGAGAAATTCTATAAATGTACAGAATGTGGGTGAAGCAACCCGGCCCTCAAACCTTACTAAACATTTCATACTTGAGAAACCCTAC
text node:
  
Element node: Hsp_midline
text
node:||
|||
 |   ||
   ||
text node:

text node:

Element node: Hsp
text node:
  
Element node: Hsp_num
text node:2
text node:


--- Danny Yoo <[EMAIL PROTECTED]> wrote:

> 
> 
> On Fri, 31 Dec 2004, kumar s wrote:
> 
> > I am trying to parse BLAST output (Basic Local
> Alignment Search Tool,
> > size around more than 250 KB ).
> 
> [xml text cut]
> 
> 
> Hi Kumar,
> 
> Just as a side note: have you looked at Biopython
> yet?
> 
> http://biopython.org/
> 
> I mention this because Biopython comes with parsers
> for BLAST; it's
> possible that you may not even need to touch XML
> parsing if the BLAST
> parsers in Biopython are sufficiently good.  Other
> people have already
> solved the parsing problem for BLAST: you may be
> able to take advantage of
> that work.
> 
> 
> > I wanted to parse out :
> >
> >   >  
> >   
> 
> Ok, I see that you are trying to get the content of
> the High Scoring Pair
> (HSP) query and hit coordinates.
> 
> 
> 
> > I wrote a ver small 4 line code to obtain it.
> >
> > for bls in doc.getElementsByTagName('Hsp_num'):
> >     bls.normalize()
> >     if bls.firstChild.data >1:
> >         print bls.firstChild.data
> 
> This might not work.  'bls.firstChild.data' is a
> string, not a number, so
> the expression:
> 
> bls.firstChild.data > 1
> 
> is most likely buggy.  Here, try using this function
> to get the text out
> of an element:
> 
> ###
> def get_text(node):
>     """Returns the child text contents of the node."""
>     buffer = []
>     for c in node.childNodes:
>         if c.nodeType == c.TEXT_NODE:
>             buffer.append(c.data)
>     return ''.join(buffer)
> ###
> 
> (code adapted from:
> http://www.python.org/doc/lib/dom-example.html)
> 
> 
> 
> For example:
> 
> ###
> >>> doc = xml.dom.minidom.parseString("helloworld")
> >>> for bnode in doc.getElementsByTagName('b'):
> ...     print "I see:", get_text(bnode)
> ...
> I see: hello
> I see: world
> ###
> 
> 
> 
> 
> > Could any one help me directing how to get the
> elements in that tag.
> 
> One way to approach structured parsing problems
> systematically is to write
> a function for each particular element type that
> you're trying to parse.
> 
> From the sample XML that you've shown us, it appears
> that your document
> consists of a single 'Hit' root node.  Each 'Hit'
> appears to have a
> 'Hit_hsps&

[Tutor] Something is wrong in file input output functions.

2005-01-10 Thread kumar s

Dear group,
I have written a small piece of code that takes a file,
selects the columns that I am interested in,
checks the value of a column against a condition (a value
that equals 25), and then writes the row to another file.



Code:
import sys
from string import split
import string
print "enter the file name" ### Takes the file name###
psl = sys.stdin.readline()  ### psl has the file object###

f2 = sys.stdout.write("File name to write")
def extCor(psl):
    ''' This function splits the file and writes the desired columns
    to another file only if the first column value equals 25.'''
    str_psl = psl.split('\n')
    str_psl = str_psl[5:]
    for ele in range(len(str_psl)):
        cols = split(str_psl[ele],'\t')
        des_cols = cols[0]+'\t'+cols[1]+'\t'+cols[8]+'\t'+cols[9]+'\t'+cols[11]+'\t'+cols[12]+'\t'+cols[13]+'\t'+cols[15]+'\t'+cols[16]+'\t'+cols[17]
        if cols[0] == 25:
            '''This condition checks if the first column value == 25;
            if so it writes the row to the file, if not it does not.'''
            f2.write(des_cols)
            f2.write("\n")

extCor(psl)



Question:
When I give it the name of the file that it should parse, I
am never asked for the name of the file to write, and
it gives me nothing. Please help me. 
Thanks
K




[Tutor] please help: conditional statement and printing element

2005-01-11 Thread kumar s

Dear group, 
  For some reason my brain cannot think of any other
option than what I have in my script. Could anyone
please suggest something?

What I have : (File name : psl)
22  2   457:411 25  0
25  0   457:411 25  0
25  0   457:411 25  0
25  0   457:411 25  0
25  0   457:411 25  0
25  0   457:411 25  0
25  0   457:411 25  0
25  0   457:411 25  0
25  0   457:411 25  0
22  0   457:411 25  0
25  0   457:411 25  0
25  0   457:411 25  0
24  1   457:411 25  0
22  0   457:411 25  0
21  0   457:411 25  0
25  0   457:411 25  0
25  0   457:411 25  0


What to do:
I want to print values that are 25 in column 1 and not
the other values such as 24,22,21 etc.


My script:
>>> for i in range(len(psl)):
        col = split(psl[i],'\t')
        col1 = col[0]
        if col1 == 25:
            print col[0]+'\t'+col[1]+'\t'+col[17]


>>>

Result: I get nothing. Am I doing something very
wrong? Why isn't if col1 == 25: working? 

My idea is to check if col[0] == 25: then print
columns 1,18 etc. 

Can you please help me. 

Thanks
K


Re: [Tutor] please help: conditional statement and printing element

2005-01-11 Thread kumar s

Dear group:
  I think I have got the answer :-)


script:
>>> for i in range(len(psl)):
        col = split(psl[i],'\t')
        col1 = col[0]
        col1 = int(col1)
        col17 = int(col[17])
        if col1 == 25 and col17 == 1:
            print col[0]+'\t'+col[1]+'\t'+col[9]+'\t'+col[10]+'\t'+col[11]


25  0   580:683 25  0
25  0   581:687 25  0
25  0   434:9   25  0
25  0   37:141  25  0
25  0   219:629 25  0
25  0   462:87  25  0
25  0   483:409 25  0
25  0   354:323 25  0
25  0   624:69  25  0
25  0   350:239 25      0


Is this a correct approach?
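Converting with int() is what makes the test work: split() returns strings, and in Python the string '25' never compares equal to the integer 25, which is why the first version printed nothing. A slightly tidier sketch in Python 3 iterates over the lines directly instead of over range(len(psl)) and builds each output row with '\t'.join(). It assumes psl is a list of tab-separated lines with at least 18 columns, as in the loop above; the function name select_rows is hypothetical.

```python
def select_rows(lines):
    """Yield columns 1, 2, 10, 11 and 12 of rows where column 1 is 25 and column 18 is 1."""
    for line in lines:
        col = line.rstrip('\n').split('\t')
        # Split fields are strings, and '25' == 25 is False,
        # so convert with int() before comparing with a number.
        if int(col[0]) == 25 and int(col[17]) == 1:
            yield '\t'.join([col[0], col[1], col[9], col[10], col[11]])
```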

Thanks
K.


--- kumar s <[EMAIL PROTECTED]> wrote:

> Dear group, 
>   For some reason my brain cannot think of any other
> option than what I have in my script. Could any one
> please help me in suggesting.
> 
> What I have : (File name : psl)
> 22  2   457:411 25  0
> 25  0   457:411 25  0
> 25  0   457:411 25  0
> 25  0   457:411 25  0
> 25  0   457:411 25  0
> 25  0   457:411 25  0
> 25  0   457:411 25  0
> 25  0   457:411 25  0
> 25  0   457:411 25  0
> 22  0   457:411 25  0
> 25  0   457:411 25  0
> 25  0   457:411 25  0
> 24  1   457:411 25  0
> 22  0   457:411 25  0
> 21  0   457:411 25  0
> 25  0   457:411 25  0
> 25  0   457:411 25  0
> 
> 
> What to do:
> I want to print values that are 25 in column 1 and
> not
> the other values such as 24,22,21 etc.
> 
> 
> My script:
> >>> for i in range(len(psl)):
>         col = split(psl[i],'\t')
>         col1 = col[0]
>         if col1 == 25:
>             print col[0]+'\t'+col[1]+'\t'+col[17]
> 
> 
> >>>
> 
> Result: I get nothing. Am I doing something very
> wrong. Why isnt if col1 == 25: functional. 
> 
> My idea is to check if col[0] == 25: then print
> columns 1,18 etc. 
> 
> Can you please help me. 
> 
> Thanks
> K
> 
> 



[Tutor] How to create a key-value pairs with alternative elements in a list ... please help.

2005-01-12 Thread kumar s

Dear group,

I am frustrated to be asking questions on topics that I
thought I had covered well; my logic is not correct. 

I have a simple list:

>>> a
['a', 'apple', 'b', 'boy', 'c', 'cat']

I want to create a dictionary:
 dict = {'a':'apple',
 'b':'boy',
 'c':'cat'}

my way of doing this :

>>> keys = [] # create a list of all keys (i.e. a, b, c)
>>> vals = [] # create a list of all values (i.e. apple, boy, cat etc.)

>>> dict = {}

>>> dict = zip(keys,vals)

Problem:
How do I capture every other (alternate) element in list a?

I am unable to pump the a,b, and c into keys list
and apple, boy,cat into vals list.

Trial 1:
>>> while i >= len(a):
        print a[i]
        i = i+2   # I thought i+2 would give me alternate elements

Trial 2:
>>> for index,i in enumerate(range(len(a))):
        print a[i]
        print a[index+1]


a
apple
apple
b
b
boy
boy
c
c
cat
cat

Please help me.  It is also time for me to refer my
prev. notes  :-(
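Slicing with a step picks out alternate elements directly, which is what Jeff Shannon's reply below suggests. A minimal sketch in Python 3 (extended slices on lists work from Python 2.3 onwards). Note that zip() on its own only produces pairs, not a dictionary, and that naming a variable dict hides the built-in dict() that is needed here; the name pairs is just an illustrative choice.

```python
a = ['a', 'apple', 'b', 'boy', 'c', 'cat']

keys = a[::2]    # every other element, starting at index 0
vals = a[1::2]   # every other element, starting at index 1

# zip() pairs keys with values; dict() turns those pairs into a dictionary.
pairs = dict(zip(keys, vals))
print(pairs)
```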

thanks
K







Re: [Tutor] How to create a key-value pairs with alternative elements in a list ... please help.

2005-01-12 Thread kumar s

Thanks for this trick.

Can I put this in the "thermo-NUKE" category of list
functions?
-K

--- Jeff Shannon <[EMAIL PROTECTED]> wrote:

> kumar s wrote:
> 
> > Problem:
> > How do i capture every alternative element in list
> a:
> > 
> > I am unable to pump the a,b, and c into keys list
> > and apple, boy,cat into vals list.
> 
> In a sufficiently recent version of Python, you
> should be able to use 
> an extended slice with a stride --
> 
>  keys = a[::2]
>  vals = a[1::2]
> 
> (Note that this is untested, as I don't have a
> recent version of 
> Python handy at the moment; I'm on 2.2 here, which
> doesn't have 
> extended slices.)
> 
> Jeff Shannon
> Technician/Programmer
> Credit International
> 
> 
> 



[Tutor] Regular expression re.search() object . Please help

2005-01-13 Thread kumar s

Dear group:

My list looks like this: List name = probe_pairs
Name=AFFX-BioB-5_at
Cell1=96369 N   control AFFX-BioB-5_at
Cell2=96370 N   control AFFX-BioB-5_at
Cell3=441   3   N   control AFFX-BioB-5_at
Cell4=441   4   N   control AFFX-BioB-5_at
Name=223473_at
Cell1=307   87  N   control 223473_at
Cell2=307   88  N   control 223473_at
Cell3=367   84  N   control 223473_at

My Script:
>>> name1 = '[N][a][m][e][=]'
>>> for i in range(len(probe_pairs)):
        key = re.match(name1,probe_pairs[i])
        key


<_sre.SRE_Match object at 0x00E37A68>
<_sre.SRE_Match object at 0x00E37AD8>
<_sre.SRE_Match object at 0x00E37A68>
<_sre.SRE_Match object at 0x00E37AD8>
<_sre.SRE_Match object at 0x00E37A68>
. . . (continues for 10K lines)

Here it prints a bunch of re match objects. However,
when I call group() it prints only one result. Why?

Alternatively:
>>> for i in range(len(probe_pairs)):
        key = re.match(name1,probe_pairs[i])
        key.group()


'Name='




1. My aim:
To remove those Name= lines from my probe_pairs
list.

With name1 as the pattern, I used the re.match()
method to identify the lines and then planned to remove them
with the re.sub(pat,'',string) method. I want to replace each
Name=*** line with an empty string.


After I got the re match object, I tried to remove
it like this:
>>> for i in range(len(probe_pairs)):
        key = re.match(name1,probe_pairs[i])
        del key
        print probe_pairs[i]


Name=AFFX-BioB-5_at
Cell1=96369 N   control AFFX-BioB-5_at
Cell2=96370 N   control AFFX-BioB-5_at
Cell3=441   3   N   control AFFX-BioB-5_at


Result shows that that Name** line has not been
deleted.


Is the way I am doing it a good one? Could you please
suggest a good, simple method? 
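In the loop above, `del key` only removes the name key; it does not touch probe_pairs, so the list is unchanged. A minimal sketch in Python 3 that builds a new list without the Name= lines. The pattern '[N][a][m][e][=]' matches exactly the same text as 'Name=', and re.match() returns None when a line does not start with the pattern, so its result can be used directly as a test. The names NAME_LINE and drop_name_lines are hypothetical, and the sample lines are shortened from the ones above.

```python
import re

NAME_LINE = re.compile('Name=')  # matches the same text as '[N][a][m][e][=]'


def drop_name_lines(lines):
    """Return a new list without the lines that start with 'Name='."""
    return [line for line in lines if not NAME_LINE.match(line)]


probe_pairs = ['Name=AFFX-BioB-5_at',
               'Cell1=96369 N control AFFX-BioB-5_at',
               'Name=223473_at',
               'Cell1=307 87 N control 223473_at']
cells = drop_name_lines(probe_pairs)
```

Without a regular expression, `[line for line in lines if not line.startswith('Name=')]` gives the same result.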


Thanks in advance
K






Re: [Tutor] Regular expression re.search() object . Please help

2005-01-13 Thread kumar s

Hello group:
Thank you for the suggestions. It worked for me using the expression:

if not line.startswith('Name='):

I have been practising regular expression problems, and I
always stumble over one simple thing: after obtaining
either a search object or a match object, I am unable
to apply the right methods on these objects to get
what I want out of them. 

I have looked into many books, including my favourites
(Learning Python and Alan Gauld's Learn to Program Using
Python), but I did not find the answer to this basic question: how can I
get what I want from the match object returned by
search() or match()?

For example:

I have a simple list like the following:

>>> seq
['>probe:HG-U133B:20_s_at:164:623;
Interrogation_Position=6649; Antisense;',
'TCATGGCTGACAACCCATCTTGGGA']

Now I want to extract particular patterns and write them
to another list, say desired[].

What I want to extract:
1. The first pattern/number I want to extract is 164:623,
which always comes after _at: and ends with ;
2. The second pattern/number I want to extract is
6649, which always comes after Position=.

How I want to put them into desired[]:

>>> desired
['>164:623|6649', 'TCATGGCTGACAACCCATCTTGGGA']

I wrote these patterns:

pat = '[0-9]*[:][0-9]*'
pat1 = '[_Position][=][0-9]*'

>>> for line in seq:
        pat = '[0-9]*[:][0-9]*'
        pat1 = '[_Position][=][0-9]*'
        print (re.search(pat,line) and re.search(pat1,line))

<_sre.SRE_Match object at 0x163CAF00>
None

Now I know that I have a hit in the seq list evident
by  <_sre.SRE_Match object at 0x163CAF00>.

Here is the black box:

What kind of operations can I do on this to get those
two matches: 
164:623 and 6649. 

I read 
http://www.python.org/doc/2.2.3/lib/re-objects.html

This did not help me progress further. May I
ask the tutors for a short note explaining things?
In Alan Gauld's book, most of the explanation stops
at the
<_sre.SRE_Match object at 0x163CAF00> level.
After that there is no example of doing
operations on these objects. If I am wrong, I may
have missed it. Apologies for that. 
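A match object is not the text itself; you ask it for the text with group(). group() (the same as group(0)) returns the whole match, and group(1), group(2), ... return whatever the parenthesised parts of the pattern matched. Two details in the patterns above also get in the way: [_Position] is a character class that matches a single character from that set, not the word Position, and `A and B` evaluates to B whenever A is a match, so only one of the two results is printed. A minimal sketch in Python 3, assuming every header contains '_at:X:Y;' and 'Position=N' as in the example; the names COORD, POSITION, and summarize are hypothetical.

```python
import re

# Capture the digits:digits that sit between '_at:' and ';'.
COORD = re.compile(r'_at:(\d+:\d+);')
# Capture the digits that follow 'Position='.
POSITION = re.compile(r'Position=(\d+)')


def summarize(header, sequence):
    """Return ['>X:Y|POS', sequence], or None if either pattern is missing."""
    coord = COORD.search(header)
    pos = POSITION.search(header)
    if coord is None or pos is None:
        return None
    # group(1) is the text matched by the first (...) in each pattern.
    return ['>' + coord.group(1) + '|' + pos.group(1), sequence]


seq = ['>probe:HG-U133B:20_s_at:164:623; Interrogation_Position=6649; Antisense;',
       'TCATGGCTGACAACCCATCTTGGGA']
desired = summarize(seq[0], seq[1])
print(desired)
```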

Thank you very much in advance. 

K

--- Liam Clarke <[EMAIL PROTECTED]> wrote:

> ...as do I.
> 
> openFile=file("probe_pairs.txt","r")
> probe_pairs=openFile.readlines()
> 
> openFile.close()
> 
> indexesToRemove=[]
> 
> for lineIndex in range(len(probe_pairs)):
>     if probe_pairs[lineIndex].startswith("Name="):
>         indexesToRemove.append(lineIndex)
> 
> for index in indexesToRemove:
>     probe_pairs[index] = ''
> 
> Could just be
> 
> openFile=file("probe_pairs.txt","r")
> probe_pairs=openFile.readlines()
> 
> openFile.close()
> 
> indexesToRemove=[]
> 
> for lineIndex in range(len(probe_pairs)):
>     if probe_pairs[lineIndex].startswith("Name="):
>         probe_pairs[lineIndex] = ''
> 
> 
> 
> 
> 
> On Fri, 14 Jan 2005 09:38:17 +1300, Liam Clarke
> <[EMAIL PROTECTED]> wrote:
> > > >>> name1 = '[N][a][m][e][=]'
> > > >>> for i in range(len(probe_pairs)):
> > >         key = re.match(name1,probe_pairs[i])
> > >         key
> > >
> > > <_sre.SRE_Match object at 0x00E37A68>
> > > <_sre.SRE_Match object at 0x00E37AD8>
> > > <_sre.SRE_Match object at 0x00E37A68>
> > > <_sre.SRE_Match object at 0x00E37AD8>
> > > <_sre.SRE_Match object at 0x00E37A68>
> > 
> > 
> > You are overwriting key each time you iterate.
> key.group() gives the
> > matched characters in that object, not a group of
> objects!!!
> > 
> > You want
> > > >>> name1 = '[N][a][m][e][=]'
> > > >>> keys=[]
> > > >>> for i in range(len(probe_pairs)):
> > >         key = re.match(name1,probe_pairs[i])
> > >         keys.append(key)
> > 
> > >>> print keys
> > 
> > > 'Name='
> > >
> > > 1. My aim:
> > > To remove those Name= lines from my
> probe_pairs
> > > list
> > 
> > Why are you deleting the object key?
> > 
> > > >>> for i in range(len(probe_pairs)):
> > >         key = re.match(name1,probe_pairs[i])
> > >         del key
> > >         print probe_pairs[i]
> > 
> > Here's the easy way. Assuming that probe_pairs is
> stored in a file called
> > probe_pairs.txt
> > 
> > openFile=file("probe_pairs.txt","r")
> > probe_pairs=openFile.readlines()
> &

[Tutor] Faster procedure to filter two lists . Please help

2005-01-14 Thread kumar s

Hi group:
I have two lists: (a) 'my_report' and (b) 'what'.

In list 'what', I want to take 6649 (from element 1:
164:623\t6649) and write it to a new list (although I
printed the result, my
intention is to use list.append(result)). 

I took the column 1 value of element 1 in 'what', which is
164:623, and checked it against the column 1 values in list
my_report. If it matches,
I want it to write all columns of my_report along
with the column 2 value from the 'what' list. 

(I feel my explanation made it sound more complex than it is.)

Here is what I did:




>>> what[0:4]
['164:623\t6649', '484:11\t6687', '490:339\t6759',
'247:57\t6880', '113:623\t6901']



>>>my_report[0:4]

['164:623\tTCATGGCTGACAACCCATCTTGGGA\t20_s_at',
'484:11\tATTATCATCACATGCAGCTTCACGC\t20_s_at',
'490:339\tGAATCCGCCAGAACACAGACA\t20_s_at',
'247:57\tAGTCCTCGTGGAACTACAACTTCAT\t20_s_at',
'113:623\tTCATGGGTGTTCGGCATGAAA\t20_s_at']







>>> for i in range(len(what)):
        ele = split(what[i],'\t')
        cor1 = ele[0]
        for k in range(len(my_report)):
            cols = split(my_report[k],'\t')
            cor = cols[0]
            if cor1 == cor:
                print cor+'\t'+ele[1]+'\t'+cols[1]+'\t'+cols[2]


164:623 6649    TCATGGCTGACAACCCATCTTGGGA   
484:11  6687    ATTATCATCACATGCAGCTTCACGC   
490:339 6759    GAATCCGCCAGAACACAGACA   
247:57  6880    AGTCCTCGTGGAACTACAACTTCAT   
113:623 6901    TCATGGGTGTTCGGCATGAAA   




PROBLEM:

This process is very, very slow. I have 249502 elements
in each list, and the process has been running for over 30
minutes. Could 
anyone suggest a faster procedure to save time?


Thank you in advance. 

K




Re: [Tutor] Faster procedure to filter two lists . Please help

2005-01-14 Thread kumar s

Hi Danny:
 Thank you for your suggestion. I tried creating a
dictionary from the 'what' list and looked up keys with the
has_key method, and it is pretty fast. 

Thanks again. Following is the piece of code.

K



>>> cors = []
>>> intr = []
>>> for i in range(len(what)):
        ele = split(what[i],'\t')
        cors.append(ele[0])
        intr.append(ele[1])


>>> what_dict = dict(zip(cors,intr))

>>> for i in range(len(my_report)):
        cols = split(my_report[i],'\t')
        cor = cols[0]
        if what_dict.has_key(cor):
            intr = what_dict[cor]
            my_final_report.append(cols[0]+'\t'+intr+'\t'+cols[1]+'\t'+cols[2])
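has_key() no longer exists in Python 3; `key in d` or `d.get(key)` does the same job. A minimal sketch of the same dictionary approach in Python 3: build the lookup table from 'what' once, then make a single pass over my_report, so the work grows roughly with len(what) + len(my_report) instead of with their product. It assumes every line of 'what' contains a tab and every line of my_report has at least three tab-separated columns; the function name merge_reports is hypothetical.

```python
def merge_reports(what, my_report):
    """Join my_report rows with the value that 'what' stores for the same coordinate."""
    # Build the lookup table once: '164:623' -> '6649'.
    what_dict = dict(line.split('\t', 1) for line in what)
    merged = []
    for line in my_report:
        cols = line.split('\t')
        intr = what_dict.get(cols[0])   # None when the coordinate is absent
        if intr is not None:
            merged.append('\t'.join([cols[0], intr, cols[1], cols[2]]))
    return merged
```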





--- Danny Yoo <[EMAIL PROTECTED]> wrote:

> 
> 
> On Fri, 14 Jan 2005, kumar s wrote:
> 
> > >>> for i in range(len(what)):
> >         ele = split(what[i],'\t')
> >         cor1 = ele[0]
> >         for k in range(len(my_report)):
> >             cols = split(my_report[k],'\t')
> >             cor = cols[0]
> >             if cor1 == cor:
> >                 print cor+'\t'+ele[1]+'\t'+cols[1]+'\t'+cols[2]
> 
> 
> 
> Hi Kumar,
> 
> 
> Ok, this calls for the use of an "associative map"
> or "dictionary".
> 
> 
> The main time sink is the loop here:
> 
> > for k in range(len(my_report)):
> >     cols = split(my_report[k],'\t')
> >     cor = cols[0]
> >     if cor1 == cor:
> >         print cor+'\t'+ele[1]+'\t'+cols[1]+'\t'+cols[2]
> 
> Conceptually, my_report can be considered a list of
> key/value pairs.  For
> each element in 'my_report', the "key" is the first
> column (cols[0]), and
> the "value" is the rest of the columns (cols[1:]).
> 
> 
> The loop above can, in a pessimistic world, require
> a search across the
> whole of 'my_report'.  This can take time that is
> proportional to the
> length of 'my_report'.  You mentioned earlier that
> each list might be of
> length 249502, so we're looking into a process whose
> overall cost is
> gigantic.
> 
> 
> [Notes on calculating runtime cost: when the
> structure of the code looks
> like:
> 
>     for element1 in list1:
>         for element2 in list2:
>             some_operation_that_costs_K_time()
> 
> then the overall cost of running this loop will be
> 
> K * len(list1) * len(list2)
> ]
> 
> 
> We can do much better than this if we use a
> "dictionary" data structure. A
> "dictionary" can reduce the time it takes to do a
> lookup search down from
> a linear-time operation to an atomic-time one.  Do
> you know about
> dictionaries yet?  You can take a look at:
> 
> http://www.ibiblio.org/obp/thinkCSpy/chap10.htm
> 
> which will give an overview of a dictionary.  It
> doesn't explain why
> dictionary lookup is fast, but we can talk about
> that later if you want.
> 
> 
> Please feel free to ask any questions about
> dictionaries and their use.
> Learning how to use a dictionary data structure is a
> skill that pays back
> extraordinarily well.
> 
> 
> Good luck!
> 
> 





[Tutor] Selecting text

2005-01-18 Thread kumar s

Dear group:

I have two lists:

1. Lseq:

>>> len(Lseq)
30673
>>> Lseq[20:25]
['NM_025164', 'NM_025164', 'NM_012384', 'NM_006380',
'NM_007032','NM_014332']


2. refseq:
>>> len(refseq)
1080945
>>> refseq[0:25]
['>gi|10047089|ref|NM_014332.1| Homo sapiens small
muscle protein, X-linked (SMPX), mRNA',
'GTTCTCAATACCGGGAGAGGCACAGAGCTATTTCAGCCACATGGCATCGGAATTGAGATCGCAGCT',
'CAGAGGACACCGGGCGTTCCACCTTCCAAGGAGCTTTGTATTCTTGCATCTGGCTGCCTGGGACTT',
'CCCTTAGGCAGTAAACAAATACATAAAGCAGGGATAAGACTGCATGAATATGTCGAAACAGCCAGTTTCC',
'AATGTTAGAGCCATCCAGGCAAATATCAATATTCCAATGGGAGCCTTTCGGCCAGGAGCAGGTCAA',
'CCAGAAGGAATGTACTCCTGAAGTGGAGGAGGGTGTTCCTCCCACCTCGGATGAGGAGAAGAAGCC',
'AATTCCAGGAGCGAAGAAACTTCCAGGACCTGCAGTCAATCTATCGGAAATCCAGAATATTGTGAA',
'CTTATGTAAAGCTGAACAGTAGTAGGAAGAAGGATTGATGTGAAGAAATAAAGAGGCA',
'GAAGATGGATTCAATAGCTCACTATATATTTGTATGATGATTGTGAACCTCCTGAATGCCTG',
'AGACTCTAGCAGAAATGGCCTGTTTGTACATTTATATCTCTTCCTTCTAGTTGGCTGTATTTCTTACTTT',
'ATCTTCATGGCACCTCACAGAACAAATTAGCCCATAAATTCAACACCTGGAGGGTGTGGGAG',
'GAGGGATATGAATGGAGAATGATATGGCAATGTGCCTAACGAGATGGTTTCCCAAGCT',
'ACTTCCTACAGTAGGTCAATATTTGGAATGCGAGTTCTTCACCAAATTATGTCACTAA',
'ACTTTGTATGAGTTCAAATAAATATTTGACTAAATGTTGTGA',
'>gi|10047091|ref|NM_013259.1| Homo sapiens neuronal
protein (NP25), mRNA',
'TGTGCTGCTATTGTGTGGATGCCGCGCGTGTCTTCTCTTCTTTCCAGAGATGGCTAACACCCGAGC',
'TATGGCTTAAGCCGAGAGGTGCAGGAGAAGATCGAGCAGAAGTATGATGCGGACCTGGAGAACAAGCTGG',
'TGGACTGGATCATCCTGCAGTGCGCCGAGGACATAGAGCACCCGCCGGCAGGGCCCACAGAA',
'ATGGTTAATGGACGGGACGGTCCTGTGCAAGCTGATAAATAGTTTATACCCACCAGGACAAGAGCCCATA',
'CCCAAGATCTCAGAGTCAAAGATGGCAAGCAGATGGAGCAAATCTCCCAGTTCCTGCTGCGG',
'AGACCTATGGTGTCAGAACCACCGACATCTTTCAGACGGTGGATCTATGGGAAGGGAAGGACATGGCAGC',
'TGTGCAGAGGACCCTGATGGCTTTAGGCAGCGTTGCAGTCACCAAGGATGATGGCTGCTATCAGAG',
'CCATCCTGGTTTCACAGGAAAGCCCAGCAGAATCGGAGAGGCCCGAGGAGCAGCTTCGCCAGGGAC',
'AGAACGTAATAGGCCTGCAGATGGGCAGCAACAAGGGAGCCTCCCAGGCGGGCATGACAGGGTACGGGAT',
'GCCCAGGCAGATCATGTTAGGACGCGGCATCCTGTGGTAGAGAGGACGAATGTTCCACACCATGGT']


If Lseq[i] is present in refseq[k], then I want to
print everything from refseq[k] up to (but not including)
the next element that starts with the '>' sign. 

My Lseq has the NM_014332 element, and it is also present
in the second list, refseq. I want to print starting from the
element where NM_014332 is present until the next element
that starts with the '>' sign.

In this case, it would be:
'>gi|10047089|ref|NM_014332.1| Homo sapiens small
muscle protein, X-linked (SMPX), mRNA',
'GTTCTCAATACCGGGAGAGGCACAGAGCTATTTCAGCCACATGGCATCGGAATTGAGATCGCAGCT',
'CAGAGGACACCGGGCGTTCCACCTTCCAAGGAGCTTTGTATTCTTGCATCTGGCTGCCTGGGACTT',
'CCCTTAGGCAGTAAACAAATACATAAAGCAGGGATAAGACTGCATGAATATGTCGAAACAGCCAGTTTCC',
'AATGTTAGAGCCATCCAGGCAAATATCAATATTCCAATGGGAGCCTTTCGGCCAGGAGCAGGTCAA',
'CCAGAAGGAATGTACTCCTGAAGTGGAGGAGGGTGTTCCTCCCACCTCGGATGAGGAGAAGAAGCC',
'AATTCCAGGAGCGAAGAAACTTCCAGGACCTGCAGTCAATCTATCGGAAATCCAGAATATTGTGAA',
'CTTATGTAAAGCTGAACAGTAGTAGGAAGAAGGATTGATGTGAAGAAATAAAGAGGCA',
'GAAGATGGATTCAATAGCTCACTATATATTTGTATGATGATTGTGAACCTCCTGAATGCCTG',
'AGACTCTAGCAGAAATGGCCTGTTTGTACATTTATATCTCTTCCTTCTAGTTGGCTGTATTTCTTACTTT',
'ATCTTCATGGCACCTCACAGAACAAATTAGCCCATAAATTCAACACCTGGAGGGTGTGGGAG',
'GAGGGATATGAATGGAGAATGATATGGCAATGTGCCTAACGAGATGGTTTCCCAAGCT',
'ACTTCCTACAGTAGGTCAATATTTGGAATGCGAGTTCTTCACCAAATTATGTCACTAA',
'ACTTTGTATGAGTTCAAATAAATATTTGACTAAATGTTGTGA'

I could not think of any smart way to do this,
although I have tried like this:

>>> for ele1 in Lseq:
        for ele2 in refseq:
            if ele1 in ele2:
                k = ele2
                s = refseq[ele2].startswith('>')
                print k,s



Traceback (most recent call last):
  File "", line 5, in -toplevel-
s = refseq[ele2].startswith('>')
TypeError: list indices must be integers


I do not know how to tell Python to select the lines
between two > symbols. 

Could anyone help me? Thanks. 
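The traceback comes from refseq[ele2]: ele2 is a string taken from the list, and list indices must be integers. Rather than searching inside every line, a minimal sketch in Python 3 first groups refseq into records (a record starts at each '>' line and runs until the next one), then keeps the records whose accession is in Lseq, using a set so each check is a single lookup. It assumes headers look like '>gi|NUMBER|ref|NM_xxxxxx.version| description', as in the sample; a header with a different layout would make accession() raise an IndexError. The function names are hypothetical.

```python
def fasta_records(lines):
    """Group a flat list of lines into (header, [sequence lines]) pairs."""
    records = []
    for line in lines:
        if line.startswith('>'):
            records.append((line, []))      # a new record starts here
        elif records:
            records[-1][1].append(line)     # sequence line of the current record
    return records


def accession(header):
    """'>gi|10047089|ref|NM_014332.1| ...' -> 'NM_014332' (assumed layout)."""
    return header.split('|')[3].split('.')[0]


def select_records(wanted_ids, lines):
    """Return [header, seq1, seq2, ...] for every record whose accession is wanted."""
    wanted = set(wanted_ids)
    return [[header] + seqs
            for header, seqs in fasta_records(lines)
            if accession(header) in wanted]
```

Each selected record is a list, so `'\n'.join(record)` prints it in the same layout as the input.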

K




[Tutor] Cluster algorithms

2005-01-26 Thread kumar s

Hi:

I am still trying to learn the OOP side of Python. 
However, circumstances don't seem to wait until
I finish my practice and reach a higher understanding.
Maybe I am being pushed by circumstances into the
stream and tested on whether I can swim efficiently
while I still struggle with the basic strokes. That
analogy describes my experience of learning Python 100% :-)


I have a couple of questions to ask tutors:

Are there any example programs depicting Clustering
algorithms such as agglomerative, complete link,
partional , squared error clustering, k-means or
clustering algos based on Neural networks or genetic
algorithm. although I just learned python, (to major
extent in programming also), I need to apply some of
these algos to my data.  Any
suggestions/recommendations? 


 Do I have to know to code well using OOP methods to
apply these algorithms?


-Kumar
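Plain functions are enough for these algorithms, and classes are optional. As one illustration, here is a minimal Python 3 sketch of k-means on 2-D points written without any OOP. It assumes numeric (x, y) tuples. For simplicity it takes the first k points as the starting centroids, which is a naive choice; real work usually uses smarter seeding such as k-means++. For agglomerative or complete-link clustering on real data, SciPy's scipy.cluster.hierarchy and scipy.cluster.vq modules are worth a look before writing your own.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iterations=100):
    """Return (centroids, clusters) after Lloyd-style k-means iterations."""
    centroids = [tuple(p) for p in points[:k]]  # naive seeding: first k points
    clusters = []
    for _ in range(iterations):
        # assignment step: put every point in the cluster of its nearest centroid
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: squared_distance(p, centroids[i]))
            clusters[nearest].append(p)
        # update step: move each centroid to the mean of its members
        new_centroids = []
        for i, members in enumerate(clusters):
            if members:
                new_centroids.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                new_centroids.append(centroids[i])  # keep an empty cluster's centroid
        if new_centroids == centroids:  # converged: assignments will not change
            break
        centroids = new_centroids
    return centroids, clusters
```

For example, kmeans([(1, 1), (1.5, 2), (1, 1.5), (8, 8), (9, 8.5), (8.5, 9)], 2) separates the two obvious groups of three points.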





[Tutor] files in a directory

2005-01-30 Thread kumar s

Hello.

I wrote a parser to parse spot intensities. As input to this parser I am giving one single file:

f1 = open('my_intensity_file.dml','r')
int = f1.read().split('\n')

my_vals = intParser(int)

# intParser returns a list
f2 = open('myvalues.txt','w')
for line in my_vals:
    f2.write(line)
    f2.write('\n')

f2.close()


The problem with this approach is that I have to give one file per run. I have 50 files to parse, and I want to do that in one go. I keep those 50 files in one directory. Can anyone suggest an approach to automate this process?

I tried to use f1 = stdin(...) but it did not work. I don't know; possibly I am using incorrect syntax.

Any suggestions?

Thank you. 
K
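Below is a minimal Python 3 sketch of one approach: let glob list every matching file in the directory, then run the same parse step on each one. int_parser is a hypothetical stand-in for the poster's intParser (it just drops empty lines). The *.dml pattern and the '.txt' output suffix are also assumptions:

```python
import glob
import os


def int_parser(lines):
    """Hypothetical stand-in for intParser: keep only the non-empty lines."""
    return [line for line in lines if line.strip()]


def parse_directory(directory, pattern='*.dml'):
    """Parse every file matching pattern in directory; return the output paths."""
    outputs = []
    for path in sorted(glob.glob(os.path.join(directory, pattern))):
        with open(path) as infile:           # file is closed automatically
            lines = infile.read().split('\n')
        values = int_parser(lines)
        out_path = path + '.txt'              # e.g. sample.dml -> sample.dml.txt
        with open(out_path, 'w') as outfile:
            for value in values:
                outfile.write(value + '\n')
        outputs.append(out_path)
    return outputs
```

For example, parse_directory(r'C:\my_dml_files') (a hypothetical path) writes one .txt file next to each .dml file. Using a raw string keeps Windows backslashes from being read as escape sequences.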








Re: [Tutor] files in a directory

2005-01-30 Thread kumar s

Thank you, Jay. It worked, and I am very happy. I tried Liam's suggestion also, but some weird things are going on: I am getting neither results nor any error. I am working on that.


Another thing: I am feeding my parser some coordinates specified by me, and asking the parser to extract the intensity values only for those coordinates.

For example:
coordinates_file = open('xxx','r')

def coOrs(coordinates_file):
    ...
    ## this parser extracts my specified coordinates
    ## and saves them as a list for lookup in the intensity file
    return my_coordinates_list


def intParser(intensity_file, my_coordinates_list):
    ...
    return intensities

The function above returns the intensities for my coordinates.

Now that I am reading many files at once, I want a tab-delimited output file that looks like this:

My_coors  Int_file1  Int_file2  IntFile3
01:26     34         235        245.45
04:42     342.4452.445.5
02:56     45.4       34.5       557.8



code:
files = glob.glob("My_dir\*.ext")
def parSer(file):
    f1 = open(file,'r')
    seelf = f1.read().split('\n')
    seelfile = seelf[24:506969]
    my_vals = intParser(seelfile,pbs)
    f2 = open(file+'.txt','w')
    for line in my_vals:
        f2.write(line+'\t')  # <= asking for tab delimiting here
        f2.write('\n')
    f2.close()

def main():
    for each in files:
        parSer(each)
main()


=> Putting a '\t' here did not work.
Am I wrong here? Any suggestions, please.

Thank you in advance.
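Writing '\t' after each value only changes the separator inside one output file. Because each input file still goes to its own output file, values from different files can never end up side by side. To get one column per input file, the values from all files have to be collected first, and then each row is written once. Below is a minimal Python 3 sketch. It assumes each file's results have already been turned into a {coordinate: intensity} dictionary; build_table, write_table, and columns_by_file are hypothetical names:

```python
def build_table(columns_by_file, coordinates):
    """columns_by_file: {file name: {coordinate: intensity}}.
    Returns tab-delimited lines: a header, then one row per coordinate."""
    names = sorted(columns_by_file)
    lines = ['\t'.join(['My_coors'] + names)]
    for coor in coordinates:
        # one cell per file; '' where a file has no value for this coordinate
        cells = [str(columns_by_file[name].get(coor, '')) for name in names]
        lines.append('\t'.join([coor] + cells))
    return lines


def write_table(path, lines):
    """Write the rows with exactly one newline per row."""
    with open(path, 'w') as out:
        out.write('\n'.join(lines) + '\n')
```

For example, build_table({'Int_file1': {'01:26': 34}, 'Int_file2': {'01:26': 235}}, ['01:26']) gives a header line and the row '01:26\t34\t235'.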

--- Jay Loden <[EMAIL PROTECTED]> wrote:

> There are a few ways to accomplish this... the way that
> comes to mind is:
>
> ##
> import glob
>
> files = glob.glob("/path/to/director/*.dml")  # assuming you want only .dml
>
> def spot(file):
>     '''search for intensity spots and report them to an output file'''
>     f1 = open(file,'r')
>     int = f1.read().split('\n')
>
>     my_vals = intParser(int)
>
>     # intParser returns a list
>     f2 = open('myvalues.txt','w')  # you will want to change this to output multiple
>     for line in my_vals:           # files, or to at least append instead of overwriting
>         f2.write(line)
>         f2.write('\n')
>     f2.close()
>
> def main():
>     for each in files:
>         spot(each)
>
> main()
>
> ##
>
> Basically, turn the parsing into a function, then create a list of files, and
> perform the parsing on each file. glob() lets you grab a whole list of files
> matching the wildcard just like if you typed "ls *.dml" or whatever into a
> command prompt. There wasn't too much info about specifically how you needed
> this to work, so this is a rough sketch of what you want. Hopefully it helps.
>
> -Jay

Re: [Tutor] TypeError: can only concatenate list (not "str") to list

2005-01-30 Thread kumar s

> nmr = nmrows[i]
> pbr = cols[0]
> print nmrow[i] +'\t'+cols[0]

nmr = str(nmrows[i])
pbr = cols[0]

print nmr + '\t' + pbr

will print what you want.

k
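Below is a short Python 3 sketch of the underlying fix. It assumes nmrows[i] is a list of fields and cols[0] is a string; format_row is a hypothetical name. The + operator cannot join a list and a string, but '\t'.join() can join all the fields once each one is a string:

```python
def format_row(nm_row, probe):
    """Join every field of nm_row plus the probe name into one tab-delimited line."""
    # str() each field in case some are numbers; '\t'.join needs strings
    return '\t'.join([str(field) for field in nm_row] + [probe])


result = []
result.append(format_row(['NM_001', 'chr1', 100, 200], 'pb1'))  # one string per row
# result.extend('ab') would add 'a' and 'b' as two separate items, which is
# why extend() with a string fills the list with single characters.
```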
--- Srinivas Iyyer <[EMAIL PROTECTED]> wrote:

> Hello group,
> I am trying to print rows from two lists together:
>
> How can I deal with a TypeError where I have to print a list and a string?
>
> for line in pb:  # tab delim text with 12 columns
>     cols = line.split('\t')
>     temp_seq = cols[7].split('\n') # extract column index 7
>     seq = temp_seq[0].split(',') # splitting it by ,
>     for nm in seq:
>         for i in range(len(nmrows)):
>             if nm == nmrows[i][0] and nmrows[i][3] < cols[4] and nmrows[i][4] > cols[5]:
>                 nmr = nmrows[i]
>                 pbr = cols[0]
>                 print nmrow[i] +'\t'+cols[0]
>
>
>
> I tried the following also:
>
> I created an empty list outside the for loop and tried to extend it with
> the elements of the list and the string:
>
> nmr = nmrows[i]
> pbr = cols[0]
> result.extend(nmr+'\t'+pbr)
>
> # result is the list I created. nmr is a list, and pbr is a string.
>
> Can anyone please help?
>
> thanks
> Srini

[Tutor] Append function

2005-01-30 Thread kumar s

Hello:

Instead of appending one below the other, can I append one next to the other?

I have a bunch of files where the first column is always the same. I want to collect all those files, extract the second column from each file, and write the first column followed by the other columns (extracted from the files) next to each other.

Any tricks, tips, or hints?

thanks
K
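If every file really lists the same first-column values in the same order, the rows can simply be zipped together. Below is a minimal Python 3 sketch under that assumption. paste_columns is a hypothetical name, and each file is assumed to have already been read into a list of (first column, second column) pairs:

```python
def paste_columns(tables):
    """tables: one list of (key, value) rows per file, all in the same row order.
    Returns tab-delimited lines: the key, then each file's value side by side."""
    lines = []
    for rows in zip(*tables):           # the i-th row from every file, together
        key = rows[0][0]                # first column taken from the first file
        values = [value for _, value in rows]
        lines.append('\t'.join([key] + values))
    return lines
```

zip() stops at the shortest file. If any file has extra or missing rows, the output is silently shifted or truncated, so this only suits files that line up exactly.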
 






Re: [Tutor] Append function

2005-02-01 Thread kumar s

Hi Danny:

 I have ~50 files in this format:

File1:

680:209 3006.3
266:123 250.5
62:393  117.3
547:429 161.5
341:311 546.5
132:419 163.3
98:471  306.3

File 2:
266:123 168.0
62:393  119.3
547:429 131.0
341:311 162.3
132:419 149.5
98:471  85.0
289:215 207.0
75:553  517.0


I am generating these files using this script:

import glob

f1 = open("test2_cor.txt","r")
ana = f1.read().split('\n')
ana = ana[:-1]
pbs = []
for line in ana:
    cols = line.split('\t')
    pb = cols[0]
    pbs.append(pb)
##CEL Files section
files = glob.glob(r"c:\files\*.cel")  # raw string, otherwise "\f" is a form-feed escape

def parSer(file):
    f1 = open(file,'r')
    celf = f1.read().split('\n')
    celfile = celf[24:409624]
    my_vals = celParser(celfile,pbs)
    f2 = open(file+'.txt','w')
    for line in my_vals:
        f2.write(line+'\t')
        f2.write('\n')
    f2.close()

def main():
    for each in files:
        parSer(each)
main()
 

Because I asked it to write an output file named after each input file, it generates 50 output files for 50 input files.

What I am interested in is appending the output to one single file, but tab-delimited.

For example:

For each file there are 2 columns, cor and val:
file 1       file 2       file 3       file 4
cor  val     cor  val     cor  val     cor  val
x:x  1345    x:x  5434    x:x  4454    x:x  4462
x:y  3463    x:y  3435    x:y  3435    x:y  3435

Could you suggest a way? Thank you.
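Note that the two sample files above do not contain the same coordinates in the same order. 680:209 appears only in File1, and 289:215 and 75:553 appear only in File 2, so pasting columns by position would misalign the rows. Below is a minimal Python 3 sketch that matches rows by coordinate instead. read_pairs and merge_by_key are hypothetical names. The sketch assumes whitespace-separated input and writes a missing value as an empty field:

```python
def read_pairs(path):
    """Read 'coordinate value' lines into a list of (coordinate, value) tuples."""
    with open(path) as fh:
        return [tuple(line.split()[:2]) for line in fh if line.strip()]


def merge_by_key(files_rows, missing=''):
    """files_rows: {file name: [(coordinate, value), ...]}.
    Returns a header plus one tab-delimited line per coordinate, matched by key."""
    names = sorted(files_rows)
    table = {}   # coordinate -> {file name: value}
    order = []   # coordinates in first-seen order
    for name in names:
        for coor, value in files_rows[name]:
            if coor not in table:
                table[coor] = {}
                order.append(coor)
            table[coor][name] = value
    lines = ['\t'.join(['cor'] + names)]
    for coor in order:
        lines.append('\t'.join([coor] + [table[coor].get(name, missing) for name in names]))
    return lines
```

For example, merge_by_key({name: read_pairs(name) for name in file_list}) would build one table from all files, where file_list comes from glob as in the script above.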




Re: [Tutor] Append function

2005-02-01 Thread kumar s

Hi Kent, 
Thank you for your suggestion. I keep getting "IOError: permission denied" every time I try the tips that you provided. I tried to look up this error and did not find a reasonable answer. Does this error have something to do with the Windows OS?

Any suggestions?

Thank you
K

>>> allColumns = [readColumns("C:\Documents and Settings\myfiles")for filePath in file_list]

Traceback (most recent call last):
  File "", line 1, in -toplevel-
    allColumns = [readColumns("C:\Documents and Settings\myfiles")for filePath in file_list]
  File "", line 2, in readColumns
    rows = [line.split() for line in open(filePath)]
IOError: [Errno 13] Permission denied: 'C:\\Documents and Settings\\myfiles'
>>> 



> def readColumns(filePath):
>     rows = [ line.split() for line in open(filePath) ]
>     return zip(*rows)
>
> # list of all the files to read
> allFiles = [ 'f1.txt', 'f2.txt' ]
>
> # both columns from all files
> allColumns = [ readColumns(filePath) for filePath in allFiles ]
>
> # just the second column from all files
> allSecondColumns = [ cols[1] for cols in allColumns ]
>
> # a representative first column
> col1 = allColumns[0][0]
>
> # zip it up into rows
> allRows = zip(col1, *allSecondColumns)
>
> for row in allRows:
>     print '\t'.join(row)
>
>
> Kent
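The traceback shows the cause: the list comprehension passes the same folder path, "C:\Documents and Settings\myfiles", to readColumns on every iteration instead of filePath. That means open() is asked to read a directory, which Windows typically reports as Errno 13 (Permission denied). Below is a minimal Python 3 sketch of Kent's idea with the file list built from the directory. The *.txt pattern and the side_by_side name are assumptions:

```python
import glob
import os


def read_columns(file_path):
    """Return the columns of a whitespace-separated file as a list of tuples."""
    with open(file_path) as fh:
        rows = [line.split() for line in fh if line.strip()]
    return list(zip(*rows))


def side_by_side(directory, pattern='*.txt'):
    """First column of the first file, then the second column of every file."""
    # build a list of *file* paths inside the folder; open() needs files, not the folder
    file_list = sorted(glob.glob(os.path.join(directory, pattern)))
    all_columns = [read_columns(file_path) for file_path in file_list]  # each file_path
    if not all_columns:
        return []
    col1 = all_columns[0][0]
    second_columns = [cols[1] for cols in all_columns]
    return ['\t'.join(row) for row in zip(col1, *second_columns)]
```

Call it as side_by_side(r"C:\Documents and Settings\myfiles"). The r prefix keeps the backslashes literal. Like Kent's version, this pairs rows by position, so all files must list their rows in the same order.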



[Tutor] please help formating

2007-05-22 Thread kumar s

hi group,

I have data obtained from another student (over 100K lines) that looks like this:
(39577484, 39577692) [['NM_003750']]
(107906, 108011) [['NM_002443']]
(113426, 113750) [['NM_138634', 'NM_002443']]
(106886, 106991) [['NM_138634', 'NM_002443']]
(100708, 100742) [['NM_138634', 'NM_002443']]
(35055935, 35056061) [['NM_002313', 'NM_001003407', 'NM_001003408']]

I know that the first two items in () are a tuple, and the following [[]] is a list of lists. I was told that the tuples were keys and the lists were their values in a dictionary.

How can I parse this into a neat structure that looks like this:
39577484, 39577692 \t NM_003750
107906, 108011 \t NM_002443
113426, 113750 \t NM_138634,NM_002443
106886, 106991 \t NM_138634,NM_002443
100708, 100742 \t NM_138634,NM_002443
35055935, 35056061 \t NM_002313,NM_001003407,NM_001003408


I tried substituting in the vim editor, but it was not effective.

Thank you

kum
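Each line is two Python literals: a tuple and a list of lists. ast.literal_eval can safely turn literal text back into Python objects without running arbitrary code. Below is a minimal Python 3 sketch that splits each line after the first ')' and rebuilds it in the requested layout. reformat_line is a hypothetical name, and the sketch assumes every line has exactly this shape:

```python
import ast


def reformat_line(line):
    """'(107906, 108011) [['NM_002443']]' -> '107906, 108011\tNM_002443'"""
    split_at = line.index(')') + 1                  # the tuple ends at the first ')'
    key = ast.literal_eval(line[:split_at])         # -> (107906, 108011)
    value = ast.literal_eval(line[split_at:].strip())  # -> [['NM_002443']]
    ids = [nm for inner in value for nm in inner]   # flatten the list of lists
    return '%d, %d\t%s' % (key[0], key[1], ','.join(ids))
```

To process the whole file, loop over its lines (skipping blank ones) and write reformat_line(line) for each.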


   

[Tutor] dealing with nested list values in a dictionary

2007-05-24 Thread kumar s

Dear group,

Unfortunately, my previous post got tagged as a 'homework' mail and got no responses.

In short, I have a dictionary structure as depicted below.

I want to go over every key and print the key/value pairs in a more sensible way.

I have written a small piece of code. May I ask the tutors to go through it and comment on whether it is correct or prone to bugs?

Thank you. 
kum

>>> md = {(21597133, 21597325): [['NM_032457']],
(21399193, 21399334): [['NM_032456'], ['NM_002589']],
(21397395, 21399192): [['NM_032457'], ['NM_032456'], ['NM_002589']],
(21407733, 21408196): [['NM_002589']],
(21401577, 21402315): [['NM_032456']],
(21819453, 21820111): [['NM_032457']],
(21399335, 21401576): [['NM_032457'], ['NM_032456'], ['NM_002589']]}

>>> for item in md.keys():
    mlst = []
    for frnd in md[item]:
        for srnd in frnd:
            mlst.append(srnd)
    mystr = ','.join(mlst)
    print(('%d\t%d\t%s')%(item[0],item[1],mystr))


21597133	21597325	NM_032457
21399193	21399334	NM_032456,NM_002589
21397395	21399192	NM_032457,NM_032456,NM_002589
21407733	21408196	NM_002589
21401577	21402315	NM_032456
21819453	21820111	NM_032457
21399335	21401576	NM_032457,NM_032456,NM_002589
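The loop above looks correct for this data. Two things are worth knowing. First, md.keys() comes back in arbitrary order in older Python versions (from Python 3.7 on, it follows insertion order). Second, repeated IDs are not removed. Below is a compact Python 3 sketch of the same logic that sorts the keys and uses itertools.chain.from_iterable to flatten the inner lists; format_entries is a hypothetical name:

```python
from itertools import chain


def format_entries(md):
    """One 'start<TAB>end<TAB>id1,id2,...' line per key, sorted by coordinates."""
    lines = []
    for start, end in sorted(md):                      # stable, sorted output order
        ids = chain.from_iterable(md[(start, end)])    # [['a'], ['b']] -> 'a', 'b'
        lines.append('%d\t%d\t%s' % (start, end, ','.join(ids)))
    return lines
```

Printing '\n'.join(format_entries(md)) then gives the same rows as the loop above, sorted by start coordinate.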


  

[Tutor] looping problem

2006-09-23 Thread kumar s

hi, 

The reason could be that I did not quite understand the concept of looping.

I have a list of 48 elements.

I want to create two other lists, listA and listB.

I want to loop through the list of 48 elements and

select the elements with index 0,3,6,9,12 etc. into listA,

select the elements with index 2,5,8,11 etc. into listB.


Could anyone help me with how I can do that?

thank you
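Python slices accept a step, so "every third element starting at index 0" is big[0::3], and "starting at index 2" is big[2::3]. Below is a minimal Python 3 sketch that shows both the slice form and an equivalent loop with an explicit counter. The 48-element list here is a stand-in:

```python
big = list(range(48))        # stand-in for the real 48-element list

listA = big[0::3]            # indices 0, 3, 6, 9, ...
listB = big[2::3]            # indices 2, 5, 8, 11, ...

# the same selection as a loop, using enumerate() as the counter
listA2, listB2 = [], []
for index, element in enumerate(big):
    if index % 3 == 0:
        listA2.append(element)
    elif index % 3 == 2:
        listB2.append(element)
```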


Re: [Tutor] looping problem

2006-09-23 Thread kumar s

hi, 

Thank you. This is not a homework question.

I have a very huge file of FASTA sequences:

>GeneName  \t
AATTAAGGAA..
(1000 lines)
AATAAGGA
>GeneName  \t
GGAGAGAGATTAAGAA
(15000 lines)


When I read it as:

f2 = open('myfile','r')
dat = f2.read().split('\n')

it turned out to be a very expensive deal for the computer.


Instead I tried this:

dat = f2.read()

(Reading in the jumbo file of 19,100,442,1342 lines is easy, but getting what I want out of it is a problem.)


I want to create a dictionary with 'GeneName' as the key and the sequence of ATGC characters as the value.


biglist = dat.split('\t')
['GeneName ','','ATTAAGGCCAA'...]

Now I want to select 'GeneName ' into listA and 'ATTAAGGCCAA' into listB,

so I want to select elements 0,3,6,9 into listA and elements 2,5,8,11 and so on into listB.

Then I can do dict(zip(listA,listB)).


However, the very concept of loops gets blanked out in my brain when I want to do this:

for j in range(len(biglist)):
    # from here... I cannot think of anything

Maybe it is just a mental block; that's the reason I am seeking help on the forum.


Thanks





--- jim stockford <[EMAIL PROTECTED]> wrote:

>
> keep a counter in your loop. is this a homework question?
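Rather than splitting the whole file on '\t' and picking every third element, a FASTA file can be read one line at a time: a line starting with '>' opens a new record, and every other line is sequence. Below is a minimal Python 3 sketch. It assumes the gene name is the header text before the first tab; read_fasta is a hypothetical name. The resulting dictionary still holds every sequence in memory. If Biopython is available, Bio.SeqIO.parse(handle, 'fasta') does the same parsing.

```python
def read_fasta(handle):
    """Build {gene name: sequence} from FASTA text, reading one line at a time."""
    sequences = {}
    name, chunks = None, []
    for line in handle:               # iterating a file object never loads it all at once
        line = line.strip()
        if not line:
            continue
        if line.startswith('>'):
            if name is not None:      # the previous record is complete
                sequences[name] = ''.join(chunks)
            name = line[1:].split('\t')[0].strip()   # header text before any tab
            chunks = []
        else:
            chunks.append(line)       # collect pieces; join once at the end
    if name is not None:
        sequences[name] = ''.join(chunks)
    return sequences
```

Typical use would be: with open('myfile') as f2: seqs = read_fasta(f2).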

[Tutor] extracting numbers from a list

2006-10-16 Thread kumar s

hi :

I have a simple question to ask tutors:

list A :

a = [10,15,18,20,25,30,40]

I want to print
10 15 (first two elements)
16 18 (16 is last number +1)
19 20
21 25
26 30
31 40

>>> fx = a[0]
>>> fy = a[1]
>>> b = a[2:]
>>> ai = iter(b)
>>> last = ai.next()
>>> for j in ai:
...     print fy+1,last
...     last = j
...
16 18
16 20
16 25
16 30


Can anyone help, please?

thank you 
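In the loop above, fy never changes, so every line starts at 16. The start of each span has to come from the previous element of the list instead. Below is a minimal Python 3 sketch; get_spans is a hypothetical name:

```python
def get_spans(numbers):
    """[10, 15, 18] -> [(10, 15), (16, 18)]: first pair as-is, then (previous + 1, current)."""
    spans = [(numbers[0], numbers[1])]
    for i in range(2, len(numbers)):
        spans.append((numbers[i - 1] + 1, numbers[i]))
    return spans


for start, end in get_spans([10, 15, 18, 20, 25, 30, 40]):
    print(start, end)
```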


Re: [Tutor] extracting numbers from a list

2006-10-17 Thread kumar s

In continuation of:
Re: [Tutor] extracting numbers from a list



hello list

I have coordinates for exons (chunks of sequence). For instance:

10 - 50  A
10 - 20  B
35 - 50  B
60 - 70  A
60 - 70  B
80 - 100 A
80 - 100 B
(The above coordinates and names are simpler than in my data.)

Here my aim is to create chunks of exons specific to A or B.

For instance:
10 - 20 and 35 - 50 are common to both A and B, whereas 21 - 34 is specific only to A.

The desired output for me is:

10 \t 20  A,B
21 \t 34  A
35 \t 50  A,B
60 \t 70  A,B
80 \t 100 A,B

I just learned Python from a friend, and he is also a novice.

What I could get is the breakup into chunks. The problem is that I am getting numbers different from what I need:
[10, 20] [10, 50]
[21, 35] [10, 50]
[36, 50] [10, 50]
[60, 70] [60, 70]
[80, 100] [80, 100]

The list next to each chunk is its pair (the longer block).

Could anyone help me correct [21, 35], [36, 50] to 21 \t 34, 35 \t 50? I tried changing the indexes in the function chunker, but it is not working for me. Also, how can I point chunks to their names?

This is an abstract example of the complex numbers and their sequence names. I want to get the simple code working first and then move to the complex one.

Thank you very much for your valuable time.



Result: what I am getting now:

[10, 20] [10, 50]
[21, 35] [10, 50]
[36, 50] [10, 50]
[60, 70] [60, 70]
[80, 100] [80, 100]



My code:

from sets import Set
dat = ['10\t50\tA', '10\t20\tB', '35\t50\tB',
       '60\t70\tA', '60\t70\tB', '80\t100\tA', '80\t100\tB']


# creating a dictionary with coordinates as key and NM_ as value
#

ekda = {}
for j in dat:
    cols = j.split('\t')
    ekda.setdefault(cols[0]+'\t'+cols[1],[]).append(cols[2])
##
#getting tab delim numbers only and not the A,B
bat = []
for j in dat:
    cols = j.split('\t')
    bat.append(cols[0]+'\t'+cols[1])
pairs = [ map(int, x.split('\t')) for x in bat ]


#
# this function takes pairs (from the above result) and makes longer blocks (exons).
# For instance:
# 10 - 20; 14 - 25; 19 - 30; 40 - 50; 45 - 60; 70 - 80
# a = [[10,20],[14,25],[19,30],[40,50],[45,60],[70,80]]
# for j in exoner(a):
#   print j
#The result would be:
#10 - 30; 40 - 60; 70 - 80
#
def exoner(pairs):
    pairs.sort()
    i = iter(pairs)
    last = i.next()
    for current in i:
        if current[0] in xrange(last[0],last[1]):
            if current[1] > last[1]:
                last = [last[0], current[1]]
            else:
                last = [last[0],last[1]]
        else:
            yield last
            last = current
    yield last
lon = exoner(pairs)
#
## Here I am getting all the unique numbers in dat

nums = []
for j in pairs:
    for k in j:
        nums.append(k)
unm = Set(nums)
unums = []
for x in unm:
    unums.append(x)
unums.sort()
#
### This function takes a list of numbers and breaks it into pieces
## For instance [10,15,20,25,30]
#>>> i = [10,15,20,25,30]
#>>> chunker(i)
#[[10, 15], [16, 20], [21, 25], [26, 30]]


def chunker(lis):
    res = []
    res.append([lis[0],lis[1]])
    for m in range(2,len(lis)):
        res.append([lis[m-1]+1,lis[m]])
    return res

# Here I take each pair (longer block) and roll over all the unique numbers
# (unums, from dat) and check if each number is in the range of the pair; if so,
# I break that set of numbers in the pair range into small blocks
##
gdic = {}
unums.sort()
for pair in exoner(pairs):
    x = pair[0]
    y = pair[1]+1
    sml = []
    for k in unums:
        if k in range(x,y):
            sml.append(k)
        else:
            pass
    for j in chunker(sml):
        print j,pair
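One way to get 21-34 and 35-50 instead of 21-35 and 36-50 is to treat each inclusive exon [start, end] as the half-open interval [start, end + 1). Every start and every end + 1 then becomes a cut point. Each piece between consecutive cut points is either fully inside or fully outside every exon, so the names covering a piece can be read off directly. This is a different technique from exoner/chunker above. Below is a minimal Python 3 sketch; split_exons is a hypothetical name. The nested loop is quadratic, which is fine for a sketch but slow for very large inputs:

```python
def split_exons(records):
    """records: list of (start, end, name) with inclusive coordinates.
    Returns (start, end, 'A,B') pieces covered by at least one record."""
    cuts = set()
    for start, end, _ in records:
        cuts.add(start)       # a piece can begin where an exon begins...
        cuts.add(end + 1)     # ...or just after an exon ends
    cuts = sorted(cuts)

    pieces = []
    for lo, hi in zip(cuts, cuts[1:]):
        # names whose exon covers the whole piece [lo, hi - 1]
        names = sorted({name for start, end, name in records
                        if start <= lo and hi - 1 <= end})
        if names:             # skip gaps covered by no exon, e.g. 51-59
            pieces.append((lo, hi - 1, ','.join(names)))
    return pieces


dat = ['10\t50\tA', '10\t20\tB', '35\t50\tB',
       '60\t70\tA', '60\t70\tB', '80\t100\tA', '80\t100\tB']
records = []
for line in dat:
    start, end, name = line.split('\t')
    records.append((int(start), int(end), name))

for lo, hi, names in split_exons(records):
    print('%d\t%d\t%s' % (lo, hi, names))
```

On the sample data this prints the five rows of the desired output, with each piece already labeled by the names that contain it.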







Re: [Tutor] extracting numbers from a list

2006-10-18 Thread kumar s

Thank you, Danny. I am going over your email and trying to understand it (I am a biologist with bioinformatics training).

I am not sure I understood your opinion about the way I solved it. Do you mean that there is something wrong with the way I solved it?

I am not sure I explained the problem correctly in terms of exons and transcripts. If not, I would be happy to send you a PDF file with a figure.

Thanks again.



--- Danny Yoo <[EMAIL PROTECTED]> wrote:

>
>
> On Mon, 16 Oct 2006, kumar s wrote:
>
> > I have a simple question to ask tutors:
> >
> > list A :
> >
> > a = [10,15,18,20,25,30,40]
>
>
> Hi Kumar,
>
> If you're concerned about correctness, I'd recommend that you try thinking
> about the problem inductively.  An inductive definition for what you're
> asking is straightforward to state in about three or four lines of code.
> I'll try to go through it slowly so you see what the reasoning behind it
> is.  This approach uses a technique that you should already know called
> "mathematical induction."
>
>     http://en.wikipedia.org/wiki/Mathematical_induction
>
>
> Let's say we're designing a function called getSpans().  Here are some
> sample behaviors we'd like from it:
>
>     getSpans([10, 15]) = [(10, 15)]
>     getSpans([10, 15, 18]) = [(10, 15), (16, 18)]
>     getSpans([10, 15, 18, 20]) = [(10, 15), (16, 18), (19, 20)]
>
> Would you agree that this is reasonable output for a function like this?
> getSpans() takes a list of numbers, and returns a list of pairs of
> numbers.
>
>
> There is one "base" case to this problem.  The smallest list we'd like to
> consider is a list of two elements.  If we see that, then we're happy,
> because the answer is really simple:
>
>     getSpans([a, b]) = [(a, b)]
>
>
> Otherwise, let's imagine a list that's a bit longer, with three elements.
> Concretely, we know that this is going to look like:
>
>     getSpans([a, b, c]) = [(a, b), (b+1, c)]
>
> But another way to say this, though, is that:
>
>     getSpans([a, b, c]) = [(a, b)] + getSpans([b+1, c])
>
> That is, we try to restate the problem in terms of smaller subproblems.
>
>
>
> Let's look at what the case for four elements might look like:
>
>     getSpans([a, b, c, d]) = [(a, b), (b+1, c), (c+1, d)]
>
> Concretely, we know that that's the list of spans we'd like to see.  But
> if we think about it, we might also restate this as:
>
>     getSpans([a, b, c, d]) = [(a, b)] + getSpans([b+1, c, d])
>
> because getSpans([b+1, c, d]) is going to give us:
>
>     [(b+1, c), (c+1, d)]
>
> All we need to do is add on [(a, b)] to that to get the complete answer to
> getSpans([a, b, c, d]).
>
>
> Generally, for any particular list L that's longer than two elements:
>
>     getSpans(L) = [tuple(L[0:2])] + getSpans([L[1] + 1] + L[2:])
>
> When we work inductively, all we really need to think about is "base case"
> and "inductive case": the solution will often just fall through from
> stating those two cases.  An inductively-designed function is going to
> look something like:
>
>     def solve(input):
>         if input looks like a base-case:
>             handle that directly in a base-case way
>         else:
>             break up the problem into smaller pieces
>             that we assume can be solve()d by induction
>
> The inductive definition above is slightly inefficient because we're doing
> physical list slicing.  Rewriting it to use loops and list indices
> instead of slicing is a little harder, but not much harder.
>
> Another example: how do we add up a list of numbers?  If there's just one
> number, that must be the sum.  Otherwise, we can add the first number
> to the sum of the rest of the numbers.
>
> #
> def mysum(L):
>     if len(L) == 1:
>         return L[0]
>     else:
>         return L[0] + mysum(L[1:])
> #
>
> It's a funky way of doing this, but this is a real definition that works
> (modulo limits in Python's recursion implementation).  It's inefficient,
> but it's easy to state and reason about.  I'm assuming you're more
> interested in correctness than efficiency at the moment.  Get it correct
> first, then if you really need to, work to get it fast.
>
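Danny's inductive definition translates almost word for word into a recursive Python 3 function. Below is a minimal sketch; get_spans_recursive is a hypothetical name. Like mysum, it slices the list on every call and recurses once per element, so it suits short lists only:

```python
def get_spans_recursive(numbers):
    """Base case: [a, b] -> [(a, b)]. Inductive case: first pair + spans of [b + 1, c, ...]."""
    if len(numbers) == 2:
        return [(numbers[0], numbers[1])]
    return [(numbers[0], numbers[1])] + get_spans_recursive([numbers[1] + 1] + numbers[2:])


print(get_spans_recursive([10, 15, 18, 20]))
```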



[Tutor] count numbers only in the string

2012-03-06 Thread kumar s

Hi :

I have some alphanumeric strings. I want to add up all the numbers in each string and ignore the letters and special characters.
1A0G19

5G0C25^C52

0G2T3T91
44^C70

How can I add up only the numbers in the strings above?

1 A 0 G 19       =    1+0+19 = 20

5 G 0 C 25 ^C 52  =   5+0+25+52 = 82

0 G 2 T 3 T 91    =  0+2+3+91 =  96
44 ^C 70   =   44+70 =  114

In the first string, 1A0G19, I am only adding 1, 0, and 19. I am not splitting 19 into 1+9, which would give a totally wrong answer for me.


Is there a way I can do this?

Thanks for your advice.

kumar
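A regular expression can find each run of consecutive digits as a whole, so 19 stays 19 instead of becoming 1 and 9. Below is a minimal Python 3 sketch using re.findall with the pattern \d+; sum_numbers is a hypothetical name:

```python
import re


def sum_numbers(text):
    """Add every run of consecutive digits as one number: '1A0G19' -> 1 + 0 + 19."""
    return sum(int(digits) for digits in re.findall(r'\d+', text))


for s in ['1A0G19', '5G0C25^C52', '0G2T3T91', '44^C70']:
    print(s, sum_numbers(s))
```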


[Tutor] substitute using re.sub

2017-10-25 Thread kumar s via Tutor

Hi group, I am trying to substitute in the following way, and I cannot. Could you point out what's wrong with what I am doing?

>>> z
'.|D'
>>> re.sub(z,'1',z)
'111'

I just want '1', not '111'. I want:

>>> re.sub(z,'1',z)
'1'

re.sub inserts '1' three times because z contains .|D. How can I get only a single '1'?

Thanks
Kumar
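In a regular expression, '.' matches any character and '|' means "or". The pattern '.|D' therefore matches each of the three characters of '.|D' separately, and each one is replaced with '1'. Escaping the pattern with re.escape() makes re.sub treat it as literal text. For a fixed string, str.replace() needs no regex at all. A short Python 3 illustration:

```python
import re

z = '.|D'
# As a pattern, '.|D' means "any character OR the letter D", so each of the
# three characters in z is matched on its own and replaced: '111'.
print(re.sub(z, '1', z))             # -> 111
# re.escape() turns the metacharacters '.' and '|' into literals,
# so the whole string '.|D' is matched once.
print(re.sub(re.escape(z), '1', z))  # -> 1
# For a plain literal substitution, no regex is needed.
print(z.replace(z, '1'))             # -> 1
```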