[Tutor] Python regular expression
Dear group,

I have a file with 645,984 lines. The file is composed entirely of blocks. For example:

    [Unit111]
    Name=NONE
    Direction=2
    NumAtoms=16
    NumCells=32
    UnitNumber=111
    UnitType=3
    NumberBlocks=1
    [Unit111_Block1]
    Name=31318_at
    BlockNumber=1
    NumAtoms=16
    NumCells=32
    StartPosition=0
    StopPosition=15
    CellHeader=XY PROBE FEATQUALEXPOS POS CBASE PBASE TBASE ATOMINDEX CODONINDCODON REGIONTYPE REGION
    Cell1=24636 N control 31318_at0 13 A A A 0 407064 -1 -1 99
    Cell2=24635 N control 31318_at0 13 A T A 0 406424 -1 -1 99
    Cell3=631 397 N control 31318_at1 13 T A T 1 254711 -1 -1 99
    [Unit113]
    Name=NONE
    Direction=2
    NumAtoms=16
    NumCells=32
    UnitNumber=113
    UnitType=3
    NumberBlocks=1
    [Unit113_Block1]
    Name=31320_at
    BlockNumber=1
    NumAtoms=16
    NumCells=32
    StartPosition=0
    StopPosition=15
    CellHeader=XY PROBE FEATQUALEXPOS POS CBASE PBASE TBASE ATOMINDEX CODONINDCODON REGIONTYPE REGION
    Cell1=6863 N control 31320_at0 13 T A T 0 40388 -1 -1 99
    Cell2=6864 N control 31320_at0 13 T T T 0 41028 -1 -1 99
    Cell3=99194 N control 31320_at1 13 C C C 1 124259 -1 -1 99

I have a second file with identifiers that appear in the first file in this form:

    Name=31320_at

I want to write the blocks of the first file that match those identifiers to a new file. I am searching with:

    search = re.search("_at")

My question: how can I tell Python to select the rows that have a particular pattern, such as [Name] or the Name of a [Unit]? Is there any way of doing this?

Please help me. Thanks.

kumar

___ Tutor maillist - [EMAIL PROTECTED] http://mail.python.org/mailman/listinfo/tutor
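One possible way to approach the question above, sketched in Python 3 syntax (the thread itself uses Python 2). It assumes that a block starts at each line beginning with "[" and that the wanted identifiers have already been read into a set. The function name wanted_sections and the shortened sample data are illustrative, not taken from the original file.

```python
import re

# Matches lines such as "Name=31320_at" and captures the identifier.
NAME_RE = re.compile(r"^Name=(\S+_at)\s*$")

def wanted_sections(lines, wanted):
    """Yield each [section] (a list of lines) whose Name= value is in `wanted`."""
    section, keep = [], False
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("["):      # a new [Unit...] header begins
            if keep:
                yield section
            section, keep = [], False
        section.append(line)
        match = NAME_RE.match(line)
        if match and match.group(1) in wanted:
            keep = True
    if keep:                          # the last section has no header after it
        yield section

# Shortened, illustrative sample in the same layout as the post.
sample = """[Unit111_Block1]
Name=31318_at
BlockNumber=1
[Unit113_Block1]
Name=31320_at
BlockNumber=1
""".splitlines()

found = list(wanted_sections(sample, {"31320_at"}))
for block in found:
    print("\n".join(block))
```

Because the function only iterates over its input once, an open file object can be passed instead of a list, so the 645,984 lines never need to be held in memory at the same time.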
[Tutor] How to select particular lines from a text
Dear group,

This is a continuation of my previous email with the subject line "Python regular expression". Although my text file looks like an .ini file, it is not one. It is a chip definition file from a gene chip, and it is huge, with over 340,000 lines.

I have a general question that is not specific to that file.

Example text:

    Name:
    City:
    Name:
    City:

Characteristics of this text:

1. The text is divided into blocks, and every block starts with 'Name'. The number of lines after this identifier varies.

The logic I can think of to extract some of these blocks is:

1. Write a regular expression to identify the Name identifier I need.
2. Based on that, ask the program to select all lines after it until it hits either a new line or another Name identifier.

My question: how can I tell my program these two conditions, that is, mark the identifier I need and select all the lines after that identifier until it hits a new line or another Name identifier?

Please enlighten me with your suggestions. Thank you.

kumar
[Tutor] Can i define anywhere on file object function for reading a range of lines?
Dear group,

For instance, I have a text that looks like the following:

    Segment:Page 21
    x
    x
    .
    x
    Segment:Page 22
    Segment:Page 23

I have another file with page numbers that looks like this:

    Page 1
    Page 2
    ..
    Page 22
    Page 34
    Page 200

I can see that Page 22 exists in my first file. Now I am trying to locate the Page 22 segment in the first file and ask my program to read starting from "Segment:Page 22" to the end of the Page 22 segment, which is either a blank (empty) line or the start of another segment, here "Segment:Page 23".

Question: is there any Python built-in function where I can specify a range of lines to read (such as starting from "Segment:Page 22" to the next blank line) instead of all lines until EOF? For example:

    a = readlines(From, TO)

I asked a similar question before and it was well explained by the experts, but I am still confused. Can anyone please help me again?

Thank you.

Kumar
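There is no built-in readlines(From, TO), as far as the standard file API goes, but a small helper can do the same job. The following is a minimal sketch in Python 3; the name read_segment, its parameters, and the sample lines are assumptions made for illustration.

```python
def read_segment(lines, start, prefix="Segment:"):
    """Return the lines from `start` up to a blank line or the next `prefix` line."""
    segment = []
    inside = False
    for line in lines:
        line = line.rstrip("\n")
        if not inside:
            # Still looking for the line that opens the wanted segment.
            if line.strip() == start:
                inside = True
                segment.append(line)
            continue
        # Inside the segment: stop at a blank line or at the next segment header.
        if not line.strip() or line.startswith(prefix):
            break
        segment.append(line)
    return segment

text = [
    "Segment:Page 21", "aaa", "",
    "Segment:Page 22", "bbb", "ccc",
    "Segment:Page 23", "ddd",
]
page22 = read_segment(text, "Segment:Page 22")
print(page22)
```

The helper stops reading as soon as the segment ends, so when it is given an open file it does not scan the rest of the file.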
[Tutor] Finding a part of an element in a list
Dear Group,

I have a list:

    List1 = ['Tyres','windsheild','A\CUnit','Model=Toyota_Corolla']

In another list I have:

    List2 = ['Corolla','Accord','Camry']

I want to see if Corolla is in List1. The code:

    for i in range(len(List1)):
        if i in range(len(List2)):
            print i

If 'Corolla' were an element of both lists, it would be easy to find. However, in List1 it appears as 'Model=Toyota_Corolla'. How can I ask Python to match the two elements 'Model=Toyota_Corolla' and 'Corolla', where only part of the element matches?

Please help. Thanks.
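A short sketch of partial matching, written in Python 3. The `in` operator on strings tests for a substring, so looping over the elements themselves (rather than over their indexes) is enough. The variable names list1, list2 and matches are illustrative.

```python
list1 = ['Tyres', 'windsheild', 'A\\CUnit', 'Model=Toyota_Corolla']
list2 = ['Corolla', 'Accord', 'Camry']

# Pair every short name with every element of list1 that contains it.
matches = [(needle, item) for needle in list2 for item in list1 if needle in item]
print(matches)
```

The code in the post compares index numbers with each other, which is why it never looks at the strings at all.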
[Tutor] Removing a row from a tab delimitted text
Dear group,

I have a file with a Name identifier followed by two columns of numbers. Here is how my file looks (the space between 481 and 13 is a tab):

    Name=3492_at
    Cell1=481 13
    Cell1=481 13
    Cell1=481 13
    Name=1001_at
    Cell1=481 13
    Cell2=481 12
    Cell1=481 13
    Cell1=481 13
    Cell2=481 12
    Name=1002_at
    Cell3=482 12
    Cell1=481 13
    Cell1=481 13
    Cell2=481 12
    Cell3=482 12
    Cell4=482 13
    Cell1=481 13

My question:

1. How can I remove the lines where the Name identifier appears and get the two columns of data?

Thanks

kumar.
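A minimal sketch of one way to do this in Python 3, assuming every data line has the form "CellN=x<TAB>y" as in the post. The sample list and the name rows are illustrative.

```python
lines = [
    "Name=3492_at",
    "Cell1=481\t13",
    "Cell1=481\t13",
    "Name=1001_at",
    "Cell2=481\t12",
]

rows = []
for line in lines:
    if line.startswith("Name="):
        continue                      # skip the identifier lines
    value = line.split("=", 1)[1]     # drop the "CellN=" label
    x, y = value.split("\t")[:2]      # the two tab-separated columns
    rows.append((int(x), int(y)))

for x, y in rows:
    print("%d\t%d" % (x, y))
```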
[Tutor] Printing two elements in a list
Dear group,

I have two lists named x and seq. I am trying to find each element of x inside the elements of seq. I can find them. However, I want to print the element of seq that contains the element of x and also the next element of seq. So I tried this piece of code and got an error that str and int cannot be concatenated:

    >>> for ele1 in x:
            for ele2 in seq:
                if ele1 in ele2:
                    print (seq[ele1+1])

    Traceback (most recent call last):
      File "", line 4, in -toplevel-
        print (seq[ele1+1])
    TypeError: cannot concatenate 'str' and 'int' objects

2. TRIAL TWO:

    >>> for ele1 in x:
            for ele2 in seq:
                if ele2 in range(len(seq)):
                    if ele1 in ele2:
                        print seq[ele2+1]

This is taking forever and I am not getting an answer.

3. TRIAL THREE: I just asked to print the element of seq that matched an element of x. It prints only that element; I want to print the next element too and I cannot get it.

    >>> for ele1 in x:
            for ele2 in seq:
                if ele1 in ele2:
                    print ele2

    >probe:HG-U95Av2:31358_at:454:493; Interrogation_Position=132; Antisense;
    >probe:HG-U95Av2:31358_at:319:607; Interrogation_Position=144; Antisense;

    >>> len(x)
    4504
    >>> x[1:10]
    ['454:494', '319:607', '319:608', '322:289', '322:290', '183:330', '183:329', '364:95', '364:96']
    >>> len(seq)
    398169
    >>> seq[0:4]
    ['>probe:HG-U95Av2:1000_at:399:559; Interrogation_Position=1367; Antisense;', 'TCTCCTTTGCTGAGGCCTCCAGCTT', '>probe:HG-U95Av2:1000_at:544:185; Interrogation_Position=1379; Antisense;', 'AGGCCTCCAGCTTCAGGCAGGCCAA']

    >>> for ele1 in x:
            for ele2 in seq:
                if ele1 in ele2:
                    print ele2

    >probe:HG-U95Av2:31358_at:454:493; Interrogation_Position=132; Antisense;
    >probe:HG-U95Av2:31358_at:319:607; Interrogation_Position=144; Antisense;

What I want: output like this:

    >probe:HG-U95Av2:1000_at:399:559; Interrogation_Position=1367; Antisense;
    TCTCCTTTGCTGAGGCCTCCAGCTT
    >probe:HG-U95Av2:1000_at:544:185; Interrogation_Position=1379; Antisense;
    AGGCCTCCAGCTTCAGGCAGGCCAA

Can anyone please suggest what is going wrong in my statements and how I can get this?

Thank you.

Kumar
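The TypeError in the first trial arises because ele1 is a string from x, not a position in seq, so ele1+1 tries to add a string and an integer. A minimal sketch of one alternative in Python 3 follows. It assumes seq strictly alternates header line and sequence line, as the seq[0:4] sample suggests, and that the coordinate pair is always the last two colon-separated fields before the first semicolon. The names records and hits are illustrative.

```python
seq = [
    '>probe:HG-U95Av2:1000_at:399:559; Interrogation_Position=1367; Antisense;',
    'TCTCCTTTGCTGAGGCCTCCAGCTT',
    '>probe:HG-U95Av2:1000_at:544:185; Interrogation_Position=1379; Antisense;',
    'AGGCCTCCAGCTTCAGGCAGGCCAA',
]
x = ['544:185', '454:494']

# Build the lookup table once: '399:559' -> (header line, sequence line).
records = {}
for header, bases in zip(seq[0::2], seq[1::2]):
    fields = header.split(";")[0].split(":")
    key = ":".join(fields[-2:])
    records[key] = (header, bases)

# Each lookup is now a dictionary access instead of a scan of the whole list.
hits = [records[coord] for coord in x if coord in records]
for header, bases in hits:
    print(header)
    print(bases)
```

Matching on the exact key also avoids accidental substring hits, such as '19:60' being found inside '319:607'.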
Re: [Tutor] Printing two elements in a list
Hello group, Thank you very much for your kind replies. In fact I survived to pull out what I needed by going with Kent's tip by enumerating on iterator. The problem with me is suddenly I embarked on something big problem and I am surviving it in pieces by writing pieces of code. I have another question: To be brief: My list contains some elements that I do not want and I want to remove unwanted elements in my list: My TEXT file looks like this: Name=32972_at Cell1=xxx xxx N control 32972_at Cell1=xxx xxx N control 32972_at Cell1=xxx xxx N control 32972_at Cell1=xxx xxx N control 32972_at Name=3456_at Cell1=xxx xxx N control 3456_at Cell1=xxx xxx N control 3456_at Cell1=xxx xxx N control 3456_at Cell1=xxx xxx N control 3456_at . ... x xxx (34K lines) I want to remove Name=Xxxx_at identifiers. My List: ['Name=32972_at', 'Cell1=432\t118\tN\tcontrol\t32972_at\t0\t13\tA\tA\tA\t0\t75952\t-1\t-1\t99\t', 'Cell2=432\t117\tN\tcontrol\t32972_at\t0\t13\tA\tT\tA\t0\t75312\t-1\t-1\t99\t', 'Cell3=499\t632\tN\tcontrol\t32972_at\t1\t13\tC\tC\tC\t1\t404979\t-1\t-1\t99\t'] I tried to resolve in this way: >>>pat = re.compile('Name') >>> for i in range(len(cord)): x = pat.search(cord[i]) cord.remove(x) I know I am wrong here because I do not know how to search and remove an element in a list. Can any one please help me. on Page 98, chapter Lists and dictionaries of mark lutz's learning python. It is mentioned in table 6-1 : L2.append(4) Methods: grow,sort,search,reverse etc. Although not much is covered on this aspect in this book, I failed to do more operations on list. Looking forward for help from tutors. Thank you. Kumar. --- Kent Johnson <[EMAIL PROTECTED]> wrote: > kumar, > > Looking at the quantity and structure of your data I > think the search you are doing is going to be > pretty slow - you will be doing 4504 * 398169 = > 1,793,353,176 string searches. > > Where does the seq data come from? Could you > consolidate the pairs of lines into a single record? 
> If > you do that and extract the '399:559' portion, you > could build a dict that maps '399:559' to the > full record. Looking up '399:559' in the dictionary > would be much, much faster than searching the > entire list. > > If you have multiple entries for '399:559' you could > have the dict map to a list. > > Kent > > kumar s wrote: > > > >>>>len(x) > > > > 4504 > > > >>>>x[1:10] > > > > ['454:494', '319:607', '319:608', '322:289', > > '322:290', '183:330', '183:329', '364:95', > '364:96'] > > > >>>>len(seq) > > > > 398169 > > > >>>>seq[0:4] > > > > ['>probe:HG-U95Av2:1000_at:399:559; > > Interrogation_Position=1367; Antisense;', > > 'TCTCCTTTGCTGAGGCCTCCAGCTT', > > '>probe:HG-U95Av2:1000_at:544:185; > > Interrogation_Position=1379; Antisense;', > > 'AGGCCTCCAGCTTCAGGCAGGCCAA'] > > > > > > > >>>>for ele1 in x: > > > > for ele2 in seq: > > if ele1 in ele2: > > print ele2 > > > > > > > >>probe:HG-U95Av2:31358_at:454:493; > > > > Interrogation_Position=132; Antisense; > > > >>probe:HG-U95Av2:31358_at:319:607; > > > > Interrogation_Position=144; Antisense; > > > > > > > > > > > > > > How Do I WANT: > > > > I want to print get an output like this: > > > > > > > >>probe:HG-U95Av2:1000_at:399:559; > > > > Interrogation_Position=1367; Antisense;' > > TCTCCTTTGCTGAGGCCTCCAGCTT > > > > > >>probe:HG-U95Av2:1000_at:544:185; > > > > Interrogation_Position=1379; Antisense; > > AGGCCTCCAGCTTCAGGCAGGCCAA > > > > > > can any one please suggest what is going wrong in > my > > statements and how can I get it. > > > > Thank you. > > Kumar > > > > > > > > __ > > Do you Yahoo!? > > Yahoo! Mail - 250MB free storage. Do more. Manage > less. > > http://info.mail.yahoo.com/mail_250 > > ___ > > Tutor maillist - [EMAIL PROTECTED] > > http://mail.python.org/mailman/listinfo/tutor > > > ___ > Tutor maillist - [EMAIL PROTECTED] > http://mail.python.org/mailman/listinfo/tutor > __ Do you Yahoo!? Yahoo! Mail - now with 250MB free storage. Learn more. 
[Tutor] Please help matching elements from two lists and printing them
Dear group,

I have two tables.

First table, spot_cor:

    432 117
    499 631
    10 0
    326 83
    62 197
    0 0
    37 551

Second table, spot_int:

    0 0 98
    1 0 5470
    2 0 113
    3 0 5240
    4 0 82.5
    5 0 92
    6 0 5012
    7 0 111
    8 0 4612
    9 0 115
    10 0 4676.5

I stored these two tables as lists:

    >>> spot_cor[0:5]
    ['432\t117', '499\t631', 10\t0', '326\t83', '62\t197']
    >>> spot_int[0:5]
    [' 0\t 0\t18.9', ' 1\t 0\t649.4', ' 10\t 0\t37.3', ' 3\t 0\t901.6', ' 4\t 0\t14.9']

I want to take each element of spot_cor and search for it in spot_int. If they match, I want to write all three columns of spot_int. To see what happens, I printed element 1 and element 2 as tab-delimited text:

    >>> for ele1 in spot_cor:
            for ele2 in spot_int:
                if ele1 in ele2:
                    print (ele1+'\t'+ele2)

    432 117    432 117 17.3
    432 117    7 432 117.9
    432 117    554 432 117.7
    499 631    499 631 23.1
    12 185     12 185 19.6
    12 185     112 185 42.6
    12 185     212 185 26.3
    12 185     312 185 111.9
    12 185     412 185 193.1
    12 185     512 185 21.9
    12 185     612 185 22.0
    326 83     169 326 83.7
    62 197     62 197 18.9

The problem with this script is that it prints unwanted elements of the spot_int list, which makes the output useless to me. I want to print the columns only if the first two columns of both tables match. I asked whether 12 and 185 are contained in the two columns, and Python tells me yes, they are present in 112 and 185, which is a wrong result.

Can you please suggest a better method for comparing these two elements and then printing the third column?

Thank you very much.

Cheers
K
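The false hits come from substring matching: '12\t185' is contained in '112\t185\t42.6'. A minimal sketch of an exact comparison in Python 3 follows, assuming whitespace-separated columns. Some rows of the sample data are made up for illustration, and the names wanted and matched are not from the post.

```python
spot_cor = ['432\t117', '499\t631', '10\t0']
spot_int = [
    ' 0\t 0\t18.9',
    ' 1\t 0\t649.4',
    ' 10\t 0\t37.3',
    '432\t117\t17.3',
    '7432\t117.9\t5.0',   # would match a substring test, but is not a real match
]

# A set of (x, y) pairs gives exact, fast membership tests.
wanted = {tuple(pair.split()) for pair in spot_cor}

matched = []
for row in spot_int:
    cols = row.split()                # split() also discards the padding spaces
    if tuple(cols[:2]) in wanted:
        matched.append("\t".join(cols))

print("\n".join(matched))
```

With the set built once, the work is one pass over spot_int rather than one pass per element of spot_cor.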
Re: [Tutor] Please help matching elements from two lists and printing them
Hi, thank you very much for suggesting a way. In fact I tried and I found another way to do it. could you please suggest if something is wrong because I have false positive results in the output. That means I getting more that the values I have in spot_cor. For example I have 2500 elements in spot_cor list. I am searching each element if it is in spot_init. IF it is there then I am writing it to a file. What I expect is to get 2500 elements. However I am getting 500 elements extra. I do not understand how is this possible. Code: >>> out = open('sa_int_2.txt','w') >>> for ele1 in range(len(spot_cor)): x = spot_cor[ele1] for ele2 in range(len(spot_int)): cols = split(spot_int[ele2],'\t') y = (cols[0]+'\t'+cols[1]) if x == y: for ele3 in spot_int: if y in ele3: out.write(ele3) out.write('\n') On top of this this process is VERY SLOW on high end server too. I think its just the way it is to deal with string processing. As you asked I am all parsing out the pieces for a tab-delimitted text. I can get the values as CSV instead of tab delimitted. But what is the way using CSV to deal with this situation. thanks Kumar --- Bob Gailer <[EMAIL PROTECTED]> wrote: > At 02:51 PM 12/8/2004, kumar s wrote: > >Dear group, > > > > I have two tables: > > > >First table: spot_cor: > >432 117 > >499 631 > >10 0 > >326 83 > >62 197 > >0 0 > >37 551 > > > > > > > >Second table: spot_int > >0 0 98 > >1 0 5470 > >2 0 113 > >3 0 5240 > >4 0 82.5 > >5 0 92 > >6 0 5012 > >7 0 111 > >8 0 4612 > >9 0 115 > >10 0 4676.5 > > > > > > > >I stored these two tables as lists: > > > > >>> spot_cor[0:5] > >['432\t117', '499\t631', 10\t0', '326\t83', > '62\t197'] > > Note there is no ' before the 10. 
That won't fly' > > > >>> spot_int[0:5] > >[' 0\t 0\t18.9', ' 1\t 0\t649.4', ' 10\t > >0\t37.3', ' 3\t 0\t901.6', ' 4\t 0\t14.9'] > > It would be a lot easier to work with if the lists > looked like (assumes all > data are numeric): > [(432,117), (499,631), (10,0), (326,83), (62,197)] > [(0,0,18.9), (1,0,649.4), (10,0,37.3), (3,0,901.6), > (4,0,14.9)] > > What is the source for this data? Is it a > tab-delimited file? If so the CSV > module can help make this translation. > > I also assume that you want the first 2 elements of > a spot_int element to > match a spot_cor element. > > Then (for the subset of data you've provided): > > >>> for ele1 in spot_cor: > ... for ele2 in spot_int: > ... if ele1 == ele2[:2]: > ... print "%8s %8s %8s" % ele2 > ... >100 37.3 > > >I want to write all the three columns of spot_int. > >[snip] > > Hope that helps. > > Bob Gailer > [EMAIL PROTECTED] > 303 442 2625 home > 720 938 2625 cell > > __ Do you Yahoo!? Yahoo! Mail - 250MB free storage. Do more. Manage less. http://info.mail.yahoo.com/mail_250 ___ Tutor maillist - [EMAIL PROTECTED] http://mail.python.org/mailman/listinfo/tutor
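Bob Gailer's reply mentions the csv module for tab-delimited input. A minimal sketch of that idea in Python 3, assuming a three-column numeric table like spot_int. The in-memory io.StringIO object stands in for a real file, and the name rows is illustrative.

```python
import csv
import io

# Stand-in for open("spot_int.txt"); the file name is hypothetical.
data = io.StringIO("0\t0\t18.9\n1\t0\t649.4\n10\t0\t37.3\n")

# csv.reader with delimiter="\t" yields one list of strings per line.
rows = [(int(a), int(b), float(c)) for a, b, c in csv.reader(data, delimiter="\t")]
print(rows)
```

Once the rows are tuples of numbers, a test such as row[:2] == (10, 0) compares whole values, so 12 can no longer match 112.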
[Tutor] Difference between for i in range(len(object)) and for i in object
Dear group, My Tab delimited text looks like this: HG-U95Av2 32972_at432 117 HG-U95Av2 32972_at499 631 HG-U95Av2 32972_at12 185 HG-U95Av2 32972_at326 83 HG-U95Av2 32972_at62 197 I want to capture: columns 2 and 3 as tab delim. text: Here is my code: >>> spot_cor=[] >>> for m in cor: ... cols = split(cor,'\t') ... spot_cor.append(cols[2]+'\t'+cols[3]) ... ... Traceback (most recent call last): File "", line 2, in ? File "/usr/local/lib/python2.3/string.py", line 121, in split return s.split(sep, maxsplit) AttributeError: 'list' object has no attribute 'split' Here is 2nd way: >>> test_cor=[] >>> for m in cor: ... cols = split(cor,'\t') ... x = (cols[2]+'\t'+cols[3]) ... test_cor.append(x) ... Traceback (most recent call last): File "", line 2, in ? File "/usr/local/lib/python2.3/string.py", line 121, in split return s.split(sep, maxsplit) AttributeError: 'list' object has no attribute 'split' Here is my 3rd way of doing this thing: >>> for m in range(len(cor)): ... cols = split(cor[m],'\t') ... spot_cor.append(cols[2]+'\t'+cols[3]) ... >>> >>> len(spot_cor) 2252 >>> My question: Many people suggested me to avoid iteration over a object using (range(len)) its index and use instead 'Python's power' by using for i in object, instead. However, when I tried that using some data, as demonstrated above, I get error because append method does not work on list. In method 2, i tried to append an object instead of string elements. In both ways the execution failed because 'List object has no attribute split'. Can you help me making me clear about his dogma. Thank you. Kumar. --- Guillermo Fernandez Castellanos <[EMAIL PROTECTED]> wrote: > Cheers, > > I think your mistake is here: > if x == y: >for ele3 in spot_int: >if y in ele3: > > out.write(ele3) > > out.write('\n') > Each time you find an element that is the same > (x==y) you don't write > only y, you write *all* the elements that are in > spot_init instead > only the matching one! And it's not what you are > looking for! 
:-) > > I'll also change a bit your code to make it look > more "pythonic" :-) > > > for ele1 in spot_cor: > > for ele2 in spot_int: > > cols = split(ele2,'\t') > > y = (cols[0]+'\t'+cols[1]) > > if ele1 == y: > > for ele3 in spot_int: > > if y in ele3: > > > out.write(ele3) > > > out.write('\n') > > What changes I did: > > for ele1 in range(len(spot_cor)): >x = spot_cor[ele1] > > can be writen like: > for ele1 in spot_cor: > x = ele1 > > Furthermore, as you only use x once, I changed: > if x == y: > > with > if ele1 == y: > > and deleted the line: > x = ele1 > > I also don't understand why you do this: > cols = split(ele2,'\t') > y = (cols[0]+'\t'+cols[1]) > > It seems to me that you are separating something to > put it again > together. I don't really see why... > > Enjoy, > > Guille > __ Do you Yahoo!? Yahoo! Mail - now with 250MB free storage. Learn more. http://info.mail.yahoo.com/mail_250 ___ Tutor maillist - [EMAIL PROTECTED] http://mail.python.org/mailman/listinfo/tutor
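On the question of "for m in cor" versus "for m in range(len(cor))": in the first form m is already the line itself, so it is m that needs splitting, not the whole list cor. The AttributeError in the post comes from passing the list to split. A minimal sketch in Python 3, with two sample rows assumed to be tab-delimited as described:

```python
cor = [
    "HG-U95Av2\t32972_at\t432\t117",
    "HG-U95Av2\t32972_at\t499\t631",
]

spot_cor = []
for line in cor:                      # `line` is one string, not an index
    cols = line.split("\t")           # split the element, not the list
    spot_cor.append(cols[2] + "\t" + cols[3])

print(spot_cor)
```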
Re: [Tutor] Difference between for i in range(len(object)) and for i in object
Thank you for clearing up some mist here. In fact I was depressed by that e-mail because there are not many tutorials that clearly explains the issues that one faces while trying to code in python. Also, due to lack of people who are proficient in python around our univ. campus in baltimore, i am very much relying on tutors mailing list. I am poor enough to go to Mark Lutz's python training course(~ $1000 for 2 days and 3.5K for 5 days at a python bootcamp) and helpless to the fact that there is no one offering a python course on the campus. I am very much depended on this list and I cannot tell you people, how much I respect and appreciate the help from tutors. I cannot finish my Ph.D. thesis without tutors help and tutors will always be praised in my thesis acknowledgements. Thank you again for a supportive e-mail Mr.Gauld. P.S: My intention is not to hurt tutor's opinion and it is their right to express their opinion freely. kumar. --- Alan Gauld <[EMAIL PROTECTED]> wrote: > > Personally I am getting weary of a lot of requests > that to me seem > to come > > from a lack of understanding of Python.. > > To be fair that is what the tutor list is for - > learning Python. > > > Would you be willing to take a good tutorial so > you understand > > basic Python concepts and apply them to your code. > > But as a tutor author I do agree that I am often > tempted > (and sometimes succumb) to just point at the > relevant topic > in my tutorial. Particularly since the latest > version tries > to answer all of the most common questions asked > here, but > still they come up... > > > I also despair that you don't seem to benefit from > some of our > suggestions. > > And this too can be frustrating but sometimes it is > the case > that the "student" simply didn't fully appreciate > the > significance of what was offered. I'm feeling > generous tonight! 
> > :-) > > Alan G > Author of the Learn to Program web tutor > http://www.freenetpages.co.uk/hp/alan.gauld > > __ Do you Yahoo!? Send holiday email and support a worthy cause. Do good. http://celebrity.mail.yahoo.com ___ Tutor maillist - [EMAIL PROTECTED] http://mail.python.org/mailman/listinfo/tutor
[Tutor] raw_input()
Dear group:

I have a large file (3 GB). Each line is tab-delimited. Example lines:

    585 chr1 433 433 rs56289060 0 + - - -/C genomic insertion unknown 0 0 unknown between 1
    585 chr1 491 492 rs55998931 0 + C C C/T genomic single unknown 0 0 unknown exact 1
    585 chr1 518 519 rs62636508 0 + G G C/G genomic single unknown 0 0 unknown exact 1
    585 chr1 582 583 rs58108140 0 + G G A/G genomic single unknown 0 0 unknown exact 1

I don't want to load the entire file. I want to feed each line in as input and print selected lines. For example:

x1.py:

    second = raw_input()
    x = second.split('\t')
    y = x[1:]
    print '\t'.join(y)

    % cat mybigfile.rod | python x1.py
    chr1 433 433 rs56289060 0 + - - -/C genomic insertion unknown 0 0 unknown between 1

My question: this program prints only the first line. It does not process every line that cat sends to x1.py. How do I print every line?

thanks
Kumar.

___ Tutor maillist - Tutor@python.org To unsubscribe or change subscription options: http://mail.python.org/mailman/listinfo/tutor
Re: [Tutor] raw_input()
Here it worked after trying a while loop:

x1.py:

    while True:
        second = raw_input()
        x = second.split('\t')
        y = x[1:]
        print '\t'.join(y)

    % cat mybigfile.rod | python x1.py
    Traceback (most recent call last):
      File "x1.py", line 2, in
        second = raw_input()
    EOFError: EOF when reading a line

How can I detect EOF, break out of the loop, and suppress the exception?

thanks
Re: [Tutor] raw_input()
Thanks, Benno. Supplying the 3.6 GB file is overkill for the script. This is the reason I chose to input lines on the fly.

thanks
Kumar

----- Original Message -----
From: Benno Lang
To: kumar s
Cc: tutor@python.org
Sent: Mon, March 15, 2010 7:19:24 PM
Subject: Re: [Tutor] raw_input()

On 16 March 2010 08:04, kumar s wrote:
> % cat mybigfile.rod | python x1.py
> Traceback (most recent call last):
>   File "x1.py", line 2, in
>     second = raw_input()
> EOFError: EOF when reading a line
>
> How to notify that at EOF break and suppress exception.

    try:
        second = raw_input()
    except EOFError:
        # handle error in some way

I would probably supply the file name as an argument rather than piping into stdin (or allow both methods), but that's up to you.

HTH,
benno
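Besides catching EOFError, another common pattern is to iterate over the input stream directly, which reads one line at a time and simply ends at EOF. The following is a minimal sketch in Python 3 (where raw_input is called input). The function name drop_first_column is an assumption, and io.StringIO objects stand in for the real pipe so the example can run on its own.

```python
import io
import sys

def drop_first_column(stream, out):
    """Copy `stream` to `out`, removing the first tab-separated column of each line."""
    for line in stream:               # stops by itself at end of input
        cols = line.rstrip("\n").split("\t")
        out.write("\t".join(cols[1:]) + "\n")

# Demonstration with in-memory streams.
demo_in = io.StringIO("585\tchr1\t433\n585\tchr1\t491\n")
demo_out = io.StringIO()
drop_first_column(demo_in, demo_out)
print(demo_out.getvalue(), end="")

# In a script fed by `cat mybigfile.rod | python x1.py`, the call would be:
#     drop_first_column(sys.stdin, sys.stdout)
```

Only one line is held in memory at a time, so the size of the input file does not matter.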
[Tutor] help with loops
Dear group:

I need some tips/help from experts. I have two tab-delimited files. One file has 4K lines and the other has 40K lines. I want to search the contents of one file in the other and print the lines that satisfy a condition.

File 1:

    chr  X       Y
    chr1 8337733 8337767 NM_001042682_cds_0_0_chr1_8337734_r 0 - RERE
    chr1 8338065 8338246 NM_001042682_cds_1_0_chr1_8338066_r 0 - RERE
    chr1 8338746 8338893 NM_001042682_cds_2_0_chr1_8338747_r 0 - RERE
    chr1 8340842 8341563 NM_001042682_cds_3_0_chr1_8340843_r 0 - RERE
    chr1 8342410 8342633 NM_001042682_cds_4_0_chr1_8342411_r 0 - RERE

File 2:

    Chr  X       Y
    chr1 871490  871491
    chr1 925085  925086
    chr1 980143  980144
    chr1 1548655 1548656
    chr1 1589675 1589676
    chr1 1977853 1977854
    chr1 3384899 3384900
    chr1 3406309 3406310
    chr1 3732274 3732275

I want to check whether X from file 2 lies between X and Y of a row in file 1, and if so print the line of file 2 together with the last column of file 1:

    for j in file2:
        col = j.split('\t')
        for k in file1:
            cols = k.split('\t')
            if col[1] > cols[1]:
                if col[1] < cols[2]:
                    print j +'\t'+cols[6]

This prints a lot of duplicate lines and is slow. Is there any other way to make it fast? How can a dictionary be made from file 1? I mean unique keys that are common to file 1 and file 2.

thanks
Kumar.
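Two things stand out in the loop above: the columns are compared as strings (so '925085' > '8337733' is true, because '9' sorts after '8'), and the chromosome column is never checked. A minimal sketch in Python 3 that converts to integers and groups file 1 by chromosome in a dictionary follows. The second row of file2 is invented so that there is a hit to show, and the names regions and hits are illustrative.

```python
from collections import defaultdict

file1 = [
    "chr1\t8337733\t8337767\tNM_001042682_cds_0_0_chr1_8337734_r\t0\t-\tRERE",
    "chr1\t8338065\t8338246\tNM_001042682_cds_1_0_chr1_8338066_r\t0\t-\tRERE",
]
file2 = [
    "chr1\t871490\t871491",
    "chr1\t8337740\t8337741",   # invented row that falls inside the first region
]

# chromosome -> list of (start, end, gene), with numbers stored as integers
regions = defaultdict(list)
for line in file1:
    cols = line.split("\t")
    regions[cols[0]].append((int(cols[1]), int(cols[2]), cols[6]))

hits = []
for line in file2:
    chrom, pos = line.split("\t")[:2]
    pos = int(pos)
    # Only the regions on the same chromosome are examined.
    for start, end, gene in regions.get(chrom, []):
        if start < pos < end:
            hits.append(line + "\t" + gene)

print("\n".join(hits))
```

If many regions share a chromosome, sorting them and searching with the bisect module would be a further step, but that goes beyond this sketch.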
[Tutor] counting elements in list
Hi group:

I have a list:

    k = ['T', 'C', 'T', 'T', 'T', 'T', 'T', 'T', 'T', 'C', 'T', 'T', 'T', 'C', 'T', 'T', 'T', 'C', 'C', 'T', 'T', 'T', 'C', 'T', 'T', 'T', 'T', 'T', 'T']

The allowed elements are A, T, G or C. The list can have any number of each.

My aim is to get a string output with the counts of each type:

    A:0\tT:23\tG:0\tC:6

From the above example I could count T and C, and since there are no A and G, I want to print 0 for them. I just don't know how this can be done.

    >>> d = {}
    >>> for i in set(k):
    ...     d[i] = k.count(i)
    ...
    >>> d
    {'C': 6, 'T': 23}
    >>> for keys,values in d.items():
    ...     print keys+'\t'+str(d[keys])
    ...
    C 6
    T 23

The other way I tried is:

    >>> k.count('A'),k.count('T'),k.count('G'),k.count('C')
    (0, 23, 0, 6)

How can I get counts for the elements not represented in the list and print them?

Appreciate your help. Thanks.

kumar
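One way to get zeros for the missing letters is to loop over the fixed alphabet instead of over the letters found in the list. A minimal sketch in Python 3; the list below is built to have the same counts as the one in the post (23 T and 6 C), and the name counts is illustrative.

```python
k = ['T'] * 23 + ['C'] * 6            # same counts as the list in the post

# list.count() returns 0 for a letter that never occurs, so looping over
# the full alphabet in the wanted order covers the missing ones too.
counts = "\t".join("%s:%d" % (base, k.count(base)) for base in "ATGC")
print(counts)
```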
[Tutor] (no subject)
Dear tutors:

I have two files. I want to take the coordinates of a row in file A and find out whether they are in the range of the coordinates in file B. If they are, I want to map them; otherwise, pass.

thanks
kumar

File a:

    name loc x        y
    a    4   40811596 40811620
    b    4   40811619 40811643
    c    4   40811649 40811673
    d    4   40811734 40811758
    e    4   40811797 40811821
    f    4   40811817 40811841
    g    4   40811895 40811919
    h    4   40811938 40811962

File b:

    zx zy
    z1 4 + 40810323 40812000
    z2 4 + 40810323 40812000
    z3 4 + 40810323 40812000
    z4 4 + 40810323 40812000
    z5 4 + 40810323 40812000
    z6 4 + 40810323 40812000
    z7 4 + 40810323 40812000
    z8 4 + 40810323 40812000

I want to take the coordinates x and y from each row in file a and check whether they are in the range of zx and zy. If they are in range, I want to write both matched rows as a single tab-delimited row.

My code:

    f1 = open('fileA','r')
    f2 = open('fileB','r')
    da = f1.read().split('\n')
    dat = da[:-1]
    ba = f2.read().split('\n')
    bat = ba[:-1]
    for m in dat:
        col = m.split('\t')
        for j in bat:
            cols = j.split('\t')
            if col[1] == cols[1]:
                xc = int(cols[2])
                yc = int(cols[3])
                if int(col[2]) in xrange(xc,yc):
                    if int(col[3]) in xrange(xc,yc):
                        print m+'\t'+j

Output:

    a 4 40811596 40811620 z1 4 + 40810323 40812000

This code is too slow. Could you experts help me make the script a lot faster? Each file has over 50K rows and the script runs very slowly. Please help.

thanks
Kumar
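A likely cost in the code above is that every line of file B is split and converted again for every line of file A. A minimal sketch in Python 3 that parses file B once, groups it by the loc column, and uses plain integer comparisons follows. It assumes file B has five columns (name, loc, strand, zx, zy), as the output row in the post suggests. The second row of file_a is invented, and the names ranges and matches are illustrative.

```python
from collections import defaultdict

file_a = [
    "a\t4\t40811596\t40811620",
    "q\t5\t100\t200",                  # invented row on a loc with no ranges
]
file_b = [
    "z1\t4\t+\t40810323\t40812000",
]

# Parse file B once: loc -> list of (zx, zy, original line)
ranges = defaultdict(list)
for line in file_b:
    name, loc, strand, zx, zy = line.split("\t")
    ranges[loc].append((int(zx), int(zy), line))

matches = []
for line in file_a:
    name, loc, x, y = line.split("\t")
    x, y = int(x), int(y)
    for zx, zy, b_line in ranges.get(loc, []):
        # Same condition as `in xrange(zx, zy)`: zx <= value < zy
        if zx <= x < zy and zx <= y < zy:
            matches.append(line + "\t" + b_line)

print("\n".join(matches))
```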
[Tutor] How to substitute an element of a list as a pattern for re.compile()
Hi Group:

I have a question: how can I use the value of an object as the pattern when compiling a regular expression?

    >>> x = 30
    >>> pattern = re.compile(x)

My situation: I have a list of numbers that I have to match in another list and write to a new file.

List 1: range_cors

    >>> range_cors[1:5]
    ['161:378', '334:3', '334:4', '65:436']

List 2: seq

    >>> seq[0:2]
    ['>probe:HG-U133A_2:1007_s_at:416:177; Interrogation_Position=3330; Antisense;', 'CACCCAGCTGGTCCTGTGGATGGGA']

A slow method:

    >>> sequences = []
    >>> for elem1 in range_cors:
            for index,elem2 in enumerate(seq):
                if elem1 in elem2:
                    sequences.append(elem2)
                    sequences.append(seq[index+1])

This process is very slow and takes a lot of time. I am not happy with it.

A faster method (probably):

    >>> for i in range(len(range_cors)):
            for index,m in enumerate(seq):
                pat = re.compile(i)
                if re.search(pat,seq[m]):
                    p.append(seq[m])
                    p.append(seq[index+1])

I am getting errors because I am trying to use a list element as the pattern in re.compile().

Questions:

1. Is it possible to do this? If so, how? Can anyone help by correcting my piece of code and suggesting where I went wrong?

Thank you in advance.

-K
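re.compile expects a string pattern, so an integer such as 30 or a loop index has to be converted first, and re.escape keeps any special characters in the value literal. A minimal sketch in Python 3 follows. The two header strings are modelled on the post but invented, and anchoring the pattern between ":" and ";" is an assumption about the header layout.

```python
import re

x = 30
number_pattern = re.compile(str(x))   # the pattern must be a string, not an int

coordinate = "334:3"
# Anchor on the surrounding ":" and ";" so that "334:3" does not match "334:31".
pattern = re.compile(":" + re.escape(coordinate) + ";")

headers = [
    ">probe:HG-U133A_2:1007_s_at:334:3; Interrogation_Position=1; Antisense;",
    ">probe:HG-U133A_2:1007_s_at:334:31; Interrogation_Position=2; Antisense;",
]
hits = [h for h in headers if pattern.search(h)]
print(hits)
```

Compiling a pattern inside the inner loop does not by itself make the search faster than the `in` test; the number of comparisons stays the same.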
Re: [Tutor] O.T.
30, Married, will soon be a dad., Live in Baltimore, U.S.A. and I am a Ph.D. student I lived in Denmark and Israel in the past as a part of my research life. Will finish my Ph.D., in Bioinformatics. Got introduced to computers at the age of 25 :-( and more happy it is not 52 :-) Programming lang: Python and R, Bioconductor, PHP (I have not mastered but WANT TO) DB: PostgreSQL I earn my bread by doing research and in a way I get paid for my interests in life. -K --- "Jacob S." <[EMAIL PROTECTED]> wrote: > I hate to sound weird... > > But who are you all, what are you're ages, what do > you do, marriage status, > etc? > You obviously don't have to answer, I'm just curious > who I'm boldly sending > emails to. > > Jacob Schmidt > > P.S. > I'm a student. 14 years. Play the piano better than > I write scripts. Single. > etc. > > ___ > Tutor maillist - Tutor@python.org > http://mail.python.org/mailman/listinfo/tutor > __ Do You Yahoo!? Tired of spam? Yahoo! Mail has the best spam protection around http://mail.yahoo.com ___ Tutor maillist - Tutor@python.org http://mail.python.org/mailman/listinfo/tutor
[Tutor] Parsing a block of XML text
Dear group: I am trying to parse BLAST output (Basic Local Alignment Search Tool, size around more than 250 KB ). - 1 gi|43442325|emb|BX956931.1| DKFZp781D1095_r1 781 (synonym: hlcc4) Homo sapiens cDNA clone DKFZp781D1095 5', mRNA sequence. BX956931 693 - - 1 1164.13 587 0 1 587 107 693 1 1 587 587 587 GGACCTCTCCAGAATCCGGATTGCTGAATCTTCCCTGTTGCCTAGAAGGGCTCCAAACCACCTCTTGACAATGGGAAACTGGGTGGTTAACCACTGGCAGGTTTCTGGTTGTTTGGTTAGGGCTGAATGCCTGTTTGTGGATGCCTTCCTGAAATATGAGAAGGCCGACAAATACTACTACACAAGATCCTTGGGTCAACATTGGCCTGTGCCCGAGCGTCTGCTCTCTGCTTGAAAACAGCACGCTGATCCTGCTTCCTGTGTGTCGCAATCTGCTGTCCTTCCTGACACCTGCTCAGCAGCCGCACACTGAGAAAGCAATTGGATCACAACCTCACCTTCCACAAGCTGGTGGCCTATATGATCTGCCTACATACAGCTATTCACATCATTGCACACCTGTTTAACTTTGACTGCTATAGCAGAAGCCGACAGGCCACAGATGGCTCCCTTGCCTCCATTCTCTCCAGCCTATCTCATGATGAGAGGTTCTTGGCTAAATCCCATCCAGTCCCGAAACACGACAGTGGAGTATGTGACATTCACCAGCA GGACCTCTCCAGAATCCGGATTGCTGAATCTTCCCTGTTGCCTAGAAGGGCTCCAAACCACCTCTTGACAATGGGAAACTGGGTGGTTAACCACTGGCAGGTTTCTGGTTGTTTGGTTAGGGCTGAATGCCTGTTTGTGGATGCCTTCCTGAAATATGAGAAGGCCGACAAATACTACTACACAAGATCCTTGGGTCAACATTGGCCTGTGCCCGAGCGTCTGCTCTCTGCTTGAAAACAGCACGCTGATCCTGCTTCCTGTGTGTCGCAATCTGCTGTCCTTCCTGACACCTGCTCAGCAGCCGCACACTGAGAAAGCAATTGGATCACAACCTCACCTTCCACAAGCTGGTGGCCTATATGATCTGCCTACATACAGCTATTCACATCATTGCACACCTGTTTAACTTTGACTGCTATAGCAGAAGCCGACAGGCCACAGATGGCTCCCTTGCCTCCATTCTCTCCAGCCTATCTCATGATGAGAGGTTCTTGGCTAAATCCCATCCAGTCCCGAAACACGACAGTGGAGTATGTGACATTCACCAGCA ||| - I wanted to parse out : I wrote a ver small 4 line code to obtain it. for bls in doc.getElementsByTagName('Hsp_num'): bls.normalize() if bls.firstChild.data >1: print bls.firstChild.data This is not sufficient for me to get anything doen. Could any one help me directing how to get the elements in that tag. Thanks. -K __ Do you Yahoo!? Send holiday email and support a worthy cause. Do good. http://celebrity.mail.yahoo.com ___ Tutor maillist - Tutor@python.org http://mail.python.org/mailman/listinfo/tutor
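In the four-line attempt above, bls.firstChild.data is a string, so comparing it with the integer 1 does not do what is intended; converting with int() first is one fix. A minimal sketch with xml.dom.minidom in Python 3 follows. The tag names Hsp, Hsp_num and Hsp_hseq are the ones that appear in this thread, while the tiny document, its root element, and the name later_hsps are invented for illustration.

```python
from xml.dom import minidom

# Tiny invented document; a real BLAST report has many more elements.
xml_text = (
    "<root>"
    "<Hsp><Hsp_num>1</Hsp_num><Hsp_hseq>GGACC</Hsp_hseq></Hsp>"
    "<Hsp><Hsp_num>2</Hsp_num><Hsp_hseq>TGGAT</Hsp_hseq></Hsp>"
    "</root>"
)
doc = minidom.parseString(xml_text)

later_hsps = []
for hsp in doc.getElementsByTagName("Hsp"):
    # firstChild is the text node inside the element; .data is its string value.
    num = int(hsp.getElementsByTagName("Hsp_num")[0].firstChild.data)
    if num > 1:
        hseq = hsp.getElementsByTagName("Hsp_hseq")[0].firstChild.data
        later_hsps.append((num, hseq))

print(later_hsps)
```

Starting from each Hsp element and asking it for its own children keeps the number and the sequence of the same alignment together.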
Re: [Tutor] Parsing a block of XML text
AACATTAATTCATACCAGAGTGAATTTCTGCAAATGTGATGTGGGCAACTGCCCTTCAATCATCACTAAACATAAGAGAATTAATACTGGAGAGAAACCCTACACATGTGAAGAATGTGGCAAAGTCTTTAATTGGTCCTCACGCCTTACTACACATTTATACTAGATACAAACTCTACAAATGTGAAGAATGTGGCAAAGCAACAAGTCCTCAATCCTTACTACCCATAAGATAATTCGCACTGGAGAGAAATTCTACAAATGTAAAGAATGTGCCAAAGCAACCAATCCTCAAACCTTACTGAACATAAGTTCATCCTGGAGAGAAACCTTACAAATGTGAAGAATGTGGCAAAGCCTTTAACTGGCCCTCAACTCTTACTAAACATAAGAGAATTCATACTGGAGAGAAACCCTACACATGTGAAGAATGTGGCAAAGCAACCAGTTCTCAAACCTTACTACACATAAGAGAATCCATACTGCAGAGAAATTCTATAAATGTACAGAATGT-GGTGAAGC-AGCCGGTCCTCAAACCTTACTAAACAT-AAGTTCATACT--GGAAACCCTAC text node: Element node: Hsp_hseq text node:TGGATTTAACCAATGTTTGCCAGCTACCCAGAGCTATTTCTATTTGATAAATGTGTGAAAGCCTTTCATAAACAAATTCAAACAGACATAAGATAAGCCATACTGCCAAATGCAAAGAATGTGGCAAATCAGCATGCTTCCACATCTAGCTCAACATTAATTCATACCAGAGTGAATTTCTGCAAATGTGATGTGGGCAACTGCCCTTCAATCATCACTAAACATAAGAGAATTAATACTGGAGAGAAACCCTACACATGTGAAGAATGTGGCAAAGTCTTTAATTGGTCCTCACGCCTTACTACACATTTATACTAGATACAAACTCTACAAATGTGAAGAATGTGGCAAAGCAACAAGTCCTCAATCCTTACTACCCATAAGATAATTCGCACTGGAGAGAAATTCTACAAATGTAAAGAATGTGCCAAAGCAACCAATCCTCAAACCTTACTGAACATAAGTTCATCCTGGAGAGAAACCTTACAAATGTGAAGAATGTGGCAAAGCCTTTAACTGGCCCTCAACTCTTACTAAACATAAGAGAATTCATACTGGAGAGAAACCCTACACATGTGAAGAATGTGGCAAAGCCTTTAACCAGTTCTCAAACCTTACTACACATAAGAGAATCCATACTGCAGAGAAATTCTATAAATGTACAGAATGTGGGTGAAGCAACCCGGCCCTCAAACCTTACTAAACATTTCATACTTGAGAAACCCTAC text node: Element node: Hsp_midline text node:|| ||| | || || text node: text node: Element node: Hsp text node: Element node: Hsp_num text node:2 text node: --- Danny Yoo <[EMAIL PROTECTED]> wrote: > > > On Fri, 31 Dec 2004, kumar s wrote: > > > I am trying to parse BLAST output (Basic Local > Alignment Search Tool, > > size around more than 250 KB ). > > [xml text cut] > > > Hi Kumar, > > Just as a side note: have you looked at Biopython > yet? 
> > http://biopython.org/ > > I mention this because Biopython comes with parsers > for BLAST; it's > possible that you may not even need to touch XML > parsing if the BLAST > parsers in Biopython are sufficiently good. Other > people have already > solved the parsing problem for BLAST: you may be > able to take advantage of > that work. > > > > I wanted to parse out : > > > > > > > > > Ok, I see that you are trying to get the content of > the High Scoring Pair > (HSP) query and hit coordinates. > > > > > I wrote a ver small 4 line code to obtain it. > > > > for bls in doc.getElementsByTagName('Hsp_num'): > > bls.normalize() > > if bls.firstChild.data >1: > > print bls.firstChild.data > > This might not work. 'bls.firstChild.data' is a > string, not a number, so > the expression: > > bls.firstChild.data > 1 > > is most likely buggy. Here, try using this function > to get the text out > of an element: > > ### > def get_text(node): > """Returns the child text contents of the > node.""" > buffer = [] > for c in node.childNodes: > if c.nodeType == c.TEXT_NODE: > buffer.append(c.data) > return ''.join(buffer) > ### > > (code adapted from: > http://www.python.org/doc/lib/dom-example.html) > > > > For example: > > ### > >>> doc = > xml.dom.minidom.parseString("helloworld") > >>> for bnode in doc.getElementsByTagName('b'): > ... print "I see:", get_text(bnode) > ... > I see: hello > I see: world > ### > > > > > > Could any one help me directing how to get the > elements in that tag. > > One way to approach structured parsing problems > systematically is to write > a function for each particular element type that > you're trying to parse. > > From the sample XML that you've shown us, it appears > that your document > consists of a single 'Hit' root node. Each 'Hit' > appears to have a > 'Hit_hsps&
[Tutor] Something is wrong in file input output functions.
Dear group,

I have written a small piece of code that takes a file, selects the columns I am interested in, checks whether the value in the first column equals 25, and then writes the selected columns to another file.

Code:

    import sys
    from string import split
    import string

    print "enter the file name"   ### Takes the file name ###
    psl = sys.stdin.readline()    ### psl has the file object ###
    f2 = sys.stdout.write("File name to write")

    def extCor(psl):
        ''' This function splits the file and writes the desired
        columns to another file only if the first column value equals 25.'''
        str_psl = psl.split('\n')
        str_psl = str_psl[5:]
        for ele in range(len(str_psl)):
            cols = split(str_psl[ele],'\t')
            des_cols = cols[0]+'\t'+cols[1]+'\t'+cols[8]+'\t'+cols[9]+'\t'+cols[11]+'\t'+cols[12]+'\t'+cols[13]+'\t'+cols[15]+'\t'+cols[16]+'\t'+cols[17]
            if cols[0] == 25:
                '''This condition checks if the first column value == 25,
                then it writes it to the file, if not then it does not'''
                f2.write(des_cols)
                f2.write("\n")

    extCor(psl)

Question: when I give it the name of the file it should parse, I am never asked for the name of the output file, and the script gives me nothing.

Please help me.

Thanks
K
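
A hedged sketch of what seems to go wrong in the script above, and one possible shape for a fix (Python 3; the thread uses Python 2). sys.stdin.readline() returns the typed file name as a string, not the contents of that file, so splitting it on newlines and dropping the first five pieces leaves nothing to loop over. sys.stdout.write() only prints its prompt and returns nothing useful to write to, and cols[0] is a string, so it never equals the integer 25. The function name extract_rows and its parameters are made up for this illustration; the column numbers are the ones in the post.

```python
import io

# 0-based column indices kept by the original script.
KEEP = (0, 1, 8, 9, 11, 12, 13, 15, 16, 17)


def extract_rows(infile, outfile, wanted=25, skip=5, keep=KEEP):
    """Copy the `keep` columns of rows whose first column equals `wanted`.

    `infile` and `outfile` are open file objects, not file names.
    Returns the number of rows written.
    """
    written = 0
    for lineno, line in enumerate(infile):
        if lineno < skip:                    # skip the header lines
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) <= max(keep):           # too few columns to select from
            continue
        try:
            first = int(cols[0])             # text from a file is always str
        except ValueError:
            continue
        if first == wanted:
            outfile.write("\t".join(cols[i] for i in keep) + "\n")
            written += 1
    return written


def make_row(first):
    """Build an 18-column demo row; the values c1..c17 are placeholders."""
    return "\t".join([str(first)] + ["c%d" % i for i in range(1, 18)]) + "\n"


# In-memory stand-ins for real files, so the sketch runs on its own.
source = io.StringIO("header line\n" * 5 + make_row(25) + make_row(22) + make_row(25))
dest = io.StringIO()
count = extract_rows(source, dest)
print(count)
```

With real files, the two names would be read with input() and passed to open(), for example inside a "with open(in_name) as infile, open(out_name, 'w') as outfile:" block that calls extract_rows(infile, outfile).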
[Tutor] please help: conditional statement and printing element
Dear group,

For some reason I cannot think of any option other than what I already have in my script. Could anyone please suggest something?

What I have (file name: psl):

    22  2  457:411  25  0
    25  0  457:411  25  0
    25  0  457:411  25  0
    25  0  457:411  25  0
    25  0  457:411  25  0
    25  0  457:411  25  0
    25  0  457:411  25  0
    25  0  457:411  25  0
    25  0  457:411  25  0
    22  0  457:411  25  0
    25  0  457:411  25  0
    25  0  457:411  25  0
    24  1  457:411  25  0
    22  0  457:411  25  0
    21  0  457:411  25  0
    25  0  457:411  25  0
    25  0  457:411  25  0

What to do: I want to print the rows that have 25 in column 1, and not the rows with other values such as 24, 22, 21 etc.

My script:

    >>> for i in range(len(psl)):
            col = split(psl[i],'\t')
            col1 = col[0]
            if col1 == 25:
                print col[0]+'\t'+col[1]+'\t'+col[17]

    >>>

Result: I get nothing. Am I doing something very wrong? Why isn't "if col1 == 25:" working?

My idea is to check if col[0] == 25 and then print columns 1, 18 etc.

Can you please help me?

Thanks
K
Re: [Tutor] please help: conditional statement and printing element
Dear group: I think I have got the answer :-)( script: >>> for i in range(len(psl)): col = split(psl[i],'\t') col1 = col[0] col1 = int(col1) col17 = int(col[17]) if col1 == 25 and col17 == 1: print col[0]+ '\t'+col[1]+'\t'+col[9]+'\t'+col[10]+'\t'+col[11] 25 0 580:683 25 0 25 0 581:687 25 0 25 0 434:9 25 0 25 0 37:141 25 0 25 0 219:629 25 0 25 0 462:87 25 0 25 0 483:409 25 0 25 0 354:323 25 0 25 0 624:69 25 0 25 0 350:239 25 0 Is this a correct approach? Thanks K. --- kumar s <[EMAIL PROTECTED]> wrote: > Dear group, > For some reason my brain cannot think of any other > option than what I have in my script. Could any one > please help me in suggesting. > > What I have : (File name : psl) > 222 457:411 25 0 > 250 457:411 25 0 > 250 457:411 25 0 > 250 457:411 25 0 > 250 457:411 25 0 > 250 457:411 25 0 > 250 457:411 25 0 > 250 457:411 25 0 > 250 457:411 25 0 > 220 457:411 25 0 > 250 457:411 25 0 > 250 457:411 25 0 > 241 457:411 25 0 > 220 457:411 25 0 > 210 457:411 25 0 > 250 457:411 25 0 > 250 457:411 25 0 > > > What to do: > I want to print values that are 25 in column 1 and > not > the other values such as 24,22,21 etc. > > > My script: > >>> for i in range(len(psl)): > col = split(psl[i],'\t') > col1 = col[0] > if col1 == 25: > print col[0]+'\t'+col[1]+'\t'+col[17] > > > >>> > > Result: I get nothing. Am I doing something very > wrong. Why isnt if col1 == 25: functional. > > My idea is to check if col[0] == 25: then print > columns 1,18 etc. > > Can you please help me. > > Thanks > K > > __ > Do You Yahoo!? > Tired of spam? Yahoo! Mail has the best spam > protection around > http://mail.yahoo.com > ___ > Tutor maillist - Tutor@python.org > http://mail.python.org/mailman/listinfo/tutor > __ Do You Yahoo!? Tired of spam? Yahoo! Mail has the best spam protection around http://mail.yahoo.com ___ Tutor maillist - Tutor@python.org http://mail.python.org/mailman/listinfo/tutor
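
A small check of the fix described above, sketched in Python 3 (the thread uses Python 2). Fields produced by splitting a line are strings, and a string never compares equal to an integer, which is why the first version printed nothing. The three sample rows are taken from the data in the question; the function name is made up for this illustration.

```python
rows = [
    "22\t2\t457:411\t25\t0",
    "25\t0\t457:411\t25\t0",
    "24\t1\t457:411\t25\t0",
]


def rows_with_first_column(rows, wanted):
    """Return the rows whose first tab-separated field equals the integer `wanted`."""
    selected = []
    for row in rows:
        cols = row.split("\t")
        if int(cols[0]) == wanted:   # convert before comparing
            selected.append(row)
    return selected


text_equals_number = ("25" == 25)    # a str and an int are never equal
matching = rows_with_first_column(rows, 25)
print(text_equals_number, matching)
```

Comparing against the string "25" instead of converting would also work for exact matches, but converting with int() is needed as soon as the test becomes a range check such as "greater than 22".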
[Tutor] How to create a key-value pairs with alternative elements in a list ... please help.
Dear group,

I am frustrated to be asking questions on topics that I thought I had covered well. My logic is not correct.

I have a simple list:

    >>> a
    ['a', 'apple', 'b', 'boy', 'c', 'cat']

I want to create a dictionary:

    dict = {'a':'apple', 'b':'boy', 'c':'cat'}

My way of doing this:

    >>> keys = []   # create a list of all keys, i.e. a, b, c
    >>> vals = []   # create a list of all values, i.e. apple, boy, cat etc.
    >>> dict = {}
    >>> dict = zip(keys,vals)

Problem: how do I capture every alternate element in list a? I am unable to get a, b and c into the keys list and apple, boy, cat into the vals list.

Trial 1:

    >>> while i >= len(a):
            print a[i]
            i = i+2    # I thought i+2 would give me alternate elements

Trial 2:

    >>> for index,i in enumerate(range(len(a))):
            print a[i]
            print a[index+1]

    a
    apple
    apple
    b
    b
    boy
    boy
    c
    c
    cat
    cat

Please help me. It is also time for me to go back to my previous notes :-(

thanks
K
Re: [Tutor] How to create a key-value pairs with alternative elements in a list ... please help.
Thanks for this trick. Can I call this as category thermo-NUKE of list functions. -K --- Jeff Shannon <[EMAIL PROTECTED]> wrote: > kumar s wrote: > > > Problem: > > How do i capture every alternative element in list > a: > > > > I am unable to pump the a,b, and c into keys list > > and apple, boy,cat into vals list. > > In a sufficiently recent version of Python, you > should be able to use > an extended slice with a stride -- > > keys = a[::2] > vals = a[1::2] > > (Note that this is untested, as I don't have a > recent version of > Python handy at the moment; I'm on 2.2 here, which > doesn't have > extended slices.) > > Jeff Shannon > Technician/Programmer > Credit International > > > ___ > Tutor maillist - Tutor@python.org > http://mail.python.org/mailman/listinfo/tutor > __ Do You Yahoo!? Tired of spam? Yahoo! Mail has the best spam protection around http://mail.yahoo.com ___ Tutor maillist - Tutor@python.org http://mail.python.org/mailman/listinfo/tutor
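
Since the suggestion quoted above was posted untested, here is a short check of it, sketched in Python 3. Slicing with a stride of 2 picks every second element, starting at index 0 for the keys and at index 1 for the values; zip() pairs them up and dict() turns the pairs into a dictionary. The name pairs is used here instead of dict so that the built-in dict stays usable.

```python
a = ['a', 'apple', 'b', 'boy', 'c', 'cat']

keys = a[::2]     # elements at even positions: 'a', 'b', 'c'
vals = a[1::2]    # elements at odd positions: 'apple', 'boy', 'cat'

pairs = dict(zip(keys, vals))
print(pairs)
```

If the list has an odd number of elements, zip() stops at the shorter sequence, so the final unpaired key is silently dropped.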
[Tutor] Regular expression re.search() object . Please help
Dear group:

My list looks like this (list name = probe_pairs):

    Name=AFFX-BioB-5_at
    Cell1=96369 N control AFFX-BioB-5_at
    Cell2=96370 N control AFFX-BioB-5_at
    Cell3=441 3 N control AFFX-BioB-5_at
    Cell4=441 4 N control AFFX-BioB-5_at
    Name=223473_at
    Cell1=307 87 N control 223473_at
    Cell2=307 88 N control 223473_at
    Cell3=367 84 N control 223473_at

My script:

    >>> name1 = '[N][a][m][e][=]'
    >>> for i in range(len(probe_pairs)):
            key = re.match(name1,probe_pairs[i])
            key

    <_sre.SRE_Match object at 0x00E37A68>
    <_sre.SRE_Match object at 0x00E37AD8>
    <_sre.SRE_Match object at 0x00E37A68>
    <_sre.SRE_Match object at 0x00E37AD8>
    <_sre.SRE_Match object at 0x00E37A68>
    . (cont. 10K lines)

Here it prints a bunch of match objects. However, when I call group() it prints only one object. Why?

Alternatively:

    >>> for i in range(len(probe_pairs)):
            key = re.match(name1,probe_pairs[i])
            key.group()

    'Name='

1. My aim: to remove those Name= lines from my probe_pairs list.

With name1 as the pattern, I used the re.match() method to identify the lines, intending to remove them afterwards with re.sub(pat,'',string). I want to substitute each Name=*** line with an empty string.

After I get the match object, I tried to remove it like this:

    >>> for i in range(len(probe_pairs)):
            key = re.match(name1,probe_pairs[i])
            del key
            print probe_pairs[i]

    Name=AFFX-BioB-5_at
    Cell1=96369 N control AFFX-BioB-5_at
    Cell2=96370 N control AFFX-BioB-5_at
    Cell3=441 3 N control AFFX-BioB-5_at

The result shows that the Name=*** line has not been deleted. Is the way I am doing this a good one? Could you please suggest a good, simple method?

Thanks in advance
K
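
A hedged sketch of one simple way to do what the message above asks, in Python 3 (the thread uses Python 2). The statement "del key" only removes the name key; it does not touch probe_pairs, which is why the Name= lines are still printed. Building a new list that leaves those lines out avoids deleting from a list while looping over it. The sample list is a shortened copy of the data in the post.

```python
import re

probe_pairs = [
    "Name=AFFX-BioB-5_at",
    "Cell1=96369 N control AFFX-BioB-5_at",
    "Cell2=96370 N control AFFX-BioB-5_at",
    "Name=223473_at",
    "Cell1=307 87 N control 223473_at",
]

NAME_LINE = re.compile(r"Name=")   # same meaning as '[N][a][m][e][=]'

# re.match() returns None when the line does not start with the pattern.
without_names = [line for line in probe_pairs if not NAME_LINE.match(line)]

# For a fixed prefix, a plain string method does the same job without re.
same_result = [line for line in probe_pairs if not line.startswith("Name=")]

print(without_names)
```

A regular expression only becomes necessary when the thing to match varies, for example when the part after "Name=" has to be captured.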
Re: [Tutor] Regular expression re.search() object . Please help
Hello group: thank you for the suggestions. It worked for me using if not line.startswith('Name='): expression. I have been practising regular expression problems. I tumle over one simple thing always. After obtaining either a search object or a match object, I am unable to apply certain methods on these objects to get stuff. I have looked into many books including my favs( Larning python and Alan Gaulds Learn to program using python) I did not find the basic question, how can I get what I intend to do with returned reg.ex match object (search(), match()). For example: I have a simple list like the following: >>> seq ['>probe:HG-U133B:20_s_at:164:623; Interrogation_Position=6649; Antisense;', 'TCATGGCTGACAACCCATCTTGGGA'] Now I intend to extract particular pattern and write to another list say: desired[] What I want to extract: I want to extract 164:623: Which always comes after _at: and ends with ; 2. The second pattern/number I want to extract is 6649: This always comes after position=. How I want to put to desired[]: >>> desired ['>164:623|6649', 'TCATGGCTGACAACCCATCTTGGGA'] I write a pattern: pat = '[0-9]*[:][0-9]*' pat1 = '[_Position][=][0-9]*' >>> for line in seq: pat = '[0-9]*[:][0-9]*' pat1 = '[_Position][=][0-9]*' print (re.search(pat,line) and re.search(pat1,line)) <_sre.SRE_Match object at 0x163CAF00> None Now I know that I have a hit in the seq list evident by <_sre.SRE_Match object at 0x163CAF00>. Here is the black box: What kind of operations can I do on this to get those two matches: 164:623 and 6649. I read http://www.python.org/doc/2.2.3/lib/re-objects.html This did not help me to progress further. May I request tutors to give a small note explaining things. In Alan Gauld's book, most of the explanation stopped at <_sre.SRE_Match object at 0x163CAF00> this level. After that there is no example where he did some operations on these objects. If I am wrong, I might have skipped/missed to read it. Aplogies for that. Thank you very much in advance. 
K --- Liam Clarke <[EMAIL PROTECTED]> wrote: > ...as do I. > > openFile=file("probe_pairs.txt","r") > probe_pairs=openFile.readlines() > > openFile.close() > > indexesToRemove=[] > > for lineIndex in range(len(probe_pairs)): > >if > probe_pairs[lineIndex].startswith("Name="): > > indexesToRemove.append(lineIndex) > > for index in indexesToRemove: > probe_pairs[index]='"" > > Could just be > > openFile=file("probe_pairs.txt","r") > probe_pairs=openFile.readlines() > > openFile.close() > > indexesToRemove=[] > > for lineIndex in range(len(probe_pairs)): > >if > probe_pairs[lineIndex].startswith("Name="): > probe_pairs[lineIndex]='' > > > > > > On Fri, 14 Jan 2005 09:38:17 +1300, Liam Clarke > <[EMAIL PROTECTED]> wrote: > > > >>> name1 = '[N][a][m][e][=]' > > > >>> for i in range(len(probe_pairs)): > > > key = re.match(name1,probe_pairs[i]) > > > key > > > > > > <_sre.SRE_Match object at 0x00E37A68> > > > <_sre.SRE_Match object at 0x00E37AD8> > > > <_sre.SRE_Match object at 0x00E37A68> > > > <_sre.SRE_Match object at 0x00E37AD8> > > > <_sre.SRE_Match object at 0x00E37A68> > > > > > > You are overwriting key each time you iterate. > key.group() gives the > > matched characters in that object, not a group of > objects!!! > > > > You want > > > >>> name1 = '[N][a][m][e][=]' > > > >>> keys=[] > > > >>> for i in range(len(probe_pairs)): > > > key = re.match(name1,probe_pairs[i]) > > > keys.append[key] > > > > >>> print keys > > > > > 'Name=' > > > > > > 1. My aim: > > > To remove those Name= lines from my > probe_pairs > > > list > > > > Why are you deleting the object key? > > > > > >>> for i in range(len(probe_pairs)): > > > key = re.match(name1,probe_pairs[i]) > > > del key > > > print probe_pairs[i] > > > > Here's the easy way. Assuming that probe_pairs is > stored in a file callde > > probe_pairs.txt > > > > openFile=file("probe_pairs.txt","r") > > probe_pairs=openFile.readlines() > &
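
On the question raised in this message about what can be done with a returned match object, a minimal sketch in Python 3 (the thread uses Python 2): parentheses in a pattern define groups, and group(1) on the match object returns the text captured by the first group. Note that a pattern such as '[_Position]' is a character class that matches one single character out of that set, not the word "_Position". The two patterns and the helper name summarize below are assumptions made for this illustration, based on the one header line shown in the post.

```python
import re

seq = ['>probe:HG-U133B:20_s_at:164:623; Interrogation_Position=6649; Antisense;',
       'TCATGGCTGACAACCCATCTTGGGA']

# The parentheses mark the part of the match we want to get back.
COORD = re.compile(r"_at:(\d+:\d+);")
POSITION = re.compile(r"Interrogation_Position=(\d+)")


def summarize(header):
    """Return '>x:y|position' for a probe header, or None if a part is missing."""
    coord = COORD.search(header)
    pos = POSITION.search(header)
    if coord is None or pos is None:      # search() returns None on no match
        return None
    return ">" + coord.group(1) + "|" + pos.group(1)


desired = [summarize(seq[0]), seq[1]]
print(desired)
```

Besides group(), a match object also offers groups() for all captured groups at once and start() and end() for the positions of the match in the string.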
[Tutor] Faster procedure to filter two lists . Please help
Hi group:

I have two lists: a. 'my_report' and b. 'what'.

From list 'what', I want to take 6649 (element 1: 164:623\t6649) and write it to a new list (although I printed the result, my intention is to use list.append(result)).

I took the column 1 value of element 1 in 'what', which is 164:623, and looked for it in column 1 of list 'my_report'. If it matches, I asked it to write all the columns of 'my_report' along with the column 2 value from 'what'. (I feel my explanation has made this sound complex.)

Here is what I did:

    >>> what[0:4]
    ['164:623\t6649', '484:11\t6687', '490:339\t6759', '247:57\t6880', '113:623\t6901']

    >>> my_report[0:4]
    ['164:623\tTCATGGCTGACAACCCATCTTGGGA\t20_s_at', '484:11\tATTATCATCACATGCAGCTTCACGC\t20_s_at', '490:339\tGAATCCGCCAGAACACAGACA\t20_s_at', '247:57\tAGTCCTCGTGGAACTACAACTTCAT\t20_s_at', '113:623\tTCATGGGTGTTCGGCATGAAA\t20_s_at']

    >>> for i in range(len(what)):
            ele = split(what[i],'\t')
            cor1 = ele[0]
            for k in range(len(my_report)):
                cols = split(my_report[k],'\t')
                cor = cols[0]
                if cor1 == cor:
                    print cor+'\t'+ele[1]+'\t'+cols[1]+'\t'+cols[2]

    164:623  6649  TCATGGCTGACAACCCATCTTGGGA
    484:11   6687  ATTATCATCACATGCAGCTTCACGC
    490:339  6759  GAATCCGCCAGAACACAGACA
    247:57   6880  AGTCCTCGTGGAACTACAACTTCAT
    113:623  6901  TCATGGGTGTTCGGCATGAAA

PROBLEM: This process is very, very slow. I have 249502 elements in each list, and the process has been running for over 30 minutes. Could anyone suggest a faster procedure to save time?

Thank you in advance.
K
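
One common way to avoid the nested loop described above is to index one list in a dictionary first and then make a single pass over the other. A minimal sketch in Python 3 (the thread uses Python 2), assuming every entry in 'what' has two tab-separated fields and every entry in 'my_report' has three, as in the samples shown; the function name join_reports is made up for this illustration.

```python
what = ['164:623\t6649', '484:11\t6687']
my_report = ['164:623\tTCATGGCTGACAACCCATCTTGGGA\t20_s_at',
             '484:11\tATTATCATCACATGCAGCTTCACGC\t20_s_at',
             '999:999\tACGT\t20_s_at']


def join_reports(what, my_report):
    """Attach the position from `what` to each matching row of `my_report`."""
    # One pass over `what` builds the lookup table: coordinate -> position.
    position_by_coord = {}
    for entry in what:
        coord, position = entry.split("\t")
        position_by_coord[coord] = position

    # One pass over `my_report`; each dictionary lookup takes roughly
    # constant time, so the lists are no longer multiplied together.
    merged = []
    for row in my_report:
        coord, sequence, probe = row.split("\t")
        if coord in position_by_coord:
            merged.append("\t".join([coord, position_by_coord[coord], sequence, probe]))
    return merged


my_final_report = join_reports(what, my_report)
print(my_final_report)
```

With two lists of about 250,000 entries each, this does on the order of 500,000 steps rather than the tens of billions of comparisons the nested loops can need.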
Re: [Tutor] Faster procedure to filter two lists . Please help
Hi Danny: Thank you for your suggestion. I tried creating a dictionary of 'what' list and searched keys with has_key method and it is pretty fast. Thanks again. following is the piece of code. K >>> cors = [] >>> intr = [] >>> for i in range(len(what)): ele = split(what[i],'\t') cors.append(ele[0]) intr.append(ele[1]) >>> what_dict = dict(zip(cors,intr)) >>> for i in range(len(my_report)): cols = split(my_report[i],'\t') cor = cols[0] if what_dict.has_key(cor): intr = what_dict[cor] my_final_report.append(cols[0]+'\t'+intr+'\t'+cols[1]+'\t'+cols[2]) --- Danny Yoo <[EMAIL PROTECTED]> wrote: > > > On Fri, 14 Jan 2005, kumar s wrote: > > > >>>for i in range(len(what)): > > ele = split(what[i],'\t') > > cor1 = ele[0] > > for k in range(len(my_report)): > > cols = split(my_report[k],'\t') > > cor = cols[0] > > if cor1 == cor: > > print cor+'\t'+ele[1]+'\t'+cols[1]+'\t'+cols[2] > > > > Hi Kumar, > > > Ok, this calls for the use of an "associative map" > or "dictionary". > > > The main time sink is the loop here: > > > for k in range(len(my_report)): > > cols = split(my_report[k],'\t') > > cor = cols[0] > > if cor1 == cor: > > print cor+'\t'+ele[1]+'\t'+cols[1]+'\t'+cols[2] > > Conceptually, my_report can be considered a list of > key/value pairs. For > each element in 'my_report', the "key" is the first > column (cols[0]), and > the "value" is the rest of the columns (cols[1:]). > > > The loop above can, in a pessimistic world, require > a search across the > whole of 'my_report'. This can take time that is > proportional to the > length of 'my_report'. You mentioned earlier that > each list might be of > length 249502, so we're looking into a process whose > overall cost is > gigantic. 
> > > [Notes on calculating runtime cost: when the > structure of the code looks > like: > > for element1 in list1: > for element2 in list2: > some_operation_that_costs_K_time() > > then the overall cost of running this loop will be > > K * len(list1) * len(list2) > ] > > > We can do much better than this if we use a > "dictionary" data structure. A > "dictionary" can reduce the time it takes to do a > lookup search down from > a linear-time operation to an atomic-time one. Do > you know about > dictionaries yet? You can take a look at: > > http://www.ibiblio.org/obp/thinkCSpy/chap10.htm > > which will give an overview of a dictionary. It > doesn't explain why > dictionary lookup is fast, but we can talk about > that later if you want. > > > Please feel free to ask any questions about > dictionaries and their use. > Learning how to use a dictionary data structure is a > skill that pays back > extraordinarily well. > > > Good luck! > > __ Do you Yahoo!? Meet the all-new My Yahoo! - Try it today! http://my.yahoo.com ___ Tutor maillist - Tutor@python.org http://mail.python.org/mailman/listinfo/tutor
[Tutor] Selecting text
Dear group: I have two lists: 1. Lseq: >>> len(Lseq) 30673 >>> Lseq[20:25] ['NM_025164', 'NM_025164', 'NM_012384', 'NM_006380', 'NM_007032','NM_014332'] 2. refseq: >>> len(refseq) 1080945 >>> refseq[0:25] ['>gi|10047089|ref|NM_014332.1| Homo sapiens small muscle protein, X-linked (SMPX), mRNA', 'GTTCTCAATACCGGGAGAGGCACAGAGCTATTTCAGCCACATGGCATCGGAATTGAGATCGCAGCT', 'CAGAGGACACCGGGCGTTCCACCTTCCAAGGAGCTTTGTATTCTTGCATCTGGCTGCCTGGGACTT', 'CCCTTAGGCAGTAAACAAATACATAAAGCAGGGATAAGACTGCATGAATATGTCGAAACAGCCAGTTTCC', 'AATGTTAGAGCCATCCAGGCAAATATCAATATTCCAATGGGAGCCTTTCGGCCAGGAGCAGGTCAA', 'CCAGAAGGAATGTACTCCTGAAGTGGAGGAGGGTGTTCCTCCCACCTCGGATGAGGAGAAGAAGCC', 'AATTCCAGGAGCGAAGAAACTTCCAGGACCTGCAGTCAATCTATCGGAAATCCAGAATATTGTGAA', 'CTTATGTAAAGCTGAACAGTAGTAGGAAGAAGGATTGATGTGAAGAAATAAAGAGGCA', 'GAAGATGGATTCAATAGCTCACTATATATTTGTATGATGATTGTGAACCTCCTGAATGCCTG', 'AGACTCTAGCAGAAATGGCCTGTTTGTACATTTATATCTCTTCCTTCTAGTTGGCTGTATTTCTTACTTT', 'ATCTTCATGGCACCTCACAGAACAAATTAGCCCATAAATTCAACACCTGGAGGGTGTGGGAG', 'GAGGGATATGAATGGAGAATGATATGGCAATGTGCCTAACGAGATGGTTTCCCAAGCT', 'ACTTCCTACAGTAGGTCAATATTTGGAATGCGAGTTCTTCACCAAATTATGTCACTAA', 'ACTTTGTATGAGTTCAAATAAATATTTGACTAAATGTTGTGA', '>gi|10047091|ref|NM_013259.1| Homo sapiens neuronal protein (NP25), mRNA', 'TGTGCTGCTATTGTGTGGATGCCGCGCGTGTCTTCTCTTCTTTCCAGAGATGGCTAACACCCGAGC', 'TATGGCTTAAGCCGAGAGGTGCAGGAGAAGATCGAGCAGAAGTATGATGCGGACCTGGAGAACAAGCTGG', 'TGGACTGGATCATCCTGCAGTGCGCCGAGGACATAGAGCACCCGCCGGCAGGGCCCACAGAA', 'ATGGTTAATGGACGGGACGGTCCTGTGCAAGCTGATAAATAGTTTATACCCACCAGGACAAGAGCCCATA', 'CCCAAGATCTCAGAGTCAAAGATGGCAAGCAGATGGAGCAAATCTCCCAGTTCCTGCTGCGG', 'AGACCTATGGTGTCAGAACCACCGACATCTTTCAGACGGTGGATCTATGGGAAGGGAAGGACATGGCAGC', 'TGTGCAGAGGACCCTGATGGCTTTAGGCAGCGTTGCAGTCACCAAGGATGATGGCTGCTATCAGAG', 'CCATCCTGGTTTCACAGGAAAGCCCAGCAGAATCGGAGAGGCCCGAGGAGCAGCTTCGCCAGGGAC', 'AGAACGTAATAGGCCTGCAGATGGGCAGCAACAAGGGAGCCTCCCAGGCGGGCATGACAGGGTACGGGAT', 'GCCCAGGCAGATCATGTTAGGACGCGGCATCCTGTGGTAGAGAGGACGAATGTTCCACACCATGGT'] If Lseq[i] is present in refseq[k], then I am interested 
in printing starting from refseq[k] until the element that starts with '>' sign. my Lseq has NM_014332 element and this is also present in second list refseq. I want to print starting from element where NM_014332 is present until next element that starts with '>' sign. In this case, it would be: '>gi|10047089|ref|NM_014332.1| Homo sapiens small muscle protein, X-linked (SMPX), mRNA', 'GTTCTCAATACCGGGAGAGGCACAGAGCTATTTCAGCCACATGGCATCGGAATTGAGATCGCAGCT', 'CAGAGGACACCGGGCGTTCCACCTTCCAAGGAGCTTTGTATTCTTGCATCTGGCTGCCTGGGACTT', 'CCCTTAGGCAGTAAACAAATACATAAAGCAGGGATAAGACTGCATGAATATGTCGAAACAGCCAGTTTCC', 'AATGTTAGAGCCATCCAGGCAAATATCAATATTCCAATGGGAGCCTTTCGGCCAGGAGCAGGTCAA', 'CCAGAAGGAATGTACTCCTGAAGTGGAGGAGGGTGTTCCTCCCACCTCGGATGAGGAGAAGAAGCC', 'AATTCCAGGAGCGAAGAAACTTCCAGGACCTGCAGTCAATCTATCGGAAATCCAGAATATTGTGAA', 'CTTATGTAAAGCTGAACAGTAGTAGGAAGAAGGATTGATGTGAAGAAATAAAGAGGCA', 'GAAGATGGATTCAATAGCTCACTATATATTTGTATGATGATTGTGAACCTCCTGAATGCCTG', 'AGACTCTAGCAGAAATGGCCTGTTTGTACATTTATATCTCTTCCTTCTAGTTGGCTGTATTTCTTACTTT', 'ATCTTCATGGCACCTCACAGAACAAATTAGCCCATAAATTCAACACCTGGAGGGTGTGGGAG', 'GAGGGATATGAATGGAGAATGATATGGCAATGTGCCTAACGAGATGGTTTCCCAAGCT', 'ACTTCCTACAGTAGGTCAATATTTGGAATGCGAGTTCTTCACCAAATTATGTCACTAA', 'ACTTTGTATGAGTTCAAATAAATATTTGACTAAATGTTGTGA' I could not think of any smart way to do this, although I have tried like this: >>> for ele1 in Lseq: for ele2 in refseq: if ele1 in ele2: k = ele2 s = refseq[ele2].startswith('>') print k,s Traceback (most recent call last): File "", line 5, in -toplevel- s = refseq[ele2].startswith('>') TypeError: list indices must be integers I do not know how to dictate to python to select lines between two > symbols. Could any one help me thanks. K __ Do you Yahoo!? Yahoo! Mail - 250MB free storage. Do more. Manage less. http://info.mail.yahoo.com/mail_250 ___ Tutor maillist - Tutor@python.org http://mail.python.org/mailman/listinfo/tutor
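
A hedged sketch of one way to select the lines between two '>' headers, in Python 3 (the thread uses Python 2). It walks refseq once, remembers whether the most recent header named a wanted accession, and collects the following lines until the next header. It assumes every header has the form '>gi|number|ref|ACCESSION.version| description', as in the sample above; the function name records_for and the shortened sequences are made up for this illustration.

```python
def records_for(accessions, refseq):
    """Return one list per wanted record: the header followed by its sequence lines."""
    wanted = set(accessions)          # set membership tests are fast
    selected = []
    keep = False
    for line in refseq:
        if line.startswith(">"):
            # '>gi|10047089|ref|NM_014332.1| ...' -> 'NM_014332'
            fields = line.split("|")
            accession = fields[3].split(".")[0] if len(fields) > 3 else ""
            keep = accession in wanted
            if keep:
                selected.append([line])
        elif keep:
            selected[-1].append(line)
    return selected


Lseq = ['NM_025164', 'NM_014332']
refseq = [
    '>gi|10047089|ref|NM_014332.1| Homo sapiens small muscle protein, X-linked (SMPX), mRNA',
    'GTTCTCAATACC',
    'CAGAGGACACCG',
    '>gi|10047091|ref|NM_013259.1| Homo sapiens neuronal protein (NP25), mRNA',
    'TGTGCTGCTATT',
]

hits = records_for(Lseq, refseq)
print(hits)
```

The TypeError in the attempt above arises because ele2 is an element of the list (a string), not a position in it, so it cannot be used as an index.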
[Tutor] Cluster algorithms
Hi:

I am still trying to learn the OOP side of Python. However, circumstances do not seem to wait until I finish my practice and attain a higher understanding. Maybe I am being pushed into the stream and tested on whether I can swim efficiently while I still struggle with the basic strokes. That analogy is 100% my perspective on learning Python :-)

I have a couple of questions to ask the tutors:

Are there any example programs demonstrating clustering algorithms such as agglomerative, complete-link, partitional, squared-error clustering, k-means, or clustering algorithms based on neural networks or genetic algorithms? Although I have only just learned Python (and, to a major extent, programming as well), I need to apply some of these algorithms to my data. Any suggestions or recommendations?

Do I have to know how to code well using OOP methods to apply these algorithms?

-Kumar
[Tutor] files in a directory
Hello.

I wrote a parser to parse spot intensities. At the moment I give this parser one single file as input:

    f1 = open('my_intensity_file.dml','r')
    int = f1.read().split('\n')

    my_vals = intParser(int)    # intParser returns a list

    f2 = open('myvalues.txt','w')
    for line in my_vals:
        f2.write(line)
        f2.write('\n')

    f2.close()

The problem with this approach is that I have to give it one file per run. I have 50 files to parse and I want to do that in one go. I have kept those 50 files in one directory. Can anyone suggest an approach to automate this process?

I tried to use f1 = stdin(...) but it did not work. I don't know; possibly I am using incorrect syntax.

Any suggestions?

Thank you.
K
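
A minimal sketch of one way to process every file in a directory in a single run, in Python 3 (the thread uses Python 2). glob.glob() returns the paths that match a wildcard pattern, and the loop opens each path in turn and writes one output file per input. The function name parse_directory, its parameters, and the stand-in parser count_characters are hypothetical; in the real script the poster's own intParser would be passed in instead.

```python
import glob
import os
import tempfile


def parse_directory(directory, parse, pattern="*.dml", suffix=".txt"):
    """Run `parse` on each file matching `pattern`; write results next to it."""
    outputs = []
    for path in sorted(glob.glob(os.path.join(directory, pattern))):
        with open(path) as infile:
            lines = infile.read().split("\n")
        values = parse(lines)              # expected to return a list
        out_path = path + suffix           # e.g. sample.dml -> sample.dml.txt
        with open(out_path, "w") as outfile:
            for value in values:
                outfile.write(str(value) + "\n")
        outputs.append(out_path)
    return outputs


def count_characters(lines):
    """Stand-in parser (hypothetical): the length of each non-empty line."""
    return [len(line) for line in lines if line]


# Self-contained demo in a throwaway directory.
with tempfile.TemporaryDirectory() as workdir:
    for name in ("a.dml", "b.dml"):
        with open(os.path.join(workdir, name), "w") as handle:
            handle.write("12\n345\n")
    written = parse_directory(workdir, count_characters)
    names = [os.path.basename(p) for p in written]
    with open(written[0]) as handle:
        first_output = handle.read()

print(names)
```

Deriving each output name from its input path keeps the results of different files from overwriting one another, which a single fixed name such as 'myvalues.txt' would do.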
Re: [Tutor] files in a directory
Thank you Jay. It worked, I am V.V.happy. I tried Liam's suggestion also, but some weird things are going and I am not only getting results but also any error. I am working on that. Other thing. I a feeding my parser some coordinates specified by me, where I am asking the parser to extract the intensity values only for those coordinates. For exmple: Coordinates_file = open('xxx','r') def coOrs(coordinates_file): ... .. ## this parse extracts the my specified coordinates# ## and saves as a list for lookup in Intensity File## return my_coordinates_list def intPars er(Intensity File, my_coordinates_list): ... return intensities This above f(x) returns intensities and my coordinates. Now that I am reading many files at once, I wanted, to have a tab delim file op that looks like this: My_coors Int_file 1 Int_file2 IntFile3 01:26 34 235 245.45 04:42 342.4452.445.5 02:56 45.4 34.5 557.8 code: files = glob.glob("My_dir\*.ext") def parSer(file): f1 = open(file,'r') seelf = f1.read().split('\n') seelfile = seelf[24:506969] my_vals = intParser(seelfile,pbs) f2 = open(file+'.txt','w') for line in my_vals: f2.write(line+'\t') => asking for tab delim.. f2.write('\n') f2.close() def main(): for each in files: parSer(each) main() => putting here a '\t' did not work.. . Am i wrong here. Any suggestions, please. Thank you in advance. 
--- Jay Loden <[EMAIL PROTECTED]> wrote:

> There's a few ways to accomplish this...the way that comes to mind is:
>
> ##
> import glob
>
> files = glob.glob("/path/to/director/*.dml")  # assuming you want only .dml
>
> def spot(file):
>     '''search for intensity spots and report them to an output file'''
>     f1 = open('my_intensity_file.dml','r')
>     int = f1.read().split('\n')
>
>     my_vals = intParser(int)
>
>     # intParser returns a list
>     f2 = open('myvalues.txt','w')  # you will want to change this to output mult
>     for line in my_vals:           # files, or to at least append instead of overwriting
>         f2.write(line)
>         f2.write('\n')
>     f2.close()
>
> def main():
>     for each in files:
>         spot(each)
>
> main()
>
> ##
>
> Basically, turn the parsing into a function, then create a list of files, and
> perform the parsing on each file. glob() lets you grab a whole list of files
> matching the wildcard just like if you typed "ls *.dml" or whatever into a
> command prompt. There wasn't too much info about specifically how you needed
> this to work, so this is a rough sketch of what you want. Hopefully it helps.
>
> -Jay
>
> On Sunday 30 January 2005 03:03 am, kumar s wrote:
> > Hello.
> >
> > I wrote a parser to parse spot intensities. As input to this parser
> > I am giving one single file:
> >
> > f1 = open('my_intensity_file.dml','r')
> > int = f1.read().split('\n')
> >
> > my_vals = intParser(int)
> >
> > intParser return a list
> > f2 = open('myvalues.txt','w')
> > for line in my_vals:
> >     f2.write(line)
> >     f2.write('\n')
> >
> > f2.close()
> >
> > The problem with this approach is that I have to give one file per run.
> > I have 50 files to parse and I want to do that in one go. I kept those
> > 50 files in one directory. Can anyone suggest an approach to automate
> > this process?
> >
> > I tried to use f1 = stdin(...) and it did not work. I don't know; possibly
> > I am using incorrect syntax.
> >
> > Any suggestions?
> >
> > Thank you.
> > K
Re: [Tutor] TypeError: can only concatenate list (not "str") to list
> nmr = nmrows[i]
> pbr = cols[0]
> print nmrow[i] +'\t'+cols[0]

    nmr = str(nmrows[i])
    pbr = cols[0]
    print nmr + '\t' + pbr

will print what you want.

k

--- Srinivas Iyyer <[EMAIL PROTECTED]> wrote:

> Hello group,
> I am trying to print rows from two lists together.
>
> How can I deal with the TypeError when I have to print a list and a string?
>
> for line in pb:  # tab delim text with 12 columns
>     cols = line.split('\t')
>     temp_seq = cols[7].split('\n')  # extract 7thcol
>     seq = temp_seq[0].split(',')    # splitting it by ,
>     for nm in seq:
>         for i in range(len(nmrows)):
>             if nm == nmrows[i][0] and nmrows[i][3] < cols[4] and nmrows[i][4] > cols[5]:
>                 nmr = nmrows[i]
>                 pbr = cols[0]
>                 print nmrow[i] +'\t'+cols[0]
>
> I tried the following also:
>
> I created an empty list outside the for loop and tried to
> extend it with the elements of the list and the string:
>
> nmr = nmrows[i]
> pbr = cols[0]
> result.extend(nmr+'\t'+pbr)
>
> # result is the list I created. nmr is a list, and pbr is a string.
>
> Can anyone please help?
>
> thanks
> Srini
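A minimal sketch of another way to avoid the TypeError, written for Python 3 rather than the Python 2 of the thread. Instead of converting the whole list with str(), which prints the brackets and quotes too, it joins the list's fields and the extra string into one tab-separated line. The function name format_row and the sample values are hypothetical and are not from the original messages.

```python
def format_row(fields, label, sep="\t"):
    """Join the items of a list plus one extra string into a single line.

    Every item is passed through str() first, so numbers in the list
    do not raise a TypeError when joined.
    """
    return sep.join([str(item) for item in fields] + [label])


# Hypothetical stand-ins for nmrows[i] and cols[0] from the thread.
nmr = ["NM_003750", "chr1", "+", 100, 200]
pbr = "probe_1"

print(format_row(nmr, pbr))
```

If the lines should be collected rather than printed, result.append(format_row(nmr, pbr)) adds one string per row. result.extend(some_string) would instead add each character of the string as a separate element.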
[Tutor] Append function
Hello:

With append, instead of adding items one below the other, can I add them next to each other?

I have a bunch of files in which the first column is always the same. I want to collect all those files, extract the second column from each file, and write the first column followed by the extracted columns next to each other.

Any tricks, tips, or hints?

thanks
K
Re: [Tutor] Append function
Hi Danny:

I have ~50 files in this format:

File 1:

    680:209  3006.3
    266:123  250.5
    62:393   117.3
    547:429  161.5
    341:311  546.5
    132:419  163.3
    98:471   306.3

File 2:

    266:123  168.0
    62:393   119.3
    547:429  131.0
    341:311  162.3
    132:419  149.5
    98:471   85.0
    289:215  207.0
    75:553   517.0

I am generating these files using this module:

    f1 = open("test2_cor.txt","r")
    ana = f1.read().split('\n')
    ana = ana[:-1]
    pbs = []
    for line in ana:
        cols = line.split('\t')
        pb = cols[0]
        pbs.append(pb)

    ##CEL Files section
    files = glob.glob("c:\files\*.cel")

    def parSer(file):
        f1 = open(file,'r')
        celf = f1.read().split('\n')
        celfile = celf[24:409624]
        my_vals = celParser(celfile,pbs)
        f2 = open(file+'.txt','w')
        for line in my_vals:
            f2.write(line+'\t')
            f2.write('\n')
        f2.close()

    def main():
        for each in files:
            parSer(each)

    main()

Because I asked it to write an output file named after each input file, it generates 50 output files for 50 input files.

What I am interested in is appending the output to one single, tab-delimited file. For example, for each file there are 2 columns, cor and val:

    file 1      file 2      file 3      file 4
    cor  val    cor  val    cor  val    cor  val
    x:x  1345   x:x  5434   x:x  4454   x:x  4462
    x:y  3463   x:y  3435   x:y  3435   x:y  3435

Could you suggest a way?

Thank you.
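A rough sketch of one way to build a single table from several two-column files, written for Python 3. It assumes a slightly different layout from the one asked for above: one shared coordinate column followed by one value column per file. It keys the values by coordinate because the two sample files do not list the same coordinates. The names read_pairs and merge_tables, the "NA" placeholder, and the in-memory sample files are all hypothetical.

```python
import io


def read_pairs(handle):
    """Read 'coordinate value' lines into a dict of coordinate -> value."""
    pairs = {}
    for line in handle:
        parts = line.split()
        if len(parts) == 2:
            pairs[parts[0]] = parts[1]
    return pairs


def merge_tables(named_tables, missing="NA"):
    """Return rows: a header, then one row per coordinate with one value per file."""
    names = list(named_tables)
    coords = []
    seen = set()
    # Keep coordinates in the order in which they are first encountered.
    for name in names:
        for coord in named_tables[name]:
            if coord not in seen:
                seen.add(coord)
                coords.append(coord)
    rows = [["cor"] + names]
    for coord in coords:
        rows.append([coord] + [named_tables[n].get(coord, missing) for n in names])
    return rows


# In-memory stand-ins for two of the ~50 files (hypothetical excerpt).
file1 = io.StringIO("680:209\t3006.3\n266:123\t250.5\n62:393\t117.3\n")
file2 = io.StringIO("266:123\t168.0\n62:393\t119.3\n289:215\t207.0\n")

tables = {"file1": read_pairs(file1), "file2": read_pairs(file2)}
rows = merge_tables(tables)
for row in rows:
    print("\t".join(row))
```

With real files, each io.StringIO object would be replaced by an opened file, and the rows would be written to a single output file instead of printed.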
Re: [Tutor] Append function
Hi Kent,

Thank you for your suggestion. I keep getting "IOError: Permission denied" every time I try the tips that you provided. I tried to look up this error and did not find a reasonable answer. Does this error have something to do with the Windows OS?

Any suggestions?

Thank you
K

    >>> allColumns = [readColumns("C:\Documents and Settings\myfiles")for filePath in file_list]

    Traceback (most recent call last):
      File "", line 1, in -toplevel-
        allColumns = [readColumns("C:\Documents and Settings\myfiles")for filePath in file_list]
      File "", line 2, in readColumns
        rows = [line.split() for line in open(filePath)]
    IOError: [Errno 13] Permission denied: 'C:\\Documents and Settings\\myfiles'
    >>>

> def readColumns(filePath):
>     rows = [ line.split() for line in open(filePath) ]
>     return zip(*rows)
>
> # list of all the files to read
> allFiles = [ 'f1.txt', 'f2.txt' ]
>
> # both columns from all files
> allColumns = [ readColumns(filePath) for filePath in allFiles ]
>
> # just the second column from all files
> allSecondColumns = [ cols[1] for cols in allColumns ]
>
> # a representative first column
> col1 = allColumns[0][0]
>
> # zip it up into rows
> allRows = zip(col1, *allSecondColumns)
>
> for row in allRows:
>     print '\t'.join(row)
>
> Kent
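A likely cause of the error above: the call passes the directory itself to readColumns() on every iteration and never uses the loop variable filePath, and trying to open() a directory on Windows is reported as "Permission denied". The sketch below, written for Python 3, builds the file list with glob and passes each file's path instead. The "*.txt" pattern, the function names, and the temporary demo files are assumptions for illustration only.

```python
import glob
import os
import tempfile


def read_columns(file_path):
    """Return the columns of a whitespace-separated file as tuples."""
    with open(file_path) as handle:
        rows = [line.split() for line in handle if line.strip()]
    return list(zip(*rows))


def second_columns_side_by_side(directory, pattern="*.txt"):
    """First column of the first file, then the second column of every file."""
    file_list = sorted(glob.glob(os.path.join(directory, pattern)))
    if not file_list:
        return []
    # Pass each file path, not the directory, to read_columns().
    all_columns = [read_columns(path) for path in file_list]
    first_column = all_columns[0][0]
    second_columns = [cols[1] for cols in all_columns]
    return list(zip(first_column, *second_columns))


# Demo with two small temporary files (hypothetical content).
with tempfile.TemporaryDirectory() as demo_dir:
    for name, body in [("f1.txt", "1:1 10\n1:2 20\n"), ("f2.txt", "1:1 30\n1:2 40\n")]:
        with open(os.path.join(demo_dir, name), "w") as out:
            out.write(body)
    table = second_columns_side_by_side(demo_dir)

for row in table:
    print("\t".join(row))
```

When a Windows path is written as a string literal, a raw string such as r"C:\files" is the safer form, because sequences like "\f" or "\t" are otherwise interpreted as escape characters.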
[Tutor] please help formating
Hi group,

I have data obtained from another student (over 100K lines) that looks like this:

    (39577484, 39577692) [['NM_003750']]
    (107906, 108011) [['NM_002443']]
    (113426, 113750) [['NM_138634', 'NM_002443']]
    (106886, 106991) [['NM_138634', 'NM_002443']]
    (100708, 100742) [['NM_138634', 'NM_002443']]
    (35055935, 35056061) [['NM_002313', 'NM_001003407', 'NM_001003408']]

I know that the first two items in () are a tuple, and the following [[]] is a list of lists. I was told that the tuples were keys and the lists were their values in a dictionary.

How can I parse this into a neat structure that looks like this:

    39577484, 39577692 \t NM_003750
    107906, 108011 \t NM_002443
    113426, 113750 \t NM_138634,NM_002443
    106886, 106991 \t NM_138634,NM_002443
    100708, 100742 \t NM_138634,NM_002443
    35055935, 35056061 \t NM_002313,NM_001003407,NM_001003408

I tried substituting in the vim editor, but it is not effective.

Thank you
kum
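One possible approach, sketched for Python 3: because each line is made of two Python literals (a tuple, then a list of lists), ast.literal_eval from the standard library can parse both halves safely. The sketch assumes every line has exactly the shape shown in the sample, with ") " separating the tuple from the list, and it reads " \t " in the desired output as a single tab. The function name reformat_line is hypothetical.

```python
import ast


def reformat_line(line):
    """Turn "(a, b) [['X', 'Y']]" into "a, b<TAB>X,Y"."""
    # Split at the first ") " so the tuple and the list can be parsed separately.
    key_text, _sep, value_text = line.strip().partition(") ")
    start, end = ast.literal_eval(key_text + ")")
    nested = ast.literal_eval(value_text)
    # Flatten the list of lists into one list of names.
    names = [name for group in nested for name in group]
    return "%d, %d\t%s" % (start, end, ",".join(names))


sample = [
    "(39577484, 39577692) [['NM_003750']]",
    "(113426, 113750) [['NM_138634', 'NM_002443']]",
]
for text in sample:
    print(reformat_line(text))
```

For the full file, the same function could be applied to each line while iterating over the open file, writing each result to an output file.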
[Tutor] dealing with nested list values in a dictionary
Dear group,

Unfortunately my previous post got tagged as 'homework' mail and got no responses.

In short, I have a dictionary structure as depicted below. I want to go over every key and print the key, value pairs in a more sensible way.

I have written a small piece of code. May I request the tutors to go through it and comment on whether it is correct or prone to bugs?

Thank you.
kum

    >>> md = {(21597133, 21597325): [['NM_032457']],
              (21399193, 21399334): [['NM_032456'], ['NM_002589']],
              (21397395, 21399192): [['NM_032457'], ['NM_032456'], ['NM_002589']],
              (21407733, 21408196): [['NM_002589']],
              (21401577, 21402315): [['NM_032456']],
              (21819453, 21820111): [['NM_032457']],
              (21399335, 21401576): [['NM_032457'], ['NM_032456'], ['NM_002589']]}

    >>> for item in md.keys():
            mlst = []
            for frnd in md[item]:
                for srnd in frnd:
                    mlst.append(srnd)
            mystr = ','.join(mlst)
            print(('%d\t%d\t%s')%(item[0],item[1],mystr))

    21597133    21597325    NM_032457
    21399193    21399334    NM_032456,NM_002589
    21397395    21399192    NM_032457,NM_032456,NM_002589
    21407733    21408196    NM_002589
    21401577    21402315    NM_032456
    21819453    21820111    NM_032457
    21399335    21401576    NM_032457,NM_032456,NM_002589
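The loop in the message above produces the intended output. As a hedged alternative sketch for Python 3, itertools.chain.from_iterable can flatten the nested lists without the two inner loops, and sorting the keys makes the output order predictable. The function name format_entries is hypothetical, and the dictionary below is only an excerpt of the one posted.

```python
from itertools import chain


def format_entries(mapping):
    """Return 'start<TAB>end<TAB>name1,name2' lines, sorted by coordinates."""
    lines = []
    for start, end in sorted(mapping):
        # Flatten the list of lists, e.g. [['A'], ['B']] into 'A', 'B'.
        names = chain.from_iterable(mapping[(start, end)])
        lines.append("%d\t%d\t%s" % (start, end, ",".join(names)))
    return lines


md = {
    (21597133, 21597325): [["NM_032457"]],
    (21399193, 21399334): [["NM_032456"], ["NM_002589"]],
}
for entry in format_entries(md):
    print(entry)
```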
[Tutor] looping problem
Hi,

The reason could be that I did not quite understand the concept of looping.

I have a list of 48 elements. I want to create two more lists, listA and listB. I want to loop through the list of 48 elements and:

select the elements with index 0,3,6,9,12, etc. into listA
select the elements with index 2,5,8,11, etc. into listB.

Could anyone show me how I can do that?

Thank you
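A short sketch of one way to do this without an explicit counter, using extended slicing (start::step) in Python 3. The 48-element list here is a hypothetical placeholder, and every_third is an illustrative name.

```python
def every_third(items, offset):
    """Return items[offset], items[offset + 3], items[offset + 6], ..."""
    return items[offset::3]


biglist = list(range(48))        # placeholder for the real 48 elements

listA = every_third(biglist, 0)  # indexes 0, 3, 6, 9, ...
listB = every_third(biglist, 2)  # indexes 2, 5, 8, 11, ...

print(listA[:5])
print(listB[:5])
```

The same result can be had with a loop over range(0, len(biglist), 3), which is closer to the "keep a counter" idea.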
Re: [Tutor] looping problem
Hi,

Thank you. This is not a homework question.

I have a very large file of FASTA sequences:

    > GeneName \t AATTAAGGAA.. (1000 lines) AATAAGGA
    >GeneName \t GGAGAGAGATTAAGAA (15000 lines)

When I read this as:

    f2 = open('myfile','r')
    dat = f2.read().split('\n')

it turned out to be a very expensive deal for the computer. Instead I tried this:

    dat = f2.read()

(Reading the jumbo file of 19,100,442,1342 lines is easy, but getting at what I want is a problem.)

I want to create a dictionary with 'GeneName' as the key and the sequence of ATGC characters as the value.

    biglist = dat.split('\t')
    ['GeneName ','','ATTAAGGCCAA'...]

Now I want to select 'GeneName ' into listA and 'ATTAAGGCCAA' into listB, so I want to select elements 0,3,6,9 into listA and elements 2,5,8,11 and so on into listB. Then I can do:

    dict(zip(listA,listB))

However, the very concept of loops goes blank in my brain when I want to do this:

    for j in range(len(biglist)):

From here I cannot think of anything. Maybe it is just a mental block. That is the reason I am seeking help on the forum.

Thanks

--- jim stockford <[EMAIL PROTECTED]> wrote:

> keep a counter in your loop. is this a homework question?
>
> On Sep 23, 2006, at 8:34 AM, kumar s wrote:
>
> > hi,
> >
> > the reason could be that I did not quite understand
> > the concept of looping
> >
> > I have a list of 48 elements
> >
> > I want to create another two lists , listA and listB
> >
> > I want to loop through the list with 48 elements and
> >
> > select element with index 0,3,6,9,12 ..etc into listA
> >
> > select elements with index 2,5,8,11 etc into listB.
> >
> > Could any one help me how can I do that
> >
> > thankyou
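A hedged sketch, in Python 3, of reading such a file one line at a time rather than loading it all with read(). It assumes the conventional FASTA layout, which may differ from the tab-separated layout hinted at in the message: each record starts with a header line beginning with ">", followed by one or more sequence lines. The function name read_fasta and the sample records are hypothetical.

```python
import io


def read_fasta(handle):
    """Build {name: sequence} from FASTA-style text, one line at a time."""
    sequences = {}
    name = None
    chunks = []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header: store the record collected so far.
            if name is not None:
                sequences[name] = "".join(chunks)
            name = line[1:].strip()
            chunks = []
        elif name is not None:
            chunks.append(line)
    # Store the last record, which no later header closes.
    if name is not None:
        sequences[name] = "".join(chunks)
    return sequences


# In-memory stand-in for the real file (hypothetical records).
demo = io.StringIO(">geneA\nAATTAAGG\nAATAAGGA\n>geneB\nGGAGAGAG\n")
genes = read_fasta(demo)
print(genes)
```

Iterating over the file object keeps only the current line and the collected sequences in memory, which avoids the large intermediate list produced by read().split('\n').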
[Tutor] extracting numbers from a list
Hi:

I have a simple question to ask the tutors.

list A:

    a = [10,15,18,20,25,30,40]

I want to print:

    10 15    (first two elements)
    16 18    (16 is last number + 1)
    19 20
    21 25
    26 30
    31 40

    >>> fx = a[0]
    >>> fy = a[1]
    >>> b = a[2:]
    >>> ai = iter(b)
    >>> last = ai.next()
    >>> for j in ai:
    ...     print fy+1,last
    ...     last = j
    ...
    16 18
    16 20
    16 25
    16 30

Can anyone help, please?

Thank you
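In the session above, fy is never updated inside the loop, so every printed line starts with 16. A minimal Python 3 sketch of one fix is to pair each number with the one after it and add 1 to the earlier number of each pair. The function name spans_from_list is hypothetical.

```python
def spans_from_list(numbers):
    """Return (start, end) pairs: the first two numbers, then (previous + 1, current)."""
    if len(numbers) < 2:
        raise ValueError("need at least two numbers")
    spans = [(numbers[0], numbers[1])]
    # zip pairs element 1 with element 2, element 2 with element 3, and so on.
    for previous, current in zip(numbers[1:], numbers[2:]):
        spans.append((previous + 1, current))
    return spans


a = [10, 15, 18, 20, 25, 30, 40]
for start, end in spans_from_list(a):
    print(start, end)
```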
Re: [Tutor] extracting numbers from a list
In continuation to: Re: [Tutor] extracting numbers from a list

Hello list,

I have coordinates for exons (chunks of sequence). For instance:

    10 - 50    A
    10 - 20    B
    35 - 50    B
    60 - 70    A
    60 - 70    B
    80 - 100   A
    80 - 100   B

(The coordinates and names above are simpler than my real data.)

My aim here is to create chunks of exons specific to A or B. For instance, 10 - 20 and 35 - 50 are common to both A and B, whereas 21 - 34 is specific to A only.

The desired output for me is:

    10 \t 20    A,B
    21 \t 34    A
    35 \t 50    A,B
    60 \t 70    A,B
    80 \t 100   A,B

I just learned Python from a friend, and he is also a novice.

What I could get is the breakup into chunks. The problem is that I am getting numbers different from what I need:

    [10, 20] [10, 50]
    [21, 35] [10, 50]
    [36, 50] [10, 50]
    [60, 70] [60, 70]
    [80, 100] [80, 100]

The list next to each chunk is the pair (the longer block) it came from.

Could anyone help me correct [21, 35], [36, 50] to 21 \t 34, 35 \t 50? I tried changing the indexes in the function chunker, but it is not working for me. Also, how can I point chunks to their names?

This is an abstract example of the complex numbers and their sequence names. I want to get the simple code right and then move to the complex one.

Thank you very much for your valuable time.

My code:

    from sets import Set

    dat = ['10\t50\tA', '10\t20\tB', '35\t50\tB', '60\t70\tA',
           '60\t70\tB', '80\t100\tA', '80\t100\tB']

    # creating a dictionary with coordinates as key and NM_ as value
    ekda = {}
    for j in dat:
        cols = j.split('\t')
        ekda.setdefault(cols[0]+'\t'+cols[1],[]).append(cols[2])

    # getting tab delim numbers only and not the A,B
    bat = []
    for j in dat:
        cols = j.split('\t')
        bat.append(cols[0]+'\t'+cols[1])
    pairs = [ map(int, x.split('\t')) for x in bat ]

    # this function takes pairs (from the above result) and yields longer blocks (exons).
    # For instance:
    # 10 - 20; 14 - 25; 19 - 30; 40 - 50; 45 - 60; 70 - 80
    # a = [[10,20],[14,25],[19,30],[40,50],[45,60],[70,80]]
    # for j in exoner(a):
    #     print j
    # The result would be:
    # 10 - 30; 40 - 60; 70 - 80
    def exoner(pairs):
        pairs.sort()
        i = iter(pairs)
        last = i.next()
        for current in i:
            if current[0] in xrange(last[0],last[1]):
                if current[1] > last[1]:
                    last = [last[0], current[1]]
                else:
                    last = [last[0],last[1]]
            else:
                yield last
                last = current
        yield last

    lon = exoner(pairs)

    ## Here I am getting all the unique numbers in dat
    nums = []
    for j in pairs:
        for k in j:
            nums.append(k)
    unm = Set(nums)
    unums = []
    for x in unm:
        unums.append(x)
    unums.sort()

    ### This function takes a list of numbers and breaks it in pieces
    ## For instance [10,15,20,25,30]
    #>>> i = [10,15,20,25,30]
    #>>> chunker(i)
    #[[10, 15], [16, 20], [21, 25], [26, 30]]
    def chunker(lis):
        res = []
        res.append([lis[0],lis[1]])
        for m in range(2,len(lis)):
            res.append([lis[m-1]+1,lis[m]])
        return res

    # Here I take each pair (longer block) and roll over all the unique
    # numbers (unums, from dat) and check if that number is in the range
    # of the pair; if so, I break all those numbers in the pair's range
    # into small blocks
    gdic = {}
    unums.sort()
    for pair in exoner(pairs):
        x = pair[0]
        y = pair[1]+1
        sml = []
        for k in unums:
            if k in range(x,y):
                sml.append(k)
            else:
                pass
        for j in chunker(sml):
            print j,pair
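A sketch of a different technique from the exoner/chunker code above, offered under stated assumptions and written for Python 3: treat every start, and every end + 1, as a breakpoint, cut the number line at those breakpoints, and label each resulting segment with the names of the intervals that contain it. Interval ends are assumed to be inclusive, as in the example. The function name label_segments is hypothetical.

```python
def label_segments(intervals):
    """intervals: list of (start, end, name) with inclusive ends.

    Returns a list of (start, end, [names]) segments, skipping any gap
    that no interval covers.
    """
    breakpoints = set()
    for start, end, _name in intervals:
        breakpoints.add(start)
        breakpoints.add(end + 1)   # first position after the interval
    ordered = sorted(breakpoints)

    segments = []
    for left, right in zip(ordered, ordered[1:]):
        seg_start, seg_end = left, right - 1
        names = sorted({name for start, end, name in intervals
                        if start <= seg_start and seg_end <= end})
        if names:
            segments.append((seg_start, seg_end, names))
    return segments


dat = ['10\t50\tA', '10\t20\tB', '35\t50\tB', '60\t70\tA',
       '60\t70\tB', '80\t100\tA', '80\t100\tB']
intervals = []
for line in dat:
    start, end, name = line.split('\t')
    intervals.append((int(start), int(end), name))

result = label_segments(intervals)
for seg_start, seg_end, names in result:
    print("%d\t%d\t%s" % (seg_start, seg_end, ",".join(names)))
```

Because every interval contributes both of its edges as breakpoints, each segment lies either entirely inside or entirely outside any given interval, so the containment test is enough to decide which names apply.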
Re: [Tutor] extracting numbers from a list
Thank you, Danny.

I am going over your email and trying to understand it (I am a biologist with bioinformatics training).

I am not sure I understood your opinion of the way I solved it. Do you mean that there is something wrong with the way I solved it?

I am also not sure whether I explained the problem correctly in terms of exons and transcripts. If not, I would be happy to send you a PDF file with a figure.

Thanks again.

--- Danny Yoo <[EMAIL PROTECTED]> wrote:

> On Mon, 16 Oct 2006, kumar s wrote:
>
> > I have a simple question to ask tutors:
> >
> > list A :
> >
> > a = [10,15,18,20,25,30,40]
>
> Hi Kumar,
>
> If you're concerned about correctness, I'd recommend that you try thinking
> about the problem inductively. An inductive definition for what you're
> asking is straightforward to state in about three or four lines of code.
> I'll try to go through it slowly so you see what the reasoning behind it
> is. The code sketch above uses a technique that you should already know
> called "mathematical induction."
>
>     http://en.wikipedia.org/wiki/Mathematical_induction
>
> Let's say we're designing a function called getSpans(). Here are some
> sample behavior we'd like from it:
>
>     getSpans([10, 15]) = [(10, 15)]
>     getSpans([10, 15, 18]) = [(10, 15), (16, 18)]
>     getSpans([10, 15, 18, 20]) = [(10, 15), (16, 18), (19, 20)]
>
> Would you agree that this is reasonable output for a function like this?
> getSpans() takes a list of numbers, and returns a list of pairs of
> numbers.
>
> There is one "base" case to this problem. The smallest list we'd like to
> consider is a list of two elements. If we see that, then we're happy,
> because the answer is really simple:
>
>     getSpans([a, b]) = [(a, b)]
>
> Otherwise, let's imagine a list that's a bit longer, with three elements.
> Concretely, we know that this is going to look like:
>
>     getSpans([a, b, c]) = [(a, b), (b+1, c)]
>
> But another way to say this, though is that:
>
>     getSpans([a, b, c]) = [(a, b)] + getSpans([b+1, c])
>
> That is, we try to restate the problem in terms of smaller subproblems.
>
> Let's look at what the case for four elements might look like:
>
>     getSpans([a, b, c, d]) = [(a, b), (b+1, c), (c+1, d)]
>
> Concretely, we know that that's the list of spans we'd like to see. But
> if we think about it, we might also restate this as:
>
>     getSpans([a, b, c, d]) = [(a, b)] + getSpans([b+1, c, d])
>
> because getSpans([b+1, c, d]) is going to give us:
>
>     [(b+1, c), (c+1, d)]
>
> All we need to do is add on [(a, b)] to that to get the complete answer to
> getSpans([a, b, c, d]).
>
> Generally, for any particular list L that's longer than two elements:
>
>     getSpans(L) = [L[0:2]] + getSpans([L[1] + 1] + L[2:])
>
> When we work inductively, all we really need to think about is "base case"
> and "inductive case": the solution will often just fall through from
> stating those two cases. An inductively-designed function is going to
> look something like:
>
>     def solve(input):
>         if input looks like a base-case:
>             handle that directly in a base-case way
>         else:
>             break up the problem into smaller pieces
>             that we assume can be solve()d by induction
>
> The inductive definition above is slightly inefficient because we're doing
> physical list slicing. Rewriting it to use loops and list indices
> instead of slicing is a little harder, but not much harder.
>
> Another example: how do we add up a list of numbers? If there's just one
> number, that must be the sum. Otherwise, we can add up the first number
> to the sum of the rest of the numbers.
>
>     #
>     def mysum(L):
>         if len(L) == 1:
>             return L[0]
>         else:
>             return L[0] + mysum(L[1:])
>     #
>
> It's a funky way of doing this, but this is a real definition that works
> (modulo limits in Python's recursion implementation). It's inefficient,
> but it's easy to state and reason about. I'm assuming you're more
> interested in correctness than efficiency at the moment. Get it correct
> first, then if you really need to, work to get it fast.
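The inductive definition in the quoted reply translates almost directly into code. The following Python 3 sketch is one possible reading of it, not necessarily the version Danny had in mind: the base case handles a two-element list, and the inductive case peels off the first span and recurses on the rest. The snake_case name get_spans and the ValueError for short lists are choices made here for illustration.

```python
def get_spans(numbers):
    """Recursive spans: [(a, b)] + get_spans([b + 1] + rest)."""
    if len(numbers) < 2:
        raise ValueError("need at least two numbers")
    first_span = (numbers[0], numbers[1])
    if len(numbers) == 2:
        # Base case: a two-element list is a single span.
        return [first_span]
    # Inductive case: restate the problem on a shorter list that
    # starts one past the end of the span just produced.
    return [first_span] + get_spans([numbers[1] + 1] + numbers[2:])


print(get_spans([10, 15, 18, 20]))
```

As the reply notes, slicing copies the list at each step and very long inputs can hit Python's recursion limit, so a loop-based version is preferable for large data.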
[Tutor] count numbers only in the string
Hi:

I have some alphanumeric strings. I want to add up all the numbers in each string and ignore the letters and special characters.

    1A0G19
    5G0C25^C52
    0G2T3T91
    44^C70

How can I sum only the numbers in the strings above?

    1 A 0 G 19           = 1+0+19    = 20
    5 G 0 C 25 ^C 52     = 5+0+25+52 = 82
    0 G 2 T 3 T 91       = 0+2+3+91  = 96
    44 ^C 70             = 44+70     = 114

In the first string, 1A0G19, I am only adding 1, 0, and 19. I am not splitting 19 to add 1+9, which would give a totally wrong answer for me.

Is there a way I can do this?

Thanks for your advice.

kumar

___ Tutor maillist - Tutor@python.org To unsubscribe or change subscription options: http://mail.python.org/mailman/listinfo/tutor
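One way this might be done, sketched in Python 3: re.findall with the pattern \d+ returns each maximal run of digits as one string, so 19 stays 19 instead of being split into 1 and 9. The function name sum_numbers is hypothetical.

```python
import re


def sum_numbers(text):
    """Add up every run of consecutive digits found in text."""
    return sum(int(run) for run in re.findall(r"\d+", text))


for item in ["1A0G19", "5G0C25^C52", "0G2T3T91", "44^C70"]:
    print(item, sum_numbers(item))
```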
[Tutor] substitute using re.sub
Hi group,

I am trying to substitute in the following way, and I cannot. Could you point out what is wrong in what I am doing?

    >>> z
    '.|D'
    >>> re.sub(z,'1',z)
    '111'

I want only '1' and not '111'. I want:

    >>> re.sub(z,'1',z)
    '1'

re.sub is inserting repeatedly, 3 times, because z is '.|D'. How can I substitute only once, to get '1'?

Thanks
Kumar
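The likely explanation is that z is being used as a regular expression pattern, in which '.' matches any single character and '|' means "or", so each of the three characters in the string matches and is replaced separately. A minimal Python 3 sketch of two ways to treat z as literal text instead: re.escape for the pattern, or plain str.replace with no regular expression at all.

```python
import re

z = ".|D"

# As a pattern, '.' matches each character in turn: three replacements.
as_pattern = re.sub(z, "1", z)

# re.escape() turns z into a pattern that matches only the literal text '.|D'.
as_literal = re.sub(re.escape(z), "1", z)

# For fixed text, str.replace() avoids regular expressions entirely.
with_replace = z.replace(z, "1")

print(as_pattern, as_literal, with_replace)
```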