Re: [Tutor] how to calculate execution time and complexity
Hi Praveen,

I am still new to the language, but here is what I would do. Sorry, I can't comment on how best to check for efficiency.

my_str = 'google'
split_by = 2
[my_str[i:i+split_by] for i in range(0, len(my_str), split_by)]

Just using a list comprehension.

best,
-Abhi

On Thu, Oct 27, 2011 at 10:38 PM, Praveen Singh wrote:
> >>> splitWord('google', 2)
> ['go', 'og', 'le']
>
> >>> splitWord('google', 3)
> ['goo', 'gle']
>
> >>> splitWord('apple', 1)
> ['a', 'p', 'p', 'l', 'e']
>
> >>> splitWord('apple', 4)
> ['appl', 'e']
>
> def splitWord(word, number):
>     length = len(word)
>     list1 = []
>     x = 0
>     increment = number
>     while number <= length + increment:
>         list1.append(word[x:number])
>         x = x + increment
>         number = number + increment
>     for d in list1:
>         if d == '':
>             list1.remove('')
>     return list1
>
> I am getting the desired output and this code is working fine, but I
> think it is quite bulky for this small operation.
>
> Question 1: Can you suggest a better solution?
> Question 2: I know writing just a piece of code is not going to help me;
> I have to write efficient code. I want to know how to calculate the
> execution time of my code, and can you suggest some links so that I can
> learn how to find the complexity of code?
>
> Thanks in advance...
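On the execution-time question, a minimal sketch using the standard library's timeit module (the wrapper function and call count below are illustrative, not from the thread):

import timeit

def split_word(word, n):
    return [word[i:i+n] for i in range(0, len(word), n)]

# time 100,000 calls of the function under test
elapsed = timeit.timeit(lambda: split_word('google', 2), number=100000)
print 'total: %.4f s, per call: %.2f us' % (elapsed, elapsed / 100000 * 1e6)

For complexity, the usual starting point is counting how the number of basic operations grows with input size; here the comprehension touches each character of the input once, so it is O(len(word)).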
[Tutor] ignoring certain lines while reading through CSV
Hi Guys,

I am wondering if there is a keyword to ignore certain lines (for example, lines starting with #) when I am reading them through the standard library module csv.

Example code:

input_file = sys.argv[1]
csv.register_dialect('multiplex_info', delimiter=' ')

with open(input_file, 'rb') as fh:
    reader = csv.reader(fh, 'multiplex_info')
    for row in reader:
        print row

best,
-Abhi
Re: [Tutor] ignoring certain lines while reading through CSV
Hi Joel,

Here is a sample:

['1', 'AAA', '4344', '0.001505']  : want to keep this one
['#', 'AAA', '4344', '0.001505']  : and throw this one away

You are right, I am checking after parsing. I didn't find an option in csv.reader to ignore lines.

-Abhi

On Fri, Jan 27, 2012 at 2:42 PM, Joel Goldstick wrote:
> On Fri, Jan 27, 2012 at 5:13 PM, Abhishek Pratap wrote:
>> [...]
>
> You could look up the docs for csv.reader, but if there isn't such an
> option, in your for loop you can use row[0].startswith('#') to check
> whether your line starts with #.
> Can you show what the row looks like?
>
> --
> Joel Goldstick
Re: [Tutor] ignoring certain lines while reading through CSV
Thanks Joel. That's exactly what I am doing.

-A

On Fri, Jan 27, 2012 at 3:04 PM, Joel Goldstick wrote:
> On Fri, Jan 27, 2012 at 5:48 PM, Abhishek Pratap wrote:
>> [...]
>
> OK, so you are getting single quotes around your data. So do
> row[0].startswith("#") to test your row. You may be able to test for
> row[0] == "#" if you always get only the # in the first position of
> the row.
>
> --
> Joel Goldstick
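A minimal sketch of the other approach discussed here, filtering the comment lines before they reach the parser: csv.reader accepts any iterable of lines, so a generator expression over the file handle works (dialect name reused from the thread):

import csv
import sys

input_file = sys.argv[1]
csv.register_dialect('multiplex_info', delimiter=' ')

with open(input_file, 'rb') as fh:
    # drop raw lines starting with '#' before csv ever sees them
    lines = (line for line in fh if not line.startswith('#'))
    for row in csv.reader(lines, 'multiplex_info'):
        print row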
[Tutor] creating dict of dict : similar to perl hash of hash
Hi Guys,

I am looking for a way to build dictionaries of dicts in Python. For example, in Perl I could do:

my $hash_ref = {};
$hash->{$a}->{$b}->{$c} = "value";
if (exists $hash->{$a}->{$b}->{$c}) { print "found value" }

Can I do something similar with dictionaries in Python?

Thanks,
-Abhi
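A minimal sketch of the two usual Python answers: explicit nesting with setdefault, or a recursive collections.defaultdict for Perl-style autovivification:

from collections import defaultdict

# explicit nesting with plain dicts
d = {}
d.setdefault('a', {}).setdefault('b', {})['c'] = 'value'
if 'c' in d.get('a', {}).get('b', {}):
    print 'found value'

# Perl-like autovivification: missing keys spring into existence
def tree():
    return defaultdict(tree)

t = tree()
t['a']['b']['c'] = 'value'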
[Tutor] inserting new lines in long strings while printing
I have this one big string in Python which I want to print to a file, inserting a newline after every 100 characters. Is there a slick way to do this without looping over the string? I am pretty sure there should be something; it's just that I am new to the language.

Thanks!
-Abhi
Re: [Tutor] inserting new lines in long strings while printing
thanks guys ..

-Abhi

On Tue, Mar 6, 2012 at 5:41 PM, Steven D'Aprano wrote:
> On Tue, Mar 06, 2012 at 05:26:26PM -0800, Abhishek Pratap wrote:
> > I have this one big string in python which I want to print to a file
> > inserting a new line after each 100 characters. [...]
>
> >>> s = "a" * 100
> >>> print '\n'.join(s[i:i+10] for i in range(0, len(s), 10))
> aaaaaaaaaa
> aaaaaaaaaa
> aaaaaaaaaa
> aaaaaaaaaa
> aaaaaaaaaa
> aaaaaaaaaa
> aaaaaaaaaa
> aaaaaaaaaa
> aaaaaaaaaa
> aaaaaaaaaa
>
> --
> Steven
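The standard library's textwrap module covers the same need; a minimal sketch (note that textwrap treats its input as prose and prefers to break at whitespace, so for raw character data the slicing idiom above is the safer choice):

import textwrap

s = 'a' * 100
print '\n'.join(textwrap.wrap(s, 10))  # ten lines of ten characters each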
[Tutor] feedback on writing pipelines in python
Hi Guys,

I am in the process of a Perl-to-Python transition, for good. I wanted to get some feedback, or maybe best practices, for the following:

1. Stitching pipelines: I want Python to act as glue, allowing me to run
   various Linux shell-based programs, wait for a program to finish before
   moving on if needed, and log if required.
2. Running the same pipeline on a local grid if required (mainly the SGE
   flavor).

Any modules which can reduce the number of lines I write will be helpful.

Thanks!
-Abhi
Re: [Tutor] feedback on writing pipelines in python
Hi Steve,

I agree, Perl is perfectly fine for the stuff I described, but I am also interested in trying alternatives. I am seeing quite interesting data-handling work coming up in Python and I would like to try it. As a programmer I sometimes don't like having so many ways of doing the same thing, but that is subjective; having many options can be good for some.

-Abhi

On Wed, Mar 21, 2012 at 11:20 AM, Steve Willoughby wrote:
> On 21-Mar-12 11:03, Abhishek Pratap wrote:
>> Hi Guys
>>
>> I am in the process of perl to python transition for good. [...]
>
> Why? Perl is still a perfectly good tool. Just not, IMHO, good for
> exactly the same things Python is good for.
>
>> 1. stitch pipelines : I want python to act as a glue allowing me to run
>> various linux shell based programs. If needed wait for a program to
>> finish and then move on, logs if required
>
> Look at the subprocess standard library module. It offers a complete set
> of options for launching processes, piping their data around, waiting for
> them, handling exceptions, and so forth.
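A minimal sketch of the glue pattern Steve points at, using only standard-library subprocess calls (the commands themselves are illustrative):

import subprocess

# run a shell program and block until it finishes; raises CalledProcessError
# on a non-zero exit status
subprocess.check_call(['sort', 'input.txt', '-o', 'sorted.txt'])

# stitch two programs together, shell-style: zcat input.gz | wc -l
zcat = subprocess.Popen(['zcat', 'input.gz'], stdout=subprocess.PIPE)
wc = subprocess.Popen(['wc', '-l'], stdin=zcat.stdout, stdout=subprocess.PIPE)
zcat.stdout.close()  # so zcat sees SIGPIPE if wc exits early
out, _ = wc.communicate()
print out.strip()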
[Tutor] weird error in my python program : merge sort
I am implementing a merge sort algorithm for clarity purposes, but my program is giving me weird answers. Sometimes it is able to sort and other times it does funky things. Help appreciated.

from random import *
from numpy import *

nums = [random.randint(100) for num in range(4)]
#nums = [3,7,2,10]

def merge_sort(nums, message='None'):
    #print "%s : num of elements in the list %d" % (message, len(nums))
    print '[merge_sort] %s : %s' % (message, nums)

    if len(nums) <= 1:
        return nums

    middle = len(nums)/2
    print '[merge_sort] Mid point is %d' % middle
    left = nums[:middle]
    right = nums[middle:]

    merge_sort(left, 'left')
    merge_sort(right, 'right')
    print '[merge_sort] Calling merge on left: %s right : %s' % (left, right)
    result = merge(left, right)
    print '[merge_sort] %s' % result
    return result


def merge(left, right):
    result = []
    i, j = 0, 0

    print '[merge] left %s, right %s' % (left, right)

    while i < len(left) and j < len(right):
        print '[merge] Comparing left %d to right %d' % (left[i], right[j])
        if left[i] <= right[j]:
            result.append(left[i])
            i += 1
        else:
            result.append(right[j])
            j += 1

    print '[merge] pushing to result', result

    result.extend(left[i:])
    result.extend(right[j:])
    print '[merge] return', result
    return result


merge_sort(nums, 'start')
Re: [Tutor] weird error in my python program : merge sort : resolved
Resolved: I was not updating the list from the recursive calls. Instead of

    merge_sort(left, 'left')
    merge_sort(right, 'right')

it should be

    left = merge_sort(left, 'left')
    right = merge_sort(right, 'right')

-Abhi

On Thu, Mar 22, 2012 at 1:40 PM, Abhishek Pratap wrote:
> I am implementing a merge sort algo for clarity purposes but my
> program is giving me weird answers. Sometimes it is able to sort and
> other times it does funky things. Help appreciated.
>
> [snip: full code as in the original post]
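For reference, a minimal sketch of the whole algorithm with that fix applied and the diagnostics stripped out:

def merge_sort(nums):
    if len(nums) <= 1:
        return nums
    middle = len(nums) // 2
    # keep the sorted halves returned by the recursive calls
    left = merge_sort(nums[:middle])
    right = merge_sort(nums[middle:])
    return merge(left, right)

def merge(left, right):
    result = []
    i, j = 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            result.append(left[i])
            i += 1
        else:
            result.append(right[j])
            j += 1
    result.extend(left[i:])   # one side may still have leftovers
    result.extend(right[j:])
    return result

print merge_sort([3, 7, 2, 10])  # [2, 3, 7, 10]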
[Tutor] concurrent file reading using python
Hi Guys,

I want to utilize the power of the cores on my server and read big files (> 50 GB) simultaneously by seeking to N locations, processing each separate chunk, and merging the output. Very similar to the MapReduce concept.

What I want to know is the best way to read a file concurrently. I have read about file-handle.seek() and os.lseek(), but am not sure if that's the way to go. Any use cases would be of help.

PS: I did find some links on Stack Overflow, but it was not clear to me whether I had found the right solution.

Thanks!
-Abhi
Re: [Tutor] concurrent file reading using python
Thanks Walter and Steven for the insight. I guess I will post my question to the main Python mailing list and see if people have anything to say.

-Abhi

On Mon, Mar 26, 2012 at 3:28 PM, Walter Prins wrote:
> Abhi,
>
> On 26 March 2012 19:05, Abhishek Pratap wrote:
>> I want to utilize the power of cores on my server and read big files
>> (> 50Gb) simultaneously by seeking to N locations. [...]
>
> Your idea won't work. Reading from disk is not a CPU-bound process, it's
> an I/O-bound process. Meaning, the speed at which you can read from a
> conventional mechanical hard disk drive is not constrained by how fast
> your CPU is, but generally by how fast your disk(s) can read data from
> the disk surface, which is limited by the rotation speed and areal
> density of the data on the disk (and the seek time), and by how fast it
> can shovel the data down its I/O bus. And *that* speed is still orders
> of magnitude slower than your RAM and your CPU. So, in reality even just
> one of your cores will spend the vast majority of its time waiting for
> the disk when reading your 50GB file.
>
> There is therefore *no* way to make your file reading faster by
> increasing your CPU cores -- the only way is by improving your disk I/O
> throughput. You can, for example, stripe several hard disks together in
> RAID0 (but that increases the risk of data loss due to data being spread
> over multiple drives), and/or ensure you use a faster I/O subsystem
> (move to SATA3 if you're currently using SATA2, for example), and/or use
> faster hard disks (10,000 or 15,000 RPM instead of 7,200, or switch to
> SSD [solid state] disks). Most of these options will cost you a fair bit
> of money though, so consider these thoughts in that light.
>
> Walter
Re: [Tutor] generators
Hey Mike,

The following link should help you: http://www.dabeaz.com/generators/ . It is a cool slide deck with examples, from David Beazley's explanation of generators.

-A

On Tue, Apr 3, 2012 at 11:38 AM, mike jackson wrote:
> I am trying to understand Python and have done fairly well. So far it
> has been easy to learn and is concise. However, I seem to not quite
> understand the use of a generator over a function (I am familiar with
> functions, from other languages and from math). To me (excepting obvious
> syntax differences) a generator is a function. Why should I use a
> generator instead of a function, or vice versa? Are there perhaps
> specific uses it was created to handle? A great web page with good
> examples would be nice. Of course, if you can sum it up rather easily,
> then by all means go ahead.
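A minimal sketch of the core difference: a function hands back its whole result at once, while a generator keeps its local state and produces one value per request:

def squares_list(n):
    # ordinary function: the entire list exists in memory before returning
    return [i * i for i in range(n)]

def squares_gen(n):
    # generator: execution pauses at yield and resumes on the next request
    for i in range(n):
        yield i * i

total = 0
for sq in squares_gen(10**7):  # no ten-million-element list is ever built
    total += sq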
[Tutor] updating step size while in loop
hey guys,

I want to know whether it is possible to dynamically update the step size in xrange, or to do the same some other slick way. Here is what I am trying to do: if during a loop I find x in the list, I want to skip the next n iterations.

for x in xrange(start, stop, step):
    if x in list:
        step = 14
    else:
        step = 1

Thanks!
-Abhi
Re: [Tutor] updating step size while in loop
OK, thanks Hugo. I have the while loop working.

-A

On Mon, Jul 9, 2012 at 3:06 PM, Hugo Arts wrote:
> On Mon, Jul 9, 2012 at 11:59 PM, Abhishek Pratap wrote:
>> [...]
>
> It is not possible with a range object. You'll have to make a while loop
> and keep track of the step yourself.
>
> Hugo
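A minimal sketch of that while loop, mirroring the original for loop (the trigger values are illustrative):

triggers = set([10, 50])   # hypothetical values that cause a skip
start, stop = 0, 100

x = start
while x < stop:
    # ... process x here ...
    if x in triggers:
        x += 14   # found one: jump ahead, skipping the next iterations
    else:
        x += 1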
Re: [Tutor] Problem When Iterating Over Large Test Files
Hi Ryan,

One quick comment. I didn't get through all your code to figure out the fine details, but my hunch is you might be having issues related to Unix-vs-DOS end-of-line characters. Could you check that the total number of lines in your fastq is the same as the count from a simple Python file iterator? If not, then readline is most likely reading more than one line at a time.

-Abhi

On Wed, Jul 18, 2012 at 4:33 PM, Ryan Waples wrote:
> I'm seeing some unexpected output when I use a script (included at end)
> to iterate over large text files. I am unsure of the source of the
> unexpected output and any help would be much appreciated.
>
> Background:
> Python v2.7.1
> Windows 7, 32-bit
> Reading and writing to an external USB hard drive
>
> Data files are ~4GB text (.fastq) files that have been uncompressed
> (gzip). The file has no errors or formatting problems; it seems to have
> uncompressed just fine. 64M lines, each 'entry' split across 4
> consecutive lines, 16M entries.
>
> My python script iterates over the data files 4 lines at a time, and
> selects and writes groups of four lines to the output file. I will end
> up selecting roughly 85% of the entries.
>
> In my output I am seeing lines that don't occur in the original file,
> and that don't match any lines in the original file. The incidences of
> badly formatted lines don't seem to match up with any patterns in the
> data file, and occur across multiple different data files.
>
> I've included 20 consecutive lines of input and output. Each of these 5
> 'records' should have been selected and printed to the output file. But
> there is a problem with the 4th and 5th entries in the output, and it no
> longer matches the input as expected. For example the line:
> TTCTGTGAGTGATTTCCTGCAAGACAGGAATGTCAGT
> never occurs in the original data.
>
> Sorry for the large block of text below. Other pertinent info: I've
> tried a related perl script, and ran into similar issues, but not in the
> same places.
>
> Any help or insight would be appreciated.
>
> Thanks
>
> __EXAMPLE RAW DATA FILE REGION__
>
> @HWI-ST0747:167:B02DEACXX:8:1101:3182:167088 1:N:0:
> CGCGTGTGCAGGTTTATAGAACCAGCTGCAGATTAGTAGCAGCGCACGGAGAGGTGTGTCTGTTTATTGTCCTCAGCAGGCAGACATGTTTGTGGTC
> +
> @@@DDADDHB9+2A;6(5@CDAC(5(5:5,(8?88?BC@#
> @HWI-ST0747:167:B02DEACXX:8:1101:3134:167090 1:N:0:
> TTCTAGTGCAGGGCGACAGCGTTGCGGAGCCGGTCCGAGTCTGCTGGGTCAGTCATGGCTAGTTGGTACTATAACGACACAGGGCGAGACCCAGATGCAAA
> +
> @CCFFFDFHJJIJHHIIIJHGHIJI@GFFDDDFDDCEEEDCCBDCCCCCB>>@C(4@ADCA>>?BBBDDABB055<>-?A
> @HWI-ST0747:167:B02DEACXX:8:1101:3002:167092 1:N:0:
> CTTTGCTGCAGGCTCATCCTGACATGACCCTCCAGCATGACAATGCCACCAGCCATACTGCTCGTTCTGTGTGTGATTTCCAGCAAGTAAATATGTA
> +
> CCCFHIJIEHIH@AHFAGHIGIIGGEIJGIJIIIGIIIGEHGEHIIJIEHH@FHGH@=ACEHHFBFFCE@AACCA>AD>BA
> @HWI-ST0747:167:B02DEACXX:8:1101:3022:167094 1:N:0:
> ATTCCGTGCAGGCCAACTCCCGACGGACATCCTTGCTCAGACTGCAGCGATAGTGGTCGATCAGGGCCCTGTTGTTCCATCCCACTCCGGCGACCAGGTTC
> +
> CCCFHIDHJIIHIIIJIJIIGGIIFHJIIIIEIFHFF>CBAECBDDDC:??B=AAACD?8@:>C@?8CBDDD@D99B@>3884>A
> @HWI-ST0747:167:B02DEACXX:8:1101:3095:167100 1:N:0:
> CGTGATTGCAGGGACGTTACAGAGACGTTACAGGGATGTTACAGGGACGTTACAGAGACGTTAAAGAGATGTTACAGGGATGTTACAGACAGAGACGTTAC
> +
>
> __EXAMPLE PROBLEMATIC OUTPUT FILE REGION__
>
> [first three records appear exactly as in the raw input above, then:]
>
> TTCTGTGAGTGATTTCCTGCAAGACAGGAATGTCAGT
> +
> BCCFFDFFFIJIJJHIFGGGGIGGIJIJIGIGIGIGHHIGIIJGJJJIIJIIEHIHHHFFFB@>CCE@BEDCDDAC?CC?ACC??>ADDD
> @HWI-ST0747:167:B02DEACXX:8:1304:19473:44548 1:N:0:
> CTACAGTGCAGGCACCCGGCCCGCCACAATGAGTCGCTAGAGCGCAATGAGACAAGTAAAGCTGACCAAACCCTTAACCCGGACGATGCTGGG
> +
> BCCFHIJEHJJIIGIJIGIJIDHDGIGIGGED@CCDDC>C>BBD?BDBAABDDD@BCD@?@BDBDDDBDCCC2
>
> __PYTHON CODE__
>
> import glob
>
> my_in_files = glob.glob('E:/PINK/Paired_End/raw/gzip/*.fastq')
>
> for each in my_in_files:
>     #print(each)
>     out = each.replace('/gzip', '/rem_clusters2')
>     #print (out)
>     INFILE = open(each, 'r')
>     OUTFILE = open(out, 'w')
>
>     # Tracking Variables
>     Reads = 0
>     Writes = 0
>     Check_For_End_Of_File = 0
>
>     # Updates
>     print ("Reading File: " + each)
>     print ("Writing File: " + out)
>
>     # Read FASTQ File by group of four lines
>     while Check [...]
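A minimal sketch of the sanity check suggested above, counting lines with a plain iterator; opening with mode 'rU' makes Python 2 normalize DOS \r\n line endings (the path is illustrative):

count = sum(1 for line in open('E:/PINK/Paired_End/raw/gzip/sample.fastq', 'rU'))
print count, 'lines ->', count / 4.0, 'records (should be a whole number)'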
[Tutor] *large* python dictionary with persistence storage for quick look-ups
Hey Guys,

I have asked a question on Stack Overflow and thought I would post it here as well, as it has some learning flavor attached to it... at least I feel that way.

http://stackoverflow.com/questions/11837229/large-python-dictionary-with-persistence-storage-for-quick-look-ups

-Abhi
[Tutor] using multiprocessing efficiently to process large data file
Hi Guys,

I have a file with a few million lines. I want to process each block of 8 lines, and from my estimate my job is not I/O bound. In other words, it takes a lot more time to do the computation than it would take to simply read the file.

I am wondering how I can read data from this file at a faster pace and then farm out the jobs to worker functions using the multiprocessing module.

I can think of two ways:

1. Split the file and read the pieces in parallel (didn't work well for
   me), primarily because I don't know how to read a file in parallel
   efficiently.
2. Keep reading the file sequentially into a buffer of some size and farm
   out chunks of the data through multiprocessing.

Any example would be of great help.

Thanks!
-Abhi
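A minimal sketch of option 2: read sequentially, slice off 8-line blocks, and stream them to a worker pool (process_block is a hypothetical stand-in for the real computation, and the file name is illustrative):

import multiprocessing
from itertools import islice

def process_block(lines):
    # hypothetical placeholder for the expensive per-block computation
    return len(''.join(lines))

def read_blocks(path, block_size=8):
    with open(path) as fh:
        while True:
            block = list(islice(fh, block_size))
            if not block:
                break
            yield block

if __name__ == '__main__':
    pool = multiprocessing.Pool()
    # imap streams blocks to the workers instead of loading the whole file
    for result in pool.imap(process_block, read_blocks('big_file.txt'),
                            chunksize=100):
        pass  # merge the per-block results here
    pool.close()
    pool.join()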
[Tutor] managing memory large dictionaries in python
Hi Guys,

For my problem I need to store 400-800 million 20-character keys in a dictionary and do counting. This data structure takes about 60-100 GB of RAM.

I am wondering if there are slick ways to map the dictionary to a file on disk, not store it in memory, but still access it as a dictionary object. Speed is not the main concern in this problem, and persistence is not needed, as the counting will only be done once on the data. We want the script to run on smaller-memory machines if possible.

I did think about databases for this, but intuitively it looks like overkill, because for each key you have to first check whether it is already present and increase the count by 1, and if not, insert the key into the database.

Just want to get your opinion on this.

Thanks!
-Abhi
Re: [Tutor] managing memory large dictionaries in python
On Tue, Oct 16, 2012 at 7:22 PM, Alexander wrote:
> On Tue, Oct 16, 2012 at 20:43 EST, Mark Lawrence wrote:
>> For the record Access is not a database, or so some geezer called Alex
>> Martelli reckons http://code.activestate.com/lists/python-list/48130/,
>> so please don't shoot the messenger :)
>> Cheers,
>> Mark Lawrence.
>
> Mark, I don't believe your response is relevant or helpful to the
> original post, so please don't hijack.

Thanks guys.. I think I will try shelve and sqlite. I don't think creating millions of files (one for each key) will make the sysadmins/file system happy.

Best,
-Abhi
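A minimal sketch of the shelve route, which keeps the dict-like counting loop but stores the data on disk (the key stream and file name are illustrative; sqlite3 would use the same check-then-increment logic):

import shelve

counts = shelve.open('key_counts.db')    # dict-like object backed by a file
for key in ('AAA', 'BBB', 'AAA'):        # stand-in for the 20-character keys
    counts[key] = counts.get(key, 0) + 1
print counts['AAA']                      # -> 2
counts.close()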
[Tutor] learning to program in cython
Hi Guys,

With the help of an awesome Python community I have been able to pick up the language, and I am now willing to explore other cool extensions of it. I routinely have large loops which could be ported to Cython for speed. However, I have never written a single line of Cython code. Any pointers on getting started? A tutorial, text, or video would be of great help.

Thanks!
-Abhi
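As a taste of what a first port looks like, a minimal sketch (file and function names are illustrative): a .pyx file where the loop variables get C type declarations, plus the standard setup.py to compile it.

# fastloop.pyx -- the cdef declarations let Cython run the loop at C speed
def total_below(int n):
    cdef long total = 0
    cdef int i
    for i in range(n):
        total += i
    return total

# setup.py would contain:
#   from distutils.core import setup
#   from Cython.Build import cythonize
#   setup(ext_modules=cythonize('fastloop.pyx'))
# build with: python setup.py build_ext --inplace
# then, from Python: import fastloop; fastloop.total_below(10**8)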
[Tutor] increment a counter inside generator
Hey Guys,

I might be missing something obvious here.

import numpy as np

count = 0
[ count += 1 for num in np.random.random_integers(1,100,20) if num > 20]

  File "<stdin>", line 2
    [ count += 1 for num in np.random.random_integers(1,100,20) if num > 20]
              ^
SyntaxError: invalid syntax

Also tried [...]
Re: [Tutor] increment a counter inside generator
On Wed, Mar 13, 2013 at 2:02 PM, Oscar Benjamin wrote:
> On 13 March 2013 19:50, Abhishek Pratap wrote:
>> [...]
>
> I think this does what you want:
>
> >>> import numpy as np
> >>> a = np.random.random_integers(1, 100, 20)
> >>> (a > 20).sum()
> 17
>
> I don't know if this really applies to what you're doing, but the result
> of this computation is a binomially distributed random number that you
> could generate directly (without creating the intermediate array):
>
> >>> np.random.binomial(100, .2)
> 26
>
> Oscar

Hi Oscar,

I just used a very contrived example to ask whether we can increment a counter inside a generator. The real case is more specific, dependent on other code, and not necessarily useful for the question.

-Abhi
Re: [Tutor] increment a counter inside generator
On Wed, Mar 13, 2013 at 2:08 PM, Dave Angel wrote:
> On 03/13/2013 03:50 PM, Abhishek Pratap wrote:
>> [...]
>
> I can't help with the numpy portion of that, but that's not the correct
> syntax for a list comprehension. The first item must be an expression,
> and count += 1 is NOT.
>
> You probably want (untested)
>     count = sum([ 1 for num in ..])
> which will add a bunch of ones. That will probably give you a count of
> how many of the random integers are > 20.
>
> There may also very well be a function in numpy that would do it in one
> step. See Oscar's message.
>
> --
> DaveA

Thanks Dave. That probably is the reason why I am getting the error.

-Abhi
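Making Dave's suggestion concrete: the counting belongs in the expression itself, and a generator expression avoids building any intermediate list:

import numpy as np

nums = np.random.random_integers(1, 100, 20)
count = sum(1 for num in nums if num > 20)  # each match contributes a 1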
[Tutor] help with itertools.izip_longest
Hey Guys,

I am trying to use itertools.izip_longest to read a large file in chunks, based on the examples I was able to find on the web. However, I am not able to understand the behaviour of the following Python code (a contrived example):

for x in itertools.izip_longest(*[iter([1,2,3])]*2):
    print x

### output:
(1, 2)
(3, None)

It gives me the right answer, but I am not sure how it is doing it. I also referred to the itertools doc but could not comprehend much. In essence, I am trying to understand the intricacies of the following documentation from the itertools package:

"The left-to-right evaluation order of the iterables is guaranteed. This makes possible an idiom for clustering a data series into n-length groups using izip(*[iter(s)]*n)."

How is *n able to group the data, and what is the meaning of the '*' at the beginning, just after izip?

Thanks!
-Abhi
Re: [Tutor] help with itertools.izip_longest
On Sat, Mar 16, 2013 at 2:32 PM, Oscar Benjamin wrote:
> On 16 March 2013 21:14, Abhishek Pratap wrote:
>> [...]
>
> The '*n' part is to multiply the list so that it repeats. This works for
> most sequence types in Python:
>
> >>> a = [1,2,3]
> >>> a * 2
> [1, 2, 3, 1, 2, 3]
>
> In this particular case we multiply a list containing only one item, the
> iterator over s. This means that the new list contains the same element
> twice:
>
> >>> it = iter(a)
> >>> [it]
> [<listiterator object at 0x...>]
> >>> [it] * 2
> [<listiterator object at 0x...>, <listiterator object at 0x...>]
>
> So if every element of the list is the same iterator, then we can call
> next() on any of them to get the same values in the same order:
>
> >>> d = [it] * 2
> >>> d
> [<listiterator object at 0x...>, <listiterator object at 0x...>]
> >>> next(d[1])
> 1
> >>> next(d[0])
> 2
> >>> next(d[0])
> 3
> >>> next(d[0])
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> StopIteration
> >>> next(d[1])
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> StopIteration
>
> The * just after izip is for argument unpacking. This allows you to call
> a function with arguments unpacked from a list:
>
> >>> def f(x, y):
> ...     print('x is %s' % x)
> ...     print('y is %s' % y)
> ...
> >>> f(1, 2)
> x is 1
> y is 2
> >>> args = [1,2]
> >>> f(args)
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> TypeError: f() takes exactly 2 arguments (1 given)
> >>> f(*args)
> x is 1
> y is 2
>
> So the original expression, izip(*[iter(s)]*2), is another way of
> writing:
>
> it = iter(s)
> izip(it, it)
>
> And izip(*[iter(s)]*10) is equivalent to:
>
> izip(it, it, it, it, it, it, it, it, it, it)
>
> Obviously writing it out like this will get a bit unwieldy if we want to
> do izip(*[iter(s)]*100), so the preferred method is izip(*[iter(s)]*n),
> which also allows us to choose what value to give for n without changing
> anything else in the code.
>
> Oscar

Thanks a bunch, Oscar. This is why I love this community. It is absolutely clear now. It is funny that I am getting the solution over the mailing list while I am at PyCon :)

best,
-Abhi
Re: [Tutor] help with itertools.izip_longest
On Sat, Mar 16, 2013 at 2:53 PM, Peter Otten <__pete...@web.de> wrote:
> Abhishek Pratap wrote:
>> [...]
>
> Break the expression into smaller chunks:
>
> items = [1, 2, 3]
> it = iter(items)
> args = [it] * 2  # same as [it, it]
> chunks = itertools.izip_longest(*args)  # same as izip_longest(it, it)
>
> As a consequence of passing the same iterator twice, getting the first
> item from the "first" iterator will advance the "second" iterator (which
> is actually the same as the first iterator) to the second item, which
> will in turn advance the "first" iterator to the third item. Try to
> understand the implementation given for izip() at
> http://docs.python.org/2/library/itertools.html#itertools.izip
> before you proceed to izip_longest().

Thanks Peter. I guess I missed the trick of how each iterator is advanced automatically, as they are basically the same iterator replicated n times.

-Abhi
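For completeness, the itertools documentation wraps this idiom into the reusable "grouper" recipe, which is the usual way to read a stream in fixed-size chunks:

from itertools import izip_longest

def grouper(iterable, n, fillvalue=None):
    "grouper('ABCDEFG', 3, 'x') --> ABC DEF Gxx"
    args = [iter(iterable)] * n
    return izip_longest(fillvalue=fillvalue, *args)

for chunk in grouper([1, 2, 3], 2):
    print chunk   # (1, 2) then (3, None)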
Re: [Tutor] Python debugger/IDE that can be launched from a remote command line
On Fri, May 10, 2013 at 10:58 AM, Michael O'Leary wrote:
> I am working on a project in which the code and data I am working with
> are all on an Amazon EC2 machine. So far I have been ssh'ing to the EC2
> machine in two terminal windows, running emacs or vi in one of them to
> view and update the code, and running the "python -m pdb ..." debugger
> in the other one to step through the code.
>
> I would prefer to work with an IDE that displays and updates program
> state automatically, but I don't know which ones I could launch from a
> remote machine and have display within a terminal window, or use X
> Windows or GTK to display in their own window. Are there any Python
> debuggers or IDEs that can be used in this kind of setting?
> Thanks,
> Mike

I think IPython could be useful here. Kick-start an IPython notebook server on the Amazon machine and open it over HTTPS locally. More information on this is here:
http://ipython.org/ipython-doc/dev/interactive/htmlnotebook.html

-Abhi