Re: [Tutor] how to calculate execution time and complexity

2011-10-27 Thread Abhishek Pratap
Hi Praveen

I am still new to the language, but here is what I would do. Sorry, I
can't comment on how best to check for efficiency.

my_str = 'google'
split_by = 2
[my_str[i:i + split_by] for i in range(0, len(my_str), split_by)]

Just using a list comprehension.
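
For timing, the standard library's timeit module is the usual tool. A
minimal sketch (split_word here is just the comprehension above wrapped
in a function; the call count is arbitrary):

import timeit

def split_word(my_str, split_by):
    return [my_str[i:i + split_by] for i in range(0, len(my_str), split_by)]

# total seconds for 100000 calls; lower is faster
print timeit.timeit(lambda: split_word('google', 2), number=100000)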

best,
-Abhi


On Thu, Oct 27, 2011 at 10:38 PM, Praveen Singh wrote:

> >>> splitWord('google', 2)
> ['go', 'og', 'le']
>
>
> >>> splitWord('google', 3)
> ['goo', 'gle']
>
>
> >>> splitWord('apple', 1)
> ['a', 'p', 'p', 'l', 'e']
>
>
> >>> splitWord('apple', 4)
> ['appl', 'e']
>
>
>
> def splitWord(word, number):
>     length = len(word)
>     list1 = []
>     x = 0
>     increment = number
>     while number <= length + increment:
>         list1.append(word[x:number])
>         x = x + increment
>         number = number + increment
>     for d in list1:
>         if d == '':
>             list1.remove('')
>     return list1
>
> I am getting the desired output and this code is working fine, but I think
> it is quite bulky for such a small operation.
>
>
> Question 1: can you suggest a better solution?
> Question 2: I know writing just a piece of code is not going to help me; I
> have to write efficient code. I want to know how to calculate the execution
> time of my code, and can you suggest some links so that I can learn how to
> find the complexity of code?
>
> Thanks in advance...
>
>
>
>


[Tutor] ignoring certain lines while reading through CSV

2012-01-27 Thread Abhishek Pratap
Hi Guys

I am wondering if there is a keyword to ignore certain lines (e.g.
lines starting with #) when I am reading them through the standard
library csv module.

Example code:

import csv
import sys

input_file = sys.argv[1]
csv.register_dialect('multiplex_info', delimiter=' ')

with open(input_file, 'rb') as fh:
    reader = csv.reader(fh, 'multiplex_info')
    for row in reader:
        print row


best,
-Abhi


Re: [Tutor] ignoring certain lines while reading through CSV

2012-01-27 Thread Abhishek Pratap
Hi Joel

Here is a sample

['1', 'AAA', '4344', '0.001505'] : want to keep this one

['#', 'AAA', '4344', '0.001505'] : and throw this one


You are right, I am checking after parsing. I didn't find an option in
csv.reader to ignore lines.
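
The closest I could come up with is to filter the file object with a
generator expression before handing it to csv.reader, since csv.reader
accepts any iterable of lines. A sketch, reusing the dialect and
input_file from my earlier mail:

import csv
import sys

input_file = sys.argv[1]
csv.register_dialect('multiplex_info', delimiter=' ')

with open(input_file, 'rb') as fh:
    filtered = (line for line in fh if not line.startswith('#'))
    for row in csv.reader(filtered, 'multiplex_info'):
        print row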

-Abhi





On Fri, Jan 27, 2012 at 2:42 PM, Joel Goldstick
 wrote:
> On Fri, Jan 27, 2012 at 5:13 PM, Abhishek Pratap  
> wrote:
>> Hi Guys
>>
>> I am wondering if there is a keyword to ignore certain lines ( for eg
>> lines starting with # ) when I am reading them through stl module csv.
>>
>> Example code:
>>
>> input_file = sys.argv[1]
>> csv.register_dialect('multiplex_info',delimiter=' ')
>>
>> with open(input_file, 'rb') as fh:
>>    reader= csv.reader(fh,'multiplex_info')
>>    for row in reader:
>>        print row
>>
>>
>> best,
>> -Abhi
>
> You could look up the docs for csv.reader, but if there isn't an option,
> in your for loop you can use row[0].startswith('#') to check whether your
> line starts with #.
> Can you show what the row looks like?
>
> --
> Joel Goldstick


Re: [Tutor] ignoring certain lines while reading through CSV

2012-01-27 Thread Abhishek Pratap
Thanks Joel. That's exactly what I am doing.

-A

On Fri, Jan 27, 2012 at 3:04 PM, Joel Goldstick
 wrote:
> On Fri, Jan 27, 2012 at 5:48 PM, Abhishek Pratap  
> wrote:
>> Hi Joel
>>
>> Here is a sample
>>
>> ['1', 'AAA', '4344', '0.001505'] : want to keep this one
>>
>> ['#', 'AAA', '4344', '0.001505'] : and throw this one
>
> Ok, so you are getting single quotes around your data.  So do
> row[0].startswith("#") to test your row.
> You may be able to test for row[0]=="#" if you always get only the #
> in the first position of the row.
> --
> Joel Goldstick


[Tutor] creating dict of dict : similar to perl hash of hash

2012-03-06 Thread Abhishek Pratap
Hi Guys

I am looking for a way to build dictionaries of dict in python.

For example in perl I could do

my $hash_ref = {};
$hash_ref->{$a}->{$b}->{$c} = "value";
if (exists $hash_ref->{$a}->{$b}->{$c}) { print "found value"; }

Can I do something similar with dictionaries in Python?
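
The closest recipe I could find is an autovivifying dict built from
collections.defaultdict. Would something like this sketch be the
idiomatic way? (Note that, as with Perl hashes, merely looking up a
missing key creates it.)

from collections import defaultdict

def tree():
    # every missing key grows a new nested level
    return defaultdict(tree)

h = tree()
h['a']['b']['c'] = 'value'

if 'c' in h['a']['b']:
    print 'found value'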


Thanks
-Abhi


[Tutor] inserting new lines in long strings while printing

2012-03-06 Thread Abhishek Pratap
I have this one big string in Python which I want to print to a file,
inserting a new line after each 100 characters. Is there a slick way to
do this without looping over the string? I am pretty sure there should
be something; it's just that I am new to the language.


Thanks!
-Abhi


Re: [Tutor] inserting new lines in long strings while printing

2012-03-06 Thread Abhishek Pratap
thanks guys ..
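
For anyone finding this thread later: the standard library's textwrap
module also does this in one call (note it normalizes internal
whitespace by default). A sketch:

import textwrap

s = 'a' * 250
# wrap() returns a list of chunks, each at most 100 characters long
print '\n'.join(textwrap.wrap(s, 100))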


-Abhi

On Tue, Mar 6, 2012 at 5:41 PM, Steven D'Aprano  wrote:

> On Tue, Mar 06, 2012 at 05:26:26PM -0800, Abhishek Pratap wrote:
> > I have this one big string in python which I want to print to a file
> > inserting a new line after each 100 characters. Is there a slick way to
> do
> > this without looping over the string.  I am pretty sure there should be
> > something; it's just that I am new to the language.
>
> >>> s = "a"*100
> >>> print '\n'.join(s[i:i+10] for i in range(0, len(s), 10))
> aaaaaaaaaa
> aaaaaaaaaa
> aaaaaaaaaa
> aaaaaaaaaa
> aaaaaaaaaa
> aaaaaaaaaa
> aaaaaaaaaa
> aaaaaaaaaa
> aaaaaaaaaa
> aaaaaaaaaa
>
>
> --
> Steven


[Tutor] feedback on writing pipelines in python

2012-03-21 Thread Abhishek Pratap
Hi Guys

I am in the process of a Perl to Python transition for good. I wanted
to get some feedback, or maybe best practices, for the following.

1. stitching pipelines: I want Python to act as glue, letting me run
various Linux shell-based programs, waiting for a program to finish
before moving on if needed, with logging if required.

2. running the same pipeline on a local grid if required (mainly the
SGE flavor).

Any modules which can reduce the number of lines I write would be
helpful.


Thanks!
-Abhi


Re: [Tutor] feedback on writing pipelines in python

2012-03-21 Thread Abhishek Pratap
Hi Steve

I agree, Steve, that Perl is perfectly fine for the stuff I described,
but I am also interested in trying alternatives. I am seeing quite
interesting data handling work coming up in Python and I would like to
try it. As a programmer I sometimes don't like having so many ways of
doing the same thing, but that is subjective; having many options can
be good for some.
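
To make (1) concrete, here is a minimal sketch of the kind of stitching
I mean, using the subprocess module Steve points to below (the command
names and file names are just placeholders):

import subprocess

# run one step and wait for it; raises CalledProcessError on failure
subprocess.check_call(['sort', 'input.txt', '-o', 'sorted.txt'])

# pipe one program into the next, capturing the final stdout
p1 = subprocess.Popen(['cat', 'sorted.txt'], stdout=subprocess.PIPE)
p2 = subprocess.Popen(['uniq', '-c'], stdin=p1.stdout, stdout=subprocess.PIPE)
p1.stdout.close()  # so p1 gets SIGPIPE if p2 exits early
output = p2.communicate()[0]
print output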


-Abhi



On Wed, Mar 21, 2012 at 11:20 AM, Steve Willoughby wrote:

> On 21-Mar-12 11:03, Abhishek Pratap wrote:
>
>> Hi Guys
>>
>> I am in the process of a Perl to Python transition for good. I wanted to
>>
>
> Why?  Perl is still a perfectly good tool.  Just not, IMHO, good for
> exactly the same things Python is good for.
>
>
>  1. stitch pipelines : I want python to act as a glue allowing me to run
>> various linux shell based programs. If needed wait for a program to
>> finish and then move on, logs if required
>>
>
> Look at the subprocess standard library module.  It offers a complete set
> of options for launching processes, piping their data around, waiting for
> them, handling exceptions, and so forth.
>
>


[Tutor] weird error in my python program : merge sort

2012-03-22 Thread Abhishek Pratap
I am implementing a merge sort algorithm for clarity purposes, but my
program is giving me weird answers. Sometimes it is able to sort and
other times it does funky things. Help appreciated.


from random import *
from numpy import *

nums = [random.randint(100) for num in range(4)]
#nums = [3,7,2,10]

def merge_sort(nums, message='None'):
    #print "%s : num of elements in the list %d" % (message,len(nums))
    print '[merge_sort] %s : %s' % ( message, nums)

    if len(nums) <= 1:
        return nums

    middle = len(nums)/2
    print '[merge_sort] Mid point is %d' % middle
    left  = nums[:middle]
    right = nums[middle:]

    merge_sort(left,'left')
    merge_sort(right,'right')
    print '[merge_sort] Calling merge on left: %s right : %s' % (left,right)
    result = merge(left,right)
    print '[merge_sort] %s' % result
    return result


def merge(left,right):
    result = []
    i,j = 0,0

    print '[merge] left %s, right %s' % (left, right)

    while i < len(left) and j < len(right):
        print '[merge]Comparing left %d to right %d' % (left[i],right[j])
        if left[i] <= right[j]:
            result.append(left[i])
            i += 1
        else:
            result.append(right[j])
            j += 1

        print '[merge]pushing to result',result

    result.extend(left[i:])
    result.extend(right[j:])
    print '[merge] return',result
    return result


merge_sort(nums,'start')


Re: [Tutor] weird error in my python program : merge sort : resolved

2012-03-22 Thread Abhishek Pratap
I was not updating the lists from the recursive calls.

>     merge_sort(left,'left')
>     merge_sort(right,'right')

left = merge_sort(left,'left')
right = merge_sort(right,'right')


-Abhi

On Thu, Mar 22, 2012 at 1:40 PM, Abhishek Pratap  wrote:
> I am implementing a merge sort algo for clarity purposes but my
> program is giving me weird answers. Sometimes it is able to sort and
> other times it does funky things. Help appreciated


[Tutor] concurrent file reading using python

2012-03-26 Thread Abhishek Pratap
Hi Guys


I want to utilize the power of cores on my server and read big files
(> 50Gb) simultaneously by seeking to N locations. Process each
separate chunk and merge the output. Very similar to MapReduce
concept.

What I want to know is the best way to read a file concurrently. I
have read about file-handle.seek() and os.lseek(), but I am not sure
that's the way to go. Any example use cases would be of help.
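
To make the question concrete, here is roughly what I am imagining: an
untested sketch with a placeholder file name, where the naive offsets
can split lines, which real code would have to handle:

import os
from multiprocessing import Pool

FILENAME = 'big_file.txt'  # placeholder

def process_chunk(args):
    offset, size = args
    with open(FILENAME, 'rb') as fh:
        fh.seek(offset)
        data = fh.read(size)
    return len(data)  # stand-in for the real per-chunk work

if __name__ == '__main__':
    total = os.path.getsize(FILENAME)
    n = 4
    size = total // n
    chunks = [(i * size, size if i < n - 1 else total - i * size)
              for i in range(n)]
    pool = Pool(n)
    print sum(pool.map(process_chunk, chunks))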

PS: I did find some links on Stack Overflow, but it was not clear to me
whether I found the right solution.


Thanks!
-Abhi


Re: [Tutor] concurrent file reading using python

2012-03-26 Thread Abhishek Pratap
Thanks Walter and Steven for the insight. I guess I will post my
question to the main Python mailing list and see if people have
anything to say.

-Abhi

On Mon, Mar 26, 2012 at 3:28 PM, Walter Prins  wrote:
> Abhi,
>
> On 26 March 2012 19:05, Abhishek Pratap  wrote:
>> I want to utilize the power of cores on my server and read big files
>> (> 50Gb) simultaneously by seeking to N locations. Process each
>> separate chunk and merge the output. Very similar to MapReduce
>> concept.
>>
>> What I want to know is the best way to read a file concurrently. I
>> have read about file-handle.seek() and os.lseek(), but not sure if that's
>> the way to go. Any example use cases would be of help.
>
> Your idea won't work.  Reading from disk is not a CPU-bound process,
> it's an I/O bound process.  Meaning, the speed by which you can read
> from a conventional mechanical hard disk drive is not constrained by
> how fast your CPU is, but generally by how fast your disk(s) can read
> data from the disk surface, which is limited by the rotation speed and
> areal density of the data on the disk (and the seek time), and by how
> fast it can shovel the data down its I/O bus.  And *that* speed is
> still orders of magnitude slower than your RAM and your CPU.  So, in
> reality even just one of your cores will spend the vast majority of
> its time waiting for the disk when reading your 50GB file.  There's
> therefore __no__ way to make your file reading faster by increasing
> your __CPU cores__ -- the only way is by improving your disk I/O
> throughput.  You can for example stripe several hard disks together in
> RAID0 (but that increases the risk of data loss due to data being
> spread over multiple drives) and/or ensure you use a faster I/O
> subsystem (move to SATA3 if you're currently using SATA2 for example),
> and/or use faster hard disks (use 10,000 or 15,000 RPM instead of
> 7,200, or switch to SSD [solid state] disks.)  Most of these options
> will cost you a fair bit of money though, so consider these thoughts
> in that light.
>
> Walter


Re: [Tutor] generators

2012-04-03 Thread Abhishek Pratap
Hey Mike

The following link should help you: http://www.dabeaz.com/generators/
It is a cool slide deck with examples from David Beazley's explanation
of generators.
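
The short version, in case the link moves: a generator computes values
lazily, one at a time, instead of building the whole result in memory.
A tiny sketch:

def squares(n):
    # runs only as far as the caller asks; nothing is stored
    for i in xrange(n):
        yield i * i

for s in squares(5):
    print s  # 0 1 4 9 16, computed on demand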


-A




On Tue, Apr 3, 2012 at 11:38 AM, mike jackson  wrote:
> I am trying to understand Python and have done fairly well; so far it has
> been easy to learn and is concise.  However, I seem to not quite understand
> the use of a generator over a function (I am familiar with functions [other
> languages and math]).  To me (excepting obvious syntax differences) a
> generator is a function.  Why should I use a generator instead of a
> function, or vice versa?  Are there perhaps specific uses it was created to
> handle?  A great web page with good examples would be nice.  Of course, if
> you can sum it up rather easily then by all means go ahead.


[Tutor] updating step size while in loop

2012-07-09 Thread Abhishek Pratap
hey guys

I want to know whether it is possible to dynamically update the step
size in xrange, or to do it some other slick way.

Here is what I am trying to do: if during a loop I find x in a list, I
want to skip the next n iterations.


for x in xrange(start, stop, step):
    if x in list:
        step = 14
    else:
        step = 1



Thanks!
-Abhi


Re: [Tutor] updating step size while in loop

2012-07-09 Thread Abhishek Pratap
Ok thanks Hugo. I have the while loop working
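
For the archive, roughly what that looks like (start, stop, and
skip_list stand in for my real values):

start, stop = 0, 100
skip_list = [10, 40]

x = start
while x < stop:
    if x in skip_list:
        x += 14  # skip the next block of iterations
    else:
        # ... process x here ...
        x += 1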

-A

On Mon, Jul 9, 2012 at 3:06 PM, Hugo Arts  wrote:
> On Mon, Jul 9, 2012 at 11:59 PM, Abhishek Pratap 
> wrote:
>>
>> hey guys
>>
>> I want to know whether it is possible for dynamically update the step
>> size in xrange  or someother slick way.
>>
>> Here is what I am trying to do, if during a loop I find the x in list
>> I want to skip next #n iterations.
>>
>>
>> for x in xrange(start, stop, step):
>>     if x in list:
>>         step = 14
>>     else:
>>         step = 1
>>
>>
>>
>> Thanks!
>> -Abhi
>
>
> It is not possible with a range object. You'll have to make a while loop and
> keep track of the step yourself.
>
> Hugo


Re: [Tutor] Problem When Iterating Over Large Test Files

2012-07-18 Thread Abhishek Pratap
Hi Ryan

One quick comment

I didn't get through all your code to figure out the fine details, but
my hunch is you might be having issues related to DOS vs. Unix line
endings.  Could you check that the total number of lines in your fastq
file is the same as the number read by a simple Python file iterator?
If not, then it is most likely because readline is reading more than
one line at a time.
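
Something along these lines, with a placeholder file name:

count = sum(1 for line in open('reads.fastq'))
print count  # should be exactly 4x the number of entries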


-Abhi

On Wed, Jul 18, 2012 at 4:33 PM, Ryan Waples  wrote:
> I'm seeing some unexpected output when I use a script (included at
> end) to iterate over large text files.  I am unsure of the source of
> the unexpected output and any help would be much appreciated.
>
> Background
> Python v 2.7.1
> Windows 7 32bit
> Reading and writing to an external USB hard drive
>
> Data files are ~4GB text (.fastq) file, it has been uncompressed
> (gzip).  This file has no errors or formatting problems, it seems to
> have uncompressed just fine.  64M lines, each 'entry' is split across
> 4 consecutive lines, 16M entries.
>
> My python script iterates over data files 4 lines at a time, selects
> and writes groups of four lines to the output file.  I will end up
> selecting roughly 85% of the entries.
>
> In my output I am seeing lines that don't occur in the original file,
> and that don't match any lines in the original file.  The incidences
> of badly formatted lines don't seem to match up with any patterns in
> the data file, and occur across multiple different data files.
>
> I've included 20 consecutive lines of input and output.  Each of these
> 5 'records' should have been selected and printed to the output file.
> But there is a problem with the 4th and 5th entries in the output, and
> it no longer matches the input as expected.  For example the line:
> TTCTGTGAGTGATTTCCTGCAAGACAGGAATGTCAGT
> never occurs in the original data.
>
> Sorry for the large block of text below.
> Other pertinent info, I've tried a related perl script, and ran into
> similar issues, but not in the same places.
>
> Any help or insight would be appreciated.
>
> Thanks
>
>
> __EXAMPLE RAW DATA FILE REGION__
>
> @HWI-ST0747:167:B02DEACXX:8:1101:3182:167088 1:N:0:
> CGCGTGTGCAGGTTTATAGAACCAGCTGCAGATTAGTAGCAGCGCACGGAGAGGTGTGTCTGTTTATTGTCCTCAGCAGGCAGACATGTTTGTGGTC
> +
> @@@DDADDHB9+2A;6(5@CDAC(5(5:5,(8?88?BC@#
> @HWI-ST0747:167:B02DEACXX:8:1101:3134:167090 1:N:0:
> TTCTAGTGCAGGGCGACAGCGTTGCGGAGCCGGTCCGAGTCTGCTGGGTCAGTCATGGCTAGTTGGTACTATAACGACACAGGGCGAGACCCAGATGCAAA
> +
> @CCFFFDFHJJIJHHIIIJHGHIJI@GFFDDDFDDCEEEDCCBDCCCCCB>>@C(4@ADCA>>?BBBDDABB055<>-?A
> @HWI-ST0747:167:B02DEACXX:8:1101:3002:167092 1:N:0:
> CTTTGCTGCAGGCTCATCCTGACATGACCCTCCAGCATGACAATGCCACCAGCCATACTGCTCGTTCTGTGTGTGATTTCCAGCAAGTAAATATGTA
> +
> CCCFHIJIEHIH@AHFAGHIGIIGGEIJGIJIIIGIIIGEHGEHIIJIEHH@FHGH@=ACEHHFBFFCE@AACCA>AD>BA
> @HWI-ST0747:167:B02DEACXX:8:1101:3022:167094 1:N:0:
> ATTCCGTGCAGGCCAACTCCCGACGGACATCCTTGCTCAGACTGCAGCGATAGTGGTCGATCAGGGCCCTGTTGTTCCATCCCACTCCGGCGACCAGGTTC
> +
> CCCFHIDHJIIHIIIJIJIIGGIIFHJIIIIEIFHFF>CBAECBDDDC:??B=AAACD?8@:>C@?8CBDDD@D99B@>3884>A
> @HWI-ST0747:167:B02DEACXX:8:1101:3095:167100 1:N:0:
> CGTGATTGCAGGGACGTTACAGAGACGTTACAGGGATGTTACAGGGACGTTACAGAGACGTTAAAGAGATGTTACAGGGATGTTACAGACAGAGACGTTAC
> +
>
>
> __EXAMPLE PROBLEMATIC OUTPUT FILE REGION__
>
> @HWI-ST0747:167:B02DEACXX:8:1101:3182:167088 1:N:0:
> CGCGTGTGCAGGTTTATAGAACCAGCTGCAGATTAGTAGCAGCGCACGGAGAGGTGTGTCTGTTTATTGTCCTCAGCAGGCAGACATGTTTGTGGTC
> +
> @@@DDADDHB9+2A;6(5@CDAC(5(5:5,(8?88?BC@#
> @HWI-ST0747:167:B02DEACXX:8:1101:3134:167090 1:N:0:
> TTCTAGTGCAGGGCGACAGCGTTGCGGAGCCGGTCCGAGTCTGCTGGGTCAGTCATGGCTAGTTGGTACTATAACGACACAGGGCGAGACCCAGATGCAAA
> +
> @CCFFFDFHJJIJHHIIIJHGHIJI@GFFDDDFDDCEEEDCCBDCCCCCB>>@C(4@ADCA>>?BBBDDABB055<>-?A
> @HWI-ST0747:167:B02DEACXX:8:1101:3002:167092 1:N:0:
> CTTTGCTGCAGGCTCATCCTGACATGACCCTCCAGCATGACAATGCCACCAGCCATACTGCTCGTTCTGTGTGTGATTTCCAGCAAGTAAATATGTA
> +
> CCCFHIJIEHIH@AHFAGHIGIIGGEIJGIJIIIGIIIGEHGEHIIJIEHH@FHGH@=ACEHHFBFFCE@AACCA>AD>BA
> TTCTGTGAGTGATTTCCTGCAAGACAGGAATGTCAGT
> +
> BCCFFDFFFIJIJJHIFGGGGIGGIJIJIGIGIGIGHHIGIIJGJJJIIJIIEHIHHHFFFB@>CCE@BEDCDDAC?CC?ACC??>ADDD
> @HWI-ST0747:167:B02DEACXX:8:1304:19473:44548 1:N:0:
> CTACAGTGCAGGCACCCGGCCCGCCACAATGAGTCGCTAGAGCGCAATGAGACAAGTAAAGCTGACCAAACCCTTAACCCGGACGATGCTGGG
> +
> BCCFHIJEHJJIIGIJIGIJIDHDGIGIGGED@CCDDC>C>BBD?BDBAABDDD@BCD@?@BDBDDDBDCCC2
>
>
>
>
> __PYTHON CODE __
>
>
> import glob
>
> my_in_files = glob.glob ('E:/PINK/Paired_End/raw/gzip/*.fastq')
>
> for each in my_in_files:
>     #print(each)
>     out = each.replace('/gzip', '/rem_clusters2')
>     #print(out)
>     INFILE = open(each, 'r')
>     OUTFILE = open(out, 'w')
>
>     # Tracking Variables
>     Reads = 0
>     Writes = 0
>     Check_For_End_Of_File = 0
>
>     # Updates
>     print("Reading File: " + each)
>     print("Writing File: " + out)
>
>     # Read FASTQ File by group of four lines
>     while Check

[Tutor] *large* python dictionary with persistence storage for quick look-ups

2012-08-06 Thread Abhishek Pratap
Hey Guys

I have asked a question on Stack Overflow and thought I would post it
here as it has some learning flavor attached to it... at least I feel
that way.

http://stackoverflow.com/questions/11837229/large-python-dictionary-with-persistence-storage-for-quick-look-ups

-Abhi


[Tutor] using multiprocessing efficiently to process large data file

2012-08-30 Thread Abhishek Pratap
Hi Guys

I have a file with a few million lines. I want to process each block of
8 lines, and from my estimate my job is not I/O bound. In other words,
it takes a lot more time to do the computation than it would take to
simply read the file.

I am wondering how I can go about reading data from this file at a
faster pace and then farm out the jobs to a worker function using the
multiprocessing module.

I can think of two ways.

1. split the file and read it in parallel (didn't work well for me),
primarily because I don't know how to read a file in parallel
efficiently.
2. keep reading the file sequentially into a buffer of some size and
farm out chunks of the data through multiprocessing.

Any example would be of great help.
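
To make option 2 concrete, here is the kind of sketch I have in mind
(the file name and the worker body are placeholders):

import itertools
from multiprocessing import Pool

def process_block(lines):
    # stand-in for the real, CPU-heavy computation on an 8-line block
    return len(lines)

def blocks(fh, size=8):
    while True:
        block = list(itertools.islice(fh, size))
        if not block:
            break
        yield block

if __name__ == '__main__':
    pool = Pool(4)
    with open('data.txt') as fh:
        # imap pulls blocks from the generator lazily, keeping memory bounded
        for result in pool.imap(process_block, blocks(fh), chunksize=100):
            pass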

Thanks!
-Abhi


[Tutor] managing memory large dictionaries in python

2012-10-16 Thread Abhishek Pratap
Hi Guys

For my problem I need to store 400-800 million 20-character keys in a
dictionary and do counting. This data structure takes about 60-100 GB
of RAM. I am wondering if there are slick ways to map the dictionary to
a file on disk and not store it in memory, but still access it as a
dictionary object. Speed is not the main concern in this problem, and
persistence is not needed, as the counting will only be done once on
the data. We want the script to run on smaller-memory machines if
possible.

I did think about databases for this, but intuitively it looks like
overkill, because for each key you have to first check whether it is
already present and increase its count by 1, and if not then insert the
key into the database.

Just want to take your opinion on this.

Thanks!
-Abhi


Re: [Tutor] managing memory large dictionaries in python

2012-10-16 Thread Abhishek Pratap
On Tue, Oct 16, 2012 at 7:22 PM, Alexander  wrote:
> On Tue, Oct 16, 2012 at 20:43 EST, Mark Lawrence
>  wrote:
>> For the record Access is not a database, or so some geezer called Alex
>> Martelli reckons http://code.activestate.com/lists/python-list/48130/, so
>> please don't shoot the messenger:)
>> Cheers.
>> Mark Lawrence.
>
> Mark I don't believe your response is relevant or helpful to the
> original post so please don't hijack.
>
>
> --
> 7D9C597B

Thanks guys. I think I will try shelve and sqlite. I don't think
creating millions of files (one for each key) will make the
sysadmins/file system happy.
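
A minimal shelve sketch of the counting loop (the file name and the toy
key list are placeholders; leaving writeback off keeps memory flat):

import shelve

keys = ['AAAAAAAAAAAAAAAAAAAA', 'CCCCCCCCCCCCCCCCCCCC',
        'AAAAAAAAAAAAAAAAAAAA']  # stand-in for the real key stream

db = shelve.open('key_counts.db')
for key in keys:
    db[key] = db.get(key, 0) + 1
db.close()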

Best,
-Abhi


[Tutor] learning to program in cython

2013-01-16 Thread Abhishek Pratap
Hi Guys

With the help of an awesome Python community I have been able to pick
up the language and am now willing to explore other cool extensions of
it.

I routinely have large loops which could be ported to Cython for speed.
However, I have never written a single line of Cython code. Any
pointers on getting started?

A tutorial, whether text or video, would be of great help.

Thanks!
-Abhi


[Tutor] increment a counter inside generator

2013-03-13 Thread Abhishek Pratap
Hey Guys

I might be missing something obvious here.


import numpy as np

count = 0
[ count += 1 for num in np.random.random_integers(1,100,20) if num > 20]

 File "<stdin>", line 2
[ count += 1 for num in np.random.random_integers(1,100,20) if num > 20]
 ^
SyntaxError: invalid syntax


Also tried


Re: [Tutor] increment a counter inside generator

2013-03-13 Thread Abhishek Pratap
On Wed, Mar 13, 2013 at 2:02 PM, Oscar Benjamin
 wrote:
> On 13 March 2013 19:50, Abhishek Pratap  wrote:
>> Hey Guys
>>
>> I might be missing something obvious here.
>>
>>
>> import numpy as np
>>
>> count = 0
>> [ count += 1 for num in np.random.random_integers(1,100,20) if num > 20]
>>
>>  File "<stdin>", line 2
>> [ count += 1 for num in np.random.random_integers(1,100,20) if num > 20]
>>  ^
>> SyntaxError: invalid syntax
>
> I think this does what you want:
>
>>>> import numpy as np
>>>> a = np.random.random_integers(1, 100, 20)
>>>> (a > 20).sum()
> 17
>
> I don't know if this really applies to what you're doing but the
> result of this computation is a binomially distributed random number
> that you could generate directly (without creating the intermediate
> array):
>
>>>> np.random.binomial(100, .2)
> 26
>
>
> Oscar

Hi Oscar

I just used a very contrived example to ask if we can increment a
counter inside a generator. The real case is more specific and
dependent on other code and not necessarily useful for the question.

-Abhi


Re: [Tutor] increment a counter inside generator

2013-03-13 Thread Abhishek Pratap
On Wed, Mar 13, 2013 at 2:08 PM, Dave Angel  wrote:
> On 03/13/2013 03:50 PM, Abhishek Pratap wrote:
>>
>> Hey Guys
>>
>> I might be missing something obvious here.
>>
>>
>> import numpy as np
>>
>> count = 0
>> [ count += 1 for num in np.random.random_integers(1,100,20) if num > 20]
>>
>>   File "<stdin>", line 2
>>  [ count += 1 for num in np.random.random_integers(1,100,20) if num >
>> 20]
>>   ^
>> SyntaxError: invalid syntax
>>
>>
>> Also tried
>> 
>
>
> I can't help with the numpy portion of that, but that's not the correct
> syntax for a list comprehension.  The first item must be an expression, and
> count+=1 is NOT.
>
> You probably want  (untested)
>   count = sum([  1 for num in ..])
>
> which will add a bunch of ones.  That will probably give you a count of how
> many of the random integers are > 20.
>
> There also may very well be a function in numpy that would do it in one
> step.  See Oscar's message.
>
> --
> DaveA


Thanks Dave. That probably is the reason why I am getting the error.
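
For the record, the sum-of-ones version of my contrived example works:

import numpy as np

nums = np.random.random_integers(1, 100, 20)
count = sum(1 for num in nums if num > 20)  # generator expression, no += needed
print count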

-Abhi


[Tutor] help with itertools.izip_longest

2013-03-16 Thread Abhishek Pratap
Hey Guys


I am trying to use itertools.izip_longest to read a large file in
chunks based on the examples I was able to find on the web. However I
am not able to understand the behaviour of the following python code.
(contrived form of example)



for x in itertools.izip_longest(*[iter([1,2,3])]*2):
print x


###output:
(1, 2)
(3, None)


It gives me the right answer, but I am not sure how it is doing it. I
also referred to the itertools docs but could not comprehend much. In
essence I am trying to understand the intricacies of the following
documentation from the itertools package.

"The left-to-right evaluation order of the iterables is guaranteed.
This makes possible an idiom for clustering a data series into
n-length groups using izip(*[iter(s)]*n)."

How is *n able to group the data, and what is the meaning of the '*'
just after izip?


Thanks!
-Abhi


Re: [Tutor] help with itertools.izip_longest

2013-03-16 Thread Abhishek Pratap
On Sat, Mar 16, 2013 at 2:32 PM, Oscar Benjamin
 wrote:
> On 16 March 2013 21:14, Abhishek Pratap  wrote:
>> Hey Guys
>>
>> I am trying to use itertools.izip_longest to read a large file in
>> chunks based on the examples I was able to find on the web. However I
>> am not able to understand the behaviour of the following python code.
>> (contrived form of example)
>>
>> for x in itertools.izip_longest(*[iter([1,2,3])]*2):
>> print x
>>
>>
>> ###output:
>> (1, 2)
>> (3, None)
>>
>>
>> It gives me the right answer but I am not sure how it is doing it. I
>> also referred to the itertools doc but could not comprehend much. In
>> essence I am trying to understand the intricacies of the following
>> documentation from the itertools package.
>>
>> "The left-to-right evaluation order of the iterables is guaranteed.
>> This makes possible an idiom for clustering a data series into
>> n-length groups using izip(*[iter(s)]*n)."
>>
>> How is *n able to group the data and the meaning of '*' in the
>> beginning just after izip.
>
> The '*n' part is to multiply the list so that it repeats. This works
> for most sequence types in Python:
>
>>>> a = [1,2,3]
>>>> a * 2
> [1, 2, 3, 1, 2, 3]
>
> In this particular case we multiply a list containing only one item,
> the iterator over s. This means that the new list contains the same
> element twice:
>>>> it = iter(a)
>>>> [it]
> [<listiterator object at 0x...>]
>>>> [it] * 2
> [<listiterator object at 0x...>, <listiterator object at 0x...>]
>
> So if every element of the list is the same iterator, then we can call
> next() on any of them to get the same values in the same order:
>>>> d = [it]*2
>>>> d
> [<listiterator object at 0x...>, <listiterator object at 0x...>]
>>>> next(d[1])
> 1
>>>> next(d[0])
> 2
>>>> next(d[0])
> 3
>>>> next(d[0])
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> StopIteration
>>>> next(d[1])
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> StopIteration
>
> The * just after izip is for argument unpacking. This allows you to
> call a function with arguments unpacked from a list:
>
>>>> def f(x, y):
> ... print('x is %s' % x)
> ... print('y is %s' % y)
> ...
>>>> f(1, 2)
> x is 1
> y is 2
>>>> args = [1,2]
>>>> f(args)
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> TypeError: f() takes exactly 2 arguments (1 given)
>>>> f(*args)
> x is 1
> y is 2
>
> So the original expression, izip(*[iter(s)]*2), is another way of writing
>
> it = iter(s)
> izip(it, it)
>
> And izip(*[iter(s)]*10) is equivalent to
>
> izip(it, it, it, it, it, it, it, it, it, it)
>
> Obviously writing it out like this will get a bit unwieldy if we want
> to do izip(*[iter(s)]*100) so the preferred method is
> izip(*[iter(s)]*n) which also allows us to choose what value to give
> for n without changing anything else in the code.
>
>
> Oscar


Thanks a bunch Oscar. This is why I love this community. It is
absolutely clear now. It is funny I am getting the solution over the
mailing list while I am at pycon :)


best,
-Abhi


Re: [Tutor] help with itertools.izip_longest

2013-03-16 Thread Abhishek Pratap
On Sat, Mar 16, 2013 at 2:53 PM, Peter Otten <__pete...@web.de> wrote:
> Abhishek Pratap wrote:
>
>> I am trying to use itertools.izip_longest to read a large file in
>> chunks based on the examples I was able to find on the web. However I
>> am not able to understand the behaviour of the following python code.
>> (contrived form of example)
>>
>>
>>
>> for x in itertools.izip_longest(*[iter([1,2,3])]*2):
>> print x
>>
>>
>> ###output:
>> (1, 2)
>> (3, None)
>>
>>
>> It gives me the right answer but I am not sure how it is doing it. I
>> also referred to the itertools doc but could not comprehend much. In
>> essence I am trying to understand the intricacies of the following
>> documentation from the itertools package.
>>
>> "The left-to-right evaluation order of the iterables is guaranteed.
>> This makes possible an idiom for clustering a data series into
>> n-length groups using izip(*[iter(s)]*n)."
>>
>> How is *n able to group the data and the meaning of '*' in the
>> beginning just after izip.
>
> Break the expression into smaller chunks:
>
> items = [1, 2, 3]
> it = iter(items)
> args = [it] * 2 # same as [it, it]
> chunks = itertools.izip_longest(*args) # same as izip_longest(it, it)
>
> As a consequence of passing the same iterator twice getting the first item
> from the "first" iterator will advance the "second" iterator (which is
> actually the same as the first iterator) to the second item which will in
> turn advance the "first" iterator to the third item. Try to understand the
> implementation given for izip() at
>

Thanks Peter. I guess I missed the trick that each iterator is moved
ahead automatically because they are all the same iterator, replicated
N times.
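
And for completeness, the chunked-reading idiom this was all for (the
file name and group size are placeholders):

import itertools

with open('reads.txt') as fh:
    for record in itertools.izip_longest(*[iter(fh)] * 4):
        if record[-1] is None:
            break  # incomplete final group, padded with None
        print record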

-Abhi


> http://docs.python.org/2/library/itertools.html#itertools.izip
>


Re: [Tutor] Python debugger/IDE that can be launched from a remote command line

2013-05-10 Thread Abhishek Pratap
On Fri, May 10, 2013 at 10:58 AM, Michael O'Leary wrote:

> I am working on a project in which the code and data I am working with are
> all on an Amazon EC2 machine. So far I have been ssh'ing to the EC2 machine
> in two terminal windows, running emacs or vi in one of them to view and
> update the code and running the "python -m pdb ..." debugger in the other
> one to step through the code.
>
> I would prefer to work with an IDE that displays and updates program state
> automatically, but I don't know which ones I could launch from a remote
> machine and have it display within a terminal window or use XWindows or GTK
> to display in its own window. Are there any Python debuggers or IDEs that
> can be used in this kind of setting?
> Thanks,
> Mike
>
>
I think IPython could be useful here. Kick-start an IPython notebook
on the Amazon machine and open it over https locally. More information
on this is here:

http://ipython.org/ipython-doc/dev/interactive/htmlnotebook.html


-Abhi


