event loop vs threads

2016-09-26 Thread srinivas devaki
how does Python switch execution and maintain context (i.e. the function
stack, etc.) for coroutines, and why is it less costly than switching
threads, which do almost the same thing? Both are handled by the Python
interpreter itself (the event loop for coroutines, GIL scheduling for
threads), so where does the extra overhead for threads come from?

-- 
Regards
Srinivas Devaki
Junior (3rd yr) student at Indian School of Mines,(IIT Dhanbad)
Computer Science and Engineering Department
ph: +91 9491 383 249
telegram_id: @eightnoteight
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Counting words in a string??

2016-09-30 Thread srinivas devaki
On Oct 1, 2016 12:10 AM, "Jake"  wrote:
>
> Hi, I need a program which:
> 1) Asks the user for a sentence of their choice (not including
> punctuation)
> 2) Ask the user which word they would like to know is repeated
> 3) Print out to the user how many times the word came up which they chose
> from their sentence.
>

This is a typical homework assignment; please stop asking for complete
programs and start asking how to write them yourself.

Anyway, if you ever try to write code for this, you have to split your
sentence and use a dict for counting.

Python has Counter in the collections module, but it is a little slower
than defaultdict for this kind of purpose.
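For instance, a minimal sketch of the split-and-count idea (the sentence and word here are made-up examples):

```python
from collections import Counter

sentence = "the cat sat on the mat"
word = "the"

# split() breaks the sentence into words; Counter tallies them
counts = Counter(sentence.split())
print(counts[word])  # 2
```

A plain dict (or collections.defaultdict(int)) with `counts[w] += 1` per word does the same job.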

Regards
Srinivas Devaki
Senior (final yr) student at Indian Institute of Technology (ISM), Dhanbad
Computer Science and Engineering Department
ph: +91 9491 383 249
telegram_id: @eightnoteight
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: N-grams

2016-11-09 Thread srinivas devaki
> def ngrams(iterable, n=2):
>     if n < 1:
>         raise ValueError
>     t = tee(iterable, n)
>     for i, x in enumerate(t):
>         for j in range(i):
>             next(x, None)
>     return zip(*t)

def myngrams(iterable, n=2):
    t = list(tee(iterable, 1))
    for _ in range(n - 1):
        t.extend(tee(t.pop()))
        next(t[-1], None)
    return zip(*t)


Complexity-wise, executing this function takes O(N) time, but O(N**2)
space.

To consume the output, i.e. completely iterate over the zip object, the
time complexity is obviously O(N*M), but the space complexity is still
just O(N**2).


---

In [1]: %paste
def ngrams(iterable, n=2):
    if n < 1:
        raise ValueError
    t = tee(iterable, n)
    for i, x in enumerate(t):
        for j in range(i):
            next(x, None)
    return zip(*t)
## -- End pasted text --

In [2]: %paste
def myngrams(iterable, n=2):
    t = list(tee(iterable, 1))
    for _ in range(n - 1):
        t.extend(tee(t.pop()))
        next(t[-1], None)
    return zip(*t)

## -- End pasted text --

In [3]: from itertools import *

In [4]: %timeit ngrams(range(1000), n=500)
100 loops, best of 3: 17.3 ms per loop

In [5]: %timeit myngrams(range(1000), n=500)
1000 loops, best of 3: 469 µs per loop

In [6]: %timeit list(ngrams(range(1000), n=500))
10 loops, best of 3: 21.2 ms per loop

In [7]: %timeit list(myngrams(range(1000), n=500))
100 loops, best of 3: 4.41 ms per loop

In [8]: %timeit ngrams(range(1000), n=100)
1000 loops, best of 3: 722 µs per loop

In [9]: %timeit myngrams(range(1000), n=100)
1 loops, best of 3: 94.9 µs per loop

In [10]: %timeit list(ngrams(range(1000), n=100))
100 loops, best of 3: 2.09 ms per loop

In [11]: %timeit list(myngrams(range(1000), n=100))
1000 loops, best of 3: 1.46 ms per loop

In [12]:

---



Regards
Srinivas Devaki
Senior (4th year) student at Indian Institute of Technology (ISM), Dhanbad
Computer Science and Engineering Department
phone: +91 9491 383 249
telegram: @eightnoteight
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: N-grams

2016-11-09 Thread srinivas devaki
On Thu, Nov 10, 2016 at 12:43 PM, srinivas devaki
 wrote:
> complexity wise it's O(N), but space complexity is O(N**2) to execute
> this function,

I'm sorry, that was a mistake.
I just skimmed through itertoolsmodule.c, and the space complexity is
actually just O(N): when tee objects are copied, only the current index
is tracked. So the space complexity is O(N), and the time complexity is
also O(N), since copying a tee object takes O(1).


Regards
Srinivas Devaki
Senior (4th year) student at Indian Institute of Technology (ISM), Dhanbad
Computer Science and Engineering Department
phone: +91 9491 383 249
telegram: @eightnoteight
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: N-grams

2016-11-10 Thread srinivas devaki
On Thu, Nov 10, 2016 at 2:26 PM, Peter Otten <[email protected]> wrote:
>
> I don't think I've seen tee(iterable, 1) before. Did you do this for
> aesthetic reasons or is there an advantage over
>
>   t = [iter(iterable)]

Yeah, just for aesthetics; there's no extra advantage over that, as
with n=1 tee just returns a wrapper around the iterable.
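A quick check of what tee(iterable, 1) actually returns (a 1-tuple holding a single tee object):

```python
from itertools import tee

t = tee(iter([1, 2, 3]), 1)
print(type(t), len(t))  # a tuple of length 1
print(list(t[0]))       # [1, 2, 3]
```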


Regards
Srinivas Devaki
Senior (4th year) student at Indian Institute of Technology (ISM), Dhanbad
Computer Science and Engineering Department
phone: +91 9491 383 249
telegram: @eightnoteight
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Help me

2016-03-30 Thread srinivas devaki
ahh, this is the beginning of a conspiracy to waste my time.

PS: just for laughs, not to offend anyone.

Regards
Srinivas Devaki
Junior (3rd yr) student at Indian School of Mines,(IIT Dhanbad)
Computer Science and Engineering Department
ph: +91 9491 383 249
telegram_id: @eightnoteight
On Mar 30, 2016 2:11 PM, "Ben Finney"  wrote:

> Smith  writes:
>
> > Il 29/03/2016 11:17, Ben Finney ha scritto:
> > > You'll get better help if you:
> > >
> > > * Summarise the problem briefly in the Subject field.
> > >
> > > * Actually say anything useful in the message body.
> > >
> > thanks a lot
>
> You're welcome. Feel free to ask about the Python language here,
> following the advice above.
>
> --
>  \  “Programs must be written for people to read, and only |
>   `\incidentally for machines to execute.” —Abelson & Sussman, |
> _o__)  _Structure and Interpretation of Computer Programs_ |
> Ben Finney
>
> --
> https://mail.python.org/mailman/listinfo/python-list
>
-- 
https://mail.python.org/mailman/listinfo/python-list


how to set nth bit of large binary file.

2016-04-25 Thread srinivas devaki
I use aria2c to download files. aria2c has a feature of allocating space
for a file before downloading it, and it then downloads over multiple
connections, filling in data at different offsets of the file
concurrently.

I wonder how to do the same. I found one way to do it here:
http://stackoverflow.com/questions/3407505/writing-binary-data-to-middle-of-a-sparse-file

but it only works if you are constructing the file's data from scratch,
while aria2c can also resume a download, i.e. not from scratch.
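For what it's worth, here is a hedged sketch of setting a single bit at an arbitrary offset of an existing file using plain seek/write; the function name and the bit-ordering convention (bit 0 = least significant bit of the first byte) are assumptions for illustration:

```python
def set_nth_bit(path, n):
    """Set bit n of the file at `path` (bit 0 = least significant
    bit of the first byte), extending the file if needed."""
    byte_index, bit_index = divmod(n, 8)
    with open(path, "r+b") as f:
        f.seek(byte_index)
        chunk = f.read(1)
        # read() past the end returns b'', so treat missing bytes as zero
        current = chunk[0] if chunk else 0
        f.seek(byte_index)
        f.write(bytes([current | (1 << bit_index)]))
```

Opening with "r+b" keeps the existing contents intact, so this works on a partially downloaded file as well; only the one touched byte is rewritten.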

-- 
Regards
Srinivas Devaki
Junior (3rd yr) student at Indian School of Mines,(IIT Dhanbad)
Computer Science and Engineering Department
ph: +91 9491 383 249
telegram_id: @eightnoteight
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: String concatenation (was: Steve D'Aprano, you're the "master". What's wrong with this concatenation statement?)

2016-05-08 Thread srinivas devaki
I'm so sorry, forgot to lock my phone.
On May 9, 2016 9:01 AM, "srinivas devaki" 
wrote:

> f be gfdnbh be b GB GB BH GB vbjfhjb GB bffbbubbv GB hbu hbu
> fjbjfbbbufhbvh VB have fqbgvfb NB bb GB GB GB GB bbu GB vu GB vu GB GB
> b GB fbufjnb BH GB GB bvvfbubbjubuv GB b fbufbbby GB bfff GB f GB
> bbbu GB GB ffinj GB vh vh fjb GB fj GB h h GB gjfthey're the b GB gjf GBG
> GBG q GB fbb b bh VB ffbff GBG fbfvrgv
> On May 9, 2016 7:49 AM, "Chris Angelico"  wrote:
>
> On Mon, May 9, 2016 at 10:44 AM, Thomas 'PointedEars' Lahn
>  wrote:
> > With the “%” string operator (deprecated), str.format(), and
> str.Template,
> > you can use other values in string values even without concatenation.
>
> Not deprecated. Don't spread FUD.
>
> > Finally, with SQL you should prefer Prepared Statements and Stored
> > Procedures, not bare strings, to avoid SQL injection:
> >
> > <https://www.owasp.org/index.php/SQL_Injection_Prevention_Cheat_Sheet>
>
> He is safe. He's using parameterized queries.
>
> > Also, it would be a good idea if you posted under your real name.
> Internet
> > is the thing with cables; Usenet is the thing with people.  I for one
> tend
> > to avoid communicating with few-letter entities; exceptions to that would
> > probably include only E.T., M.J., ALF, and K.I.T.T.
>
> I'm not using Usenet, Mr PointedEars.
>
> ChrisA
> --
> https://mail.python.org/mailman/listinfo/python-list
>
>
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: String concatenation (was: Steve D'Aprano, you're the "master". What's wrong with this concatenation statement?)

2016-05-08 Thread srinivas devaki
f be gfdnbh be b GB GB BH GB vbjfhjb GB bffbbubbv GB hbu hbu
fjbjfbbbufhbvh VB have fqbgvfb NB bb GB GB GB GB bbu GB vu GB vu GB GB
b GB fbufjnb BH GB GB bvvfbubbjubuv GB b fbufbbby GB bfff GB f GB
bbbu GB GB ffinj GB vh vh fjb GB fj GB h h GB gjfthey're the b GB gjf GBG
GBG q GB fbb b bh VB ffbff GBG fbfvrgv
On May 9, 2016 7:49 AM, "Chris Angelico"  wrote:

On Mon, May 9, 2016 at 10:44 AM, Thomas 'PointedEars' Lahn
 wrote:
> With the “%” string operator (deprecated), str.format(), and str.Template,
> you can use other values in string values even without concatenation.

Not deprecated. Don't spread FUD.

> Finally, with SQL you should prefer Prepared Statements and Stored
> Procedures, not bare strings, to avoid SQL injection:
>
> 

He is safe. He's using parameterized queries.

> Also, it would be a good idea if you posted under your real name.
Internet
> is the thing with cables; Usenet is the thing with people.  I for one tend
> to avoid communicating with few-letter entities; exceptions to that would
> probably include only E.T., M.J., ALF, and K.I.T.T.

I'm not using Usenet, Mr PointedEars.

ChrisA
--
https://mail.python.org/mailman/listinfo/python-list
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Steve D'Aprano, you're the "master". What's wrong with this concatenation statement?

2016-05-12 Thread srinivas devaki
On May 9, 2016 5:31 AM, "Tim Chase"  wrote:
>
> then that's a bad code-smell (you get quadratic behavior as the
> strings are constantly resized), usually better replaced with
>

I just want to point out that in Python, s += str in a loop does not
show quadratic behavior. I don't know why, but it runs fast. I'm very
much interested to know why that is.

In [3]: %%timeit
   ...: s = ''
   ...: for x in xrange(10**6):
   ...: s += str(x)
   ...:
1 loop, best of 3: 383 ms per loop

In [4]: %%timeit
s = ''
for x in xrange(10**6):
s = s + str(x)
   ...:
1 loop, best of 3: 383 ms per loop
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Getting data out of Mozilla Thunderbird with Python?

2015-12-09 Thread srinivas devaki
On Dec 9, 2015 4:45 PM, "Steven D'Aprano"  wrote:
>
> Maildir is also *much* safer too. With mbox, a single error when writing
> email to the mailbox will likely corrupt *all* emails from that point on,
> so potentially every email in the mailbox. With maildir, a single error
> when writing will, at worst, corrupt one email.
>

Maybe frequent backups of the mbox file, plus storing a checksum for
each email, would be both fast and safe.
I wonder if they already do that.
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Reading lines of text from 7z compressed files in Python

2015-12-09 Thread srinivas devaki
On Dec 9, 2015 3:07 PM, "Anmol Dalmia"  wrote:
>
>
> I wish to use the native LZMA library of Python 3.4 for faster performance
> than any other third- party packages. Is it possible to do so?
>

You can check the source of the lzma module: the main compression and
decompression algorithms are written in C, so you will obviously get fast
performance.
In [21]: import _lzma

In [22]: _lzma.__file__
Out[22]:
'/home/eightnoteight/.anaconda3/envs/snakes3.4.3/lib/python3.4/lib-dynload/_
lzma.cpython-34m.so'


and regarding your problem, here's a simple example on how you can read
line by line of your compressed 7z text file.


import lzma

with lzma.open('test.7z', 'w') as lf:
    lf.write(b'123\n' * 1000)

with lzma.open('test.7z', 'r') as lf:
    arr = list(lf)
    print(len(arr))
    print(set(arr))
    print(arr[0])
    print(arr[0].decode('utf-8'))

[gist] https://gist.github.com/38681cad88928b089abb

later you can even extract that test.7z with 7z command line client with
(7z x test.7z)
-- 
https://mail.python.org/mailman/listinfo/python-list


How to use internal python c funtions, from python code

2015-12-09 Thread srinivas devaki
Hi
I'm coming from this link (
https://groups.google.com/forum/#!topic/python-ideas/cBFvxq1LQHM), which
proposes to use long_to_decimal_string(), int_to_decimal_string() functions
for printing integers in different bases.

Now, is there any way I can use such internal functions from pure Python
code, passing ctypes objects where the arguments are C data types?

For competitive programming purposes I really want to use those functions
for speed.
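Note that ctypes can only reach symbols the interpreter actually exports; static helpers such as long_to_decimal_string are typically not visible. As a hedged illustration (CPython only, and not the functions from the linked thread), here is how an exported C-API function can be called through ctypes.pythonapi:

```python
import ctypes

# Illustrative only: PyNumber_Add is a public, exported C-API function.
api = ctypes.pythonapi
api.PyNumber_Add.restype = ctypes.py_object
api.PyNumber_Add.argtypes = [ctypes.py_object, ctypes.py_object]

print(api.PyNumber_Add(2, 3))  # 5
```

In practice this goes through the same boxing/unboxing as a normal Python call, so it rarely wins any speed.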
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: How to use internal python c funtions, from python code

2015-12-10 Thread srinivas devaki
Thank you Chris.
I later decided that this would be cheating and that I have to think of
another algorithmic approach.

Most competitive programming platforms give Python a time limit of 5
times the C/C++ limit, but in many cases, e.g. when the algorithms are
recursive (like segment trees), that factor of 5 is just not enough.

Still, I think it would be cool to be able to access internal C
functions without any fuss. I could use such a feature with heapq too
(the sift operations).
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Trailing zeros of 100!

2016-01-02 Thread srinivas devaki
let's put an end to this.

from math import log

# simple one to understand. complexity: O(n*log(n))
def countzeros_va(n):
    count = 0
    for x in range(1, n + 1):
        while x % 5 == 0:
            count += 1
            x //= 5
    return count

# better approach. complexity: O(log(n))
def countzeros_vb(n):
    count, c5 = 0, 5
    while c5 <= n:
        count += (n // c5)
        c5 *= 5
    return count

# this is the same as before, it's just that while loops irk me
def countzeros_vc(n):
    # adding +3 to be safe. never trust approximations.
    return sum(n // (5**x) for x in range(1, int(log(n, 5) + 3)))

def run_sample_tests():
    precal = {3: 0, 60: 14, 100: 24, 1024: 253, 23456: 5861, 8735373: 2183837}
    for x in precal:
        assert precal[x] == countzeros_va(x) == countzeros_vb(x) == countzeros_vc(x)

if __name__ == '__main__':
    run_sample_tests()

Although the code is deterministic, it can be further tested from
http://www.wolframalpha.com/widgets/view.jsp?id=54da27e6e09dc404890a578735b9f7d8
http://www.spoj.com/problems/FCTRL/

On Jan 2, 2016 5:22 PM,  wrote:
>
> Hi, newbie here!
> I'm trying to write a python program to find how many trailing zeros are in 
> 100! (factorial of 100).
> I used factorial from the math module, but my efforts to continue failed. 
> Please help.
>
> Thank you,
> Yehuda
> --
> https://mail.python.org/mailman/listinfo/python-list

On Sun, Jan 3, 2016 at 5:50 AM, Ben Finney  wrote:
> Robin Koch  writes:
>
>> Am 02.01.2016 um 22:57 schrieb Chris Angelico:
>> >>> But did you actually test it?
>> >>
>> >> Yes, should work for n >= 1.
>
> By “test it”, Chris of course means test it *by implementing it in a
> program and running that program in Python*.
>
>> >> Why do you ask?
>> >
>> > Your "should work" does not sound good as a response to "actually
>> > test". Normally I would expect the response to be "Yes, and it
>> > worked for me" (maybe with a log of an interactive session).
>>
>> Well, honestly, I trusted my math and didn't thought much about the
>> technical limitations.
>
> That's why it's good to actually test the hypothesis in a real computer
> program, run on the actual computer system you're going to use.
>
> Computers are physical systems, with technical compromises to the
> physical constraints under which they were built.
>
> They are not perfect implementations of our ideal mathematics, and
> testing the mathematics is no guarantee the mathematical assumptions
> will survive your program unscathed.
>
> So, a request “Did you actually test it?” is both a polite reminder to
> do that, and an attempt to get you to do so if you didn't.
>
> If you didn't, then answering “yes” is wasting everyone's time.
>
> --
>  \   “As the most participatory form of mass speech yet developed, |
>   `\the Internet deserves the highest protection from governmental |
> _o__)   intrusion.” —U.S. District Court Judge Dalzell |
> Ben Finney
>
> --
> https://mail.python.org/mailman/listinfo/python-list
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: How to remove item from heap efficiently?

2016-01-08 Thread srinivas devaki
You can create a single heap with the primary key as timestamp and the
secondary key as priority, i.e. by creating a tuple and inserting the
elements into the heap as (timestamp, priority).


If there is any underlying reason for creating 2 heaps, please mention it.
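A tiny illustration of the tuple-key idea (the timestamps, priorities, and payloads are made up):

```python
import heapq

heap = []
heapq.heappush(heap, (1002, 5, "newer, low priority"))
heapq.heappush(heap, (1001, 9, "older"))
heapq.heappush(heap, (1001, 1, "older, urgent"))

# Tuples compare lexicographically: oldest timestamp first,
# priority only breaks timestamp ties.
print(heapq.heappop(heap))  # (1001, 1, 'older, urgent')
```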


On Fri, Jan 8, 2016 at 4:22 AM, Sven R. Kunze  wrote:
> Hi everybody,
>
> suppose, I need items sorted by two criteria (say timestamp and priority).
> For that purpose, I use two heaps (heapq module):
>
> heapA # items sorted by timestamp
> heapB # items sorted by priority
>
> Now my actual problem. When popping an item of heapA (that's the oldest
> item), I need to remove the very same item from heapB, regardlessly where it
> is in heapB. And vice versa.
>
> Is there a datastructure or a simple trick to achieve that in an efficient
> matter?
>
> Best,
> Sven
> --
> https://mail.python.org/mailman/listinfo/python-list
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: How to remove item from heap efficiently?

2016-01-08 Thread srinivas devaki
So you need a task scheduler with expiration and priority on each task.
Interesting problem.

As Peter said, you can mark the task in heapB as deleted, but that needs
a search every time you pop off an old item [complexity: O(len(heapB))];
you may as well use Python's del on it, as the complexity is the same.
But if you already know where to look in heapB, you can avoid searching
and thereby reduce the complexity: you can do this by saving references
to the heapB entries in heapA and vice versa.

And if you are marking a number of low-priority tasks as deleted, heapB
can grow enormously, because being low-priority items they can stagnate.
To resolve this, you have to perform a linear cleanup pass whenever the
heap reaches a critical length (this critical length can be obtained
mathematically), so that your amortized complexity of push and pop
remains O(log(number_of_valid_tasks_at_any_time)); the valid tasks at
any time are those which have not yet expired.

My Algorithm:
version_a: https://gist.github.com/9ce7a0e534c6e768239e
this version has a simple scheduler with methods for popping the
highest-priority task and popping the oldest task.
But this version has a disadvantage: say you push some 10**5 tasks with
priority 2, then push some 10**5 tasks with priority 1, and then decide
to pop the oldest 10**5 items. In this version those 10**5 elements will
stagnate in the priority heap if all future priorities are less than 1.
That is not a big issue per se, but if you are running a webserver, then
over a span of 5 days there could be 10**10 tasks, and even if half of
those tasks stagnate, your server is going to crash with an
out-of-memory error.

version_b: https://gist.github.com/99b4d590753ba234eeed
this version resolves that stagnation: it runs a sweep whenever more
than half of the items are useless, thereby giving us an amortized
complexity of O(log(n)) for push, pop, etc.

But this version doesn't have the expiration feature.

version_c: https://gist.github.com/9dfd0d291672c0ffa5c3
in this one we simply keep an expiration variable and release the
expired tasks on any operation. I have coded it so that the expiration
is a specific time; you can change it to a delta time if you want.

Time complexity: O(log(n)) insertion, O(log(n)) deletion [amortized]
Space complexity: O(n) [amortized]
Here n is the number of valid items, i.e. those which have not expired.

I hope this helps with your problem
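For concreteness, here is a minimal self-contained sketch of the lazy-deletion-plus-sweep idea described above (an illustration with made-up names, not the code from the gists; expiration is left out):

```python
import heapq
import itertools

class LazyDualHeap:
    """Two heaps over the same tasks: popping from one heap marks the
    shared entry dead, the other heap skips dead entries lazily, and a
    heap is rebuilt (swept) once more than half its entries are dead."""

    def __init__(self):
        self._seq = itertools.count()   # tie-breaker so entries never compare
        self._by_time = []              # (timestamp, seq, entry)
        self._by_prio = []              # (priority, seq, entry)
        self._dead_time = 0             # dead entries still inside _by_time
        self._dead_prio = 0             # dead entries still inside _by_prio

    def push(self, timestamp, priority, task):
        entry = [task, True]            # entry[1] == True means "alive"
        seq = next(self._seq)
        heapq.heappush(self._by_time, (timestamp, seq, entry))
        heapq.heappush(self._by_prio, (priority, seq, entry))

    def pop_oldest(self):
        while self._by_time:
            _, _, entry = heapq.heappop(self._by_time)
            if entry[1]:
                entry[1] = False        # now a dead entry inside _by_prio
                self._dead_prio += 1
                self._sweep_prio()
                return entry[0]
            self._dead_time -= 1        # skipped a lazily deleted entry
        raise IndexError("pop from empty scheduler")

    def pop_urgent(self):
        while self._by_prio:
            _, _, entry = heapq.heappop(self._by_prio)
            if entry[1]:
                entry[1] = False        # now a dead entry inside _by_time
                self._dead_time += 1
                self._sweep_time()
                return entry[0]
            self._dead_prio -= 1
        raise IndexError("pop from empty scheduler")

    def _sweep_prio(self):
        if 2 * self._dead_prio > len(self._by_prio):
            self._by_prio = [it for it in self._by_prio if it[2][1]]
            heapq.heapify(self._by_prio)
            self._dead_prio = 0

    def _sweep_time(self):
        if 2 * self._dead_time > len(self._by_time):
            self._by_time = [it for it in self._by_time if it[2][1]]
            heapq.heapify(self._by_time)
            self._dead_time = 0
```

The sweep condition (more dead than live entries) is what keeps the amortized cost per operation at O(log n): a rebuild costs O(len(heap)), but at least len(heap)/2 operations must have happened since the previous rebuild.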

PS:
sorry for posting links; it's just that the code is too large for email.
I'm using the convention that the minimum number has the highest priority.


On Fri, Jan 8, 2016 at 10:15 PM, Sven R. Kunze  wrote:
> Thanks for your suggestion.
>
> On 08.01.2016 14:21, srinivas devaki wrote:
>>
>> You can create a single heap with primary key as timestamp and
>> secondary key as priority, i.e by creating a tuple
>> insert the elements into the heap as
>> (timestamp, priority)
>
> I think I cannot use that because I need the list sorted by both criteria.
>>
>> If there is any underlying meaning for creating 2 heaps. please mention.
>
>
> I use two heaps because I need to insert arbitrary items fast and remove the
> ones fast which are too old (timestamp) or are next in order (priority).
>
> Basically a task scheduler where tasks can be thrown away once they are too
> long in the queue.
>
>
>> On Fri, Jan 8, 2016 at 4:22 AM, Sven R. Kunze  wrote:
>>>
>>> Hi everybody,
>>>
>>> suppose, I need items sorted by two criteria (say timestamp and
>>> priority).
>>> For that purpose, I use two heaps (heapq module):
>>>
>>> heapA # items sorted by timestamp
>>> heapB # items sorted by priority
>>>
>>> Now my actual problem. When popping an item of heapA (that's the oldest
>>> item), I need to remove the very same item from heapB, regardlessly where
>>> it
>>> is in heapB. And vice versa.
>>>
>>> Is there a datastructure or a simple trick to achieve that in an
>>> efficient
>>> matter?
>>>
>>> Best,
>>> Sven
>>> --
>>> https://mail.python.org/mailman/listinfo/python-list
>
>
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: How to remove item from heap efficiently?

2016-01-11 Thread srinivas devaki
On Jan 11, 2016 12:18 AM, "Sven R. Kunze"  wrote:
> Indeed. I already do the sweep method as you suggested. ;)
>
> Additionally, you provided me with a reasonable condition when to do the
> sweep in order to achieve O(log n). Thanks much for that. I currently used
> a time-based approach (sweep every 20 iterations).
>
> PS: Could you add a note on how you got to the condition
> (2*self.useless_b > len(self.heap_b))?
>

Oh, that's actually simple:
the condition checks whether more than half of the heap is useless items.
The sweep complexity is O(len(heap)), so to keep the extra amortized
complexity at O(1), we have to split that work (virtually) across
O(len(heap)) operations. By the time the condition becomes true, we have
done len(heap) operations, so doing a sweep at that point means we have
split the O(len(heap)) work across every operation.
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: How to remove item from heap efficiently?

2016-01-11 Thread srinivas devaki
On Jan 10, 2016 12:05 AM, "Paul Rubin"  wrote:
>
> You could look up "timing wheels" for a simpler practical approach that
> the Linux kernel scheduler used to use (I think it changed a few years
> ago).

This is not related to the OP's topic, but:

I googled "timing wheels" and the Linux kernel scheduler, and I couldn't
find any learning resources, or at least none that I could understand.
Could you please point me to some learning resources suitable for a
beginner?
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: How to remove item from heap efficiently?

2016-01-13 Thread srinivas devaki
On Wed, Jan 13, 2016 at 4:50 PM, Cem Karan  wrote:
>
> Is that so?  I'll be honest, I never tested its asymptotic performance, I 
> just assumed that he had a dict coupled with a heap somehow, but I never 
> looked into the code.
>

I have just tested the code; the asymptotic performance is O(log(n))
for all operations. In fact the code is very simple to understand:
technically the heapdict class is composed of a dict and a heap, each
element of the heap is a mutable list, and the dict stores references to
those mutable lists, so that a specific element can be deleted in O(log(n)).
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Heap Implementation

2016-01-30 Thread srinivas devaki
@Sven
Actually you are not sweeping at all. As I recall from my last post,
what I meant by sweeping is periodically deleting the elements which
were marked as popped.

Kudos on that __setitem__ technique:
instead of keeping references to the items as HeapDict does, it is
brilliant of you to simply use __setitem__.

On Sun, Jan 31, 2016 at 4:17 AM, Sven R. Kunze  wrote:
> Hi again,
>
> as the topic of the old thread actually was fully discussed, I dare to open
> a new one.
>
> I finally managed to finish my heap implementation. You can find it at
> https://pypi.python.org/pypi/xheap + https://github.com/srkunze/xheap.
>
> I described my motivations and design decisions at
> http://srkunze.blogspot.com/2016/01/fast-object-oriented-heap-implementation.html
>
> @Cem
> You've been worried about a C implementation. I can assure you that I did
> not intend to rewrite the incredibly fast and well-tested heapq
> implementation. I just re-used it.
>
> I would really be grateful for your feedback as you have first-hand
> experience with heaps.
>
> @srinivas
> You might want to have a look at the removal implementation. Do you think it
> would be wiser/faster to switch for the sweeping approach?
>
> I plan to publish some benchmarks to compare heapq and xheap.
>
> @all
> What's the best/standardized tool in Python to perform benchmarking? Right
> now, I use a self-made combo of unittest.TestCase and time.time + proper
> formatting.
>
> Best,
> Sven
>
>
> PS: fixing some weird typos and added missing part.
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Heap Implementation

2016-02-01 Thread srinivas devaki
On Feb 1, 2016 10:54 PM, "Sven R. Kunze"  wrote:
>
> Maybe I didn't express myself well. Would you prefer the sweeping
> approach in terms of efficiency over how I implemented xheap currently?
>

Complexity-wise your approach is the best of all that I have seen so
far.

> Without running some benchmarks, I have absolutely no feeling which
> approach is faster/more memory efficient etc.
>

This is obviously memory efficient, but I don't know whether this
approach is faster than the previous ones: with the previous approaches
there is no callback from C code into Python code for comparisons.
It should be faster than HeapDict though, as HeapDict uses its own
pure-Python methods for heappush, heappop, etc.

PS: if you have time, could you please review my pull request.
-- 
https://mail.python.org/mailman/listinfo/python-list


_siftup and _siftdown implementation

2016-02-04 Thread srinivas devaki
The _siftdown function breaks out of its loop as soon as the current pos
has a valid parent.

But the _siftup function is not implemented in that fashion: if a valid
subheap is given to _siftup, it will bring the root of the subheap all
the way down and then bring it back up to its proper place.

I was wondering why that is. Is it just to make the code look simple?
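For reference, an early-breakout variant of the kind the question describes might look like this (a sketch following heapq's min-heap conventions; the function name is made up, and this is not the actual stdlib code):

```python
def siftup_breakout(heap, pos):
    """Sink heap[pos] until it is <= both of its children, stopping
    early instead of sinking to a leaf and sifting back up."""
    endpos = len(heap)
    newitem = heap[pos]
    childpos = 2 * pos + 1
    while childpos < endpos:
        rightpos = childpos + 1
        # pick the smaller child
        if rightpos < endpos and heap[rightpos] < heap[childpos]:
            childpos = rightpos
        if not heap[childpos] < newitem:
            break                      # newitem already in place
        heap[pos] = heap[childpos]     # move the smaller child up
        pos = childpos
        childpos = 2 * pos + 1
    heap[pos] = newitem
```

The stdlib instead sinks the item all the way to a leaf and then calls _siftdown; the comments in heapq justify this with the observation that a freshly inserted item usually belongs near the bottom anyway, so the breakout test mostly wastes a comparison.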

Regards
Srinivas Devaki
Junior (3rd yr) student at Indian School of Mines,(IIT Dhanbad)
Computer Science and Engineering Department
ph: +91 9491 383 249
telegram_id: @eightnoteight
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: _siftup and _siftdown implementation

2016-02-04 Thread srinivas devaki
On Feb 5, 2016 5:45 AM, "Steven D'Aprano"  wrote:
>
> On Fri, 5 Feb 2016 07:50 am, srinivas devaki wrote:
>
> > _siftdown function breaks out of the loop when the current pos has a
> > valid parent.
> > parent.
> >
> > but _siftup function is not implemented in that fashion, if a valid
> > subheap is given to the _siftup, it will bring down the root of sub heap
> > and then again bring it up to its original place.

As I come to think of it again, it is not a subheap; it is actually a
heap cut at some level (hope you get the idea from the usage of _siftup).
So even though the children of `pos` are valid, _siftup brings the new
element (i.e. the element initially at `pos`) down to leaf level and then
brings it back up using _siftdown. Why do the redundant work when it can
simply break out?

> >
> > I was wondering why it is so, is it just to make the code look simple???
>
> Hi Srinivas,
>
> I'm sure that your question is obvious to you, but it's not obvious to us.
> Where are _siftup and _siftdown defined? Are they in your code? Somebody
> else's code? A library? Which library? What do they do? Where are they
> from?

_siftup and _siftdown are functions from the Python standard library's
heapq module.

PS: I do competitive programming, and I use this module every couple of
days, more than other modules, so I didn't give much thought to context
when posting to the mailing list. Sorry for that.

Regards
Srinivas Devaki
Junior (3rd yr) student at Indian School of Mines,(IIT Dhanbad)
Computer Science and Engineering Department
ph: +91 9491 383 249
telegram_id: @eightnoteight
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: _siftup and _siftdown implementation

2016-02-05 Thread srinivas devaki
On Fri, Feb 5, 2016 at 8:12 PM, Sven R. Kunze  wrote:
> On 05.02.2016 02:26, srinivas devaki wrote:
> What do you think about our use-case?
>
Oh, the logic is sound: every element that we insert eventually has to
be popped. We spend some *extra* time rearranging the elements only to
be sure that we won't spend more than this *extra* time on other
operations. And our use-case isn't much different either: if rearranging
the elements in the heap (*subheap*) is optimal for other operations
like popping the root element (heap[0]), then it is obviously optimal
for popping other elements (children of heap[0]) too.

PS: @Sven, don't merge the pull request yet, I could be wrong.
As the heapq module already says that the variance is very small,
let me write some tests (on more than 10**3 elements) and then get back here.
-- 
Regards
Srinivas Devaki
Junior (3rd yr) student at Indian School of Mines,(IIT Dhanbad)
Computer Science and Engineering Department
ph: +91 9491 383 249
telegram_id: @eightnoteight
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: _siftup and _siftdown implementation

2016-02-05 Thread srinivas devaki
Wow, that's great: you read a comment in the code, you test it, only to
find that it is indeed true. Sounds ok, but feels great. :)

Just experimenting a bit, I swapped the _siftdown and _siftup lines, and
something strange happened: the number of comparisons in both versions
remained the same. I'm attaching the files.

Do you have any idea why this happened?

On Fri, Feb 5, 2016 at 9:57 PM, Sven R. Kunze  wrote:
>
> Can we do better here?
>
I don't know; I have to read that Knuth TAOCP section.

-- 
Regards
Srinivas Devaki
Junior (3rd yr) student at Indian School of Mines,(IIT Dhanbad)
Computer Science and Engineering Department
ph: +91 9491 383 249
telegram_id: @eightnoteight
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: _siftup and _siftdown implementation

2016-02-06 Thread srinivas devaki
As the comments in the heapq module say, in most cases (probability
0.837 [from Knuth vol. 3]) the newly inserted element, whose initial
location is the end of the array, will be larger than the children of
`pos`. In those cases the old version and the new version behave the
same, because in the old version the comparison takes place at line 129,
while in the new version the comparison takes place inside _siftdown.
So those cases do not decrease the total number of comparisons.

The other case is when the swapped element is smaller than the parents
of `pos`. Here the old version does a comparison and then goes to
_siftdown, but in the new version _siftdown executes first and then
_siftup. In 50% of these cases (len(last level) == len(heap) // 2) the
pos doesn't have a child, so no comparison is made.


`pos`          | probability | old version comparisons | new version comparisons
---------------|-------------|-------------------------|------------------------
last level     | 0.5         | 1                       | 0
last - 1 level | 0.25        | 1                       | 1
lower levels   | 0.25        | 1                       | >= 1


As the new version doesn't do a comparison when `pos` is in the last
level, that is where the optimization occurs. Which means the reason
behind the heapq module's choice of _siftup code is not related to
this cause at all.


PS:
please copy the table to some text editor, for better visualization.
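For anyone who wants to repeat the experiment without the attached
files, one quick sketch of the idea is to wrap the items in a class
that counts __lt__ calls and drive the stock heapq with it (this is
not the instrumented module from this thread, just a rough way to
count comparisons):

```python
import heapq
import random

class Counted:
    """Wraps a value and counts every comparison made on it."""
    lt_calls = 0  # class-wide comparison counter

    def __init__(self, value):
        self.value = value

    def __lt__(self, other):
        Counted.lt_calls += 1
        return self.value < other.value

random.seed(1)
items = [Counted(random.random()) for _ in range(10000)]

heap = []
for item in items:
    heapq.heappush(heap, item)       # sifts the new tail element up
push_calls = Counted.lt_calls

while heap:
    heapq.heappop(heap)              # sifts the swapped-in tail down
pop_calls = Counted.lt_calls - push_calls

print("comparisons during pushes:", push_calls)
print("comparisons during pops:  ", pop_calls)
```

swapping a modified _siftup/_siftdown into a copy of heapq and
re-running gives the per-version counts the table above is about.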

On Fri, Feb 5, 2016 at 11:12 PM, srinivas devaki
 wrote:
> wow, that's great.
> you read a comment in the code, and you test it, to only find that it
> is indeed true,
> sounds ok, but feels great. :)
>
> Just experimenting a bit, I swaped the lines _siftdown and _siftup and 
> something
> strange happened the number of comparisions in both the versions remained 
> same.
> I'm attaching the files.
>
> do you have any idea why this happened?
>
> On Fri, Feb 5, 2016 at 9:57 PM, Sven R. Kunze  wrote:
>>
>> Can we do better here?
>>
> I don't know, I have to read TAOP knuth article.
>
> --
> Regards
> Srinivas Devaki
> Junior (3rd yr) student at Indian School of Mines,(IIT Dhanbad)
> Computer Science and Engineering Department
> ph: +91 9491 383 249
> telegram_id: @eightnoteight



-- 
Regards
Srinivas Devaki
Junior (3rd yr) student at Indian School of Mines,(IIT Dhanbad)
Computer Science and Engineering Department
ph: +91 9491 383 249
telegram_id: @eightnoteight
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Heap Implementation

2016-02-07 Thread srinivas devaki
On Feb 8, 2016 7:07 AM, "Cem Karan"  wrote:
>
>
>
> I know that there are methods of handling this from the client-side
(tuples with unique counters come to mind), but if your library can handle
it directly, then that could be useful to others as well.

yeah, it is a good idea to do that on the client side.
but if it should be introduced as a feature in the library, then
instead of tuples we should just piggyback a single counter onto the
self._indexes dict, or better, make a separate self._counts dict,
which would be light and fast. and if you think about it, with this
method you can easily subclass by just using the self._counts dict in
your subclass. but still I think it is good to introduce it as a
feature in the library.
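For reference, the client-side counter trick under discussion is
essentially the pattern from the heapq documentation; a minimal sketch
with plain heapq (these names are illustrative, not xheap's API):

```python
import heapq
import itertools

counter = itertools.count()  # monotonically increasing tie-breaker
heap = []

def push(item, priority):
    # The counter separates entries that share the same item/priority,
    # and guarantees the item itself is never compared.
    heapq.heappush(heap, (priority, next(counter), item))

def pop():
    priority, _, item = heapq.heappop(heap)
    return item, priority

push("job", 2)
push("job", 1)   # same item, different priority: no clash
push("job", 1)   # exact duplicate: counter keeps entries distinct
print(pop())     # -> ('job', 1)
```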

Regards
Srinivas Devaki
Junior (3rd yr) student at Indian School of Mines,(IIT Dhanbad)
Computer Science and Engineering Department
ph: +91 9491 383 249
telegram_id: @eightnoteight
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Heap Implementation

2016-02-08 Thread srinivas devaki
On Feb 8, 2016 5:17 PM, "Cem Karan"  wrote:
>
> On Feb 7, 2016, at 10:15 PM, srinivas devaki 
wrote:
> > On Feb 8, 2016 7:07 AM, "Cem Karan"  wrote:
> > > I know that there are methods of handling this from the client-side
(tuples with unique counters come to mind), but if your library can handle
it directly, then that could be useful to others as well.
> >
> > yeah it is a good idea to do at client side.
> > but if it should be introduced as feature into the library, instead of
tuples, we should just piggyback a single counter it to the self._indexes
dict, or better make another self._counts dict which will be light and fast.
> > and if you think again with this method you can easily subclass with
just using self._counts dict  in your subclass. but still I think it is
good to introduce it as a feature in the library.
> >
> > Regards
> > Srinivas Devaki
>
> I meant that the counter is a trick to separate different instances of
(item, priority) pairs when you're pushing in the same item multiple times,
but with different priorities.

oh okay, I was way off.

what you are asking for is a priority-queue-like feature.

but the emphasis is on providing extra features to the heap data
structure.

and xheap doesn't support having duplicate items.

and if you want to insert the same item with distinct priorities, you
can provide the priority via the key argument to xheap. what xheap
doesn't support is having the same keys/priorities, so I got confused
and proposed a method to allow duplicate keys.

Regards
Srinivas Devaki
Junior (3rd yr) student at Indian School of Mines,(IIT Dhanbad)
Computer Science and Engineering Department
ph: +91 9491 383 249
telegram_id: @eightnoteight
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Heap Implementation

2016-02-09 Thread srinivas devaki
On Feb 10, 2016 6:11 AM, "Cem Karan"  wrote:
>
> Eh, its not too bad once you figure out how to do it.  It's easier in C
though; you can use pointer tricks that let you find the element in
constant time, and then removal will involve figuring out how to fix up
your heap after you've removed the element.
>

If you can do it with C pointers, then you can do it with Python's
references/mutable objects. :)
in the case of immutable objects, use a light mutable wrapper, or
better, use a list for performance.
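A sketch of what that wrapper idea can look like with plain heapq:
each entry is a small mutable object, so "removing" it is just
flipping a flag in place and the heap cleans up lazily on pop (an
illustrative toy, not xheap's implementation):

```python
import heapq

class Entry:
    """Mutable wrapper so an immutable payload can be removed in place."""
    def __init__(self, priority, item):
        self.priority = priority
        self.item = item
        self.removed = False

    def __lt__(self, other):
        return self.priority < other.priority

heap = []

def push(priority, item):
    entry = Entry(priority, item)
    heapq.heappush(heap, entry)
    return entry          # keep the reference to remove it later

def remove(entry):
    entry.removed = True  # O(1): real cleanup happens lazily during pop

def pop():
    while heap:
        entry = heapq.heappop(heap)
        if not entry.removed:
            return entry.priority, entry.item
    raise IndexError("pop from empty heap")

a = push(3, "low")
b = push(1, "urgent")
remove(b)             # mark it, no searching
print(pop())          # -> (3, 'low')
```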

Regards
Srinivas Devaki
Junior (3rd yr) student at Indian School of Mines,(IIT Dhanbad)
Computer Science and Engineering Department
ph: +91 9491 383 249
telegram_id: @eightnoteight
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: There has to be a better way to split this string!

2016-02-09 Thread srinivas devaki
On Feb 10, 2016 6:56 AM, "Anthony Papillion" 
wrote:
>
> -BEGIN PGP SIGNED MESSAGE-
> Hash: SHA512
>
> Hello Everyone,
>
> I am using datetime.now() to create a unique version of a filename.
> When the final file is named, it will look something like:
>
> myfile-2015-02-09-19-08-45-4223
>

You can easily do this (retrieving the tokens) using the re module.

In [33]: mat =
re.search(r'(\S+)-(\d{4})-(\d{2})-(\d{2})-(\d{2})-(\d{2})-(\d{2})-(\d{4})',
'myfi-le-2015-02-09-19-08-45-4223')

In [34]: mat

Out[34]: <_sre.SRE_Match object; span=(0, 32),
match='myfi-le-2015-02-09-19-08-45-4223'>

In [35]: mat.groups() Out[35]: ('myfi-le', '2015', '02', '09', '19', '08',
'45', '4223')

In [36]: mat =
re.search(r'(\S+)-(\d{4})-(\d{2})-(\d{2})-(\d{2})-(\d{2})-(\d{2})-(\d{4})',
'myfile-2015-02-09-19-08-45-4223')
In [37]: mat

Out[37]: <_sre.SRE_Match object; span=(0, 31),
match='myfile-2015-02-09-19-08-45-4223'> In [38]: mat.groups()

Out[38]: ('myfile', '2015', '02', '09', '19', '08', '45', '4223')

if you don't want to fiddle with regexes you can use the parse module
(https://github.com/r1chardj0n3s/parse), but why use an external
library when the stdlib already provides this? :)
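Since the suffix in the original question was produced by strftime,
the datetime module can also take the name apart without a regex; a
sketch (assuming the final dash-separated field is an extra 4-digit
counter, as in the example name):

```python
from datetime import datetime

name = "myfile-2015-02-09-19-08-45-4223"

# Split off the trailing 7 timestamp fields; everything before them is
# the base name (which may itself contain dashes, e.g. "myfi-le").
parts = name.rsplit("-", 7)
base, stamp_fields = parts[0], parts[1:]
stamp = datetime.strptime("-".join(stamp_fields[:6]), "%Y-%m-%d-%H-%M-%S")
suffix = stamp_fields[6]  # its meaning depends on how the name was built

print(base)    # -> myfile
print(stamp)   # -> 2015-02-09 19:08:45
print(suffix)  # -> 4223
```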

Regards
Srinivas Devaki
Junior (3rd yr) student at Indian School of Mines,(IIT Dhanbad)
Computer Science and Engineering Department
ph: +91 9491 383 249
telegram_id: @eightnoteight
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: There has to be a better way to split this string!

2016-02-09 Thread srinivas devaki
On Feb 10, 2016 7:23 AM, "srinivas devaki" 
wrote:
>
>
> On Feb 10, 2016 6:56 AM, "Anthony Papillion" 
wrote:
> >
> > -BEGIN PGP SIGNED MESSAGE-
> > Hash: SHA512
> >
> > Hello Everyone,
> >
> > I am using datetime.now() to create a unique version of a filename.
> > When the final file is named, it will look something like:
> >
> > myfile-2015-02-09-19-08-45-4223
> >
>
> You can easily do this(retrieving the tokens) using re module.
>
> In [33]: mat =
re.search(r'(\S+)-(\d{4})-(\d{2})-(\d{2})-(\d{2})-(\d{2})-(\d{2})-(\d{4})',
'myfi-le-2015-02-09-19-08-45-4223')
>
> In [34]: mat
>
> Out[34]: <_sre.SRE_Match object; span=(0, 32),
match='myfi-le-2015-02-09-19-08-45-4223'>
>
> In [35]: mat.groups() Out[35]: ('myfi-le', '2015', '02', '09', '19',
'08', '45', '4223')
>
> In [36]: mat =
re.search(r'(\S+)-(\d{4})-(\d{2})-(\d{2})-(\d{2})-(\d{2})-(\d{2})-(\d{4})',
'myfile-2015-02-09-19-08-45-4223')
> In [37]: mat
>
> Out[37]: <_sre.SRE_Match object; span=(0, 31),
match='myfile-2015-02-09-19-08-45-4223'> In [38]: mat.groups()
>
> Out[38]: ('myfile', '2015', '02', '09', '19', '08', '45', '4223')
>
> if you don't want fiddle with regex you can use parse module(
https://github.com/r1chardj0n3s/parse). but why use an external library
when stdlib already provides it? :)

I'm being stupid.
as soon as I saw strftime it looked like strptime to me, and I assumed
he was trying to extract the tokens, so I wrote that unrelated mail.

PS: trying to read the mailing list when you are half awake is a bad
idea, and trying to reply to it is an even worse idea.

Regards
Srinivas Devaki
Junior (3rd yr) student at Indian School of Mines,(IIT Dhanbad)
Computer Science and Engineering Department
ph: +91 9491 383 249
telegram_id: @eightnoteight
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Storing a big amount of path names

2016-02-11 Thread srinivas devaki
On Feb 12, 2016 6:05 AM, "Paulo da Silva" 
wrote:
>
> Hi!
>
> What is the best (shortest memory usage) way to store lots of pathnames
> in memory where:
>
> 1. Path names are pathname=(dirname,filename)
> 2. There many different dirnames but much less than pathnames
> 3. dirnames have in general many chars
>
> The idea is to share the common dirnames.
>
> More realistically not only the pathnames are stored but objects each
> object being a MyFile containing
> self.name - 
> getPathname(self) - 
> other stuff
>
> class MyFile:
>
>   __allfiles=[]
>
>   def __init__(self,dirname,filename):
> self.dirname=dirname  # But I want to share this with other files
> self.name=filename
> MyFile.__allfiles.append(self)
> ...
>
>   def getPathname(self):
> return os.path.join(self.dirname,self.name)
>

what you want is a trie data structure, which won't use extra memory
when the base paths of your strings are common.

instead of constructing a char trie, try to make it a string trie,
i.e. each directory name is a node and all the files and folders in it
are its children; each node can be one of two types, a file or a
folder.

if you come to think about it, this is the most intuitive way to
represent the file structure in your program.

you can extract the directory name from a file object by traversing
its parents.
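A minimal sketch of such a string trie (names are illustrative, and
'/'-separated paths are assumed):

```python
class Node:
    """One path component; the parent chain is shared by all siblings."""
    def __init__(self, name, parent=None, is_file=False):
        self.name = name
        self.parent = parent
        self.is_file = is_file
        self.children = {}     # component name -> Node

    def add(self, relpath):
        """Insert a file path below this node, sharing directory nodes."""
        node = self
        parts = relpath.split("/")
        for part in parts[:-1]:
            node = node.children.setdefault(part, Node(part, node))
        return node.children.setdefault(parts[-1], Node(parts[-1], node, True))

    def pathname(self):
        # Walk the parent chain; each dirname string is stored only once.
        parts = []
        node = self
        while node.parent is not None:
            parts.append(node.name)
            node = node.parent
        return "/".join([node.name] + parts[::-1])

root = Node("/data")
f1 = root.add("logs/2016/app.log")
f2 = root.add("logs/2016/db.log")   # shares the logs/2016 nodes with f1
print(f1.pathname())  # -> /data/logs/2016/app.log
```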

I hope this helps.

Regards
Srinivas Devaki
Junior (3rd yr) student at Indian School of Mines,(IIT Dhanbad)
Computer Science and Engineering Department
ph: +91 9491 383 249
telegram_id: @eightnoteight
-- 
https://mail.python.org/mailman/listinfo/python-list


Multiple Assignment a = b = c

2016-02-16 Thread srinivas devaki
Hi,

a = b = c

as an assignment doesn't return anything, I ruled out a = b = c being
a chained assignment like a = (b = c).
So I thought a = b = c is resolved as
a, b = [c, c]


at least, I had fixed in my mind that every assignment-like operation
in Python is done with references, and then the references are bound
to the named variables,
like globals()['a'] = result()

but today I learned, with great pain (7 hours of debugging), that this
is not the case.

class Mytest(object):
def __init__(self, a):
self.a = a
def __getitem__(self, k):
print('__getitem__', k)
return self.a[k]
def __setitem__(self, k, v):
print('__setitem__', k, v)
self.a[k] = v

roots = Mytest([0, 1, 2, 3, 4, 5, 6, 7, 8])
a = 4
roots[4] = 6
a = roots[a] = roots[roots[a]]


the above program's output is
__setitem__ 4 6
__getitem__ 4
__getitem__ 6
__setitem__ 6 6


But the output that i expected is
__setitem__ 4 6
__getitem__ 4
__getitem__ 6
__setitem__ 4 6

So isn't it counter-intuitive compared to all other Python operations,
like how we teach that Python performs a swap operation?

I just want to get a better idea of this.

-- 
Regards
Srinivas Devaki
Junior (3rd yr) student at Indian School of Mines,(IIT Dhanbad)
Computer Science and Engineering Department
ph: +91 9491 383 249
telegram_id: @eightnoteight
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Multiple Assignment a = b = c

2016-02-16 Thread srinivas devaki
On Tue, Feb 16, 2016 at 6:35 PM, Sven R. Kunze  wrote:
>
> First, the rhs is evaluated.
> Second, the lhs is evaluated from left to right.
Great, I will remember these two lines :)

On Tue, Feb 16, 2016 at 8:46 PM, Steven D'Aprano  wrote:
> _temp = c
> a = _temp
> b = _temp
> del _temp
>
>
> except the name "_temp" isn't actually used.
>
So it is like first right most expression is evaluated and then lhs is
evaluated from left to right.

> py> from dis import dis
> py> code = compile("a = L[a] = x", "", "exec")
> py> dis(code)
>   1   0 LOAD_NAME0 (x)
>   3 DUP_TOP
>   4 STORE_NAME   1 (a)
>   7 LOAD_NAME2 (L)
>  10 LOAD_NAME1 (a)
>  13 STORE_SUBSCR
>  14 LOAD_CONST   0 (None)
>  17 RETURN_VALUE
>
>
> Translated to English:
>
> - evaluate the expression `x` and push the result onto the stack;
>
> - duplicate the top value on the stack;
>
> - pop the top value off the stack and assign to name `a`;
>
> - evaluate the name `L`, and push the result onto the stack;
>
> - evaluate the name `a`, and push the result onto the stack;
>
> - perform the subscript assignment (STORE_SUBSCR) using the top three items from the stack.
>
thank-you so much, for explaining how to find the underlying details.


>> SO isn't it counter intuitive from all other python operations.
>> like how we teach on how python performs a swap operation???
>
> No. Let's look at the equivalent swap:
>
> In all cases, the same rule applies:
>
> - evaluate the right hand side from left-most to right-most, pushing the
> values onto the stack;
>
> - perform assignments on the left hand side, from left-most to right-most.
>

uhh, I related it to swap because I was thinking variables are bound
like this: first, all lhs assignment targets get their references or
names resolved, and then the value of the rhs is put into them.
as `a` is a name, the rhs reference is copied to a;
`roots[a]` is a reference into an object, so it is initialized with
the reference of the rhs.

anyway I got it, and all my further doubts are cleared from that
compiled code. I tried some other examples and understood how it
works.

thanks a lot.

-- 
Regards
Srinivas Devaki
Junior (3rd yr) student at Indian School of Mines,(IIT Dhanbad)
Computer Science and Engineering Department
ph: +91 9491 383 249
telegram_id: @eightnoteight
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Continuing indentation

2016-03-04 Thread srinivas devaki
thought I should add this here so that people will find it after
someone decides on a standard way to do this :P

look at the second if condition in the source code of
subprocess.Popen(*args, **kwargs).communicate:

def communicate(self, input=None, timeout=None):
"""Interact with process: Send data to stdin.  Read data from
stdout and stderr, until end-of-file is reached.  Wait for
process to terminate.

The optional "input" argument should be data to be sent to the
child process (if self.universal_newlines is True, this should
be a string; if it is False, "input" should be bytes), or
None, if no data should be sent to the child.

communicate() returns a tuple (stdout, stderr).  These will be
bytes or, if self.universal_newlines was True, a string.
"""

if self._communication_started and input:
raise ValueError("Cannot send input after starting communication")

# Optimization: If we are not worried about timeouts, we haven't
# started communicating, and we have one or zero pipes, using select()
# or threads is unnecessary.
if (timeout is None and not self._communication_started and
[self.stdin, self.stdout, self.stderr].count(None) >= 2):
stdout = None
stderr = None
if self.stdin:
self._stdin_write(input)
elif self.stdout:
stdout = self.stdout.read()
self.stdout.close()
elif self.stderr:
stderr = self.stderr.read()
[. extra code snapped]

ps: Python3.5.1


On Sat, Mar 5, 2016 at 8:19 AM, Tim Chase  wrote:
> On 2016-03-04 17:17, [email protected] wrote:
>> x \
>> = \
>> 5
>> if \
>> y \
>> == \
>> z:
>> print \
>> 'this is terrible'
>> print \
>> 'but still not incorrect
>>
>> It would be terrible, still but not incorrect.
>
> And has the sociopathic benefit that the diffs make it quite clear
> what changed.  None of this
> looking-deep-into-lines-to-see-what-changed.
>
>   x \
>   = \
>   5
>   if \
>   y \
> - != \
> + == \
>   z:
>   print \
>   'this is terrible'
>   print \
>   'but still not incorrect
>
> Still terrible.  But not quite as useless as a knee-jerk reaction
> might suggest.
>
> I actually hacked together a binary-diff something like this,
> emitting every hex-formatted byte of each file on its own line, then
> diffing the two results.  I could see doing something similar to diff
> Python ASTs.
>
> -tkc
>
>
>
>
> --
> https://mail.python.org/mailman/listinfo/python-list



-- 
Regards
Srinivas Devaki
Junior (3rd yr) student at Indian School of Mines,(IIT Dhanbad)
Computer Science and Engineering Department
ph: +91 9491 383 249
telegram_id: @eightnoteight
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: even faster heaps

2016-03-07 Thread srinivas devaki
op      implementation   (four input sizes; ratio vs. the first row of each group)

        heapq            0.04 ( 1.00x)    0.32 ( 1.00x)    3.49 ( 1.00x)    33.92 ( 1.00x)
        Heap             0.05 ( 1.18x)    0.39 ( 1.22x)    4.21 ( 1.20x)    42.03 ( 1.24x)
        RemovalHeap      0.06 ( 1.52x)    0.52 ( 1.62x)    5.60 ( 1.60x)    56.54 ( 1.67x)

init    heapq            0.44 ( 1.00x)    7.92 ( 1.00x)  106.52 ( 1.00x)   915.20 ( 1.00x)
        OrderHeap        0.50 ( 1.14x)    8.67 ( 1.10x)  111.99 ( 1.05x)  1129.89 ( 1.23x)
        XHeap            0.61 ( 1.38x)   10.75 ( 1.36x)  140.86 ( 1.32x)  1417.84 ( 1.55x)

pop     heapq            0.04 ( 1.00x)    0.55 ( 1.00x)    6.59 ( 1.00x)    76.81 ( 1.00x)
        OrderHeap        0.06 ( 1.68x)    0.79 ( 1.43x)    9.04 ( 1.37x)   101.72 ( 1.32x)
        XHeap            0.09 ( 2.43x)    1.03 ( 1.85x)   11.48 ( 1.74x)   125.94 ( 1.64x)

push    heapq            0.01 ( 1.00x)    0.16 ( 1.00x)    1.85 ( 1.00x)    14.65 ( 1.00x)
        OrderHeap        0.04 ( 4.32x)    0.46 ( 2.81x)    4.74 ( 2.56x)    42.25 ( 2.88x)
        XHeap            0.03 ( 3.73x)    0.42 ( 2.55x)    4.34 ( 2.35x)    38.37 ( 2.62x)

remove  RemovalHeap      0.05 ( 1.00x)    0.54 ( 1.00x)    5.62 ( 1.00x)    57.04 ( 1.00x)
        XHeap            0.04 ( 0.88x)    0.44 ( 0.80x)    4.41 ( 0.78x)    43.86 ( 0.77x)




So as the results are not much affected apart from __init__, I think
you should consider this.

Note: I'm not using collections.Counter because it is written in
Python, and from my previous experience it is slower than using a
defaultdict for this kind of purpose.
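That claim is easy to test on your own interpreter with a quick
micro-benchmark; a sketch (timings vary across Python versions, so
treat the printed numbers as indicative only):

```python
from collections import Counter, defaultdict
from timeit import timeit

data = list(range(1000)) * 20  # 1000 distinct keys, 20 repeats each

def with_counter():
    c = Counter()
    for x in data:
        c[x] += 1
    return c

def with_defaultdict():
    d = defaultdict(int)
    for x in data:
        d[x] += 1
    return d

# Both produce identical counts; only the speed differs.
assert with_counter() == with_defaultdict()

print("Counter:     %.3fs" % timeit(with_counter, number=50))
print("defaultdict: %.3fs" % timeit(with_defaultdict, number=50))
```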

ps: there are two errors when I ran the tests with test_xheap. OMG,
why did you keep repetitions = 1? At first I thought my Pentium
laptop was so slow that it wasn't printing a single line even after 20
minutes; then I saw that the number of computations is on the order of
10**10. How many days did it take to completely run the tests?


On Sun, Mar 6, 2016 at 7:29 PM, Sven R. Kunze  wrote:
> Hi python-list, hi Srinivas,
>
> I managed to implement the mark&sweep approach for fast removal from heaps.
> This way, I got three pleasant results:
>
> 1) a substantial speed up!
> 2) an improved testsuite
> 3) discovery and fixing of several bugs
>
> @Srinivas I would be honored if you could have a look at the implementation:
> https://github.com/srkunze/xheap . After all, it was your idea. I only
> perform the sweeping step during pop and remove with the condition of yours.
> :)
>
> Using the original xheap benchmark, I could see huge speedups: from 50x/25x
> down to 3x/2x compared to heapq. That's a massive improvement. I will
> publish an update soon.
>
> Best,
> Sven



-- 
Regards
Srinivas Devaki
Junior (3rd yr) student at Indian School of Mines,(IIT Dhanbad)
Computer Science and Engineering Department
ph: +91 9491 383 249
telegram_id: @eightnoteight
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: looping and searching in numpy array

2016-03-13 Thread srinivas devaki
The problem is in fact not related to numpy at all. The complexity of
your algorithm is O(len(npArray1) * len(npArray2)),

which means the number of computations you are doing is on the order
of 10**10.

If the absolute difference between the maximum and minimum elements of
npArray2 is less than 10**6, you can improve your code by
pre-computing the first occurrence of each number in an array of that
size.

If your npArray2 doesn't have such a pattern, you have to precompute
the first occurrences with a dict (numpy has no such data structure).

an optimised pseudo code would look like

mmdiff = max(npArray2) - min(npArray2)
if mmdiff < 10**6:
    precom = np.array([-1] * (mmdiff + 1))
    offset = min(npArray2)
    for i, x in enumerate(npArray2):
        if precom[x - offset] == -1:    # keep only the first occurrence
            precom[x - offset] = i
    for id in npArray1:
        if 0 <= id - offset <= mmdiff and precom[id - offset] != -1:
            new_id = precom[id - offset]
            # your code
else:
    precom = {}
    for i, x in enumerate(npArray2):    # precompute over npArray2, not npArray1
        if x not in precom:
            precom[x] = i
    for id in npArray1:
        if id in precom:
            new_id = precom[id]
            # your code


You can just use the else case, which works in all cases, but if your
npArray2 has such a pattern then the array version will perform better.
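For completeness, numpy can also do the dict-style lookup fully
vectorized: stable-sort npArray2 once, then binary-search all ids at
once with np.searchsorted; a sketch (the demo data uses the modern
numpy random Generator API):

```python
import numpy as np

rng = np.random.default_rng(0)
npArray1 = rng.integers(0, 50, size=8)
npArray2 = rng.integers(0, 50, size=30)

order = np.argsort(npArray2, kind="stable")   # stable: first occurrence wins
sorted2 = npArray2[order]
pos = np.searchsorted(sorted2, npArray1)      # leftmost insertion points
clamped = np.minimum(pos, len(sorted2) - 1)   # guard against pos == len
found = sorted2[clamped] == npArray1          # False where id is absent
new_ids = np.where(found, order[clamped], -1) # -1 marks "not found"

for id, new_id in zip(npArray1, new_ids):
    print(id, "->", new_id)
```

this replaces the O(len1 * len2) scan with one O(n log n) sort plus
O(log n) lookups per id.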

Regards
Srinivas Devaki
Junior (3rd yr) student at Indian School of Mines,(IIT Dhanbad)
Computer Science and Engineering Department
ph: +91 9491 383 249
telegram_id: @eightnoteight
On Mar 10, 2016 5:15 PM, "Heli"  wrote:

Dear all,

I need to loop over a numpy array and then do the following search. The
following is taking almost 60(s) for an array (npArray1 and npArray2 in the
example below) with around 300K values.


for id in np.nditer(npArray1):

   newId=(np.where(npArray2==id))[0][0]


Is there anyway I can make the above faster? I need to run the script above
on much bigger arrays (50M). Please note that my two numpy arrays in the
lines above, npArray1 and npArray2  are not necessarily the same size, but
they are both 1d.


Thanks a lot for your help,

--
https://mail.python.org/mailman/listinfo/python-list
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: sobering observation, python vs. perl

2016-03-18 Thread srinivas devaki
please upload the log file.

also, global variable lookups in Python are slow, so just move all of
that into a function and try again; I generally get a 20-30% time
improvement by doing that.
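For reference, here is roughly what that rework looks like for the
quoted script (Python 3 print; the compiled .match methods are also
bound to local names, which avoids both global and attribute lookups
in the hot loop):

```python
import re

def find_rel(lines):
    """Yield the most recent test-case name for every release_req line."""
    isready = re.compile(r"(.*) is ready").match  # bound once: local name
    relreq = re.compile(r".*release_req").match   # lookups beat globals
    tn = None
    for line in lines:
        m = isready(line)
        if m:
            tn = m.group(1)
        elif relreq(line):
            yield tn

log = ["TestCase_F_00_P is ready\n", "foo release_req\n", "noise\n"]
print(list(find_rel(log)))  # -> ['TestCase_F_00_P']
```

wrap this in a loop over sys.argv[1:] file objects to reproduce the
original behaviour; all name lookups then stay local to the function
frame, which is where CPython is fastest.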


On Thu, Mar 17, 2016 at 8:59 PM, Charles T. Smith
 wrote:
> I've really learned to love working with python, but it's too soon
> to pack perl away.  I was amazed at how long a simple file search took
> so I ran some statistics:
>
> $ time python find-rel.py
> ./find-relreq *.out | sort -u
> TestCase_F_00_P
> TestCase_F_00_S
> TestCase_F_01_S
> TestCase_F_02_M
>
> real1m4.581s
> user1m4.412s
> sys 0m0.140s
>
>
> $ time python find-rel.py
> # modified to use precompiled REs:
> TestCase_F_00_P
> TestCase_F_00_S
> TestCase_F_01_S
> TestCase_F_02_M
>
> real0m29.337s
> user0m29.174s
> sys 0m0.100s
>
>
> $ time perl find-rel.pl
> find-relreq.pl *.out | sort -u
> TestCase_F_00_P
> TestCase_F_00_S
> TestCase_F_01_S
> TestCase_F_02_M
>
> real0m5.009s
> user0m4.932s
> sys 0m0.072s
>
> Here's the programs:
>
> #!/usr/bin/env python
> # vim: tw=0
> import sys
> import re
>
> isready = re.compile ("(.*) is ready")
> relreq = re.compile (".*release_req")
> for fn in sys.argv[1:]: # logfile name
> tn = None
> with open (fn) as fd:
> for line in fd:
> #match = re.match ("(.*) is ready", line)
> match = isready.match (line)
> if match:
> tn = match.group(1)
> #match = re.match (".*release_req", line)
> match = relreq.match (line)
> if match:
> #print "%s: %s" % (tn, line),
> print tn
>
> vs.
>
> while (<>) {
> if (/(.*) is ready/) {
> $tn = $1;
> }
>     elsif (/release_req/) {
> print "$tn\n";
> }
> }
>
> Look at those numbers:
> 1 minute for python without precompiled REs
> 1/2 minute with precompiled REs
> 5 seconds with perl.
> --
> https://mail.python.org/mailman/listinfo/python-list



-- 
Regards
Srinivas Devaki
Junior (3rd yr) student at Indian School of Mines,(IIT Dhanbad)
Computer Science and Engineering Department
ph: +91 9491 383 249
telegram_id: @eightnoteight
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Unbuffered stderr in Python 3

2015-11-06 Thread srinivas devaki
On Mon, Nov 2, 2015 at 1:22 PM, Steven D'Aprano
 wrote:
>
> So how come Python 3 has line buffered stderr? And more importantly, how can
> I turn buffering off?
>
> I don't want to use the -u unbuffered command line switch, because that
> effects stdout as well. I'm happy for stdout to remain buffered.
>
you can effectively get unbuffered writes by making print flush on
every call, either by redefining the print function or by declaring a
new one, like


from functools import partial
print = partial(print, flush=True)
# or
from functools import partial
import sys
printerr = partial(print, flush=True, file=sys.stderr)
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: How To Create A Endles List Of Lists In Python...???

2015-11-20 Thread srinivas devaki
On Fri, Nov 20, 2015 at 6:39 PM, Chris Angelico  wrote:
> My crystal ball suggests that defaultdict(list) might be useful here.
>
> ChrisA

I used something similar to this for a problem on HackerRank;
anyway, I think this is what you want.

class defaultlist(object):
def __init__(self, factory, data=None):
self.factory = factory
self.list = []
self.data = data

def __getitem__(self, x):
if x >= len(self.list):
self.list.extend([self.factory() for _ in
range(len(self.list), x + 1)])
return self.list[x]

def __repr__(self):
return str(self)

def __str__(self):
if len(self.list) == 0:
return '(' + str(self.data) +  ')[...]'
return ''.join(['(', str(self.data), ')['] + map(str,
self.list) + [', ...]'])


def __setitem__(self, x, v):
if x >= len(self.list):
self.list.extend([self.factory() for _ in
range(len(self.list), x + 1)])
self.list[x] = v

def main():
factory = lambda: defaultlist(factory)
list_of_lists = defaultlist(factory)
print (list_of_lists[0])
list_of_lists[0][10].data = 20
print (list_of_lists[0])


main()

Gist: https://gist.github.com/c0c2ee1e7c6535ef8c3d
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: How To Create A Endles List Of Lists In Python...???

2015-11-20 Thread srinivas devaki
On Fri, Nov 20, 2015 at 11:58 PM, srinivas devaki
 wrote:
> def __str__(self):
> if len(self.list) == 0:
> return '(' + str(self.data) +  ')[...]'
> return ''.join(['(', str(self.data), ')['] + map(str, self.list) + 
> [', ...]'])
> ...
> Gist: https://gist.github.com/c0c2ee1e7c6535ef8c3d

uhh, there is an error in the representation; the code should be

def __str__(self):
if len(self.list) == 0:
return '(' + str(self.data) +  ')[...]'
return ''.join(['(', str(self.data), ')[', ', '.join(map(str,
self.list)), ', ...]'])

Output:
(None)[...]
(None)[(None)[...], (None)[...], (None)[...], (20)[...], ...]


ps: code is updated in gist.
-- 
https://mail.python.org/mailman/listinfo/python-list