Re: h(re) for help, import re - on NameError

2016-09-23 Thread Chris Angelico
On Fri, Sep 23, 2016 at 4:40 PM, Peter Otten <[email protected]> wrote:
> By the way, the current help() already loads a module if you pass its name
> as a string:
>

Yes, which is the basis of my alternate exec trick:

exec(tb.tb_frame.f_code, tb.tb_frame.f_globals, {n: n})

Basically it creates a new locals dict that just has (e.g.) re="re",
which allows help(re) to function as help("re").
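
A minimal sketch of the same idea outside the traceback machinery (the
source string and names here are illustrative):

# Executing code against a locals dict that maps the unknown name to
# itself as a string turns help(re) into help("re"), which help() accepts.
src = "help(re)"    # the statement that raised the NameError
exec(compile(src, "<retry>", "exec"), globals(), {"re": "re"})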

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list


PyThreadState_Get

2016-09-23 Thread Bharadwaj Srivatsa
Whichever project I try to install using the python setup.py install 
command, I get the following error:



python -mtrace --trace setup.py install
Fatal Python error: PyThreadState_Get: no current thread
ABORT instruction (core dumped)

How do I get rid of this error, and what is the cause?
-- 
https://mail.python.org/mailman/listinfo/python-list


memory utilization blow up with dict structure

2016-09-23 Thread Christian
Hi,

I'm wondering why Python blows up a dictionary structure so much.

The ids and cat substructures can have 0..n entries, but in most cases they 
have <= 10; t is limited to <= 6.

Thanks for any advice to save memory.
Christian


Example:

{'0a0f7a3a0e09826caef1bff707785662': {'ids': 
{'aa316b86-8169-11e6-bab9-0050563e2d7c',
 'aa3174f0-8169-11e6-bab9-0050563e2d7c',
 'aa319408-8169-11e6-bab9-0050563e2d7c',
 'aa3195e8-8169-11e6-bab9-0050563e2d7c',
 'aa319732-8169-11e6-bab9-0050563e2d7c',
 'aa319868-8169-11e6-bab9-0050563e2d7c',
 'aa31999e-8169-11e6-bab9-0050563e2d7c',
 'aa319b06-8169-11e6-bab9-0050563e2d7c'},
  't': {'type1', 'type2'},
  'dt': datetime.datetime(2016, 9, 11, 15, 15, 54, 343000),
  'nids': 8,
  'ntypes': 2,
  'cat': [('ABC', 'aa316b86-8169-11e6-bab9-0050563e2d7c', '74', ''),
   ('ABC','aa3174f0-8169-11e6-bab9-0050563e2d7c', '3', 'type1'),
   ('ABC','aa319408-8169-11e6-bab9-0050563e2d7c','3', 'type1'),
   ('ABC','aa3195e8-8169-11e6-bab9-0050563e2d7c', '3', 'type2'),
   ('ABC','aa319732-8169-11e6-bab9-0050563e2d7c', '3', 'type1'),
   ('ABC','aa319868-8169-11e6-bab9-0050563e2d7c', '3', 'type1'),
   ('ABC','aa31999e-8169-11e6-bab9-0050563e2d7c', '3', 'type1'),
   ('ABC','aa319b06-8169-11e6-bab9-0050563e2d7c', '3', 'type2')]},

   
I did a fresh read from a pickled object to have a "clean" environment.


linux-64bit:

sys.getsizeof(superdict)
50331744
len(superdict)
941272


VmPeak:  2981364 kB
VmSize:  2850288 kB
VmLck:         0 kB
VmPin:         0 kB
VmHWM:   2108936 kB
VmRSS:   1978076 kB
VmData:  2541724 kB
VmStk:       140 kB
VmExe:         4 kB
VmLib:        36 kB
VmPTE:      4380 kB
VmSwap:        0 kB
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: memory utilization blow up with dict structure

2016-09-23 Thread Chris Angelico
On Fri, Sep 23, 2016 at 7:05 PM, Christian  wrote:
> I'm wondering why Python blows up a dictionary structure so much.
>
> The ids and cat substructures can have 0..n entries, but in most cases 
> they have <= 10; t is limited to <= 6.
>
> Example:
>
> {'0a0f7a3a0e09826caef1bff707785662': {'ids': 
> {'aa316b86-8169-11e6-bab9-0050563e2d7c',
>  'aa3174f0-8169-11e6-bab9-0050563e2d7c',
>  'aa319408-8169-11e6-bab9-0050563e2d7c',
>  'aa3195e8-8169-11e6-bab9-0050563e2d7c',
>  'aa319732-8169-11e6-bab9-0050563e2d7c',
>  'aa319868-8169-11e6-bab9-0050563e2d7c',
>  'aa31999e-8169-11e6-bab9-0050563e2d7c',
>  'aa319b06-8169-11e6-bab9-0050563e2d7c'},
>   't': {'type1', 'type2'},
>   'dt': datetime.datetime(2016, 9, 11, 15, 15, 54, 343000),
>   'nids': 8,
>   'ntypes': 2,
>   'cat': [('ABC', 'aa316b86-8169-11e6-bab9-0050563e2d7c', '74', ''),
>('ABC','aa3174f0-8169-11e6-bab9-0050563e2d7c', '3', 'type1'),
>('ABC','aa319408-8169-11e6-bab9-0050563e2d7c','3', 'type1'),
>('ABC','aa3195e8-8169-11e6-bab9-0050563e2d7c', '3', 'type2'),
>('ABC','aa319732-8169-11e6-bab9-0050563e2d7c', '3', 'type1'),
>('ABC','aa319868-8169-11e6-bab9-0050563e2d7c', '3', 'type1'),
>('ABC','aa31999e-8169-11e6-bab9-0050563e2d7c', '3', 'type1'),
>('ABC','aa319b06-8169-11e6-bab9-0050563e2d7c', '3', 'type2')]},
>
>
> sys.getsizeof(superdict)
> 50331744
> len(superdict)
> 941272

So... you have a million entries in the master dictionary, each of
which has an associated collection of data, consisting of half a dozen
things, some of which have subthings. The very smallest an object will
ever be on a 64-bit Linux system is 16 bytes:

>>> sys.getsizeof(object())
16

and most of these will be much larger:

>>> sys.getsizeof(8)
28
>>> sys.getsizeof(datetime.datetime(2016, 9, 11, 15, 15, 54, 343000))
48
>>> sys.getsizeof([])
64
>>> sys.getsizeof(('ABC', 'aa316b86-8169-11e6-bab9-0050563e2d7c', '74', ''))
80
>>> sys.getsizeof('aa316b86-8169-11e6-bab9-0050563e2d7c')
85
>>> sys.getsizeof({})
240

(Bear in mind that sys.getsizeof counts only the object itself, not
the things it references - that's why the tuple can take up less space
than one of its members.)
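
To see where the memory actually goes, a rough recursive sizer along these
lines can help (a sketch adapted from the well-known getsizeof recipe, not
an exact accounting):

import sys

def total_size(obj, seen=None):
    # getsizeof(obj) plus everything it references, counting each
    # object only once via its id().
    if seen is None:
        seen = set()
    if id(obj) in seen:
        return 0
    seen.add(id(obj))
    size = sys.getsizeof(obj)
    if isinstance(obj, dict):
        size += sum(total_size(k, seen) + total_size(v, seen)
                    for k, v in obj.items())
    elif isinstance(obj, (list, tuple, set, frozenset)):
        size += sum(total_size(item, seen) for item in obj)
    return size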

I don't think your collections can average less than about 1KB (even
the textual representation of your example data is about that big),
and you have a million of them. That's a gigabyte of memory, right
there. Your peak memory usage is showing 3GB, so most likely, my
conservative estimates have put an absolute lower bound on this. Try
doing everything exactly the same as you did, only without actually
loading the pickle - then see what memory usage is. I think you'll
find that the usage is fully legitimate.

> Thanks for any advice to save memory.

Use a database. I suggest PostgreSQL. You won't have to load
everything into memory all at once that way, and (bonus!) you can even
update stuff on disk without rewriting everything.
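
As a sketch of the shape of that approach -- using the stdlib sqlite3
module here only because it needs no server, and with a made-up table
layout:

import sqlite3

conn = sqlite3.connect("superdict.db")
conn.execute("""CREATE TABLE IF NOT EXISTS cat
                (key TEXT, id TEXT, weight TEXT, type TEXT)""")
conn.execute("INSERT INTO cat VALUES (?, ?, ?, ?)",
             ("0a0f7a3a0e09826caef1bff707785662",
              "aa316b86-8169-11e6-bab9-0050563e2d7c", "74", ""))
conn.commit()
# One key can now be fetched without holding a million entries in RAM:
rows = conn.execute("SELECT id, weight, type FROM cat WHERE key = ?",
                    ("0a0f7a3a0e09826caef1bff707785662",)).fetchall()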

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: get the sum of differences between integers in a list

2016-09-23 Thread Peter Otten
Daiyue Weng wrote:

> Hi, I am new to the advanced Python techniques, and by studying your code,
> I understand that when calling grouped function with values[1, 2, 3, 6, 8,
> 9, 10, 11, 13, 17, 19],
> 
> tee(values, 3) generated 3 iterators sharing the values
> 
> left contains [1, 2, 3, 6, 8, 9, 10, 11, 13, 17, 19],
> mid contains [1, 2, 3, 6, 8, 9, 10, 11, 13, 17, 19],
> right contains [1, 2, 3, 6, 8, 9, 10, 11, 13, 17, 19].
> 
> passing them into zip(),
> 
> chain([None], left) generated,
> 
> [1, 2, 3, 6, 8, 9, 10, 11, 13, 17, 19]
> 
> chain(islice(right,1,None), [None]) generated,
> 
> [2, 3, 6, 8, 9, 10, 11, 13, 17, 19]
> 
> zip(chain([None], left), mid, chain(islice(right, 1, None), [None]))
> generated the triples,
> 
> [(1,1,2), (2,2,3), (3,3,6),...,(17,17,19)]
> 
> The question is how does triples work in groupby(triples, lonely)?
> especially within lonely(triple)?

Consecutive triples with the same value for lonely(triple) are put into the 
same group. A smaller example:

values = [1, 2, 4, 6, 8, 9]

gives the group keys (True == lonely)

[False, False, True, True, False, False]

and results in the groups

not lonely: [1, 2]
lonely: [4, 6]
not lonely: [8, 9]
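
(For reference, a lonely() along the following lines would produce exactly
those keys; the original definition isn't quoted above, so this is a
reconstruction:)

def lonely(triple):
    left, value, right = triple
    if left is not None and value - left == 1:
        return False   # adjacent neighbour on the left
    if right is not None and right - value == 1:
        return False   # adjacent neighbour on the right
    return True        # no adjacent neighbour on either side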
 
> e.g. for the first 3 tuples (1,1,2), (2,2,3), (3,3,6), lonely(triple) will
> generate
> 
> False, False, True
> 
> how does this result work in groupby()?
> 
> and what's the necessity of
> 
> if left is not None and value - left == 1:
> return False

If you look only at one side, the groups will become

no gap on the right: [1]
gap on the right: [2, 4, 6]
no gap on the right: [8, 9]

assuming that we declare there's no gap on the right side of the last item. 
In code:

>>> from itertools import *
>>> def triples(values):
...     a, b, c = tee(values, 3)
...     return zip(chain([None], a), b, chain(islice(c, 1, None), [None]))
... 
>>> def gap(x, y): return x is not None and y is not None and y - x != 1
... 
>>> def values(triples): return [t[1] for t in triples]
... 
>>> sample = [1, 2, 4, 6, 8, 9]
>>> [values(g) for k, g in groupby(triples(sample),
...                                lambda v: gap(*v[:2]) and gap(*v[1:]))]
[[1, 2], [4, 6], [8, 9]]

I'm lazy, so I continue using triples even though pairs would be sufficient 
below.

>>> [values(g) for k, g in groupby(triples(sample), lambda v: gap(*v[1:]))]
[[1], [2, 4, 6], [8, 9]]

We could use a stateful key to start a group every time we see a gap

>>> def make_key():
...     group = True
...     def key(v):
...         nonlocal group
...         if gap(*v[:2]): group = not group
...         return group
...     return key
... 
>>> [values(g) for k, g in groupby(triples(sample), make_key())]
[[1, 2], [4], [6], [8, 9]]

but this has the disadvantage(?) that every lonely value lands in a separate 
group. Your use case could then be addressed by checking the groups' lengths:

>>> consecutive = []
>>> other = []
>>> for k, g in groupby(triples(sample), make_key()):
...     g = values(g)
...     if len(g) == 1: other.extend(g)
...     else: consecutive.append(g)
... 
>>> consecutive
[[1, 2], [8, 9]]
>>> other
[4, 6]


-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Looking for tips and gotchas for working with Python 3.5 zipapp feature

2016-09-23 Thread Paul Moore
On Tuesday, 20 September 2016 05:45:53 UTC+1, Malcolm Greene  wrote:
>  I really appreciate the detailed response. You answered all my
>  questions. I'm looking forward to testing out your pylaunch wrapper. 

Just one further note, which may or may not be obvious.

If your application uses external dependencies from PyPI, you can bundle them 
with your application using pip's --target option.

Suppose you have an application "myapp", in a directory called "myapp", 
which depends on (say) click and requests. You have something like

myapp
    __main__.py
    mymod
        __init__.py
        code.py
        morecode.py

To install requests and click ready for deployment, you can do

pip install --target myapp requests click

That will give you the structure

myapp
    __main__.py
    mymod
        __init__.py
        code.py
        morecode.py
    requests
    requests-2.11.1.dist-info
    click
    click-6.6.dist-info

You can delete the .dist-info directories if you wish; they are package 
metadata and not really important for deployment.

Then just bundle your application using "python -m zipapp myapp" and you have 
a fully standalone .pyz file that doesn't require the user to have click or 
requests installed.
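
For completeness, a minimal __main__.py might look like this (the module
and function names are illustrative, not part of the recipe above):

# myapp/__main__.py
from mymod.code import main   # hypothetical entry point

if __name__ == "__main__":
    main()

zipapp can also write a shebang line into the archive for you, e.g.
python -m zipapp myapp -p "/usr/bin/env python3".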

There are some external dependencies that won't work when bundled in a zipapp 
(most notably anything with a C extension) but for the majority of cases this 
works just fine.

Also, again in case you're not aware because it wasn't very well publicised 
when it was released, you can test your application before zipping it just by 
giving Python the name of the directory structured as above (python myapp). 
This won't catch dependencies that don't like being zipped, but it will save 
you having to go through zip/test/unzip/fix/rezip cycles during development.

Anyway, I hope this is useful.

Paul
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: memory utilization blow up with dict structure

2016-09-23 Thread Christian
Am Freitag, 23. September 2016 12:02:47 UTC+2 schrieb Chris Angelico:
> On Fri, Sep 23, 2016 at 7:05 PM, Christian  wrote:
> > I'm wondering why Python blows up a dictionary structure so much.
> >
> > The ids and cat substructures can have 0..n entries, but in most cases 
> > they have <= 10; t is limited to <= 6.
> >
> > Example:
> >
> > {'0a0f7a3a0e09826caef1bff707785662': {'ids': 
> > {'aa316b86-8169-11e6-bab9-0050563e2d7c',
> >  'aa3174f0-8169-11e6-bab9-0050563e2d7c',
> >  'aa319408-8169-11e6-bab9-0050563e2d7c',
> >  'aa3195e8-8169-11e6-bab9-0050563e2d7c',
> >  'aa319732-8169-11e6-bab9-0050563e2d7c',
> >  'aa319868-8169-11e6-bab9-0050563e2d7c',
> >  'aa31999e-8169-11e6-bab9-0050563e2d7c',
> >  'aa319b06-8169-11e6-bab9-0050563e2d7c'},
> >   't': {'type1', 'type2'},
> >   'dt': datetime.datetime(2016, 9, 11, 15, 15, 54, 343000),
> >   'nids': 8,
> >   'ntypes': 2,
> >   'cat': [('ABC', 'aa316b86-8169-11e6-bab9-0050563e2d7c', '74', ''),
> >('ABC','aa3174f0-8169-11e6-bab9-0050563e2d7c', '3', 'type1'),
> >('ABC','aa319408-8169-11e6-bab9-0050563e2d7c','3', 'type1'),
> >('ABC','aa3195e8-8169-11e6-bab9-0050563e2d7c', '3', 'type2'),
> >('ABC','aa319732-8169-11e6-bab9-0050563e2d7c', '3', 'type1'),
> >('ABC','aa319868-8169-11e6-bab9-0050563e2d7c', '3', 'type1'),
> >('ABC','aa31999e-8169-11e6-bab9-0050563e2d7c', '3', 'type1'),
> >('ABC','aa319b06-8169-11e6-bab9-0050563e2d7c', '3', 'type2')]},
> >
> >
> > sys.getsizeof(superdict)
> > 50331744
> > len(superdict)
> > 941272
> 
> So... you have a million entries in the master dictionary, each of
> which has an associated collection of data, consisting of half a dozen
> things, some of which have subthings. The very smallest an object will
> ever be on a 64-bit Linux system is 16 bytes:
> 
> >>> sys.getsizeof(object())
> 16
> 
> and most of these will be much larger:
> 
> >>> sys.getsizeof(8)
> 28
> >>> sys.getsizeof(datetime.datetime(2016, 9, 11, 15, 15, 54, 343000))
> 48
> >>> sys.getsizeof([])
> 64
> >>> sys.getsizeof(('ABC', 'aa316b86-8169-11e6-bab9-0050563e2d7c', '74', ''))
> 80
> >>> sys.getsizeof('aa316b86-8169-11e6-bab9-0050563e2d7c')
> 85
> >>> sys.getsizeof({})
> 240
> 
> (Bear in mind that sys.getsizeof counts only the object itself, not
> the things it references - that's why the tuple can take up less space
> than one of its members.)

Thanks for this clarification!

> 
> I don't think your collections can average less than about 1KB (even
> the textual representation of your example data is about that big),
> and you have a million of them. That's a gigabyte of memory, right
> there. Your peak memory usage is showing 3GB, so most likely, my
> conservative estimates have put an absolute lower bound on this. Try
> doing everything exactly the same as you did, only without actually
> loading the pickle - then see what memory usage is. I think you'll
> find that the usage is fully legitimate.
> 
> > Thanks for any advice to save memory.
> 
> Use a database. I suggest PostgreSQL. You won't have to load
> everything into memory all at once that way, and (bonus!) you can even
> update stuff on disk without rewriting everything.

Yes, it seems I have no chance of avoiding that, especially because the dict 
example isn't smaller than it will be in reality. I'm in a trade-off between 
performance and scalability, so the dict construction should be as fast as 
possible, and having reads+writes (using mongodb) is a performance drawback.
Christian

> ChrisA

-- 
https://mail.python.org/mailman/listinfo/python-list


Re: memory utilization blow up with dict structure

2016-09-23 Thread Peter Otten
Christian wrote:

> Hi,
> 
> I'm wondering why Python blows up a dictionary structure so much.
> 
> The ids and cat substructures can have 0..n entries, but in most cases
> they have <= 10; t is limited to <= 6.
> 
> Thanks for any advice to save memory.
> Christian
> 
> 
> Example:
> 
> {'0a0f7a3a0e09826caef1bff707785662': {'ids':
> {'aa316b86-8169-11e6-bab9-0050563e2d7c',
>  'aa3174f0-8169-11e6-bab9-0050563e2d7c',
>  'aa319408-8169-11e6-bab9-0050563e2d7c',
>  'aa3195e8-8169-11e6-bab9-0050563e2d7c',
>  'aa319732-8169-11e6-bab9-0050563e2d7c',
>  'aa319868-8169-11e6-bab9-0050563e2d7c',
>  'aa31999e-8169-11e6-bab9-0050563e2d7c',
>  'aa319b06-8169-11e6-bab9-0050563e2d7c'},
>   't': {'type1', 'type2'},
>   'dt': datetime.datetime(2016, 9, 11, 15, 15, 54, 343000),
>   'nids': 8,
>   'ntypes': 2,
>   'cat': [('ABC', 'aa316b86-8169-11e6-bab9-0050563e2d7c', '74', ''),
>('ABC','aa3174f0-8169-11e6-bab9-0050563e2d7c', '3', 'type1'),
>('ABC','aa319408-8169-11e6-bab9-0050563e2d7c','3', 'type1'),
>('ABC','aa3195e8-8169-11e6-bab9-0050563e2d7c', '3', 'type2'),
>('ABC','aa319732-8169-11e6-bab9-0050563e2d7c', '3', 'type1'),
>('ABC','aa319868-8169-11e6-bab9-0050563e2d7c', '3', 'type1'),
>('ABC','aa31999e-8169-11e6-bab9-0050563e2d7c', '3', 'type1'),
>('ABC','aa319b06-8169-11e6-bab9-0050563e2d7c', '3', 'type2')]},

Not so much to save memory, but because redundant data always bears the risk 
of getting out of sync:

For a value v in your dict, do

v["ids"] == {t[1] for t in v["cat"]} 
len(v["ids"]) == len(v["cat"])

v["nids"] ==  len(v["ids"])
v["ntypes"] == len(v["t"])
v["t"] == {t[-1] for t in v["cat"]} - {""}

always hold? 

And if you want to go fancy: are the IDs always 128-bit integers that share 
all but the leading 32 bits?
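
If those invariants do hold, the fix is to keep only 'cat' (and 'dt') per
key and derive the rest on demand. A sketch, with made-up helper names:

def ids(v):
    return {t[1] for t in v["cat"]}

def types(v):
    return {t[-1] for t in v["cat"]} - {""}

def nids(v):
    return len(v["cat"])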

-- 
https://mail.python.org/mailman/listinfo/python-list


Re: pypy on windows much slower than linux/mac when using complex number type?

2016-09-23 Thread Irmen de Jong
On 20-9-2016 22:38, Irmen de Jong wrote:
> Hi,
> 
> I've stumbled across a peculiar performance issue with Pypy across some 
> different
> platforms. It was very visible in some calculation heavy code that I wrote 
> that uses
> Python's complex number type to calculate the well-known Mandelbrot set.
> 
> Pypy running the code on my Windows machine is doing okay, but when running 
> the same
> code on Pypy on different systems, the performance difference is so big it is 
> not even
> funny. The other implementations are MUCH faster than the windows one. Which 
> is quite
> unexpected because the other machines I've tested on have the same or much 
> lower
> physical CPU specs than the windows machine.  Here's the comparison:
> 
> Machine specs:
>  Windows: 64-bit Windows 7, Intel Core 2 Quad 3.4 GHz
>  Linux: 32-bit Mint 18, VirtualBox VM on the above Windows machine
>  Mac mini: OS X 10.11.6, Intel Core 2 Duo 2.53 GHz
> 
> The test code I've been using is here:
>  https://gist.github.com/irmen/c6b12b4cf88a6a4fcf5ff721c7089078
> 
> Test results:
>   function:  mandel   / iterations
>  Mac mini, Pypy 5.4.1 (64-bit):  0.81 sec / 0.65 sec
>  Linux, Pypy 5.1 (32-bit):   1.06 sec / 0.64 sec
>  Windows, Pypy 5.4.1 (32-bit):   5.59 sec / 2.87 sec
> 
> 
> What could cause such a huge difference?
> 
> Is it perhaps a compiler issue (where gcc/clang are MUCH better at optimizing 
> certain
> things, although I wonder how much of a factor this is because Pypy is doing 
> JITting by
> itself as far as I am aware)?   Or is something strange going on with the way 
> the
> complex number type is implemented?   (the difference doesn't occur when 
> using only floats)
> 
> 
> Regards
> Irmen de Jong
> 


The problem boiled down to a performance issue in Windows' 32-bit 
implementation of the hypot() function (which abs(z) uses when z is a 
complex number).
The 64-bit Windows CRT version is much faster (on par with what is to be 
expected from the Linux or OS X version), but unfortunately there's no 
64-bit PyPy implementation for Windows.
Replacing abs(z) with sqrt(r*r + i*i) avoids the problem and is even faster 
still.
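
In code, the workaround looks something like this (a sketch; note that
unlike hypot(), the naive formula can overflow for very large components):

from math import sqrt

def cabs(z):
    # Same result as abs(z) for complex z, but sidesteps the slow
    # 32-bit Windows hypot() implementation.
    return sqrt(z.real * z.real + z.imag * z.imag)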

More details here https://bitbucket.org/pypy/pypy/issues/2401

Cheers
Irmen de Jong

-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Obtain the raw line of text read by CSVDictReader when reporting errors?

2016-09-23 Thread Lawrence D’Oliveiro
On Friday, September 23, 2016 at 3:38:21 AM UTC+12, Chris Angelico wrote:
> This is why, despite the confusion it sometimes causes, we all prefer
> duck typing to static typing. The csv.DictReader wants a "file-like
> object", not necessarily a file - and in this case, all it asks is an
> iterable of lines, so a simple generator will work. This is true of
> MANY, if not all, places that a file is used.
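
A minimal sketch of that point, with the data inlined for illustration:

import csv

lines = ["name,age", "alice,30", "bob,25"]   # any iterable of lines works
for row in csv.DictReader(lines):
    print(row["name"], row["age"])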

Duck typing is great for sticking pieces of Python code together.

And anybody who doesn’t like it can go Java themselves...
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: how to automate java application in window using python

2016-09-23 Thread Lawrence D’Oliveiro
On Thursday, September 22, 2016 at 8:34:20 AM UTC+12, Emile wrote:
> Hmm, then I'll have to wait longer to experience the unreliability as 
> the handful of automated gui tools I'm running has only been up 10 to 12 
> years or so.

You sound like you have a solution for the OP, then.
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: How to import all things defined the files in a module directory in __init__.py?

2016-09-23 Thread Lawrence D’Oliveiro
On Friday, September 23, 2016 at 4:25:21 AM UTC+12, Chris Angelico wrote:
> For reference, the Decimal module (ignoring the C accelerator) is over six
> thousand lines of code, as a single module. Now, that might be pushing the
> boundaries a bit ...

What “boundaries” do you think that might be pushing? 6000 lines doesn’t sound 
unreasonable to me at all.
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: How to import all things defined the files in a module directory in __init__.py?

2016-09-23 Thread Chris Angelico
On Sat, Sep 24, 2016 at 11:42 AM, Lawrence D’Oliveiro
 wrote:
> On Friday, September 23, 2016 at 4:25:21 AM UTC+12, Chris Angelico wrote:
>> For reference, the Decimal module (ignoring the C accelerator) is over six
>> thousand lines of code, as a single module. Now, that might be pushing the
>> boundaries a bit ...
>
> What “boundaries” do you think that might be pushing? 6000 lines doesn’t 
> sound unreasonable to me at all.

It's a large and complex module, at about the point where it would be worth
breaking up. So it's likely to be the largest file in the stdlib
(not counting tests).

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Obtain the raw line of text read by CSVDictReader when reporting errors?

2016-09-23 Thread Tim Chase
On 2016-09-23 16:58, Lawrence D’Oliveiro wrote:
> Duck type is great for sticking pieces of Python code together.
> 
> And anybody who doesn’t like it can go Java themselves...

Sorry, my source code doesn't declare that I support JavaInterface...



-tkc




-- 
https://mail.python.org/mailman/listinfo/python-list