[Python-Dev] Discussion related to memory leaks requested

2016-01-13 Thread Matthew Paulson

Hi:

I've spent some time performing memory leak analysis while using Python 
in an embedded configuration.


The pattern is:

   Py_Initialize();

   ... run empty python source file ...

   Py_Finalize();
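
A minimal harness for this pattern might look like the sketch below (the
Windows-specific GetProcessMemoryInfo()/psapi call is just one way to watch
the working set per cycle; it is illustrative and not part of the Python API):

#include <Python.h>
#include <windows.h>
#include <psapi.h>
#include <stdio.h>

/* Sketch only: repeat the Initialize/run/Finalize cycle and print the
   process working set after each pass.  Link with psapi.lib. */
int main(void)
{
    PROCESS_MEMORY_COUNTERS pmc;
    int i;

    for (i = 0; i < 100; i++) {
        Py_Initialize();
        PyRun_SimpleString("pass\n");   /* effectively an empty source file */
        Py_Finalize();

        if (GetProcessMemoryInfo(GetCurrentProcess(), &pmc, sizeof(pmc)))
            printf("iteration %3d: working set %lu KB\n",
                   i, (unsigned long)(pmc.WorkingSetSize / 1024));
    }
    return 0;
}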


I've identified several suspect areas, including the dictionary maintenance 
code in import.c (around line 414):


/* Clear the modules dict. */
PyDict_Clear(modules);
/* Restore the original builtins dict, to ensure that any
   user data gets cleared. */
dict = PyDict_Copy(interp->builtins);
if (dict == NULL)
    PyErr_Clear();
PyDict_Clear(interp->builtins);
if (PyDict_Update(interp->builtins, interp->builtins_copy))
    PyErr_Clear();
Py_XDECREF(dict);
/* Clear module dict copies stored in the interpreter state */


Is there someone in the group who would like to discuss this topic?  
There seem to be other leaks as well.  I'm new to Python-dev, but 
willing to help, or to work with someone who is more familiar with these 
areas than I am.


Thanks,

Matt




Re: [Python-Dev] Discussion related to memory leaks requested

2016-01-13 Thread Matthew Paulson

Hi Victor:

No, I'm using the new heap analysis functions in DS2015.  We think we 
have found one issue. In the following sequence, dict has no side 
effects, yet it is used -- unless someone can shed light on why dict is 
used in this case:


/* Clear the modules dict. */
PyDict_Clear(modules);
/* Restore the original builtins dict, to ensure that any
   user data gets cleared. */
dict = PyDict_Copy(interp->builtins);
if (dict == NULL)
    PyErr_Clear();
PyDict_Clear(interp->builtins);
if (PyDict_Update(interp->builtins, interp->builtins_copy))
    PyErr_Clear();
Py_XDECREF(dict);

And removing dict from this sequence seems to have fixed one of the 
issues, yielding about 14 KB per iteration.


Simple program: Good idea.  We will try that -- right now it's embedded 
in a more complex environment, but we have tried to strip it down to a 
very simple sequence.


The next item on our list is memory that is not getting freed after 
running a simple string.  It's in the parsetok sequence -- it seems that 
the syntax tree is not getting cleared -- but this opinion is preliminary.
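
If the parse tree is the suspect, one way to narrow it down is to parse the
same source repeatedly and free the tree explicitly.  A sketch only, assuming
the 3.5-era parser API (PyParser_SimpleParseStringFlags and PyNode_Free are
the public entry points for that layer; parse_only is a made-up helper):

#include <Python.h>
#include <node.h>    /* node, PyNode_Free */

/* Sketch: call with the interpreter already initialized.  If growth
   disappears here but persists with PyRun_SimpleString(), the memory is
   probably being held further along the compile/exec path, not by the
   parser itself. */
static void parse_only(const char *source, int iterations)
{
    int i;
    for (i = 0; i < iterations; i++) {
        node *tree = PyParser_SimpleParseStringFlags(source, Py_file_input, 0);
        if (tree == NULL) {
            PyErr_Clear();      /* syntax error or out of memory */
            continue;
        }
        PyNode_Free(tree);      /* release the syntax tree right away */
    }
}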


Best,

Matt

On 1/13/2016 5:10 PM, Victor Stinner wrote:

Hi,

2016-01-13 20:32 GMT+01:00 Matthew Paulson:

I've spent some time performing memory leak analysis while using Python in an 
embedded configuration.

Hum, did you try tracemalloc?

https://docs.python.org/dev/library/tracemalloc.html
https://pytracemalloc.readthedocs.org/
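
In an embedded build, tracemalloc can be driven from the host application; a
rough sketch (assuming Python 3.4 or later) would be:

/* Sketch: start tracemalloc right after Py_Initialize(), then dump the top
   allocation sites before finalizing.  It only sees allocations made through
   Python's allocators while tracing is active. */
Py_Initialize();
PyRun_SimpleString("import tracemalloc; tracemalloc.start()");

/* ... run the embedded workload ... */

PyRun_SimpleString(
    "import tracemalloc\n"
    "snap = tracemalloc.take_snapshot()\n"
    "for stat in snap.statistics('lineno')[:10]:\n"
    "    print(stat)\n");
Py_Finalize();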


Is there someone in the group who would like to discuss this topic?  There 
seem to be other leaks as well.  I'm new to Python-dev, but willing to help, or 
to work with someone who is more familiar with these areas than I am.

Are you able to reproduce the leak with a simple program?

Victor






Re: [Python-Dev] Discussion related to memory leaks requested

2016-01-13 Thread Matthew Paulson

Hi Andrew:

These are all good points, and I defer to your experience -- I am new to 
Python internals -- but the fact remains that after multiple iterations of 
our embedded test case, we are seeing continued allocations (DS2015) and 
growth of the working set (Windows Task Manager).  If you are pooling 
resources on the free list, wouldn't you expect these items to get 
reused and for things to stabilize after a while?  We're not seeing that.


I think Victor's suggestion of a very simple test case is probably the 
best idea.  I'll try to put that together in the next few days and if it 
also demonstrates the problem, then I'll submit it here.


Thanks for your time and help.

Best,

Matt

On 1/13/2016 6:45 PM, Andrew Barnert wrote:
On Jan 13, 2016, at 14:49, Matthew Paulson wrote:



Hi Victor:

No, I'm using the new heap analysis functions in DS2015.


Isn't that going to report any memory that Python's higher level 
allocators hold in their freelists as leaked, even though it isn't leaked?


We think we have found one issue. In the following sequence, dict has 
no side effects, yet it is used -- unless someone can shed light on 
why dict is used in this case:


Where do you see an issue here? The dict will have one ref, so the 
decref at the end should return it to the freelist.


Also, it looks like there _is_ a side effect here. When you add a 
bunch of elements to a dict, it grows. When you delete a bunch of 
elements, it generally doesn't shrink. But when you clear the dict, it 
does shrink. So, copying it to a temporary dict, clearing it, updating 
it from the temporary dict, and then releasing the temporary dict 
should force it to shrink.
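
Rendered as a standalone sketch (shrink_in_place here is a hypothetical
helper, not anything that exists in CPython), that sequence would look
roughly like:

/* Sketch: rebuild a dict's hash table at its natural size while keeping its
   contents.  PyDict_Clear() drops the grown table; PyDict_Update() refills a
   freshly sized one from the temporary copy. */
static int shrink_in_place(PyObject *d)
{
    PyObject *tmp = PyDict_Copy(d);     /* temporary copy holds the contents */
    if (tmp == NULL)
        return -1;
    PyDict_Clear(d);                    /* releases the oversized table */
    if (PyDict_Update(d, tmp) < 0) {
        Py_DECREF(tmp);
        return -1;
    }
    Py_DECREF(tmp);                     /* copy goes back to the allocator/freelist */
    return 0;
}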


So, the overall effect should be that you have a smaller hash table 
for the builtins dict, and a chunk of memory sitting on the freelists 
ready to be reused. If your analyzer is showing the freelists as 
leaked, this will look like a net leak rather than a net recovery, but 
that's just a problem in the analyzer.
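
One way to tell freelist/arena retention apart from a genuine leak is to dump
pymalloc's own statistics at the end of each cycle and see whether they level
off; for example (sys._debugmallocstats() is available in recent 3.x releases
and writes to stderr):

/* Sketch: if these numbers stabilize after a few iterations while the
   debugger still reports "leaked" blocks, the blocks are parked on
   freelists/arenas rather than lost. */
PyRun_SimpleString("import sys; sys._debugmallocstats()");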


Of course I could be wrong, but I think the first step is to rule out 
the possibility that you're measuring the wrong thing...





Re: [Python-Dev] Discussion related to memory leaks requested

2016-01-14 Thread Matthew Paulson

Hi All:

I've created a simple program to make sure I wasn't lying to you all ;->

Here it is:

/* p holds the Python source text; code receives the compiled code object
   (their declarations are not shown here). */
for (ii = 0; ii < 100; ii++)
{
    Py_Initialize();

    if ((code = Py_CompileString(p, "foo", Py_file_input)) == NULL)
        printf("Py_CompileString() failed\n");
    else
    {
        if (PyRun_SimpleString(p) == -1)
            printf("PyRun_SimpleString() failed\n");

        Py_CLEAR(code);
    }

    Py_Finalize();
}

This sequence causes about 10 KB of growth per iteration, and after many 
cycles there's no indication that any pooling logic is helping.  Our 
"useful" example is slightly more complex, which may explain why I was 
seeing about 16 KB per iteration there.


Unless I've done something obviously wrong, I tend to believe Benjamin's 
claim that this issue is well known.


Suggestion: I have had great success with similar problems in the past 
by using a pools implementation sitting on top of what I call a "block 
memory allocator".   The bottom (block) allocator grabs large blocks 
from the heap and doles them out to the pools layer, which in turn 
doles them out to the requester.  When client memory is freed, it is 
NOT returned to the heap; rather, it is added to the pool that holds 
like-sized blocks -- call it an "organized free list".  This is a very, 
very fast way to handle high-allocation-frequency patterns.  Finally, 
during shutdown, the pools simply vaporize and the block allocator 
returns the (fewer) large blocks to the heap.  This avoids thrashing the 
heap by forcing it to coalesce inefficiently, and it also avoids heap 
fragmentation, which can cause unwanted growth as well.


Note that this would be a "hard reset" of all allocated memory, and any 
global data in static storage would also have to be cleared, but it 
would provide a fast, clean way to ensure that each invocation is 100% 
clean.
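
A minimal sketch of that block-plus-pools shape (all names here -- block_t, 
pool_alloc and so on -- are illustrative, not an existing API):

#include <stddef.h>
#include <stdlib.h>

/* Sketch of a two-level allocator: a "block" layer grabs large chunks from
   the heap; a "pool" layer carves them into like-sized pieces and parks
   freed pieces on per-size free lists instead of returning them. */

#define BLOCK_SIZE   (256 * 1024)           /* size of each chunk from the heap */
#define NUM_CLASSES  8                      /* pools for 16, 32, ..., 2048 bytes */

typedef struct block    { struct block *next; }    block_t;
typedef struct freeitem { struct freeitem *next; } freeitem_t;

typedef struct {
    block_t    *blocks;                     /* every large chunk ever grabbed */
    char       *bump;                       /* next unused byte in newest chunk */
    size_t      remaining;                  /* bytes left in newest chunk */
    freeitem_t *free_lists[NUM_CLASSES];    /* the "organized free list" */
} pool_allocator_t;                         /* zero-initialize before use */

static size_t class_size(int c) { return (size_t)16 << c; }

static int size_class(size_t n)
{
    int c = 0;
    while (class_size(c) < n)
        c++;
    return c;
}

static void *pool_alloc(pool_allocator_t *a, size_t n)
{
    int c;
    size_t sz;

    if (n > class_size(NUM_CLASSES - 1))
        return NULL;                        /* oversized requests not handled here */
    c = size_class(n);
    sz = class_size(c);

    if (a->free_lists[c] != NULL) {         /* reuse a previously freed piece */
        freeitem_t *item = a->free_lists[c];
        a->free_lists[c] = item->next;
        return item;
    }
    if (a->remaining < sz) {                /* grab another large block */
        block_t *b = malloc(sizeof(block_t) + BLOCK_SIZE);
        if (b == NULL)
            return NULL;
        b->next = a->blocks;
        a->blocks = b;
        a->bump = (char *)(b + 1);
        a->remaining = BLOCK_SIZE;
    }
    a->remaining -= sz;
    a->bump += sz;
    return a->bump - sz;
}

static void pool_free(pool_allocator_t *a, void *p, size_t n)
{
    /* Nothing goes back to the heap; the piece is parked on its size class. */
    freeitem_t *item = (freeitem_t *)p;
    int c = size_class(n);
    item->next = a->free_lists[c];
    a->free_lists[c] = item;
}

static void pool_destroy(pool_allocator_t *a)
{
    /* The "hard reset": hand the few large blocks back to the heap in one pass. */
    block_t *b = a->blocks;
    int c;
    while (b != NULL) {
        block_t *next = b->next;
        free(b);
        b = next;
    }
    a->blocks = NULL;
    a->bump = NULL;
    a->remaining = 0;
    for (c = 0; c < NUM_CLASSES; c++)
        a->free_lists[c] = NULL;
}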


I don't claim to understand all the intricacies of the many ways Python 
can be embedded, but as I said, this strategy has worked very well for 
me in the past when building servers written in C that have to stay up 
for months at a time.


Happy to discuss further, if anyone has any interest.

Best,

Matt




On 1/14/2016 4:45 AM, Nick Coghlan wrote:

On 14 January 2016 at 15:42, Benjamin Peterson wrote:

This is a "well-known" issue. Parts of the interpreter (and especially,
extension modules) cheerfully stash objects in global variables with no
way to clean them up. Fixing this is a large project, which probably
involves implementing PEP 489.

The actual multi-phase extension module import system from PEP 489 was
implemented for 3.5, but indeed, the modules with stashed global state
haven't been converted yet.

I didn't think we loaded any of those by default, though...
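
For reference, a minimal sketch of what a PEP 489 multi-phase module looks
like ("examplemod" is a made-up name):

#include <Python.h>

/* Sketch: with multi-phase initialization the module keeps its state in the
   module object (m_size) rather than in C globals, so it can be torn down
   cleanly at interpreter shutdown. */
static int
examplemod_exec(PyObject *module)
{
    /* Per-module setup goes here instead of into static variables. */
    return 0;
}

static PyModuleDef_Slot examplemod_slots[] = {
    {Py_mod_exec, examplemod_exec},
    {0, NULL}
};

static struct PyModuleDef examplemod_def = {
    PyModuleDef_HEAD_INIT,
    "examplemod",       /* m_name */
    NULL,               /* m_doc */
    0,                  /* m_size: per-module state size; >= 0 for multi-phase */
    NULL,               /* m_methods */
    examplemod_slots,   /* m_slots */
    NULL, NULL, NULL    /* m_traverse, m_clear, m_free */
};

PyMODINIT_FUNC
PyInit_examplemod(void)
{
    /* Return the def itself; the import machinery creates the module. */
    return PyModuleDef_Init(&examplemod_def);
}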

Cheers,
Nick.


