[Python-Dev] Discussion related to memory leaks requested
Hi:

I've spent some time performing memory leak analysis while using Python in an embedded configuration. The pattern is:

    Py_Initialize();
    /* ... run empty python source file ... */
    Py_Finalize();

I've identified several suspect areas, including dictionary maintenance in import.c, around line 414:

    /* Clear the modules dict. */
    PyDict_Clear(modules);
    /* Restore the original builtins dict, to ensure
       that any user data gets cleared. */
    dict = PyDict_Copy(interp->builtins);
    if (dict == NULL)
        PyErr_Clear();
    PyDict_Clear(interp->builtins);
    if (PyDict_Update(interp->builtins, interp->builtins_copy))
        PyErr_Clear();
    Py_XDECREF(dict);
    /* Clear module dict copies stored in the interpreter state */

Is there someone in the group who would like to discuss this topic? There seem to be other leaks as well. I'm new to Python-dev, but willing to help or work with someone who is more familiar with these areas than I am.

Thanks,
Matt
--
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Discussion related to memory leaks requested
Hi Victor:

No, I'm using the new heap analysis functions in DS2015.

We think we have found one issue. In the following sequence, dict has no side effects, yet it is used -- unless someone can shed light on why dict is needed in this case:

    /* Clear the modules dict. */
    PyDict_Clear(modules);
    /* Restore the original builtins dict, to ensure
       that any user data gets cleared. */
    dict = PyDict_Copy(interp->builtins);
    if (dict == NULL)
        PyErr_Clear();
    PyDict_Clear(interp->builtins);
    if (PyDict_Update(interp->builtins, interp->builtins_copy))
        PyErr_Clear();
    Py_XDECREF(dict);

And removing dict from this sequence seems to have fixed one of the issues, yielding 14k per iteration.

> Are you able to reproduce the leak with a simple program?

Good idea. We will try that -- right now it's embedded in a more complex environment, but we have tried to strip it down to a very simple sequence.

The next item on our list is memory that is not getting freed after running a simple string. It's in the parsetok sequence -- it seems that the syntax tree is not getting cleared -- but this opinion is preliminary.

Best,
Matt

On 1/13/2016 5:10 PM, Victor Stinner wrote:
> Hi,
>
> 2016-01-13 20:32 GMT+01:00 Matthew Paulson:
>> I've spent some time performing memory leak analysis while using Python in an embedded configuration.
>
> Hum, did you try tracemalloc?
> https://docs.python.org/dev/library/tracemalloc.html
> https://pytracemalloc.readthedocs.org/
>
>> Is there someone in the group who would like to discuss this topic? There seem to be other leaks as well. I'm new to Python-dev, but willing to help or work with someone who is more familiar with these areas than I am.
>
> Are you able to reproduce the leak with a simple program?
>
> Victor
Re: [Python-Dev] Discussion related to memory leaks requested
Hi Andrew:

These are all good points, and I defer to your experience -- I am new to Python internals -- but the fact remains that after multiple iterations of our embedded test case, we are seeing continued allocations (DS2015) and growth of the working set (Windows Task Manager). If you are pooling resources on the free list, wouldn't you expect these items to get reused and for things to stabilize after a while? We're not seeing that.

I think Victor's suggestion of a very simple test case is probably the best idea. I'll try to put that together in the next few days, and if it also demonstrates the problem, then I'll submit it here.

Thanks for your time and help.

Best,
Matt

On 1/13/2016 6:45 PM, Andrew Barnert wrote:
> On Jan 13, 2016, at 14:49, Matthew Paulson <paul...@busiq.com> wrote:
>> Hi Victor: No, I'm using the new heap analysis functions in DS2015.
>
> Isn't that going to report any memory that Python's higher-level allocators hold in their freelists as leaked, even though it isn't leaked?
>
>> We think we have found one issue. In the following sequence, dict has no side effects, yet it is used -- unless someone can shed light on why dict is used in this case:
>
> Where do you see an issue here? The dict will have one ref, so the decref at the end should return it to the freelist.
>
> Also, it looks like there _is_ a side effect here. When you add a bunch of elements to a dict, it grows. When you delete a bunch of elements, it generally doesn't shrink. But when you clear the dict, it does shrink. So, copying it to a temporary dict, clearing it, updating it from the temporary dict, and then releasing the temporary dict should force it to shrink.
>
> So, the overall effect should be that you have a smaller hash table for the builtins dict, and a chunk of memory sitting on the freelists ready to be reused. If your analyzer is showing the freelists as leaked, this will look like a net leak rather than a net recovery, but that's just a problem in the analyzer.
>
> Of course I could be wrong, but I think the first step is to rule out the possibility that you're measuring the wrong thing...
>
>>     /* Clear the modules dict. */
>>     PyDict_Clear(modules);
>>     /* Restore the original builtins dict, to ensure
>>        that any user data gets cleared. */
>>     dict = PyDict_Copy(interp->builtins);
>>     if (dict == NULL)
>>         PyErr_Clear();
>>     PyDict_Clear(interp->builtins);
>>     if (PyDict_Update(interp->builtins, interp->builtins_copy))
>>         PyErr_Clear();
>>     Py_XDECREF(dict);
>>
>> And removing dict from this sequence seems to have fixed one of the issues, yielding 14k per iteration.
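Andrew's point about freelists can be illustrated with a toy free list in C. This is a hypothetical sketch, not CPython's actual freelist code: freed objects are chained for reuse instead of being handed back to the C heap, so a heap analyzer that only watches malloc/free will report them as leaked even though the very next allocation reuses them.

```c
#include <stddef.h>
#include <stdlib.h>

/* A miniature freelist in the style of CPython's per-type freelists
   (illustrative only; names and sizes are made up). */

typedef struct node {
    struct node *next;   /* link used only while the node sits on the freelist */
    char payload[64];
} node;

static node *freelist = NULL;

node *node_alloc(void)
{
    if (freelist != NULL) {      /* reuse a cached node if one is available */
        node *n = freelist;
        freelist = n->next;
        return n;
    }
    return malloc(sizeof(node)); /* otherwise fall back to the heap */
}

void node_free(node *n)
{
    n->next = freelist;          /* push onto the freelist instead of free() */
    freelist = n;
}
```

From malloc's point of view the node is never released, which is exactly the "net leak" a naive heap report would show, even though the allocator recycles it.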
Re: [Python-Dev] Discussion related to memory leaks requested
Hi All:

I've created a simple program to make sure I wasn't lying to you all ;-> Here it is:

    for (ii = 0; ii < 100; ii++) {
        Py_Initialize();
        if ((code = Py_CompileString(p, "foo", Py_file_input)) == NULL)
            printf("Py_CompileString() failed\n");
        else {
            if (PyRun_SimpleString(p) == -1)
                printf("PyRun_SimpleString() failed\n");
            Py_CLEAR(code);
        }
        Py_Finalize();
    }

This sequence causes about 10k growth per iteration, and after many cycles there's no indication that any pooling logic is helping. Our "useful" example is slightly more complex, which may explain why I was seeing about 16k per iteration. Unless I've done something obviously wrong, I tend to believe Benjamin's claim that this issue is well known.

Suggestion: I have had great success with similar problems in the past by using a pools implementation sitting on top of what I call a "block memory allocator". The bottom (block) allocator grabs large blocks from the heap and then doles them out to the pools layer, which in turn doles them out to the requester. When client memory is freed, it is NOT returned to the heap -- rather, it's added to the pool containing like-sized blocks -- call it an "organized free list". This is a very, very fast way to handle high-allocation-frequency patterns. Finally, during shutdown, the pool simply vaporizes and the block allocator returns the (fewer) large blocks back to the heap. This avoids thrashing the heap, forcing it to coalesce inefficiently, and also avoids heap fragmentation, which can cause unwanted growth as well...

Note that this would be a "hard reset" of all allocated memory, and any global data in the text segment would also have to be cleared, but it would provide a fast, clean way to ensure that each invocation was 100% clean. I don't claim to understand all the intricacies of the many ways Python can be embedded, but as I said, this strategy has worked very well for me in the past building servers written in C that have to stay up for months at a time.
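The two-layer scheme described above can be sketched in a few dozen lines of C. This is a hypothetical minimal illustration (not code from CPython or from any production server): a single size class, a block layer that grabs 64 KB blocks from the heap, a pool free list for recycling chunks, and a reset that returns all the large blocks at once.

```c
#include <stddef.h>
#include <stdlib.h>

/* Sketch of a pools-on-blocks allocator, per the description above.
   One size class only, for brevity; all names are illustrative. */

#define BLOCK_SIZE (64 * 1024)   /* size of each large block from the heap */
#define CHUNK_SIZE 64            /* the single size class handed to clients */

typedef struct block {
    struct block *next;          /* chain of large blocks, for bulk release */
    size_t used;                 /* bytes already carved out of this block */
    char mem[BLOCK_SIZE];
} block;

static block *blocks = NULL;     /* the block layer's list of large blocks */
static void *free_chunks = NULL; /* the pool: an "organized free list" */

void *pool_alloc(void)
{
    if (free_chunks) {                      /* 1. reuse a recycled chunk */
        void *p = free_chunks;
        free_chunks = *(void **)p;
        return p;
    }
    if (!blocks || blocks->used + CHUNK_SIZE > BLOCK_SIZE) {
        block *b = malloc(sizeof(block));   /* 2. grab a new large block */
        if (!b) return NULL;
        b->next = blocks;
        b->used = 0;
        blocks = b;
    }
    void *p = blocks->mem + blocks->used;   /* 3. carve from current block */
    blocks->used += CHUNK_SIZE;
    return p;
}

void pool_free(void *p)
{
    *(void **)p = free_chunks;   /* back to the pool, never to the heap */
    free_chunks = p;
}

void pool_reset(void)            /* the "hard reset" at shutdown */
{
    while (blocks) {             /* return the (fewer) large blocks */
        block *b = blocks;
        blocks = b->next;
        free(b);
    }
    free_chunks = NULL;
}
```

Because individual frees never touch the heap, allocation and release are a couple of pointer moves each, and the heap only sees a handful of large malloc/free pairs, which is what avoids the thrashing and fragmentation described above.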
Happy to discuss further, if anyone has any interest.

Best,
Matt

On 1/14/2016 4:45 AM, Nick Coghlan wrote:
> On 14 January 2016 at 15:42, Benjamin Peterson wrote:
>> This is a "well-known" issue. Parts of the interpreter (and especially, extension modules) cheerfully stash objects in global variables with no way to clean them up. Fixing this is a large project, which probably involves implementing PEP 489.
>
> The actual multi-phase extension module import system from PEP 489 was implemented for 3.5, but indeed, the modules with stashed global state haven't been converted yet. I didn't think we loaded any of those by default, though...
>
> Cheers,
> Nick.