[issue39061] Garbage Collection optimizations cause "memory leak"

2019-12-16 Thread Maxime Istasse


New submission from Maxime Istasse:

If a middle-generation collection kicks in while I am still working on a 
self-referencing object in the young generation, that object is moved directly 
to the old generation (if I understood this correctly: 
https://github.com/python/cpython/blob/d68b592dd67cb87c4fa862a8d3b3fd0a7d05e113/Modules/gcmodule.c#L1192).
It then won't be freed until the old generation is collected, which happens 
much later (because of this: 
https://github.com/python/cpython/blob/d68b592dd67cb87c4fa862a8d3b3fd0a7d05e113/Modules/gcmodule.c#L1388).
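One way to observe how rarely the oldest generation is collected is to hook into gc.callbacks and count the collections started in each generation while a workload creates self-referencing objects. This is only a sketch: count_collections and make_cycles are hypothetical names, the number of allocations is arbitrary, and the exact counts depend on the interpreter version and the current thresholds, so none are claimed here.

```python
import gc
from collections import Counter


def count_collections(workload):
    """Run workload() and count how many collections start in each generation."""
    seen = Counter()

    def on_gc(phase, info):
        # The collector calls every entry of gc.callbacks with phase "start" or
        # "stop" and an info dict whose "generation" key is the generation collected.
        if phase == "start":
            seen[info["generation"]] += 1

    gc.callbacks.append(on_gc)
    try:
        workload()
    finally:
        gc.callbacks.remove(on_gc)
    return seen


def make_cycles(n=200_000):
    """Allocate n self-referencing lists that only the cyclic GC can free."""
    for _ in range(n):
        node = []
        node.append(node)


if __name__ == "__main__":
    print("thresholds:", gc.get_threshold())
    print("collections per generation:", count_collections(make_cycles))
```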

This causes huge memory leaks if the self-referencing objects occupy a lot of 
RAM, which is to be expected.

This is, of course, the kind of problem I would expect from garbage collection 
with badly chosen parameters.

However, I also expected that tuning threshold0 would be sufficient to solve 
it. Unfortunately, because the object is moved to the old generation every time 
the middle-generation collection runs, the problem is bound to happen once in a 
while, eventually leading to very high memory consumption.
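For reference, the thresholds mentioned here are read and set with gc.get_threshold() and gc.set_threshold(threshold0, threshold1, threshold2). The gc_thresholds context manager below is a hypothetical helper for experimenting with a larger threshold0 and restoring the previous values afterwards. As the report explains, changing threshold0 only changes how often collections run; it does not change where the survivors of a middle-generation collection end up.

```python
import gc
from contextlib import contextmanager


@contextmanager
def gc_thresholds(threshold0, threshold1=None, threshold2=None):
    """Temporarily override the collector thresholds (hypothetical helper).

    Omitted values keep their current setting; the previous thresholds are
    restored on exit, even if the body raises.
    """
    previous = gc.get_threshold()
    new = (
        threshold0,
        previous[1] if threshold1 is None else threshold1,
        previous[2] if threshold2 is None else threshold2,
    )
    gc.set_threshold(*new)
    try:
        yield new
    finally:
        gc.set_threshold(*previous)


if __name__ == "__main__":
    print("before:", gc.get_threshold())
    with gc_thresholds(10_000):
        print("inside:", gc.get_threshold())
    print("after:", gc.get_threshold())
```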

I think the best and simplest solution would be to move objects up one 
generation at a time. This would prevent heavy but short-lived objects from 
making it to the old generation.
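The difference between the current promotion rule and the proposed one can be illustrated with a deliberately small model. ToyGC and promoted are hypothetical names; the model is not CPython's gcmodule.c and not the author's patch. It assumes every object becomes cyclic garbage once it is no longer used (so only a collection reclaims it), uses tiny thresholds so the effect shows up quickly, treats every threshold1-th young collection as a middle collection, and never collects the oldest generation; it only counts what reaches it.

```python
class ToyGC:
    """Toy three-generation collector; a model of promotion only.

    Each object is represented by the tick at which it stops being used.
    policy="merge": a middle collection merges gens 0 and 1 and moves every
    survivor to gen 2 (the behaviour described in this issue).
    policy="one_step": each survivor moves up exactly one generation (the fix
    proposed in this issue).
    """

    def __init__(self, policy, threshold0=5, threshold1=3):
        if policy not in ("merge", "one_step"):
            raise ValueError(f"unknown policy: {policy!r}")
        self.policy = policy
        self.threshold0 = threshold0  # young objects that trigger a collection
        self.threshold1 = threshold1  # every Nth collection also covers gen 1
        self.gens = [[], [], []]
        self.runs = 0
        self.now = 0
        self.promoted_to_old = 0

    def _alive(self, gen):
        return [dies_at for dies_at in gen if dies_at > self.now]

    def allocate(self, lifetime):
        self.now += 1
        self.gens[0].append(self.now + lifetime)
        if len(self.gens[0]) < self.threshold0:
            return
        self.runs += 1
        young = self._alive(self.gens[0])
        self.gens[0] = []
        if self.runs % self.threshold1:
            # Young-only collection: survivors move up to gen 1.
            self.gens[1].extend(young)
            return
        middle = self._alive(self.gens[1])
        if self.policy == "merge":
            # Gens 0 and 1 are collected together; all survivors go to gen 2.
            self.gens[1] = []
            to_old = young + middle
        else:
            # Young survivors go to gen 1; only gen-1 survivors reach gen 2.
            self.gens[1] = young
            to_old = middle
        self.gens[2].extend(to_old)
        self.promoted_to_old += len(to_old)


def promoted(policy, n=150, lifetime=2):
    """Allocate n objects living `lifetime` ticks; count how many reached gen 2."""
    model = ToyGC(policy)
    for _ in range(n):
        model.allocate(lifetime)
    return model.promoted_to_old


if __name__ == "__main__":
    print(promoted("merge"), promoted("one_step"))  # 20 0
```

With objects that live only 2 ticks, the merge policy promotes the two objects still in use at each of the 10 middle collections, while the one-step policy promotes none, because those objects are dead by the next middle collection. Long-lived objects still reach the old generation under both policies, just one collection later with one_step.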

--
components: Interpreter Core
files: late_gc.py
messages: 358469
nosy: mistasse
priority: normal
severity: normal
status: open
title: Garbage Collection optimizations cause "memory leak"
type: resource usage
versions: Python 3.7
Added file: https://bugs.python.org/file48780/late_gc.py

___
Python tracker 
<https://bugs.python.org/issue39061>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue39061] Garbage Collection optimizations cause "memory leak"

2019-12-16 Thread Maxime Istasse


Maxime Istasse added the comment:

TL;DR: a short-lived object can go directly from the young generation to the 
old generation if a middle-generation collection kicks in while it is not yet 
freeable. The old generation is very rarely collected. If several such objects 
are involved in reference cycles, they can therefore pile up there and use a 
lot of RAM when big objects are attached to them. (Without reference cycles, 
the refcount drops to 0 and everything is fine.)
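The parenthetical point about reference counting can be checked directly. In this sketch, Node and freed_by_refcount are hypothetical names and the 1 MB bytearray stands in for a big attached object. The cyclic collector is disabled while the last name bound to the object is dropped: without a cycle, CPython frees the object immediately through reference counting, whereas a self-referencing object stays alive until a gc.collect() reclaims it. This relies on CPython's reference counting and would not hold on implementations without it.

```python
import gc
import weakref


class Node:
    """Hypothetical object carrying a large payload."""

    def __init__(self):
        self.payload = bytearray(10**6)  # stands in for a big attached object
        self.other = None


def freed_by_refcount(cyclic):
    """Return True if dropping the last name frees the Node without the cyclic GC."""
    was_enabled = gc.isenabled()
    gc.disable()  # make sure only reference counting can free the object
    try:
        node = Node()
        if cyclic:
            node.other = node  # self-reference: the refcount never drops to 0
        probe = weakref.ref(node)
        del node
        freed = probe() is None
        gc.collect()  # explicit collections still run while gc is disabled
        return freed
    finally:
        if was_enabled:
            gc.enable()


if __name__ == "__main__":
    print(freed_by_refcount(cyclic=False), freed_by_refcount(cyclic=True))  # True False
```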

This seems to happen in 3.8 as well, and most likely in older versions too. To 
me, these conditions are not exceptional enough to be ignored. 
I'm starting to work on a fix, though there's no guarantee yet...

--
versions:  -Python 3.7




[issue39061] Garbage Collection optimizations cause "memory leak"

2019-12-16 Thread Maxime Istasse


Maxime Istasse added the comment:

> The fact that the GC will take longer does not qualify this as a memory 
> leak. A memory leak is by definition memory that cannot be reclaimed, and in 
> this case, once the collection of the old generation happens, it will be 
> collected; therefore it is not a "leak" per se.

I completely agree; I did not know what else to call it. It's just that I 
really believed I was writing GC-friendly code, and my assumptions turned out 
to be very wrong. The result of my investigation seems so counter-intuitive 
that I tend to believe this is a bug introduced by a quick implementation 
shortcut combined with an optimization.

I wouldn't have reported it if I had been able to mitigate it by tuning the gc 
parameters. My first idea was to set threshold0 high enough that I was certain 
not to keep references for that long. But the object I happen to be using at 
the moment of the second-generation collection will go to the old generation: 
it is the only one, but it will go there for sure. As those are the only "new" 
objects that reach the old generation, it takes a very long time for the old 
generation to grow enough to get collected.
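The second link in the original report points at the heuristic that postpones full collections. As far as I can tell from gcmodule.c of that era, the oldest generation is collected only when its own counter exceeds threshold2 and the objects promoted into it since the last full collection (long_lived_pending) amount to at least a quarter of the objects that survived that collection (long_lived_total). The function below is a simplified arithmetic model of that rule under the assumptions in its docstring; middle_collections_until_full is a hypothetical name, and the model ignores every other way objects enter or leave the old generation.

```python
def middle_collections_until_full(long_lived_total, promoted_per_middle, threshold2=10):
    """Count middle-generation collections before a full collection (toy model).

    Assumptions: each middle collection promotes `promoted_per_middle` objects
    to the oldest generation and increments its counter by one; a full
    collection runs once the counter exceeds threshold2 and the pending
    promotions are at least a quarter of long_lived_total.
    """
    if promoted_per_middle <= 0:
        raise ValueError("promoted_per_middle must be positive")
    counter = pending = collections = 0
    while True:
        collections += 1
        counter += 1
        pending += promoted_per_middle
        if counter > threshold2 and pending >= long_lived_total // 4:
            return collections


if __name__ == "__main__":
    print(middle_collections_until_full(100_000, 1))  # 25000
```

For example, with 100,000 long-lived objects and a single short-lived object promoted per middle collection, the model needs 25,000 middle collections before the oldest generation is examined. With a middle collection roughly every 10 young collections and a young collection roughly every 700 net container allocations, that is on the order of 25,000 × 10 × 700 ≈ 175 million allocations.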

> This also has other downsides: objects that won't be collected will suffer 
> more traversals and collections, which can impact performance, so it is not 
> that simple.

I don't think the extra traversals and collections will have that much impact. 
I think merging the generations in the "collect" function was mostly a matter 
of convenience, and not merging them will be a bit more tedious.

If there is a second-generation collection every 10 young-generation 
collections, then the change just adds a few more objects to the next 
second-generation collection each time, rather than putting them in the old 
generation directly.

To me, it seems unintended that all the objects still reachable during a 
second-generation collection are put in the old generation. There is a high 
probability that some of them are short-lived. It wouldn't be very problematic 
to traverse them once more in the next second-generation collection. I tend to 
believe the old generation is meant to be reachable only after at least 2 passes.

Fortunately, for my personal, non-artificial case, there are other assumptions 
I can make, so I used weakrefs. I have one parent with children pointing back 
to it; they now point to it with weakrefs, and I know that I always either keep 
a reference to the parent or am OK with letting them all go once 
refcount(parent) == 0. I'm very glad I don't have to call gc.collect, which 
was indeed the next option.
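The weakref workaround described above can be sketched as follows. Parent and Child are hypothetical names, not the author's code. The point is that the child's back-reference is a weakref.ref, so no reference cycle exists and the whole structure is freed by reference counting as soon as the last strong reference to the parent goes away, without waiting for any generation to be collected.

```python
import gc
import weakref


class Parent:
    """Hypothetical parent that owns its children through strong references."""

    def __init__(self):
        self.children = []

    def add_child(self):
        child = Child(self)
        self.children.append(child)
        return child


class Child:
    """Hypothetical child that points back to its parent only weakly."""

    def __init__(self, parent):
        # A weak back-reference, so Parent <-> Child forms no reference cycle.
        self._parent = weakref.ref(parent)

    @property
    def parent(self):
        # Returns None once the parent has been freed.
        return self._parent()


if __name__ == "__main__":
    gc.disable()  # show that no cyclic collection is needed
    try:
        parent = Parent()
        child = parent.add_child()
        probe = weakref.ref(parent)
        del parent  # last strong reference: freed by refcounting right away
        print(probe() is None, child.parent is None)  # True True
    finally:
        gc.enable()
```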

Thank you for taking the problem seriously and for the time you may dedicate 
to it. There's no need to hurry; I just wanted to raise the question. I am of 
course interested in any bad consequences the change could have (at the core of 
such a widely used language, I would expect some), but at the same time, this 
is such a rare event, so localized and so counter-intuitive in the 
implementation, that I would be surprised if changing it caused problems.

--
