On Tue, 12 Nov 2013, Ondřej Bílka wrote:
I am trying to get something to actually work and be accepted in
gcc. That may mean being conservative.
That also may mean that you will cover only cases where it is not needed.
A malloc will have a small per-thread cache for small requests that does
not need any locking. A performance difference will be quite small and
there may be a define which causes inlining constant size mallocs.
Sizes from 256 bytes are interesting case.
I have to disagree here. When the allocated size is large enough, the cost
of malloc+free often becomes small compared to whatever work you are doing
in that array. It is when the size is very small that speeding up
malloc+free is essential. And you are underestimating the cost of those
small allocations.
I started on this because of an application that spends more than half of
its time in malloc+free and where (almost) no allocation is larger than
100 bytes. Changing the code to not use malloc/free but other allocation
strategies is very complicated because it would break abstraction layers.
I used various workarounds that proved rather effective, but I would have
loved for that to be unnecessary.
One difficulty with using a stack is that lifetimes cannot partially
overlap, whereas with malloc+free they can.
Which you need to solve anyway if you want to do conversion. You need to
pick a properly overlapping malloc+free area that is best to given
criteria (like that it has maximal expected memory usage.)
Here I am extending all lifetimes to the whole function (and implicitly
reusing storage in loops), simplistic I know.
Using the main stack has the advantage that I don't have to think of
deallocation, it happens automatically.
And what is logic of limiting sizes?
Main stack is rather small by default. And it is nice if other variables
on the stack are not spread over too many memory pages.
Note that a leaf function have higher priority for stack that nonleaf
ones as when you do stack allocation early it may kill lot of leaf
allocations because of stack concerns.
Sorry, I have trouble understanding your sentence. Do you just mean that
if there is a threshold it should be higher in leaf functions?
And using a decl instead of alloca means that even if
malloc+free was in a loop, not deallocating won't make me grow the
stack use linearly in the number of iterations of the loop.
Actually this looks like a orthogonal optimalization of memory reuse,
Replacing malloc+free with alloca (without save/restore of the stack)
would be a pessimization.
you can transform
x = malloc(42);
free(x);
y = malloc(42);
to
x = malloc(42);
y = x;
It migth be feasible to teach PRE to transform loops with repeated
allocation to a initial allocation and reuse.
There are already several PRs in bugzilla. For instance PR 38318 is about
transforming:
for(...){
malloc
...
free
}
into:
malloc
for(...){
...
}
free
I don't remember if your example of reusing a previous allocation is
listed anywhere, you may want to add it if not. Those would all be very
useful, please try to implement one of them. You may also be interested in
optimizing the C++ dynarray class once a basic implementation is
committed.
I understand the topic of optimizing allocations is very large, and
powerful techniques could be developed for it. However, those bugs have
been open for years and nobody has bothered going further that removing
free(malloc(n)). So I think we should go with the small case that has
known usecases. And when you come up with a more general framework that
makes this small case irrelevant, I will gladly welcome it and we can
easily remove the small case.
--
Marc Glisse