On Tue, 12 Nov 2013, Ondřej Bílka wrote:

I am trying to get something to actually work and be accepted in
gcc. That may mean being conservative.

That also may mean that you will cover only cases where it is not needed.

A malloc will have a small per-thread cache for small requests that does
not need any locking. A performance difference will be quite small and
there may be a define which causes inlining constant size mallocs.

Sizes from 256 bytes are interesting case.

I have to disagree here. When the allocated size is large enough, the cost of malloc+free often becomes small compared to whatever work you are doing in that array. It is when the size is very small that speeding up malloc+free is essential. And you are underestimating the cost of those small allocations.

I started on this because of an application that spends more than half of its time in malloc+free and where (almost) no allocation is larger than 100 bytes. Changing the code to not use malloc/free but other allocation strategies is very complicated because it would break abstraction layers. I used various workarounds that proved rather effective, but I would have loved for that to be unnecessary.

One difficulty with using a stack is that lifetimes cannot partially
overlap, whereas with malloc+free they can.

Which you need to solve anyway if you want to do conversion. You need to
pick a properly overlapping malloc+free area that is best to given
criteria (like that it has maximal expected memory usage.)

Here I am extending all lifetimes to the whole function (and implicitly reusing storage in loops), simplistic I know.

Using the main stack has the advantage that I don't have to think of deallocation, it happens automatically.

And what is logic of limiting sizes?

Main stack is rather small by default. And it is nice if other variables on the stack are not spread over too many memory pages.

Note that a leaf function have higher priority for stack that nonleaf ones as when you do stack allocation early it may kill lot of leaf allocations because of stack concerns.

Sorry, I have trouble understanding your sentence. Do you just mean that if there is a threshold it should be higher in leaf functions?

And using a decl instead of alloca means that even if
malloc+free was in a loop, not deallocating won't make me grow the
stack use linearly in the number of iterations of the loop.

Actually this looks like a orthogonal optimalization of memory reuse,

Replacing malloc+free with alloca (without save/restore of the stack) would be a pessimization.

you can transform

x = malloc(42);
free(x);
y = malloc(42);

to

x = malloc(42);
y = x;

It migth be feasible to teach PRE to transform loops with repeated
allocation to a initial allocation and reuse.

There are already several PRs in bugzilla. For instance PR 38318 is about transforming:

for(...){
  malloc
  ...
  free
}

into:

malloc
for(...){
  ...
}
free

I don't remember if your example of reusing a previous allocation is listed anywhere, you may want to add it if not. Those would all be very useful, please try to implement one of them. You may also be interested in optimizing the C++ dynarray class once a basic implementation is committed.


I understand the topic of optimizing allocations is very large, and powerful techniques could be developed for it. However, those bugs have been open for years and nobody has bothered going further that removing free(malloc(n)). So I think we should go with the small case that has known usecases. And when you come up with a more general framework that makes this small case irrelevant, I will gladly welcome it and we can easily remove the small case.

--
Marc Glisse

Reply via email to