https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67435
            Bug ID: 67435
           Summary: Large performance drop on apparently unrelated changes (probable cause : strange inlining side-effect)
           Product: gcc
           Version: 4.8.4
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
          Assignee: unassigned at gcc dot gnu.org
          Reporter: yann.collet.73 at gmail dot com
  Target Milestone: ---

Some weird effect with gcc (tested version : 4.8.4).

I have some performance-oriented code which runs pretty fast. Its speed depends in large part on the inlining of many small functions. There is no inline keyword anywhere: all functions are either normal or static. The inlining decision is left entirely to the compiler, which has worked fine so far (the functions to inline are very small, typically 1 to 5 lines).

Since inlining across multiple .c files is difficult (-flto is not yet widely available), I've kept a lot of small functions in a single `.c` file, in which I'm also developing a codec and its associated decoder. It's "relatively" large by my standards (about 2000 lines, although many of them are comments and blank lines), but breaking it into smaller parts opens new problems, so I would prefer to avoid that if possible.

Encoder and decoder are related, since they are inverse operations. But from a programming perspective they are completely separate, sharing nothing except a few typedefs and very low-level functions (such as reading from an unaligned memory position).

The strange effect is this one: I recently added a new function fnew to the encoder side. It's a new "entry point". It is neither used nor called from anywhere within the .c file. The simple fact that it exists makes the performance of the decoder function fdec drop substantially, by more than 20%, which is far too much to be ignored.

Now, keep in mind that encoding and decoding operations are completely separate. They share almost nothing, save some minor typedefs (u32, u16 and such) and the associated operations (read/write).
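To make the layout concrete, here is a minimal sketch of how such a file is organized. It is not the actual codec: the XOR transform, the KEY constant, the fenc name and all function bodies are placeholders invented for illustration. Only the names fdec and fnew, the u16/u32 typedefs and the idea of shared unaligned read/write helpers come from the description above.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef uint16_t u16;
typedef uint32_t u32;

#define KEY 0x5A5A5A5Au  /* placeholder transform, not the real codec */

/* Shared low-level helpers: unaligned read/write. These are the kind of
 * tiny functions whose automatic inlining the speed depends on. */
static u32 read_u32(const void *p)
{
    u32 v;
    memcpy(&v, p, sizeof v);
    return v;
}

static void write_u32(void *p, u32 v)
{
    memcpy(p, &v, sizeof v);
}

/* Encoder side (hypothetical existing entry point). */
size_t fenc(unsigned char *dst, const unsigned char *src, size_t n_words)
{
    size_t i;
    for (i = 0; i < n_words; i++)
        write_u32(dst + 4 * i, read_u32(src + 4 * i) ^ KEY);
    return 4 * n_words;
}

/* Decoder side: shares only the typedefs and helpers with the encoder. */
size_t fdec(unsigned char *dst, const unsigned char *src, size_t n_words)
{
    size_t i;
    for (i = 0; i < n_words; i++)
        write_u32(dst + 4 * i, read_u32(src + 4 * i) ^ KEY);
    return 4 * n_words;
}

/* New encoder-side entry point. Nothing in this file calls it, yet its
 * mere presence (with external linkage) is what changes fdec's speed. */
size_t fnew(unsigned char *dst, const unsigned char *src, size_t n_words)
{
    write_u32(dst, (u32)n_words);           /* hypothetical size header */
    return 4 + fenc(dst + 4, src, n_words);
}
```

In this sketch, fdec and fnew have no call relationship at all. The only things they have in common are the static helpers, which are candidates for inlining into both.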
When the new encoding function fnew is defined as static, the performance of the decoder fdec returns to normal. Since fnew isn't called from within the .c file, I guess it's the same as if it were not there (dead code elimination).

If the static fnew is now called from the encoder side, the performance of fdec remains good. But as soon as fnew is modified, fdec performance drops substantially again.

Presuming the fnew modifications crossed some threshold, I increased the following gcc parameter: --param max-inline-insns-auto=60 (by default, its value is supposed to be 40). And it worked: the performance of fdec is now back to normal. But I guess this game will continue forever with each little modification of fnew or anything similar, requiring further tweaks to some custom advanced parameter. So I want to avoid that.

I tried another variant: I added another completely useless function, just to play with. Its content is an exact copy-paste of fnew, but the name of the function is obviously different, so let's call it wtf. When wtf exists (in addition to fnew), it doesn't matter whether fnew is static or not, nor what the value of max-inline-insns-auto is: the performance of fdec is just back to normal. Even though wtf is neither used nor called from anywhere... :'(

All these effects look plain weird. There is no logical reason for a small modification in function fnew to have a knock-on effect on the completely unrelated function fdec, whose only relation to it is being in the same file.

I'm trying to understand what could be going on, in order to develop the codec more reliably. For the time being, any modification in function A can have large ripple effects (positive or negative) on a completely unrelated function B, making each step a tedious process with a random outcome. A developer's nightmare.
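For anyone who wants to try the same variants, the experiments above can be laid out as a single file with compile-time switches. This is only a sketch: the macro names FNEW_STATIC and WITH_WTF, the file name codec.c and the function bodies are made up for illustration, and this toy code is far too small to reproduce the slowdown by itself. It only shows which knob each variant turns.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Build variants (macro and file names are hypothetical):
 *   gcc -O2 -c codec.c                                   fnew exported
 *   gcc -O2 -DFNEW_STATIC -c codec.c                     fnew static
 *   gcc -O2 --param max-inline-insns-auto=60 -c codec.c  raised threshold
 *   gcc -O2 -DWITH_WTF -c codec.c                        unused copy added
 */
#ifdef FNEW_STATIC
#  define FNEW_LINKAGE static  /* unused static: removed as dead code */
#else
#  define FNEW_LINKAGE         /* external linkage: must be emitted */
#endif

/* Small helper whose inlining into fdec is what matters for speed. */
static uint32_t read_u32(const void *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

/* Decoder: the function whose speed is being measured. */
uint32_t fdec(const unsigned char *src, size_t n_words)
{
    uint32_t acc = 0;
    size_t i;
    for (i = 0; i < n_words; i++)
        acc ^= read_u32(src + 4 * i);
    return acc;
}

/* New encoder-side entry point; never called from within this file. */
FNEW_LINKAGE uint32_t fnew(const unsigned char *src, size_t n_words)
{
    uint32_t acc = 0;
    size_t i;
    for (i = 0; i < n_words; i++)
        acc += read_u32(src + 4 * i);
    return acc;
}

#ifdef WITH_WTF
/* Exact copy of fnew under another name; also never called. */
uint32_t wtf(const unsigned char *src, size_t n_words)
{
    uint32_t acc = 0;
    size_t i;
    for (i = 0; i < n_words; i++)
        acc += read_u32(src + 4 * i);
    return acc;
}
#endif
```

Each variant changes only how many functions in the file use the shared helper and with what linkage, never the source of fdec itself, which is what makes the change in fdec's speed so surprising.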