https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84328
Bug ID: 84328
Summary: [6 Regression] -finline-small-functions and inline keyword lead to slowdown since version 6
Product: gcc
Version: 8.0.1
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c++
Assignee: unassigned at gcc dot gnu.org
Reporter: xyzdr4gon333 at googlemail dot com
Target Milestone: ---

Created attachment 43393
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=43393&action=edit
optimizeFlags.cpp

I have a function that looks like this:

    unsigned int interleaveTwoZeros( unsigned int n )
    {
        n &= 0x000003ff;
        n = (n ^ (n << 16)) & 0xFF0000FF;
        n = (n ^ (n << 8)) & 0x0300F00F;
        n = (n ^ (n << 4)) & 0x030C30C3;
        n = (n ^ (n << 2)) & 0x09249249;
        return n;
    }

With g++ < 6, the benchmark in the attached optimizeFlags.cpp takes 5.7s at every optimization level >= -O1. Since g++ 6 it takes 6.2s when the -finline-small-functions option is specified or when the `inline` keyword is added in front of `interleaveTwoZeros`.

Interestingly, the slowdown also disappears when the function body is changed. This function:

    unsigned int interleaveZeros( unsigned int n )
    {
        n &= 0x0000ffff;
        n = (n | (n << 8)) & 0x00FF00FF;
        n = (n | (n << 4)) & 0x0F0F0F0F;
        n = (n | (n << 2)) & 0x33333333;
        n = (n | (n << 1)) & 0x55555555;
        return n;
    }

runs exactly as fast as interleaveTwoZeros does without the slowdown, but it is not affected by the inlining slowdown that appears since version 6. Both functions have the same shape (one mask followed by four shift/combine/mask steps) and differ only in their shift amounts, mask constants and combining operator (^ vs. |). So these differences should not change the structure of the code, yet they somehow still influence what happens when the function is inlined.
Here are the full benchmarks on my system, produced with:

for function in '' '-DTWO_ZEROS_VERSION' '-DTWO_ZEROS_VERSION -DMANUAL_INLINE'; do
    for GPP in g++-4.9 g++-5 g++-6 g++-7 g++-8; do
        $GPP --version | head -1
        for flag in -O1 '-O1 -finline-small-functions'; do
            echo -n "$flag "
            $GPP $flag $function -std=c++11 optimizeFlags.cpp && ./a.out
        done
    done
done

Run times (the three result columns correspond to the three values of $function above):

GCC    flags                         interleaveZeros  interleaveTwoZeros  inline interleaveTwoZeros
4.9.4  -O1                           5.67675s         5.68634s            5.6749s
4.9.4  -O1 -finline-small-functions  5.65597s         5.67831s            5.64078s
5.5.0  -O1                           5.63532s         5.70178s            5.73546s
5.5.0  -O1 -finline-small-functions  5.66475s         5.67027s            5.7754s
6.4.0  -O1                           5.64871s         5.77438s            6.1316s *
6.4.0  -O1 -finline-small-functions  5.74504s         6.16534s *          6.13555s *
7.3.0  -O1                           5.70723s         5.74391s            6.12899s *
7.3.0  -O1 -finline-small-functions  5.7509s          6.15133s *          6.15963s *
8.0.1  -O1                           5.73126s         5.76954s            6.17762s *
8.0.1  -O1 -finline-small-functions  5.65887s         6.13896s *          6.15857s *

* = slowed down: about 6.15s instead of about 5.7s, i.e. roughly 8% slower.