https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80479

jreiser at bitwagon dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jreiser at bitwagon dot com

--- Comment #12 from jreiser at bitwagon dot com ---
Working well with valgrind(memcheck) might be worth more than a slight increase
in speed.  How much faster [measured] is the inline version than a call to
the strcmp() closed subroutine, and over what distributions of inputs?  I
see the inline version use 10 registers (3,4,7,8,9,10,26,28,30,31) and at least
332 bytes of instructions, assuming at least one instruction at .L10 [not
shown] and 7 repetitions of the block at .L22 (for bytes 9 through 64 in 8-byte
chunks.)  At first glance that seems to be expensive.  Could much the same
speed be obtained by re-coding the strcmp() closed subroutine to use the
technique of the inlining?  Then valgrind(memcheck) could intercept and
re-direct the whole subroutine easily [by name], avoiding tedious analysis.
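
One way to get the measurement asked for above is a small timing harness.
The following is only a sketch under stated assumptions, not code from this
bug report: the names direct_strcmp and time_compare are hypothetical, and
the input arrays stand in for whatever distribution of strings is of
interest.  The idea is to build the same file twice, once normally and once
with -fno-builtin-strcmp, so that the wrapper contains either the inline
expansion or a call to the library routine.

```c
#include <stddef.h>
#include <string.h>
#include <time.h>

typedef int (*cmp_fn)(const char *, const char *);

/* Hypothetical wrapper: the compiler either expands strcmp inline here
   or emits a call, depending on whether -fno-builtin-strcmp is given. */
static int direct_strcmp(const char *a, const char *b)
{
    return strcmp(a, b);
}

/* Runs fn over n string pairs, iters times, and returns the CPU time
   in seconds.  The sign of every result is summed into *sink so the
   comparisons cannot be optimized away. */
static double time_compare(cmp_fn fn,
                           const char *const *lhs,
                           const char *const *rhs,
                           size_t n, unsigned iters, long *sink)
{
    long acc = 0;
    clock_t t0 = clock();
    for (unsigned i = 0; i < iters; i++) {
        for (size_t j = 0; j < n; j++) {
            int r = fn(lhs[j], rhs[j]);
            acc += (r > 0) - (r < 0);   /* normalize to -1, 0, +1 */
        }
    }
    clock_t t1 = clock();
    *sink = acc;
    return (double)(t1 - t0) / CLOCKS_PER_SEC;
}
```

Calling time_compare(direct_strcmp, lhs, rhs, n, iters, &sink) on several
input sets (short strings, long equal prefixes, early mismatches) in both
builds would show how the difference depends on the input distribution.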

"addi 31,31,1" at .L11+8 is dead.

The opcode 'xor.' might use less energy (no carry chain) than 'subf.'
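
For a pure equal/not-equal test the two operations are interchangeable,
which the following C sketch illustrates.  It assumes the result is only
tested against zero (the surrounding assembly is not shown here, so that is
an assumption), and the function names are hypothetical.  It also shows that
the xor result locates the first differing byte when the 8-byte chunks were
loaded in big-endian byte order; __builtin_clzll is a GCC builtin.

```c
#include <stdint.h>

/* Nonzero when the two 8-byte chunks differ, computed by subtraction. */
static int differs_sub(uint64_t a, uint64_t b)
{
    return (a - b) != 0;
}

/* Same test computed by exclusive-or; no carry propagation is needed. */
static int differs_xor(uint64_t a, uint64_t b)
{
    return (a ^ b) != 0;
}

/* Index (0 = most significant) of the first differing byte, assuming
   the chunks hold string bytes in big-endian order.  Must only be
   called when a != b, because __builtin_clzll(0) is undefined. */
static unsigned first_diff_byte(uint64_t a, uint64_t b)
{
    return (unsigned)__builtin_clzll(a ^ b) / 8u;
}
```

If the code does rely on more than the zero test of the 'subf.' result, the
substitution would not be a drop-in change.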
