Hi all,

I've spent a little while porting an optimization from Python 3 to
Python 2.7 (http://bugs.python.org/issue4753).  The idea of the patch is
to improve performance by dispatching opcodes on computed labels rather
than a big switch -- and so confusing the branch predictor less.

The problem with this is that the last bit of code for each opcode ends
up being the same, so common subexpression elimination wants to coalesce
all these bits, which neatly and completely nullifies the point of the
optimization.  Playing around just building from source directly, it
seems that -fno-gcse prevents gcc from doing this, and the resulting
interpreter shows a small performance improvement over a build that does
not include the patch.

However, when I build a debian package containing the patch, I see no
improvement at all.  My theory, and I'd like you guys to tell me if this
makes sense, is that this is because the Debian package uses link time
optimization, and so even though I carefully compile ceval.c with
-fno-gcse, the common subexpression elimination happens anyway at link
time.  I've tried staring at disassembly to confirm or deny this but I
don't know ARM assembly very well and the compiled function is roughtly
10k instructions long so I didn't get very far with this (I can supply
the disassembly if someone wants to see it!).

Is there some way I can tell GCC to not compile perform CSE on a section
of code?  I guess I can make sure that the whole program, linker step
and all, is compiled with -fno-gcse but that seems a bit of a blunt
hammer.

I'd also be interested if you think this class of optimization makes
little sense on ARM and then I'll stop and find something else to do :-)

Cheers,
mwh

_______________________________________________
linaro-toolchain mailing list
linaro-toolchain@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-toolchain

Reply via email to