Hey,
a pattern I see quite often in embedded libraries is to access an array
byte-wise and shift the bits as needed (as this avoids endianness and
alignment issues). If I read two consecutive bytes and left-shift the
second by 8, I'd expect the compiler to optimize this into a single
16-bit (word) read on x86-64, as it is little-endian and supports
unaligned reads.
Clang does this as expected; gcc, however, misses it.
Here are the examples: https://godbolt.org/z/qvCCNs
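For readers who don't want to follow the link, here is a minimal sketch
of the pattern I mean. It is not necessarily identical to the code behind
the link, and the function name read_le16 is just a placeholder I made up
for illustration:

```c
#include <stdint.h>

/* Byte-wise little-endian read of a 16-bit value.
 * Works for any alignment of p and on any host endianness,
 * because only single bytes are ever accessed.
 * (read_le16 is a hypothetical name, not taken from any library.) */
static uint16_t read_le16(const uint8_t *p)
{
    /* p[1] is promoted to int before the shift, so no bits are lost. */
    return (uint16_t)(p[0] | (p[1] << 8));
}
```

On a little-endian target that allows unaligned access, the whole
function body could in principle become one 16-bit load.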
Thus my questions:
Why is this so hard to optimize? As it's quite a common pattern, I'd
expect that there would be at least some hand-coded special-case
optimization. (This isn't criticism - I'm honestly curious.) Or is there
a reason gcc shouldn't optimize this, or a reason why it doesn't matter
that this is missed?
Is there a way to write such code so that gcc optimizes it?
From a performance point of view: If I actually need two consecutive
bytes, wouldn't it be better to load them as a word and split them at
the register level?
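To make the last question concrete, this is roughly what I have in mind.
It is only a sketch: the names load16_host and split_bytes are made up,
it assumes a little-endian host such as x86-64, and I have not measured
whether it is actually faster than two byte loads:

```c
#include <stdint.h>
#include <string.h>

/* Load two consecutive bytes with one 16-bit access.
 * memcpy keeps the access well-defined for unaligned pointers.
 * The result is in host byte order. (Hypothetical helper name.) */
static uint16_t load16_host(const uint8_t *p)
{
    uint16_t w;
    memcpy(&w, p, sizeof w);
    return w;
}

/* Split the loaded word into its two bytes in registers.
 * ASSUMPTION: little-endian host, so the low half is p[0]. */
static void split_bytes(const uint8_t *p, uint8_t *first, uint8_t *second)
{
    uint16_t w = load16_host(p);
    *first  = (uint8_t)(w & 0xff); /* byte that was at p[0] */
    *second = (uint8_t)(w >> 8);   /* byte that was at p[1] */
}
```

On a big-endian host the two halves would come out swapped, which is
exactly the portability problem the byte-wise pattern avoids.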
Cheers
Morty
--
Redheads Ltd. Softwaredienstleistungen
Schillerstr. 14
90409 Nürnberg
Phone: +49 (0)911 180778-50
Email: moritz.stru...@redheads.de | Web: www.redheads.de
Managing Director: Andreas Hanke
Registered office: Lauf
Commercial register: Amtsgericht Nürnberg, HRB 22681
VAT ID: DE 249436843