Hey,

A pattern I see quite often in embedded libraries is to access an array bytewise and shift the bits as needed (as this avoids endianness and alignment issues). If I read two consecutive bytes, left-shift the second by 8 and OR it with the first, I'd expect the compiler to optimize this to a single 16-bit word read on x86-64, as it is little-endian and supports unaligned reads.
Clang does this as expected; GCC, however, misses it.

Here are the examples: https://godbolt.org/z/qvCCNs
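
In short, the pattern looks roughly like this. This is only a minimal sketch: the function name read_le16 is made up for illustration and the code is not necessarily identical to what is behind the link.

```c
#include <stdint.h>

/* Illustrative name: assemble a 16-bit little-endian value bytewise.
 * Works regardless of host endianness and pointer alignment. */
uint16_t read_le16(const uint8_t *p)
{
    /* p[1] is promoted to int before the shift, hence the cast back. */
    return (uint16_t)(p[0] | (p[1] << 8));
}
```

On a little-endian target with unaligned loads, this is what I'd expect to become a single 16-bit load.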

Thus my questions:

Why is this so hard to optimize? As it's quite a common pattern, I'd expect there to be at least a hand-coded special case in the optimizer. (This isn't criticism; I'm honestly curious.) Or is there a reason GCC shouldn't optimize this, or a reason why it doesn't matter that it is missed?

Is there a way to write such code so that GCC optimizes it?

From a performance point of view: if I actually need two consecutive bytes, wouldn't it be better to load them as one word and split them at the register level?
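
To make that last question concrete, here is a rough sketch of what I mean by "load as a word and split". The helper name load_pair is hypothetical, the byte order of the split assumes a little-endian host, and I'm assuming (as far as I know, but not verified for every compiler version) that a fixed-size memcpy like this is compiled to a plain load.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: fetch two consecutive bytes with one 16-bit
 * load, then split the word in registers.
 * ASSUMPTION: little-endian host (e.g. x86-64). */
void load_pair(const uint8_t *p, uint8_t *first, uint8_t *second)
{
    uint16_t w;
    memcpy(&w, p, sizeof w);       /* alignment-safe 16-bit read */
    *first  = (uint8_t)(w & 0xff); /* byte at p[0] on little-endian */
    *second = (uint8_t)(w >> 8);   /* byte at p[1] on little-endian */
}
```

The question is whether this is actually faster than two separate byte loads, or whether it makes no measurable difference.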

Cheers
Morty

--
Redheads Ltd. Softwaredienstleistungen
Schillerstr. 14
90409 Nürnberg

Telefon: +49 (0)911 180778-50
E-Mail: moritz.stru...@redheads.de | Web: www.redheads.de

Geschäftsführer: Andreas Hanke
Sitz der Gesellschaft: Lauf
Amtsgericht Nürnberg HRB 22681
Ust-ID: DE 249436843
