On Thu, 6 Feb 2020, Moritz Strübe wrote:

> Why is this so hard to optimize? As it's quite a common pattern, I'd
> expect that there would be at least some hand-coded special-case
> optimizer. (This isn't criticism - I'm honestly curious.) Or is there
> a reason gcc shouldn't optimize this / why it doesn't matter that
> this is missed?
The compiler would need to exploit the fact that signed overflow is
undefined, or deduce that it cannot happen. Imagine what happens in a
more general case if i is INT_MAX (so without undefined overflow, i+1
would be INT_MIN):

int f(unsigned char *ptr, int i)
{
  return ptr[i] | ptr[i+1] << 8;
}

With a 64-bit address space this might access two bytes 4GB apart.

But you're right that it's a missed optimization in GCC, so you can
file it in GCC Bugzilla.

> Is there a way to write such code that gcc optimizes?

Simply write a function that accepts one pointer:

int load_16be(unsigned char *ptr)
{
  return ptr[0] << 8 | ptr[1];
}

and use it as load_16be(data+i) or load_16be(&data[i]).

> From a performance point of view: If I actually need two consecutive
> bytes, wouldn't it be better to load them as a word and split them at
> the register level?

The question is not entirely clear to me, but usually the answer is
that it depends on the microarchitecture and the details of the
computations that need to be done with the loaded values. Often you'd
need more than one instruction to "split" the wide load, so it
wouldn't be profitable.

Alexander
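P.S. A small self-contained illustration of the "4GB apart" point, in
case it helps. It just models the offsets the two loads in f() would
use on a 64-bit target if i+1 were allowed to wrap to INT_MIN (the
variable names are mine, for illustration only):

#include <limits.h>
#include <stdint.h>
#include <stdio.h>

int main(void)
{
  int64_t off_lo = INT_MAX;           /* offset of ptr[i]              */
  int64_t off_hi = (int64_t)INT_MIN;  /* offset of ptr[i+1] if i+1
                                         wrapped instead of being UB  */
  printf("ptr[i]   offset: %lld\n", (long long)off_lo);
  printf("ptr[i+1] offset: %lld\n", (long long)off_hi);
  printf("distance: %lld bytes\n", (long long)(off_lo - off_hi));
}

This prints offsets 2147483647 and -2147483648, nearly 4GB apart,
which is why the compiler cannot fuse the two loads without relying
on the overflow being undefined.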
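And here is a sketch of the "load a word and split it" variant for
the 16-bit case, assuming GCC's predefined __BYTE_ORDER__ macros
(load_16be_wide is my name, not an established API):

#include <stdint.h>
#include <string.h>

int load_16be_wide(const unsigned char *ptr)
{
  uint16_t w;
  /* One 2-byte load; memcpy is well-defined even for unaligned data
     and compilers typically fold it into a single load instruction. */
  memcpy(&w, ptr, sizeof w);
#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
  /* The byte swap is the extra "split" work mentioned above. */
  w = (uint16_t)(w << 8 | w >> 8);
#endif
  return w;
}

I'd expect this to compile to one load plus a swap-type instruction
on common little-endian targets, but as said above, whether that
beats two byte loads depends on the surrounding code.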