Prakash Punnoor wrote:
Why is movaps (SSE, floating point data) instead of movdqa (SSE2, integer
data) used for the store? Bug or feature? It is used even when compiling with -O0.
Testing further: The -march=k8 seems to cause this. Leaving it out, movdqa is
used, so I guess it is a feature.
This is a feature, X86_TUNE_SSE_TYPELESS_STORES. It is faster on K8.
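
A minimal way to look at the difference yourself (a sketch only: the helper name store_sum and the file name are made up for illustration, and the code uses the GCC vector extension rather than your original source):

```c
#include <stdint.h>

/* Four 32-bit integers packed into one 16-byte vector
 * (GCC/Clang vector extension; the type is 16-byte aligned). */
typedef int32_t v4si __attribute__((vector_size(16)));

/* Hypothetical helper: element-wise integer add, then one aligned
 * 16-byte store through dst. The store is the instruction to watch. */
void store_sum(v4si *dst, const v4si *a, const v4si *b)
{
    *dst = *a + *b;
}
```

Compile it with "gcc -O2 -S store.c" and again with "gcc -O2 -march=k8 -S store.c", then compare the store instruction in the two .s files. Going by what you observed, the first should show movdqa and the second movaps; other GCC versions or tuning targets may choose differently.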
Regarding the MMX stuff, I have just committed a patch to 4.4 that should
fix your (and similar) problems. Unfortunately, it won't be backported,
as this is rather tricky register allocator fine-tuning.
Uros.