https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56676
Andrew Pinski changed:
What|Removed |Added
See Also||https://gcc.gnu.org/bugzill
--- Comment #6 from Andrew Pinski ---
GCC 11 produces:
```
_Z3fooPiS_:
.LFB0:
.cfi_startproc
vmovdqu (%rdi), %ymm2
vmovdqu 32(%rdi), %ymm3
vpmulld (%rsi), %ymm2, %ymm1
vpmulld 32(%rsi), %ymm3, %ymm0
```
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56676
Vedran Miletic changed:
What|Removed |Added
CC||rivanvx at gmail dot com
--- Comment #5
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56676
Igor Zamyatin changed:
What|Removed |Added
CC||izamyatin at gmail dot com
---
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56676
--- Comment #3 from Richard Biener 2013-03-21 15:11:14 UTC ---
Well, while true, we don't adjust tuning based on that. Use -march=core-avx2
instead.
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56676
--- Comment #2 from Ondrej Bilka 2013-03-21 14:53:26 UTC ---
On Thu, Mar 21, 2013 at 01:30:42PM +, rguenth at gcc dot gnu.org wrote:
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56676
>
> --- Comment #1 from Richard
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56676
--- Comment #1 from Richard Biener 2013-03-21 13:30:42 UTC ---
I believe we split unaligned loads by default because that's faster for generic
tuning.
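
For readers who want to try this, here is a minimal sketch of a function whose loads the vectorizer may or may not split. The original test case is not quoted in this thread, so the body of foo() and the trip count of 16 are assumptions; only the signature foo(int*, int*) follows from the mangled symbol _Z3fooPiS_. The file name repro.cpp in the comments is hypothetical as well.

```cpp
// Hypothetical reproducer; the real test case from the report is not shown here.
// Only the signature is grounded: _Z3fooPiS_ demangles to foo(int*, int*).
//
// Possible ways to inspect the generated code (repro.cpp is a made-up name):
//   g++ -O3 -mavx2 -S repro.cpp            # AVX2 enabled, tuning left at default
//   g++ -O3 -march=core-avx2 -S repro.cpp  # as suggested in comment #3
//   g++ -O3 -mavx2 -mno-avx256-split-unaligned-load -S repro.cpp
// Whether the 256-bit loads get split depends on the GCC version and tuning.
void foo(int *a, int *b)
{
    // 16 ints are 64 bytes, i.e. two 256-bit vectors when AVX2 is available.
    // The pointers carry no alignment guarantee, so any vector loads the
    // compiler emits for this loop have to be unaligned ones.
    for (int i = 0; i < 16; ++i)
        a[i] *= b[i];
}
```

Comparing the assembly from the three command lines should show whether a given GCC release emits one 256-bit unaligned load per vector or a pair of 128-bit halves.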