https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110013
--- Comment #2 from Devin Hussey ---
Scratch that. There is a fairly easy way to fix this that both follows the psABI AND uses MMX with SSE.
Upon calling a function, we can use the following sequence:
func:
movdq2q mm0, xmm0
movq mm1, [esp
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110013
--- Comment #1 from Devin Hussey ---
As a side note, the official psABI does say that function call parameters use
MM0-MM2, so if Clang follows its own rules instead, the supposed stability of
the ABI is meaningless.
Assignee: unassigned at gcc dot gnu.org
Reporter: husseydevin at gmail dot com
Target Milestone: ---
Closely related to bug 86541, which was fixed on x64 only.
On 32-bit, GCC passes any vector_size(8) vectors to external functions in MMX
registers, similar to how it passes 16
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103781
--- Comment #4 from Devin Hussey ---
Makes sense because the multiplier is what, 5 cycles on an A53?
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103781
--- Comment #2 from Devin Hussey ---
Yeah my bad, I meant SLP, I get them mixed up all the time.
Priority: P3
Component: middle-end
Assignee: unassigned at gcc dot gnu.org
Reporter: husseydevin at gmail dot com
Target Milestone: ---
As of GCC 11, the AArch64 backend is very greedy in trying to vectorize
mulv2di3. However, there is no mulv2di3 routine so it
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103641
--- Comment #19 from Devin Hussey ---
> The new costs on AArch64 have a vector multiplication cost of 4, which is
> very reasonable.
Would this include mulv2di3 by any chance?
Because another thing I noticed is that GCC is also trying to mul
Severity: normal
Priority: P3
Component: rtl-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: husseydevin at gmail dot com
Target Milestone: ---
Created attachment 51966
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=51966&action=edit
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93418
--- Comment #8 from Devin Hussey ---
Seems to work.
~ $ ~/gcc-test/bin/x86_64-pc-cygwin-gcc.exe -mavx2 -O3 _mm_sllv_bug.c
~ $ ./a.exe
Without optimizations (correct result): 8000 fff8
With optimizations (incorrect result):
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93418
Devin Hussey changed:
What|Removed |Added
Build||2020-01-24 0:00
--- Comment #5 from Devin
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93418
--- Comment #3 from Devin Hussey ---
I think I found the culprit commit.
Haven't set up a GCC build tree yet, though.
https://github.com/gcc-mirror/gcc/commit/a51c4926712307787d133ba50af8c61393a9229b
Component: regression
Assignee: unassigned at gcc dot gnu.org
Reporter: husseydevin at gmail dot com
Target Milestone: ---
Regression starting in GCC 9
Currently, GCC constant propagates the AVX2 _mm_sllv family with constant
amounts to only shift by the first element instead of
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88963
--- Comment #10 from Devin Hussey ---
I also want to add that AArch64 shouldn't even be spilling; it has 32 NEON
registers, and with 128-byte vectors it should only use 24.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88963
--- Comment #9 from Devin Hussey ---
(In reply to Andrew Pinski from comment #6)
> Try using 128 (or 256) and you might see that aarch64 falls down similarly.
yup. Oof.
test:
sub sp, sp, #560
stp x29, x30, [sp]
m
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88963
Devin Hussey changed:
What|Removed |Added
CC||husseydevin at gmail dot com
--- Comment
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88510
--- Comment #4 from Devin Hussey ---
I am deciding to refer to goodmul as ssemul from now on; I think it is a better
name.
I am also wondering if AArch64 gets a benefit from this vs. scalarizing if the
value is already in a NEON register. I don'
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85048
Devin Hussey changed:
What|Removed |Added
CC||husseydevin at gmail dot com
--- Comment
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85052
--- Comment #7 from Devin Hussey ---
Wait, silly me, this isn't about optimizations, this is about patterns.
It does the same thing it was doing for this code:
typedef unsigned u32x2 __attribute__((vector_size(8)));
typedef unsigned long long u64x2 __attribute__((vector_size(16)));
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85052
--- Comment #6 from Devin Hussey ---
The patch seems to be working.
typedef unsigned u32x2 __attribute__((vector_size(8)));
typedef unsigned long long u64x2 __attribute__((vector_size(16)));
u64x2 cvt(u32x2 in)
{
return __builtin_convertvector(in, u64x2);
}
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88698
--- Comment #10 from Devin Hussey ---
Well, what about a special type attribute, or some kind of transparent_union-like
thing, for Intel's types? Intel's intrinsics seem to be the main (only)
intrinsics platform that uses generic types.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88698
--- Comment #7 from Devin Hussey ---
I mean, sure, but how about meeting in the middle?
-fno-lax-vector-conversions generates errors like it does now.
-flax-vector-conversions shuts GCC up.
No flag causes warnings on -Wpedantic
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88705
Devin Hussey changed:
What|Removed |Added
Status|RESOLVED|UNCONFIRMED
Resolution|INVALID
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88670
Bug 88670 depends on bug 88705, which changed state.
Bug 88705 Summary: [ARM][Generic Vector Extensions] float32x4/float64x2 vector
operator overloads scalarize on NEON
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88705
What|Removed |Added
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: husseydevin at gmail dot com
Target Milestone: ---
For some reason, GCC scalarizes float32x4_t and float64x2_t on ARM32 NEON when
using vector extensions
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88698
--- Comment #5 from Devin Hussey ---
Well, if we are aiming for strict compliance, we might as well throw out every
GCC extension in existence (including vector extensions); those aren't strictly
compliant with the C/C++ standards. /s
The whole point
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88698
--- Comment #2 from Devin Hussey ---
What I am saying is that I think -flax-vector-conversions should be default, or
we should only have minimal warnings instead of errors.
That will make generic vectors much easier to use.
It is to be noted th
Assignee: unassigned at gcc dot gnu.org
Reporter: husseydevin at gmail dot com
Target Milestone: ---
GCC is far too strict about vector conversions.
Currently, mixing generic vector extensions and platform-specific intrinsics
almost always requires either a cast or -flax-vector-conversions
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88510
--- Comment #2 from Devin Hussey ---
Update: I did the calculations, and twomul has the same cycle count as
goodmul_sse. vmul.i32 with 128-bit operands takes 4 cycles (I assumed it was
two), so just like goodmul_sse, it takes 11 cycles.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88605
--- Comment #4 from Devin Hussey ---
I also want to note that LLVM is probably a good place to look. They have been
pushing to remove as many intrinsic builtins as they can in favor of idiomatic
code.
This has multiple advantages:
1. You can op
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88510
Devin Hussey changed:
What|Removed |Added
Summary|GCC generates inefficient |GCC generates inefficient
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88605
--- Comment #2 from Devin Hussey ---
While __builtin_convertvector would improve the situation, the main issue here
is the blindness to some obvious patterns.
If I write this code, I want either pmovzxdq or vmovl. I don't want to waste
time with
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: husseydevin at gmail dot com
Target Milestone: ---
If you want to, say, convert a u32x2 vector to a u64x2 while avoiding
intrinsics, good luck.
GCC doesn
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: husseydevin at gmail dot com
Target Milestone: ---
Note: I use these typedefs here for brevity.
typedef uint64x2_t U64x2;
typedef uint32x2_t U32x2;
typedef uint32x2x2_t U32x2x2;
typedef
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: husseydevin at gmail dot com
Target Milestone: ---
I might be wrong, but it appears that GCC is too aggressive in its conversion
from multiplication to shift+add when targeting Thumb