from:"freddie at witherden dot org"

[Bug tree-optimization/117510] Inner loop with static trip count breaks vectorization of outer loop

2024-11-11, freddie at witherden dot org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117510

--- Comment #4 from Freddie Witherden ---
(In reply to Richard Biener from comment #3)
> Fixed for GCC 15.

Thanks! If I have cases which, when m is a compile time constant, vectorize for m small but not m large is that likely to be a separate i

[Bug tree-optimization/117510] New: Inner loop with static trip count breaks vectorization of outer loop

2024-11-08, freddie at witherden dot org via Gcc-bugs

Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: freddie at witherden dot org
Target Milestone: ---

Consider the following snippet:

void f(int n, int m, double *a)
{
#pragma omp simd
    for (int i = 0; i <

[Bug tree-optimization/95747] [OpenMP/Builtin] nontemporal store support

2023-08-09, freddie at witherden dot org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95747

Freddie Witherden changed:

What |Removed |Added
-----+--------+------------------------------
CC   |        |freddie at witherden dot org

[Bug target/102404] Loop vectorized with 32 byte vectors actually uses 16 byte vectors

2021-09-20, freddie at witherden dot org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102404

--- Comment #4 from Freddie Witherden ---
Created attachment 51485
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=51485&action=edit
Clang assembly.

[Bug target/102404] Loop vectorized with 32 byte vectors actually uses 16 byte vectors

2021-09-20, freddie at witherden dot org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102404

--- Comment #3 from Freddie Witherden ---
(In reply to Richard Biener from comment #2)
> 32 bytes are 256 bits (ymm), 64 bytes are 512 bits (zmm). GCC does not
> consider zmm vectorization because
>
> t.c:25:37: missed: loop does not have eno

[Bug tree-optimization/102404] Loop vectorized with 32 byte vectors actually uses 16 byte vectors

2021-09-18, freddie at witherden dot org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102404

--- Comment #1 from Freddie Witherden ---
Created attachment 51481
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=51481&action=edit
Generated assembly.

[Bug tree-optimization/102404] New: Loop vectorized with 32 byte vectors actually uses 16 byte vectors

2021-09-18, freddie at witherden dot org via Gcc-bugs

Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: freddie at witherden dot org
Target Milestone: ---

Created attachment 51480
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=51480&action=edit
Test case C

[Bug tree-optimization/71414] 2x slower than clang summing small float array, GCC should consider larger vectorization factor for "unrolling" reductions

2020-10-11, freddie at witherden dot org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71414

Freddie Witherden changed:

What |Removed |Added
-----+--------+------------------------------
CC   |        |freddie at witherden dot org

[Bug c++/95264] Infinite Loop When Compiling Templated C++ code at -O1 and above

2020-05-22, freddie at witherden dot org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95264

--- Comment #8 from Freddie Witherden ---
(In reply to rguent...@suse.de from comment #7)
>
> Instead of [[gnu::flatten]] you could use the
> __attribute__((always_inline)) attribute on the foo function definition
> if you didn't simplify the o

[Bug c++/95264] Infinite Loop When Compiling Templated C++ code at -O1 and above

2020-05-22, freddie at witherden dot org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95264

--- Comment #6 from Freddie Witherden ---
(In reply to Richard Biener from comment #3)
> So with the [[gnu::flatten]] attributes removed -O1 needs 80 seconds to
> compile and about 3GB of memory, -O2 needs around 2 minutes (same memory),
> -O3
>

[Bug c++/95264] New: Infinite Loop When Compiling Templated C++ code at -O1 and above

2020-05-21, freddie at witherden dot org

Priority: P3
Component: c++
Assignee: unassigned at gcc dot gnu.org
Reporter: freddie at witherden dot org
Target Milestone: ---

Created attachment 48578
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=48578&action=edit
Preprocessed source.

When att

[Bug tree-optimization/59650] New: Inefficient vector assignment code

2013-12-31, freddie at witherden dot org

-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: freddie at witherden dot org

Consider the following snippet:

typedef double v4d __attribute__((vector_size(32)));

v4d set1(double *v)
{
    v4d tmp = { v[0], v[1], v[2], v[3] };
    return tmp
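The 59650 snippet builds a `v4d` from four scalar loads. The sketch below is minimal and assumes GCC's vector extensions. The helper names `load_elementwise` and `load_memcpy` are hypothetical and do not come from the report. It contrasts that element-wise initializer with copying the four doubles into the vector in a single `memcpy`, which GCC can often lower to one unaligned vector load. Whether this gives better code depends on the GCC version and target flags (for example `-mavx`), so compare the output of `gcc -O2 -S` before relying on it.

```c
#include <assert.h>
#include <string.h>

typedef double v4d __attribute__((vector_size(32)));

/* Sanity check: the vector type holds exactly four doubles. */
static_assert(sizeof(v4d) == 4 * sizeof(double), "v4d must hold four doubles");

/* Hypothetical helper: element-wise construction, as in the report. */
void load_elementwise(v4d *out, const double *v)
{
    v4d tmp = { v[0], v[1], v[2], v[3] };
    *out = tmp;
}

/* Hypothetical helper: copy all 32 bytes in one go. No alignment of v
 * is assumed, so any vector load emitted here must be an unaligned one. */
void load_memcpy(v4d *out, const double *v)
{
    memcpy(out, v, sizeof *out);
}
```

Both helpers write through a pointer instead of returning `v4d` by value. This sidesteps ABI differences for 32-byte vectors when the code is compiled without AVX.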

[Bug tree-optimization/59501] New: Vector Gather with GCC 4.9 2013-12-08 Snapshot

2013-12-13, freddie at witherden dot org

: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: freddie at witherden dot org

Compiling the following snippet with the 2013-12-08 snapshot of 4.9:

typedef double v4d __attribute__((vector_size(32)));

v4d gather(double *base, unsigned *offt

[Bug tree-optimization/58280] Missed Opportunity for Aligned Vectorized Load

2013-08-30, freddie at witherden dot org

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58280

--- Comment #5 from Freddie Witherden ---
Thank you for this information. As an alternative would it be worth considering a pragma along the lines of:

#pragma gcc aligned(32)

which would confer that "in the first iteration of the loop which fol

[Bug tree-optimization/58280] Missed Opportunity for Aligned Vectorized Load

2013-08-30, freddie at witherden dot org

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58280

--- Comment #3 from Freddie Witherden ---
Would it be any easier --- from an implementation standpoint --- to adopt something similar to the "__assume(predicate)" directive in ICC? This would allow one to state explicitly:

__assume(ldim % 32 ==

[Bug tree-optimization/58280] New: Missed Opportunity for Aligned Vectorized Load

2013-08-30, freddie at witherden dot org

: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: freddie at witherden dot org

Consider

void foo(int nr, int nc, int ldim, double *__restrict a, double *__restrict b)
{
    a = __builtin_assume_aligned(a, 32);
    b = __builtin_assume_aligned(b, 32
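The 58280 thread is about telling GCC two alignment facts it cannot see on its own. The first is that the base pointers are aligned, which `__builtin_assume_aligned` handles, as in the report. The second is a property of `ldim`, which is the ICC `__assume` question. The sketch below is minimal and rests on stated assumptions:
- `add_block` is a hypothetical kernel, not the reporter's `foo`.
- The contract that `ldim` is a multiple of 4 doubles (32 bytes) is chosen for illustration.

It uses the common `if (!cond) __builtin_unreachable();` idiom as a GCC stand-in for `__assume`. Check the generated assembly to see whether a given GCC release actually turns these hints into aligned vector loads.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical kernel (not the reporter's foo): adds an nr x nc block of a
 * into b, both stored row-major with leading dimension ldim (in doubles).
 * Illustrative caller contract: a and b are 32-byte aligned and ldim is a
 * multiple of 4 doubles (32 bytes). */
void add_block(int nr, int nc, int ldim,
               const double *__restrict a, double *__restrict b)
{
    /* Checked in debug builds; the hints below are undefined behaviour
     * if the contract is broken. */
    assert((uintptr_t)a % 32 == 0 && (uintptr_t)b % 32 == 0);
    assert(ldim % 4 == 0);

    /* Tell GCC the base pointers are 32-byte aligned. */
    a = __builtin_assume_aligned(a, 32);
    b = __builtin_assume_aligned(b, 32);

    /* GCC stand-in for an ICC-style __assume(ldim % 4 == 0): the optimiser
     * may treat this branch as impossible, from which it can infer that
     * every row start a + i * ldim is also 32-byte aligned. */
    if (ldim % 4 != 0)
        __builtin_unreachable();

    for (int i = 0; i < nr; i++)
        for (int j = 0; j < nc; j++)
            b[i * ldim + j] += a[i * ldim + j];
}
```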

[Bug tree-optimization/57962] New: Missed Optimization for Superword Level Parallelism

2013-07-23, freddie at witherden dot org

Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: freddie at witherden dot org

Created attachment 30541
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=30541&action=edit
Sample code.

GCC 4.7.3 and 4.8.1 both miss an optimization when compil

[Bug c/56787] New: 4.8.0 Vectorization Regression Compared to 4.7.2

2013-03-30, freddie at witherden dot org

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56787

Bug #: 56787
Summary: 4.8.0 Vectorization Regression Compared to 4.7.2
Classification: Unclassified
Product: gcc
Version: 4.8.0
Status: UNCONFIRMED
Severity: normal

18 matches
