[Bug target/69663] New: [ARM] Implement overflow arithmetic standard names {u,}{add,sub,mul}v4

2016-02-03 Thread michael.collison at linaro dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69663

Bug ID: 69663
   Summary: [ARM] Implement overflow arithmetic standard names
{u,}{add,sub,mul}v4
   Product: gcc
   Version: 6.0
Status: UNCONFIRMED
  Severity: enhancement
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: michael.collison at linaro dot org
  Target Milestone: ---

Similar to https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68543 for aarch64,
implement overflow arithmetic standard names for aarch32.
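
For context, a minimal sketch of the kind of source these standard names are meant to help: GCC's type-generic overflow builtins. The helper names below are hypothetical and not part of this report; only the `__builtin_*_overflow` calls are GCC's own.

```c
#include <stdbool.h>
#include <limits.h>

/* Hypothetical helpers for illustration only.  Each returns true when
   the mathematically exact result does not fit in the result type.  */
bool
checked_sadd (int a, int b, int *sum)
{
  return __builtin_add_overflow (a, b, sum);
}

bool
checked_usub (unsigned a, unsigned b, unsigned *diff)
{
  return __builtin_sub_overflow (a, b, diff);
}

bool
checked_smul (int a, int b, int *prod)
{
  return __builtin_mul_overflow (a, b, prod);
}
```

Without target patterns the compiler falls back to generic expansion of these checks, which is the code quality this enhancement would be expected to improve.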

[Bug target/70008] New: [ARM] Reverse subtract with carry can be generated in thumb2 mode

2016-02-28 Thread michael.collison at linaro dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70008

Bug ID: 70008
   Summary: [ARM] Reverse subtract with carry can be generated in
thumb2 mode
   Product: gcc
   Version: 6.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: michael.collison at linaro dot org
  Target Milestone: ---

The following pattern in arm.md allows the 'rsc' instruction to be generated in
thumb2 mode.

(define_insn "*subsi3_carryin"
  [(set (match_operand:SI 0 "s_register_operand" "=r,r")
        (minus:SI (minus:SI (match_operand:SI 1 "reg_or_int_operand" "r,I")
                            (match_operand:SI 2 "s_register_operand" "r,r"))
                  (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0))))]
  "TARGET_32BIT"
  "@
   sbc%?\\t%0, %1, %2
   rsc%?\\t%0, %2, %1"

TARGET_32BIT includes the thumb2 architecture, which does not support the
reverse subtract with carry (rsc) instruction.

I will post a patch upstream after testing.
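
As a rough aid for reading the two alternatives, here is a small C model of what sbc and rsc compute. The function names are hypothetical; the `carry` argument stands for the ARM C flag, where 1 means "no borrow".

```c
#include <stdint.h>

/* Model of "sbc rd, rn, op2": rd = rn - op2 - NOT(carry).  */
uint32_t
model_sbc (uint32_t rn, uint32_t op2, unsigned carry)
{
  return rn - op2 - (carry ? 0u : 1u);
}

/* Model of "rsc rd, rn, op2": rd = op2 - rn - NOT(carry).
   The operands are subtracted in the reverse order.  */
uint32_t
model_rsc (uint32_t rn, uint32_t op2, unsigned carry)
{
  return op2 - rn - (carry ? 0u : 1u);
}
```

The second alternative of the pattern relies on the reversed form because its operand 1 is an immediate, and an immediate can only appear as the last operand of the instruction.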

[Bug target/70014] New: [ARM] Predicate does not match constraint (*subsi3_carryin_const)

2016-02-28 Thread michael.collison at linaro dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70014

Bug ID: 70014
   Summary: [ARM] Predicate does not match constraint
(*subsi3_carryin_const)
   Product: gcc
   Version: 6.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: michael.collison at linaro dot org
  Target Milestone: ---

The predicate of operand 1 of the "*subsi3_carryin_const" pattern:

(define_insn "*subsi3_carryin_const"
  [(set (match_operand:SI 0 "s_register_operand" "=r")
        (minus:SI (plus:SI (match_operand:SI 1 "reg_or_int_operand" "r")
                           (match_operand:SI 2 "arm_not_operand" "K"))
                  (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0))))]

allows for const_int but the constraint only allows for registers. The solution
is to change the predicate to disallow const_int operands. I will post a patch
upstream after testing is complete.

[Bug target/70008] [ARM] Reverse subtract with carry can be generated in thumb2 mode

2016-03-02 Thread michael.collison at linaro dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70008

--- Comment #2 from Michael Collison  ---
Richard,

As discussed upstream you are correct.

[Bug rtl-optimization/69008] gcc emits unneeded memory access when passing trivial structs by value (ARM)

2016-05-17 Thread michael.collison at linaro dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69008

Michael Collison  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
          Component|target                      |rtl-optimization

--- Comment #3 from Michael Collison  ---
Changed the component to rtl-optimization, since this occurs on other targets
as well. On MIPS, for example, GCC generates two redundant stores to memory at
-O2:

(insn 5 23 6 (set (mem/c:SI (reg/f:SI 29 $sp) [1 t+0 S4 A32])
(reg:SI 4 $4)) bugzilla_69008.c:7 310 {*movsi_internal}
 (nil))
(insn 6 5 18 (set (mem/c:SI (plus:SI (reg/f:SI 29 $sp)
(const_int 4 [0x4])) [1 t+4 S4 A32])
(reg:SI 5 $5)) bugzilla_69008.c:7 310 {*movsi_internal}
 (nil))
(insn 18 6 30 (use (reg/i:SI 2 $2)) bugzilla_69008.c:9 -1
 (nil))
(note 30 18 27 NOTE_INSN_EPILOGUE_BEG)
(insn 27 30 32 (clobber (reg:SI 28 $28)) bugzilla_69008.c:9 -1
 (nil))
(insn 32 27 29 (sequence [
(jump_insn 28 27 17 (simple_return) bugzilla_69008.c:9 629
{*simple_return}
 (nil)
 -> simple_return)
(insn 17 28 29 (set (reg/i:SI 2 $2)
(plus:SI (reg:SI 4 $4)
(reg:SI 5 $5))) 13 {*addsi3}
 (nil))
]) bugzilla_69008.c:9 -1
 (nil))
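
The RTL above stores the two incoming argument registers to the stack and then adds the registers directly, which is consistent with a function of roughly the following shape. This is an assumed reconstruction for illustration, not the contents of bugzilla_69008.c, and the names are hypothetical.

```c
/* Hypothetical trivial struct passed by value in two registers.  */
struct pair
{
  int a;
  int b;
};

/* Only the register copies are needed to compute the sum; a stack
   copy of 't' is never read back.  */
int
sum_pair (struct pair t)
{
  return t.a + t.b;
}
```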

[Bug rtl-optimization/82597] [8 Regression] ICE at -O2 and -O3 x86_64-linux-gnu in the 32-bit mode: in extract_constrain_insn, at recog.c:2207

2017-10-19 Thread michael.collison at linaro dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82597

Michael Collison  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |michael.collison at linaro dot org

--- Comment #4 from Michael Collison  ---
I am testing a patch that constrains the operands in compare-elim.c. Note,
however, that the test case has a bug in it that seems to trigger the ICE.
Without the bug in the test case, the ICE no longer occurs.

Specifically the function 'int a (h, i)' has two parameters that default to
int. However in the call to function 'a' in:

void j () { long k = f (d (1, e = c), g); a (k) && (b = 0); }

only one argument 'k' is passed. When I modify the test case to add another
argument the ICE does not occur.

[Bug rtl-optimization/82597] [8 Regression] ICE at -O2 and -O3 x86_64-linux-gnu in the 32-bit mode: in extract_constrain_insn, at recog.c:2207

2017-10-19 Thread michael.collison at linaro dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82597

--- Comment #6 from Michael Collison  ---
Yes I am aware of that report. I have a fix that should be sent to gcc-patches
shortly.

[Bug debug/61033] New: Infinite loop in variable tracking

2014-05-01 Thread michael.collison at linaro dot org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=61033

Bug ID: 61033
   Summary: Infinite loop in variable tracking
   Product: gcc
   Version: 4.9.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: debug
  Assignee: unassigned at gcc dot gnu.org
  Reporter: michael.collison at linaro dot org

Created attachment 32722
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=32722&action=edit
Preprocessed file to reproduce bug

Data flow analysis in variable tracking does not converge and causes an
infinite loop with the attached file and following compile options:

./cc1plus -quiet -fpreprocessed qmltextgenerator.ii -dumpbase
qmltextgenerator.ii -march=armv4t -mfloat-abi=soft -mtls-dialect=gnu
-auxbase-strip x.o -g -O2 -Wformat=1 -Werror=format-security -Wall -Wextra
-version -fstack-protector -fvisibility=hidden -fvisibility-inlines-hidden
-fPIC --param ssp-buffer-size=4 -o qmltextgenerator.s

GCC configured with --target=arm-linux-gnueabi


[Bug debug/61033] [4.8/4.9 Regression] Infinite loop in variable tracking

2015-05-20 Thread michael.collison at linaro dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61033

Michael Collison  changed:

           What    |Removed                            |Added
----------------------------------------------------------------------------
           Assignee|michael.collison at linaro dot org |mkuvyrkov at gcc dot gnu.org

--- Comment #12 from Michael Collison  ---
GCC is looping infinitely while attempting to solve a data flow problem in
vt_find_locations. My working theory is that compute_bb_dataflow is incorrectly
asserting that some value has changed. I have reduced my case to a candidate
set of basic blocks and a (candidate) RTL location that is continually removed
and re-added. The RTL location is:

dataflow difference found: removal of:
 name: D#255
   offset 0
 (mem/f/c:SI (value/u:SI 106:3955 @0x22cb2f8/0x23424a0) [3 result.d+0 S4 A32])


I understand that the argument to 'VALUE' is a cselib_val_struct, but I don't
know how to map this back to the insn that is causing the problem. Any thoughts
on how I might debug this further?
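
To make the failure mode concrete, here is a generic sketch of an iterative solver with a pass limit, showing how a transfer function that keeps reporting a change prevents convergence. This is not GCC's var-tracking code; every name here is hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Transfer function: updates the state for one block and returns true
   if anything changed.  */
typedef bool (*transfer_fn) (void *state, size_t block);

/* Iterate until no block changes or max_passes is reached.  Returns
   true on convergence; *passes_used receives the number of passes.  */
bool
solve_bounded (void *state, size_t nblocks, transfer_fn step,
               size_t max_passes, size_t *passes_used)
{
  for (size_t pass = 1; pass <= max_passes; pass++)
    {
      bool changed = false;
      for (size_t b = 0; b < nblocks; b++)
        if (step (state, b))
          changed = true;
      if (!changed)
        {
          *passes_used = pass;
          return true;
        }
    }
  *passes_used = max_passes;
  return false;
}

/* Demo: raises each cell toward 3, so it reaches a fixed point.  */
bool
demo_converging (void *state, size_t block)
{
  int *v = state;
  if (v[block] < 3)
    {
      v[block]++;
      return true;
    }
  return false;
}

/* Demo: toggles a bit and always reports a change, modelling a
   location that is removed and re-added on every pass.  */
bool
demo_oscillating (void *state, size_t block)
{
  int *v = state;
  v[block] ^= 1;
  return true;
}
```

With the oscillating transfer function the solver only stops because of the pass limit, which is the kind of self-protection an unbounded loop lacks.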


[Bug target/68223] New: [arm] arm_[su]min_cmp pattern fails

2015-11-05 Thread michael.collison at linaro dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68223

Bug ID: 68223
   Summary: [arm] arm_[su]min_cmp pattern fails
   Product: gcc
   Version: 6.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: michael.collison at linaro dot org
  Target Milestone: ---

The patterns arm_smin_cmp and arm_umin_cmp which were added to optimize code
such as

#define min(x, y) ((x) <= (y)) ? (x) : (y)

unsigned int foo (unsigned int i, unsigned int x ,unsigned int y)
{
  return i < (min (x, y));
}

fail if (i == x) and both are less than y.

Three test cases in testsuite/gcc.dg/vect

vect-reduc-7.c
vect-reduc-8.c
vect-reduc-9.c

fail their execution tests when GCC is configured with target
armeb-none-linux-gnueabihf.

The solution is to remove the patterns from arm.md.
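
For reference, a small self-contained check of the expected C semantics for the failing input class. The macro and function names are hypothetical, and the macro is fully parenthesized so that it can be used inside larger expressions.

```c
/* Fully parenthesized minimum, for illustration.  */
#define MIN_OF(x, y) (((x) <= (y)) ? (x) : (y))

/* Same computation as foo above: is i strictly below min (x, y)?  */
unsigned int
less_than_min (unsigned int i, unsigned int x, unsigned int y)
{
  return i < MIN_OF (x, y);
}
```

When i == x and both are less than y, the minimum is x, so the comparison i < x must yield 0.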

[Bug target/68223] [arm] arm_[su]min_cmp pattern fails

2015-11-05 Thread michael.collison at linaro dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68223

Michael Collison  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Target|                            |armeb-none-linux-gnueabihf

--- Comment #1 from Michael Collison  ---
Although the pattern fails under armeb-none-linux-gnueabihf, it will also fail
on little-endian arm targets. It just so happens that, on little-endian
targets, the pattern is not used for the test cases that fail on big-endian
targets.

[Bug target/68494] New: [ARM] Use vector multiply by lane

2015-11-22 Thread michael.collison at linaro dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68494

Bug ID: 68494
   Summary: [ARM] Use vector multiply by lane
   Product: gcc
   Version: 6.0
Status: UNCONFIRMED
  Severity: enhancement
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: michael.collison at linaro dot org
  Target Milestone: ---

The following test case should utilize vector multiply by a single lane.

short taps[4];

void fir_t5(int len, short * __restrict p, short *__restrict x,
            short *__restrict taps)
{
  len = len & ~31;
  for (int i = 0; i < len; i++)
    {
      int tmp = 0;
      for (int j = 0; j < NTAPS; j++)
        {
          tmp += x[i - j] * taps[j];
        }

      p[i] = tmp;
    }
}

[Bug target/68494] [ARM] Use vector multiply by lane

2015-11-24 Thread michael.collison at linaro dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68494

--- Comment #2 from Michael Collison  ---
Sorry here is the updated test case.

#define NTAPS 4

short taps[NTAPS];

void fir_t5(int len, short * __restrict p, short *__restrict x,
            short *__restrict taps)
{
  len = len & ~31;
  for (int i = 0; i < len; i++)
    {
      int tmp = 0;
      for (int j = 0; j < NTAPS; j++)
        {
          tmp += x[i - j] * taps[j];
        }

      p[i] = tmp;
    }
}



We currently generate a vdup of the scalar taps[j] in the inner loop. Ideally
we would not use the vdup and would instead use a vmul by lane.
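
A hand-written sketch of the desired shape, using NEON intrinsics, may help show what "by lane" means here. It computes four adjacent outputs and keeps them as 32-bit values, which differs from the test case above; the function name is hypothetical, and a scalar fallback is included so the sketch also builds on non-NEON hosts.

```c
#include <stdint.h>
#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

#define NTAPS 4

/* out[k] = sum over j of x[k - j] * taps[j], for k = 0..3.
   x must have at least NTAPS - 1 valid elements before it.  */
void
fir4_by_lane (int32_t out[4], const int16_t *x, const int16_t taps[NTAPS])
{
#if defined(__ARM_NEON)
  /* Load all taps once; each multiply-accumulate then selects one
     lane of 't', so no per-tap vdup is needed.  */
  int16x4_t t = vld1_s16 (taps);
  int32x4_t acc = vdupq_n_s32 (0);
  acc = vmlal_lane_s16 (acc, vld1_s16 (x), t, 0);
  acc = vmlal_lane_s16 (acc, vld1_s16 (x - 1), t, 1);
  acc = vmlal_lane_s16 (acc, vld1_s16 (x - 2), t, 2);
  acc = vmlal_lane_s16 (acc, vld1_s16 (x - 3), t, 3);
  vst1q_s32 (out, acc);
#else
  /* Scalar fallback with the same results.  */
  for (int k = 0; k < 4; k++)
    {
      int32_t tmp = 0;
      for (int j = 0; j < NTAPS; j++)
        tmp += x[k - j] * taps[j];
      out[k] = tmp;
    }
#endif
}
```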

[Bug target/68532] New: [ARM] Incorrect code result on arm big endian

2015-11-24 Thread michael.collison at linaro dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68532

Bug ID: 68532
   Summary: [ARM] Incorrect code result on arm big endian
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: michael.collison at linaro dot org
  Target Milestone: ---

The following test case produces an incorrect answer on arm big endian with
the following options: -O2 -ftree-vectorize -fno-vect-cost-model
-mcpu=cortex-a8 -mfpu=neon

The result should be '960', but arm big endian generates '992'.

#include 
#include 

#define SIZE 128
unsigned short alignas(16) in[SIZE];

__attribute__ ((noinline)) int
test (unsigned short diff, unsigned short *in, int x)
{
for (int j = 0; j < SIZE; j+=8)
  diff += in[j] * x;
return diff;
}

int main()
{
for (int i = 0; i

[Bug debug/61033] [4.8/4.9 Regression] Infinite loop in variable tracking

2014-08-04 Thread michael.collison at linaro dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61033

--- Comment #7 from Michael Collison  ---
Charlie,

I still feel that the var tracking pass should be able to protect itself 
from an infinite loop.

On 8/4/2014 11:43 AM, cbaylis at gcc dot gnu.org wrote:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61033
>
> --- Comment #6 from cbaylis at gcc dot gnu.org ---
>
> git bisect points to r211625 as the revision which fixes/hides this bug on
> trunk.
>
> 2014-06-13  Richard Biener  
>
>  * tree-ssa-pre.c (eliminate_dom_walker::before_dom_children):
>  Rewrite to propagate the VN result into all uses where
>  possible and to remove stmts becoming dead because of that.
>  (eliminate): Generalize stmt removal handling, remove in
>  reverse dominator order to support proper debug stmt
>  generation.  Update stmts before removing stmts.
>  * tree-ssa-propagate.c (propagate_tree_value): Remove
>  bogus assert.
>


[Bug rtl-optimization/63365] New: [ARM] Incorrect copy propagation for vclz intrinsic

2014-09-24 Thread michael.collison at linaro dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63365

Bug ID: 63365
   Summary: [ARM] Incorrect copy propagation for vclz intrinsic
   Product: gcc
   Version: 4.9.2
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: rtl-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: michael.collison at linaro dot org

Created attachment 33558
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=33558&action=edit
Test case demonstrating incorrect code generation for vclz

Compiling and executing the attached testcase at -O2 produces incorrect code
and execution results. Instead of code being generated for the vclz instruction
the compiler propagates an incorrect constant value for what it believes will
be the result of the instruction. The attached test case is part of a proposed
set of DejaGnu tests for neon intrinsics.
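
One way to cross-check lane results independently of the compiler's constant folding is a scalar reference like the sketch below (the function name is hypothetical). Note that vclz yields the element width for a zero lane, whereas `__builtin_clz (0)` is undefined, so the reference handles zero explicitly.

```c
#include <stdint.h>

/* Scalar reference for one 32-bit lane: number of leading zero bits,
   with 32 returned for a zero input.  */
unsigned
clz32_ref (uint32_t x)
{
  unsigned n = 0;

  if (x == 0)
    return 32;
  while (!(x & 0x80000000u))
    {
      x <<= 1;
      n++;
    }
  return n;
}
```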

--
Target: arm-linux-gnueabihf

Configured with:
'/home/michael-collison/neon-intrinsic-build/snapshots/gcc.git~neon-intrinsic-christophe/configure'
--with-bugurl=https://bugs.launchpad.net/gcc-linaro
--with-mpc=/home/michael-collison/neon-intrinsic-build/builds/destdir/x86_64-unknown-linux-gnu
--with-mpfr=/home/michael-collison/neon-intrinsic-build/builds/destdir/x86_64-unknown-linux-gnu
--with-gmp=/home/michael-collison/neon-intrinsic-build/builds/destdir/x86_64-unknown-linux-gnu
--with-gnu-as --with-gnu-ld --disable-libstdcxx-pch --disable-libmudflap
--with-cloog=no --with-ppl=no --with-isl=no --disable-nls --enable-multiarch
--disable-multilib --enable-c99 --with-arch=armv7-a --with-fpu=vfpv3-d16
--with-float=hard --with-mode=thumb
--with-sysroot=/opt/linaro/sysroot-linaro_eglibc-2_19-arm-clean-linux-gnueabihf
--with-build-sysroot=/home/michael-collison/neon-intrinsic-build/sysroots/arm-clean-linux-gnueabihf
--enable-lto --enable-linker-build-id --enable-long-long
--enable-languages=c,c++,fortran,go,lto
--with-bugurl=https://bugs.launchpad.net/gcc-linaro --with-pkgversion='Linaro
GCC 2014.08' --build=x86_64-unknown-linux-gnu --host=x86_64-unknown-linux-gnu
--target=arm-clean-linux-gnueabihf
--prefix=/home/michael-collison/neon-intrinsic-build/builds/destdir/x86_64-unknown-linux-gnu

Thread model: posix

gcc version 4.10.0 20140812 (experimental) (Linaro GCC 2014.08)


[Bug target/67322] [Aarch64] Exploit Wide Add operations when appropriate

2015-12-06 Thread michael.collison at linaro dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67322

--- Comment #1 from Michael Collison  ---
Fixed via:
URL: https://gcc.gnu.org/viewcvs/gcc?view=revision&revision=230853

r230853 | collison | 2015-11-24 23:51:55 -0700 (Tue, 24 Nov 2015) | 15 lines

2015-11-24  Michael Collison  

* config/aarch64/aarch64-simd.md (widen_ssum, widen_usum)
(aarch64_w_internal): New patterns
* config/aarch64/iterators.md (Vhalf, VDBLW): New mode attributes.
* gcc.target/aarch64/saddw-1.c: New test.
* gcc.target/aarch64/saddw-2.c: New test.
* gcc.target/aarch64/uaddw-1.c: New test.
* gcc.target/aarch64/uaddw-2.c: New test.
* gcc.target/aarch64/uaddw-3.c: New test.
* lib/target-support.exp
(check_effective_target_vect_widen_sum_hi_to_si_pattern):
Add aarch64 to list of support targets.

[Bug target/67322] [Aarch64] Exploit Wide Add operations when appropriate

2015-12-06 Thread michael.collison at linaro dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67322

Michael Collison  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |RESOLVED
         Resolution|---                         |FIXED

--- Comment #2 from Michael Collison  ---
Resolved via:

https://gcc.gnu.org/viewcvs/gcc?view=revision&revision=230853

[Bug target/68494] [ARM] Use vector multiply by lane

2015-12-06 Thread michael.collison at linaro dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68494

--- Comment #3 from Michael Collison  ---
Previous discussion thread here:

https://gcc.gnu.org/ml/gcc/2013-09/msg00061.html

[Bug target/68543] [AArch64] Implement overflow arithmetic standard names {u,}{add,sub,mul}v4 and/or negv3

2015-12-10 Thread michael.collison at linaro dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68543

--- Comment #2 from Michael Collison  ---
Great idea, I will look into this.

On 12/10/2015 4:02 AM, ktkachov at gcc dot gnu.org wrote:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68543
>
> --- Comment #1 from ktkachov at gcc dot gnu.org ---
> Maybe we can avoid defining custom expanders if we define
> WORD_REGISTER_OPERATIONS for aarch64.
> It's defined for arm and the documentation hints that it should be true for
> most RISC targets.
>
> Then the default fallback codegen for the given example is much improved:
> foo:
>  uxth x0, w0
>  uxth x1, w1
>  mul x0, x0, x1
>  cmp x0, x0, sxtw
>  bne .L10
>  ret
> .L10:
>  stp x29, x30, [sp, -16]!
>  add x29, sp, 0
>  bl  abort
>
>
> However, we need to investigate the other codegen effects that come with
> WORD_REGISTER_OPERATIONS, in particular to make sure that the aarch64 patterns
> cope with the slightly different strategies of using subregs and sign/zero
> extends in combine
>

[Bug target/68543] [AArch64] Implement overflow arithmetic standard names {u,}{add,sub,mul}v4 and/or negv3

2015-12-11 Thread michael.collison at linaro dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68543

--- Comment #4 from Michael Collison  ---
Okay thanks. After looking into the topic I did not see the direct 
connection either.

On 12/11/2015 7:21 AM, ktkachov at gcc dot gnu.org wrote:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68543
>
> --- Comment #3 from ktkachov at gcc dot gnu.org ---
> After some discussion on IRC, WORD_REGISTER_OPERATIONS seems wrong for aarch64
> since 32-bit operations i.e. in SImode operate like normal 32-bit operations
> because they use the 32-bit W-form of the registers. Thus they don't behave
> like word_mode operations, because word_mode is DImode on aarch64.
> So we may want to look at implementing the standard names after all
>

[Bug tree-optimization/67323] Use non-unit stride loads by preference when applicable

2016-01-13 Thread michael.collison at linaro dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67323

--- Comment #8 from Michael Collison  ---
Hi Richard,

I tried this with trunk and was unable to generate the vld3. What vectorizer
options did you use?

[Bug tree-optimization/67323] Use non-unit stride loads by preference when applicable

2016-01-14 Thread michael.collison at linaro dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67323

--- Comment #11 from Michael Collison  ---
Andrew,

It may be the case that this is not a win on all microarchitectures; however,
I think we should allow the vectorizer to (optionally) generate the vld3 and
deal with the differences via the cost models.

[Bug middle-end/67320] New: Incorrect standard names for wide addition

2015-08-22 Thread michael.collison at linaro dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67320

Bug ID: 67320
   Summary: Incorrect standard names for wide addition
   Product: gcc
   Version: 6.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: middle-end
  Assignee: unassigned at gcc dot gnu.org
  Reporter: michael.collison at linaro dot org
  Target Milestone: ---

The internals documentation shows the standard names for widening addition as:

ssum_widenm3
usum_widenm3

In fact the standard names used by the compiler are:

widen_ssumm3
widen_usumm3


[Bug middle-end/67320] Incorrect standard names for wide addition

2015-08-22 Thread michael.collison at linaro dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67320

--- Comment #1 from Michael Collison  ---
Created attachment 36241
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=36241&action=edit
Patch for widening addition doc

Proposed patch to fix wide addition documentation errors.


[Bug other/57195] Mode attributes with specific mode iterator can not be used as mode iterators in *.md files

2015-08-22 Thread michael.collison at linaro dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=57195

Michael Collison  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |michael.collison at linaro dot org

--- Comment #3 from Michael Collison  ---
I recently ran into this problem. This patch appears to resolve the issue:

diff --git a/gcc/read-md.c b/gcc/read-md.c
index 9f158ec..df5748f 100644
--- a/gcc/read-md.c
+++ b/gcc/read-md.c
@@ -399,16 +399,24 @@ read_name (struct md_name *name)
 {
   int c;
   size_t i;
+  int in_angle_bracket;

   c = read_skip_spaces ();

   i = 0;
+  in_angle_bracket = 0;
   while (1)
 {
+  if (c == '<')
+   in_angle_bracket = 1;
+
+  if (c == '>')
+   in_angle_bracket = 0;
+
   if (c == ' ' || c == '\n' || c == '\t' || c == '\f' || c == '\r'
  || c == EOF)
break;
-  if (c == ':' || c == ')' || c == ']' || c == '"' || c == '/'
+  if (((c == ':') and (in_angle_bracket == 0)) || c == ')' || c == ']' || c == '"' || c == '/'
  || c == '(' || c == '[')
{
  unread_char (c);


[Bug target/67321] New: [ARM] Exploit Wide Add operations when appropriate

2015-08-22 Thread michael.collison at linaro dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67321

Bug ID: 67321
   Summary: [ARM] Exploit Wide Add operations when appropriate
   Product: gcc
   Version: 6.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: michael.collison at linaro dot org
  Target Milestone: ---

Wide add operations are not always being generated for mixed mode adds as shown
by the following test case:

int wadd_test(int len, void * dummy, short * __restrict x)
{
  len = len & ~31;
  int result = 0;
  __asm volatile ("");
  for (int i = 0; i < len; i++)
    result += x[i];
  return result;
}
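
For comparison, a hand-written sketch of the reduction using the NEON widening add intrinsic is shown below. The function name and the requirement that len be a multiple of 4 are assumptions made for this illustration; a scalar fallback keeps the sketch buildable on non-NEON hosts.

```c
#include <stdint.h>
#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* Sum 16-bit elements into a 32-bit result.  len is assumed to be a
   non-negative multiple of 4.  */
int32_t
widening_sum (const int16_t *x, int len)
{
#if defined(__ARM_NEON)
  int32x4_t acc = vdupq_n_s32 (0);
  int32_t lanes[4];

  /* vaddw widens each 16-bit lane to 32 bits as part of the add, so
     no separate extend instruction is needed.  */
  for (int i = 0; i < len; i += 4)
    acc = vaddw_s16 (acc, vld1_s16 (x + i));
  vst1q_s32 (lanes, acc);
  return lanes[0] + lanes[1] + lanes[2] + lanes[3];
#else
  /* Scalar fallback with the same result.  */
  int32_t result = 0;
  for (int i = 0; i < len; i++)
    result += x[i];
  return result;
#endif
}
```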


[Bug target/67321] [ARM] Exploit Wide Add operations when appropriate

2015-08-22 Thread michael.collison at linaro dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67321

--- Comment #1 from Michael Collison  ---
Created attachment 36242
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=36242&action=edit
Patch to generate arm vaddw


[Bug target/67322] New: [Aarch64] Exploit Wide Add operations when appropriate

2015-08-22 Thread michael.collison at linaro dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67322

Bug ID: 67322
   Summary: [Aarch64] Exploit Wide Add operations when appropriate
   Product: gcc
   Version: 6.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: michael.collison at linaro dot org
  Target Milestone: ---

Wide add operations are not always being generated for mixed mode adds as shown
by the following test case:

int wadd_test(int len, void * dummy, short * __restrict x)
{
  len = len & ~31;
  int result = 0;
  __asm volatile ("");
  for (int i = 0; i < len; i++)
    result += x[i];
  return result;
}


[Bug tree-optimization/67323] New: Use non-unit stride loads by preference when applicable

2015-08-22 Thread michael.collison at linaro dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67323

Bug ID: 67323
   Summary: Use non-unit stride loads by preference when
applicable
   Product: gcc
   Version: 6.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: michael.collison at linaro dot org
  Target Milestone: ---

On arm targets the following code fails to generate a vld3:

struct pixel {
  char r,g,b;
};

void 
t2(int len, struct pixel * __restrict p, struct pixel * __restrict x)
{
  len = len & ~31;
  for (int i = 0; i < len; i++){
  p[i].r = x[i].r * 2;
  p[i].g = x[i].g * 3;
  p[i].b = x[i].b * 4;
  }
}

Yet the same code with line 11 changed to:

p[i].g = x[i].g;

does generate a vld3.
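
For illustration, the code one would hope the vectorizer produces looks roughly like the following hand-written sketch for eight pixels. It uses unsigned 8-bit channels instead of char, the function name is hypothetical, and a scalar fallback is included for non-NEON hosts.

```c
#include <stdint.h>
#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* Scale 8 interleaved RGB pixels (24 bytes): r * 2, g * 3, b * 4,
   with results truncated to 8 bits.  */
void
scale8_rgb (uint8_t *dst, const uint8_t *src)
{
#if defined(__ARM_NEON)
  /* vld3 de-interleaves into one vector per channel.  */
  uint8x8x3_t px = vld3_u8 (src);
  px.val[0] = vmul_u8 (px.val[0], vdup_n_u8 (2));
  px.val[1] = vmul_u8 (px.val[1], vdup_n_u8 (3));
  px.val[2] = vmul_u8 (px.val[2], vdup_n_u8 (4));
  /* vst3 re-interleaves on the way out.  */
  vst3_u8 (dst, px);
#else
  /* Scalar fallback with the same results.  */
  for (int i = 0; i < 8; i++)
    {
      dst[3 * i] = (uint8_t) (src[3 * i] * 2);
      dst[3 * i + 1] = (uint8_t) (src[3 * i + 1] * 3);
      dst[3 * i + 2] = (uint8_t) (src[3 * i + 2] * 4);
    }
#endif
}
```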


[Bug other/57195] Mode attributes with specific mode iterator can not be used as mode iterators in *.md files

2015-08-23 Thread michael.collison at linaro dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=57195

--- Comment #5 from Michael Collison  ---
On 08/23/2015 04:50 AM, segher at gcc dot gnu.org wrote:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=57195
>
> --- Comment #4 from Segher Boessenkool  ---
> Hello Michael,
Hi Segher,
>
> Patches should be sent to gcc-patches@.
I did send the patch upstream:

https://gcc.gnu.org/ml/gcc-patches/2015-08/msg01366.html
>
> If you do, either make in_angle_bracket a bool, or actually count
> the nesting level; and you probably want to handle the case where
> there are more closing than opening brackets.
>
> And one of your lines is much too long ;-)

Okay thanks. I'll address that once I get feedback on the other parts of 
the patch
>
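
A standalone sketch of the nesting-count variant suggested above might look like this. It works on a string rather than the md reader's input stream, and scan_name is a hypothetical helper, not the read_name function from read-md.c.

```c
/* Return the length of the name at the start of s.  A ':' ends the
   name only outside <...>; nesting is counted so that inner brackets
   work.  Returns -1 if a '>' appears without a matching '<'.  */
int
scan_name (const char *s)
{
  int depth = 0;
  int i;

  for (i = 0; s[i] != '\0'; i++)
    {
      char c = s[i];

      if (c == ' ' || c == '\n' || c == '\t' || c == '\f' || c == '\r')
        break;
      if (c == '<')
        depth++;
      else if (c == '>')
        {
          if (depth == 0)
            return -1;		/* More closing than opening brackets.  */
          depth--;
        }
      else if (c == ':' && depth == 0)
        break;
      else if (c == ')' || c == ']' || c == '"' || c == '/'
               || c == '(' || c == '[')
        break;
    }
  return i;
}
```

Counting the depth instead of using a single flag keeps the ':' check correct for nested brackets and gives a natural place to report an unbalanced '>'.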


[Bug tree-optimization/67323] Use non-unit stride loads by preference when applicable

2015-08-25 Thread michael.collison at linaro dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67323

--- Comment #2 from Michael Collison  ---
Richard,

Should I create a test case that fails until you resolve this in GCC 6?

On 08/25/2015 02:14 AM, rguenth at gcc dot gnu.org wrote:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67323
>
> Richard Biener  changed:
>
>            What    |Removed                           |Added
> ----------------------------------------------------------------------------
>              Status|UNCONFIRMED                       |ASSIGNED
>    Last reconfirmed|                                  |2015-08-25
>                  CC|richard.guenther at gmail dot com |rguenth at gcc dot gnu.org
>          Depends on|                                  |66721
>            Assignee|unassigned at gcc dot gnu.org     |rguenth at gcc dot gnu.org
>      Ever confirmed|0                                 |1
>
> --- Comment #1 from Richard Biener  ---
> Confirmed.  We go down the SLP path here because the vectorizer thinks that
> SLP is always cheaper than using interleaving (which generally is true
> if there were not targets which can do the load plus interleave with
> load-lanes ...).
>
> I think this may be a regression as well because I enhanced SLP to apply
> to way more cases.
>
> Note that my plan is to make the vectorizer consider both (well, not really,
> but this bug shows I maybe should try), SLP and non-SLP, and evaluate based
> on costs which route to go.
>
>
> Referenced Bugs:
>
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66721
> [Bug 66721] [6 regression] gcc.target/i386/pr61403.c FAILs


[Bug tree-optimization/67323] Use non-unit stride loads by preference when applicable

2015-08-25 Thread michael.collison at linaro dot org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67323

--- Comment #4 from Michael Collison  ---
Hi Richard,

No I do not have a fix now. Thanks for the info on the policy.

On 08/25/2015 03:05 AM, rguenther at suse dot de wrote:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67323
>
> --- Comment #3 from rguenther at suse dot de  ---
> On Tue, 25 Aug 2015, michael.collison at linaro dot org wrote:
>
>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67323
>>
>> --- Comment #2 from Michael Collison  ---
>> Richard,
>>
>> Should I create a test case that fails until you resolve this in GCC 6?
> If you can provide one that I can check in together with a fix that
> would be nice.  Having it in the tree now and FAILing isn't according
> to our policies.
>
>> On 08/25/2015 02:14 AM, rguenth at gcc dot gnu.org wrote:
>>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67323
>>>
>>> Richard Biener  changed:
>>>
>>>            What    |Removed                           |Added
>>> ----------------------------------------------------------------------------
>>>              Status|UNCONFIRMED                       |ASSIGNED
>>>    Last reconfirmed|                                  |2015-08-25
>>>                  CC|richard.guenther at gmail dot com |rguenth at gcc dot gnu.org
>>>          Depends on|                                  |66721
>>>            Assignee|unassigned at gcc dot gnu.org     |rguenth at gcc dot gnu.org
>>>      Ever confirmed|0                                 |1
>>>
>>> --- Comment #1 from Richard Biener  ---
>>> Confirmed.  We go down the SLP path here because the vectorizer thinks that
>>> SLP is always cheaper than using interleaving (which generally is true
>>> if there were not targets which can do the load plus interleave with
>>> load-lanes ...).
>>>
>>> I think this may be a regression as well because I enhanced SLP to apply
>>> to way more cases.
>>>
>>> Note that my plan is to make the vectorizer consider both (well, not really,
>>> but this bug shows I maybe should try), SLP and non-SLP, and evaluate based
>>> on costs which route to go.
>>>
>>>
>>> Referenced Bugs:
>>>
>>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66721
>>> [Bug 66721] [6 regression] gcc.target/i386/pr61403.c FAILs
>>