
[Bug tree-optimization/57223] Auto-vectorization fails for nested multiple loops depending on type of array

2013-09-26 Thread y.usishchev at samsung dot com

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57223

Usishchev Yury  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |y.usishchev at samsung dot com

--- Comment #3 from Usishchev Yury  ---
I'm testing it on current trunk, and the second loop is not vectorized with
either floating point or integer types.
For floating point types it is not vectorized due to control flow in the loop:

 :
// ...
if (t_56 > _61)
  goto ;
else
  goto ;
 :
 :
# iftmp.2_7 = PHI <_61(16), t_56(15)>

This can be optimized to MIN_EXPR in the phiopt pass, but is not because of NaNs:

tree-ssa-phiopt.c:876:
  /* The optimization may be unsafe due to NaNs.  */
  if (HONOR_NANS (TYPE_MODE (type)))
    return false;
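
To illustrate why NaNs get in the way, here is a minimal standalone C sketch of the selection shown in the dump above. The function name `select_smaller` is hypothetical and this is ordinary user code, not GCC internals. As far as I understand, MIN_EXPR leaves the result unspecified when an operand is NaN, whereas the conditional form always yields its "else" operand in that case, so the two are only interchangeable when NaNs are ruled out.

```c
#include <math.h>

/* Same shape as the PHI in the dump: yield x when t > x, otherwise t.
   Every ordered comparison with NaN is false, so a NaN in either operand
   sends control to the "otherwise" side and t is returned. */
float select_smaller(float t, float x)
{
    return (t > x) ? x : t;
}
```

With this definition, `select_smaller(1.0f, NAN)` returns `1.0f` while `select_smaller(NAN, 1.0f)` returns NaN, so the result depends on operand order in a way a plain minimum does not have to respect.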

If compiled with -ffinite-math-only, the second loop is still not vectorized:

not_always_good.c:16:7: note: not vectorized: latch block not empty.

(the same occurs with integer types). The latch block in that case is:

 :
  pretmp_176 = *prephitmp_173;
  goto ;

The statement here is generated in the pre pass.

For vectorization to work we can either not generate it in pre or move it into
the head of the loop in the vectorizer.

Right now I'm trying to find out how to prevent pre from generating statements in
empty latch blocks.
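
As a rough source-level analogy for the two loop shapes discussed above (this is not the attached test case and not what pre literally emits; both function names are hypothetical), the first function performs the load for the next iteration after the exit test, where a latch-block statement would sit, and the second performs it at the head of the loop body:

```c
#include <stddef.h>

/* The load for the next iteration happens after the exit test, i.e. on the
   back edge of the loop, which is where a latch-block statement lives. */
int sum_load_in_latch(const int *p, size_t n)
{
    int sum = 0;
    size_t i = 0;
    int cur;

    if (n == 0)
        return 0;
    cur = p[0];
    for (;;) {
        sum += cur;
        if (++i >= n)
            break;
        cur = p[i];     /* analogue of pretmp = *prephitmp in the latch */
    }
    return sum;
}

/* Same computation with the load at the head of the loop body, so no load
   remains on the back edge. */
int sum_load_in_header(const int *p, size_t n)
{
    int sum = 0;

    for (size_t i = 0; i < n; i++) {
        int cur = p[i];
        sum += cur;
    }
    return sum;
}
```

Both functions return the same value for any input; only the position of the load relative to the exit test differs.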

[Bug target/66433] New: Arm NEON postincrement optimization missed

2015-06-05 Thread y.usishchev at samsung dot com

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66433

          Bug ID: 66433
         Summary: Arm NEON postincrement optimization missed
         Product: gcc
         Version: 6.0
          Status: UNCONFIRMED
        Severity: normal
        Priority: P3
       Component: target
        Assignee: unassigned at gcc dot gnu.org
        Reporter: y.usishchev at samsung dot com
Target Milestone: ---

Created attachment 35701
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=35701&action=edit
test with vld and vst

GCC from trunk, configured with --target=armv7l-tizen-linux-gnueabi and run with
options "-O2 -mfpu=neon" on the attached testcase, does not generate autoincrement
addressing for vld/vst instructions.

The auto-inc-dec pass misses the opportunity to optimize vld/vst instructions.
For the code

for () { //some loop
  s0_32x4 = vld1q_u32(s);
  s1_32x4 = vld1q_u32(s+4);
  s+=8;
  ...
}

gcc generates

vld1.32 {d6-d7}, [r1]
add.w   r4, r1, #16
adds    r1, #32
vld1.32 {d28-d29}, [r4]

instead of

vld1.32 {d6-d7}, [r1]!
vld1.32 {d28-d29}, [r1]!

This is presumably caused by wrong cost estimation: a vld1.32 instruction
without increment costs 4, but with increment its cost is 16
(gcc/config/arm/arm.c:9415):

case MEM:
  if (REG_P (XEXP (x, 0)))
    *cost = COSTS_N_INSNS (1);
  ...
  else
    *cost = COSTS_N_INSNS (ARM_NUM_REGS (mode));
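
The 4 versus 16 figures can be reproduced with a small standalone sketch. It assumes, from my reading of the GCC sources rather than from this report, that COSTS_N_INSNS (N) expands to N * 4 and that ARM_NUM_REGS rounds the mode size up to a count of 4-byte words; it also assumes the quad-register load moves 16 bytes. All `SKETCH_` macros and both function names are hypothetical stand-ins, not GCC's definitions.

```c
/* Local stand-ins for the GCC macros; the values are assumptions. */
#define SKETCH_COSTS_N_INSNS(n)  ((n) * 4)
#define SKETCH_UNITS_PER_WORD    4   /* bytes per 32-bit ARM word */
#define SKETCH_NUM_REGS(bytes) \
    (((bytes) + SKETCH_UNITS_PER_WORD - 1) / SKETCH_UNITS_PER_WORD)

/* Cost when the address is a plain register, such as [r1]. */
int mem_cost_reg_address(void)
{
    return SKETCH_COSTS_N_INSNS(1);
}

/* Cost on the fallback branch, which a post-increment address such as
   [r1]! would reach: one instruction per word of the access. */
int mem_cost_other_address(int mode_size_bytes)
{
    return SKETCH_COSTS_N_INSNS(SKETCH_NUM_REGS(mode_size_bytes));
}
```

Under these assumptions `mem_cost_reg_address()` gives 4 and `mem_cost_other_address(16)` gives 16, so the post-increment form looks four times as expensive even though it is a single instruction.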

[Bug target/66433] Arm NEON postincrement optimization missed

2015-06-05 Thread y.usishchev at samsung dot com

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66433

--- Comment #1 from Usishchev Yury  ---
Created attachment 35702
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=35702&action=edit
patch with fix

Attached a patch that, in my opinion, fixes the issue.
