https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84037

--- Comment #12 from Richard Biener <rguenth at gcc dot gnu.org> ---
I have opened PR84102 for the missed optimizations in this particular loop.  I
now believe the interesting one is the other.

  30.25%  a.out    a.out             [.] __solv_cap_MOD_fourir2d
  24.83%  a.out    a.out             [.] __solv_cap_MOD_fourir
  24.83%  a.out    a.out             [.] __solv_cap_MOD_fourir2dx
  18.78%  a.out    [unknown]         [k] 0xffffffff813366e7

and the loops at line 551 are in the innermost nest of fourir(), which is
called from / inlined into the fourir* routines.  It's actually not array
expressions we vectorize but the different inlined copies of that loop.  Cost
model for that one:

capacita2.f90:551:0: note: Cost model analysis:
  Vector inside of loop cost: 3756
  Vector prologue cost: 64
  Vector epilogue cost: 1712
  Scalar iteration cost: 516
  Scalar outside cost: 4
  Vector outside cost: 1776
  prologue iterations: 0
  epilogue iterations: 4
  Calculated minimum iters for profitability: 0
capacita2.f90:551:0: note:   Runtime profitability threshold = 8
capacita2.f90:551:0: note:   Static estimate profitability threshold = 11545608

The loop is hybrid SLP and the VF is 8.
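As a sanity check on the dump, the sketch below recomputes the "minimum iters" and "runtime threshold" numbers.  It assumes the profitability condition SIC * niters + SOC > VIC * ((niters - PL_ITERS - EP_ITERS) / VF) + VOC as I remember it from vect_estimate_min_profitable_iters in GCC of this era, plus the rule that the vector loop must run at least once; the function names are hypothetical and this is a paraphrase, not the GCC source.

```python
# Hypothetical re-derivation of the cost-model numbers from the dump.
# The formula paraphrases (from memory) GCC's vect_estimate_min_profitable_iters:
#   SIC * niters + SOC > VIC * ((niters - PL - EP) / VF) + VOC

def min_profitable_iters(vic, voc, sic, soc, vf, pl_iters, ep_iters):
    """Smallest niters for which the vector loop is modelled as profitable."""
    if sic * vf <= vic:
        return None  # vector body is never cheaper than VF scalar iterations
    num = (voc - soc) * vf - vic * pl_iters - vic * ep_iters
    if num <= 0:
        return 0
    n = num // (sic * vf - vic)  # num > 0 here, so floor == truncation
    # Round up if the truncated count is not yet profitable.
    if sic * vf * n <= vic * n + (voc - soc) * vf:
        n += 1
    return n


def runtime_threshold(min_iters, vf, pl_iters):
    # The vectorized loop should execute at least once.
    return max(min_iters, vf + pl_iters)


# Inputs taken from the capacita2.f90:551 dump.
VIC = 3756          # vector inside of loop cost
VOC = 64 + 1712     # vector outside cost = prologue + epilogue = 1776
SIC, SOC = 516, 4   # scalar iteration cost, scalar outside cost
VF, PL, EP = 8, 0, 4

m = min_profitable_iters(VIC, VOC, SIC, SOC, VF, PL, EP)
print(m, runtime_threshold(m, VF, PL))
```

With these inputs the numerator (1776 - 4) * 8 - 3756 * 4 = -848 is already non-positive, so the model reports 0 and the runtime check falls back to the VF of 8, matching the dump.  The static estimate in the dump comes from a different computation and is not modelled here.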

capacita2.f90:551:0: note: improved number of alias checks from 36 to 6

So here, too, it's dependence analysis breaking down because the step is
unknown:

(compute_affine_dependence
  stmt_a: _170 = REALPART_EXPR <*a.0_107[_54]>;
  stmt_b: _196 = REALPART_EXPR <*a.0_107[_73]>;
(analyze_overlapping_iterations
  (chrec_a = 0)
  (chrec_b = 0)
  (overlap_iterations_a = [0])
  (overlap_iterations_b = [0]))
(analyze_overlapping_iterations
  (chrec_a = {((integer(kind=8)) j0_119 + 1) * iftmp.476_91, +, iftmp.476_91}_3)
  (chrec_b = {((integer(kind=8)) j1_124 + 1) * iftmp.476_91, +, iftmp.476_91}_3)
(analyze_siv_subscript
  siv test failed: unimplemented)
  (overlap_iterations_a = not known)
  (overlap_iterations_b = not known))
) -> dependence analysis failed

The only thing we know about iftmp.476_91 is that it is nonzero (but I'm sure
dependence analysis doesn't use that fact).  I think this specific dependence
should be computable...?  [simply by dividing both chrecs by iftmp.476_91?]
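A brute-force sketch of why dividing by the step should be sound: a chrec {base, +, step}_3 takes the value base + i*step in iteration i of loop 3, and with a common nonzero step s the two accesses meet exactly when k - i == (j0_119 + 1) - (j1_124 + 1).  The helper names and the sample values for j0_119, j1_124 and iftmp.476_91 are hypothetical; this only checks the algebra and says nothing about how GCC's SIV test is implemented.

```python
# Hypothetical check of "divide both chrecs by the common step".
# {base, +, step}_3 evaluates to base + i*step in iteration i of loop 3.

def overlapping_pairs(base_a, base_b, step, n):
    """Iteration pairs (i, k) in [0, n) x [0, n) where
    {base_a, +, step} at i equals {base_b, +, step} at k."""
    return {(i, k) for i in range(n) for k in range(n)
            if base_a + i * step == base_b + k * step}


def predicted_pairs(ca, cb, n):
    """Dividing {ca*s, +, s} and {cb*s, +, s} by a nonzero s gives
    {ca, +, 1} and {cb, +, 1}, which meet exactly when k - i == ca - cb."""
    return {(i, k) for i in range(n) for k in range(n) if k - i == ca - cb}


j0, j1, n = 2, 5, 10   # sample values standing in for j0_119, j1_124
steps = (-3, -1, 1, 2, 7)  # sample nonzero values for iftmp.476_91
agree = all(
    overlapping_pairs((j0 + 1) * s, (j1 + 1) * s, s, n)
    == predicted_pairs(j0 + 1, j1 + 1, n)
    for s in steps
)
print(agree)
# With a zero step every pair overlaps, which is why "nonzero" matters.
print(len(overlapping_pairs(0, 0, 0, n)) == n * n)
```

Note that the resulting distance, j0_119 - j1_124, is still symbolic unless the relation between j0_119 and j1_124 is known, so this only shows the division is valid, not that it yields a constant distance.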
