http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53346

Uros Bizjak <ubizjak at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2012-05-17
     Ever Confirmed|0                           |1

--- Comment #3 from Uros Bizjak <ubizjak at gmail dot com> 2012-05-17 18:29:12 UTC ---
Confirmed, -O2 vs. -O2 -ftree-vectorize on x86_64:

-O2 -ftree-vectorize:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total           
 time   seconds   seconds    calls   s/call   s/call  name    
 43.83      9.73     9.73       64     0.15     0.15  cptrf2_
 40.68     18.76     9.03     6685     0.00     0.00  trs2a2.2054
  7.70     20.47     1.71       64     0.03     0.03  gentrs_
  1.49     20.80     0.33       64     0.01     0.01  cptrf1_
  1.40     21.11     0.31        1     0.31    12.33  matsim_
  1.40     21.42     0.31     6685     0.00     0.00  invima.2045
  1.13     21.67     0.25       64     0.00     0.00  cmpcpt_

-O2:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total           
 time   seconds   seconds    calls   s/call   s/call  name    
 55.20      9.20     9.20     6685     0.00     0.00  trs2a2.2054
 23.40     13.10     3.90       64     0.06     0.06  cptrf2_
 10.38     14.83     1.73       64     0.03     0.03  gentrs_
  2.58     15.26     0.43       64     0.01     0.01  cptrf1_
  2.34     15.65     0.39     6685     0.00     0.00  invima.2045
  1.98     15.98     0.33        1     0.33     6.58  matsim_
  1.14     16.17     0.19       64     0.00     0.00  cmpcpt_

The self time of cptrf2_ increased by almost 6 seconds (from 3.90 s to 9.73 s)!

The only vectorization is in:

3530: LOOP VECTORIZED.
rnflow.f90:3510: note: vectorized 1 loops in function.

Which corresponds to:

! ______________________________________________________________________
      real, dimension (1:nxtr), intent (in)     :: xxtrt ! extrema
      integer, intent (in)                      :: nxtr  ! their count
      integer, dimension (1:nxtr), intent (out) :: ixtrt ! indices
      integer, intent (out)                     :: kerr  ! error code
! ______________________________________________________________________
!
      kerr = 0
      ixtrt = 0                  <<<<<<<<<<<<<< HERE

All this vectorized loop does is zero a block of memory:

    pxor    %xmm0, %xmm0
    leaq    (%rdx,%r8,4), %r8
    xorl    %esi, %esi
    .p2align 4,,10
    .p2align 3
.L183:
    addq    $1, %rsi
    movdqa    %xmm0, (%r8)
    addq    $16, %r8
    cmpq    %rsi, %r11
    ja    .L183
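
For readers less used to AT&T syntax, here is a rough C rendering of the loop
above. It is only a sketch: the function name and the parameters base, offset
and nvec are my own labels for what appears to live in %rdx, %r8 and %r11, not
anything taken from rnflow.f90 or from the compiler dumps, and the four scalar
stores stand in for the single 16-byte movdqa.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper mirroring the .L183 loop; all names are illustrative.
   base   ~ %rdx : start of the integer array (ixtrt)
   offset ~ %r8  : elements skipped before the first 16-byte store (assumed)
   nvec   ~ %r11 : number of 16-byte stores to perform (assumed) */
void zero_vector_body(int32_t *base, size_t offset, size_t nvec)
{
    /* The assembly enters the loop unconditionally, so at least one
       store always happens; the caller has to guarantee nvec >= 1. */
    assert(nvec >= 1);

    int32_t *p = base + offset;          /* leaq (%rdx,%r8,4), %r8      */
    size_t i = 0;                        /* xorl %esi, %esi             */
    do {
        i += 1;                          /* addq $1, %rsi               */
        p[0] = p[1] = p[2] = p[3] = 0;   /* movdqa %xmm0, (%r8)         */
        p += 4;                          /* addq $16, %r8               */
    } while (nvec > i);                  /* cmpq %rsi, %r11 ; ja .L183  */
}
```

Each iteration writes 16 bytes, so the loop touches nvec * 16 bytes in total
and does nothing else.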

And this causes a 6-second difference?!
