https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118125

Martin Jambor <jamborm at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
     Ever confirmed|0                           |1
   Last reconfirmed|                            |2025-01-10
             Status|UNCONFIRMED                 |NEW

--- Comment #1 from Martin Jambor <jamborm at gcc dot gnu.org> ---
I can reproduce the issue on a Zen3 machine (with -flto -Ofast
-march=native): the run time grows from 193s to 225s (i.e. those 16%)
and goes back down if I disable the condition introduced by the commit
causing this.

The perf profile of the fast version (functions taking more than 0.6%
of run time) is:

# Samples: 824K of event 'cycles:Pu'
# Event count (approx.): 748211409931
#
# Overhead       Samples  Shared Object        Symbol
# ........  ............  ...................  ......
#
    31.15%        255964  parest_r_peak.mine   [.]
_ZNK12METomography5Slave8internal16SparseDirectSPEC5solveIdEEvRN6dealii6VectorIT_EE
    29.78%        244660  parest_r_peak.mine   [.]
_ZNK6dealii9SparseILUIdE5vmultIdEEvRNS_6VectorIT_EERKS5_
     6.02%         49434  parest_r_peak.mine   [.]
_ZNK6dealii12SparseMatrixIdE17precondition_SSORIdEEvRNS_6VectorIT_EERKS5_dRKSt6vectorIjSaIjEE
     4.90%         40240  parest_r_peak.mine   [.]
_ZNK6dealii12SparseMatrixIdE5vmultINS_6VectorIdEES4_EEvRT_RKT0_
     3.36%         27587  parest_r_peak.mine   [.]
_ZN6dealii8FESystemILi3ELi3EE10initializeEv.constprop.0
     2.52%         20717  parest_r_peak.mine   [.]
_ZNK6dealii15SparsityPatternclEjj
     2.38%         19500  parest_r_peak.mine   [.]
_ZN12METomography5Slave5SlaveILi3EE12GlobalMatrix15assemble_matrixERKN6dealii18TriaActiveIteratorINS4_15DoFCellAccessorINS4_10DoFHandlerILi3ELi3EEEEEEERNS0_8internal13AssemblerDataILi3EEE
     1.20%          9875  libc.so.6            [.] __memset_avx2_unaligned_erms
     1.01%          8294  parest_r_peak.mine   [.]
_ZNK6dealii16ConstraintMatrix8condenseIdNS_11BlockVectorIdEEEEvRNS_17BlockSparseMatrixIT_EERT0_
     0.74%          6146  parest_r_peak.mine   [.]
_ZNSt8_Rb_treeIjjSt9_IdentityIjESt4lessIjESaIjEE16_M_insert_uniqueERKj
     0.62%          5077  parest_r_peak.mine   [.]
_ZN12METomography13ForwardSolver35block_build_matrix_and_rhs_threadedILi3EEEvRKN6dealii10DoFHandlerIXT_EXT_EEESt4pairIjjES7_IPKNS2_10QuadratureIXT_EEEPKNS9_IXmiT_Li1EEEEERNS2_17BlockSparseMatrixIdEERNS2_11BlockVectorIdEERNS2_7Threads16DummyThreadMutexERKSt7complexIdERKS7_IPKNS2_8FunctionIXT_EEESX_ERSW_
     0.61%          5060  libstdc++.so.6.0.34  [.]
_ZSt18_Rb_tree_incrementPSt18_Rb_tree_node_base
     0.60%          4966  libc.so.6            [.] _int_malloc


whereas profiling the slow version gives:

# Samples: 951K of event 'cycles:Pu'
# Event count (approx.): 864364832562
#
# Overhead       Samples  Shared Object       Symbol
# ........  ............  ..................  ......
#
    40.00%        379558  parest_r_peak.mine  [.]
_ZNK12METomography5Slave8internal16SparseDirectSPEC5solveIdEEvRN6dealii6VectorIT_EE
    26.09%        247593  parest_r_peak.mine  [.]
_ZNK6dealii9SparseILUIdE5vmultIdEEvRNS_6VectorIT_EERKS5_
     5.19%         49279  parest_r_peak.mine  [.]
_ZNK6dealii12SparseMatrixIdE17precondition_SSORIdEEvRNS_6VectorIT_EERKS5_dRKSt6vectorIjSaIjEE
     4.25%         40281  parest_r_peak.mine  [.]
_ZNK6dealii12SparseMatrixIdE5vmultINS_6VectorIdEES4_EEvRT_RKT0_
     2.91%         27561  parest_r_peak.mine  [.]
_ZN6dealii8FESystemILi3ELi3EE10initializeEv.constprop.0
     2.18%         20658  parest_r_peak.mine  [.]
_ZNK6dealii15SparsityPatternclEjj
     2.03%         19244  parest_r_peak.mine  [.]
_ZN12METomography5Slave5SlaveILi3EE12GlobalMatrix15assemble_matrixERKN6dealii18TriaActiveIteratorINS4_15DoFCellAccessorINS4_10DoFHandlerILi3ELi3EEEEEEERNS0_8internal13AssemblerDataILi3EEE
     1.06%         10100  libc.so.6           [.] __memset_avx2_unaligned_erms
     0.89%          8375  parest_r_peak.mine  [.]
_ZNK6dealii16ConstraintMatrix8condenseIdNS_11BlockVectorIdEEEEvRNS_17BlockSparseMatrixIT_EERT0_
     0.64%          6125  parest_r_peak.mine  [.]
_ZNSt8_Rb_treeIjjSt9_IdentityIjESt4lessIjESaIjEE16_M_insert_uniqueERKj


The interesting thing is that, I believe, the change introduced in
r15-6110-g92e0e0f8177530 can only affect inliner decisions (both the
heuristics and things like redirecting various calls to
__builtin_unreachable, which apparently happens more often now), yet
the inlining decisions for the hottest function, the one with the big
sample increase, are the same in both cases.
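
To make the "big sample increase" concrete, the per-function sample
counts from the two reports can be subtracted. A minimal Python sketch
follows; the counts are copied from the profiles above, while the short
labels and the helper name sample_deltas are only illustrative
abbreviations, not real symbol names or part of any tool.

```python
# Sample counts copied from the two perf reports above, as (fast, slow).
# The keys are informal short labels for the mangled symbols, chosen
# here for readability only.
SAMPLES = {
    "SparseDirectSPEC::solve": (255964, 379558),
    "SparseILU::vmult": (244660, 247593),
    "SparseMatrix::precondition_SSOR": (49434, 49279),
    "SparseMatrix::vmult": (40240, 40281),
    "FESystem::initialize": (27587, 27561),
    "SparsityPattern::operator()": (20717, 20658),
    "GlobalMatrix::assemble_matrix": (19500, 19244),
    "__memset_avx2_unaligned_erms": (9875, 10100),
    "ConstraintMatrix::condense": (8294, 8375),
    "_Rb_tree::_M_insert_unique": (6146, 6125),
}


def sample_deltas(samples):
    """Return (label, slow - fast) pairs, largest increase first."""
    deltas = [(name, slow - fast) for name, (fast, slow) in samples.items()]
    return sorted(deltas, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for name, delta in sample_deltas(SAMPLES):
        print(f"{delta:+8d}  {name}")
```

With these numbers, the solve function gains about 123.6K samples while
every other listed function changes by fewer than 3K, which roughly
matches the growth in total samples from 824K to 951K.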

I'll try to pin down where exactly the value range propagation leads
to a slowdown.
