http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59659

Markus Trippelsdorf <trippels at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |trippels at gcc dot gnu.org

--- Comment #5 from Markus Trippelsdorf <trippels at gcc dot gnu.org> ---
(In reply to Jakub Jelinek from comment #3)
> I think this is somewhat related to e.g.:
> struct A { A (); A (int); ~A (); };
> void bar (A *);
> 
> #define T10(N) N, N, N, N, N, N, N, N, N, N
> #define T100(N) T10(N), T10(N), T10(N), T10(N), T10(N), \
> T10(N), T10(N), T10(N), T10(N), T10(N)
> #define T1000(N) T100(N), T100(N), T100(N), T100(N), T100(N), \
>  T100(N), T100(N), T100(N), T100(N), T100(N)
> #define T10000(N) T1000(N), T1000(N), T1000(N), T1000(N), T1000(N), \
>   T1000(N), T1000(N), T1000(N), T1000(N), T1000(N)
> #define T100000(N) T10000(N), T10000(N), T10000(N), T10000(N), T10000(N), \
>    T10000(N), T10000(N), T10000(N), T10000(N), T10000(N)
> 
> void
> foo ()
> {
>   A a[] = { 1, 2, T1000 (3), T10000 (4), T1000 (3), 2, 1 };
>   bar (a);
> }
> 
> also takes a long time to compile and generates an enormous amount of code
> (and when T10000 (4) is replaced with T100000 (4) it is much worse).

I don't think they are directly related.
For your testcase, clang is even slower than gcc (1:51 min vs. 21.8 sec).
perf report shows:
  6.92%  cc1plus  cc1plus   [.] get_value_for_expr(tree_node*, bool)
  4.98%  cc1plus  cc1plus   [.] canonicalize_value(prop_value_d*)
  3.56%  cc1plus  cc1plus   [.] mark_used_flags(rtx_def*, int)
  3.01%  cc1plus  cc1plus   [.] ccp_visit_phi_node(gimple_statement_base*)
  2.97%  cc1plus  cc1plus   [.] redirect_eh_edge_1(edge_def*, basic_block_def*, bool)
  2.87%  cc1plus  cc1plus   [.] remove_eh_landing_pad(eh_landing_pad_d*)
  2.86%  cc1plus  cc1plus   [.] vrp_meet(value_range_d*, value_range_d*)


On the testcase from Mark (with array size 10000) clang is much 
faster (0.7sec vs. 15.7sec) and perf report shows that the multiplication in
get_ref_base_and_extent is mostly responsible:
  9.95%  cc1plus  cc1plus  [.] get_ref_base_and_extent(tree_node*, long*, long*, long*)
  7.94%  cc1plus  cc1plus  [.] mul_double_wide_with_sign(unsigned long, long, unsigned long, long, unsigned long*, long*, unsigned long*, long*, bool)
  7.40%  cc1plus  cc1plus  [.] hash_table<vn_reference_hasher, xcallocator>::find_slot_with_hash(vn_reference_s const*, unsigned int, insert_option)
  4.22%  cc1plus  cc1plus  [.] component_ref_field_offset(tree_node*)
  4.18%  cc1plus  cc1plus  [.] record_store(rtx_def*, bb_info*)
  3.81%  cc1plus  cc1plus  [.] hash_table_mod2(unsigned int, unsigned int)
  3.59%  cc1plus  cc1plus  [.] array_ref_low_bound(tree_node*)
