date:20210911

[PATCH] testsuite: Fix c-c++-common/auto-init-* tests

2021-09-11 Thread Jakub Jelinek via Gcc-patches

On Fri, Sep 03, 2021 at 02:47:11PM +, Qing Zhao via Gcc-patches wrote:
> > 2021-08-20  qing zhao  
> > 
> >* c-c++-common/auto-init-1.c: New test.
> >* c-c++-common/auto-init-10.c: New test.
> >* c-c++-common/auto-init-11.c: New test.
> >* c-c++-common/auto-init-12.c: New test.
> >* c-c++-common/auto-init-13.c: New test.
> >* c-c++-common/auto-init-14.c: New test.
> >* c-c++-common/auto-init-15.c: New test.
> >* c-c++-common/auto-init-16.c: New test.
> >* c-c++-common/auto-init-2.c: New test.
> >* c-c++-common/auto-init-3.c: New test.
> >* c-c++-common/auto-init-4.c: New test.
> >* c-c++-common/auto-init-5.c: New test.
> >* c-c++-common/auto-init-6.c: New test.
> >* c-c++-common/auto-init-7.c: New test.
> >* c-c++-common/auto-init-8.c: New test.
> >* c-c++-common/auto-init-9.c: New test.
> >* c-c++-common/auto-init-esra.c: New test.
> >* c-c++-common/auto-init-padding-1.c: New test.
> >* c-c++-common/auto-init-padding-2.c: New test.
> >* c-c++-common/auto-init-padding-3.c: New test.

This fails on many targets, e.g. i686-linux or x86_64-linux with -m32.

The main problem is hardcoding type sizes and structure layout expectations
that are valid only on some lp64 targets.
On ilp32 long and pointer are 32-bit, and there are targets that are neither
ilp32 nor lp64 and there even other sizes can't be taken for granted.
Also, long double depending on target and options is either 8, 12 or 16 byte
(the first one when it is the same as double, the second e.g. for ia32
extended long double (which is under the hood 10 byte), the last either
the same hw type on x86_64 or IBM double double or IEEE quad).
In the last test, one problem is that unsigned long is on ilp32 32-bit
instead of 64-bit, but even just changing to long long is not enough,
as long long in structures on ia32 is only 4 byte aligned instead of 8.

Tested on x86_64-linux -m32/-m64, ok for trunk?

Note, the gcc.dg/i386/auto-init* tests fail also, just don't have time to
deal with that right now, just try
make check-gcc RUNTESTFLAGS='--target_board=unix\{-m32,-m64\} 
i386.exp=auto-init*'
Guess some of those tests should be restricted to lp64 in there, others
where it might be easier to check all of lp64, x32 and ia32 code generation
could have different matches.  Wonder also about the aarch64 tests, there is
also -mabi=ilp32...
+FAIL: gcc.target/i386/auto-init-2.c scan-rtl-dump-times expand 
"0xfefefefefefefefe" 3
+FAIL: gcc.target/i386/auto-init-2.c scan-rtl-dump-times expand 
"0xfefefefe" 2
+FAIL: gcc.target/i386/auto-init-3.c scan-assembler-times pxor\\t%xmm0, 
%xmm0 3
+FAIL: gcc.target/i386/auto-init-4.c scan-rtl-dump-times expand 
"0xfefefefe" 1
+FAIL: gcc.target/i386/auto-init-4.c scan-rtl-dump-times expand 
"0xfffe]) repeated x16" 1
+FAIL: gcc.target/i386/auto-init-4.c scan-rtl-dump-times expand 
"[0xfefefefefefefefe]" 1
+FAIL: gcc.target/i386/auto-init-5.c scan-assembler-times .long\\t0 14
+FAIL: gcc.target/i386/auto-init-6.c scan-rtl-dump-times expand 
"0xfffe]) repeated x16" 2
+FAIL: gcc.target/i386/auto-init-6.c scan-rtl-dump-times expand 
"[0xfefefefefefefefe]" 1
+FAIL: gcc.target/i386/auto-init-7.c scan-rtl-dump-times expand "const_int 0 
[0]) repeated x16" 2
+FAIL: gcc.target/i386/auto-init-7.c scan-rtl-dump-times expand "const_int 0 
[0]))" 3
+FAIL: gcc.target/i386/auto-init-8.c scan-rtl-dump-times expand 
"0xfefefefe" 1
+FAIL: gcc.target/i386/auto-init-8.c scan-rtl-dump-times expand 
"0xfffe]) repeated x16" 2
+FAIL: gcc.target/i386/auto-init-8.c scan-rtl-dump-times expand 
"[0xfefefefefefefefe]" 2
+FAIL: gcc.target/i386/auto-init-padding-1.c scan-rtl-dump-times expand 
"const_int 0 [0]) repeated x16" 1
+FAIL: gcc.target/i386/auto-init-padding-10.c scan-rtl-dump-times expand 
"0xfffe]) repeated x16" 1
+FAIL: gcc.target/i386/auto-init-padding-11.c scan-rtl-dump-times expand 
"const_int 0 [0]) repeated x16" 1
+FAIL: gcc.target/i386/auto-init-padding-12.c scan-rtl-dump-times expand 
"0xfffe]) repeated x16" 1
+FAIL: gcc.target/i386/auto-init-padding-2.c scan-rtl-dump-times expand 
"0xfffe]) repeated x16" 1
+FAIL: gcc.target/i386/auto-init-padding-3.c scan-assembler movl\\t\$16,
+FAIL: gcc.target/i386/auto-init-padding-3.c scan-assembler rep stosq
+FAIL: gcc.target/i386/auto-init-padding-4.c scan-rtl-dump-times expand 
"0xfffe]) repeated x16" 1
+FAIL: gcc.target/i386/auto-init-padding-5.c scan-rtl-dump-times expand 
"const_int 0 [0]) repeated x16" 1
+FAIL: gcc.target/i386/auto-init-padding-6.c scan-rtl-dump-times expand 
"0xfffe]) repeated x16" 1
+FAIL: gcc.target/i386/auto-init-padding-7.c scan-assemb

Re: [PATCH 2/2] validate_subreg before call gen_lowpart to avoid ICE.

2021-09-11 Thread Richard Biener via Gcc-patches

On September 10, 2021 11:27:16 PM GMT+02:00, Segher Boessenkool 
 wrote:
>On Fri, Sep 10, 2021 at 08:36:12PM +0200, Richard Biener wrote:
>> On September 10, 2021 6:24:50 PM GMT+02:00, Segher Boessenkool 
>>  wrote:
>> >Yes, we should not call TRULY_NOOP_TRUNCATION_MODES_P for any random two
>> >modes: such a truncation needs to have a meaning at all, for the
>> >question to make any sense.  Maybe we can add an assert to this macro to
>> >root out nonsensical callers?
>> >
>> >Btw.  We have
>> >#define TRULY_NOOP_TRUNCATION_MODES_P(MODE1, MODE2) \
>> >  (targetm.truly_noop_truncation (GET_MODE_PRECISION (MODE1), \
>> >  GET_MODE_PRECISION (MODE2)))
>> >which is not optimal, either: does truncating DFmode to HFmode behave
>> >the same as truncating DImode to HImode, on every target?  On *any*
>> >target, even?!
>> 
>> When is it for any non-scalar integral mode? I suspect this was only 
>> meaningful for integer modes from the start. On x86 i387 math any float mode 
>> truncation is noop (with not doing actual truncation to inf). 
>
>And trunc on floating point modes was added later?  Yeah, sounds like a
>good theory.
>
>So we should have an assertion in TNTMP that both modes are integral?
>(Scalar of course).

No, but in the context we are talking about (bitfield extraction) obviously 
integer truncate is referred to, a FP truncate doesn't make sense here. So 
either this piece needs to ask for integer modes and then also pun to them or 
restrict the operation to integer modes. For source int but destination FP 
there might be other ways to validate whether the target can do this, but TNTMP 
is not it. 

Richard. 

>
>Segher

[PATCH] Refactor jump_thread_path_registry.

2021-09-11 Thread Aldy Hernandez via Gcc-patches

In an attempt to refactor thread_through_all_blocks(), I've realized
that there is a mess of code dealing with coexisting forward and
backward thread types.  However, this is an impossible scenario, as
the registry contains either forward/old-style threads, or backward
threads (EDGE_FSM_THREADs), never both.

The fact that both types of threads cannot coexist, simplifies the
code considerably.  For that matter, it splits things up nicely
because there are some common bits that can go into a base class, and
some differing code that can go into derived classes.

Diving things in this way makes it very obvious which parts belong in
the old-style copier and which parts belong to the generic copier.
Doing all this provided some nice cleanups, as well as fixing a latent
bug in adjust_paths_after_duplication.

The diff is somewhat hard to read, so perhaps looking at the final
output would be easier.

A general overview of what this patch achieves can be seen by just
looking at this simplified class layout:

// Abstract class for the jump thread registry.

class jt_path_registry
{
public:
  jt_path_registry ();
  virtual ~jt_path_registry ();
  bool register_jump_thread (vec *);
  bool thread_through_all_blocks (bool peel_loop_headers);
  jump_thread_edge *allocate_thread_edge (edge e, jump_thread_edge_type t);
  vec *allocate_thread_path ();
protected:
  vec *> m_paths;
  unsigned long m_num_threaded_edges;
private:
  virtual bool update_cfg (bool peel_loop_headers) = 0;
};

// Forward threader path registry using a custom BB copier.

class fwd_jt_path_registry : public jt_path_registry
{
public:
  fwd_jt_path_registry ();
  ~fwd_jt_path_registry ();
  void remove_jump_threads_including (edge);
private:
  bool update_cfg (bool peel_loop_headers) override;
  void mark_threaded_blocks (bitmap threaded_blocks);
  bool thread_block_1 (basic_block, bool noloop_only, bool joiners);
  bool thread_block (basic_block, bool noloop_only);
  bool thread_through_loop_header (class loop *loop,
   bool may_peel_loop_headers);
  class redirection_data *lookup_redirection_data (edge e, enum insert_option);
  hash_table *m_removed_edges;
  hash_table *m_redirection_data;
};

// Backward threader path registry using a generic BB copier.

class back_jt_path_registry : public jt_path_registry
{
private:
  bool update_cfg (bool peel_loop_headers) override;
  void adjust_paths_after_duplication (unsigned curr_path_num);
  bool duplicate_thread_path (edge entry, edge exit, basic_block *region,
  unsigned n_region, unsigned current_path_no);
  bool rewire_first_differing_edge (unsigned path_num, unsigned edge_num);
};

That is, the forward and backward bits have been completely split,
while deriving from a base class for the common functionality.

Most everything is mechanical, but there are a few gotchas:

a) back_jt_path_registry::update_cfg(), which contains the backward
threading specific bits, is rather simple, since most of the code in
the original thread_through_all_blocks() only applied to the forward
threader: removed edges, mark_threaded_blocks,
thread_through_loop_header, the copy tables (*).

(*) The back threader has its own copy tables in
duplicate_thread_path.

b) In some cases, adjust_paths_after_duplication() was commoning out
so many blocks that it was removing the initial EDGE_FSM_THREAD
marker.  I've fixed this.

c) AFAICT, when run from the forward threader,
thread_through_all_blocks() attempts to remove threads starting with
an edge already seen, but it would never see anything because the loop
doing the checking only has a visited_starting_edges.contains(), and
no corresponding visited_starting_edges.add().  The add() method in
thread_through_all_blocks belongs to the backward threading bits, and
as I've explained, both types cannot coexist.  I've removed the checks
in the forward bits since they don't appear to do anything.  If this
was an oversight, and we want to avoid threading already seen edges in
the forward threader, I can move this functionality to the base class.

Ultimately I would like to move all the registry code to
tree-ssa-threadregistry.*.  I've avoided this in this patch to aid in
review.

My apologies for this longass explanation, but I want to make sure
we're covering all of our bases.

Tested on x86-64 Linux by a very tedious process of moving chunks
around, running "make check-gcc RUNTESTFLAGS=tree-ssa.exp", and
repeating ad-nauseum.  And of course, by running a full bootstrap and
tests.

OK?

p.s. In a follow-up patch I will rename the confusing EDGE_FSM_THREAD
type.

gcc/ChangeLog:

* tree-ssa-threadbackward.c (class back_threader_registry): Use
back_jt_path_registry.
* tree-ssa-threadedge.c (jump_threader::jump_threader): Use
fwd_jt_path_registry.
* tree-ssa-threadedge.h (class jump_threader): Same..
* tree-ssa-threadupdate.c
(jump_thread_path_registry::jump_thread_path_regist

[PATCH] Also preserve SUBREG_PROMOTED_VAR_P in expr.c's convert_move.

2021-09-11 Thread Roger Sayle


This patch catches another place in the middle-end where it's possible
to preserve the SUBREG_PROMOTED_VAR_P annotation on a subreg to the
benefit of later RTL optimizations.  This adds the same logic to
expr.c's convert_move as recently added to convert_modes.

On nvptx-none, the simple test program:

short foo (char c) { return c; }

currently generates three instructions:

mov.u32 %r23, %ar0;
cvt.u16.u32 %r24, %r23;
cvt.s32.s16 %value, %r24;

with this patch, we now generate just one:

mov.u32 %value, %ar0;


This patch should look familiar, it's almost identical to the recent patch
https://gcc.gnu.org/pipermail/gcc-patches/2021-August/578331.html but with
the fix https://gcc.gnu.org/pipermail/gcc-patches/2021-August/578519.html
[Apologies again for this breakage on affected (non-x86_64) targets; I hope
having that fixed within a few hours (before many folks even noticed a
problem) minimized the inconvenience].

This patch has been tested on x86_64-pc-linux-gnu with "make bootstrap"
and "make -k check" with no new failures, and on a cross-compiler to
nvptx-none, with no new failures in its testsuite.  OK for mainline?


2021-09-11  Roger Sayle  

gcc/ChangeLog
* expr.c (convert_move): Preserve SUBREG_PROMOTED_VAR_P when
creating a (wider) partial subreg from a SUBREG_PROMOTED_VAR_P
subreg.

Roger
--

diff --git a/gcc/expr.c b/gcc/expr.c
index 17f2c2f..e0bcbcc 100644
--- a/gcc/expr.c
+++ b/gcc/expr.c
@@ -236,8 +236,27 @@ convert_move (rtx to, rtx from, int unsignedp)
  >= GET_MODE_PRECISION (to_int_mode))
   && SUBREG_CHECK_PROMOTED_SIGN (from, unsignedp))
 {
+  scalar_int_mode int_orig_mode;
+  scalar_int_mode int_inner_mode;
+  machine_mode orig_mode = GET_MODE (from);
+
   from = gen_lowpart (to_int_mode, SUBREG_REG (from));
   from_mode = to_int_mode;
+
+  /* Preserve SUBREG_PROMOTED_VAR_P if the new mode is wider than
+the original mode, but narrower than the inner mode.  */
+  if (GET_CODE (from) == SUBREG
+ && is_a  (orig_mode, &int_orig_mode)
+ && GET_MODE_PRECISION (to_int_mode)
+> GET_MODE_PRECISION (int_orig_mode)
+ && is_a  (GET_MODE (SUBREG_REG (from)),
+&int_inner_mode)
+ && GET_MODE_PRECISION (int_inner_mode)
+> GET_MODE_PRECISION (to_int_mode))
+   {
+ SUBREG_PROMOTED_VAR_P (from) = 1;
+ SUBREG_PROMOTED_SET (from, unsignedp);
+   }
 }
 
   gcc_assert (GET_CODE (to) != SUBREG || !SUBREG_PROMOTED_VAR_P (to));

Re: [PATCH 2/2] validate_subreg before call gen_lowpart to avoid ICE.

2021-09-11 Thread Hongtao Liu via Gcc-patches

On Sat, Sep 11, 2021 at 4:25 PM Richard Biener via Gcc-patches
 wrote:
>
> On September 10, 2021 11:27:16 PM GMT+02:00, Segher Boessenkool 
>  wrote:
> >On Fri, Sep 10, 2021 at 08:36:12PM +0200, Richard Biener wrote:
> >> On September 10, 2021 6:24:50 PM GMT+02:00, Segher Boessenkool 
> >>  wrote:
> >> >Yes, we should not call TRULY_NOOP_TRUNCATION_MODES_P for any random two
> >> >modes: such a truncation needs to have a meaning at all, for the
> >> >question to make any sense.  Maybe we can add an assert to this macro to
> >> >root out nonsensical callers?
> >> >
> >> >Btw.  We have
> >> >#define TRULY_NOOP_TRUNCATION_MODES_P(MODE1, MODE2) \
> >> >  (targetm.truly_noop_truncation (GET_MODE_PRECISION (MODE1), \
> >> >  GET_MODE_PRECISION (MODE2)))
> >> >which is not optimal, either: does truncating DFmode to HFmode behave
> >> >the same as truncating DImode to HImode, on every target?  On *any*
> >> >target, even?!
> >>
> >> When is it for any non-scalar integral mode? I suspect this was only 
> >> meaningful for integer modes from the start. On x86 i387 math any float 
> >> mode truncation is noop (with not doing actual truncation to inf).
> >
> >And trunc on floating point modes was added later?  Yeah, sounds like a
> >good theory.
> >
> >So we should have an assertion in TNTMP that both modes are integral?
> >(Scalar of course).
>
> No, but in the context we are talking about (bitfield extraction) obviously 
> integer truncate is referred to, a FP truncate doesn't make sense here. So 
> either this piece needs to ask for integer modes and then also pun to them or 
> restrict the operation to integer modes. For source int but destination FP 
> there might be other ways to validate whether the target can do this, but 
> TNTMP is not it.
There is no contradiction between integer truncate and target is
FLOAT_MODE_P, extv only cares about the bits extraction, look at code
below, there's convert_extracted_bit_field which is supposed to handle
non scalar integral mode.
The key here is the paradoxical subreg (subreg:ext_mode (target)) is
not necessarily legal, and it is problematic to call gen_lowpart
directly without guaranteeing this, I still prefer to add a
validate_subreg(extmode, tmode) condition.
>
> Richard.
>
> >
> >Segher
>


-- 
BR,
Hongtao

Re: [PATCH 2/2] validate_subreg before call gen_lowpart to avoid ICE.

2021-09-11 Thread Hongtao Liu via Gcc-patches

On Sat, Sep 11, 2021 at 5:51 PM Hongtao Liu  wrote:
>
> On Sat, Sep 11, 2021 at 4:25 PM Richard Biener via Gcc-patches
>  wrote:
> >
> > On September 10, 2021 11:27:16 PM GMT+02:00, Segher Boessenkool 
> >  wrote:
> > >On Fri, Sep 10, 2021 at 08:36:12PM +0200, Richard Biener wrote:
> > >> On September 10, 2021 6:24:50 PM GMT+02:00, Segher Boessenkool 
> > >>  wrote:
> > >> >Yes, we should not call TRULY_NOOP_TRUNCATION_MODES_P for any random two
> > >> >modes: such a truncation needs to have a meaning at all, for the
> > >> >question to make any sense.  Maybe we can add an assert to this macro to
> > >> >root out nonsensical callers?
> > >> >
> > >> >Btw.  We have
> > >> >#define TRULY_NOOP_TRUNCATION_MODES_P(MODE1, MODE2) \
> > >> >  (targetm.truly_noop_truncation (GET_MODE_PRECISION (MODE1), \
> > >> >  GET_MODE_PRECISION (MODE2)))
> > >> >which is not optimal, either: does truncating DFmode to HFmode behave
> > >> >the same as truncating DImode to HImode, on every target?  On *any*
> > >> >target, even?!
> > >>
> > >> When is it for any non-scalar integral mode? I suspect this was only 
> > >> meaningful for integer modes from the start. On x86 i387 math any float 
> > >> mode truncation is noop (with not doing actual truncation to inf).
> > >
> > >And trunc on floating point modes was added later?  Yeah, sounds like a
> > >good theory.
> > >
> > >So we should have an assertion in TNTMP that both modes are integral?
> > >(Scalar of course).
> >
> > No, but in the context we are talking about (bitfield extraction) obviously 
> > integer truncate is referred to, a FP truncate doesn't make sense here. So 
> > either this piece needs to ask for integer modes and then also pun to them 
> > or restrict the operation to integer modes. For source int but destination 
> > FP there might be other ways to validate whether the target can do this, 
> > but TNTMP is not it.
> There is no contradiction between integer truncate and target is
> FLOAT_MODE_P, extv only cares about the bits extraction, look at code
> below, there's convert_extracted_bit_field which is supposed to handle
> non scalar integral mode.
> The key here is the paradoxical subreg (subreg:ext_mode (target)) is
> not necessarily legal, and it is problematic to call gen_lowpart
> directly without guaranteeing this, I still prefer to add a
> validate_subreg(extmode, tmode) condition.
It seems aarch64 use bitfield extraction for

_Float16
foo (int a)
{
  union {
int a;
_Float16 b;
  }c;
  c.a = a;
  return c.b;
}

(insn 2 4 3 2 (set (reg/v:SI 93 [ a ])
(reg:SI 0 x0 [ a ]))
"../gcc/intel-innersource/master/gcc/testsuite/gcc.target/i386/float16-5.c":5:1
52 {*movsi_aarch64}
 (nil))
(note 3 2 6 2 NOTE_INSN_FUNCTION_BEG)
(insn 6 3 7 2 (set (subreg:DI (reg:HF 94) 0)
(zero_extract:DI (subreg:DI (reg/v:SI 93 [ a ]) 0)
(const_int 16 [0x10])
(const_int 0 [0])))
"../gcc/intel-innersource/master/gcc/testsuite/gcc.target/i386/float16-5.c":11:11
742 {*extzvdi}
 (nil))
(insn 7 6 11 2 (set (reg:HF 92 [  ])
(reg:HF 94))
"../gcc/intel-innersource/master/gcc/testsuite/gcc.target/i386/float16-5.c":11:11
59 {*movhf_aarch64}
 (nil))

> >
> > Richard.
> >
> > >
> > >Segher
> >
>
>
> --
> BR,
> Hongtao



-- 
BR,
Hongtao

Re: [PATCH] testsuite: Fix c-c++-common/auto-init-* tests

2021-09-11 Thread Richard Biener via Gcc-patches

On September 11, 2021 10:03:20 AM GMT+02:00, Jakub Jelinek  
wrote:
>On Fri, Sep 03, 2021 at 02:47:11PM +, Qing Zhao via Gcc-patches wrote:
>> > 2021-08-20  qing zhao  
>> > 
>> >* c-c++-common/auto-init-1.c: New test.
>> >* c-c++-common/auto-init-10.c: New test.
>> >* c-c++-common/auto-init-11.c: New test.
>> >* c-c++-common/auto-init-12.c: New test.
>> >* c-c++-common/auto-init-13.c: New test.
>> >* c-c++-common/auto-init-14.c: New test.
>> >* c-c++-common/auto-init-15.c: New test.
>> >* c-c++-common/auto-init-16.c: New test.
>> >* c-c++-common/auto-init-2.c: New test.
>> >* c-c++-common/auto-init-3.c: New test.
>> >* c-c++-common/auto-init-4.c: New test.
>> >* c-c++-common/auto-init-5.c: New test.
>> >* c-c++-common/auto-init-6.c: New test.
>> >* c-c++-common/auto-init-7.c: New test.
>> >* c-c++-common/auto-init-8.c: New test.
>> >* c-c++-common/auto-init-9.c: New test.
>> >* c-c++-common/auto-init-esra.c: New test.
>> >* c-c++-common/auto-init-padding-1.c: New test.
>> >* c-c++-common/auto-init-padding-2.c: New test.
>> >* c-c++-common/auto-init-padding-3.c: New test.
>
>This fails on many targets, e.g. i686-linux or x86_64-linux with -m32.
>
>The main problem is hardcoding type sizes and structure layout expectations
>that are valid only on some lp64 targets.
>On ilp32 long and pointer are 32-bit, and there are targets that are neither
>ilp32 nor lp64 and there even other sizes can't be taken for granted.
>Also, long double depending on target and options is either 8, 12 or 16 byte
>(the first one when it is the same as double, the second e.g. for ia32
>extended long double (which is under the hood 10 byte), the last either
>the same hw type on x86_64 or IBM double double or IEEE quad).
>In the last test, one problem is that unsigned long is on ilp32 32-bit
>instead of 64-bit, but even just changing to long long is not enough,
>as long long in structures on ia32 is only 4 byte aligned instead of 8.
>
>Tested on x86_64-linux -m32/-m64, ok for trunk?

Ok. 

Thanks, 
Richard. 

>Note, the gcc.dg/i386/auto-init* tests fail also, just don't have time to
>deal with that right now, just try
>make check-gcc RUNTESTFLAGS='--target_board=unix\{-m32,-m64\} 
>i386.exp=auto-init*'
>Guess some of those tests should be restricted to lp64 in there, others
>where it might be easier to check all of lp64, x32 and ia32 code generation
>could have different matches.  Wonder also about the aarch64 tests, there is
>also -mabi=ilp32...
>+FAIL: gcc.target/i386/auto-init-2.c scan-rtl-dump-times expand 
>"0xfefefefefefefefe" 3
>+FAIL: gcc.target/i386/auto-init-2.c scan-rtl-dump-times expand 
>"0xfefefefe" 2
>+FAIL: gcc.target/i386/auto-init-3.c scan-assembler-times pxor\\t%xmm0, 
>%xmm0 3
>+FAIL: gcc.target/i386/auto-init-4.c scan-rtl-dump-times expand 
>"0xfefefefe" 1
>+FAIL: gcc.target/i386/auto-init-4.c scan-rtl-dump-times expand 
>"0xfffe]) repeated x16" 1
>+FAIL: gcc.target/i386/auto-init-4.c scan-rtl-dump-times expand 
>"[0xfefefefefefefefe]" 1
>+FAIL: gcc.target/i386/auto-init-5.c scan-assembler-times .long\\t0 14
>+FAIL: gcc.target/i386/auto-init-6.c scan-rtl-dump-times expand 
>"0xfffe]) repeated x16" 2
>+FAIL: gcc.target/i386/auto-init-6.c scan-rtl-dump-times expand 
>"[0xfefefefefefefefe]" 1
>+FAIL: gcc.target/i386/auto-init-7.c scan-rtl-dump-times expand "const_int 0 
>[0]) repeated x16" 2
>+FAIL: gcc.target/i386/auto-init-7.c scan-rtl-dump-times expand "const_int 0 
>[0]))" 3
>+FAIL: gcc.target/i386/auto-init-8.c scan-rtl-dump-times expand 
>"0xfefefefe" 1
>+FAIL: gcc.target/i386/auto-init-8.c scan-rtl-dump-times expand 
>"0xfffe]) repeated x16" 2
>+FAIL: gcc.target/i386/auto-init-8.c scan-rtl-dump-times expand 
>"[0xfefefefefefefefe]" 2
>+FAIL: gcc.target/i386/auto-init-padding-1.c scan-rtl-dump-times expand 
>"const_int 0 [0]) repeated x16" 1
>+FAIL: gcc.target/i386/auto-init-padding-10.c scan-rtl-dump-times expand 
>"0xfffe]) repeated x16" 1
>+FAIL: gcc.target/i386/auto-init-padding-11.c scan-rtl-dump-times expand 
>"const_int 0 [0]) repeated x16" 1
>+FAIL: gcc.target/i386/auto-init-padding-12.c scan-rtl-dump-times expand 
>"0xfffe]) repeated x16" 1
>+FAIL: gcc.target/i386/auto-init-padding-2.c scan-rtl-dump-times expand 
>"0xfffe]) repeated x16" 1
>+FAIL: gcc.target/i386/auto-init-padding-3.c scan-assembler movl\\t\$16,
>+FAIL: gcc.target/i386/auto-init-padding-3.c scan-assembler rep stosq
>+FAIL: gcc.target/i386/auto-init-padding-4.c scan-rtl-dump-times expand 
>"0xfffe]) repeated x16" 1
>+FAIL: gcc.target/i386/auto-init-padding-5.c scan-rtl-dump-times expand 
>"const_int 0 [0]) repeate

Re: [PATCH] Also preserve SUBREG_PROMOTED_VAR_P in expr.c's convert_move.

2021-09-11 Thread Jeff Law via Gcc-patches





On 9/11/2021 3:28 AM, Roger Sayle wrote:

This patch catches another place in the middle-end where it's possible
to preserve the SUBREG_PROMOTED_VAR_P annotation on a subreg to the
benefit of later RTL optimizations.  This adds the same logic to
expr.c's convert_move as recently added to convert_modes.

On nvptx-none, the simple test program:

short foo (char c) { return c; }

currently generates three instructions:

mov.u32 %r23, %ar0;
cvt.u16.u32 %r24, %r23;
cvt.s32.s16 %value, %r24;

with this patch, we now generate just one:

mov.u32 %value, %ar0;


This patch should look familiar, it's almost identical to the recent patch
https://gcc.gnu.org/pipermail/gcc-patches/2021-August/578331.html but with
the fix https://gcc.gnu.org/pipermail/gcc-patches/2021-August/578519.html
[Apologies again for this breakage on affected (non-x86_64) targets; I hope
having that fixed within a few hours (before many folks even noticed a
problem) minimized the inconvenience].

This patch has been tested on x86_64-pc-linux-gnu with "make bootstrap"
and "make -k check" with no new failures, and on a cross-compiler to
nvptx-none, with no new failures in its testsuite.  OK for mainline?


2021-09-11  Roger Sayle  

gcc/ChangeLog
* expr.c (convert_move): Preserve SUBREG_PROMOTED_VAR_P when
creating a (wider) partial subreg from a SUBREG_PROMOTED_VAR_P
subreg.

OK

Jeff

Re: [PATCH] Refactor jump_thread_path_registry.

2021-09-11 Thread Jeff Law via Gcc-patches





On 9/11/2021 2:35 AM, Aldy Hernandez wrote:

In an attempt to refactor thread_through_all_blocks(), I've realized
that there is a mess of code dealing with coexisting forward and
backward thread types.  However, this is an impossible scenario, as
the registry contains either forward/old-style threads, or backward
threads (EDGE_FSM_THREADs), never both.
Correct.  There was a time when they both could show up, but that was 
fixed years ago.




The fact that both types of threads cannot coexist, simplifies the
code considerably.  For that matter, it splits things up nicely
because there are some common bits that can go into a base class, and
some differing code that can go into derived classes.

Excellent.



Diving things in this way makes it very obvious which parts belong in
the old-style copier and which parts belong to the generic copier.
Doing all this provided some nice cleanups, as well as fixing a latent
bug in adjust_paths_after_duplication.
Yea, I expected this could clean up the logic a bit and make everything 
easier to follow.  Thanks for tackling it, it's been on my TODO for a 
long time.




The diff is somewhat hard to read, so perhaps looking at the final
output would be easier.

A general overview of what this patch achieves can be seen by just
looking at this simplified class layout:

// Abstract class for the jump thread registry.

class jt_path_registry
{
public:
   jt_path_registry ();
   virtual ~jt_path_registry ();
   bool register_jump_thread (vec *);
   bool thread_through_all_blocks (bool peel_loop_headers);
   jump_thread_edge *allocate_thread_edge (edge e, jump_thread_edge_type t);
   vec *allocate_thread_path ();
protected:
   vec *> m_paths;
   unsigned long m_num_threaded_edges;
private:
   virtual bool update_cfg (bool peel_loop_headers) = 0;
};

// Forward threader path registry using a custom BB copier.

class fwd_jt_path_registry : public jt_path_registry
{
public:
   fwd_jt_path_registry ();
   ~fwd_jt_path_registry ();
   void remove_jump_threads_including (edge);
private:
   bool update_cfg (bool peel_loop_headers) override;
   void mark_threaded_blocks (bitmap threaded_blocks);
   bool thread_block_1 (basic_block, bool noloop_only, bool joiners);
   bool thread_block (basic_block, bool noloop_only);
   bool thread_through_loop_header (class loop *loop,
bool may_peel_loop_headers);
   class redirection_data *lookup_redirection_data (edge e, enum insert_option);
   hash_table *m_removed_edges;
   hash_table *m_redirection_data;
};

// Backward threader path registry using a generic BB copier.

class back_jt_path_registry : public jt_path_registry
{
private:
   bool update_cfg (bool peel_loop_headers) override;
   void adjust_paths_after_duplication (unsigned curr_path_num);
   bool duplicate_thread_path (edge entry, edge exit, basic_block *region,
   unsigned n_region, unsigned current_path_no);
   bool rewire_first_differing_edge (unsigned path_num, unsigned edge_num);
};

That is, the forward and backward bits have been completely split,
while deriving from a base class for the common functionality.

Most everything is mechanical, but there are a few gotchas:

a) back_jt_path_registry::update_cfg(), which contains the backward
threading specific bits, is rather simple, since most of the code in
the original thread_through_all_blocks() only applied to the forward
threader: removed edges, mark_threaded_blocks,
thread_through_loop_header, the copy tables (*).

(*) The back threader has its own copy tables in
duplicate_thread_path.

b) In some cases, adjust_paths_after_duplication() was commoning out
so many blocks that it was removing the initial EDGE_FSM_THREAD
marker.  I've fixed this.

c) AFAICT, when run from the forward threader,
thread_through_all_blocks() attempts to remove threads starting with
an edge already seen, but it would never see anything because the loop
doing the checking only has a visited_starting_edges.contains(), and
no corresponding visited_starting_edges.add().  The add() method in
thread_through_all_blocks belongs to the backward threading bits, and
as I've explained, both types cannot coexist.  I've removed the checks
in the forward bits since they don't appear to do anything.  If this
was an oversight, and we want to avoid threading already seen edges in
the forward threader, I can move this functionality to the base class.

Ultimately I would like to move all the registry code to
tree-ssa-threadregistry.*.  I've avoided this in this patch to aid in
review.

My apologies for this longass explanation, but I want to make sure
we're covering all of our bases.

Tested on x86-64 Linux by a very tedious process of moving chunks
around, running "make check-gcc RUNTESTFLAGS=tree-ssa.exp", and
repeating ad-nauseum.  And of course, by running a full bootstrap and
tests.

OK?

p.s. In a follow-up patch I will rename the confusing EDGE_FSM_THREAD
type.
So another thing to consider is tha

Re: [PATCH] testsuite: Fix c-c++-common/auto-init-* tests

2021-09-11 Thread Qing Zhao via Gcc-patches

Hi, Jakub,

Thanks a lot for your patch to the testing cases in c-c++-common. 
Actually I had a local fix too, and I planed to submitted it after I fixed all 
the failure for dg.target/i386 failures on different platforms..;-)


> On Sep 11, 2021, at 3:03 AM, Jakub Jelinek  wrote:
> 
> On Fri, Sep 03, 2021 at 02:47:11PM +, Qing Zhao via Gcc-patches wrote:
>>> 2021-08-20  qing zhao  
>>> 
>>>   * c-c++-common/auto-init-1.c: New test.
>>>   * c-c++-common/auto-init-10.c: New test.
>>>   * c-c++-common/auto-init-11.c: New test.
>>>   * c-c++-common/auto-init-12.c: New test.
>>>   * c-c++-common/auto-init-13.c: New test.
>>>   * c-c++-common/auto-init-14.c: New test.
>>>   * c-c++-common/auto-init-15.c: New test.
>>>   * c-c++-common/auto-init-16.c: New test.
>>>   * c-c++-common/auto-init-2.c: New test.
>>>   * c-c++-common/auto-init-3.c: New test.
>>>   * c-c++-common/auto-init-4.c: New test.
>>>   * c-c++-common/auto-init-5.c: New test.
>>>   * c-c++-common/auto-init-6.c: New test.
>>>   * c-c++-common/auto-init-7.c: New test.
>>>   * c-c++-common/auto-init-8.c: New test.
>>>   * c-c++-common/auto-init-9.c: New test.
>>>   * c-c++-common/auto-init-esra.c: New test.
>>>   * c-c++-common/auto-init-padding-1.c: New test.
>>>   * c-c++-common/auto-init-padding-2.c: New test.
>>>   * c-c++-common/auto-init-padding-3.c: New test.
> 
> This fails on many targets, e.g. i686-linux or x86_64-linux with -m32.
> 
> The main problem is hardcoding type sizes and structure layout expectations
> that are valid only on some lp64 targets.
> On ilp32 long and pointer are 32-bit, and there are targets that are neither
> ilp32 nor lp64 and there even other sizes can't be taken for granted.
> Also, long double depending on target and options is either 8, 12 or 16 byte
> (the first one when it is the same as double, the second e.g. for ia32
> extended long double (which is under the hood 10 byte), the last either
> the same hw type on x86_64 or IBM double double or IEEE quad).

Is there any testing directive keyword to distinguish this, I see: 
(https://gcc.gnu.org/onlinedocs/gccint/Effective-Target-Keywords.html)

longdouble128
Target has 128-bit long double.


> In the last test, one problem is that unsigned long is on ilp32 32-bit
> instead of 64-bit, but even just changing to long long is not enough,
> as long long in structures on ia32 is only 4 byte aligned instead of 8.
> 
> Tested on x86_64-linux -m32/-m64, ok for trunk?
> 
> Note, the gcc.dg/i386/auto-init* tests fail also, just don't have time to
> deal with that right now, just try
> make check-gcc RUNTESTFLAGS='--target_board=unix\{-m32,-m64\} 
> i386.exp=auto-init*'
> Guess some of those tests should be restricted to lp64 in there, others
> where it might be easier to check all of lp64, x32 and ia32 code generation
> could have different matches.
Yes, I am working on this right now, will fix this the beginning of next week.

>  Wonder also about the aarch64 tests, there is
> also -mabi=ilp32...
I expected that aarch64 tests with ilp32 might have similar issue.

> +FAIL: gcc.target/i386/auto-init-2.c scan-rtl-dump-times expand 
> "0xfefefefefefefefe" 3
> +FAIL: gcc.target/i386/auto-init-2.c scan-rtl-dump-times expand 
> "0xfefefefe" 2
> +FAIL: gcc.target/i386/auto-init-3.c scan-assembler-times pxor\\t%xmm0, 
> %xmm0 3
> +FAIL: gcc.target/i386/auto-init-4.c scan-rtl-dump-times expand 
> "0xfefefefe" 1
> +FAIL: gcc.target/i386/auto-init-4.c scan-rtl-dump-times expand 
> "0xfffe]) repeated x16" 1
> +FAIL: gcc.target/i386/auto-init-4.c scan-rtl-dump-times expand 
> "[0xfefefefefefefefe]" 1
> +FAIL: gcc.target/i386/auto-init-5.c scan-assembler-times .long\\t0 14
> +FAIL: gcc.target/i386/auto-init-6.c scan-rtl-dump-times expand 
> "0xfffe]) repeated x16" 2
> +FAIL: gcc.target/i386/auto-init-6.c scan-rtl-dump-times expand 
> "[0xfefefefefefefefe]" 1
> +FAIL: gcc.target/i386/auto-init-7.c scan-rtl-dump-times expand "const_int 0 
> [0]) repeated x16" 2
> +FAIL: gcc.target/i386/auto-init-7.c scan-rtl-dump-times expand "const_int 0 
> [0]))" 3
> +FAIL: gcc.target/i386/auto-init-8.c scan-rtl-dump-times expand 
> "0xfefefefe" 1
> +FAIL: gcc.target/i386/auto-init-8.c scan-rtl-dump-times expand 
> "0xfffe]) repeated x16" 2
> +FAIL: gcc.target/i386/auto-init-8.c scan-rtl-dump-times expand 
> "[0xfefefefefefefefe]" 2
> +FAIL: gcc.target/i386/auto-init-padding-1.c scan-rtl-dump-times expand 
> "const_int 0 [0]) repeated x16" 1
> +FAIL: gcc.target/i386/auto-init-padding-10.c scan-rtl-dump-times expand 
> "0xfffe]) repeated x16" 1
> +FAIL: gcc.target/i386/auto-init-padding-11.c scan-rtl-dump-times expand 
> "const_int 0 [0]) repeated x16" 1
> +FAIL: gcc.target/i386/auto-init-padding-12.c scan-rtl-dump-times expand 
>

Re: [PATCH] Refactor jump_thread_path_registry.

2021-09-11 Thread Aldy Hernandez via Gcc-patches




So another thing to consider is that the threaders initially record 
their paths in different directions.  Forward threading records starting 
at the first block, backward from the final block.  At some point (I no 
longer remember where) we invert the backwards threader's path to fit 
the model that the registry wanted and I think we then convert the path 
into the structure that the generic copier wants at some later point.


Yeah, that's in back_threader_registry::register_path, and then we go 
back and massage things again in back_jt_path_registry::update_cfg().




In theory with the two better separated we might not need to do the 
conversions of the backwards threader's paths anymore.


Good point.  I'll think about it, as we need some coherent way of 
keeping track of paths that can we can share, as I'm about to add yet 
another one in the jt_state business in tree-ssa-threadege.h.


ps.  Don't forget about gcc.dg/torture/pr55107 regression I sent you 
yesterday.  It's showing up on a wide range of targets.


Yeah, thanks for mentioning it.  I'm putting it aside until Monday when 
I'm actually getting paid to chase tricky bugs :).


Thanks for the review.
Aldy

Go patch committed: Don't pad zero-sized trailing field in results struct

2021-09-11 Thread Ian Lance Taylor via Gcc-patches

This patch to the Go frontend avoids padding zero-sized trailing
fields in the struct created to hold result parameters.  This avoids a
miscompilation when a function returns multiple results and the last
result is zero-sized (this is not a useful way to write a function but
it can happen reasonably if on some targets the last result does have
a size).  Nothing can take the address of a field in a result struct,
so the padding is not required.  This fixes GCC PR 101994.
Bootstrapped and ran Go testsuite on x86_64-pc-linux-gnu.  Committed
to mainline.

Ian
0d3eabf88aa766056f3e0061a7d369368629a451
diff --git a/gcc/go/gofrontend/MERGE b/gcc/go/gofrontend/MERGE
index ff41af787b1..f4816816500 100644
--- a/gcc/go/gofrontend/MERGE
+++ b/gcc/go/gofrontend/MERGE
@@ -1,4 +1,4 @@
-21b30eddc59d92a07264c3b21eb032d6c303d16f
+c11d9f8275f2bbe9b05cdd815c79ac331f78e15c
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
diff --git a/gcc/go/gofrontend/types.cc b/gcc/go/gofrontend/types.cc
index e76600daab9..cd692506efc 100644
--- a/gcc/go/gofrontend/types.cc
+++ b/gcc/go/gofrontend/types.cc
@@ -5050,6 +5050,7 @@ Function_type::get_backend_fntype(Gogo* gogo)
  Struct_type* st = Type::make_struct_type(sfl,
   this->location());
  st->set_is_struct_incomparable();
+ st->set_is_results_struct();
  ins.first->second = st->get_backend(gogo);
}
  bresult_struct = ins.first->second;
@@ -6458,7 +6459,7 @@ get_backend_struct_fields(Gogo* gogo, Struct_type* type, 
bool use_placeholder,
 saw_nonzero = true;
 }
   go_assert(i == fields->size());
-  if (saw_nonzero && lastsize == 0)
+  if (saw_nonzero && lastsize == 0 && !type->is_results_struct())
 {
   // For nonzero-sized structs which end in a zero-sized thing, we add
   // an extra byte of padding to the type. This padding ensures that
diff --git a/gcc/go/gofrontend/types.h b/gcc/go/gofrontend/types.h
index ca1ab49c57e..0c5180668ea 100644
--- a/gcc/go/gofrontend/types.h
+++ b/gcc/go/gofrontend/types.h
@@ -2501,7 +2501,8 @@ class Struct_type : public Type
   Struct_type(Struct_field_list* fields, Location location)
 : Type(TYPE_STRUCT),
   fields_(fields), location_(location), all_methods_(NULL),
-  is_struct_incomparable_(false), has_padding_(false)
+  is_struct_incomparable_(false), has_padding_(false),
+  is_results_struct_(false)
   { }
 
   // Return the field NAME.  This only looks at local fields, not at
@@ -2632,6 +2633,17 @@ class Struct_type : public Type
   set_has_padding()
   { this->has_padding_ = true; }
 
+  // Return whether this is a results struct created to hold the
+  // results of a function that returns multiple results.
+  bool
+  is_results_struct() const
+  { return this->is_results_struct_; }
+
+  // Record that this is a results struct.
+  void
+  set_is_results_struct()
+  { this->is_results_struct_ = true; }
+
   // Write the hash function for this type.
   void
   write_hash_function(Gogo*, Function_type*);
@@ -2742,6 +2754,9 @@ class Struct_type : public Type
   // True if this struct's backend type has padding, due to trailing
   // zero-sized field.
   bool has_padding_;
+  // True if this is a results struct created to hold the results of a
+  // function that returns multiple results.
+  bool is_results_struct_;
 };
 
 // The type of an array.

[PATCH] testsuite: Fix c-c++-common/auto-init-* tests

Re: [PATCH 2/2] validate_subreg before call gen_lowpart to avoid ICE.

[PATCH] Refactor jump_thread_path_registry.

[PATCH] Also preserve SUBREG_PROMOTED_VAR_P in expr.c's convert_move.

Re: [PATCH 2/2] validate_subreg before call gen_lowpart to avoid ICE.

Re: [PATCH 2/2] validate_subreg before call gen_lowpart to avoid ICE.

Re: [PATCH] testsuite: Fix c-c++-common/auto-init-* tests

Re: [PATCH] Also preserve SUBREG_PROMOTED_VAR_P in expr.c's convert_move.

Re: [PATCH] Refactor jump_thread_path_registry.

Re: [PATCH] testsuite: Fix c-c++-common/auto-init-* tests

Re: [PATCH] Refactor jump_thread_path_registry.

Go patch committed: Don't pad zero-sized trailing field in results struct

12 matches

Site Navigation

Mail list logo

Footer information