modified_between_p does not check for volatile memory

2020-10-06 Thread Tucker Kern via Gcc
TL;DR

In GCC 9.3, I believe modified_between_p should return 1 if the memory
reference is volatile. My layman's understanding of volatile memory is that
it could change at any time, and thus could be modified between any two
instructions.

Possible patch:
diff --git a/gcc/rtlanal.c b/gcc/rtlanal.c
index 01af063a222..a395df0627b 100644
--- a/gcc/rtlanal.c
+++ b/gcc/rtlanal.c
@@ -1308,6 +1308,8 @@ modified_between_p (const_rtx x, const rtx_insn
*start, const rtx_insn *end)
   return 1;

 case MEM:
+  if (MEM_VOLATILE_P(x))
+return 1;
   if (modified_between_p (XEXP (x, 0), start, end))
  return 1;
   if (MEM_READONLY_P (x))

Details
I'm writing a splitter to optimize bit-field conditionals (e.g. testing a
bit of memory-mapped I/O). The non-optimized sequence consists of a SET,
AND, LSHIFTRT + CMP, JMP pattern, which I know can be simplified to
AND + JMP, since the AND operation sets the necessary CC bits. To achieve
this optimization in the combine pass, I defined a custom general_operand
predicate that allows volatile memory.
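
The predicate is roughly shaped like this (a sketch with an illustrative
name, not the exact code from my port):

;; Accept volatile MEMs with a valid address; defer everything else to the
;; stock general_operand.
(define_predicate "volatile_ok_general_operand"
  (match_code "reg,subreg,mem,const_int,const,symbol_ref,label_ref")
{
  if (MEM_P (op) && MEM_VOLATILE_P (op))
    return memory_address_addr_space_p (GET_MODE (op), XEXP (op, 0),
                                        MEM_ADDR_SPACE (op));
  return general_operand (op, mode);
})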

However, in certain situations I've found that the combine pass generates
invalid patterns when it merges a SET and an IF_THEN_ELSE (CMP + JMP)
pattern. One such situation is when GCC decides to reuse the result of the
comparison for the return value.

This issue seems intertwined with the combine + volatile_ok issue that has
been discussed a number of times in the past. The backend in question is a
highly embedded target, and significant performance gains are made by
enabling volatile references during combine.
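
For context, the reason a custom predicate is needed at all is that the
stock general_operand in recog.c refuses volatile MEMs whenever volatile_ok
is clear (and it is cleared via init_recog_no_volatile once initial RTL
expansion is done, so combine never sees volatile MEMs as valid operands).
Paraphrasing the relevant check from memory:

  /* In general_operand (gcc/recog.c), roughly: */
  if (! volatile_ok && MEM_VOLATILE_P (op))
    return 0;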

Here is the optimized tree of the test program. In this program the
bit-field value is saved in _1 to be reused in the return value; why it
makes this choice is beyond me.
   _1;
  _Bool _3;

   [local count: 1044213930]:
  _1 ={v} MEM[(volatile struct register_t *)5B].done;
  if (_1 != 0)
goto ; [11.00%]
  else
goto ; [89.00%]

   [local count: 1073741824]:
  // Additional code trimmed for brevity...

   [local count: 114863532]:
  # _3 = PHI <0(4), _1(5)>
  return _3;

This produces the following pre-combine RTL.
(insn 9 6 10 3 (parallel [
(set (reg:QI 17 [  ])
(lshiftrt:QI (mem/v:QI (const_int 5 [0x5]) [1 MEM[(volatile
struct register_t *)5B]+0 S1 A16])
(const_int 3 [0x3])))
(clobber (reg:CC 7 FLAG))
]) "fail.c":22:26 19 {*lshrqi3_im}
 (expr_list:REG_DEAD (reg/f:QI 18)
(expr_list:REG_UNUSED (reg:CC 7 FLAG)
(nil))))
(insn 10 9 12 3 (parallel [
(set (reg:QI 17 [  ])
(and:QI (reg:QI 17 [  ])
(const_int 1 [0x1])))
(clobber (reg:CC 7 FLAG))
]) "fail.c":22:26 10 {andqi3}
 (expr_list:REG_UNUSED (reg:CC 7 FLAG)
(nil)))
(jump_insn 12 10 20 3 (parallel [
(set (pc)
(if_then_else (eq (reg:QI 17 [  ])
(const_int 0 [0]))
(label_ref:QI 11)
(pc)))
(clobber (scratch:QI))
]) "fail.c":22:9 41 {*cbranchqi4_im}
 (int_list:REG_BR_PROB 955630228 (nil))
 -> 11)

When combined, this generates patterns that will produce invalid code: GCC
has assumed the memory value will not change between insns 10 & 12.
(insn 10 9 12 3 (parallel [
(set (reg:QI 17 [  ])
(and:QI (lshiftrt:QI (mem/v:QI (const_int 5 [0x5]) [1
MEM[(volatile struct register_t *)5B]+0 S1 A16])
(const_int 3 [0x3]))
(const_int 1 [0x1])))
(clobber (reg:CC 7 FLAG))
]) "fail.c":22:18 10 {andqi3}
 (expr_list:REG_UNUSED (reg:CC 7 FLAG)
(nil)))
(jump_insn 12 10 20 3 (parallel [
(set (pc)
(if_then_else (eq (zero_extract:QI (mem/v:QI (const_int 5
[0x5]) [1 MEM[(volatile struct register_t *)5B]+0 S1 A16])
(const_int 1 [0x1])
(const_int 3 [0x3]))
(const_int 0 [0]))
(label_ref:QI 11)
(pc)))
(clobber (scratch:QI))
]) "fail.c":22:9 43 {*cbranchqi4_zero_extract}
 (int_list:REG_BR_PROB 955630228 (nil))
 -> 11)

With the above patch, the combination of insns 10 & 12 is rejected and the
if_then_else switches on retval.
(insn 10 9 12 3 (parallel [
(set (reg:QI 17 [  ])
(and:QI (lshiftrt:QI (mem/v:QI (const_int 5 [0x5]) [1
MEM[(volatile struct register_t *)5B]+0 S1 A16])
(const_int 3 [0x3]))
(const_int 1 [0x1])))
(clobber (reg:CC 7 FLAG))
]) "fail.c":22:18 10 {andqi3}
 (expr_list:REG_UNUSED (reg:CC 7 FLAG)
(nil)))
(jump_insn 12 10 20 3 (parallel [
(set (pc)
(if_then_else (eq (reg:QI 17 [  ])
(const_int 0 [0]))
(label_ref:QI 11)
(pc)))
(clobber (scratch:QI))
]) "fail.c":22:9 41 {*cbranc

Re: modified_between_p does not check for volatile memory

2020-10-15 Thread Tucker Kern via Gcc
Hi Richard,

Thanks for getting back to me and for your insight. I've implemented a
TARGET_CANNOT_COPY_INSN_P hook that rejects volatile memory and it appears
to be doing the trick!
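
For the archives, the hook is shaped roughly like this (an illustrative
sketch, not my exact code):

/* Sketch: refuse to let GCC duplicate any insn that contains a volatile
   memory reference.  */
static bool
mytarget_cannot_copy_insn_p (rtx_insn *insn)
{
  return volatile_refs_p (PATTERN (insn));
}

#undef TARGET_CANNOT_COPY_INSN_P
#define TARGET_CANNOT_COPY_INSN_P mytarget_cannot_copy_insn_p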

It will take some time to determine if there are any other side effects but
I can move forward with this. Much appreciated!

-Tucker

On Tue, Oct 13, 2020 at 3:00 AM Richard Sandiford 
wrote:

> Tucker Kern via Gcc  writes:
> > TL;DR
> >
> > In GCC 9.3, I believe modified_between_p should return 1 if the memory
> > reference is volatile. My layman's understanding of volatile memory is
> that
> > it could change at any time, and thus could be modified between any two
> > instructions.
>
> That's true, but in RTL, volatile accesses provide no timing guarantee,
> even in an abstract sense.  E.g. suppose we traced the execution of a
> function call based on the instructions that exist immediately after
> the expand pass.  If a volatile access occurs in the Nth instruction
> of this trace, there's no guarantee that the access will still occur
> in the Nth instruction of traces for later passes.  Unrelated
> instructions can be added and removed at any time.
>
> For example, a volatile access should not stop us from swapping:
>
>   (set (reg:SI R0) (const_int 0))
>
> and:
>
>   (set (mem/v:SI (reg:SI R1)) (const_int 0))
>
> So…
>
> > Possible patch:
> > diff --git a/gcc/rtlanal.c b/gcc/rtlanal.c
> > index 01af063a222..a395df0627b 100644
> > --- a/gcc/rtlanal.c
> > +++ b/gcc/rtlanal.c
> > @@ -1308,6 +1308,8 @@ modified_between_p (const_rtx x, const rtx_insn
> > *start, const rtx_insn *end)
> >return 1;
> >
> >  case MEM:
> > +  if (MEM_VOLATILE_P(x))
> > +return 1;
> >if (modified_between_p (XEXP (x, 0), start, end))
> >   return 1;
> >if (MEM_READONLY_P (x))
>
> …I think the current code handles volatile correctly.  The issue
> is whether any instruction in (START, END) might modify the same
> memory as X.  Most of the real work is done by the alias machinery,
> which understands (or is supposed to understand) the special rules
> around volatile memory and how it interacts with other memory accesses:
>
>   for (insn = NEXT_INSN (start); insn != end; insn = NEXT_INSN (insn))
> if (memory_modified_in_insn_p (x, insn))
>   return 1;
>
> > Details
> > I'm writing a splitter to optimize bit field conditionals (e.g. testing a
> > bit of memory mapped I/O). The non-optimized pattern consists of a SET,
> > AND,  LSHIFTRT + CMP, JMP pattern which I know can be simplified to an
> > AND + JMP as the AND operation will set the necessary CC bits. To achieve
> > this optimization in the combine pass a custom general_operand has been
> > defined which allows volatile memory.
> >
> > However in certain situations I've found the combine pass is generating
> > invalid patterns when it merges a SET and IF_THEN_ELSE (CMP + JMP)
> > patterns. One such situation is when GCC decides to reuse the result of
> the
> > comparison for the return value.
> >
> > This issue seems intertwined with the combine + volatile_ok issue that
> has
> > been discussed a number of times in the past. The backend is question is
> > highly embedded and significant performance gains are made by enabling
> > volatile references during combine.
> >
> > Optimized tree of the test program. In this program the bit field value
> is
> > saved in _1 to be reused in the return value. Why it makes this choice is
> > beyond me.
> >_1;
> >   _Bool _3;
> >
> >[local count: 1044213930]:
> >   _1 ={v} MEM[(volatile struct register_t *)5B].done;
> >   if (_1 != 0)
> > goto ; [11.00%]
> >   else
> > goto ; [89.00%]
> >
> >[local count: 1073741824]:
> >   // Additional code trimmed for brevity...
> >
> >[local count: 114863532]:
> >   # _3 = PHI <0(4), _1(5)>
> >   return _3;
> >
> > This produces the following pre-combine RTL.
> > (insn 9 6 10 3 (parallel [
> > (set (reg:QI 17 [  ])
> > (lshiftrt:QI (mem/v:QI (const_int 5 [0x5]) [1
> MEM[(volatile
> > struct register_t *)5B]+0 S1 A16])
> > (const_int 3 [0x3])))
> > (clobber (reg:CC 7 FLAG))
> > ]) "fail.c":22:26 19 {*lshrqi3_im}
> >  (expr_list:REG_DEAD (reg/f:QI 18)
> > (expr_list:REG_UNUSED (reg:CC 7 FLAG)
> > (nil
> > (insn 10 9 12 3 (parallel [
> > (set (reg:QI 17 [  ])
> >

Adjust offset of array reference in named address space

2021-01-08 Thread Tucker Kern via Gcc
Hi,

I'm implementing named address spaces for a Harvard architecture machine
to support copying data from instruction memory to data memory. This is
achieved via a special instruction (think AVR and progmem/__flash).

However, the instruction memory is narrower than the data memory (12 vs 16
bits) on this machine. So a single data word is split across 2 instruction
words. When copied from IMEM to DMEM the two parts are combined via SHIFT +
OR patterns.

This is all working fine for regular variables (e.g. int some_var), but it
falls apart for array references (e.g. some_array[1]). Since the data is
stored across 2 IMEM words, I need to scale the calculated offset of each
array reference by 2: array[0] is actually stored in imem[0] & imem[1],
and array[1] is stored in imem[2] & imem[3].

e.g.
static __imem int imem_array[2];
return imem_array[1];

// needs to generate a symbol reference like
&imem_array.869+2

Similarly, if the array index is a function parameter, I need to scale the
parameter by 2.
__imem int imem_array[2];
int some_func(int a)
{
  // a needs to be scaled by 2 when generating RTL/ASM
  return imem_array[a];
}

I haven't found any target hooks that would allow me to override the offset
calculation. Originally I thought I could handle it in a splitter, but this
approach didn't work for the function parameter example, as I ended up
scaling the entire address instead of just the offset.

I had another thought of using a combination of
TARGET_ADDR_SPACE_LEGITIMIZE_ADDRESS and
TARGET_ADDR_SPACE_LEGITIMATE_ADDRESS_P to scale the offset and mark it as
adjusted, but I don't think this combo will work in the end.
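
To make that concrete, the rough shape I had in mind is the sketch below.
ADDR_SPACE_IMEM and the blanket doubling of constant offsets are purely
illustrative, and the lack of an "already adjusted" marker is exactly why
I doubt this will work in the end:

static rtx
mytarget_addr_space_legitimize_address (rtx x, rtx oldx ATTRIBUTE_UNUSED,
                                        machine_mode mode ATTRIBUTE_UNUSED,
                                        addr_space_t as)
{
  /* Double any constant offset against an __imem object so that each
     16-bit data word maps onto two IMEM words.  */
  if (as == ADDR_SPACE_IMEM
      && GET_CODE (x) == PLUS
      && CONST_INT_P (XEXP (x, 1)))
    return gen_rtx_PLUS (Pmode, XEXP (x, 0),
                         GEN_INT (INTVAL (XEXP (x, 1)) * 2));
  return x;
}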

Is there any way to achieve this?

Thanks,
Tucker


Re: Adjust offset of array reference in named address space

2021-01-12 Thread Tucker Kern via Gcc
> So you are saying that a 16bit data word in IMEM is actually two 12bit
> data words (supposedly only the lower 8 bits used in each) and thus the
> array contains "padding"?

Effectively yes. The assembler handles dividing constants into their LSB
and MSB components. I have insn patterns and splitters defined that emit
the correct instructions to read and "pack" the value into a register or
generic memory location.

All I really need at this point is a means to augment how addresses (e.g.
array offsets or struct members) are calculated in a non-generic address
space. This doesn't feel like a far-fetched idea, as GCC currently supports
address-space-specific legitimization and modes.
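
In C terms, the splitters effectively synthesize something like the
following (illustrative only; imem_fetch stands in for the special
IMEM-read instruction and is not a real function):

/* Hypothetical IMEM-read primitive, one 12-bit instruction word at a time.  */
extern unsigned int imem_fetch (unsigned int imem_index);

/* Rebuild one 16-bit data word from two consecutive IMEM words, each
   carrying 8 payload bits in its low byte.  */
static unsigned int
imem_read_word (unsigned int imem_index)
{
  unsigned int lo = imem_fetch (imem_index);
  unsigned int hi = imem_fetch (imem_index + 1);
  return ((hi & 0xff) << 8) | (lo & 0xff);
}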

On Mon, Jan 11, 2021 at 12:50 AM Richard Biener 
wrote:

> On Sat, Jan 9, 2021 at 12:24 AM Tucker Kern via Gcc 
> wrote:
> >
> > Hi,
> >
> > I'm implementing named addresses spaces for a Harvard architecture
> machine
> > to support copying data from instruction memory to data memory. This is
> > achieved via a special instruction. e.g. think AVR and progmem/__flash.
> >
> > However, the instruction memory is narrower than the data memory (12 vs
> 16
> > bits) on this machine. So a single data word is split across 2
> instruction
> > words. When copied from IMEM to DMEM the two parts are combined via
> SHIFT +
> > OR patterns.
> >
> > This is all working fine for regular variables (i.e. int som_var), but it
> > falls apart for array references (i.e. some_array[1]). Since the data is
> > stored across 2 IMEM words, I need to scale the calculated offset of each
> > array reference by 2. e.g. array[0] is actually stored in imem[0] &
> imem[1]
> > and array[1] is stored in imem[2] & imem[3].
>
> So you are saying that a 16bit data word in IMEM is actually two 12bit
> data words (supposedly only the lower 8 bits used in each) and thus the
> array contains "padding"?  That's not really supported and is also not
> the scope of named address spaces.  I'd suggest you go down the route
> of providing intrinsics for the transfer of data instead which could resort
> to target specific builtin functions.
>
> > e.g.
> > static __imem int imem_array[2];
> > return imem_array[1];
> >
> > // needs to generate a symbol reference like
> > &imem_array.869+2
> >
> > Similarly if the array index was a function parameter, I need to scale
> the
> > parameter by 2.
> > __imem int imem_array[2];
> > int some_func(int a)
> > {
> >   // a needs to be scaled by 2 when generating RTL/ASM
> >   return imem_array[a];
> > }
> >
> > I haven't found any target hooks that would allow me to override the
> offset
> > calculation. Originally I thought I could handle it in a splitter but
> this
> > approach didn't work for the function parameter example as I ended up
> > scaling the entire address instead of just the offset.
> >
> > I had another thought of using a combo of
> > TARGET_ADDR_SPACE_LEGITIMIZE_ADDRESS and
> > TARGET_ADDR_SPACE_LEGITIMATE_ADDRESS_P to scale the offset and mark it as
> > adjusted but I don't think this combo will work in the end.
> >
> > Is there any way to achieve this?
> >
> > Thanks,
> > Tucker
>


Re: Adjust offset of array reference in named address space

2021-01-13 Thread Tucker Kern via Gcc
Richard,

As always thanks for the insight. I'll take a look at those functions as
well.

-Tucker

On Tue, Jan 12, 2021 at 8:37 AM Richard Biener 
wrote:

> On Tue, Jan 12, 2021 at 4:29 PM Tucker Kern  wrote:
> >
> > > So you are saying that a 16bit data word in IMEM is actually two 12bit
> > > data words (supposedly only the lower 8 bits used in each) and thus the
> > > array contains "padding"?
> >
> > Effectively yes. The assembler handles dividing constants into their LSB
> and MSB components. I have insn patterns and splitters defined that emit
> the correct instructions to read and "pack" the value into a register or
> generic memory location.
> >
> > All I really need at this point is a means to augment how addresses
> (e.g. array offsets or struct members) are calculated in a non-generic
> address space. This doesn't feel like a far fetched idea as GCC currently
> supports address space specific legitimization and modes.
>
> I think you'd need to hook this up in structure layouting which then
> runs into the issue that
> the address space is a qualifier and GCC internals expect layouts to
> be compatible between
> qualified and unqualified types.  Also all target hooks in the layout
> code mostly deal with
> bitfields only.
>
> The offset is computed via get_inner_reference from expand_expr_real_*
> so I don't
> see a good way to handle this correctly.  But maybe Joseph has an idea.
>
> Richard.
>
> > On Mon, Jan 11, 2021 at 12:50 AM Richard Biener <
> richard.guent...@gmail.com> wrote:
> >>
> >> On Sat, Jan 9, 2021 at 12:24 AM Tucker Kern via Gcc 
> wrote:
> >> >
> >> > Hi,
> >> >
> >> > I'm implementing named addresses spaces for a Harvard architecture
> machine
> >> > to support copying data from instruction memory to data memory. This
> is
> >> > achieved via a special instruction. e.g. think AVR and
> progmem/__flash.
> >> >
> >> > However, the instruction memory is narrower than the data memory (12
> vs 16
> >> > bits) on this machine. So a single data word is split across 2
> instruction
> >> > words. When copied from IMEM to DMEM the two parts are combined via
> SHIFT +
> >> > OR patterns.
> >> >
> >> > This is all working fine for regular variables (i.e. int som_var),
> but it
> >> > falls apart for array references (i.e. some_array[1]). Since the data
> is
> >> > stored across 2 IMEM words, I need to scale the calculated offset of
> each
> >> > array reference by 2. e.g. array[0] is actually stored in imem[0] &
> imem[1]
> >> > and array[1] is stored in imem[2] & imem[3].
> >>
> >> So you are saying that a 16bit data word in IMEM is actually two 12bit
> >> data words (supposedly only the lower 8 bits used in each) and thus the
> >> array contains "padding"?  That's not really supported and is also not
> >> the scope of named address spaces.  I'd suggest you go down the route
> >> of providing intrinsics for the transfer of data instead which could
> resort
> >> to target specific builtin functions.
> >>
> >> > e.g.
> >> > static __imem int imem_array[2];
> >> > return imem_array[1];
> >> >
> >> > // needs to generate a symbol reference like
> >> > &imem_array.869+2
> >> >
> >> > Similarly if the array index was a function parameter, I need to
> scale the
> >> > parameter by 2.
> >> > __imem int imem_array[2];
> >> > int some_func(int a)
> >> > {
> >> >   // a needs to be scaled by 2 when generating RTL/ASM
> >> >   return imem_array[a];
> >> > }
> >> >
> >> > I haven't found any target hooks that would allow me to override the
> offset
> >> > calculation. Originally I thought I could handle it in a splitter but
> this
> >> > approach didn't work for the function parameter example as I ended up
> >> > scaling the entire address instead of just the offset.
> >> >
> >> > I had another thought of using a combo of
> >> > TARGET_ADDR_SPACE_LEGITIMIZE_ADDRESS and
> >> > TARGET_ADDR_SPACE_LEGITIMATE_ADDRESS_P to scale the offset and mark
> it as
> >> > adjusted but I don't think this combo will work in the end.
> >> >
> >> > Is there any way to achieve this?
> >> >
> >> > Thanks,
> >> > Tucker
>