modified_between_p does not check for volatile memory
TL;DR

In GCC 9.3, I believe modified_between_p should return 1 if the memory
reference is volatile. My layman's understanding of volatile memory is
that it could change at any time, and thus could be modified between any
two instructions.

Possible patch:

diff --git a/gcc/rtlanal.c b/gcc/rtlanal.c
index 01af063a222..a395df0627b 100644
--- a/gcc/rtlanal.c
+++ b/gcc/rtlanal.c
@@ -1308,6 +1308,8 @@ modified_between_p (const_rtx x, const rtx_insn *start, const rtx_insn *end)
       return 1;

     case MEM:
+      if (MEM_VOLATILE_P(x))
+        return 1;
       if (modified_between_p (XEXP (x, 0), start, end))
         return 1;
       if (MEM_READONLY_P (x))

Details

I'm writing a splitter to optimize bit field conditionals (e.g. testing a
bit of memory-mapped I/O). The non-optimized pattern consists of a SET,
AND, LSHIFTRT + CMP, JMP sequence, which I know can be simplified to an
AND + JMP, as the AND operation will set the necessary CC bits. To achieve
this optimization in the combine pass, a custom general_operand has been
defined which allows volatile memory.

However, in certain situations I've found the combine pass generates
invalid patterns when it merges SET and IF_THEN_ELSE (CMP + JMP) patterns.
One such situation is when GCC decides to reuse the result of the
comparison for the return value.

This issue seems intertwined with the combine + volatile_ok issue that has
been discussed a number of times in the past. The backend in question is
highly embedded, and significant performance gains are made by enabling
volatile references during combine.

Optimized tree of the test program. In this program the bit field value is
saved in _1 to be reused in the return value. Why it makes this choice is
beyond me.

   _1;
  _Bool _3;

   [local count: 1044213930]:
  _1 ={v} MEM[(volatile struct register_t *)5B].done;
  if (_1 != 0)
    goto ; [11.00%]
  else
    goto ; [89.00%]

   [local count: 1073741824]:
  // Additional code trimmed for brevity...

   [local count: 114863532]:
  # _3 = PHI <0(4), _1(5)>
  return _3;

This produces the following pre-combine RTL.

(insn 9 6 10 3 (parallel [
            (set (reg:QI 17 [ ])
                (lshiftrt:QI (mem/v:QI (const_int 5 [0x5]) [1 MEM[(volatile struct register_t *)5B]+0 S1 A16])
                    (const_int 3 [0x3])))
            (clobber (reg:CC 7 FLAG))
        ]) "fail.c":22:26 19 {*lshrqi3_im}
     (expr_list:REG_DEAD (reg/f:QI 18)
        (expr_list:REG_UNUSED (reg:CC 7 FLAG)
            (nil))))
(insn 10 9 12 3 (parallel [
            (set (reg:QI 17 [ ])
                (and:QI (reg:QI 17 [ ])
                    (const_int 1 [0x1])))
            (clobber (reg:CC 7 FLAG))
        ]) "fail.c":22:26 10 {andqi3}
     (expr_list:REG_UNUSED (reg:CC 7 FLAG)
        (nil)))
(jump_insn 12 10 20 3 (parallel [
            (set (pc)
                (if_then_else (eq (reg:QI 17 [ ])
                        (const_int 0 [0]))
                    (label_ref:QI 11)
                    (pc)))
            (clobber (scratch:QI))
        ]) "fail.c":22:9 41 {*cbranchqi4_im}
     (int_list:REG_BR_PROB 955630228 (nil))
 -> 11)

When combined, this generates patterns that will produce invalid code:
GCC has assumed the memory value will not change between insns 10 & 12.
(insn 10 9 12 3 (parallel [
            (set (reg:QI 17 [ ])
                (and:QI (lshiftrt:QI (mem/v:QI (const_int 5 [0x5]) [1 MEM[(volatile struct register_t *)5B]+0 S1 A16])
                        (const_int 3 [0x3]))
                    (const_int 1 [0x1])))
            (clobber (reg:CC 7 FLAG))
        ]) "fail.c":22:18 10 {andqi3}
     (expr_list:REG_UNUSED (reg:CC 7 FLAG)
        (nil)))
(jump_insn 12 10 20 3 (parallel [
            (set (pc)
                (if_then_else (eq (zero_extract:QI (mem/v:QI (const_int 5 [0x5]) [1 MEM[(volatile struct register_t *)5B]+0 S1 A16])
                            (const_int 1 [0x1])
                            (const_int 3 [0x3]))
                        (const_int 0 [0]))
                    (label_ref:QI 11)
                    (pc)))
            (clobber (scratch:QI))
        ]) "fail.c":22:9 43 {*cbranchqi4_zero_extract}
     (int_list:REG_BR_PROB 955630228 (nil))
 -> 11)

With the above patch, the combination of insns 10 & 12 is rejected, and
the if_then_else switches on the retval register instead.

(insn 10 9 12 3 (parallel [
            (set (reg:QI 17 [ ])
                (and:QI (lshiftrt:QI (mem/v:QI (const_int 5 [0x5]) [1 MEM[(volatile struct register_t *)5B]+0 S1 A16])
                        (const_int 3 [0x3]))
                    (const_int 1 [0x1])))
            (clobber (reg:CC 7 FLAG))
        ]) "fail.c":22:18 10 {andqi3}
     (expr_list:REG_UNUSED (reg:CC 7 FLAG)
        (nil)))
(jump_insn 12 10 20 3 (parallel [
            (set (pc)
                (if_then_else (eq (reg:QI 17 [ ])
                        (const_int 0 [0]))
                    (label_ref:QI 11)
                    (pc)))
            (clobber (scratch:QI))
        ]) "fail.c":22:9 41 {*cbranchqi4_im}
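The "custom general_operand" mentioned under Details is not shown in the
thread. As a rough sketch, such a predicate could look like the following
(the function name is invented for illustration; the body assumes only the
standard rtl.h/recog.h helpers, e.g. for use inside a define_predicate in
the backend's predicates.md):

  /* Like general_operand, but also accept volatile MEMs so combine may
     merge them.  Plain general_operand rejects a volatile MEM whenever
     volatile_ok is false, which is the case during combine.  */
  static bool
  volatile_general_operand (rtx op, machine_mode mode)
  {
    /* Accept a volatile MEM as long as its address is valid.  */
    if (MEM_P (op) && MEM_VOLATILE_P (op))
      return memory_address_p (mode, XEXP (op, 0));
    return general_operand (op, mode);
  }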
Re: modified_between_p does not check for volatile memory
Hi Richard,

Thanks for getting back to me and your insight. I've implemented a
TARGET_CANNOT_COPY_INSN_P hook that rejects volatile memory and it appears
to be doing the trick! It will take some time to determine whether there
are any other side effects, but I can move forward with this.

Much appreciated!
-Tucker

On Tue, Oct 13, 2020 at 3:00 AM Richard Sandiford wrote:

> Tucker Kern via Gcc writes:
> > TL;DR
> >
> > In GCC 9.3, I believe modified_between_p should return 1 if the memory
> > reference is volatile. My layman's understanding of volatile memory is
> > that it could change at any time, and thus could be modified between
> > any two instructions.
>
> That's true, but in RTL, volatile accesses provide no timing guarantee,
> even in an abstract sense. E.g. suppose we traced the execution of a
> function call based on the instructions that exist immediately after
> the expand pass. If a volatile access occurs in the Nth instruction
> of this trace, there's no guarantee that the access will still occur
> in the Nth instruction of traces for later passes. Unrelated
> instructions can be added and removed at any time.
>
> For example, a volatile access should not stop us from swapping:
>
>   (set (reg:SI R0) (const_int 0))
>
> and:
>
>   (set (mem/v:SI (reg:SI R1)) (const_int 0))
>
> So…
>
> > Possible patch:
> > diff --git a/gcc/rtlanal.c b/gcc/rtlanal.c
> > index 01af063a222..a395df0627b 100644
> > --- a/gcc/rtlanal.c
> > +++ b/gcc/rtlanal.c
> > @@ -1308,6 +1308,8 @@ modified_between_p (const_rtx x, const rtx_insn
> > *start, const rtx_insn *end)
> >        return 1;
> >
> >      case MEM:
> > +      if (MEM_VOLATILE_P(x))
> > +        return 1;
> >        if (modified_between_p (XEXP (x, 0), start, end))
> >          return 1;
> >        if (MEM_READONLY_P (x))
>
> …I think the current code handles volatile correctly. The issue
> is whether any instruction in (START, END) might modify the same
> memory as X. Most of the real work is done by the alias machinery,
> which understands (or is supposed to understand) the special rules
> around volatile memory and how it interacts with other memory accesses:
>
>   for (insn = NEXT_INSN (start); insn != end; insn = NEXT_INSN (insn))
>     if (memory_modified_in_insn_p (x, insn))
>       return 1;
>
> > Details
> >
> > I'm writing a splitter to optimize bit field conditionals (e.g.
> > testing a bit of memory-mapped I/O). The non-optimized pattern
> > consists of a SET, AND, LSHIFTRT + CMP, JMP sequence, which I know can
> > be simplified to an AND + JMP, as the AND operation will set the
> > necessary CC bits. To achieve this optimization in the combine pass, a
> > custom general_operand has been defined which allows volatile memory.
> >
> > However, in certain situations I've found the combine pass generates
> > invalid patterns when it merges SET and IF_THEN_ELSE (CMP + JMP)
> > patterns. One such situation is when GCC decides to reuse the result
> > of the comparison for the return value.
> >
> > This issue seems intertwined with the combine + volatile_ok issue that
> > has been discussed a number of times in the past. The backend in
> > question is highly embedded, and significant performance gains are
> > made by enabling volatile references during combine.
> >
> > Optimized tree of the test program. In this program the bit field
> > value is saved in _1 to be reused in the return value. Why it makes
> > this choice is beyond me.
> >
> >    _1;
> >   _Bool _3;
> >
> >    [local count: 1044213930]:
> >   _1 ={v} MEM[(volatile struct register_t *)5B].done;
> >   if (_1 != 0)
> >     goto ; [11.00%]
> >   else
> >     goto ; [89.00%]
> >
> >    [local count: 1073741824]:
> >   // Additional code trimmed for brevity...
> >
> >    [local count: 114863532]:
> >   # _3 = PHI <0(4), _1(5)>
> >   return _3;
> >
> > This produces the following pre-combine RTL.
> > (insn 9 6 10 3 (parallel [
> >             (set (reg:QI 17 [ ])
> >                 (lshiftrt:QI (mem/v:QI (const_int 5 [0x5]) [1
> > MEM[(volatile struct register_t *)5B]+0 S1 A16])
> >                     (const_int 3 [0x3])))
> >             (clobber (reg:CC 7 FLAG))
> >         ]) "fail.c":22:26 19 {*lshrqi3_im}
> >      (expr_list:REG_DEAD (reg/f:QI 18)
> >         (expr_list:REG_UNUSED (reg:CC 7 FLAG)
> >             (nil))))
> > (insn 10 9 12 3 (parallel [
> >             (set (reg:QI 17 [ ])
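A minimal sketch of the TARGET_CANNOT_COPY_INSN_P approach described in
this reply, assuming a backend source file along the lines of machine.c
(the function name is invented; volatile_refs_p is the existing rtlanal.c
helper that detects volatile MEMs, asms and unspecs):

  /* Tell the RTL passes they may not duplicate insns that reference
     volatile memory, so a single volatile access stays single.  */
  static bool
  machine_cannot_copy_insn_p (rtx_insn *insn)
  {
    return volatile_refs_p (PATTERN (insn)) != 0;
  }

  #undef TARGET_CANNOT_COPY_INSN_P
  #define TARGET_CANNOT_COPY_INSN_P machine_cannot_copy_insn_p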
Adjust offset of array reference in named address space
Hi,

I'm implementing named address spaces for a Harvard architecture machine
to support copying data from instruction memory to data memory. This is
achieved via a special instruction. e.g. think AVR and progmem/__flash.

However, the instruction memory is narrower than the data memory (12 vs 16
bits) on this machine, so a single data word is split across 2 instruction
words. When copied from IMEM to DMEM, the two parts are combined via SHIFT
+ OR patterns.

This is all working fine for regular variables (i.e. int some_var), but it
falls apart for array references (i.e. some_array[1]). Since the data is
stored across 2 IMEM words, I need to scale the calculated offset of each
array reference by 2. e.g. array[0] is actually stored in imem[0] &
imem[1] and array[1] is stored in imem[2] & imem[3].

e.g.
  static __imem int imem_array[2];
  return imem_array[1];

  // needs to generate a symbol reference like
  &imem_array.869+2

Similarly, if the array index is a function parameter, I need to scale the
parameter by 2.

  __imem int imem_array[2];
  int some_func(int a)
  {
    // a needs to be scaled by 2 when generating RTL/ASM
    return imem_array[a];
  }

I haven't found any target hooks that would allow me to override the
offset calculation. Originally I thought I could handle it in a splitter,
but this approach didn't work for the function parameter example, as I
ended up scaling the entire address instead of just the offset.

I had another thought of using a combo of
TARGET_ADDR_SPACE_LEGITIMIZE_ADDRESS and
TARGET_ADDR_SPACE_LEGITIMATE_ADDRESS_P to scale the offset and mark it as
adjusted, but I don't think this combo will work in the end.

Is there any way to achieve this?

Thanks,
Tucker
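For concreteness, this is the layout and read sequence being described,
written as plain C. imem_fetch is a hypothetical stand-in for the special
copy instruction, and the 8-useful-bits-per-12-bit-IMEM-word packing is an
assumption taken from the follow-up discussion:

  #include <stdint.h>

  /* Hypothetical primitive: read one 12-bit IMEM word via the special
     copy instruction.  */
  extern uint16_t imem_fetch (uint16_t imem_word_addr);

  /* Read element INDEX of a 16-bit array stored in IMEM.  Each data
     word occupies two IMEM words, so the IMEM offset is twice the DMEM
     offset: array[0] lives in imem[0] & imem[1], array[1] in imem[2] &
     imem[3], and so on.  */
  uint16_t imem_read_word (uint16_t imem_base, uint16_t index)
  {
    uint16_t addr = imem_base + 2 * index;          /* scaled offset */
    uint16_t lsb  = imem_fetch (addr);              /* low 8 bits */
    uint16_t msb  = imem_fetch (addr + 1);          /* high 8 bits */
    return (uint16_t) ((msb << 8) | (lsb & 0xff));  /* SHIFT + OR pack */
  }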
Re: Adjust offset of array reference in named address space
> So you are saying that a 16bit data word in IMEM is actually two 12bit
> data words (supposedly only the lower 8 bits used in each) and thus the
> array contains "padding"?

Effectively yes. The assembler handles dividing constants into their LSB
and MSB components. I have insn patterns and splitters defined that emit
the correct instructions to read and "pack" the value into a register or
generic memory location.

All I really need at this point is a means to augment how addresses (e.g.
array offsets or struct members) are calculated in a non-generic address
space. This doesn't feel like a far-fetched idea, as GCC currently
supports address space specific legitimization and modes.

On Mon, Jan 11, 2021 at 12:50 AM Richard Biener wrote:

> On Sat, Jan 9, 2021 at 12:24 AM Tucker Kern via Gcc wrote:
> >
> > Hi,
> >
> > I'm implementing named address spaces for a Harvard architecture
> > machine to support copying data from instruction memory to data
> > memory. This is achieved via a special instruction. e.g. think AVR and
> > progmem/__flash.
> >
> > However, the instruction memory is narrower than the data memory (12
> > vs 16 bits) on this machine. So a single data word is split across 2
> > instruction words. When copied from IMEM to DMEM the two parts are
> > combined via SHIFT + OR patterns.
> >
> > This is all working fine for regular variables (i.e. int some_var),
> > but it falls apart for array references (i.e. some_array[1]). Since
> > the data is stored across 2 IMEM words, I need to scale the calculated
> > offset of each array reference by 2. e.g. array[0] is actually stored
> > in imem[0] & imem[1] and array[1] is stored in imem[2] & imem[3].
>
> So you are saying that a 16bit data word in IMEM is actually two 12bit
> data words (supposedly only the lower 8 bits used in each) and thus the
> array contains "padding"?  That's not really supported and is also not
> the scope of named address spaces.  I'd suggest you go down the route
> of providing intrinsics for the transfer of data instead which could
> resort to target specific builtin functions.
>
> > e.g.
> > static __imem int imem_array[2];
> > return imem_array[1];
> >
> > // needs to generate a symbol reference like
> > &imem_array.869+2
> >
> > Similarly if the array index was a function parameter, I need to scale
> > the parameter by 2.
> > __imem int imem_array[2];
> > int some_func(int a)
> > {
> >   // a needs to be scaled by 2 when generating RTL/ASM
> >   return imem_array[a];
> > }
> >
> > I haven't found any target hooks that would allow me to override the
> > offset calculation. Originally I thought I could handle it in a
> > splitter but this approach didn't work for the function parameter
> > example as I ended up scaling the entire address instead of just the
> > offset.
> >
> > I had another thought of using a combo of
> > TARGET_ADDR_SPACE_LEGITIMIZE_ADDRESS and
> > TARGET_ADDR_SPACE_LEGITIMATE_ADDRESS_P to scale the offset and mark it
> > as adjusted but I don't think this combo will work in the end.
> >
> > Is there any way to achieve this?
> >
> > Thanks,
> > Tucker
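To make the suggested intrinsics route concrete, usage might look like the
sketch below. The builtin name and signature are invented for illustration
(a real backend would register such a builtin via TARGET_INIT_BUILTINS and
expand it to the special copy instruction via TARGET_EXPAND_BUILTIN):

  /* Hypothetical target builtin, provided by the backend: fetch a
     16-bit value stored across two IMEM words starting at the given
     IMEM word address.  */
  extern int __builtin_imem_read (unsigned imem_word_addr);

  #define IMEM_ARRAY_BASE 0x100u   /* assumed IMEM placement */

  int some_func (int a)
  {
    /* With an intrinsic, the by-2 scaling is explicit in the caller
       rather than hidden inside address legitimization.  */
    return __builtin_imem_read (IMEM_ARRAY_BASE + 2u * (unsigned) a);
  }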
Re: Adjust offset of array reference in named address space
Richard,

As always, thanks for the insight. I'll take a look at those functions as
well.

-Tucker

On Tue, Jan 12, 2021 at 8:37 AM Richard Biener wrote:

> On Tue, Jan 12, 2021 at 4:29 PM Tucker Kern wrote:
> >
> > > So you are saying that a 16bit data word in IMEM is actually two
> > > 12bit data words (supposedly only the lower 8 bits used in each) and
> > > thus the array contains "padding"?
> >
> > Effectively yes. The assembler handles dividing constants into their
> > LSB and MSB components. I have insn patterns and splitters defined
> > that emit the correct instructions to read and "pack" the value into a
> > register or generic memory location.
> >
> > All I really need at this point is a means to augment how addresses
> > (e.g. array offsets or struct members) are calculated in a non-generic
> > address space. This doesn't feel like a far-fetched idea, as GCC
> > currently supports address space specific legitimization and modes.
>
> I think you'd need to hook this up in structure layouting, which then
> runs into the issue that the address space is a qualifier and GCC
> internals expect layouts to be compatible between qualified and
> unqualified types. Also, the target hooks in the layout code mostly
> deal with bitfields only.
>
> The offset is computed via get_inner_reference from expand_expr_real_*,
> so I don't see a good way to handle this correctly. But maybe Joseph
> has an idea.
>
> Richard.
>
> > On Mon, Jan 11, 2021 at 12:50 AM Richard Biener wrote:
> >>
> >> On Sat, Jan 9, 2021 at 12:24 AM Tucker Kern via Gcc wrote:
> >> >
> >> > Hi,
> >> >
> >> > I'm implementing named address spaces for a Harvard architecture
> >> > machine to support copying data from instruction memory to data
> >> > memory. This is achieved via a special instruction. e.g. think AVR
> >> > and progmem/__flash.
> >> >
> >> > However, the instruction memory is narrower than the data memory
> >> > (12 vs 16 bits) on this machine. So a single data word is split
> >> > across 2 instruction words. When copied from IMEM to DMEM the two
> >> > parts are combined via SHIFT + OR patterns.
> >> >
> >> > This is all working fine for regular variables (i.e. int some_var),
> >> > but it falls apart for array references (i.e. some_array[1]).
> >> > Since the data is stored across 2 IMEM words, I need to scale the
> >> > calculated offset of each array reference by 2. e.g. array[0] is
> >> > actually stored in imem[0] & imem[1] and array[1] is stored in
> >> > imem[2] & imem[3].
> >>
> >> So you are saying that a 16bit data word in IMEM is actually two
> >> 12bit data words (supposedly only the lower 8 bits used in each) and
> >> thus the array contains "padding"?  That's not really supported and
> >> is also not the scope of named address spaces.  I'd suggest you go
> >> down the route of providing intrinsics for the transfer of data
> >> instead which could resort to target specific builtin functions.
> >>
> >> > e.g.
> >> > static __imem int imem_array[2];
> >> > return imem_array[1];
> >> >
> >> > // needs to generate a symbol reference like
> >> > &imem_array.869+2
> >> >
> >> > Similarly if the array index was a function parameter, I need to
> >> > scale the parameter by 2.
> >> > __imem int imem_array[2];
> >> > int some_func(int a)
> >> > {
> >> >   // a needs to be scaled by 2 when generating RTL/ASM
> >> >   return imem_array[a];
> >> > }
> >> >
> >> > I haven't found any target hooks that would allow me to override
> >> > the offset calculation. Originally I thought I could handle it in a
> >> > splitter but this approach didn't work for the function parameter
> >> > example as I ended up scaling the entire address instead of just
> >> > the offset.
> >> >
> >> > I had another thought of using a combo of
> >> > TARGET_ADDR_SPACE_LEGITIMIZE_ADDRESS and
> >> > TARGET_ADDR_SPACE_LEGITIMATE_ADDRESS_P to scale the offset and mark
> >> > it as adjusted but I don't think this combo will work in the end.
> >> >
> >> > Is there any way to achieve this?
> >> >
> >> > Thanks,
> >> > Tucker