CSE not combining equivalent expressions.
Hello Everyone,

I have the following source code:

static int i;
static char a;

char foo_gen(int);
void foo_assert(char);

void foo ()
{
  int *x = &i;
  a = foo_gen(0);
  a |= 1;               /* 1 */
  if (*x) goto end;
  a |= 1;               /* 2 */
  foo_assert(a);
end:
  return;
}

Now I expect the CSE pass to realise that 1 and 2 are equal and eliminate 2. However, the RTL snippet just before the first CSE pass is as follows:

(insn 11 9 12 0 (set (reg:SI 1 $c1)
        (const_int 0 [0x0])) 43 {*movsi} (nil)
    (nil))

(call_insn 12 11 13 0 (parallel [
            (set (reg:SI 1 $c1)
                (call (mem:SI (symbol_ref:SI ("gen_T") [flags 0x41] ) [0 S4 A32])
                    (const_int 0 [0x0])))
            (use (const_int 0 [0x0]))
            (clobber (reg:SI 31 $link))
        ]) 39 {*call_value_direct} (nil)
    (nil)
    (expr_list:REG_DEP_TRUE (use (reg:SI 1 $c1))
        (nil)))

(insn 13 12 14 0 (set (reg:SI 137)
        (reg:SI 1 $c1)) 43 {*movsi} (nil)
    (nil))

(insn 14 13 16 0 (set (reg:SI 135 [ D.1217 ])
        (reg:SI 137)) 43 {*movsi} (nil)
    (nil))

(insn 16 14 17 0 (set (reg:SI 138)
        (ior:SI (reg:SI 135 [ D.1217 ])
            (const_int 1 [0x1]))) 63 {iorsi3} (nil)
    (nil))

(insn 17 16 18 0 (set (reg:SI 134 [ D.1219 ])
        (zero_extend:SI (subreg:QI (reg:SI 138) 0))) 84 {zero_extendqisi2} (nil)
    (nil))

(insn 18 17 19 0 (set (reg/f:SI 139)
        (symbol_ref:SI ("a") [flags 0x2] )) 43 {*movsi} (nil)
    (nil))

(insn 19 18 21 0 (set (mem/c/i:QI (reg/f:SI 139) [0 a+0 S1 A8])
        (subreg/s/u:QI (reg:SI 134 [ D.1219 ]) 0)) 56 {*movqi} (nil)
    (nil))

<<<<< expansion of the if condition >>>>>>>>

;; End of basic block 0, registers live:
 (nil)

;; Start of basic block 1, registers live: (nil)
(note 25 23 27 1 [bb 1] NOTE_INSN_BASIC_BLOCK)

(insn 27 25 28 1 (set (reg:SI 142)
        (ior:SI (reg:SI 134 [ D.1219 ])
            (const_int 1 [0x1]))) 63 {iorsi3} (nil)
    (nil))

(insn 28 27 29 1 (set (reg:SI 133 [ temp.28 ])
        (zero_extend:SI (subreg:QI (reg:SI 142) 0))) 84 {zero_extendqisi2} (nil)
    (nil))

(insn 29 28 30 1 (set (reg/f:SI 143)
        (symbol_ref:SI ("a") [flags 0x2] )) 43 {*movsi} (nil)
    (nil))

(insn 30 29 32 1 (set (mem/c/i:QI (reg/f:SI 143) [0 a+0 S1 A8])
        (subreg/s/u:QI (reg:SI 133 [ temp.28 ]) 0)) 56 {*movqi} (nil)
    (nil))

Now the problem is that the CSE pass doesn't identify that the source of the set in insn 27 is equivalent to the source of the set in insn 16. This seems to happen because of the zero_extend in insn 17. I am using a 4.1 toolchain. However, with a 3.4.6 toolchain no zero_extend gets generated and the result of the ior operation is immediately copied into memory. I am compiling this case with -O3. Can anybody please tell me how this problem can be overcome?

TIA,
Pranav
Re: CSE not combining equivalent expressions.
On 1/15/07, Richard Guenther <[EMAIL PROTECTED]> wrote:
> On 1/15/07, pranav bhandarkar <[EMAIL PROTECTED]> wrote:
> > [...]
> > Now the problem is that the CSE pass doesnt identify that the source
> > of the set in insn 27 is equivalent to the source of the set in insn
> > 16. This It seems happens because of the zero_extend in insn 17. I am
> > using a 4.1 toolchain. However with a 3.4.6 toolchain no zero_extend
> > gets generated and the result of the ior operation is immediately
> > copied into memory. I am compiling this case with -O3. Can anybody
> > please tell me how this problem can be overcome.
>
> CSE/FRE or VRP do not track bit operations and CSE of bits. To overcome
> this you need to implement such.

Thanks. Another question I have is whether, in this case, the following

http://gcc.gnu.org/wiki/Sign_Extension_Removal

will help in removing the sign / zero extension?

Thanks in Advance,
Pranav
Re: relevant files for target backends
> I am wondering where to define the prototypes for functions in .c
> Shall the prototypes be defined in -protos.h or in .h or in .c?
> As far as I understand the prototypes should be defined in
> -protos.h, right? But if I do so several errors/warnings arise
> because of undeclared prototypes.
>
> Another question is where target macros should be defined. As far as I
> can see .c has something like such a structure:
>
> ---snip---
> #define
> #define
> #define
> #define
>
> struct gcc_target targetm = TARGET_INITIALIZER;

.h is used to define macros that give such information as the register classes, whether the target is little endian or not, the sizes of integral types, etc. The file .c, as you rightly said, defines the targetm structure that holds pointers to target-related functions and data. Such functions are defined in the .c file, and the corresponding target hooks are #defined in the .c file as well.

HTH,
Pranav
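As a rough illustration of the "#define the hooks, then initialise targetm" idiom mentioned above, here is a small stand-alone mock. Every name in it (mock_target, MOCK_TARGET_*, myport_issue_rate) is hypothetical and invented for this sketch; it is not GCC's real struct gcc_target or its real headers, only the same macro pattern in miniature.

```c
/* Hypothetical stand-in for a table of target hooks. */
struct mock_target {
    int (*address_cost)(int);
    int (*issue_rate)(void);
};

/* Generic implementations, playing the role of the default hooks. */
static int default_address_cost(int x) { (void)x; return 1; }
static int default_issue_rate(void) { return 1; }

/* What a shared "defaults" header would provide (names are made up). */
#define MOCK_TARGET_ADDRESS_COST default_address_cost
#define MOCK_TARGET_ISSUE_RATE   default_issue_rate
#define MOCK_TARGET_INITIALIZER \
    { MOCK_TARGET_ADDRESS_COST, MOCK_TARGET_ISSUE_RATE }

/* What the port's own .c file would add: its hook function plus an
   override of the corresponding macro. */
static int myport_issue_rate(void) { return 2; }
#undef  MOCK_TARGET_ISSUE_RATE
#define MOCK_TARGET_ISSUE_RATE myport_issue_rate

/* The initializer macro is expanded here, so it picks up the override. */
struct mock_target mock_targetm = MOCK_TARGET_INITIALIZER;
```

Calling mock_targetm.issue_rate() goes to the port's function, while mock_targetm.address_cost() still reaches the default, because the initializer macro is only expanded at the point where the structure is defined.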
Re: relevant files for target backends
On 1/16/07, Markus Franke <[EMAIL PROTECTED]> wrote:
> Thank you for your response. I understood everything you said but I am
> still confused about the file -protos.h. Which prototypes have to be
> defined there?
>
> Thanks in advance,
> Markus Franke

IMO, and from what I have gathered from the documentation, the prototypes of all the functions in the source file, i.e. the .c file, should go into -protos.h.

HTH,
Pranav
Re: CSE not combining equivalent expressions.
> Also this is removed for the case of integers by the CSE pass IIRC.
> The problem arises only for the type being a char or a short.

Yes, that is true. With gcc 4.1 one of the 'or's gets eliminated for 'int'. I am putting below two sets of logs: the first just before cse_main, and the second just after cse_main has returned but before the trivially dead insns have been deleted.

Set 1: Before cse_main

(note 9 6 11 0 [bb 0] NOTE_INSN_BASIC_BLOCK)

(insn 11 9 12 0 (set (reg:SI 1 $c1)
        (const_int 0 [0x0])) 43 {*movsi} (nil)
    (nil))

(call_insn 12 11 13 0 (parallel [
            (set (reg:SI 1 $c1)
                (call (mem:SI (symbol_ref:SI ("gen_T") [flags 0x41] ) [0 S4 A32])
                    (const_int 0 [0x0])))
            (use (const_int 0 [0x0]))
            (clobber (reg:SI 31 $link))
        ]) 39 {*call_value_direct} (nil)
    (nil)
    (expr_list:REG_DEP_TRUE (use (reg:SI 1 $c1))
        (nil)))

(insn 13 12 15 0 (set (reg:SI 134 [ D.1214 ])
        (reg:SI 1 $c1)) 43 {*movsi} (nil)
    (nil))

(insn 15 13 16 0 (set (reg:SI 133 [ D.1216 ])
        (ior:SI (reg:SI 134 [ D.1214 ])
            (const_int 1 [0x1]))) 64 {iorsi3} (nil)
    (nil))

(insn 16 15 17 0 (set (reg/f:SI 136)
        (symbol_ref:SI ("a") [flags 0x2] )) 43 {*movsi} (nil)
    (nil))

(insn 17 16 19 0 (set (mem/c/i:SI (reg/f:SI 136) [2 a+0 S4 A32])
        (reg:SI 133 [ D.1216 ])) 43 {*movsi} (nil)
    (nil))

<<< Expansion of the If condition >>>

(note 23 21 25 1 [bb 1] NOTE_INSN_BASIC_BLOCK)

(insn 25 23 26 1 (set (reg/f:SI 139)
        (symbol_ref:SI ("a") [flags 0x2] )) 43 {*movsi} (nil)
    (nil))

(insn 26 25 27 1 (set (reg:SI 140)
        (ior:SI (reg:SI 133 [ D.1216 ])
            (const_int 1 [0x1]))) 64 {iorsi3} (nil)
    (nil))

(insn 27 26 29 1 (set (mem/c/i:SI (reg/f:SI 139) [2 a+0 S4 A32])
        (reg:SI 140)) 43 {*movsi} (nil)
    (nil))

(code_label 29 27 30 2 2 ("end") [1 uses])

.. to function end

Set 2: After cse_main

(note 9 6 11 0 [bb 0] NOTE_INSN_BASIC_BLOCK)

(insn 11 9 12 0 (set (reg:SI 1 $c1)
        (const_int 0 [0x0])) 43 {*movsi} (nil)
    (nil))

(call_insn 12 11 13 0 (parallel [
            (set (reg:SI 1 $c1)
                (call (mem:SI (symbol_ref:SI ("gen_T") [flags 0x41] ) [0 S4 A32])
                    (const_int 0 [0x0])))
            (use (const_int 0 [0x0]))
            (clobber (reg:SI 31 $link))
        ]) 39 {*call_value_direct} (nil)
    (nil)
    (expr_list:REG_DEP_TRUE (use (reg:SI 1 $c1))
        (nil)))

(insn 13 12 15 0 (set (reg:SI 134 [ D.1214 ])
        (reg:SI 1 $c1)) 43 {*movsi} (nil)
    (nil))

(insn 15 13 16 0 (set (reg:SI 133 [ D.1216 ])
        (ior:SI (reg:SI 134 [ D.1214 ])
            (const_int 1 [0x1]))) 64 {iorsi3} (nil)
    (nil))

(insn 16 15 17 0 (set (reg/f:SI 136)
        (symbol_ref:SI ("a") [flags 0x2] )) 43 {*movsi} (nil)
    (nil))

(insn 17 16 19 0 (set (mem/c/i:SI (reg/f:SI 136) [2 a+0 S4 A32])
        (reg:SI 133 [ D.1216 ])) 43 {*movsi} (nil)
    (nil))

<<< Expansion of the If condition >>>

(note 23 21 25 1 [bb 1] NOTE_INSN_BASIC_BLOCK)

(insn 25 23 26 1 (set (reg/f:SI 139)
        (reg/f:SI 136)) 43 {*movsi} (nil)
    (expr_list:REG_EQUAL (symbol_ref:SI ("a") [flags 0x2] )
        (nil)))

(insn 26 25 27 1 (set (reg:SI 140)
        (ior:SI (reg:SI 134 [ D.1214 ])
            (const_int 1 [0x1]))) 64 {iorsi3} (nil)
    (nil))

(insn 27 26 29 1 (set (mem/c/i:SI (reg/f:SI 136) [2 a+0 S4 A32])
        (reg:SI 133 [ D.1216 ])) 43 {*movsi} (nil)
    (nil))

(code_label 29 27 30 2 2 ("end") [1 uses])

.. to function end

Therefore, as I see it, cse_main has followed these steps:

1) Found that the source of the set in insn 26 is equivalent to the source of the set in insn 15 and replaced the source in insn 26 with that from insn 15.
2) Found the lhs of insn 15 to be equal to that of insn 26 and stored that instead in insn 27, thus making the result of insn 26 (reg 140) unused ever again (and insn 26 subsequently gets deleted).

However, for the case of char or short, zero / sign extends are generated after the 'ior' operations, and as a result the source of the second 'ior' is then not equal to the source of the first 'ior'.
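To make the char case concrete, the sketch below models an SImode 'ior' followed by a zero extension of the low byte as plain C. The function name or_then_extend is made up for illustration and the model is an assumption about what the RTL computes, not compiler code. It shows that applying the sequence a second time to its own result changes nothing, even though the operand of the second 'ior' is a different register from the operand of the first, which is why a purely syntactic comparison of the two sources does not match.

```c
#include <stdint.h>

/* Models one "a |= 1" on a char as described in the thread:
   an SImode ior, then a zero_extend of the QImode low part. */
uint32_t or_then_extend(uint32_t reg)
{
    uint32_t wide = reg | 1u;          /* (ior:SI reg (const_int 1)) */
    return (uint32_t)(uint8_t)wide;    /* (zero_extend:SI (subreg:QI wide 0)) */
}
```

For any input x, or_then_extend(or_then_extend(x)) equals or_then_extend(x): bit 0 is already set and the upper 24 bits are already zero after the first application.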
Re: CSE not combining equivalent expressions.
On 1/17/07, Mircea Namolaru <[EMAIL PROTECTED]> wrote:
> > Thanks. Another question I have is that, in this case, will the following
> >
> > http://gcc.gnu.org/wiki/Sign_Extension_Removal
> >
> > help in removal of the sign / zero extension ?
>
> First, it seems to me that in your case:
>
> (1) a = a | 1 /* a |= 1 */
> (2) a = a | 1 /* a |= 1 */
>
> the expressions "a | 1" in (1) and (2) are different as the "a" is
> not the same. So there is nothing to do for CSE.
>
> If the architecture has an instruction that does both the store and
> the zero extension, the zero extension instructions become redundant.
>
> The sign extension algorithm is supposed to catch such cases, but I
> suspect that in this simple case the regular combine is enough.
>
> Mircea

Thanks for the info. I went through the documentation you provided in see.c, which I must add is very comprehensive indeed, and realised that we need an instruction that does a zero extend before a store so that the extension instructions become redundant and can be removed.

Thank you,
Pranav
Re: CSE not combining equivalent expressions.
On 1/18/07, Richard Kenner <[EMAIL PROTECTED]> wrote:
> > I'm not immediately aware of too many cases where lowering the IL is
> > going to expose new opportunities to track and optimize nonzero/zero
> > bits stuff.
>
> Bitfield are the big one. If you have both bitfield and logical
> operations, you can often merge the logical operations with those used
> to retrieve and/or store the field. Things such as
>
> x.y |= 1;
>
> where Y is a bitfield and X non-BLKmode would be a large sequence of
> logical operations that can all be replaced by a single OR insn at the
> RTL level but presents no optimization opportunities at the tree level.

I have found another case where a zero / sign extend is inhibiting optimization:

extern char b;
extern char c;

int main()
{
  b <<= 1;
  b <<= 1;
  b <<= 1;
  c = b;
  return 0;
}

Here again a zero extend gets generated after every 'ashift', whereas we are interested only in the low-order 8 bits. However, when I try the same thing with int instead of char, i.e. when there is no need for extension, the operations get converted into a single b <<= 3 instead of 3 instructions.

regards,
Pranav
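A small sketch of why the intermediate extensions are not needed for the final 8-bit result. It uses uint8_t rather than plain char, an assumption made here so that the left shifts stay well defined in C; the function names are hypothetical.

```c
#include <stdint.h>

/* Three separate shifts, truncating to 8 bits after each one, which is
   roughly what an extension after every ashift amounts to. */
uint8_t shift_stepwise(uint8_t b)
{
    b = (uint8_t)(b << 1);
    b = (uint8_t)(b << 1);
    b = (uint8_t)(b << 1);
    return b;
}

/* One combined shift with a single truncation at the end. */
uint8_t shift_combined(uint8_t b)
{
    return (uint8_t)(b << 3);
}
```

Both functions agree for every 8-bit input, because bits shifted above bit 7 can never come back down into the low byte.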
Use of INSN_CODE
Hi All,

I am using recog_memoized in the machine-dependent reorg pass. However, it is causing an ICE because a CODE_LABEL is unwittingly getting passed to it. I understand that CODE_LABEL is in the RTX_EXTRA class, and intuitively it is wrong to use INSN_CODE (which is used in recog_memoized) on a CODE_LABEL simply because it is not in the RTX_INSN class. However, the internals manual only warns against using INSN_CODE on use, clobber, asm_input, addr_vec and addr_diff_vec. There is no mention of the other members of RTX_EXTRA. Or shouldn't recog_memoized have an INSN_P check in it? Am I missing something here?

TIA,
Pranav
A combiner type optimization is causing a failure
Hello all,

I added a small optimization which converts

a = a + 1
if ( a > 0 )

to

if ( a > -1 )

where a is a signed int. However, this is causing 920612-1.c to fail, which is reproduced below for convenience.

f(j)int j;{return++j>0;}
main(){
  if(f((~0U)>>1)) abort();
  exit(0);
}

The problem is that this testcase passes the number 2147483647 (int is 4 bytes on my architecture), and adding 1 to it overflows. Since my optimization removes the increment operation in 'f', 2147483647 never gets incremented, 'f' always returns 1 and the testcase fails.

My question is this: IMO the test is checking overflow behaviour. Is it right to have such a test?

Regards,
Pranav
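The difference between the two forms can be shown without relying on signed overflow at all. The sketch below emulates the wrapping increment with unsigned arithmetic; this emulation and both function names are assumptions made for illustration, not what the testcase or the compiler literally does.

```c
#include <limits.h>

/* "++j > 0" under two's-complement wraparound, emulated with unsigned
   arithmetic so that the sketch itself has no undefined behaviour. */
int incr_then_test_wrapping(int j)
{
    unsigned int u = (unsigned int)j + 1u;
    /* As a signed value, u is positive only in the range 1..INT_MAX. */
    return u != 0u && u <= (unsigned int)INT_MAX;
}

/* The transformed test, with the increment removed. */
int test_without_incr(int j)
{
    return j > -1;
}
```

The two functions agree for every input except INT_MAX, where the wrapping form yields 0 and the transformed form yields 1. That single input is exactly what 920612-1.c passes in.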
Re: A combiner type optimization is causing a failure
On 2/22/07, Paolo Bonzini <[EMAIL PROTECTED]> wrote:
> > My question is that, IMO the test is checking overflow behaviour. Is
> > it right to have such a test ?
>
> Would you care to prepare a patch that moved it under gcc.dg, adding a
>
> { dg-options "-O2 -fno-strict-overflow" }
>
> marker (or maybe "-O2 -fno-wrapv")?
>
> But your optimization should also be conditional on whether strict
> overflow behavior is requested.
>
> Paolo

Thanks, I will prepare a patch. A 'strict overflow behaviour' check also makes sense.

Thank you,
Pranav
Re: Question about Temporary Outputs
On Thu, Mar 31, 2011 at 4:04 PM, Iyer, Balaji V wrote:
> Hello Everyone,
> I see in GCC that when we use the flag "-fdump-tree-optimized" it
> will dump the contents of the input file after doing all the tree-based
> optimization. Is it possible for me to modify this file and then submit it
> back into gcc for processing to create an executable/assembly dump?

It has been a while since I have worked on GCC. Back then (a couple of years ago) this was not possible, and I have no reason to believe that this has changed.

Pranav
COMPONENT_REF problem ?
Hi,

Is it possible for a component_ref node to have its arg 0 be NULL? I would think not, because from tree.def I gather that arg 0 tells me which structure's field this component_ref refers to. For convenience, I have pasted here what tree.def tells me about a component_ref:

/* Value is structure or union component.
   Operand 0 is the structure or union (an expression).
   Operand 1 is the field (a node of type FIELD_DECL).
   Operand 2, if present, is the value of DECL_FIELD_OFFSET, measured
   in units of DECL_OFFSET_ALIGN / BITS_PER_UNIT. */
DEFTREECODE (COMPONENT_REF, "component_ref", tcc_reference, 3)

I am working on a target hook wherein I use the MEM_EXPR of a mem rtx. It returns the following component_ref node:

unit size align 32 symtab 0 alias set -1 canonical type 0x2b359976d6c0 precision 32 min max pointer_to_this >
arg 1 used nonlocal decl_3 SI file ../src/synth_core/QDSP6/svrreg.h line 134 col 11 size unit size align 32 offset_align 64 offset bit offset context
chain used nonlocal decl_3 SI file ../src/synth_core/QDSP6/svrreg.h line 135 col 11 size unit size align 32 offset_align 64 offset bit offset context chain >>>

Following this, if I do

exp = TREE_OPERAND (comp_ref_node, 0);

I get "exp" as NULL. Is this possible? Is there something I am doing wrong? Or is there something fishy with the tree node that MEM_EXPR is giving me?

Thanks,
Pranav
Re: COMPONENT_REF problem ?
Richard,

> If you are not working on trunk this can happen because the way
> MEM_EXPRs are "canonicalized".

Thanks. Yes, I am not on trunk and may not be able to move right away. I would appreciate some pointers about where I should look if I want to fix this.

Thanks,
Pranav
Re: COMPONENT_REF problem ?
> Look at
>
> 2009-07-14  Richard Guenther
>             Andrey Belevantsev
>
>         * tree-ssa-alias.h (refs_may_alias_p_1): Declare.
>         (pt_solution_set): Likewise.
>         * tree-ssa-alias.c (refs_may_alias_p_1): Export.
>         * tree-ssa-structalias.c (pt_solution_set): New function.
>         * final.c (rest_of_clean_state): Free SSA data structures.
>         ...
>         * emit-rtl.c (component_ref_for_mem_expr): Remove.
>         (mem_expr_equal_p): Use operand_equal_p.
>         (set_mem_attributes_minus_bitpos): Do not use
>         component_ref_for_mem_expr.
>         ...
>
> this change.
>
> Richard.

Great. Thanks.

Pranav
Dead Store Elimination
Hi,

A possibly silly question about the dead store elimination pass. From the documentation it is clear that the store S1 below is removed by this pass (in dse.c):

*(addr) = value1;   // S1
...
*(addr) = value2;   // S2; no read of "addr" between S1 and S2
...
... = *(addr);      // Load
...
end_of_the_function

However, consider a different example.

*(addr) = value1;   // S1
...
end_of_the_function

i.e. there is no store Sn that follows S1 along any path from S1 to the end of the function, and there is no read of addr following S1 either. Is the dse pass expected to remove such stores? (I am inclined to think that it should, but I am seeing a case where dse doesn't remove such stores.) Further, is the behaviour expected to be different if "addr" is based on "fp"?

TIA,
Pranav
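The two situations in the question can be written as compilable C. This is only an illustration of the store patterns, with hypothetical function names; it makes no claim about which of them any particular GCC version's dse pass actually removes.

```c
/* Case 1: S1 is overwritten by S2 with no read in between. */
int overwritten_store(int *addr, int value1, int value2)
{
    *addr = value1;     /* S1: never observed */
    *addr = value2;     /* S2 */
    return *addr;       /* Load: sees the value stored by S2 */
}

/* Case 2: a store to a frame-local slot that is neither overwritten
   nor read again before the function ends. */
int store_to_dying_local(int value1)
{
    int slot[4];        /* lives in this function's stack frame */
    slot[0] = value1;   /* S1: the frame goes away at return */
    return value1 + 1;
}
```

In both functions the observable result is the same with or without S1, which is what makes S1 a candidate for removal. The second case relies on the stored-to memory being local to the frame and not escaping.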
Re: Dead Store Elimination
> Are you talking about the tree dead-store elimination pass or
> the RTL one? Basically *addr = value1; cannot be removed
> if addr does not point to local memory or if the pointed-to
> memory escapes through a call-site that is dominated by this store.

I am talking about the RTL dead-store elimination. In my case addr is based on the stack pointer and the store is to a local variable on the stack.

Thanks,
Pranav
Re: implementing load 8 byte instruction
On Thu, Mar 18, 2010 at 10:29 AM, roy rosen wrote:
> Hi,
>
> I am trying to implement a simple load 8 bytes instruction.
> I tried to use movdi so that it would allocate two sequential
> registers for the load.
> It starts well but in pass subreg1 the insns are decomposed and all DI
> operands are replaced with SI.
>
> I understand that this is a desirable optimization but then the load
> is done using two load 4 bytes instructions.
>
> Does anybody have any idea what I should do?

Could it be a problem with the constraints in your movdi define_insn?

Pranav
Unable to access archives on gcc-patches
Hi,

I was having trouble building gcc and found that the problem I was having had been reported earlier, and that a patch to fix it had been submitted in Feb 2004. When I try to access the mailing list archives, I am able to reach the index page for Feb 2004, the link to which is:

http://gcc.gnu.org/ml/gcc-patches/2004-02/

However, the patch that I am looking for is not accessible. The link to the patch is

http://gcc.gnu.org/ml/gcc-patches/2004-02/msg00826.html

Just to check, I tried accessing other patches from the index link but could not access most of them. Can anybody please help me out?

Thanks in advance,
Pranav

--
"So far as I am able to judge, nothing has been left undone, either by man or nature, to make India the most extraordinary country that the sun visits on his rounds. Nothing seems to have been forgotten, nothing overlooked."
Mark Twain
The effects of closed loop SSA and Scalar Evolution Const Prop.
Hi,

I am writing about a problem I noticed with the code generated for memcpy by arm-none-eabi-gcc. Now, memcpy has three distinct loops: one that copies (4 * sizeof (long)) bytes per iteration, one that copies sizeof (long) bytes per iteration, and a last one that copies one byte per iteration. The registers used for the src and destination pointers should, IMHO, be the same across all three loops. However, what I noticed is that after the first loop the src and dest registers aren't reused; instead, the address of the next byte to be copied is recalculated as (original src address + (number of iterations of 1st loop * 16)), and similarly for the destination address. See the assembly snippet below.

.L3:
        bhi     .L3                     @,
        sub     r2, r4, #16             @ D.1312, len0,
        mov     r3, r2, lsr #4          @ D.1314, D.1312,
        sub     r1, r2, r3, asl #4      @ len, D.1312, D.1314,
        add     r3, r3, #1              @ tmp225, D.1314,
        mov     r3, r3, asl #4          @ D.1320, tmp225,
        cmp     r1, #3                  @ len,
        add     r4, r5, r3              @ aligned_src.60, src0, D.1320
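For reference, the three-loop structure described above looks roughly like the following sketch. It is not the actual library source: the name sketch_memcpy is hypothetical, and the blocks are copied byte by byte here (an assumption made to keep the example free of alignment and aliasing concerns). The point is only that src and dst are carried from one loop into the next rather than being recomputed.

```c
#include <stddef.h>

void *sketch_memcpy(void *dst0, const void *src0, size_t len)
{
    const size_t word = sizeof(long);
    const size_t big = 4 * sizeof(long);
    unsigned char *dst = dst0;
    const unsigned char *src = src0;

    /* Loop 1: one big block (4 words) per iteration. */
    while (len >= big) {
        for (size_t i = 0; i < big; i++)
            dst[i] = src[i];
        src += big; dst += big; len -= big;
    }

    /* Loop 2: one word per iteration; src and dst simply continue
       from where loop 1 left them. */
    while (len >= word) {
        for (size_t i = 0; i < word; i++)
            dst[i] = src[i];
        src += word; dst += word; len -= word;
    }

    /* Loop 3: the remaining bytes, one at a time. */
    while (len > 0) {
        *dst++ = *src++;
        len--;
    }
    return dst0;
}
```

After loop 1 the pointer values are already the addresses loop 2 needs, so in principle no recomputation from the original base and the iteration count is required.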
Problem with Fix for PR 35163 ?
Hi,

Consider the attached testcase. I am working on a private port (in fact I see this problem on arm-none-eabi-gcc too). I see the following in test.c.003t.original:

fail = (short int) usi <= ssi;

And then in test.c.025t.ssa:

usi.2_5 = (short int) usi_4;
fail.3_6 = usi.2_5 <= ssi_2;

Now ccp1 does constant propagation and we are left with

usi.2_5 = -256;

This causes the test to fail. Clearly the problem seems to be that, since usi is an unsigned short int, a short int can't represent all the possible values of usi.

I reverted the following patch and the test passed.

PR middle-end/35163
* fold-const.c (fold_widened_comparison): Use get_unwidened in
value-preserving mode. Disallow final truncation.

Now with the patch reverted, test.c.003t.original has

fail = (int) ssi >= (int) usi;

and this problem vanished. Am I missing something here?

Thanks,
Pranav

#include <stdio.h>

int fail;

short fs2(void) { return 126; }
unsigned short ufs1(void) { return 65280; }

int main ()
{
  short ssi;
  unsigned short usi;

  ssi = fs2();
  usi = ufs1();
  fail = !(ssi < usi);
  if (fail)
    printf ("Failed\n");
  else
    printf ("Successful\n");
  return 0;
}
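The two folded forms quoted from the dumps can be compared directly in C. The function names below are hypothetical. The narrowed version assumes a 16-bit short and the usual modulo behaviour when converting an out-of-range value to a signed type, which is implementation-defined in C.

```c
/* The comparison as the source means it: both operands promoted to int. */
int fail_promoted(short ssi, unsigned short usi)
{
    return !((int)ssi < (int)usi);
}

/* The comparison as folded in test.c.003t.original: usi is truncated
   to short first, so 65280 becomes -256 under the assumptions above. */
int fail_narrowed(short ssi, unsigned short usi)
{
    return (short)usi <= ssi;
}
```

With ssi = 126 and usi = 65280 the promoted form gives 0 (126 < 65280 holds), while the narrowed form gives 1 because -256 <= 126.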
Re: Problem with Fix for PR 35163 ?
> Btw, I have a fix.

Oh, great. I just saw your post on gcc-patches. Do you still want me to add this to PR35163 for the record?

Cheers!
Pranav
Common Subexpression Elimination Opportunity not being exploited
Hi,

I have a case where the code looks roughly like

foo = i1 <op> i2;
if (test1)
  bar1 = i1 <op> i2;
if (test2)
  bar2 = i1 <op> i2;

This can get converted into

reg = i1 <op> i2
foo = reg
if (test1)
  bar1 = reg
if (test2)
  bar2 = reg

GCC 4.3 does fine here except when the operator is "logical and" (see attached: test.c uses logical and, and test1.c uses plus). It seems that shortcut evaluation of the "logical and" throws off common subexpression elimination, i.e. the code generated is like

foo = (i1 != 0) && (i2 != 0)
if (test1)
  {
    if (i1 != 0) then lab2: else lab 3:
    lab2: if (i2 != 0) then lab 4
    lab 3: temp = 0;
    lab 4 : temp = 1
    bar1 = temp
  }
if (test2)
  {
    same as the body of the above if condition.
  }

The entire body of the two if stmts can be pulled out, but it isn't. This kind of cse works when the operator is an arithmetic operator like +. Am I missing something here? Is there something I can do to improve the case with the "logical and"? Maybe this is difficult on hardware that uses the CC register? But I am working on a private port that doesn't use the CC register.

Thanks,
Pranav

test.c:

short ione;
short itwo;
short ithree;
short ifour;
short ifive;
int one_array[50];
short a[50];

void foo ()
{
  ione = gen_short(1);
  itwo = gen_short(1);
  ithree = gen_short(1);
  ifour = gen_short(1);
  ifive = gen_short(1);
  a[0] = ione && itwo && ithree && ifour && ifive;
  if (one_array[1])
    a[1] = ione && itwo && ithree && ifour && ifive;
  if (one_array[2])
    a[2] = ione && itwo && ithree && ifour && ifive;
  return;
}

test1.c:

short ione;
short itwo;
short ithree;
short ifour;
short ifive;
int one_array[50];
short a[50];

void foo ()
{
  ione = gen_short(1);
  itwo = gen_short(1);
  ithree = gen_short(1);
  ifour = gen_short(1);
  ifive = gen_short(1);
  a[0] = ione + itwo + ithree + ifour + ifive;
  if (one_array[1])
    a[1] = ione + itwo + ithree + ifour + ifive;
  if (one_array[2])
    a[2] = ione + itwo + ithree + ifour + ifive;
  return;
}
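Until the compiler does this by itself, the transformation asked for can be applied by hand in the source. The sketch below uses a reduced two-operand version of the && chain; the function names and the parameterised arrays are made up for illustration.

```c
/* As written in the testcase: the && chain is re-evaluated in each branch. */
void fill_naive(short a[3], const int cond[3], short i1, short i2)
{
    a[0] = i1 && i2;
    if (cond[1])
        a[1] = i1 && i2;
    if (cond[2])
        a[2] = i1 && i2;
}

/* Hand-hoisted form: the && chain is evaluated once and reused. */
void fill_hoisted(short a[3], const int cond[3], short i1, short i2)
{
    const short both = (short)(i1 && i2);

    a[0] = both;
    if (cond[1])
        a[1] = both;
    if (cond[2])
        a[2] = both;
}
```

Since the operands have no side effects and are not modified between the uses, the short-circuit evaluation does not change the value, so the two functions store identical results.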
Re: Pipeline hazards and delay slots
Hi Mohammed,

> But how can i handle instances like this? Should i be doing insertion
> of nops in reorg pass?

FWIW, I worked on a port for a VLIW processor about three years back, and IIRC we used the reorg pass for inserting the nops. I think if you look at the scheduler dumps you will notice that the scheduler has, in all likelihood, accounted for the delay of 1 cycle between the "lw" and the "add" instructions. It is only that you will have to put the "nop" between these two instructions yourself.

cheers!
Pranav
Re: How to use gcc source and try new optmization techniques
Hi,

I may not have correctly understood your questions, but from what I understand you mean to ask how you could easily plug your optimization pass into GCC so as to test your implementation of some optimization. The way to do that would be to understand the pass structure and decide where in the order of passes your pass should be inserted, i.e. after which and before which other pass your optimization pass should fit in. Look at passes.c to see how the order of passes is specified. Once you have told the compiler when to execute your pass (primarily through passes.c), and provided your optimization has been correctly implemented in the context of GCC, you should be good to go.

HTH,
Pranav

On Wed, Aug 20, 2008 at 7:45 PM, Rohan Sreeram <[EMAIL PROTECTED]> wrote:
> Hi,
>
> I am a student in Utah State University researching on compiler optimization
> techniques.
> I wanted to know how I could use gcc for experimenting with optimization.
>
> Here is what I intend to do:
>
> 1) Understand the control flow graphs being generated by GCC, which I could
> build using the -fdump-tree-cfg option.
> 2) Write code that could convert CFG to a different graph.
> 3) Use this new graph for optimization.
> 4) Use the optimized graph to generate machine code (say Intel architecture
> for now).
>
> Please let me know how I could leverage the GCC code for this purpose, and if
> you have any suggestions/comments for me you are most welcome ! :)
>
> Thanks,
> Rohan
Re: Extending RTL expansion and CG with a new operation
> kerneltest.c:22: error: unrecognizable insn:
> (jump_insn 26 25 29 3 (set (pc)
>         (create_body_after (cre (reg:DI 75)
>                 (const_int 0 [0x0]))
>             (label_ref 13)
>             (pc))) -1 (nil)
>     (nil))
> kerneltest.c:22: internal compiler error: in extract_insn, at recog.c:2096
> Please submit a full bug report,
> with preprocessed source if appropriate.
> See <http://gcc.gnu.org/bugs.html> for instructions.

Find a pattern that is close to this one in your md file (i.e. the define_insn that you think should match this pattern, assuming such a define_insn exists in your md file) and then try to check the reason for recog crashing. IMO, the predicate may not be passing.

regards,
Pranav
Re: Execute test fails in gcc testsuite
On 7/18/07, Venkatesan Jeevanandam <[EMAIL PROTECTED]> wrote:
> I am working on the testsuite for a new crosscompiler hosted on x86
> Platform. While performing execute test using gcc testsuite, I am
> getting the error message in execute test
>
> /tmp/2112-1.x0: /tmp/2112-1.x0: cannot execute binary file
>
> I know, have to use cross compiler simulator, for executing
>
> $Arch-sim /tmp/2112-1.x0
>
> But I don't know where to mention this configuration.

Since you haven't mentioned anything about your .exp file in dejagnu/baseboards, I am assuming you haven't already written one. You will therefore need to write a .exp file (e.g. arm-sim.exp) and put it in the dejagnu/baseboards directory. Then, while doing make check-gcc, pass it using the RUNTESTFLAGS command line argument. Typically,

$> make check-gcc RUNTESTFLAGS="--target_board=-sim"

HTH,
Pranav
CSE removing a load that is necessary
Hi All,

I am working on a private port and am seeing the following problem. For a function returning a double, the value is stored by the function in memory. cse removes one of the two loads (to retrieve this returned value) after the function is called. To elaborate, the following is the dump just before cse.

(insn 44 43 45 2 test.c:388 (set (reg:SI 1 $c1)
        (reg/f:SI 112 *fp*)) 44 {*movsi} (expr_list:REG_LIBCALL_ID (const_int 2 [0x2])
        (nil)))

(insn 45 44 46 2 test.c:388 (set (reg:SI 2 $c2)
        (reg:SI 136 [ D.1517 ])) 44 {*movsi} (expr_list:REG_LIBCALL_ID (const_int 2 [0x2])
        (nil)))

(call_insn 46 45 49 2 test.c:388 (parallel [
            (call (mem:SI (symbol_ref:SI ("__floatunsidf") [flags 0x41]) [0 S4 A32])
                (const_int 0 [0x0]))
            (use (const_int 0 [0x0]))
            (clobber (reg:SI 31 $link))
        ]) 41 {*call_direct} (expr_list:REG_LIBCALL_ID (const_int 2 [0x2])
        (expr_list:REG_EH_REGION (const_int -1 [0x])
            (nil)))
    (expr_list:REG_DEP_TRUE (use (reg:SI 2 $c2))
        (expr_list:REG_DEP_TRUE (use (reg:SI 1 $c1))
            (nil))))

(insn 49 46 116 2 test.c:388 (clobber (reg:SI 179)) -1 (expr_list:REG_LIBCALL_ID (const_int 2 [0x2])
        (nil)))

(insn 116 49 47 2 test.c:388 (clobber (reg:SI 180 [+4 ])) -1 (nil))

(insn 47 116 48 2 test.c:388 (set (reg:SI 179)
        (mem/c/i:SI (reg/f:SI 112 *fp*) [7 S4 A32])) 44 {*movsi} (expr_list:REG_LIBCALL_ID (const_int 2 [0x2])
        (nil)))

(insn 48 47 50 2 test.c:388 (set (reg:SI 180 [+4 ])
        (mem/c/i:SI (plus:SI (reg/f:SI 112 *fp*)
                (const_int 4 [0x4])) [7 S4 A32])) 44 {*movsi} (expr_list:REG_LIBCALL_ID (const_int 2 [0x2])
        (expr_list:REG_EQUAL (float:DF (reg:SI 136 [ D.1517 ]))
            (nil))))

cse modifies insn 48 as

(insn 48 47 50 2 test.c:388 (set (reg:SI 180 [+4 ])
        (reg:SI 178 [+4 ])) 44 {*movsi} (expr_list:REG_LIBCALL_ID (const_int 2 [0x2])
        (expr_list:REG_EQUAL (float:DF (reg:SI 136 [ D.1517 ]))
            (nil))))

and also replaces every subsequent use of (reg:SI 180 [+4 ]) with (reg:SI 178 [+4 ]), thus making the above load dead, which subsequently gets removed. This way the result of the function call is lost.

My take is that insn 48 should have a REG_RETVAL note (in fact it does have one, but the note is removed by lower_subreg) and that cse should be careful when REG_RETVAL and REG_EQUAL appear in the same insn. Is this the right way of going about it?

Sorry for a rather verbose post.

Thanks in advance,
Pranav
Re: CSE removing a load that is necessary
> Where does reg 178 come from? It does not appear in the other insns
> you listed.

I am sorry, my mistake. I meant to say that the dump was only a part of the entire dump of the function. reg 178 is the result of a previous call to __floatsidf and is defined by the following insn.

(insn 19 18 21 2 test.c:356 (set (reg:SI 178 [+4 ])
        (mem/c/i:SI (plus:SI (reg/f:SI 112 *fp*)
                (const_int 4 [0x4])) [7 S4 A32])) 44 {*movsi} (expr_list:REG_LIBCALL_ID (const_int 1 [0x1])
        (expr_list:REG_EQUAL (float:DF (reg:SI 136 [ D.1517 ]))
            (nil))))

> Am I reading your code correctly when it appears that the
> __floatunsidf function returns a value in memory rather than via a
> register?

For our architecture, we have had to follow this convention of returning a double value by storing it in memory and loading it after the function call. The ABI reserves only one SI mode register for return values, hence the use of memory for returning a double.

> If lower-subreg split up the load from memory, then it was correct to
> remove the REG_RETVAL note. There may be a bug here in that it should
> also remove the REG_EQUAL note in that case. It may be that
> remove_retval_note needs to look for and remove a REG_EQUAL note.

OK, so the approach should be to fix remove_retval_note so that it removes the REG_EQUAL note too, rather than not calling remove_retval_note at all. I will submit a patch for comments.

Thanks,
Pranav
Re: CSE removing a load that is necessary
Hi,

> Or perhaps this could be another manifestation of the "cse gets confused by
> reg_equal notes on subparts of dimode pseudos if no movdi pattern is defined
> in the backend" bug[*]? Pranav, is there a movdi pattern in your backend?
> There needs to be one, gcc does get it wrong if you rely on it to break
> everything down to si-sized movs.

Yes, it looks like a similar problem, but there seems to be no consensus on a correct solution to it. I couldn't find the bug number, but this thread describes the exact same problem (but with REG_EQUIV notes). http://gcc.gnu.org/ml/gcc/2001-02/msg01372.html

Thanks, Pranav
ICE on valid code, cse related
Hi, I am working on a private port and getting an ICE in valid code. This mainly is because of the following ( which is a part of the entire dump of RTL of the source file) (insn 13 8 14 2 /fc3/testcases/reduce/testcase-min.i:8 (set (reg:SI 138) (const_int 0 [0x0])) 44 {*movsi} (expr_list:REG_LIBCALL_ID (const_int 0 [0x0]) (nil))) (insn 14 13 15 2 /fc3/testcases/reduce/testcase-min.i:8 (set (reg:SI 1 $c1) (reg/f:SI 112 *fp*)) 44 {*movsi} (expr_list:REG_LIBCALL_ID (const_int 1 [0x1]) (insn_list:REG_LIBCALL 17 (nil (insn 15 14 16 2 /fc3/testcases/reduce/testcase-min.i:8 (set (reg:SI 2 $c2) (reg:SI 138)) 44 {*movsi} (expr_list:REG_LIBCALL_ID (const_int 1 [0x1]) (nil))) (call_insn 16 15 18 2 /fc3/testcases/reduce/testcase-min.i:8 (parallel [ (call (mem:SI (symbol_ref:SI ("__floatsisf") [flags 0x41]) [0 S4 A32]) (const_int 0 [0x0])) (use (const_int 0 [0x0])) (clobber (reg:SI 31 $link)) ]) 41 {*call_direct} (expr_list:REG_LIBCALL_ID (const_int 1 [0x1]) (expr_list:REG_EH_REGION (const_int -1 [0x]) (nil))) (expr_list:REG_DEP_TRUE (use (reg:SI 2 $c2)) (expr_list:REG_DEP_TRUE (use (reg:SI 1 $c1)) (nil (insn 18 16 17 2 /fc3/testcases/reduce/testcase-min.i:8 (clobber (reg:SF 139)) -1 (expr_list:REG_LIBCALL_ID (const_int 1 [0x1]) (nil))) (insn 17 18 19 2 /fc3/testcases/reduce/testcase-min.i:8 (set (subreg:SI (reg:SF 139) 0) (mem/c/i:SI (reg/f:SI 112 *fp*) [2 S4 A32])) 44 {*movsi} (expr_list:REG_LIBCALL_ID (const_int 1 [0x1]) (insn_list:REG_RETVAL 14 (expr_list:REG_EQUAL (float:SF (reg:SI 138)) (nil Note the REG_EQUAL note of insn 17. 
cse tries to replace reg:SI 138 with a constant, and because of insn 13 the note becomes (float:SF (const_int 0)), which cse in turn converts into REG_EQUAL (const_double:SF 0 [0x0] 0.0 [0x0.0p+0]). When CONST_DOUBLE_LOW is applied to the above, the compiler crashes with

"internal compiler error: RTL check: expected code 'const_double' and mode 'VOID', have code 'const_double' and mode 'SF' in plus_constant, at explow.c:103"

i.e. the compiler crashes after converting a const_int to an SFmode value. Could this be a generic issue, or is it a problem with my backend (as in, will I need to define movsf in my backend, which isn't defined at present)? Sorry for the rather verbose post. Thanks in advance, Pranav
Re: ICE on valid code, cse related
> Who is calling CONST_DOUBLE_LOW on this value?

plus_constant calls CONST_DOUBLE_LOW on this value. simplify_binary_operation_1 calls plus_constant (while trying to simplify a PLUS of (const_double:SF 0 [0x0] 0.0 [0x0.0p+0]) and (const_int -2147483648 [0x8000])), which in turn calls CONST_DOUBLE_LOW.

Thanks, Pranav
Re: ICE on valid code, cse related
> How can we have a PLUS on a CONST_DOUBLE and a CONST_INT? That does
> not make sense, as there is no MODE argument that could make this work
> correctly. From your description, MODE must be some integer mode, in
> which case it is wrong to be using a CONST_DOUBLE in SFmode.
>
> (I don't know where the bug is; I'm just trying to help pin it down.)

Here it is! The problem is in the following insn:

(insn 20 19 21 2 (set (reg:SI 141) (xor:SI (subreg:SI (reg:SF 139) 0) (reg:SI 140))) 65 {xorsi3} (expr_list:REG_EQUAL (const_double:SF 0 [0x0] -0.0 [-0x0.0p+0]) (nil)))

reg:SI 140 is known to have the constant value (const_int -2147483648 [0x8000]) and (subreg:SI (reg:SF 139) 0) is known to have the value (const_double:SF 0 [0x0] 0.0 [0x0.0p+0]) [as described in my previous post in this thread]. Now, simplify_binary_operation_1 in simplify-rtx.c tries to

/* Canonicalize XOR of the most significant bit to PLUS. */ (simplify-rtx.c:2203)

and this results in a PLUS of a CONST_INT and a CONST_DOUBLE. Maybe there should be a better check before canonicalizing here?

Thanks, Pranav
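For readers wondering why an XOR with the most significant bit can be rewritten as a PLUS at all, here is a small sketch in plain C. It assumes 32-bit unsigned arithmetic and a 32-bit IEEE-754 `float`; the function names are made up for illustration and do not come from simplify-rtx.c. For integers the two operations agree because the carry out of bit 31 is discarded. On the bit pattern of a float, the same XOR is a sign flip, which is what the REG_EQUAL note of -0.0 on insn 20 describes.

```c
#include <stdint.h>
#include <string.h>

/* Flip the most significant bit with XOR.  */
uint32_t
flip_msb_xor (uint32_t x)
{
  return x ^ UINT32_C (0x80000000);
}

/* Same result via addition: the carry out of bit 31 is discarded in
   32-bit modular arithmetic.  */
uint32_t
flip_msb_plus (uint32_t x)
{
  return x + UINT32_C (0x80000000);
}

/* Reinterpret the bits of a float as an integer, roughly what
   (subreg:SI (reg:SF ...) 0) expresses.  Assumes a 32-bit IEEE float.  */
uint32_t
float_bits (float f)
{
  uint32_t u;
  memcpy (&u, &f, sizeof u);
  return u;
}
```

The equivalence only holds when both operands are integers of the same width, which is why a CONST_DOUBLE in SFmode reaching this path is a problem.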
Re: ICE on valid code, cse related
> reg:SI 140 is known to have the constant value
> (const_int -2147483648 [0x8000]))

Maybe I wasn't clear in my previous post. reg:SI 140 is known to have this const_int value from a previous copy into it, here:

(insn 19 17 20 2 /fc3/scratchpad/testcase-min.i:8 (set (reg:SI 140) (const_int -2147483648 [0x8000])) 44 {*movsi} (nil))

Thanks, Pranav
Re: ICE on valid code, cse related
> (reg:SF 139) can hold the value (const_double:SF 0) but (subreg:SI
> (reg:SF 139)) should be the value (const_int 0). Perhaps the problem
> is how we handle a REG_EQUAL note when the destination of the set is a
> SUBREG.

So (subreg:SI (reg:SF 139) 0) shouldn't be able to hold the value (float:SF (reg:SI 138)). Am I right on this? I ask because the expander generates the following while storing the return value:

(insn 17 18 19 testcase-min.i:8 (set (subreg:SI (reg:SF 139) 0) (mem/c/i:SI (reg/f:SI 129 virtual-stack-vars) [2 S4 A32])) -1 (expr_list:REG_LIBCALL_ID (const_int 1 [0x1]) (insn_list:REG_RETVAL 14 (expr_list:REG_EQUAL (float:SF (reg:SI 138)) (nil)

The answer to this question will help me decide where to fix the problem: in the expander itself, or while processing REG_EQUAL in the cse pass.

Thanks, Pranav
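The distinction in question is between converting an integer to a float value and viewing a float's storage as an integer. A small illustration in plain C, assuming a 32-bit IEEE-754 `float` (the function names are invented for this example): the two views happen to coincide for 0, which is the case in this testcase, but not in general.

```c
#include <stdint.h>
#include <string.h>

/* Value conversion, the analogue of (float:SF (reg:SI n)).  */
float
convert_int_to_float (int32_t n)
{
  return (float) n;
}

/* Bit reinterpretation, the analogue of viewing an SFmode register
   through (subreg:SI ... 0).  Assumes a 32-bit IEEE-754 float.  */
int32_t
view_float_as_int (float f)
{
  int32_t bits;
  memcpy (&bits, &f, sizeof bits);
  return bits;
}
```

For n = 1 the converted float has the bit pattern 0x3F800000, so an SImode view of the register does not hold the SFmode value that the note describes.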
Re: ICE on valid code, cse related
Hi,

> Pranav, although there is indeed a bug in the mid-end here, from your point
> of view the simple and effective workaround should be to implement a movdi
> pattern (and movsf and movdf if you don't have them yet: it's an absolute
> requirement to implement movMM for any modes you expect your machine to
> handle) in the backend. This won't fix the underlying bug, but it'll stop it
> from affecting you, and you'll get better codegen all round into the bargain
> if you expand movdi early.

It worked! I implemented the movsf pattern (and also movdf, so that its absence doesn't affect me in the future). With the movsf pattern, the return value is now restored with

(insn 17 16 18 testcase-min.i:8 (set (reg:SF 139) (mem/c/i:SF (reg/f:SI 129 virtual-stack-vars) [2 S4 A32])) -1 (expr_list:REG_LIBCALL_ID (const_int 1 [0x1]) (insn_list:REG_RETVAL 14 (expr_list:REG_EQUAL (float:SF (reg:SI 138)) (nil)

i.e. there is no subreg in the destination. Later, in cse, when the above REG_EQUAL (float:SF (reg:SI 138)) note is converted into REG_EQUAL (const_double:SF 0 [0x0] 0.0 [0x0.0p+0]), it doesn't replace (subreg:SI (reg:SF 139) 0) in the insn

(set (reg:SI 141) (xor:SI (subreg:SI (reg:SF 139) 0) (reg:SI 140))) 65 {xorsi3} (expr_list:REG_EQUAL (const_double:SF 0 [0x0] -0.0 [-0x0.0p+0]) (nil)))

and the compiler doesn't crash :) Thanks Dave and Ian for your help!

cheers! Pranav
Re: How to add target specific dependency?
On 8/23/07, petruk_gile <[EMAIL PROTECTED]> wrote:
>
> Hi all ..
>
> I'm currently porting GCC into a new processor, and I have a problem in
> instruction scheduling ...
>
> The case is like this:
> In the machine description (*.md) file, sometimes I emit a single RTL
> instruction into multiple ASM instruction. The problem is, in some case I
> need to emit an operand that actually doesn't exist in its RTL
> representation. For example :
>
> "movpqi_insn x,y" instruction will be translated as ==> "sar x, *ar15" and
> "lar y, *ar15"
>
> Where "sar" means "Store register" and "lar" means "load register" ...
>
> Since GCC performs instruction scheduling in RTL form, it doesn't know that
> instruction "movpqi_insn" actually reads AR15 Hence, sometimes GCC
> moves an instruction that actually SHOULD NOT be moved, due to data
> dependence in AR15, and it causes incorrect scheduling (This is my
> analysis, please tell me if you guys think I'm wrong)...
>
> So, I need to:
> (1) whether disable the scheduling for that particular dependency, or
> (2) Inform GCC that "movpqi_insn" has an additional dependency in AR15
>
> The problem is, I still don't know how can i do those 2 things ... So if any
> of you have any advice, I'd be really grateful :D

Ideally you could either split movpqi_insn (into a store and a load) during expansion, or split the insn appropriately just before instruction scheduling (using a define_insn_and_split).

cheers! Pranav
Identifying a block copy
Hi,

Consider the following code:

struct x { int a; int b; int c; int d; int e[120]; };
struct x *a, *b;
void foo () { *a = *b; }

For the statement in the function foo a memcpy will be generated. However, this call can be tail call optimized. My aim is to identify such opportunities in find_tail_calls in tree-tailcall.c. However, for the stmt

*a.0_1 ={v} *b.1_2;

var_can_have_subvars returns zero for *a.0_1. But a.0_1 is a pointer to a structure, so *a.0_1 is a structure and therefore can have subvars. The comment for var_can_have_subvars says

/* Return true if V is a tree that we can have subvars for. Normally, this is any aggregate type. Also complex types which are not gimple registers can have subvars. */

IMHO, var_can_have_subvars should return true for the above case, but it doesn't, because it fails the following test in var_can_have_subvars:

/* Non decls or memory tags can never have subvars. */
if (!DECL_P (v) || MTAG_P (v)) return false;

Am I missing something here?

TIA, Pranav
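To make the equivalence behind this question concrete, here is a small sketch in plain C showing the aggregate assignment next to the explicit memcpy it may be lowered to. The function names are made up for illustration; whether a given GCC version emits a memcpy call, an inline copy, or a tail call here depends on the target and options, so treat this only as a model of the source-level semantics.

```c
#include <string.h>

struct x { int a; int b; int c; int d; int e[120]; };

/* Aggregate assignment: the compiler may lower this to a block copy.  */
void
copy_by_assignment (struct x *dst, const struct x *src)
{
  *dst = *src;
}

/* The explicit equivalent.  The memcpy is the last thing the function
   does and its result is unused, so the call sits in tail position.  */
void
copy_by_memcpy (struct x *dst, const struct x *src)
{
  memcpy (dst, src, sizeof *dst);
}
```

Both functions leave the destination byte-for-byte identical to the source; struct x contains only ints, so there is no padding to worry about when comparing.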
Re: Identifying a block copy
On 10/9/07, Daniel Berlin <[EMAIL PROTECTED]> wrote:
> Yes
> we do not create subvars for non-named memory locations. IE random
> pointer dereferences.
>
> This is mainly because it would require a lot of time and memory in
> the compiler.
>
> It was done because most optimizers rely solely on vdef/vuses, instead
> of further disambiguating the memory dependence chains.

OK, makes sense.

> In any case, it's not clear why you care about subvars at all here.
> If you want to identify block copies, simply look for assignments
> between AGGREGATE_TYPE_P trees.

Aha, this would be neater. Thanks for pointing me in the right direction.

cheers! Pranav
Problem with too many virtual operands ( tree-ssa-operands.c:484)
Hi,

In the attached testcase, due to an ivopts modification, the compiler crashes in tree-ssa-operands.c while rewriting the uses, because the number of virtual operands of the modified stmt is much greater than the thresholds controlled by OP_SIZE_{1,2,3} in tree-ssa-operands.c. I went through http://gcc.gnu.org/ml/gcc-patches/2006-12/msg01269.html and it seemed to me that these values (OP_SIZE etc.) were set experimentally. I am wondering if these values should be increased. I did increase them, and the attached testcase compiled fine. I examined the resulting ivopts dump and the modifications seem valid to me.

I have attached two dumps (before ivopts, i.e. 105t.cunroll, and after ivopts, i.e. 107t.ivopts). Note that the post-ivopts dump was generated only after changing OP_SIZE_3 to 700 (an arbitrarily high value). For a quick check of the dumps, note that

D.1281_6 = Hoopster_ptr_17->Magic;

gets changed to

D.1281_6 = MEM[index: ivtmp.840_5];

I am wondering if increasing OP_SIZE_{1,2,3} is the way to go. I am not fully convinced, because it means that the problem could hit again with nastier code.

TIA, Pranav

Attachments: testcase-min.i, testcase-min.i.105t.cunroll, testcase-min.i.107t.ivopts
Reload using a live register to reload into
Hi, Working on a private port I am seeing a problem with reload clobbering a live register and thus causing havoc. Consider the following snippet of the code dump. (note:HI 85 84 86 5 [bb 5] NOTE_INSN_BASIC_BLOCK) (note:HI 86 85 89 5 NOTE_INSN_DELETED) (insn:HI 89 86 87 5 cor_h.c:129 (set (reg:SI 3 $c3 [ ivtmp.103 ]) (sign_extend:SI (subreg:HI (reg:SI 206 [ ivtmp.103 ]) 0))) 86 {extendhisi2} (nil)) (insn:HI 87 89 88 5 cor_h.c:129 (set (reg:SI 1 $c1 [ ivtmp.101 ]) (reg:SI 208 [ ivtmp.101 ])) 45 {*movsi} (nil)) (insn:HI 88 87 270 5 cor_h.c:129 (set (reg:SI 2 $c2 [ h ]) (reg/v/f:SI 236 [ h ])) 45 {*movsi} (nil)) (insn:HI 270 88 91 5 cor_h.c:129 (set (reg:SI 4 $c4) (const_int 0 [0x0])) 45 {*movsi} (nil)) (call_insn:HI 91 270 92 5 cor_h.c:129 (parallel [ (set (reg:SI 1 $c1) (call (mem:SI (symbol_ref:SI ("DotProductWithoutShift") [flags 0x41] ) [0 S4 A32]) (const_int 0 [0x0]))) (use (const_int 0 [0x0])) (clobber (reg:SI 31 $link)) ]) 42 {*call_value_direct} (expr_list:REG_DEAD (reg:SI 4 $c4) (expr_list:REG_DEAD (reg:SI 3 $c3 [ ivtmp.103 ]) (expr_list:REG_DEAD (reg:SI 2 $c2 [ h ]) (nil (expr_list:REG_DEP_TRUE (use (reg:SI 4 $c4)) (expr_list:REG_DEP_TRUE (use (reg:SI 3 $c3 [ ivtmp.103 ])) (expr_list:REG_DEP_TRUE (use (reg:SI 2 $c2 [ h ])) (expr_list:REG_DEP_TRUE (use (reg:SI 1 $c1 [ ivtmp.101 ])) (nil)) (insn:HI 92 91 285 5 cor_h.c:129 (set (reg/v:SI 230 [ s ]) (reg:SI 1 $c1)) 45 {*movsi} (expr_list:REG_DEAD (reg:SI 1 $c1) (nil))) (jump_insn:HI 285 92 286 5 (set (pc) (label_ref 118)) 8 {jump} (nil)) ;; End of basic block 5 -> ( 10) The register $c1 is used to pass the first argument to the function DotProductWithoutShift On encountering the call_insn ( insn no 91) global.c inserts a store to save the register $c16 (which contains a variable 'tot' and $c16 is a caller save register ). Hence the following insn is inserted just before the call to DotProductWithoutShift. 
(insn 309 270 91 5 cor_h.c:129 (set (mem/c:SI (plus:SI (reg/f:SI 29 $sp) (const_int 176 [0xb0])) [11 S4 A32]) (reg:SI 16 $c16)) 45 {*movsi} (nil)) However the index 176 is too large and (plus:SI (reg/f:SI 29 $sp) (const_int 176 [0xb0])) needs to be reloaded. $c1 gets chosen for this reload and now the dump snippet looks like (note:HI 85 84 86 5 [bb 5] NOTE_INSN_BASIC_BLOCK) (note:HI 86 85 89 5 NOTE_INSN_DELETED) (insn:HI 89 86 87 5 cor_h.c:129 (set (reg:SI 3 $c3 [ ivtmp.103 ]) (sign_extend:SI (reg:HI 14 $c14 [orig:206 ivtmp.103 ] [206]))) 86 {extendhisi2} (nil)) (insn:HI 87 89 88 5 cor_h.c:129 (set (reg:SI 1 $c1 [ ivtmp.101 ]) (reg:SI 8 $c8 [orig:208 ivtmp.101 ] [208])) 45 {*movsi} (nil)) (insn:HI 88 87 270 5 cor_h.c:129 (set (reg:SI 2 $c2 [ h ]) (reg/v/f:SI 22 $c22 [orig:236 h ] [236])) 45 {*movsi} (nil)) (insn:HI 270 88 329 5 cor_h.c:129 (set (reg:SI 4 $c4) (const_int 0 [0x0])) 45 {*movsi} (nil)) (insn 329 270 330 5 cor_h.c:129 (set (reg:SI 1 $c1) (const_int 176 [0xb0])) 45 {*movsi} (nil)) (insn 330 329 309 5 cor_h.c:129 (set (reg:SI 1 $c1) (plus:SI (reg:SI 1 $c1) (reg/f:SI 29 $sp))) 65 {*addsi3} (expr_list:REG_EQUIV (plus:SI (reg/f:SI 29 $sp) (const_int 176 [0xb0])) (nil))) (insn 309 330 91 5 cor_h.c:129 (set (mem/c:SI (reg:SI 1 $c1) [11 S4 A32]) (reg:SI 16 $c16)) 45 {*movsi} (nil)) (call_insn:HI 91 309 332 5 cor_h.c:129 (parallel [ (set (reg:SI 1 $c1) (call (mem:SI (symbol_ref:SI ("DotProductWithoutShift") [flags 0x41] ) [0 S4 A32]) (const_int 0 [0x0]))) (use (const_int 0 [0x0])) (clobber (reg:SI 31 $link)) ]) 42 {*call_value_direct} (nil) (expr_list:REG_DEP_TRUE (use (reg:SI 4 $c4)) (expr_list:REG_DEP_TRUE (use (reg:SI 3 $c3 [ ivtmp.103 ])) (expr_list:REG_DEP_TRUE (use (reg:SI 2 $c2 [ h ])) (expr_list:REG_DEP_TRUE (use (reg:SI 1 $c1 [ ivtmp.101 ])) (nil)) (insn 332 91 333 5 (set (reg:SI 4 $c4) (const_int 176 [0xb0])) 45 {*movsi} (nil)) (insn 333 332 310 5 (set (reg:SI 4 $c4) (plus:SI (reg:SI 4 $c4) (reg/f:SI 29 $sp))) 65 {*addsi3} (expr_list:REG_EQUIV (plus:SI 
(reg/f:SI 29 $sp) (const_int 176 [0xb0])) (nil))) (insn 310 333 285 5 (set (reg:SI 16 $c16) (mem/c:SI (reg:SI 4 $c4) [11 S4 A32])) 45 {*movsi} (nil)) (jump_insn:HI 285 310 286 5 (set (pc) (label_ref 118)) 8 {jump} (nil)) ;; End of basic block 5 -> ( 10) clearly the register $c1 which after insn 87 has the first argument of the function DotProductWithoutShift is overwritten. The file.c.176r.greg for insn 309 says "Spilling for insn 309. Using reg 6 for reload 0" and indeed rld[0].regno is 6 and rld[0].in is (plus:SI
Re: About VLIW backend
On 06 Nov 2007 21:50:09 -0800, Ian Lance Taylor <[EMAIL PROTECTED]> wrote:
> Li Wang <[EMAIL PROTECTED]> writes:
>
> > I wonder if any efforts have been made to retarget GCC to VLIW
> > backend.Is there any project trying to do that? Is it included in the
> > GCC mainstream? Thanks.

Dr. Baumgartl, Jan Parthey and others at the Chemnitz University of Technology had substantial success with a port for the TMS320C6x series of VLIW processors. http://archiv.tu-chemnitz.de/pub/2004/0107/data/index.html

A few friends and I (college students then) added some improvements to this port as part of an undergraduate university coursework project.

cheers! Pranav
Re: About VLIW backend
> I am interesting in it. How about the current status, is it ongoing
> developing? Is it included in GCC official release?

Unfortunately our group is not actively working on that right now. For several reasons (mainly the paucity of time) we couldn't release it to the GCC community then (about 3 years back).

- Pranav
Re: Reload using a live register to reload into
Hi Eric,

Thanks for the response.

> Of course, it goes to great length to do so but there can be bugs. You didn't
> specify which version of the compiler you're using though; they may have been
> already fixed on the mainline.

Oh, I am using quite a recent version of the compiler: rev 129547, DATESTAMP 20071022.

cheers! Pranav
Re: Target specific attributes to variables
> Even though the other 2 addressing modes are implemented, the
> attributes could not be checked in the other 2 modes. These 2 modes
> are "disp with register" and "register indirect" addressing modes. The
> tree structure in these addressing modes could not be checked for
> attributes using the RTX of the operand. We were unable to get any
> information from other target specific attributes.

Look up MEM_EXPR in the GCC internals documentation. You might want to use that for the register indirect case.

cheers! Pranav
Re: About VLIW backend
> Did you test for large programs? Such as applications from SPEC 2006? or
> the equal size of programs. Thanks.

Oh no, we didn't. We stopped once we had achieved fair stability, judged purely by the number of testsuite failures (fewer than 100).

cheers! Pranav
Re: Reload using a live register to reload into
> OK. AFAICS there is nothing glaring in the RTL you posted so you'll have to > put a watchpoint and find out who has set reg_rtx for this particular reload. reg_rtx gets set due to a call to choose_reload_regs which in turn calls allocate_reload_reg to set reg_rtx. Also, just to confirm if I am on the right track, shouldnt the bit for reg #1 (i.e $c1) be set in live_throughout in the insn chain for insn #91 ( reproduced below for convenience ) ? (call_insn:HI 91 270 92 5 cor_h.c:129 (parallel [ (set (reg:SI 1 $c1) (call (mem:SI (symbol_ref:SI ("DotProductWithoutShift") [flags 0x41] ) [0 S4 A32]) (const_int 0 [0x0]))) (use (const_int 0 [0x0])) (clobber (reg:SI 31 $link)) ]) 42 {*call_value_direct} (expr_list:REG_DEAD (reg:SI 4 $c4) (expr_list:REG_DEAD (reg:SI 3 $c3 [ ivtmp.103 ]) (expr_list:REG_DEAD (reg:SI 2 $c2 [ h ]) (nil (expr_list:REG_DEP_TRUE (use (reg:SI 4 $c4)) (expr_list:REG_DEP_TRUE (use (reg:SI 3 $c3 [ ivtmp.103 ])) (expr_list:REG_DEP_TRUE (use (reg:SI 2 $c2 [ h ])) (expr_list:REG_DEP_TRUE (use (reg:SI 1 $c1 [ ivtmp.101 ])) (nil)) TIA, Pranav
Re: Reload using a live register to reload into
Hi,

> > (call_insn:HI 91 270 92 5 cor_h.c:129 (parallel [
> > (set (reg:SI 1 $c1)
> > (call (mem:SI (symbol_ref:SI
> > ("DotProductWithoutShift") [flags 0x41] DotProductWithoutShift>) [0 S4 A32])
> > (const_int 0 [0x0])))
> > (use (const_int 0 [0x0]))
> > (clobber (reg:SI 31 $link))
> > ]) 42 {*call_value_direct} (expr_list:REG_DEAD (reg:SI 4 $c4)
> > (expr_list:REG_DEAD (reg:SI 3 $c3 [ ivtmp.103 ])
> > (expr_list:REG_DEAD (reg:SI 2 $c2 [ h ])
> > (nil
> > (expr_list:REG_DEP_TRUE (use (reg:SI 4 $c4))
> > (expr_list:REG_DEP_TRUE (use (reg:SI 3 $c3 [ ivtmp.103 ]))
> > (expr_list:REG_DEP_TRUE (use (reg:SI 2 $c2 [ h ]))
> > (expr_list:REG_DEP_TRUE (use (reg:SI 1 $c1 [ ivtmp.101 ]))
> > (nil))
>
> I don't think so, it should be in dead_or_set, the value contained in $c1 dies
> in the insn.

Yes, after going through the code more closely, I concur. The problem is that $c1 isn't live_throughout, but it is live at the point just before the call insn. Therefore, if an instruction is inserted before the call insn, as happens when caller-save.c inserts a caller-save instruction, and that instruction doesn't kill $c1, then its live_throughout should have the bit for $c1 set. This doesn't happen, because when the caller-save insn is inserted its live_throughout is simply set to the live_throughout of the call insn plus the registers marked with REG_DEAD notes in the call insn. However, since $c1 is an argument to the call, it is used by the call_insn and is marked REG_DEP_TRUE (read after write). Shouldn't the regs in REG_DEP_TRUE be added to live_throughout? My suspicion is that the LOG_LINKS are not always up to date, so would it be better to use DF_INSN_UID_USES?

Thanks in advance, Pranav
Re: Reload using a live register to reload into
Hi,

> DF is supposed to be out of the game at this point, it has handed over the
> control since global.c:build_insn_chain as far as liveness info is concerned.

Oh, I used DF and it worked for me. But I think that is because this is the first new instruction to be inserted, so nothing can have changed with respect to the call_insn that would make the DF info inconsistent. However, in light of the above, I shouldn't be using DF.

> The REG_DEP_TRUE are somewhat misleading, it's an artifact in the dump.
> What you're seeing are the contents of CALL_INSN_FUNCTION_USAGE, which are
> always correct, so a solution to your problem would be to scan it for uses of
> registers in caller-save.c:insert_one_insn. Of course this wouldn't plug the
> hole entirely but would very likely be sufficient in practice.

Good idea, I'll try using CALL_INSN_FUNCTION_USAGE.

cheers! Pranav
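As a rough model of the suggestion, the set of registers that must stay untouched by an insn inserted just before a call can be pictured as a union of three bitsets. The sketch below is a toy in plain C: `regset`, `REG_BIT`, `live_before_call` and `reg_is_free` are hypothetical names, and nothing here is the actual code in caller-save.c or reload.

```c
#include <stdint.h>

/* Toy model: one bit per hard register.  A hypothetical stand-in for
   the per-insn live_throughout set kept in reload's insn chain.  */
typedef uint32_t regset;

#define REG_BIT(n) ((regset) 1 << (n))

/* Registers that must not be chosen as reload registers for an insn
   inserted immediately before a call: those live across the call,
   those that die in the call, and those the call reads as arguments.  */
regset
live_before_call (regset call_live_throughout,
                  regset call_reg_dead_notes,
                  regset call_function_usage)
{
  return call_live_throughout | call_reg_dead_notes | call_function_usage;
}

/* Nonzero if REGNO is not in LIVE and may be used as a reload register.  */
int
reg_is_free (regset live, int regno)
{
  return (live & REG_BIT (regno)) == 0;
}
```

With the registers from the dump in this thread ($c2, $c3, $c4 carrying REG_DEAD notes, $c1 to $c4 used by the call), $c1 looks free when the usage set is left out and is correctly excluded once it is included.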
Re: How to describe function units allocation
> Hi,
> For the backend TI DSP TMS320C6x, There are four types of functional
> units which are .L unit, .M unit, .S unit and .D unit, and each type
> consists of two units named .X1 and .X2 respectively. Namely, there are
> total 8 units. Except the .M units surve only for multiply, other units
> share many functions. For example, they both enable 32 bits arithmetical
> operation. And in the assembly, which functional unit is used to perform
> operation must be explicitly indicated. For example, ADD .S1 A0, A1, A2;
> ADD .L1 A0, A1, A2; ADD .D1 A0, A1, A2 achieve the same goal by using
> different units. Surely, when producing assembly, a functional unit
> allocation somewhat like register allocation is needed. I wonder how can
> I describe the relationship in the machine description file, and whether
> I need write a functional unit allocation algorithm or it is done by a
> general purpose allocation algorithm embedded in GCC, like register
> allocation, I only need give some architecture descriptions? Thanks in
> advance for your kind assistance.

IMHO, the functional unit specifiers that accompany the assembly instructions are optional. In any case, the reason cc1 doesn't allocate functional units for c6x-gcc is that the assembler (part of the c6x binutils) does the functional unit allocation on its own. There are some notes about how the assembler does this in "Extending the GNU Assembler for Texas Instruments TMS320C6x-DSP.pdf".

HTH, Pranav

>
> Regards,
> Li Wang
>
Re: VLIW scheduling and delayed branch
On Dec 9, 2007 2:19 AM, Thomas Sailer <[EMAIL PROTECTED]> wrote:
> > Has anyone faced a similar problem before? Are there targets for which
> > both VLIW and DBR are enabled? Perhaps ia64?
>

OK, this was a long time back, but yes, I have faced a similar problem. We disabled delayed branch scheduling and used the machdep reorg pass instead. Moving backwards from the branch instruction, we examined its dependencies and marked all the instructions (and their containing insn bundles) that the branch depended upon. Then, again moving backwards from the branch insn, we picked the first insn bundle whose insns were all unmarked (and whose cycle size was <= the number of delay slots of a branch insn) and put that bundle into the delay slot. This approach worked fine for the small testcases that we had, but we didn't really test it on any large piece of software. We implemented this for the TMS320C6x VLIW DSP.

HTH, Pranav
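The marking heuristic described in this message can be sketched roughly as follows. This is a toy model in plain C, not the code from the port: `struct bundle` and `pick_delay_slot_bundle` are hypothetical, registers are modelled as bits in a mask, and dependencies are tracked per bundle rather than per insn. Like the description, it only checks dependencies of the branch; a real implementation would presumably also have to check the chosen bundle against the bundles it is moved across.

```c
#include <stdint.h>

/* Toy model: register sets as bitmasks, one entry per insn bundle.
   All names here are hypothetical, not GCC data structures.  */
struct bundle {
  uint32_t reads;   /* registers read by any insn in the bundle */
  uint32_t writes;  /* registers written by any insn in the bundle */
  int cycles;       /* cycles the bundle occupies */
};

/* The branch follows bundles[n - 1].  Walk backwards from it and return
   the index of the nearest bundle that the branch does not depend on,
   directly or transitively, and that fits in DELAY_SLOTS cycles.
   Return -1 if there is no such bundle.  */
int
pick_delay_slot_bundle (const struct bundle *bundles, int n,
                        uint32_t branch_reads, int delay_slots)
{
  uint32_t needed = branch_reads;

  for (int i = n - 1; i >= 0; i--)
    {
      if (bundles[i].writes & needed)
        /* Marked: the branch depends on this bundle, so everything
           the bundle reads becomes a dependency as well.  */
        needed |= bundles[i].reads;
      else if (bundles[i].cycles <= delay_slots)
        return i;
    }
  return -1;
}
```

Marking and picking are folded into one backward pass here, which gives the same answer as two passes because whether a bundle is marked depends only on the bundles between it and the branch.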