gcc4.6.0:combining operate+test
Hi All,

I have been looking at a case on the x86 architecture where gcc could generate better code for:

    if(a+=25)
      d=c;

The insns for the operation and the test are:

    (insn 5 2 6 2 (set (reg:SI 62 [ a ])
            (mem/c/i:SI (symbol_ref:DI ("a") ) [2 a+0 S4 A32])) test_and.c:9 64 {*movsi_internal}
         (nil))

    (insn 6 5 7 2 (parallel [
                (set (reg:SI 60 [ a.1 ])
                    (plus:SI (reg:SI 62 [ a ])
                        (const_int 25 [0x19])))
                (clobber (reg:CC 17 flags))
            ]) test_and.c:9 252 {*addsi_1}
         (expr_list:REG_DEAD (reg:SI 62 [ a ])
            (expr_list:REG_UNUSED (reg:CC 17 flags)
                (expr_list:REG_EQUAL (plus:SI (mem/c/i:SI (symbol_ref:DI ("a") ) [2 a+0 S4 A32])
                        (const_int 25 [0x19]))
                    (nil)

    (insn 7 6 8 2 (set (mem/c/i:SI (symbol_ref:DI ("a") ) [2 a+0 S4 A32])
            (reg:SI 60 [ a.1 ])) test_and.c:9 64 {*movsi_internal}
         (nil))

    (insn 8 7 9 2 (set (reg:CCZ 17 flags)
            (compare:CCZ (reg:SI 60 [ a.1 ])
                (const_int 0 [0]))) test_and.c:9 2 {*cmpsi_ccno_1}
         (nil))

I noticed that combine.c is not able to combine insns 6 and 8. This is because the create_log_links function only creates (as far as I could understand) links between the reg setter and the first reg user, but not the other reg users. Thus, combine.c does try to combine 6 and 7, but without success. Why doesn't create_log_links create links between the reg setter and all the reg users?

I compiled it on powerpc and got the same result (3 instructions: operate, store, test), so this behavior is not limited to x86. It seems like something worth optimizing.

best regards,
Alex Rocha Prado
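For anyone who wants to reproduce this, a minimal test case might look like the sketch below. The global declarations and the function name op_and_test are assumptions: the post only shows the if statement, and the SImode registers in the RTL suggest plain int.

```c
/* Minimal reproduction sketch. The int globals and the function name
   are assumptions; only the if statement comes from the post. */
int a, c, d;

void op_and_test(void)
{
    /* One add, one store back to a, and a test of the sum against zero. */
    if (a += 25)
        d = c;
}
```

Compiling this with something like "gcc -O2 -S" should show whether the add and the test end up as separate instructions; -fdump-rtl-combine writes out the RTL as the combine pass leaves it.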
RE:How to tell IRA to use misaligned DImode load?
Hi,

Do you mean you support unaligned access to any regular DImode type (int64_t)?

regards,
Alex Prado

"H.J. Lu" wrote:

> Hi,
>
> On my target, SCmode is 4 byte aligned. But to load it into a register,
> it must be 8byte aligned. I can handle misaligned load in backend. But
> IRA generates misaligned load directly when SCmode is accessed as DImode.
> How can I tell IRA to use misaligned load for DImode?
>
> Thanks.
adjacent bitfields optimization
Hi All,

For the following code:

    typedef struct {
      int f1:1;
      int f2:1;
      int f3:1;
      int f4:29;
    } t1;

    typedef struct {
      int f1:1;
      int f2:1;
      int f3:30;
    } t2;

    t1 s1;
    t2 s2;

    void func1(void)
    {
      s1.f1 = s2.f1;
      s1.f2 = s2.f2;
    }

we get (x86_64 target):

    movzbl  s2(%rip), %edx
    movzbl  s1(%rip), %eax
    movl    %edx, %ecx
    andl    $-4, %eax
    andl    $2, %edx
    andl    $1, %ecx
    orl     %ecx, %eax
    orl     %edx, %eax
    movb    %al, s1(%rip)
    ret

Could gcc optimize two or more operations on adjacent bitfields into one operation?

regards,
Alex R. Prado
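To illustrate what "one operation" could mean here, the sketch below merges the two copies by hand with a single two-bit mask. It is only an illustration, not what gcc emits. The function name func1_merged is hypothetical, and the mask value assumes that f1 and f2 sit in bits 0 and 1 of the first byte, which is what the masks $1, $2 and $-4 in the assembly above suggest for this target.

```c
#include <string.h>

typedef struct { int f1:1; int f2:1; int f3:1; int f4:29; } t1;
typedef struct { int f1:1; int f2:1; int f3:30; } t2;

t1 s1;
t2 s2;

/* Hand-merged equivalent of: s1.f1 = s2.f1; s1.f2 = s2.f2;
   ASSUMPTION: f1 and f2 occupy bits 0 and 1 of the first byte.
   Bit-field layout is implementation-defined, so this is not
   portable C. */
void func1_merged(void)
{
    unsigned char dst, src;

    memcpy(&dst, &s1, 1);   /* first byte of s1 */
    memcpy(&src, &s2, 1);   /* first byte of s2 */

    /* Clear the two low bits of dst and copy both from src at once. */
    dst = (unsigned char)((dst & ~3u) | (src & 3u));

    memcpy(&s1, &dst, 1);
}
```

Under that layout assumption the body needs one mask on each side and one or, instead of the three ands and two ors in the listing.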
Re: adjacent bitfields optimization
Hi,

Actually, I would like to ask whether all of this should be a tree-level optimization or whether there is something to do in the backend. I am asking because I am trying to write a new backend.

thanks,
Alex R. Prado

On 25/04/2011 14:47, Ian Lance Taylor < i...@google.com > wrote:

cirrus75 writes:

> For the following code:
>
> typedef struct {
>   int f1:1;
>   int f2:1;
>   int f3:1;
>   int f4:29;
> } t1;
>
> typedef struct {
>   int f1:1;
>   int f2:1;
>   int f3:30;
> } t2;
>
> t1 s1;
> t2 s2;
>
> void func1(void)
> {
>   s1.f1 = s2.f1;
>   s1.f2 = s2.f2;
> }
>
> we get (x86_64 target):
>
>   movzbl  s2(%rip), %edx
>   movzbl  s1(%rip), %eax
>   movl    %edx, %ecx
>   andl    $-4, %eax
>   andl    $2, %edx
>   andl    $1, %ecx
>   orl     %ecx, %eax
>   orl     %edx, %eax
>   movb    %al, s1(%rip)
>   ret
>
> Could gcc optimize two or more operations in adjacent bitfields into one
> operation ?

This question looks more appropriate for the mailing list gcc-h...@gcc.gnu.org rather than the mailing list gcc@gcc.gnu.org.

I agree that this looks like suboptimal code. Please consider filing a missed-optimization bug report. See http://gcc.gnu.org/bugs/ . Thanks.

Ian
improving combine pass
Hi All,

I am trying to improve the combine pass (for all backends). One approach is to change the order of some insns before the combine pass starts. The first problem I have is with the REGNOTES: they need to be rebuilt after changing the insn order. Does anyone know how to do that? Does anyone know of any other problem I could run into by changing the insn order?

thank you very much,
Alex R. Prado
Re: improving combine pass
Hello Ian,

One example is:

    insn X   : "REG_X = "
    insn X+1 : "MEM(addr) = REG_X"
    insn X+2 : "REGY:CCmode compare(REG_X, const_int 0)"

generated by this C code (already posted by me some weeks ago):

    int a, b, c, d;

    int foo()
    {
      a += b;

      if(a)
        c = d;
    }

Insns X+2 and X can usually be combined because an arithmetic operation usually sets the condition codes.

After some hours of debugging I noticed that the combine pass never tries to insert insn X+2 into insn X, simply because the destination insn must come after the source insn. Also, the combine pass only tries to combine the reg setter with its first user (as far as I understood), which is not the case here (the comparison is the second user of REG_X).

I tried to make try_combine accept a destination insn that comes before the source insn (in insn list order), but after fixing a segfault in can_combine_p I noticed that subst didn't do the expected job on insn X+2 and insn X.

If insn X+2 is placed just after insn X, combine can insert insn X into insn X+2 and generate just one insn. So changing the insn order seems simpler than changing the combine pass too deeply.

Thanks for the hint on the df_ functions.

Alex R. Prado

On 27/04/2011 14:43, Ian Lance Taylor < i...@google.com > wrote:

cirrus75 writes:

> I am trying to improve combine pass (for all backends). One approach is
> changing the order of some insns before combine pass starts. The first
> problem I have is about the REGNOTES, they need to be rebuilt after
> changing insn order. Does anyone know how to do that ?

It's not clear to me why changing insn order will help combine. Can you give us an example?

In current mainline, the regnotes are added at the start of the combine pass by df_note_add_problem and df_analyze at the start of rest_of_handle_combine (please do not ask why it works this way). So if you reshuffle the insns in a pass before combine, and handle DF information appropriately, then you don't have to worry about the regnotes at all.

Ian
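To make the "first user only" point from this message concrete, here is a toy model in C of how such links could be built. It is only a sketch of the behaviour as described in this thread, not GCC's create_log_links: the toy_insn structure, the register numbering and the single link slot per insn are all assumptions made for brevity.

```c
#define TOY_NREGS 8
#define TOY_NONE  (-1)

/* Hypothetical, simplified insn: sets at most one register and uses at
   most two. This is not GCC's rtx representation. */
struct toy_insn {
    int set_reg;
    int use_reg[2];
};

/* For each insn i, link[i] receives the index of the insn that set a
   register used by i, but only if i is the FIRST user after that set.
   At most one link per insn is kept, for brevity. */
void toy_create_links(const struct toy_insn *insns, int n, int *link)
{
    int next_use[TOY_NREGS];
    int i, k;

    for (k = 0; k < TOY_NREGS; k++)
        next_use[k] = TOY_NONE;
    for (i = 0; i < n; i++)
        link[i] = TOY_NONE;

    /* Walk backwards, remembering the nearest later use of each reg. */
    for (i = n - 1; i >= 0; i--) {
        int r = insns[i].set_reg;

        /* A set is linked only to the nearest later user of the reg. */
        if (r != TOY_NONE && next_use[r] != TOY_NONE) {
            link[next_use[r]] = i;
            next_use[r] = TOY_NONE;
        }
        /* Each use overwrites any later use recorded so far. */
        for (k = 0; k < 2; k++)
            if (insns[i].use_reg[k] != TOY_NONE)
                next_use[insns[i].use_reg[k]] = i;
    }
}
```

Fed the three-insn sequence X, X+1, X+2 from the example, this model links the store to the add and leaves the compare without a link, which is the situation described above.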
Re: improving combine pass
Hi Paul,

On i386 (and x86_64) the RTL for insn X is generated with a "(clobber reg:CC FLAGS_REG)" instead of indicating exactly what is written to the flags register. I don't know if this could be different (as you suggested). Maybe the idea is to combine "operation insns" and "test insns" later, but combine is not able to do it in some cases.

Here are the insns generated by the i386 backend just before the combine pass:

    (insn 7 6 8 2 (parallel [
                (set (reg:SI 61 [ a.2 ])
                    (plus:SI (reg:SI 64 [ a ])
                        (reg:SI 63 [ b ])))
                (clobber (reg:CC 17 flags))
            ]) ../i386_tests/test_and.c:7 252 {*addsi_1}
         (expr_list:REG_DEAD (reg:SI 64 [ a ])
            (expr_list:REG_DEAD (reg:SI 63 [ b ])
                (expr_list:REG_UNUSED (reg:CC 17 flags)
                    (expr_list:REG_EQUAL (plus:SI (mem/c/i:SI (symbol_ref:SI ("a") ) [2 a+0 S4 A32])
                            (mem/c/i:SI (symbol_ref:SI ("b") ) [2 b+0 S4 A32]))
                        (nil))

    (insn 8 7 9 2 (set (mem/c/i:SI (symbol_ref:SI ("a") ) [2 a+0 S4 A32])
            (reg:SI 61 [ a.2 ])) ../i386_tests/test_and.c:7 64 {*movsi_internal}
         (nil))

    (insn 9 8 10 2 (set (reg:CCZ 17 flags)
            (compare:CCZ (reg:SI 61 [ a.2 ])
                (const_int 0 [0]))) ../i386_tests/test_and.c:9 2 {*cmpsi_ccno_1}
         (expr_list:REG_DEAD (reg:SI 61 [ a.2 ])
            (nil)))

On 27/04/2011 16:20, Paul Koning < paul_kon...@dell.com > wrote:

On Apr 27, 2011, at 3:15 PM, cirrus75 wrote:

> Hello Ian,
>
> One example is:
>
> insn X   : "REG_X = "
> insn X+1 : "MEM(addr) = REG_X"
> insn X+2 : "REGY:CCmode compare(REG_X, const_int 0)"
>
> generated by C code (already posted by me some weeks ago):
>
> int a, b, c, d;
>
> int foo()
> {
>   a += b;
>
>   if(a)
>     c = d;
> }
>
> Insns X+2 and X can usually be combined because arithmetic operation
> usually sets condition codes.

I haven't gotten into this much yet, so at the risk of showing off confusion...

I thought that the CCmode stuff allows this to work right without new changes, given that the expressions that make up the RTL are written as (parallel ...) which set both the output reg and the CCmode reg (based on the expression value).

So the rtl for the first insn would have that compare as part of its parallel ... construct, the second insn (presumably in your example) doesn't affect the condition codes register, and the third insn should then be deleted since it's redundant.

Doesn't it work like that? Am I confused about the right way?

paul
how to specify instruction size for optimization
Hi,

I could not understand exactly how to specify instruction sizes to gcc (so that it can really optimize code size when -Os is used). I would like to tell gcc that if certain registers are used for certain operations, the instruction will be smaller. For example, an add whose destination register is register 4 is the smallest of all the "add" forms.

What is the easiest way to give this information to gcc? I took a long look at the internals documentation and at other ports, but I'm still not sure.

thank you for the help,
Alex Prado