gcc4.6.0:combining operate+test

2011-04-12 Thread cirrus75

 Hi All,

 I have been looking at a case on the x86 architecture where gcc could generate 
better code for:

if(a+=25)
 d=c;

 
 The insns for the operation and the test are:


(insn 5 2 6 2 (set (reg:SI 62 [ a ])
        (mem/c/i:SI (symbol_ref:DI ("a")  ) [2 a+0 S4 A32])) test_and.c:9 64 {*movsi_internal}
     (nil))

(insn 6 5 7 2 (parallel [
            (set (reg:SI 60 [ a.1 ])
                (plus:SI (reg:SI 62 [ a ])
                    (const_int 25 [0x19])))
            (clobber (reg:CC 17 flags))
        ]) test_and.c:9 252 {*addsi_1}
     (expr_list:REG_DEAD (reg:SI 62 [ a ])
        (expr_list:REG_UNUSED (reg:CC 17 flags)
            (expr_list:REG_EQUAL (plus:SI (mem/c/i:SI (symbol_ref:DI ("a")  ) [2 a+0 S4 A32])
                    (const_int 25 [0x19]))
                (nil)))))

(insn 7 6 8 2 (set (mem/c/i:SI (symbol_ref:DI ("a")  ) [2 a+0 S4 A32])
        (reg:SI 60 [ a.1 ])) test_and.c:9 64 {*movsi_internal}
     (nil))

(insn 8 7 9 2 (set (reg:CCZ 17 flags)
        (compare:CCZ (reg:SI 60 [ a.1 ])
            (const_int 0 [0]))) test_and.c:9 2 {*cmpsi_ccno_1}
     (nil))


  I noticed that combine.c is not able to combine insns 6 and 8. This is because 
the create_log_links function only creates (as far as I could understand) links 
between the reg setter and the first reg user, but not the other reg users. 
Thus, combine.c does try to combine insns 6 and 7, but without success.
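
  The linking behaviour described above can be illustrated with a small toy
model. This is only a sketch under stated assumptions, not GCC's code: the
names toy_insn and toy_create_log_links, the tiny register file and the
"one def, two uses" instruction shape are all hypothetical. It walks the
insns backwards and links a setter only to the first later user of the
register, which is the behaviour the message attributes to create_log_links.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_REGS 8      /* size of the toy register file (assumption) */
#define NO_REG   (-1)   /* "no register" marker */
#define NO_LINK  (-1)   /* "no setter linked" marker */

/* Toy instruction: at most one register written and two registers read.
   This is NOT GCC's rtx representation, only an illustration.  */
struct toy_insn {
    int def;      /* register written, or NO_REG */
    int use[2];   /* registers read, or NO_REG */
};

/* For each insn i and operand k, link[i][k] receives the index of the
   insn that set use[k], but only when insn i is the FIRST user that
   follows that setter.  Later users get NO_LINK.  */
void toy_create_log_links(const struct toy_insn *insns, size_t n,
                          int link[][2])
{
    int next_use[MAX_REGS];

    for (int r = 0; r < MAX_REGS; r++)
        next_use[r] = NO_LINK;
    for (size_t i = 0; i < n; i++)
        link[i][0] = link[i][1] = NO_LINK;

    /* Walk backwards: when a setter is reached, next_use[reg] holds the
       nearest following user, because earlier uses overwrite later ones.  */
    for (size_t i = n; i-- > 0; ) {
        int d = insns[i].def;
        if (d != NO_REG) {
            assert(d >= 0 && d < MAX_REGS);
            int user = next_use[d];
            if (user != NO_LINK)
                for (int k = 0; k < 2; k++)
                    if (insns[user].use[k] == d)
                        link[user][k] = (int)i;
            next_use[d] = NO_LINK;   /* uses seen so far belong to this def */
        }
        for (int k = 0; k < 2; k++) {
            int u = insns[i].use[k];
            if (u != NO_REG) {
                assert(u >= 0 && u < MAX_REGS);
                next_use[u] = (int)i;
            }
        }
    }
}
```

  Feeding it the four insns above (load, add, store, compare) links the
store to the add, while the compare, being the second user of the add
result, gets no link in this model.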

  Why doesn't create_log_links create links between the reg setter and all the 
reg users?

  I compiled it on powerpc and got the same result (3 instructions: operate, 
store, test), so this behavior is not specific to x86. It seems 
worth optimizing.
  
best regards,
Alex Rocha Prado

  

  


RE:How to tell IRA to use misaligned DImode load?

2011-04-13 Thread cirrus75

 Hi,

 Do you mean that you support unaligned access to any regular DImode type (int64_t)?

regards,
Alex Prado


"H.J. Lu"  wrote:

Hi,

On my target, SCmode is 4-byte aligned.  But to load it into
a register, it must be 8-byte aligned.  I can handle misaligned
load in backend.  But IRA generates misaligned load directly
when SCmode is accessed as DImode.  How can I tell IRA to use
misaligned load for DImode?

Thanks.









adjacent bitfields optimization

2011-04-25 Thread cirrus75

 Hi All,

 For the following code:

typedef struct {
 int f1:1;
 int f2:1;
 int f3:1;
 int f4:29;
} t1;

typedef struct {
 int f1:1;
 int f2:1;
 int f3:30;
} t2;

t1 s1;
t2 s2;

void func1(void)
{
 s1.f1 = s2.f1;
 s1.f2 = s2.f2;
}

we get (x86_64 target):

 movzbl  s2(%rip), %edx
 movzbl  s1(%rip), %eax
 movl    %edx, %ecx
 andl    $-4, %eax
 andl    $2, %edx
 andl    $1, %ecx
 orl     %ecx, %eax
 orl     %edx, %eax
 movb    %al, s1(%rip)
 ret


Could gcc combine two or more operations on adjacent bitfields into one 
operation?
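
The single operation being asked for can be written by hand as one masked
merge. The following is only a sketch: merge_bits and copy_f1_f2 are
hypothetical helpers, and the code assumes that f1 and f2 occupy bits 0 and 1
of the first storage word, which is what the assembly above suggests for this
x86_64 build but is implementation-defined in general.

```c
#include <stdint.h>

/* Copy the bits selected by MASK from SRC into DST and leave every other
   bit of DST untouched: one AND per operand plus a single OR.  */
static inline uint32_t merge_bits(uint32_t dst, uint32_t src, uint32_t mask)
{
    return (dst & ~mask) | (src & mask);
}

/* Hypothetical equivalent of "s1.f1 = s2.f1; s1.f2 = s2.f2;" on the raw
   storage words, ASSUMING f1 is bit 0 and f2 is bit 1 in both structs.  */
uint32_t copy_f1_f2(uint32_t s1_word, uint32_t s2_word)
{
    return merge_bits(s1_word, s2_word, UINT32_C(0x3));
}
```

With a combined mask of 3, the two separate "and $1" / "and $2" steps and
the two "or" steps in the listing collapse into one mask of the source and
one "or".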


regards,
Alex R. Prado





Re: adjacent bitfields optimization

2011-04-26 Thread cirrus75

  Hi,

  Actually, I would like to ask whether all of this should be a tree-level 
optimization or whether there is something to do in the backend. I am asking 
because I am trying to write a new backend.

thanks,
Alex R. Prado


On 25/04/2011 14:47, Ian Lance Taylor < i...@google.com > wrote:
cirrus75  writes:

>  For the following code:
>
> typedef struct {
>  int f1:1;
>  int f2:1;
>  int f3:1;
>  int f4:29;
> } t1;
>
> typedef struct {
>  int f1:1;
>  int f2:1;
>  int f3:30;
> } t2;
>
> t1 s1;
> t2 s2;
>
> void func1(void)
> {
>  s1.f1 = s2.f1;
>  s1.f2 = s2.f2;
> }
>
> we get (x86_64 target):
>
>  movzbl  s2(%rip), %edx
>  movzbl  s1(%rip), %eax
>  movl    %edx, %ecx
>  andl    $-4, %eax
>  andl    $2, %edx
>  andl    $1, %ecx
>  orl     %ecx, %eax
>  orl     %edx, %eax
>  movb    %al, s1(%rip)
>  ret
>
>
> Could gcc optimize two or more operations in adjacent bitfields into one 
> operation ?

This question looks more appropriate for the mailing list
gcc-h...@gcc.gnu.org rather than the mailing list gcc@gcc.gnu.org.

I agree that this looks like suboptimal code.  Please consider filing a
missed-optimization bug report.  See http://gcc.gnu.org/bugs/ .  Thanks.

Ian



improving combine pass

2011-04-27 Thread cirrus75

  Hi All,

  I am trying to improve the combine pass (for all backends). One approach is 
to change the order of some insns before the combine pass starts. The first 
problem I have is with the regnotes: they need to be rebuilt after changing the 
insn order. Does anyone know how to do that?

  Does anyone know of any other problems I could run into by changing the insn 
order?

thank you very much,
Alex R. Prado


Re: improving combine pass

2011-04-27 Thread cirrus75

 Hello Ian,

  One example is:

insn X   : "REG_X = "
insn X+1 : "MEM(addr) = REG_X"
insn X+2 : "REGY:CCmode compare(REG_X, const_int 0)"

generated by this C code (which I already posted some weeks ago):

int a, b, c, d;

int foo()
{
  a += b;

  if(a)
    c = d;
}

  Insns X+2 and X can usually be combined because an arithmetic operation
usually sets the condition codes.

  After some hours of debugging I noticed that the combine pass never tries
to insert insn X+2 into insn X, simply because the destination insn must come
after the source insn.

  Also, the combine pass only tries to combine the reg setter with its first
user (as far as I understood), which is not the case here (the comparison is
the second REG_X user).

  I tried to make try_combine accept a destination insn that comes before
the source insn (in insn list order), but after fixing a segfault in
can_combine_p I noticed that subst didn't do the expected job on insn X+2 and
insn X.

  If insn X+2 is placed just after insn X, combine can insert insn X into
insn X+2 and generate just one insn.

  So, changing the insn order seems to be simpler than changing the combine
pass too deeply.
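
  The reordering idea can be sketched with a toy model. Everything here is
hypothetical (toy_insn, toy_can_swap and toy_hoist_after are not GCC
functions, and real insns have many more kinds of dependencies): the compare
is hoisted to sit directly after the setter only if it does not conflict
with any insn it has to move across.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NO_REG (-1)

/* Toy instruction: one optional register def, one optional register use,
   and coarse memory effects.  NOT GCC's rtx, only an illustration.  */
struct toy_insn {
    int  def;         /* register written, or NO_REG */
    int  use;         /* register read, or NO_REG */
    bool writes_mem;
    bool reads_mem;
};

/* True if EARLIER and LATER can exchange places without changing meaning
   in this toy model (no register or memory dependency between them).  */
static bool toy_can_swap(const struct toy_insn *earlier,
                         const struct toy_insn *later)
{
    if (earlier->def != NO_REG
        && (earlier->def == later->use || earlier->def == later->def))
        return false;                      /* true or output dependency */
    if (later->def != NO_REG && later->def == earlier->use)
        return false;                      /* anti dependency */
    if (earlier->writes_mem && (later->reads_mem || later->writes_mem))
        return false;
    if (later->writes_mem && earlier->reads_mem)
        return false;
    return true;
}

/* Move insns[from] up so that it directly follows insns[to], provided it
   may cross every insn in between.  Returns false and leaves the array
   unchanged when the move is not safe.  */
bool toy_hoist_after(struct toy_insn *insns, size_t n, size_t to, size_t from)
{
    assert(to < from && from < n);

    for (size_t i = to + 1; i < from; i++)
        if (!toy_can_swap(&insns[i], &insns[from]))
            return false;

    struct toy_insn moved = insns[from];
    for (size_t i = from; i > to + 1; i--)
        insns[i] = insns[i - 1];
    insns[to + 1] = moved;
    return true;
}
```

  For the X, X+1, X+2 sequence above, the compare only reads REG_X and the
store only reads REG_X and writes memory, so in this model the compare can
be moved to just after insn X, where it becomes the first user of REG_X.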

  Thanks for the hint about the df_ functions.

Alex R. Prado


On 27/04/2011 14:43, Ian Lance Taylor < i...@google.com > wrote:
cirrus75  writes:

>   I am trying to improve combine pass (for all backends). One approach 
> is changing the order of some insns before combine pass starts. The first 
> problem I have is about the REGNOTES, they need to be rebuilt after
> changing insn order. Does anyone know how to do that ?

It's not clear to me why changing insn order will help combine.  Can you
give us an example?

In current mainline, the regnotes are added at the start of the combine
pass by df_note_add_problem and df_analyze at the start of
rest_of_handle_combine (please do not ask why it works this way).  So if
you reshuffle the insns in a pass before combine, and handle DF
information appropriately, then you don't have to worry about the
regnotes at all.

Ian



Re: improving combine pass

2011-04-27 Thread cirrus75

 Hi Paul,

 On i386 (and x86_64) the RTL for insn X is generated with a "(clobber (reg:CC 
FLAGS_REG))" instead of indicating exactly what is written to the flags 
register. I don't know whether this could be done differently (as you suggested).

 Maybe the idea is to combine "operation insns" and "test insns" later, but 
combine is not able to do it in some cases.

 Here are the insns generated by the i386 backend just before the combine pass:

(insn 7 6 8 2 (parallel [
            (set (reg:SI 61 [ a.2 ])
                (plus:SI (reg:SI 64 [ a ])
                    (reg:SI 63 [ b ])))
            (clobber (reg:CC 17 flags))
        ]) ../i386_tests/test_and.c:7 252 {*addsi_1}
     (expr_list:REG_DEAD (reg:SI 64 [ a ])
        (expr_list:REG_DEAD (reg:SI 63 [ b ])
            (expr_list:REG_UNUSED (reg:CC 17 flags)
                (expr_list:REG_EQUAL (plus:SI (mem/c/i:SI (symbol_ref:SI ("a")  ) [2 a+0 S4 A32])
                        (mem/c/i:SI (symbol_ref:SI ("b")  ) [2 b+0 S4 A32]))
                    (nil))))))

(insn 8 7 9 2 (set (mem/c/i:SI (symbol_ref:SI ("a")  ) [2 a+0 S4 A32])
        (reg:SI 61 [ a.2 ])) ../i386_tests/test_and.c:7 64 {*movsi_internal}
     (nil))

(insn 9 8 10 2 (set (reg:CCZ 17 flags)
        (compare:CCZ (reg:SI 61 [ a.2 ])
            (const_int 0 [0]))) ../i386_tests/test_and.c:9 2 {*cmpsi_ccno_1}
     (expr_list:REG_DEAD (reg:SI 61 [ a.2 ])
        (nil)))




On 27/04/2011 16:20, Paul Koning < paul_kon...@dell.com > wrote:

On Apr 27, 2011, at 3:15 PM, cirrus75 wrote:

> 
> Hello Ian,
> 
>  One example is:
> 
> insn X   : "REG_X = "
> insn X+1 : "MEM(addr) = REG_X"
> insn X+2 : "REGY:CCmode compare(REG_X, const_int 0)"
> 
> generated by C code (already posted by me some weeks ago):
> --
> 
> int a, b, c, d;
> 
> int foo()
> {
>  a += b;
> 
>  if(a)
>    c = d;
> }
> 
>  Insns X+2 and X can usually be combined because arithmetic operation
> usually sets condition codes.

I haven't gotten into this much yet, so at the risk of showing off confusion...

I thought that the CCmode stuff allows this to work right without new changes, 
given that the expressions that make up the RTL are written as (parallel ...)
which set both the output reg and the CCmode reg (based on the expression
value). So the rtl for the first insn would have that compare as part of its
parallel...construct, the second insn (presumably in your example) doesn't
affect the condition codes register, and the third insn should then be deleted
since it's redundant.

Doesn't it work like that?  Am I confused about the right way?

 paul





how to specify instruction size for optimization

2010-01-15 Thread cirrus75
  Hi,

I could not understand exactly how to specify instruction sizes to gcc (so that 
it can really optimize code size when -Os is used).

I would like to tell gcc that if certain registers are used for certain 
operations, the instruction will be smaller. For example, an add whose 
destination register is register 4 is smaller than all the other "add" 
forms.

What is the easiest way to give this information to gcc? I took a long look at 
the internals documentation and at other ports, but I'm still not sure.

thank you for the help,

Alex Prado