
CSE not combining equivalent expressions.

2007-01-15 Thread pranav bhandarkar


Hello Everyone,
I have the following source code

static int i;
static char a;

char foo_gen(int);
void foo_assert(char);
void foo ()
{
  int *x = &i;
  a = foo_gen(0);
  a |= 1; /* ---1--- */
  if (*x) goto end;
  a |= 1; /* ---2--- */
  foo_assert(a);
end:
  return;
}

Now I expect the CSE pass to realise that 1 and 2 are equal and eliminate 2.
However, the RTL before the first CSE pass is as follows:

(insn 11 9 12 0 (set (reg:SI 1 $c1)
  (const_int 0 [0x0])) 43 {*movsi} (nil)
  (nil))

(call_insn 12 11 13 0 (parallel [
  (set (reg:SI 1 $c1)
  (call (mem:SI (symbol_ref:SI ("gen_T") [flags 0x41]
) [0 S4 A32])
  (const_int 0 [0x0])))
  (use (const_int 0 [0x0]))
  (clobber (reg:SI 31 $link))
  ]) 39 {*call_value_direct} (nil)
  (nil)
  (expr_list:REG_DEP_TRUE (use (reg:SI 1 $c1))
  (nil)))

(insn 13 12 14 0 (set (reg:SI 137)
  (reg:SI 1 $c1)) 43 {*movsi} (nil)
  (nil))

(insn 14 13 16 0 (set (reg:SI 135 [ D.1217 ])
  (reg:SI 137)) 43 {*movsi} (nil)
  (nil))

(insn 16 14 17 0 (set (reg:SI 138)
  (ior:SI (reg:SI 135 [ D.1217 ])
  (const_int 1 [0x1]))) 63 {iorsi3} (nil)
  (nil))

(insn 17 16 18 0 (set (reg:SI 134 [ D.1219 ])
  (zero_extend:SI (subreg:QI (reg:SI 138) 0))) 84 {zero_extendqisi2} (nil)
  (nil))

(insn 18 17 19 0 (set (reg/f:SI 139)
  (symbol_ref:SI ("a") [flags 0x2] )) 43
{*movsi} (nil)
  (nil))

(insn 19 18 21 0 (set (mem/c/i:QI (reg/f:SI 139) [0 a+0 S1 A8])
  (subreg/s/u:QI (reg:SI 134 [ D.1219 ]) 0)) 56 {*movqi} (nil)
  (nil))

<<<<< expansion of the if condition >>>>>

;; End of basic block 0, registers live:
(nil)

;; Start of basic block 1, registers live: (nil)
(note 25 23 27 1 [bb 1] NOTE_INSN_BASIC_BLOCK)

(insn 27 25 28 1 (set (reg:SI 142)
  (ior:SI (reg:SI 134 [ D.1219 ])
  (const_int 1 [0x1]))) 63 {iorsi3} (nil)
  (nil))

(insn 28 27 29 1 (set (reg:SI 133 [ temp.28 ])
  (zero_extend:SI (subreg:QI (reg:SI 142) 0))) 84 {zero_extendqisi2} (nil)
  (nil))

(insn 29 28 30 1 (set (reg/f:SI 143)
  (symbol_ref:SI ("a") [flags 0x2] )) 43
{*movsi} (nil)
  (nil))

(insn 30 29 32 1 (set (mem/c/i:QI (reg/f:SI 143) [0 a+0 S1 A8])
  (subreg/s/u:QI (reg:SI 133 [ temp.28 ]) 0)) 56 {*movqi} (nil)
  (nil))



Now the problem is that the CSE pass doesn't identify that the source
of the set in insn 27 is equivalent to the source of the set in insn
16. This seems to happen because of the zero_extend in insn 17. I am
using a 4.1 toolchain. With a 3.4.6 toolchain, however, no zero_extend
gets generated and the result of the ior operation is immediately
copied into memory. I am compiling this case with -O3. Can anybody
please tell me how this problem can be overcome?

TIA,
Pranav
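
A minimal C sketch of why statement 2 is redundant even with the zero
extension in between. It models insns 16/17 and 27/28 on plain unsigned
integers; the function names are illustrative only and are not part of
GCC.

```c
#include <stdint.h>

/* Models insns 16 and 17: OR with 1, then zero-extend the low byte. */
static uint32_t first_or(uint32_t call_result)
{
    uint32_t r138 = call_result | 1u;   /* insn 16: ior */
    return r138 & 0xffu;                /* insn 17: zero_extend of the QI subreg */
}

/* Models insns 27 and 28: the same two steps applied to reg 134. */
static uint32_t second_or(uint32_t r134)
{
    uint32_t r142 = r134 | 1u;          /* insn 27: ior */
    return r142 & 0xffu;                /* insn 28: zero_extend of the QI subreg */
}
```

Bit 0 is already set and the upper bits are already clear after
first_or, so second_or returns its argument unchanged. Seeing that
requires knowing which bits of reg 134 are set, not just comparing the
two ior expressions textually.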

Re: CSE not combining equivalent expressions.

2007-01-15 Thread pranav bhandarkar


On 1/15/07, Richard Guenther <[EMAIL PROTECTED]> wrote:

On 1/15/07, pranav bhandarkar <[EMAIL PROTECTED]> wrote:
> Hello Everyone,
> I have the following source code
>
> static int i;
> static char a;
>
> char foo_gen(int);
> void foo_assert(char);
> void foo ()
> {
>   int *x = &i;
>   a = foo_gen(0);
>   a |= 1; /* ---1--- */
>   if (*x) goto end;
>   a |= 1; /* ---2--- */
>   foo_assert(a);
> end:
>   return;
> }
>
> Now I expect the CSE pass to realise that 1 and 2 are equal and eliminate 2.
> However, the RTL before the first CSE pass is as follows:
>
> (insn 11 9 12 0 (set (reg:SI 1 $c1)
>(const_int 0 [0x0])) 43 {*movsi} (nil)
>(nil))
>
> (call_insn 12 11 13 0 (parallel [
>(set (reg:SI 1 $c1)
>(call (mem:SI (symbol_ref:SI ("gen_T") [flags 0x41]
> ) [0 S4 A32])
>(const_int 0 [0x0])))
>(use (const_int 0 [0x0]))
>(clobber (reg:SI 31 $link))
>]) 39 {*call_value_direct} (nil)
>(nil)
>(expr_list:REG_DEP_TRUE (use (reg:SI 1 $c1))
>(nil)))
>
> (insn 13 12 14 0 (set (reg:SI 137)
>(reg:SI 1 $c1)) 43 {*movsi} (nil)
>(nil))
>
> (insn 14 13 16 0 (set (reg:SI 135 [ D.1217 ])
>(reg:SI 137)) 43 {*movsi} (nil)
>(nil))
>
> (insn 16 14 17 0 (set (reg:SI 138)
>(ior:SI (reg:SI 135 [ D.1217 ])
>(const_int 1 [0x1]))) 63 {iorsi3} (nil)
>(nil))
>
> (insn 17 16 18 0 (set (reg:SI 134 [ D.1219 ])
>(zero_extend:SI (subreg:QI (reg:SI 138) 0))) 84 {zero_extendqisi2} (nil)
>(nil))
>
> (insn 18 17 19 0 (set (reg/f:SI 139)
>(symbol_ref:SI ("a") [flags 0x2] )) 43
> {*movsi} (nil)
>(nil))
>
> (insn 19 18 21 0 (set (mem/c/i:QI (reg/f:SI 139) [0 a+0 S1 A8])
>(subreg/s/u:QI (reg:SI 134 [ D.1219 ]) 0)) 56 {*movqi} (nil)
>(nil))
>
> <<<<< expansion of the if condition >>>>>>>>
>
> ;; End of basic block 0, registers live:
> (nil)
>
> ;; Start of basic block 1, registers live: (nil)
> (note 25 23 27 1 [bb 1] NOTE_INSN_BASIC_BLOCK)
>
> (insn 27 25 28 1 (set (reg:SI 142)
>(ior:SI (reg:SI 134 [ D.1219 ])
>(const_int 1 [0x1]))) 63 {iorsi3} (nil)
>(nil))
>
> (insn 28 27 29 1 (set (reg:SI 133 [ temp.28 ])
>(zero_extend:SI (subreg:QI (reg:SI 142) 0))) 84 {zero_extendqisi2} (nil)
>(nil))
>
> (insn 29 28 30 1 (set (reg/f:SI 143)
>(symbol_ref:SI ("a") [flags 0x2] )) 43
> {*movsi} (nil)
>(nil))
>
> (insn 30 29 32 1 (set (mem/c/i:QI (reg/f:SI 143) [0 a+0 S1 A8])
>(subreg/s/u:QI (reg:SI 133 [ temp.28 ]) 0)) 56 {*movqi} (nil)
>(nil))
>
>
>
> Now the problem is that the CSE pass doesn't identify that the source
> of the set in insn 27 is equivalent to the source of the set in insn
> 16. This seems to happen because of the zero_extend in insn 17. I am
> using a 4.1 toolchain. With a 3.4.6 toolchain, however, no zero_extend
> gets generated and the result of the ior operation is immediately
> copied into memory. I am compiling this case with -O3. Can anybody
> please tell me how this problem can be overcome?

CSE/FRE or VRP do not track bit operations and CSE of bits.  To overcome this
you need to implement such.



Thanks. Another question I have is that, in this case, will the following

http://gcc.gnu.org/wiki/Sign_Extension_Removal

help in removal of the sign / zero extension ?

Thanks in Advance,
Pranav

Re: relevant files for target backends

2007-01-16 Thread pranav bhandarkar


I am wondering where to define the prototypes for functions in
<target>.c. Shall the prototypes be defined in <target>-protos.h or in
<target>.h or in <target>.c? As far as I understand the prototypes
should be defined in <target>-protos.h, right? But if I do so several
errors/warnings arise because of undeclared prototypes.

Another question is where target macros should be defined. As far as I
can see <target>.c has something like such a structure:

---snip---
#define 
#define 
#define 
#define 

struct gcc_target targetm = TARGET_INITIALIZER;

<target>.h is used to define macros that give such information as the
register classes, whether the target is little endian or not, the sizes
of integral types, etc.
The file <target>.c, like you rightly said, defines the targetm
structure that holds pointers to target-related functions and data.
Such functions are defined in the <target>.c file, and the corresponding
target hooks are #defined there as well.
HTH,
Pranav
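
The pattern described above, where a port overrides hook macros before
the initializer is expanded, can be sketched in self-contained C. This
is only a toy: toy_target, TOY_INITIALIZER and the hook names are
hypothetical stand-ins for struct gcc_target, TARGET_INITIALIZER and the
real TARGET_* macros, and the costs are made up.

```c
/* Toy stand-in for struct gcc_target: a table of hook pointers. */
struct toy_target {
    int (*address_cost)(int);
    int (*branch_cost)(int);
};

/* Defaults, as a shared header would provide them. */
static int toy_default_cost(int x)
{
    (void)x;
    return 1;
}
#define TOY_ADDRESS_COST toy_default_cost
#define TOY_BRANCH_COST  toy_default_cost
#define TOY_INITIALIZER  { TOY_ADDRESS_COST, TOY_BRANCH_COST }

/* What the port's .c file does: define its own function, then
   redefine the hook macro so the initializer picks it up. */
static int myport_branch_cost(int predictable)
{
    return predictable ? 1 : 3;
}
#undef  TOY_BRANCH_COST
#define TOY_BRANCH_COST myport_branch_cost

/* The macros are expanded here, after the override. */
struct toy_target toy_targetm = TOY_INITIALIZER;
```

Because macro expansion happens at the point of use, the hook that was
not overridden keeps its default while branch_cost points at the port's
function.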

Re: relevant files for target backends

2007-01-16 Thread pranav bhandarkar


On 1/16/07, Markus Franke <[EMAIL PROTECTED]> wrote:

Thank you for your response. I understood everything you said but I am
still confused about the file <target>-protos.h. Which prototypes have
to be defined there?

Thanks in advance,
Markus Franke


IMO, and from what I have gathered from the documentation, the
prototypes of all the functions in the source file, i.e. the
<target>.c file, should go into <target>-protos.h.
HTH,
Pranav

Re: CSE not combining equivalent expressions.

2007-01-17 Thread pranav bhandarkar


Also this is removed for the case of integers by the CSE pass
IIRC . The problem arises only for the type being a char or a short.


Yes, that is true. With gcc 4.1 one of the 'or's gets eliminated for
'int'. Below are two sets of logs: the first from just before
cse_main, and the second from just after cse_main has returned but
before the trivially dead insns have been deleted.

Set 1: Before cse_main

(note 9 6 11 0 [bb 0] NOTE_INSN_BASIC_BLOCK)

(insn 11 9 12 0 (set (reg:SI 1 $c1)
   (const_int 0 [0x0])) 43 {*movsi} (nil)
   (nil))

(call_insn 12 11 13 0 (parallel [
   (set (reg:SI 1 $c1)
   (call (mem:SI (symbol_ref:SI ("gen_T") [flags 0x41]
) [0 S4 A32])
   (const_int 0 [0x0])))
   (use (const_int 0 [0x0]))
   (clobber (reg:SI 31 $link))
   ]) 39 {*call_value_direct} (nil)
   (nil)
   (expr_list:REG_DEP_TRUE (use (reg:SI 1 $c1))
   (nil)))

(insn 13 12 15 0 (set (reg:SI 134 [ D.1214 ])
   (reg:SI 1 $c1)) 43 {*movsi} (nil)
   (nil))

(insn 15 13 16 0 (set (reg:SI 133 [ D.1216 ])
   (ior:SI (reg:SI 134 [ D.1214 ])
   (const_int 1 [0x1]))) 64 {iorsi3} (nil)
   (nil))

(insn 16 15 17 0 (set (reg/f:SI 136)
   (symbol_ref:SI ("a") [flags 0x2] )) 43
{*movsi} (nil)
   (nil))

(insn 17 16 19 0 (set (mem/c/i:SI (reg/f:SI 136) [2 a+0 S4 A32])
   (reg:SI 133 [ D.1216 ])) 43 {*movsi} (nil)
   (nil))

<<<<< Expansion of the if condition >>>>>

(note 23 21 25 1 [bb 1] NOTE_INSN_BASIC_BLOCK)

(insn 25 23 26 1 (set (reg/f:SI 139)
   (symbol_ref:SI ("a") [flags 0x2] )) 43
{*movsi} (nil)
   (nil))

(insn 26 25 27 1 (set (reg:SI 140)
   (ior:SI (reg:SI 133 [ D.1216 ])
   (const_int 1 [0x1]))) 64 {iorsi3} (nil)
   (nil))

(insn 27 26 29 1 (set (mem/c/i:SI (reg/f:SI 139) [2 a+0 S4 A32])
   (reg:SI 140)) 43 {*movsi} (nil)
   (nil))

(code_label 29 27 30 2 2 ("end") [1 uses])
.. to function end


Set 2: After cse_main
(note 9 6 11 0 [bb 0] NOTE_INSN_BASIC_BLOCK)

(insn 11 9 12 0 (set (reg:SI 1 $c1)
   (const_int 0 [0x0])) 43 {*movsi} (nil)
   (nil))

(call_insn 12 11 13 0 (parallel [
   (set (reg:SI 1 $c1)
   (call (mem:SI (symbol_ref:SI ("gen_T") [flags 0x41]
) [0 S4 A32])
   (const_int 0 [0x0])))
   (use (const_int 0 [0x0]))
   (clobber (reg:SI 31 $link))
   ]) 39 {*call_value_direct} (nil)
   (nil)
   (expr_list:REG_DEP_TRUE (use (reg:SI 1 $c1))
   (nil)))

(insn 13 12 15 0 (set (reg:SI 134 [ D.1214 ])
   (reg:SI 1 $c1)) 43 {*movsi} (nil)
   (nil))

(insn 15 13 16 0 (set (reg:SI 133 [ D.1216 ])
   (ior:SI (reg:SI 134 [ D.1214 ])
   (const_int 1 [0x1]))) 64 {iorsi3} (nil)
   (nil))

(insn 16 15 17 0 (set (reg/f:SI 136)
   (symbol_ref:SI ("a") [flags 0x2] )) 43
{*movsi} (nil)
   (nil))

(insn 17 16 19 0 (set (mem/c/i:SI (reg/f:SI 136) [2 a+0 S4 A32])
   (reg:SI 133 [ D.1216 ])) 43 {*movsi} (nil)
   (nil))

<<<<< Expansion of the if condition >>>>>

(note 23 21 25 1 [bb 1] NOTE_INSN_BASIC_BLOCK)

(insn 25 23 26 1 (set (reg/f:SI 139)
   (reg/f:SI 136)) 43 {*movsi} (nil)
   (expr_list:REG_EQUAL (symbol_ref:SI ("a") [flags 0x2] )
   (nil)))

(insn 26 25 27 1 (set (reg:SI 140)
   (ior:SI (reg:SI 134 [ D.1214 ])
   (const_int 1 [0x1]))) 64 {iorsi3} (nil)
   (nil))

(insn 27 26 29 1 (set (mem/c/i:SI (reg/f:SI 136) [2 a+0 S4 A32])
   (reg:SI 133 [ D.1216 ])) 43 {*movsi} (nil)
   (nil))

(code_label 29 27 30 2 2 ("end") [1 uses])
 to function end

Therefore, as I see it, cse_main has taken the following steps:
1) Found that the source of the set in insn 26 is equivalent to the
source of the set in insn 15 and replaced the source in insn 26 with
that from insn 15.
2) Found that the destination of insn 15 (reg 133) holds the same value
as that of insn 26 (reg 140) and stored reg 133 in insn 27 instead, so
that the result of insn 26 (reg 140) is never used again (and insn 26
subsequently gets deleted).

However for the case of char or short, zero / sign extends are
generated after the 'ior' operations and as a result the source of the
second 'ior' is then not equal to the source of the first 'ior'.
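
The difference between the two cases can be illustrated with a toy
expression table. This is a sketch under assumptions, not GCC's cse.c:
the types, the single folding rule (x | c1) | c2 => x | (c1 | c2) and
all toy_ names are invented for illustration. Register numbers follow
the dumps above.

```c
#include <stddef.h>

enum toy_op { TOY_IOR, TOY_ZEXT8 };

/* One recorded definition: dest = op(src, imm). */
struct toy_def { int dest; enum toy_op op; int src; int imm; };

#define TOY_MAX 16
struct toy_table { struct toy_def d[TOY_MAX]; int n; };

/* Find the recorded definition of a register, if any. */
static const struct toy_def *toy_def_of(const struct toy_table *t, int reg)
{
    for (int i = 0; i < t->n; i++)
        if (t->d[i].dest == reg)
            return &t->d[i];
    return NULL;
}

/* Process "dest = op(src, imm)". Returns a register that already holds
   an equivalent value, or records the definition and returns dest. */
static int toy_process(struct toy_table *t, int dest, enum toy_op op,
                       int src, int imm)
{
    const struct toy_def *def = toy_def_of(t, src);

    /* Fold (x | c1) | c2 into x | (c1 | c2) when src is itself an IOR. */
    if (op == TOY_IOR && def != NULL && def->op == TOY_IOR) {
        imm |= def->imm;
        src = def->src;
    }
    for (int i = 0; i < t->n; i++)
        if (t->d[i].op == op && t->d[i].src == src && t->d[i].imm == imm)
            return t->d[i].dest;
    if (t->n < TOY_MAX)
        t->d[t->n++] = (struct toy_def){ dest, op, src, imm };
    return dest;
}
```

In the int case the second ior reads reg 133, which is itself an ior of
reg 134, so the fold applies and the lookup hits reg 133. In the char
case the second ior reads reg 134, which is defined by a zero extension,
so this model finds no match and records a new expression.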

Re: CSE not combining equivalent expressions.

2007-01-17 Thread pranav bhandarkar

On 1/17/07, Mircea Namolaru <[EMAIL PROTECTED]> wrote:

> Thanks. Another question I have is that, in this case, will the following
>
> http://gcc.gnu.org/wiki/Sign_Extension_Removal
>
> help in removal of the sign / zero extension ?

First, it seems to me that in your case:

(1) a = a | 1 /* a |= 1 */
(2) a = a | 1 /* a |= 1 */

the expressions "a | 1" in (1) and (2) are different as the "a"
is not the same. So there is nothing to do for CSE.

If the architecture has an instruction that does both the
store and the zero extension, the zero extension instructions
become redundant.

The sign extension algorithm is supposed to catch such cases, but
I suspect that in this simple case the regular combine is enough.

Mircea

Thanks for the info. I went through the documentation provided by you
in see.c, which I must add is very comprehensive indeed, and realised
that we need an instruction that does a zero extend before a store so
that the extension instructions become redundant and can be
removed.
Thank you,
Pranav

Re: CSE not combining equivalent expressions.

2007-01-18 Thread Pranav Bhandarkar

On 1/18/07, Richard Kenner <[EMAIL PROTECTED]> wrote:

> I'm not immediately aware of too many cases where lowering the IL is
> going to expose new opportunities to track and optimize nonzero/zero
> bits stuff.

Bitfield are the big one.  If you have both bitfield and logical operations,
you can often merge the logical operations with those used to retrieve
and/or store the field.

Things such as

x.y |= 1;

where Y is a bitfield and X non-BLKmode would be a large sequence of logical
operations that can all be replaced by a single OR insn at the RTL level
but presents no optimization opportunities at the tree level.

I have found another case where a zero / sign extend is inhibiting optimization

extern char b;
extern char c;
int main()
{

 b <<= 1;
 b <<= 1;
 b <<= 1;
 c = b;

 return 0;
}

Here again a zero extend gets generated after every 'ashift', whereas
we are interested only in the low-order 8 bits. However, when I try
the same thing with int instead of char, i.e. when there is no need for
extension, the operations get converted into b <<= 3 instead of 3
instructions.
regards,
Pranav
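
A small C sketch of the equivalence in question. It uses unsigned char
rather than plain char to keep the shifts free of signed-overflow
concerns, and it assumes 8-bit bytes; the function names are
illustrative.

```c
/* Three single-bit shifts, truncating to 8 bits after each one,
   as the extensions in the RTL effectively do. */
static unsigned char shift_three_times(unsigned char b)
{
    b = (unsigned char)(b << 1);
    b = (unsigned char)(b << 1);
    b = (unsigned char)(b << 1);
    return b;
}

/* One shift by three, truncating once at the end. */
static unsigned char shift_once_by_three(unsigned char b)
{
    return (unsigned char)(b << 3);
}
```

Bits shifted past bit 7 are discarded either way, so the intermediate
truncations do not change the low 8 bits of the result.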

Use of INSN_CODE

2007-02-01 Thread Pranav Bhandarkar


Hi All,
I am using recog_memoized in the machine dependent reorg pass.
However, it is causing an ICE because a CODE_LABEL is unwittingly
getting passed to it.

I understand that CODE_LABEL is in the RTX_EXTRA class, and intuitively
it is wrong to use INSN_CODE (which is used in recog_memoized) on a
CODE_LABEL simply because it is not in the RTX_INSN class.

However, the internals only warn against using INSN_CODE on use,
clobber, asm_input, addr_vec, addr_diff_vec. There is no mention of
the other members of RTX_EXTRA. Or shouldn't
recog_memoized have an INSN_P check in it?
Am I missing something here?
TIA,
Pranav

A combiner type optimization is causing a failure

2007-02-22 Thread Pranav Bhandarkar


Hello all,
I added a small optimization which does the following. It converts
a = a + 1
if ( a > 0 )

to
if ( a > -1)

a is a signed int.
However this is causing 920612-1.c to fail, which is reproduced below
for convenience.

f(j)int j;{return++j>0;}
main(){ if(f((~0U)>>1)) abort(); exit(0); }

The problem is that this testcase passes the number 2147483647 (int is
4 bytes for my architecture) to which if 1 is added an overflow will
occur. Since I remove the increment operation in 'f' through the
optimization 2147483647 never gets incremented and 'f' always returns
1 and the testcase fails.

My question is that, IMO the test is checking overflow behaviour. Is
it right to have such a test ?
Regards,
Pranav
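
A sketch of where the two forms disagree. Wraparound is modelled
explicitly in a wider type so that the sketch itself has no undefined
behaviour; it assumes long long is wider than int, and the function
names are illustrative.

```c
#include <limits.h>

/* "++j > 0" under two's-complement wraparound semantics. */
static int incr_then_compare(int j)
{
    long long r = (long long)j + 1;
    if (r > INT_MAX)
        r = INT_MIN;            /* model the wrap from INT_MAX to INT_MIN */
    return r > 0;
}

/* The transformed form: "j > -1", with no increment. */
static int compare_only(int j)
{
    return j > -1;
}
```

The two agree for every j except INT_MAX, which is the value the test
passes: with wraparound the incremented value is negative and the
comparison is false, while j > -1 is true.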

Re: A combiner type optimization is causing a failure

2007-02-22 Thread Pranav Bhandarkar


On 2/22/07, Paolo Bonzini <[EMAIL PROTECTED]> wrote:


> My question is that, IMO the test is checking overflow behaviour. Is
> it right to have such a test ?

Would you care to prepare a patch that moved it under gcc.dg, adding a {
dg-options "-O2 -fno-strict-overflow" } marker (or maybe "-O2
-fno-wrapv")?  But your optimization should also be conditional on
whether strict overflow behavior is requested.

Paolo


Thanks, I will prepare a patch. A 'strict overflow behaviour' check
also makes sense.
Thank you,
Pranav

Re: Question about Temporary Outputs

2011-03-31 Thread Pranav Bhandarkar

On Thu, Mar 31, 2011 at 4:04 PM, Iyer, Balaji V  wrote:
> Hello Everyone,
>               I see in GCC that when we use the flag "-f-tree-optimized" it 
> will dump the contents of the input file after doing all the tree-based 
> optimization. Is it possible for me to modify this file and then submit it 
> back into gcc for processing to create an executable/assembly dump?

It has been a while since I have worked on GCC - back then (a couple
of years ago) this was not possible. I do not have reason to believe
this would have changed.

Pranav

COMPONENT_REF problem ?

2009-10-03 Thread Pranav Bhandarkar

Hi,

Is it possible for a component_ref node to have its arg 0 be NULL?
I would think not, because from tree.def I gather that arg 0 tells me
which structure's field this component_ref refers to. For convenience, I
have pasted here what tree.def tells me about a component_ref


/* Value is structure or union component.
   Operand 0 is the structure or union (an expression).
   Operand 1 is the field (a node of type FIELD_DECL).
   Operand 2, if present, is the value of DECL_FIELD_OFFSET, measured
   in units of DECL_OFFSET_ALIGN / BITS_PER_UNIT.  */
DEFTREECODE (COMPONENT_REF, "component_ref", tcc_reference, 3)


I am working on a target hook wherein I use the MEM_EXPR of a mem rtx.
It returns the following component_ref node


unit size 
align 32 symtab 0 alias set -1 canonical type 0x2b359976d6c0
precision 32 min  max \

pointer_to_this >

arg 1 
used nonlocal decl_3 SI file ../src/synth_core/QDSP6/svrreg.h
line 134 col 11 size  unit siz\
e 
align 32 offset_align 64
offset  bit offset
 context 
chain 
used nonlocal decl_3 SI file
../src/synth_core/QDSP6/svrreg.h line 135 col 11 size  unit\
 size 
align 32 offset_align 64
offset 
bit offset  context
 chain >>>


Following this if I do
exp = TREE_OPERAND ( comp_ref_node, 0);

I get "exp" as NULL ?

Is this possible? Or is there something I am doing wrong? Or is there
something fishy with the tree node that MEM_EXPR is giving me?

Thanks,
Pranav

Re: COMPONENT_REF problem ?

2009-10-05 Thread Pranav Bhandarkar

Richard,

> If you are not working on trunk this can happen because the way
> MEM_EXPRs are "canonicalized".

Thanks. Yes, I am not on trunk and may not be able to move right away.
I would appreciate some pointers about where I should look if I want
to fix this.

Thanks,
Pranav

Re: COMPONENT_REF problem ?

2009-10-06 Thread Pranav Bhandarkar

> Look at
>
> 2009-07-14  Richard Guenther  
>            Andrey Belevantsev 
>
>        * tree-ssa-alias.h (refs_may_alias_p_1): Declare.
>        (pt_solution_set): Likewise.
>        * tree-ssa-alias.c (refs_may_alias_p_1): Export.
>        * tree-ssa-structalias.c (pt_solution_set): New function.
>        * final.c (rest_of_clean_state): Free SSA data structures.
> ...
>        * emit-rtl.c (component_ref_for_mem_expr): Remove.
>        (mem_expr_equal_p): Use operand_equal_p.
>        (set_mem_attributes_minus_bitpos): Do not use
>        component_ref_for_mem_expr.
> ...
>
> this change.
>
> Richard.

Great. Thanks.

Pranav

Dead Store Elimination

2009-10-21 Thread Pranav Bhandarkar

Hi,

A possible silly question about the dead store elimination pass. From
the documentation it is clear that the store S1 below is removed by
this pass (in dse.c)

*(addr) = value1;  // S1
.
.
*(addr) = value2  // S2 .. No read of "addr" between S1 and S2.
..
 = *(addr)   // Load
...
end_of_the_function

However, consider a different example.

*(addr) = value1;  // S1
..
.
end_of_the_function.

i.e. there is no store Sn that follows S1 along any path from S1 to
the end of the function and there is no read of addr following S1
either. Is the dse pass expected to remove such stores? (I am
inclined to think that it should, but I am seeing a case where dse
doesn't remove such stores.) Further, is the behaviour expected to be
different if "addr" is based on "fp"?

TIA,
Pranav

Re: Dead Store Elimination

2009-10-22 Thread Pranav Bhandarkar

> Are you talking about the tree dead-store elimination pass or
> the RTL one?  Basically *addr = value1; cannot be removed
> if addr does not point to local memory or if the pointed-to
> memory escapes through a call-site that is dominated by this store.

I am talking about the RTL dead-store elimination. In my case addr is
based on the stack pointer and the store is to a local variable on the
stack.

Thanks,
Pranav
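
The two situations discussed in this thread can be captured by a
backward scan over a straight-line sequence of accesses. This is a toy
model under stated assumptions, not the algorithm in dse.c: it has no
control flow, no calls and no aliasing, and every name in it is
invented for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

#define TOY_SLOTS 8

enum toy_kind { TOY_STORE, TOY_LOAD };

struct toy_access {
    enum toy_kind kind;
    int slot;    /* index of the memory slot, 0 .. TOY_SLOTS-1 */
    bool dead;   /* set by the scan for stores that can be removed */
};

/* slot_is_local[s] is true for a stack slot whose address does not
   escape. Returns the number of stores marked dead. */
static size_t toy_mark_dead_stores(struct toy_access *a, size_t n,
                                   const bool slot_is_local[TOY_SLOTS])
{
    bool needed[TOY_SLOTS];
    size_t removed = 0;

    /* At function end only non-local memory is still observable. */
    for (int s = 0; s < TOY_SLOTS; s++)
        needed[s] = !slot_is_local[s];

    for (size_t i = n; i-- > 0; ) {
        int s = a[i].slot;
        a[i].dead = false;
        if (a[i].kind == TOY_LOAD) {
            needed[s] = true;        /* a later reader wants the value */
        } else if (needed[s]) {
            needed[s] = false;       /* this store supplies it */
        } else {
            a[i].dead = true;        /* overwritten or never read */
            removed++;
        }
    }
    return removed;
}
```

In this model S1 followed by S2 and a load marks S1 dead, a lone store
to a local slot is dead at function end, and a lone store to non-local
memory is kept.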

Re: implementing load 8 byte instruction

2010-03-19 Thread Pranav Bhandarkar

On Thu, Mar 18, 2010 at 10:29 AM, roy rosen  wrote:
> Hi,
>
> I am trying to implement a simple load 8 bytes instruction.
> I tried to use movdi so that it would allocate two sequential
> registers for the load.
> It starts well but in pass subreg1 the insns are decomposed and all DI
> operands are replaced with SI.
>
> I understand that this is a desireable optimzation but then the load
> is done using two load 4 bytes instructions.
>
> Does anybody has any idea what should I do?


Could it be a problem with the constraints in your movdi define_insn ?

Pranav

Unable to access archives on gcc-patches

2006-01-24 Thread pranav bhandarkar

Hi,
I was having trouble building gcc and found that the problem I
was having had been reported earlier, and that a patch to fix it had
been submitted in Feb 2004.
However, when I try to access the mailing list archives, I am only able
to reach the index page for Feb 2004, the link to which is:

 http://gcc.gnu.org/ml/gcc-patches/2004-02/

The patch that I am looking for is not accessible. The link to
the patch is

http://gcc.gnu.org/ml/gcc-patches/2004-02/msg00826.html

Just to check, I tried accessing other patches from the index link but
could not access most of them.
Can anybody please help me out?
Thanks in advance,
Pranav
--
"So far as I am able to judge, nothing has been left undone,
 either by man or nature, to make India the most extraordinary
country that the sun visits on his rounds. Nothing seems to
have been forgotten, nothing overlooked."
Mark Twain

The effects of closed loop SSA and Scalar Evolution Const Prop.

2008-03-11 Thread Pranav Bhandarkar

Hi,

I am writing about a problem I noticed with the code generated for
memcpy by arm-none-eabi-gcc.

Now, memcpy has three distinct loops: one that copies (4 * sizeof
(long)) bytes per iteration, one that copies sizeof (long) bytes per
iteration and the last one that copies one byte per iteration. The
registers used for the src and destination pointers should, IMHO, be
the same across all three loops. However, what I noticed is that
after the first loop the src and dest registers aren't reused; instead
the address of the next byte to be copied is recalculated as (original
src address + (number of iterations of 1st loop * 16)). The same
happens for the destination address. See the assembly snippet below.

.L3:
   
bhi .L3 @,
sub r2, r4, #16 @ D.1312, len0,
mov r3, r2, lsr #4  @ D.1314, D.1312,
sub r1, r2, r3, asl #4  @ len, D.1312, D.1314,
add r3, r3, #1  @ tmp225, D.1314,
mov r3, r3, asl #4  @ D.1320, tmp225,
cmp r1, #3  @ len,
add r4, r5, r3  @ aligned_src.60, src0, D.1320

Problem with Fix for PR 35163 ?

2008-04-08 Thread Pranav Bhandarkar

Hi,

Consider the attached testcase.

I am working on a private port (in fact I see this problem with
arm-none-eabi-gcc too). I see the following in test.c.003t.original:

fail = (short int) usi <= ssi;

And then in test.c.025t.ssa
 usi.2_5 = (short int) usi_4;
 fail.3_6 = usi.2_5 <= ssi_2;

Now ccp1 does constant propagation and we are left with
usi.2_5 = -256;

This causes the test to fail.

Clearly the problem seems to be that, since usi is an unsigned short int, a
short int can't represent all the possible values of usi.

I reverted the following patch and the test passed.
 PR middle-end/35163
* fold-const.c (fold_widened_comparison): Use get_unwidened in
value-preserving mode.  Disallow final truncation.

Now with the patch reverted, test.c.003t.original has

 fail = (int) ssi >= (int) usi;

And this problem vanished.


Am I missing something here ?

Thanks,
Pranav

#include <stdio.h>

int fail;
short fs2(void)
{
return 126;
}

unsigned short ufs1(void)
{
return 65280;
}
int main ()
{

short  ssi;
unsigned short  usi;

ssi = fs2();
usi = ufs1();
fail = !(ssi < usi);

if (fail)
	printf ("Failed\n");
else
	printf ("Successful\n");

return 0;

}
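
The difference between the two folded forms can be reproduced without
the compiler. A minimal sketch, assuming a 16-bit short and writing the
narrowing conversion out by hand so the result does not depend on
implementation-defined behaviour; the function names are illustrative.

```c
/* The comparison as written in the test: both operands promoted to int,
   matching "(int) ssi >= (int) usi". */
static int fail_promoted(short ssi, unsigned short usi)
{
    return !((int)ssi < (int)usi);
}

/* The problematic form "(short int) usi <= ssi": usi is narrowed to a
   16-bit signed value before the comparison. */
static int fail_narrowed(short ssi, unsigned short usi)
{
    int narrowed = usi >= 32768 ? (int)usi - 65536 : (int)usi;
    return narrowed <= ssi;
}
```

With ssi = 126 and usi = 65280 the promoted form yields 0, while the
narrowed form compares -256 <= 126 and yields 1, which is the failure
reported above.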

Re: Problem with Fix for PR 35163 ?

2008-04-08 Thread Pranav Bhandarkar

>  Btw, I have a fix.

Oh, great. I just saw your post on gcc-patches. Do you still want me
to add this to PR35163 for the record?

Cheers!
Pranav

Common Subexpression Elimination Opportunity not being exploited

2008-05-02 Thread Pranav Bhandarkar

Hi,

I have a case where the code looks roughly like
foo = i1 <op> i2;
if (test1) bar1 = i1 <op> i2;
if (test2) bar2 = i1 <op> i2;

This can get converted into
reg = i1 <op> i2
foo = reg
if (test1) bar1 = reg
if (test2) bar2 = reg

GCC 4.3 does fine here except when the operator is "logical and" (see
attached: test.c uses logical and, and test1.c uses plus).

It seems that shortcut evaluation of the "logical and" throws off
common subexpression elimination, i.e. the code generated is like
foo = (i1 != 0 ) && (i2 != 0)
if (test1)
{
if (i1 != 0)
then lab2:
else lab 3:
lab2:
if (i2 != 0)
then lab 4
lab 3:
temp = 0;
lab 4 :
temp = 1
bar1 = temp
}

if (test2)
{

same as the body of the above if condition.
}

The entire body of the two if stmts can be pulled out but it isn't.

This kind of cse works when the operator is an arithmetic operator like +.

Am I missing something here? Is there something I can do to improve
the case with the "logical and"?
Maybe this is difficult on hardware that uses the CC register? But I
am working on a private port that doesn't use the CC register.

Thanks,
Pranav
/* test.c: uses logical and */
short ione;
short itwo;
short ithree;
short ifour;
short ifive;
int one_array[50];
short a[50];


void foo ()
{
  ione = gen_short(1);
  itwo = gen_short(1);
  ithree = gen_short(1);
  ifour = gen_short(1);
  ifive = gen_short(1);


  a[0] = ione && itwo && ithree && ifour && ifive;

  if (one_array[1])
    a[1] = ione && itwo && ithree && ifour && ifive;
  if (one_array[2])
    a[2] = ione && itwo && ithree && ifour && ifive;

  return;
}
/* test1.c: uses plus */
short ione;
short itwo;
short ithree;
short ifour;
short ifive;
int one_array[50];
short a[50];


void foo ()
{
  ione = gen_short(1);
  itwo = gen_short(1);
  ithree = gen_short(1);
  ifour = gen_short(1);
  ifive = gen_short(1);


  a[0] = ione + itwo + ithree + ifour + ifive;

  if (one_array[1])
    a[1] = ione + itwo + ithree + ifour + ifive;
  if (one_array[2])
    a[2] = ione + itwo + ithree + ifour + ifive;

  return;
}
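
For reference, the transformation being asked for can be written by
hand. A minimal sketch with two operands instead of five; toy_out and
both function names are invented for illustration, and the operands
are plain values with no side effects, which is what makes the
rewrite valid.

```c
struct toy_out { short a0, a1, a2; };

/* The shape of test.c: the logical and is re-evaluated in each branch. */
static struct toy_out toy_and_repeated(short i1, short i2,
                                       int test1, int test2)
{
    struct toy_out out = { 0, 0, 0 };
    out.a0 = i1 && i2;
    if (test1) out.a1 = i1 && i2;
    if (test2) out.a2 = i1 && i2;
    return out;
}

/* Hand-applied CSE: without side effects, "i1 && i2" equals the
   branch-free "(i1 != 0) & (i2 != 0)", which is computed once. */
static struct toy_out toy_and_hoisted(short i1, short i2,
                                      int test1, int test2)
{
    struct toy_out out = { 0, 0, 0 };
    short both = (short)((i1 != 0) & (i2 != 0));
    out.a0 = both;
    if (test1) out.a1 = both;
    if (test2) out.a2 = both;
    return out;
}
```

Writing the source this way sidesteps the short-circuit control flow,
at the cost of always evaluating both comparisons.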

Re: Pipeline hazards and delay slots

2008-05-04 Thread Pranav Bhandarkar

Hi Mohammed,

>  But how can i handle instances like this? Should i be doing insertion
>  of nops in reorg pass?

FWIW, I worked on a port for a VLIW processor about three years back
and IIRC we used the reorg pass for inserting the nops. I think
if you look at the scheduler dumps you will notice that the scheduler
has, in all likelihood, accounted for the delay of 1 cycle
between the "lw" and the "add" instructions. You will just have
to put the "nop" between these two instructions yourself.

cheers!
Pranav

Re: How to use gcc source and try new optmization techniques

2008-08-20 Thread Pranav Bhandarkar

Hi,

I may not have correctly understood your questions but from what I
understand I think you mean to ask how you could easily plug in your
optimization pass into GCC so as to test your implementation of some
optimization.

Well, the way to do that would be to understand the pass structure and
decide where in the order of passes should your pass be inserted i.e
after which and before which other pass should your optimization pass
fit in. Look at passes.c to see how the order of passes is specified.

Once you have told the compiler when to execute your pass (primarily
through passes.c) and provided your optimization has been correctly
implemented in the context of GCC you should be good to go.

HTH,
Pranav
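
The idea of slotting a pass into an ordered list can be sketched in
self-contained C. This is a toy under assumptions and not GCC's pass
manager: toy_pass, the two helper functions and the sample passes are
all hypothetical, and the "IR" is just an int.

```c
#include <stddef.h>
#include <string.h>

struct toy_pass {
    const char *name;
    int (*execute)(int ir);     /* transforms a stand-in "IR" value */
    struct toy_pass *next;
};

/* Insert new_pass right after the pass named after_name.
   Returns 0 on success, -1 if no such pass exists. */
static int toy_insert_after(struct toy_pass *head, const char *after_name,
                            struct toy_pass *new_pass)
{
    for (struct toy_pass *p = head; p != NULL; p = p->next) {
        if (strcmp(p->name, after_name) == 0) {
            new_pass->next = p->next;
            p->next = new_pass;
            return 0;
        }
    }
    return -1;
}

/* Run every pass in list order. */
static int toy_run_passes(const struct toy_pass *head, int ir)
{
    for (const struct toy_pass *p = head; p != NULL; p = p->next)
        ir = p->execute(ir);
    return ir;
}

/* Two sample passes so the ordering effect is visible. */
static int toy_add_one(int ir) { return ir + 1; }
static int toy_double(int ir)  { return ir * 2; }
```

Where the new pass lands changes the result, which is why deciding its
position relative to the existing passes comes first.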

On Wed, Aug 20, 2008 at 7:45 PM, Rohan Sreeram <[EMAIL PROTECTED]> wrote:
> Hi,
>
> I am a student in Utah State University researching on compilers optimization 
> techniques.
> I wanted to know how I could use gcc for experimenting with optimization.
>
> Here is what I intend to do:
>
> 1) Understand the control flow graphs being generated by GCC, which I could 
> build using the -fdump-tree-cfg option.
> 2) Write code that could convert CFG to a different graph.
> 3) Use this new graph for optimization.
> 4) Use the optimized graph to generate machine code (say Intel architecture 
> for now).
>
> Please let me know how I could leverage the GCC code for this purpose, and if 
> you any suggestions/comments for me you are most welcome ! :)
>
> Thanks,
> Rohan

Re: Extending RTL expansion and CG with a new operation

2007-06-26 Thread Pranav Bhandarkar


kerneltest.c:22: error: unrecognizable insn:
(jump_insn 26 25 29 3 (set (pc)
(create_body_after (cre (reg:DI 75)
(const_int 0 [0x0]))
(label_ref 13)
(pc))) -1 (nil)
(nil))
kerneltest.c:22: internal compiler error: in extract_insn, at recog.c:2096
Please submit a full bug report,
with preprocessed source if appropriate.
See <http://gcc.gnu.org/bugs.html> for instructions.


Find a pattern that is close to this one in your md file ( i.e. the
define_insn that you think should match this pattern, assuming such a
define_insn exists in your md file) and then try to check the reason
for recog crashing. IMO, the predicate may not be passing.

regards,
Pranav

Re: Execute test fails in gcc testsuite

2007-07-18 Thread Pranav Bhandarkar


On 7/18/07, Venkatesan Jeevanandam <[EMAIL PROTECTED]> wrote:



I am working on the testsuite for a new crosscompiler hosted on x86
Platform,

While performing execute test using gcc testsuite,

I am getting the error message in execute test
/tmp/2112-1.x0: /tmp/2112-1.x0: cannot execute binary file

I know, have to use cross compiler simulator, for executing
$Arch-sim /tmp/2112-1.x0

But I don't know where to mention this configuration.


Since you haven't mentioned anything about your .exp file in
dejagnu/baseboards, I am assuming you haven't already written one.
Therefore you will need to write a .exp file (e.g. arm-sim.exp)
and put it in the dejagnu/baseboards directory. Then, while doing make
check-gcc, pass it using the RUNTESTFLAGS command line argument.
Typically,
$> make check-gcc RUNTESTFLAGS="--target_board=<target>-sim"

HTH,
Pranav

CSE removing a load that is necessary

2007-07-27 Thread Pranav Bhandarkar

Hi All,
I am working on a private port and am seeing the following problem.
For a function returning a double the value is stored by the function
in memory. cse removes one of the two loads (to retrieve this returned
value) after the function is called.

To elaborate, the following is the dump just before cse.

(insn 44 43 45 2 test.c:388 (set (reg:SI 1 $c1)
(reg/f:SI 112 *fp*)) 44 {*movsi} (expr_list:REG_LIBCALL_ID
(const_int 2 [0x2])
(nil)))

(insn 45 44 46 2 test.c:388 (set (reg:SI 2 $c2)
(reg:SI 136 [ D.1517 ])) 44 {*movsi} (expr_list:REG_LIBCALL_ID
(const_int 2 [0x2])
(nil)))

(call_insn 46 45 49 2 test.c:388 (parallel [
(call (mem:SI (symbol_ref:SI ("__floatunsidf") [flags
0x41]) [0 S4 A32])
(const_int 0 [0x0]))
(use (const_int 0 [0x0]))
(clobber (reg:SI 31 $link))
]) 41 {*call_direct} (expr_list:REG_LIBCALL_ID (const_int 2 [0x2])
(expr_list:REG_EH_REGION (const_int -1 [0x])
(nil)))
(expr_list:REG_DEP_TRUE (use (reg:SI 2 $c2))
(expr_list:REG_DEP_TRUE (use (reg:SI 1 $c1))
(nil

(insn 49 46 116 2 test.c:388 (clobber (reg:SI 179)) -1
(expr_list:REG_LIBCALL_ID (const_int 2 [0x2])
(nil)))

(insn 116 49 47 2 test.c:388 (clobber (reg:SI 180 [+4 ])) -1 (nil))

(insn 47 116 48 2 test.c:388 (set (reg:SI 179)
(mem/c/i:SI (reg/f:SI 112 *fp*) [7 S4 A32])) 44 {*movsi}
(expr_list:REG_LIBCALL_ID (const_int 2 [0x2])
(nil)))

(insn 48 47 50 2 test.c:388 (set (reg:SI 180 [+4 ])
(mem/c/i:SI (plus:SI (reg/f:SI 112 *fp*)
(const_int 4 [0x4])) [7 S4 A32])) 44 {*movsi}
(expr_list:REG_LIBCALL_ID (const_int 2 [0x2])
(expr_list:REG_EQUAL (float:DF (reg:SI 136 [ D.1517 ]))



cse modifies insn 48 as

(insn 48 47 50 2 test.c:388 (set (reg:SI 180 [+4 ])
(reg:SI 178 [+4 ])) 44 {*movsi} (expr_list:REG_LIBCALL_ID
(const_int 2 [0x2])
(expr_list:REG_EQUAL (float:DF (reg:SI 136 [ D.1517 ]))
(nil
(nil

and also replaces every subsequent use of (reg:SI 180 [+4 ]) with
(reg:SI 178 [+4 ]) thus making the above load dead, which gets
subsequently removed. This way the result of the function call is
lost.

My take is that insn 48 should have a REG_RETVAL note (in fact it
does have one, but the note is removed by lower_subreg) and cse should
be careful when REG_RETVAL and REG_EQUAL appear in the same insn. Is
this the right way of going about it?

Sorry for a rather verbose post.

Thanks in advance,
Pranav

Re: CSE removing a load that is necessary

2007-07-27 Thread Pranav Bhandarkar

> Where does reg 178 come from?  It does not appear in the other insns
> you listed.

I am sorry, my mistake. I meant to say that the dump was only a part
of the entire dump of the function. reg 178 is the result of a
previous call to __floatsidf and is defined by the following insn.

(insn 19 18 21 2 test.c:356 (set (reg:SI 178 [+4 ])
(mem/c/i:SI (plus:SI (reg/f:SI 112 *fp*)
(const_int 4 [0x4])) [7 S4 A32])) 44 {*movsi}
(expr_list:REG_LIBCALL_ID (const_int 1 [0x1])
(expr_list:REG_EQUAL (float:DF (reg:SI 136 [ D.1517 ]))
(nil


> Am I reading your code correctly when it appears that the
> __floatunsidf function returns a value in memory rather than via a
> register?

For our architecture, we have had to follow this convention of
returning a double value by storing it in memory and loading it after
the function call. The ABI reserves only one SI mode register for
return values and hence the use of memory for returning a double.


> If lower-subreg split up the load from memory, then it was correct to
> remove the REG_RETVAL note.  There may be a bug here in that it should
> also remove the REG_EQUAL note in that case.  It may be that
> remove_retval_note needs to look for and remove a REG_EQUAL note.

Ok, so the approach should be to fix remove_retval_note so that it
also removes the REG_EQUAL note, rather than to skip the call to
remove_retval_note altogether. I will submit a patch for comments.
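
To make the idea concrete, here is a minimal sketch on a toy note list.
It does not use GCC's real rtx structures: the toy_note type, the
TOY_* kinds and both helper functions are hypothetical names invented
for illustration, and the real remove_retval_note may look quite
different.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins for a few reg-note kinds; not GCC's real enum.  */
enum toy_note_kind { TOY_REG_RETVAL, TOY_REG_EQUAL, TOY_REG_LIBCALL_ID };

struct toy_note
{
  enum toy_note_kind kind;
  struct toy_note *next;
};

/* Unlink every note of KIND from the list at *HEAD and return how
   many notes were unlinked.  */
int
toy_remove_notes (struct toy_note **head, enum toy_note_kind kind)
{
  int removed = 0;

  while (*head)
    {
      if ((*head)->kind == kind)
        {
          *head = (*head)->next;
          removed++;
        }
      else
        head = &(*head)->next;
    }
  return removed;
}

/* Hypothetical variant of the proposed fix: when the REG_RETVAL note
   is dropped, drop the REG_EQUAL note along with it, since it
   described the value of the whole libcall sequence.  */
void
toy_remove_retval_note (struct toy_note **head)
{
  if (toy_remove_notes (head, TOY_REG_RETVAL) > 0)
    toy_remove_notes (head, TOY_REG_EQUAL);
}
```

In this model an insn carrying REG_LIBCALL_ID, REG_RETVAL and REG_EQUAL
notes is left with only the REG_LIBCALL_ID note, so a later pass has no
stale equivalence to act on.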

Thanks,
Pranav

Re: CSE removing a load that is necessary

2007-07-30 Thread Pranav Bhandarkar

Hi,

>   Or perhaps this could be another manifestation of the "cse gets confused by
> reg_equal notes on subparts of dimode pseudos if no movdi pattern is defined
> in the backend" bug[*]?  Pranav, is there a movdi pattern in your backend?
> There needs to be one, gcc does get it wrong if you rely on it to break
> everything down to si-sized movs.

Yes, it looks like a similar problem, but there seems to be no
consensus on a correct solution. I couldn't find the bug number, but
this thread describes the exact same problem (but with REG_EQUIV
notes).

http://gcc.gnu.org/ml/gcc/2001-02/msg01372.html


Thanks,
Pranav

ICE on valid code, cse related

2007-08-01 Thread Pranav Bhandarkar

Hi,
I am working on a private port and getting an ICE on valid code. This
is mainly because of the following (which is part of the entire
RTL dump of the source file)

(insn 13 8 14 2 /fc3/testcases/reduce/testcase-min.i:8 (set (reg:SI 138)
(const_int 0 [0x0])) 44 {*movsi} (expr_list:REG_LIBCALL_ID
(const_int 0 [0x0])
(nil)))

(insn 14 13 15 2 /fc3/testcases/reduce/testcase-min.i:8 (set (reg:SI 1 $c1)
(reg/f:SI 112 *fp*)) 44 {*movsi} (expr_list:REG_LIBCALL_ID
(const_int 1 [0x1])
(insn_list:REG_LIBCALL 17 (nil

(insn 15 14 16 2 /fc3/testcases/reduce/testcase-min.i:8 (set (reg:SI 2 $c2)
(reg:SI 138)) 44 {*movsi} (expr_list:REG_LIBCALL_ID (const_int 1 [0x1])
(nil)))

(call_insn 16 15 18 2 /fc3/testcases/reduce/testcase-min.i:8 (parallel [
(call (mem:SI (symbol_ref:SI ("__floatsisf") [flags 0x41])
[0 S4 A32])
(const_int 0 [0x0]))
(use (const_int 0 [0x0]))
(clobber (reg:SI 31 $link))
]) 41 {*call_direct} (expr_list:REG_LIBCALL_ID (const_int 1 [0x1])
(expr_list:REG_EH_REGION (const_int -1 [0x])
(nil)))
(expr_list:REG_DEP_TRUE (use (reg:SI 2 $c2))
(expr_list:REG_DEP_TRUE (use (reg:SI 1 $c1))
(nil

(insn 18 16 17 2 /fc3/testcases/reduce/testcase-min.i:8 (clobber
(reg:SF 139)) -1 (expr_list:REG_LIBCALL_ID (const_int 1 [0x1])
(nil)))

(insn 17 18 19 2 /fc3/testcases/reduce/testcase-min.i:8 (set
(subreg:SI (reg:SF 139) 0)
(mem/c/i:SI (reg/f:SI 112 *fp*) [2 S4 A32])) 44 {*movsi}
(expr_list:REG_LIBCALL_ID (const_int 1 [0x1])
(insn_list:REG_RETVAL 14 (expr_list:REG_EQUAL (float:SF (reg:SI 138))
(nil


Note the REG_EQUAL note of insn 17. cse tries to replace reg:SI 138
with a constant and because of insn 13, the note becomes (float:SF
(const_int 0)) which in turn cse converts into

REG_EQUAL (const_double:SF 0 [0x0] 0.0 [0x0.0p+0])

and when CONST_DOUBLE_LOW is done on the above, the compiler crashes -

" internal compiler error: RTL check: expected code 'const_double' and
mode 'VOID', have code 'const_double' and mode 'SF' in plus_constant,
at explow.c:103"

i.e. the compiler crashes after converting a const_int to an SFmode value.

Could this possibly be a generic issue, or is it a problem with my
backend (as in, will I need to define movsf in my backend, which isn't
defined at present)?

Regret the rather verbose post.

Thanks in advance,
Pranav

Re: ICE on valid code, cse related

2007-08-01 Thread Pranav Bhandarkar

> Who is calling CONST_DOUBLE_LOW on this value?
plus_constant calls CONST_DOUBLE_LOW on this value.

simplify_binary_operation_1 calls plus_constant ( while trying to
simplify PLUS on (const_double:SF 0 [0x0] 0.0 [0x0.0p+0]) & (const_int
-2147483648 [0x8000]) ), which in turn calls CONST_DOUBLE_LOW.

Thanks,
Pranav

Re: ICE on valid code, cse related

2007-08-02 Thread Pranav Bhandarkar

> How can we have a PLUS on a CONST_DOUBLE and a CONST_INT?  That does
> not make sense, as there is no MODE argument that could make this work
> correctly.  From your description, MODE must be some integer mode, in
> which case it is wrong to be using a CONST_DOUBLE in SFmode.
>
> (I don't know where the bug is; I'm just trying to help pin it down.)
Here it is!!
The problem is that for the following insn
(insn 20 19 21 2
 (set (reg:SI 141)
(xor:SI (subreg:SI (reg:SF 139) 0)
(reg:SI 140))) 65 {xorsi3} (expr_list:REG_EQUAL
(const_double:SF 0 [0x0] -0.0 [-0x0.0p+0])
(nil)))

reg:SI 140 is known to have the constant value
(const_int -2147483648 [0x8000]))
 and (subreg:SI (reg:SF 139) 0) is known to have the value
(const_double:SF 0 [0x0] 0.0 [0x0.0p+0])   [ as described by my
previous post in the same thread ]

Now, simplify_binary_operation_1 in simplify-rtx.c tries to

 /* Canonicalize XOR of the most significant bit to PLUS.  */
(simplify-rtx.c:2203)

and this results in a PLUS on a CONST_INT and a CONST_DOUBLE.

Maybe there should be a better check before canonicalizing here?
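
For reference, the canonicalization relies on an identity that holds
for integers: flipping the most significant bit gives the same result
as adding that bit with wraparound. The sketch below checks this for
32-bit values only. The function names are made up for illustration
and nothing here is GCC code. Once one operand has been replaced by an
SFmode constant, the identity no longer applies.

```c
#include <assert.h>
#include <stdint.h>

/* XOR with the most significant bit of a 32-bit integer.  */
uint32_t
flip_msb_xor (uint32_t x)
{
  return x ^ UINT32_C (0x80000000);
}

/* The same operation written as an addition.  Unsigned arithmetic
   wraps modulo 2^32, so the carry out of bit 31 is discarded and the
   low 31 bits are unchanged.  */
uint32_t
flip_msb_plus (uint32_t x)
{
  return x + UINT32_C (0x80000000);
}
```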

Thanks,
Pranav

Re: ICE on valid code, cse related

2007-08-02 Thread Pranav Bhandarkar

> reg:SI 140 is known to have the constant value
> (const_int -2147483648 [0x8000]))
Maybe I wasn't clear in my previous post. reg:SI 140 is known to have
this const_int value from a previous copy into it, here:

(insn 19 17 20 2 /fc3/scratchpad/testcase-min.i:8 (set (reg:SI 140)
(const_int -2147483648 [0x8000])) 44 {*movsi} (nil))


Thanks,
Pranav

Re: ICE on valid code, cse related

2007-08-03 Thread Pranav Bhandarkar

> (reg:SF 139) can hold the value (const_double:SF 0) but (subreg:SI
> (reg:SF 139)) should be the value (const_int 0).  Perhaps the problem
> is how we handle a REG_EQUAL note when the destination of the set is a
> SUBREG.

(subreg:SI (reg:SF 139) 0) shouldn't be able to hold the value
(float:SF (reg:SI 138)). Am I right on this? I ask because the
expander generates the following while storing the return value
(insn 17 18 19 testcase-min.i:8 (set (subreg:SI (reg:SF 139) 0)
(mem/c/i:SI (reg/f:SI 129 virtual-stack-vars) [2 S4 A32])) -1
(expr_list:REG_LIBCALL_ID (const_int 1 [0x1])
(insn_list:REG_RETVAL 14 (expr_list:REG_EQUAL (float:SF (reg:SI 138))
(nil)

The answer to this question will help me decide where to fix the
problem: in the expander itself, or while processing REG_EQUAL in the
cse pass.
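
A small illustration of why the SImode view of an SFmode value is not
the same thing as the float value itself. This is only a sketch,
assuming float is IEEE 754 binary32; the helper names are made up.
The SImode bits of (float) n equal n only for n == 0, and XORing the
sign bit, which is what the xorsi3 insn quoted earlier in this thread
appears to do, negates the float.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Return the 32-bit pattern of F, roughly what an SImode subreg of
   an SFmode register would see.  Assumes IEEE 754 binary32.  */
uint32_t
sf_bits (float f)
{
  uint32_t u;
  memcpy (&u, &f, sizeof u);
  return u;
}

/* Reinterpret a 32-bit pattern as a float.  */
float
bits_to_sf (uint32_t u)
{
  float f;
  memcpy (&f, &u, sizeof f);
  return f;
}

/* Negate F by flipping its sign bit in the integer view.  */
float
negate_via_xor (float f)
{
  return bits_to_sf (sf_bits (f) ^ UINT32_C (0x80000000));
}
```

With these helpers, sf_bits ((float) 1) is 0x3f800000 rather than 1,
while sf_bits ((float) 0) happens to be 0.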

Thanks,
Pranav

Re: ICE on valid code, cse related

2007-08-08 Thread Pranav Bhandarkar

Hi,

>   Pranav, although there is indeed a bug in the mid-end here, from your point
> of view the simple and effective workaround should be to implement a movdi
> pattern (and movsf and movdf if you don't have them yet: it's an absolute
> requirement to implement movMM for any modes you expect your machine to
> handle) in the backend.  This won't fix the underlying bug, but it'll stop it
> from affecting you, and you'll get better codegen all round into the bargain
> if you expand movdi early.

It worked!!! I implemented the movsf pattern (and also movdf, so that
its absence won't affect me in the future). Due
to the movsf pattern, the return value is now restored with

(insn 17 16 18 testcase-min.i:8 (set (reg:SF 139)
(mem/c/i:SF (reg/f:SI 129 virtual-stack-vars) [2 S4 A32])) -1
(expr_list:REG_LIBCALL_ID (const_int 1 [0x1])
(insn_list:REG_RETVAL 14 (expr_list:REG_EQUAL (float:SF (reg:SI 138))
(nil)

i.e. there is no subreg in the destination.
Later in cse, when the above REG_EQUAL (float:SF (reg:SI 138)) note is
converted into REG_EQUAL (const_double:SF 0 [0x0] 0.0 [0x0.0p+0]), it
doesn't replace
(subreg:SI (reg:SF 139) 0) in the insn
 (set (reg:SI 141)
   (xor:SI (subreg:SI (reg:SF 139) 0)
   (reg:SI 140))) 65 {xorsi3} (expr_list:REG_EQUAL
(const_double:SF 0 [0x0] -0.0 [-0x0.0p+0])
   (nil)))

and the compiler doesn't crash :)

Thanks Dave and Ian for your help!!

cheers!
Pranav

Re: How to add target specific dependency?

2007-08-23 Thread Pranav Bhandarkar

On 8/23/07, petruk_gile <[EMAIL PROTECTED]> wrote:
>
> Hi all ..
>
> I'm currently porting GCC into a new processor, and I have a problem in
> instruction scheduling ...
>
> The case is like this:
> In the machine description (*.md) file, sometimes I emit a single RTL
> instruction into multiple ASM instruction. The problem is, in some case I
> need to emit an operand that actually doesn't exist in its RTL
> representation. For example :
>
> "movpqi_insn x,y" instruction will be translated as ==> "sar x, *ar15"  and
> "lar y, *ar15"
>
>   Where "sar" means "Store register" and "lar" means "load register" ...
>
> Since GCC performs instruction scheduling in RTL form, it doesn't know that
> instruction "movpqi_insn" actually reads AR15  Hence, sometimes GCC
> moves an instruction that actually SHOULD NOT be moved, due to data
> dependence in AR15, and it causes incorrect scheduling  (This is my
> analysis, please tell me if you guys think I'm wrong)...
>
> So, I need to:
> (1) whether disable the scheduling for that particular dependency, or
> (2) Inform GCC that "movpqi_insn" has an additional dependency in AR15 
>
> The problem is, I still don't know how can i do those 2 things ... So if any
> of you have any advice, I'd be really grateful  :D
Ideally you could either split movpqi_insn (into a store and a load)
during expansion, or split the insn appropriately just before
instruction scheduling (using a define_insn_and_split).

cheers!
Pranav

Identifying a block copy

2007-10-09 Thread Pranav Bhandarkar

Hi,
consider the following code,

struct x { int a; int b; int c; int d; int e[120];};
struct x *a, *b;
void foo ( )
{
*a = *b;
}

Now for the stmt in the function foo a memcpy will be generated.
However, this can be tail call optimized. My aim is to identify such
opportunities in find_tail_calls in tree-tailcall.c. However, for the
stmt

*a.0_1 ={v} *b.1_2;

var_can_have_subvars returns zero for *a.0_1. But a.0_1 is a pointer
to a structure and *a.0_1 is a structure, and therefore can have
subvars.

The comment for var_can_have_subvars says
/* Return true if V is a tree that we can have subvars for.
   Normally, this is any aggregate type.  Also complex
   types which are not gimple registers can have subvars.  */

IMHO, var_can_have_subvars for the above case should return true, but
it doesn't because it fails the following test in var_can_have_subvars.
  /* Non decls or memory tags can never have subvars.  */
  if (!DECL_P (v) || MTAG_P (v))
    return false;

Am I missing something here ?

TIA,
Pranav

Re: Identifying a block copy

2007-10-09 Thread Pranav Bhandarkar

On 10/9/07, Daniel Berlin <[EMAIL PROTECTED]> wrote:
> Yes
> we do not create subvars for non-named memory locations.  IE random
> pointer dereferences.
>
> This is mainly because it would require a lot of time and memory in
> the compiler.
>
> It was done because most optimizers rely solely on vdef/vuses, instead
> of further disambiguating the memory dependence chains.

ok, makes sense.

> In any case, it's not clear why you care about subvars at all here.
> If you want to identify block copies, simply look for assignments
> between AGGREGATE_TYPE_P trees.

Aha, this would be neater.
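
For the record, the source-level pattern in question is an assignment
between two aggregates, which behaves like a memcpy of the whole
object. A minimal sketch using the struct from my earlier post (the
two helper names are made up for illustration, and whether the
compiler really emits a memcpy call depends on the target and
options):

```c
#include <assert.h>
#include <string.h>

struct x { int a; int b; int c; int d; int e[120]; };

/* Block copy written as an aggregate assignment.  */
void
copy_by_assignment (struct x *dst, const struct x *src)
{
  *dst = *src;
}

/* The same copy written as an explicit memcpy.  */
void
copy_by_memcpy (struct x *dst, const struct x *src)
{
  memcpy (dst, src, sizeof *dst);
}
```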

Thanks for pointing me in the right direction.

cheers!
Pranav

Problem with too many virtual operands ( tree-ssa-operands.c:484)

2007-10-16 Thread Pranav Bhandarkar

Hi,
In the attached testcase due to an ivopts modification, while
rewriting the uses the compiler crashes in tree-ssa-operands.c because
the number of virtual operands of the modified stmt is much greater
than the thresholds controlled by OP_SIZE_{1,2,3} in
tree-ssa-operands.c.

I went through
http://gcc.gnu.org/ml/gcc-patches/2006-12/msg01269.html

and it seemed to me that these values (OP_SIZE etc.) have been set
quite experimentally. I am wondering if these values should be
increased.

I did increase these values and the attached testcase compiled fine. I
examined the resulting ivopts dump and the modifications seem valid
to me. I have attached two dumps (before ivopts, i.e. 105t.cunroll,
and after ivopts, i.e. 107t.ivopts). Note that the post-ivopts dump was
generated only after changing OP_SIZE_3 to 700 (an arbitrarily high
value).

For a quick check of the dumps, note that
 D.1281_6 = Hoopster_ptr_17->Magic;

gets changed to

 D.1281_6 = MEM[index: ivtmp.840_5];

I am wondering if increasing OP_SIZE_{1,2,3} is the way to go. I am
not fully convinced, because it means that the problem could hit again
with nastier code.

TIA,
Pranav


Attachments: testcase-min.i, testcase-min.i.105t.cunroll,
testcase-min.i.107t.ivopts

Reload using a live register to reload into

2007-11-06 Thread Pranav Bhandarkar

Hi,
Working on a private port I am seeing a problem with reload clobbering
a live register and thus causing havoc.

Consider the following snippet of the code dump.
(note:HI 85 84 86 5 [bb 5] NOTE_INSN_BASIC_BLOCK)

(note:HI 86 85 89 5 NOTE_INSN_DELETED)

(insn:HI 89 86 87 5 cor_h.c:129 (set (reg:SI 3 $c3 [ ivtmp.103 ])
(sign_extend:SI (subreg:HI (reg:SI 206 [ ivtmp.103 ]) 0))) 86
{extendhisi2} (nil))

(insn:HI 87 89 88 5 cor_h.c:129 (set (reg:SI 1 $c1 [ ivtmp.101 ])
(reg:SI 208 [ ivtmp.101 ])) 45 {*movsi} (nil))

(insn:HI 88 87 270 5 cor_h.c:129 (set (reg:SI 2 $c2 [ h ])
(reg/v/f:SI 236 [ h ])) 45 {*movsi} (nil))

(insn:HI 270 88 91 5 cor_h.c:129 (set (reg:SI 4 $c4)
(const_int 0 [0x0])) 45 {*movsi} (nil))

(call_insn:HI 91 270 92 5 cor_h.c:129 (parallel [
(set (reg:SI 1 $c1)
(call (mem:SI (symbol_ref:SI
("DotProductWithoutShift") [flags 0x41] ) [0 S4 A32])
(const_int 0 [0x0])))
(use (const_int 0 [0x0]))
(clobber (reg:SI 31 $link))
]) 42 {*call_value_direct} (expr_list:REG_DEAD (reg:SI 4 $c4)
(expr_list:REG_DEAD (reg:SI 3 $c3 [ ivtmp.103 ])
(expr_list:REG_DEAD (reg:SI 2 $c2 [ h ])
(nil
(expr_list:REG_DEP_TRUE (use (reg:SI 4 $c4))
(expr_list:REG_DEP_TRUE (use (reg:SI 3 $c3 [ ivtmp.103 ]))
(expr_list:REG_DEP_TRUE (use (reg:SI 2 $c2 [ h ]))
(expr_list:REG_DEP_TRUE (use (reg:SI 1 $c1 [ ivtmp.101 ]))
(nil))

(insn:HI 92 91 285 5 cor_h.c:129 (set (reg/v:SI 230 [ s ])
(reg:SI 1 $c1)) 45 {*movsi} (expr_list:REG_DEAD (reg:SI 1 $c1)
(nil)))

(jump_insn:HI 285 92 286 5 (set (pc)
(label_ref 118)) 8 {jump} (nil))
;; End of basic block 5 -> ( 10)

The register $c1 is used to pass the first argument to the function
DotProductWithoutShift.
On encountering the call_insn (insn 91), global.c inserts a store
to save the register $c16 (which contains a variable 'tot'; $c16
is a caller-save register).
Hence the following insn is inserted just before the call to
DotProductWithoutShift.

(insn 309 270 91 5 cor_h.c:129 (set (mem/c:SI (plus:SI (reg/f:SI 29 $sp)
(const_int 176 [0xb0])) [11 S4 A32])
(reg:SI 16 $c16)) 45 {*movsi} (nil))

However the index 176 is too large and
 (plus:SI (reg/f:SI 29 $sp)
(const_int 176 [0xb0])) needs to be reloaded.

$c1 gets chosen for this reload and now the dump snippet looks like

(note:HI 85 84 86 5 [bb 5] NOTE_INSN_BASIC_BLOCK)

(note:HI 86 85 89 5 NOTE_INSN_DELETED)

(insn:HI 89 86 87 5 cor_h.c:129 (set (reg:SI 3 $c3 [ ivtmp.103 ])
(sign_extend:SI (reg:HI 14 $c14 [orig:206 ivtmp.103 ] [206])))
86 {extendhisi2} (nil))

(insn:HI 87 89 88 5 cor_h.c:129 (set (reg:SI 1 $c1 [ ivtmp.101 ])
(reg:SI 8 $c8 [orig:208 ivtmp.101 ] [208])) 45 {*movsi} (nil))

(insn:HI 88 87 270 5 cor_h.c:129 (set (reg:SI 2 $c2 [ h ])
(reg/v/f:SI 22 $c22 [orig:236 h ] [236])) 45 {*movsi} (nil))

(insn:HI 270 88 329 5 cor_h.c:129 (set (reg:SI 4 $c4)
(const_int 0 [0x0])) 45 {*movsi} (nil))

(insn 329 270 330 5 cor_h.c:129 (set (reg:SI 1 $c1)
(const_int 176 [0xb0])) 45 {*movsi} (nil))

(insn 330 329 309 5 cor_h.c:129 (set (reg:SI 1 $c1)
(plus:SI (reg:SI 1 $c1)
(reg/f:SI 29 $sp))) 65 {*addsi3} (expr_list:REG_EQUIV
(plus:SI (reg/f:SI 29 $sp)
(const_int 176 [0xb0]))
(nil)))

(insn 309 330 91 5 cor_h.c:129 (set (mem/c:SI (reg:SI 1 $c1) [11 S4 A32])
(reg:SI 16 $c16)) 45 {*movsi} (nil))

(call_insn:HI 91 309 332 5 cor_h.c:129 (parallel [
(set (reg:SI 1 $c1)
(call (mem:SI (symbol_ref:SI
("DotProductWithoutShift") [flags 0x41] ) [0 S4 A32])
(const_int 0 [0x0])))
(use (const_int 0 [0x0]))
(clobber (reg:SI 31 $link))
]) 42 {*call_value_direct} (nil)
(expr_list:REG_DEP_TRUE (use (reg:SI 4 $c4))
(expr_list:REG_DEP_TRUE (use (reg:SI 3 $c3 [ ivtmp.103 ]))
(expr_list:REG_DEP_TRUE (use (reg:SI 2 $c2 [ h ]))
(expr_list:REG_DEP_TRUE (use (reg:SI 1 $c1 [ ivtmp.101 ]))
(nil))

(insn 332 91 333 5 (set (reg:SI 4 $c4)
(const_int 176 [0xb0])) 45 {*movsi} (nil))

(insn 333 332 310 5 (set (reg:SI 4 $c4)
(plus:SI (reg:SI 4 $c4)
(reg/f:SI 29 $sp))) 65 {*addsi3} (expr_list:REG_EQUIV
(plus:SI (reg/f:SI 29 $sp)
(const_int 176 [0xb0]))
(nil)))

(insn 310 333 285 5 (set (reg:SI 16 $c16)
(mem/c:SI (reg:SI 4 $c4) [11 S4 A32])) 45 {*movsi} (nil))

(jump_insn:HI 285 310 286 5 (set (pc)
(label_ref 118)) 8 {jump} (nil))
;; End of basic block 5 -> ( 10)

Clearly the register $c1, which after insn 87 holds the first argument
of the function DotProductWithoutShift, is overwritten.

The file.c.176r.greg for insn 309 says

"Spilling for insn 309.
Using reg 6 for reload 0"

and indeed rld[0].regno is 6 and rld[0].in is

(plus:SI

Re: About VLIW backend

2007-11-06 Thread Pranav Bhandarkar

On 06 Nov 2007 21:50:09 -0800, Ian Lance Taylor <[EMAIL PROTECTED]> wrote:
> Li Wang <[EMAIL PROTECTED]> writes:
>
> > I wonder if any efforts have been made to retarget GCC to VLIW
> > backend.Is there any project trying to do that? Is it included in the
> > GCC mainstream? Thanks.

Dr. Baumgartl, Jan Parthey and folks at the Chemnitz University of
Technology had substantial success with a port for the TMS320C6x
series of VLIW processors.

http://archiv.tu-chemnitz.de/pub/2004/0107/data/index.html

We (a few friends and I, college students then) added some
improvements to this port as part of an undergraduate university
coursework project.

cheers!
Pranav

Re: About VLIW backend

2007-11-07 Thread Pranav Bhandarkar

>  I am interesting in it. How about the current status, is it ongoing
> developing? Is it included in GCC official release?

Unfortunately our group is not actively working on that right now.
For various reasons (mainly the paucity of time) we couldn't release
it to the GCC community then (about 3 years back).

- Pranav

Re: Reload using a live register to reload into

2007-11-07 Thread Pranav Bhandarkar

Hi Eric,
Thanks for the response

> Of course, it goes to great length to do so but there can be bugs.  You didn't
> specify which version of the compiler you're using though; they may have been
> already fixed on the mainline.

Oh, I am using quite a new version of the compiler - rev 129547,
DATESTAMP 20071022.

cheers!
Pranav

Re: Target specific attributes to variables

2007-11-07 Thread Pranav Bhandarkar

> Even though the other 2 addressing modes are implemented, the
> attributes could not be checked in the other 2 modes. These 2 modes
> are "disp with register" and "register indirect" addressing modes. The
> tree structure in these addressing modes could not be checked for
> attributes using the RTX of the operand. We were unable to get any
> information from other target specific attributes.

Look up MEM_EXPR in the internals. You might want to use that for the
register indirect case.

cheers!
Pranav

Re: About VLIW backend

2007-11-07 Thread Pranav Bhandarkar

>  Did you test for large programs? Such as applications from SPEC 2006? or
> the equal size of programs. Thanks.

Oh no, we didn't. We stopped when we achieved fair stability, judged
purely on the basis of the number of testsuite failures (less than
100).

cheers!
Pranav

Re: Reload using a live register to reload into

2007-11-07 Thread Pranav Bhandarkar

> OK.  AFAICS there is nothing glaring in the RTL you posted so you'll have to
> put a watchpoint and find out who has set reg_rtx for this particular reload.

reg_rtx gets set due to a call to choose_reload_regs which in turn
calls allocate_reload_reg to set reg_rtx.

Also, just to confirm that I am on the right track, shouldn't the bit
for reg #1 (i.e. $c1) be set in live_throughout in the insn chain for
insn #91 (reproduced below for convenience)?

(call_insn:HI 91 270 92 5 cor_h.c:129 (parallel [
   (set (reg:SI 1 $c1)
   (call (mem:SI (symbol_ref:SI
("DotProductWithoutShift") [flags 0x41] ) [0 S4 A32])
   (const_int 0 [0x0])))
   (use (const_int 0 [0x0]))
   (clobber (reg:SI 31 $link))
   ]) 42 {*call_value_direct} (expr_list:REG_DEAD (reg:SI 4 $c4)
   (expr_list:REG_DEAD (reg:SI 3 $c3 [ ivtmp.103 ])
   (expr_list:REG_DEAD (reg:SI 2 $c2 [ h ])
   (nil
   (expr_list:REG_DEP_TRUE (use (reg:SI 4 $c4))
   (expr_list:REG_DEP_TRUE (use (reg:SI 3 $c3 [ ivtmp.103 ]))
   (expr_list:REG_DEP_TRUE (use (reg:SI 2 $c2 [ h ]))
   (expr_list:REG_DEP_TRUE (use (reg:SI 1 $c1 [ ivtmp.101 ]))
   (nil))


TIA,
Pranav

Re: Reload using a live register to reload into

2007-11-08 Thread Pranav Bhandarkar

Hi,

> > (call_insn:HI 91 270 92 5 cor_h.c:129 (parallel [
> >(set (reg:SI 1 $c1)
> >(call (mem:SI (symbol_ref:SI
> > ("DotProductWithoutShift") [flags 0x41]  > DotProductWithoutShift>) [0 S4 A32])
> >(const_int 0 [0x0])))
> >(use (const_int 0 [0x0]))
> >(clobber (reg:SI 31 $link))
> >]) 42 {*call_value_direct} (expr_list:REG_DEAD (reg:SI 4 $c4)
> >(expr_list:REG_DEAD (reg:SI 3 $c3 [ ivtmp.103 ])
> >(expr_list:REG_DEAD (reg:SI 2 $c2 [ h ])
> >(nil
> >(expr_list:REG_DEP_TRUE (use (reg:SI 4 $c4))
> >(expr_list:REG_DEP_TRUE (use (reg:SI 3 $c3 [ ivtmp.103 ]))
> >(expr_list:REG_DEP_TRUE (use (reg:SI 2 $c2 [ h ]))
> >(expr_list:REG_DEP_TRUE (use (reg:SI 1 $c1 [ ivtmp.101 ]))
> >(nil))
>
> I don't think so, it should be in dead_or_set, the value contained in $c1 dies
> in the insn.

Yes, after going through the code more closely, I concur.
The problem is that $c1 isn't live_throughout, but it is live at the
point just before the call insn. Therefore, if an instruction is
inserted before the call insn, as is done when a caller-save
instruction is inserted (by caller-save.c), and if that instruction
doesn't kill $c1, then its live_throughout should have the bit for $c1
set. This doesn't happen because, while inserting the caller-save
insn, its live_throughout is simply set to the live_throughout of the
call insn plus the registers marked with REG_DEAD notes in the call
insn. However, since $c1 is an argument to the call, it is used by the
call_insn and is marked REG_DEP_TRUE (read after write). Shouldn't
regs in REG_DEP_TRUE be added to live_throughout? My suspicion is that
the LOG_LINKS are not always up to date, so would it be better to use
DF_INSN_UID_USES?

Thanks in advance,
Pranav

Re: Reload using a live register to reload into

2007-11-12 Thread Pranav Bhandarkar

Hi,
>
> DF is supposed to be out of the game at this point, it has handed over the
> control since global.c:build_insn_chain as far as liveness info is concerned.

Oh, I used DF and it worked for me. But I think that is because this is
the first new instruction to be inserted, and nothing can really have
changed w.r.t. the call_insn to make the DF info inconsistent.
However, in the light of the above, I shouldn't be using DF.

> The REG_DEP_TRUE are somewhat misleading, it's an artifact in the dump.
> What you're seeing are the contents of CALL_INSN_FUNCTION_USAGE, which are
> always correct, so a solution to your problem would be to scan it for uses of
> registers in caller-save.c:insert_one_insn.  Of course this wouldn't plug the
> hole entirely but would very likely be sufficient in practice.
Good idea, I'll try using CALL_INSN_FUNCTION_USAGE.
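
To spell out what I understand the suggestion to be, here is a toy
model with plain bitmasks instead of GCC's HARD_REG_SET. All names are
hypothetical and this is not the code in caller-save.c. The set of
registers that must not be clobbered by an insn inserted before the
call would also include the registers the call itself uses:

```c
#include <assert.h>
#include <stdint.h>

/* Toy register set: bit N stands for hard register N.  */
typedef uint32_t toy_regset;
#define TOY_REG(n) ((toy_regset) 1 << (n))

/* What I described earlier: live_throughout of the call plus the
   registers that carry REG_DEAD notes on the call.  */
toy_regset
toy_live_before_call_old (toy_regset live_throughout, toy_regset dead_notes)
{
  return live_throughout | dead_notes;
}

/* Suggested variant: also add the registers listed as used by the
   call (what CALL_INSN_FUNCTION_USAGE records).  */
toy_regset
toy_live_before_call_new (toy_regset live_throughout, toy_regset dead_notes,
                          toy_regset call_uses)
{
  return live_throughout | dead_notes | call_uses;
}

/* Nonzero if REGNO is not in LIVE, i.e. usable as a reload register.  */
int
toy_reg_is_free (toy_regset live, int regno)
{
  return (live & TOY_REG (regno)) == 0;
}
```

With $c2, $c3 and $c4 marked dead and $c1 through $c4 used by the call,
the old computation leaves $c1 looking free while the new one does not.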

cheers!
Pranav

Re: How to describe function units allocation

2007-11-14 Thread Pranav Bhandarkar

> Hi,
> For the backend TI DSP TMS320C6x, There are four types of functional
> units which are .L unit, .M unit, .S unit and .D unit, and each type
> consists of two units named .X1 and .X2 respectively. Namely, there are
> total 8 units. Except the .M units surve only for multiply, other units
> share many functions. For example, they both enable 32 bits arithmetical
> operation. And in the assembly, which functional unit is used to perform
> operation must be explicitly indicated. For example, ADD .S1 A0, A1, A2;
> ADD .L1 A0, A1, A2; ADD .D1 A0, A1, A2 achieve the same goal by using
> different units. Surely, when producing assembly, a functional unit
> allocation somewhat like register allocation is needed. I wonder how can
> I describe the relationship in the machine description file, and whether
> I need write a functional unit allocation algorithm or it is done by a
> general purpose allocation algorithm embedded in GCC, like register
> allocation, I only need give some architecture descriptions? Thanks in
> advance for your kind assistance.

IMHO, the functional units that accompany the assembly instruction are
optional. However, for c6x-gcc the reason cc1 doesn't allocate
functional units is that the assembler (as part of the c6x binutils)
does the functional unit allocation on its own. There are some notes
about how the assembler does this in "Extending the GNU Assembler for
Texas Instruments TMS320C6x-DSP.pdf".

HTH,
Pranav

>
> Regards,
> Li Wang
>

Re: VLIW scheduling and delayed branch

2007-12-10 Thread Pranav Bhandarkar

On Dec 9, 2007 2:19 AM, Thomas Sailer <[EMAIL PROTECTED]> wrote:
> > Has anyone faced a similar problem before? Are there targets for which
> > both VLIW and DBR are enabled? Perhaps ia64?
>

Ok, this was a long time back, but yes, I have faced a similar problem.
We disabled delayed branch scheduling and used the machdep reorg pass.
We examined the dependencies of the branch instruction, moving
backwards from it and marking all the instructions (and their
containing insn bundles) that the branch depended upon. Then, again
moving backwards from the branch insn, we picked the first insn bundle
with all insns unmarked (and with a bundle cycle count <= the number of
delay slots of a branch insn) and put that bundle into the delay slot.
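
Roughly, the two backward walks can be sketched like this. It is a toy
model, not our actual reorg code: the toy_bundle structure and function
names are invented here, dependencies are tracked per bundle with
register bitmasks, and it only models dependence on the branch, not
any other constraint on moving a bundle.

```c
#include <assert.h>
#include <stdint.h>

struct toy_bundle
{
  uint32_t defs;   /* registers written by insns in the bundle */
  uint32_t uses;   /* registers read by insns in the bundle */
  int cycles;      /* cycles the bundle occupies */
  int marked;      /* nonzero if the branch depends on this bundle */
};

/* Walk backwards from the branch (which follows bundle N-1), marking
   every bundle the branch depends on, directly or transitively.  */
void
toy_mark_branch_deps (struct toy_bundle *b, int n, uint32_t branch_uses)
{
  uint32_t needed = branch_uses;
  int i;

  for (i = n - 1; i >= 0; i--)
    {
      b[i].marked = (b[i].defs & needed) != 0;
      if (b[i].marked)
        /* Its definitions are satisfied here; its inputs become needed.  */
        needed = (needed & ~b[i].defs) | b[i].uses;
    }
}

/* Walk backwards again and return the index of the first unmarked
   bundle that fits in DELAY_SLOTS cycles, or -1 if there is none.  */
int
toy_pick_delay_bundle (const struct toy_bundle *b, int n, int delay_slots)
{
  int i;

  for (i = n - 1; i >= 0; i--)
    if (!b[i].marked && b[i].cycles <= delay_slots)
      return i;
  return -1;
}
```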

This approach worked fine for the small testcases that we had, but we
really didn't test this on any monstrous piece of software. We
implemented this for the TMS320C6x VLIW DSP.

HTH,
Pranav
