Could we use VIEW_CONVERT_EXPR to build ADDR_EXPR ?

2010-08-19 Thread Fang, Changpeng
Hi, 

 I am working on bug 45260 and found that the problem is related to 
VIEW_CONVERT_EXPR.

In the prefetching pass, we generate the base address for the prefetching:
tree-ssa-loop-prefetch.c (issue_prefetch_ref):

addr_base = build_fold_addr_expr_with_type (ref->mem, ptr_type_node);
+ gcc_assert (is_gimple_address (addr_base));

Here ref->mem is a COMPONENT_REF that contains a VIEW_CONVERT_EXPR. When I put
an assert after build_fold_addr_expr_with_type, I found that addr_base is not a
gimple address at all. The direct reason is that the operand (TREE_OPERAND) of
the VIEW_CONVERT_EXPR is an SSA_NAME.

My questions are:

(1) Can we generate an address expression for a COMPONENT_REF that contains a
  VIEW_CONVERT_EXPR (is it legal to do so)?

(2) The assert in the bug actually fires in verify_expr in tree-cfg.c. Is this
  assert valid?

I need to understand whether the bug is in the VIEW_CONVERT_EXPR generation or 
in build_fold_addr_expr_with_type.

Thanks for your input.

Changpeng

  


RE: Could we use VIEW_CONVERT_EXPR to build ADDR_EXPR ?

2010-08-20 Thread Fang, Changpeng




>No you should not generate addresses for VCEs that contain a SSA_NAME.
> I think you should check if get_base_address is a
>is_gimple_addressable inside gather_memory_references_ref.

There, TREE_CODE (get_base_address (ref)) == SSA_NAME,
and get_base_address (ref) does satisfy is_gimple_addressable.

However, an address expression containing an SSA_NAME is NOT considered
a gimple address.

Thanks,

Changpeng 


RE: Could we use VIEW_CONVERT_EXPR to build ADDR_EXPR ?

2010-08-20 Thread Fang, Changpeng


> >No you should not generate addresses for VCEs that contain a SSA_NAME.
> > I think you should check if get_base_address is a
> >is_gimple_addressable inside gather_memory_references_ref.
>
> There, TREE_CODE ( get_base_address (ref)) == SSA_NAME
>
> and get_base_address (ref) is is_gimple_addressable.
>
> However, address expression containing SSA_NAME is NOT considered
> as a gimple address.

>You simply can't take an address of such thing.  Look at IVOPTs,
>it has measures to avoid this stuff.

Thanks, Richard:

I have a fix based on this suggestion:
http://gcc.gnu.org/ml/gcc-patches/2010-08/msg01625.html
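
The rule discussed above (do not take the address of a reference whose base
is an SSA_NAME) can be illustrated with a toy tree model. This is only a
sketch: it is not GCC code and not the linked patch, and every toy_* name
below is made up for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for a few GCC tree codes; not the real definitions.  */
enum toy_code
{
  TOY_SSA_NAME,
  TOY_VAR_DECL,
  TOY_COMPONENT_REF,
  TOY_VIEW_CONVERT_EXPR
};

struct toy_tree
{
  enum toy_code code;
  const struct toy_tree *op0;	/* Operand 0, NULL for leaves.  */
};

/* Strip the wrapping references to reach the innermost object,
   loosely in the spirit of get_base_address.  */
static const struct toy_tree *
toy_base (const struct toy_tree *ref)
{
  while (ref->op0 != NULL)
    ref = ref->op0;
  return ref;
}

/* An SSA name is a register value with no memory location, so a
   reference built on top of one must not be turned into an address.  */
static bool
toy_can_take_address (const struct toy_tree *ref)
{
  return toy_base (ref)->code != TOY_SSA_NAME;
}
```

In this model a COMPONENT_REF over a VIEW_CONVERT_EXPR over an SSA_NAME is
rejected, while a COMPONENT_REF over a VAR_DECL is accepted.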


Changpeng





How to avoid auto-vectorization for this loop (rolls at most 3 times)

2010-09-08 Thread Fang, Changpeng

It seems that the auto-vectorizer cannot recognize that this loop will roll at
most 3 times, so it generates quite messy code.

int a[1024], b[1024]; 
void foo (int n) 
{
  int i;
  for (i = (n/4)*4; i < n; i++)
    a[i] = a[i] + b[i];
}

How can we correctly estimate the number of iterations for this case and use 
this info for the vectorizer?
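
For reference, the bound of 3 can be checked by hand. A minimal sketch,
assuming C99 semantics (integer division truncates toward zero); the helper
name trip_count is made up for illustration and only counts iterations:

```c
/* Number of iterations of:  for (i = (n/4)*4; i < n; i++)

   For n >= 0 the start value is n rounded down to a multiple of 4,
   so the loop runs n % 4 times, i.e. at most 3.  For n < 0 the
   truncating division makes the start value >= n, so the body never
   runs.  */
static int
trip_count (int n)
{
  int count = 0;
  for (int i = (n / 4) * 4; i < n; i++)
    count++;
  return count;
}
```

So the upper bound follows from the value range of n - (n/4)*4, which is the
fact an iteration-count estimate would have to derive.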

Thanks,

Changpeng


RE: How to avoid auto-vectorization for this loop (rolls at most 3 times)

2010-09-09 Thread Fang, Changpeng
>> It seems  the auto-vectorizer could not recognize that this loop will
>> roll at most 3 times.
>> And it will generate quite messy code.
>>
>> int a[1024], b[1024];
>> void foo (int n)
>> {
>>   int i;
>>   for (i = (n/4)*4; i< n; i++)
>>     a[i] =  a[i] +  b[i];
>> }
>>
>> How can we correctly estimate the number of iterations for this case
>> and use this info for the vectorizer?

>Does it recognise it if you rewrite the loop as follows:
>
>for (i = n&~0x3; i< n; i++)
>  a[i] =  a[i] +  b[i];

No.

But it is OK for the following case:

 for (i = n-3; i< n; i++)
   a[i] =  a[i] +  b[i];

It seems to fail in the "unknown but small" case. Anyway, this mostly
affects compilation time and code size, and has limited impact on
performance.

For

for (i = n&~0x3; i< n; i++)
  a[i] =  a[i] +  b[i];

the attached foo-O3-no-tree-vectorize.s is what we expect from the optimizer;
foo-O3.s is much worse.
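
One way to keep the vectorizer away from just this function, sketched under
assumptions: GCC's "optimize" function attribute with "no-tree-vectorize",
the per-function counterpart of the -fno-tree-vectorize flag that the
attachment name suggests was used. The NO_VECTORIZE macro is a name made up
here, and the GCC manual recommends this attribute for debugging rather than
production code.

```c
int a[1024], b[1024];

/* GCC only: compile the marked function as if -fno-tree-vectorize had
   been given.  For other compilers the macro expands to nothing.  */
#if defined(__GNUC__) && !defined(__clang__)
# define NO_VECTORIZE __attribute__ ((optimize ("no-tree-vectorize")))
#else
# define NO_VECTORIZE
#endif

NO_VECTORIZE void
foo (int n)
{
  int i;
  /* Runs at most 3 times: handles the last n % 4 elements.  */
  for (i = n & ~0x3; i < n; i++)
    a[i] = a[i] + b[i];
}
```

This only changes how the function is compiled, not what it computes. It
does not help the vectorizer estimate the iteration count.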

Thanks,

Changpeng


 

Attachment: foo-O3-no-tree-vectorize.s
Attachment: foo-O3.s


What loop optimizations could increase the code size significantly?

2010-12-10 Thread Fang, Changpeng
Hi,

 I am looking for ways to reduce code size. Which loop optimizations could
increase the code size significantly?
The optimizations I know of are: unswitching, vectorization, prefetching and
unrolling. We should not perform these optimizations if the loop rolls only a
few iterations.

In addition, which loop optimizations could generate pre- and/or post-loops?
For example, vectorization, unrolling, etc.
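
To make "post loop" concrete, here is a hand-written sketch of the shape a
loop roughly takes after unrolling by 4. It is an illustration only, not
compiler output; the function names are made up.

```c
/* Original loop.  */
static void
add_simple (int *a, const int *b, int n)
{
  for (int i = 0; i < n; i++)
    a[i] += b[i];
}

/* Same computation, unrolled by 4 by hand.  The main loop handles
   groups of four elements; the post loop handles the remaining
   0 to 3 elements.  The body now appears five times instead of
   once, which is where the code size goes.  */
static void
add_unrolled (int *a, const int *b, int n)
{
  int i = 0;
  for (; i + 3 < n; i += 4)
    {
      a[i] += b[i];
      a[i + 1] += b[i + 1];
      a[i + 2] += b[i + 2];
      a[i + 3] += b[i + 3];
    }
  for (; i < n; i++)		/* Post loop.  */
    a[i] += b[i];
}
```

If the loop rolls only a few times, most of the work ends up in the post
loop and the unrolled main loop is mostly added size.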

Thanks,

Changpeng