[Bug target/56406] New: attribute((target(xpto))) causes ICE in i386 and rs6000

2013-02-20 Thread pa...@matos-sorge.com


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56406



 Bug #: 56406

   Summary: attribute((target(xpto))) causes ICE in i386 and rs6000

Classification: Unclassified

   Product: gcc

   Version: 4.7.3

Status: UNCONFIRMED

  Severity: normal

  Priority: P3

 Component: target

AssignedTo: unassig...@gcc.gnu.org

ReportedBy: pa...@matos-sorge.com





While doing some tests I came across this ICE:

int __attribute__((__target__(xpto)))
foo(int x)
{
  if (x == 1)
    return x;
  else
    return x * foo (x-1);
}





Any word in place of xpto triggers it. The compiler fails with:

internal compiler error: in ix86_valid_target_attribute_inner_p, at config/i386/i386.c:4214





This is because it's expecting either a STRING_CST ("xpto") or a TREE_CHAIN. If

an identifier node is seen instead we hit a gcc_unreachable. The other backend

that should be affected by this (but I didn't reproduce it) is rs6000.
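
To illustrate the failure mode described above, here is a minimal toy model of validating an attribute argument. It is only a sketch: the node kinds, the struct layout and the function name valid_target_arg_p are hypothetical stand-ins, not GCC's real tree codes or the code in i386.c. The point is that an unexpected node kind is rejected as invalid instead of reaching an unreachable assertion.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the node kinds named in the report;
   these are not GCC's real definitions.  */
enum node_kind { NODE_STRING, NODE_LIST, NODE_IDENTIFIER };

struct node
{
  enum node_kind kind;
  const char *text;          /* Payload for string and identifier nodes.  */
  const struct node *next;   /* Next element, for list nodes.  */
  const struct node *value;  /* Element carried by a list node.  */
};

/* Return true if ARGS is a string, or a list whose elements are all
   valid.  Any other node kind is reported as invalid rather than
   being treated as unreachable.  */
bool
valid_target_arg_p (const struct node *args)
{
  if (args == NULL)
    return false;

  switch (args->kind)
    {
    case NODE_STRING:
      return args->text != NULL;

    case NODE_LIST:
      for (; args != NULL; args = args->next)
        if (args->kind != NODE_LIST || !valid_target_arg_p (args->value))
          return false;
      return true;

    default:
      /* An identifier such as xpto lands here.  */
      return false;
    }
}
```

In this model a bare identifier simply makes the attribute invalid, which is the behaviour one would expect from a diagnostic rather than an ICE.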


[Bug other/56472] New: vcondu undocumented

2013-02-27 Thread pa...@matos-sorge.com


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56472



 Bug #: 56472

   Summary: vcondu undocumented

Classification: Unclassified

   Product: gcc

   Version: 4.7.3

Status: UNCONFIRMED

  Severity: normal

  Priority: P3

 Component: other

AssignedTo: unassig...@gcc.gnu.org

ReportedBy: pa...@matos-sorge.com





vcondu pattern is undocumented at:

http://gcc.gnu.org/onlinedocs/gcc-4.7.2/gccint/Standard-Names.html


[Bug other/32185] unused result warnings and -werror

2013-05-04 Thread pa...@matos-sorge.com


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32185



Paulo J. Matos  changed:



   What|Removed |Added



 CC||pa...@matos-sorge.com



--- Comment #7 from Paulo J. Matos  2013-05-04 21:01:22 
UTC ---

This still occurs with HEAD of gcc in the call to getcwd in server.c if you

build with --enable-werror-always. It's not high priority but it would be nice

to see this warning gone (via checking within gcc).


[Bug other/50345] Incomplete GCC Internals sentence on LTO

2013-05-08 Thread pa...@matos-sorge.com


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50345



--- Comment #1 from Paulo J. Matos  2013-05-08 08:58:54 
UTC ---

I am revisiting this bug and it seems that there's just an extra word, nothing

especially unexplained, and the correct URL for the problem is:

http://gcc.gnu.org/onlinedocs/gccint/LTO-Overview.html#LTO-Overview



I will submit a patch to fix this.


[Bug other/50345] Incomplete GCC Internals sentence on LTO

2013-05-08 Thread pa...@matos-sorge.com


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50345



--- Comment #2 from Paulo J. Matos  2013-05-08 09:09:30 
UTC ---

Created attachment 30050

  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=30050

Patch with typo fix.


[Bug rtl-optimization/52235] rtlanal: commutative_operand_precedence should prioritize regs

2013-05-08 Thread pa...@matos-sorge.com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52235

--- Comment #4 from Paulo J. Matos  2013-05-08 14:24:08 
UTC ---
This issue persists in HEAD, the submitted patch seems to have been forgotten.
Ping, ping.


[Bug rtl-optimization/52235] rtlanal: commutative_operand_precedence should prioritize regs

2013-05-08 Thread pa...@matos-sorge.com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52235

--- Comment #6 from Paulo J. Matos  2013-05-08 20:20:00 
UTC ---
(In reply to comment #5)
> (In reply to comment #4)
> > This issue persists in HEAD, the submitted patch seems to have been 
> > forgotten.
> > Ping, ping.
> 
> Ping it on gcc-patches, BZ is *not* the place for that!

Sorry, ping redirected to gcc-patches.


[Bug tree-optimization/55761] process_assignment assumes -1 can be created

2013-05-14 Thread pa...@matos-sorge.com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55761

Paulo J. Matos  changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED

--- Comment #9 from Paulo J. Matos  ---
Marc Glisse has submitted the patch for this to HEAD. I guess we can now
comfortably close the report.



[Bug tree-optimization/55761] process_assignment assumes -1 can be created

2013-05-14 Thread pa...@matos-sorge.com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55761

--- Comment #11 from Paulo J. Matos  ---
No worries Marc, that's fine. The most important thing is that's fixed. I did
post the patch to patches@ but haven't actually pinged. I tend to forget about
them myself.

Thanks for sorting it out.


[Bug tree-optimization/55761] process_assignment assumes -1 can be created

2013-05-14 Thread pa...@matos-sorge.com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55761

--- Comment #12 from Paulo J. Matos  ---
Also, I haven't touched tree-tailcall.c on my patches but I can't see why you
would need to do it.


[Bug middle-end/56888] memcpy implementation optimized as a call to memcpy

2013-07-17 Thread pa...@matos-sorge.com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56888

--- Comment #11 from Paulo J. Matos  ---
(In reply to Brooks Moses from comment #10)
> Other than the documentation issues, this seems like a non-bug.

A non-bug? If you write a memcpy function by hand and call it memcpy, gcc
replaces the function body by a call to memcpy which generates an infinite
loop. How come it's a non-bug?


[Bug middle-end/56888] memcpy implementation optimized as a call to memcpy

2013-07-18 Thread pa...@matos-sorge.com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56888

--- Comment #18 from Paulo J. Matos  ---
(In reply to Brooks Moses from comment #12)
> 
> Now, if this replacement still happens when you compile with -nostdlib, that
> would be a bug since it becomes legal code in that case.  But that's
> somewhat of a separate issue and should be filed separately if it happens. 
> (We should arguably also have a test for it, if we don't already.)


I noticed this in the gcc testsuite with my port. File
./gcc.c-torture/execute/builtins/lib/memset.c contains an implementation of
memset called memset and gcc goes into recursion when it finds this for the
reasons mentioned above. This causes builtin/memset test to fail.
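
For reference, the kind of loop under discussion looks roughly like the sketch below. The name copy_bytes is hypothetical and deliberately not memcpy, so that a compiler-generated call to memcpy cannot recurse into the function itself. As far as I know the replacement is done by GCC's loop distribution pattern matching, which can be disabled with -fno-tree-loop-distribute-patterns; treat that as an assumption to check against the documentation of the GCC version in use.

```c
#include <stddef.h>

/* Hypothetical name, deliberately not memcpy: if the compiler turns
   the loop into a call to memcpy, the call goes to the library and
   not back into this function.  */
void *
copy_bytes (void *dst, const void *src, size_t n)
{
  unsigned char *d = dst;
  const unsigned char *s = src;

  /* A plain byte loop: the shape that idiom recognition may replace
     with a library call at higher optimization levels.  */
  while (n--)
    *d++ = *s++;

  return dst;
}
```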


[Bug c/48885] missed optimization with restrict qualifier?

2013-08-28 Thread pa...@matos-sorge.com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48885

Paulo J. Matos  changed:

   What|Removed |Added

 CC||pa...@matos-sorge.com

--- Comment #3 from Paulo J. Matos  ---
My understanding is that for restrict optimization to take effect, variable a
has also to be restrict. Otherwise in the first assignment *a = *v;, a could
point to the same memory as v and therefore overwrite it, rendering the value
of *v in *b = *v; invalid. I think gcc is now doing the right thing.
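
A small sketch of the aliasing concern, using a hypothetical variant of the pattern above in which the first store writes a different value, so that the effect of aliasing is observable. The function names are made up for illustration and this is not the testcase from the report.

```c
/* No restrict at all: a may legally alias v, so the compiler has to
   reload *v after the store through a.  */
void
store_two (int *a, int *b, int *v)
{
  *a = *v + 1;
  *b = *v;
}

/* All three pointers are restrict: the caller promises that they do
   not overlap, so the first load of *v may be reused.  */
void
store_two_restrict (int *restrict a, int *restrict b, int *restrict v)
{
  *a = *v + 1;
  *b = *v;
}
```

Calling store_two with a and v pointing to the same int is valid, and b then receives the updated value. Making the same call to store_two_restrict would be undefined behaviour, which is what gives the compiler the freedom to skip the reload.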


[Bug middle-end/58463] New: ICE with -fdump-tree-all-all in vector indexed access

2013-09-18 Thread pa...@matos-sorge.com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58463

Bug ID: 58463
   Summary: ICE with -fdump-tree-all-all in vector indexed access
   Product: gcc
   Version: 4.8.1
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: middle-end
  Assignee: unassigned at gcc dot gnu.org
  Reporter: pa...@matos-sorge.com

Created attachment 30855
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=30855&action=edit
Code example to reproduce ICE

GCC tries to access varmap through get_varinfo in tree-ssa-structalias.c, but
it fails with an ICE.

I have attached a reduced code example to reproduce the problem.

The command line and output with 4.8.1 is:
$ ~/work/tmp/gcc-4.8.1/gcc/cc1 -fpreprocessed -fdump-tree-all-all -O2
core_list_join.i 
core_list_join.i:4:1: warning: no semicolon at end of struct or union [enabled
by default]
 }
 ^
 fn1
Analyzing compilation unit
Performing interprocedural optimizations
 <*free_lang_data>  
core_list_join.i: In function 'fn1':
core_list_join.i:9:1: internal compiler error: in operator[], at vec.h:815
 }
 ^
0xbfd572 vec::operator[](unsigned int)
/home/pmatos/work/gcc-releases/gcc-4.8.1/gcc/vec.h:815
0xbfcea0 vec::operator[](unsigned int)
/home/pmatos/work/gcc-releases/gcc-4.8.1/gcc/vec.h:1244
0xbeb5ba get_varinfo
/home/pmatos/work/gcc-releases/gcc-4.8.1/gcc/tree-ssa-structalias.c:319
0xbefa59 perform_var_substitution
   
/home/pmatos/work/gcc-releases/gcc-4.8.1/gcc/tree-ssa-structalias.c:2244
0xbfb279 solve_constraints
   
/home/pmatos/work/gcc-releases/gcc-4.8.1/gcc/tree-ssa-structalias.c:6569
0xbfb679 compute_points_to_sets
   
/home/pmatos/work/gcc-releases/gcc-4.8.1/gcc/tree-ssa-structalias.c:6665
0xbfbdca compute_may_aliases()
   
/home/pmatos/work/gcc-releases/gcc-4.8.1/gcc/tree-ssa-structalias.c:6814
0x967b9e execute_function_todo
/home/pmatos/work/gcc-releases/gcc-4.8.1/gcc/passes.c:1941
0x966f88 do_per_function
/home/pmatos/work/gcc-releases/gcc-4.8.1/gcc/passes.c:1701
0x967d64 execute_todo
/home/pmatos/work/gcc-releases/gcc-4.8.1/gcc/passes.c:1996
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See <http://gcc.gnu.org/bugs.html> for instructions.


[Bug middle-end/58463] ICE with -fdump-tree-all-all in vector indexed access

2013-09-20 Thread pa...@matos-sorge.com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58463

--- Comment #3 from Paulo J. Matos  ---
Thanks Richard, will check.


[Bug middle-end/58463] ICE with -fdump-tree-all-all in vector indexed access

2013-09-20 Thread pa...@matos-sorge.com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58463

--- Comment #4 from Paulo J. Matos  ---
Backporting fixes the problem. Can we go ahead and backport to 4.8?

Can we add the testcase to the testsuite as well please?


[Bug middle-end/58463] ICE with -fdump-tree-all-all in vector indexed access

2013-09-20 Thread pa...@matos-sorge.com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58463

--- Comment #6 from Paulo J. Matos  ---
(In reply to rguent...@suse.de from comment #5)
> On Fri, 20 Sep 2013, pa...@matos-sorge.com wrote:
> 
> > http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58463
> > 
> > --- Comment #4 from Paulo J. Matos  ---
> > Backporting fixes the problem. Can we go ahead and backport to 4.8?
> 
> Feel free to do the backport - I don't have time for it right now.
> 
> Richard.

Sure, no problem. If by doing the backport you mean sending the relevant
patches to gcc-patches then I will do that today.

Thanks.


[Bug middle-end/58463] ICE with -fdump-tree-all-all in vector indexed access

2013-09-27 Thread pa...@matos-sorge.com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58463

Paulo J. Matos  changed:

   What|Removed |Added

 Status|WAITING |RESOLVED
 Resolution|--- |FIXED

--- Comment #11 from Paulo J. Matos  ---
Backported Richard's patch to branch 4.8 under r202979.
Will mark as fixed.


[Bug gcov-profile/58682] New: Profiling directed optimization doesn't play well with indirect inlining

2013-10-10 Thread pa...@matos-sorge.com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58682

Bug ID: 58682
   Summary: Profiling directed optimization doesn't play well with
indirect inlining
   Product: gcc
   Version: 4.8.2
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: gcov-profile
  Assignee: unassigned at gcc dot gnu.org
  Reporter: pa...@matos-sorge.com

Created attachment 30976
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=30976&action=edit
Profiling files and preprocessed file resulting in ICE

Unzipping the attachment and running cc1 from the head of the 4_8 branch, I get:
(5) $ /home/pmatos/work/tmp/GCC/builds/gcc-4_8/gcc/cc1 -fpreprocessed
core_list_join.i -quiet -dumpbase core_list_join.c -auxbase core_list_join -g3
-O2 -version -fprofile-use=profile_all.fpexe -o core_list_join.s
GNU C (GCC) version 4.8.2 20131010 (prerelease) (x86_64-unknown-linux-gnu)
compiled by GNU C version 4.7.2, GMP version 4.3.0, MPFR version 2.4.1,
MPC version 0.8.1
GGC heuristics: --param ggc-min-expand=30 --param ggc-min-heapsize=4096
GNU C (GCC) version 4.8.2 20131010 (prerelease) (x86_64-unknown-linux-gnu)
compiled by GNU C version 4.7.2, GMP version 4.3.0, MPFR version 2.4.1,
MPC version 0.8.1
GGC heuristics: --param ggc-min-expand=30 --param ggc-min-heapsize=4096
Compiler executable checksum: 76572ada0801d80f2ac370ed86237f71
core_list_join.c:525:1: internal compiler error: in edge_badness, at
ipa-inline.c:895
0xc7c61f edge_badness
../../../gcc-4_8/gcc/ipa-inline.c:895
0xc7e82d add_new_edges_to_heap
../../../gcc-4_8/gcc/ipa-inline.c:1385
0xc7fcb3 inline_small_functions
../../../gcc-4_8/gcc/ipa-inline.c:1615
0xc7fcb3 ipa_inline
../../../gcc-4_8/gcc/ipa-inline.c:1794
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See <http://gcc.gnu.org/bugs.html> for instructions.

This was obtained while running a few coremark tests.


[Bug gcov-profile/58682] Profiling directed optimization doesn't play well with indirect inlining

2013-10-10 Thread pa...@matos-sorge.com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58682

Paulo J. Matos  changed:

   What|Removed |Added

 CC||pa...@matos-sorge.com

--- Comment #1 from Paulo J. Matos  ---
I am not sure if the bug component is correctly set. Please change otherwise.


[Bug gcov-profile/58682] Profiling directed optimization doesn't play well with indirect inlining

2013-10-10 Thread pa...@matos-sorge.com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58682

--- Comment #2 from Paulo J. Matos  ---
Here's my reading of the problem:

core_bench_list calls core_list_mergesort which indirectly (through a function
pointer) calls cmp_idx. The global max_count variable is updated in the
beginning of inline_small_functions, but later we inline core_list_mergesort
into core_bench_list which adds cmp_idx to the list of callees of
core_list_mergesort generating this cgraph_node:
core_list_mergesort/32 (core_list_mergesort) @0x2b97efed8378
  Type: function
  Visibility: public
  References: 
  Referring: 
  Function core_list_mergesort/32 is inline copy in core_bench_list/4
  Clone of core_list_mergesort/11
  Availability: local
  Function flags: analyzed executed 2x body local finalized
  Called by: core_bench_list/4 (2x) (1.00 per call) (inlined) 
  Calls: cmp_idx/2 (indirect_inlining) (217x) (100.00 per call)

Now there is an edge (core_list_mergesort -> cmp_idx) whose count is higher
(217) than the max_count global variable. This causes an ICE in edge_badness
which has the assertion (max_count >= edge_count).
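
The ChangeLog entry quoted later in this thread describes the eventual change as capping edge->count at max_count for the badness calculations. A minimal sketch of that idea follows; the helper names and the 0..1000 scaling are hypothetical and are not the code in ipa-inline.c.

```c
#include <stdint.h>

/* Hypothetical helper: return COUNT limited to MAX_COUNT, so that
   later arithmetic can rely on count <= max_count.  */
int64_t
capped_count (int64_t count, int64_t max_count)
{
  return count > max_count ? max_count : count;
}

/* Toy scaling of a count into the range 0..1000 relative to
   MAX_COUNT, computed from the capped value.  */
int
scaled_count (int64_t count, int64_t max_count)
{
  if (max_count <= 0)
    return 0;
  return (int) (capped_count (count, max_count) * 1000 / max_count);
}
```

With a stale max_count of, say, 100 and an edge count of 217, the scaled value saturates at 1000 instead of exceeding the expected range.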


[Bug gcov-profile/58682] Profiling directed optimization doesn't play well with indirect inlining

2013-10-10 Thread pa...@matos-sorge.com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58682

--- Comment #3 from Paulo J. Matos  ---
I have now a fix for this. I will prepare a patch for gcc-patches.


[Bug gcov-profile/58682] Profiling directed optimization doesn't play well with indirect inlining

2013-10-21 Thread pa...@matos-sorge.com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58682

Paulo J. Matos  changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |FIXED

--- Comment #4 from Paulo J. Matos  ---
Fixed in r203897.


[Bug ipa/58862] [4.9 Regression] LTO profiledbootstrap failure: lto1: ICE in edge_badness, at ipa-inline.c:1008

2013-10-30 Thread pa...@matos-sorge.com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58862

Paulo J. Matos  changed:

   What|Removed |Added

 CC||pa...@matos-sorge.com

--- Comment #8 from Paulo J. Matos  ---
(In reply to Teresa Johnson from comment #7)
> This looks like a separate issue from the edge probability issue that I
> fixed. The edge probability issue was introduced earlier. Almost certainly
> due to the following change, given that you noticed it at r203899 and it is
> an error about edge badness:
> 
> 
> r203897 | pmatos | 2013-10-21 08:41:46 -0700 (Mon, 21 Oct 2013) | 4 lines
> Changed paths:
>M /trunk/gcc/ChangeLog
>M /trunk/gcc/ipa-inline.c
> 
> * ipa-inline.c (edge_badness): Cap edge->count at max_count for
> badness
> calculations.
> 
> 
> 
> 
> Adding author to cc.
> 
> Teresa

Thanks for adding me to CC. I will try to confirm if it was my patch.


[Bug ipa/58862] [4.9 Regression] LTO profiledbootstrap failure: lto1: ICE in edge_badness, at ipa-inline.c:1008

2013-11-01 Thread pa...@matos-sorge.com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58862

--- Comment #9 from Paulo J. Matos  ---
I didn't manage to reproduce the bug yet. With the git sha before my commit
4bc0f16, I get the following on a profiledbootstrap on x64:

insn-opinit.c: In function 'void init_all_optabs(target_optabs*)':
insn-opinit.c:1234:1: error: verify_flow_info: Wrong probability of edge
1437->2606 66380
 init_all_optabs (struct target_optabs *optabs)
 ^
insn-opinit.c:1234:1: error: verify_flow_info: Wrong probability of edge
1427->2598 66380
insn-opinit.c:1234:1: internal compiler error: verify_flow_info failed
Please submit a full bug report,
with preprocessed source if appropriate.
See <http://gcc.gnu.org/bugs.html> for instructions.

Unfortunately running one of these takes a long time so it's a slow process to
check it out since as far as I am aware it's not possible to use a parallel
build. Do let me know if there's a fast way to build it. 

I will keep investigating.


[Bug ipa/58862] [4.9 Regression] LTO profiledbootstrap failure: lto1: ICE in edge_badness, at ipa-inline.c:1008

2013-11-02 Thread pa...@matos-sorge.com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58862

--- Comment #14 from Paulo J. Matos  ---
(In reply to Uroš Bizjak from comment #13)
> (In reply to Paulo J. Matos from comment #9)
>  
> > Unfortunately running one of these takes a long time so it's a slow process
> > to check it out since as far as I am aware it's not possible to use a
> > parallel build. Do let me know if there's a fast way to build it. 
> > 
> > I will keep investigating.
> 
> Have you been able to trigger the ICE following the instructions in Comment
> 12?

No, not yet. Is that through a profiledbootstrap on HEAD?

[Bug ipa/58862] [4.9 Regression] LTO profiledbootstrap failure: lto1: ICE in edge_badness, at ipa-inline.c:1008

2013-11-12 Thread pa...@matos-sorge.com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58862

--- Comment #20 from Paulo J. Matos  ---
Thanks for fixing this.


[Bug middle-end/57748] [4.7/4.8/4.9 Regression] ICE when expanding assignment to unaligned zero-sized array

2013-11-21 Thread pa...@matos-sorge.com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57748

--- Comment #47 from Paulo J. Matos  ---
Would like to add that I backported the patch locally and all the testsuite is
passing until now. The ICE I initially got is not gone as well. So I can
confirm that as far as I know, the patch is indeed fine in 4.8.


[Bug middle-end/57748] [4.7/4.8/4.9 Regression] ICE when expanding assignment to unaligned zero-sized array

2013-11-21 Thread pa...@matos-sorge.com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57748

--- Comment #45 from Paulo J. Matos  ---
Can we backport Bernd's patch of the 20th of September to 4.8? 

I just met this ICE in 4.8 and I guess we should still try to fix them in 4.8
since it's still maintained.


[Bug middle-end/57748] [4.7/4.8/4.9 Regression] ICE when expanding assignment to unaligned zero-sized array

2013-11-22 Thread pa...@matos-sorge.com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57748

--- Comment #48 from Paulo J. Matos  ---
(In reply to Paulo J. Matos from comment #47)
> Would like to add that I backported the patch locally and all the testsuite
> is passing until now. The ICE I initially got is not gone as well. So I can
> confirm that as far as I know, the patch is indeed fine in 4.8.

s/not/now/.

Obviously I meant that the patch applies cleanly and works for me in 4.8.


[Bug middle-end/57748] [4.7/4.8/4.9 Regression] ICE when expanding assignment to unaligned zero-sized array

2013-11-22 Thread pa...@matos-sorge.com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57748

--- Comment #49 from Paulo J. Matos  ---
I noticed that enabling misaligned moves has created a few test failures on my
port, namely execute.exp=20051113-1.c. It was generating one too many moves to
dereference the structure in function Sum.

Applying patch posted by Bernd:
http://gcc.gnu.org/ml/gcc-patches/2013-11/msg02291.html

fixes the problems I was seeing.
The patch does not apply cleanly to 4.8 (3 out of 14 hunks fail) but they are
easily fixable.

Again, a request for this to be approved in master and later in 4.8 (since it
leads to wrong code generation).


[Bug middle-end/57748] [4.7/4.8/4.9 Regression] ICE when expanding assignment to unaligned zero-sized array

2013-11-28 Thread pa...@matos-sorge.com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57748

--- Comment #51 from Paulo J. Matos  ---
This was in a private VLIW SIMD port.


[Bug rtl-optimization/55025] reg_nonzero_bits_for_combine/get_last_value: missing mode check for hardware registers

2013-11-29 Thread pa...@matos-sorge.com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55025

Paulo J. Matos  changed:

   What|Removed |Added

 CC||pa...@matos-sorge.com

--- Comment #2 from Paulo J. Matos  ---
I think this can be closed. 4.6 is not maintained anymore and this is fixed in
current versions.


[Bug tree-optimization/55761] New: process_assignment assumes -1 can be created

2012-12-20 Thread pa...@matos-sorge.com


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55761



 Bug #: 55761

   Summary: process_assignment assumes -1 can be created

Classification: Unclassified

   Product: gcc

   Version: 4.7.2

Status: UNCONFIRMED

  Severity: normal

  Priority: P3

 Component: tree-optimization

AssignedTo: unassig...@gcc.gnu.org

ReportedBy: pa...@matos-sorge.com





Created attachment 29013

  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=29013

breaks GCC4.7.2 x86_64



Hello,



process_assignment in tree-tailcall.c assumes constant -1 can be created in any

mode (that matches all the conditions up until this point) in the line:

*m = build_int_cst (TREE_TYPE (non_ass_var), -1);



If, however, the TREE_TYPE of non_ass_var is a vector type, for example, an ICE

occurs.



The following example breaks my port, x86_64 and probably more:



typedef short V4H __attribute__ ((vector_size (8)));
extern V4H __fp_rep_h (unsigned short B);

V4H
fn1 ()
{
  V4H vuq = __fp_rep_h (0);
  vuq -= __fp_rep_h (1);
  return vuq;
}



In processing assignment vuq = vuq - vuq0; (where vuq0 = __fp_rep_h (1)), we

try to create multiplier -1 with type V4H and that fails in build_int_cst_wide.



We need to check if it's of integral type and only then apply build_int_cst.

Otherwise we should use fold_build1.



I tested GCC 4.7.2:

$ gcc -O2 baaclc_block.i
baaclc_block.i: In function 'fn1':
baaclc_block.i:10:1: internal compiler error: in build_int_cst_wide, at tree.c:1222
Please submit a full bug report,
with preprocessed source if appropriate.
See <http://gcc.gnu.org/bugs.html> for instructions.

$ gcc -v
Using built-in specs.
COLLECT_GCC=/tools/oss/packages/x86_64-rhel5/gcc/4.7.2/bin/gcc
COLLECT_LTO_WRAPPER=/tools/oss/packages/x86_64-rhel5/gcc/4.7.2/libexec/gcc/x86_64-unknown-linux-gnu/4.7.2/lto-wrapper
Target: x86_64-unknown-linux-gnu
Configured with: ../../gcc-4.7.2/configure
--prefix=/tools/oss/packages/x86_64-rhel5/gcc/4.7.2 --with-gnu-as
--with-as=/tools/oss/packages/x86_64-rhel5/binutils/default/bin/as
--with-gnu-ld
--with-ld=/tools/oss/packages/x86_64-rhel5/binutils/default/bin/ld
--with-mpc=/tools/oss/packages/x86_64-rhel5/mpc/0.8.1
--with-mpfr=/tools/oss/packages/x86_64-rhel5/mpfr/2.4.2
--with-gmp=/tools/oss/packages/x86_64-rhel5/gmp/4.3.2
--enable-languages=c,c++,objc,fortran --enable-symvers=gnu
Thread model: posix
gcc version 4.7.2 (GCC)


[Bug tree-optimization/55761] process_assignment assumes -1 can be created

2012-12-20 Thread pa...@matos-sorge.com


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55761



--- Comment #1 from Paulo J. Matos  2012-12-20 15:53:48 
UTC ---

This happens for the negate_expr case too in the same switch. 



I have a patch to fix this that I will upload momentarily.


[Bug tree-optimization/55761] process_assignment assumes -1 can be created

2012-12-20 Thread pa...@matos-sorge.com


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55761



--- Comment #3 from Paulo J. Matos  2012-12-20 16:01:23 
UTC ---

Created attachment 29014

  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=29014

Use build_int_cst only for integral types, otherwise use fold_build1


[Bug tree-optimization/55761] process_assignment assumes -1 can be created

2012-12-20 Thread pa...@matos-sorge.com


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55761



--- Comment #4 from Paulo J. Matos  2012-12-20 16:58:08 
UTC ---

I created a new patch from your comment to gcc-patches:

diff --git a/gcc/tree-tailcall.c b/gcc/tree-tailcall.c
index 5b1fd2b..8c7d142 100644
--- a/gcc/tree-tailcall.c
+++ b/gcc/tree-tailcall.c
@@ -326,26 +326,14 @@ process_assignment (gimple stmt, gimple_stmt_iterator call
   return true;

 case NEGATE_EXPR:
-  if (FLOAT_TYPE_P (TREE_TYPE (op0)))
-*m = build_real (TREE_TYPE (op0), dconstm1);
-  else
-*m = build_int_cst (TREE_TYPE (op0), -1);
-
+  *m = fold_unary (NEGATE_EXPR, TREE_TYPE (op0), op0);
   *ass_var = dest;
   return true;

 case MINUS_EXPR:
-  if (*ass_var == op0)
-*a = fold_build1 (NEGATE_EXPR, TREE_TYPE (non_ass_var), non_ass_var);
-  else
-{
-  if (FLOAT_TYPE_P (TREE_TYPE (non_ass_var)))
-*m = build_real (TREE_TYPE (non_ass_var), dconstm1);
-  else
-*m = build_int_cst (TREE_TYPE (non_ass_var), -1);
-
-  *a = fold_build1 (NEGATE_EXPR, TREE_TYPE (non_ass_var), non_ass_var);
-}
+*a = fold_unary (NEGATE_EXPR, TREE_TYPE (non_ass_var), non_ass_var);
+  if (*ass_var == op1)
+*m = fold_unary (NEGATE_EXPR, TREE_TYPE (non_ass_var), non_ass_var);

   *ass_var = dest;
   return true;





However, I am less confident that it works now. Mainly because of the *m in

MINUS_EXPR. It seems from the comments in tailcall structure that m should be

-1, not (NEGATE non_ass_var). However, we cannot really create a -1 for vectors

straightforwardly.



I guess that, as per your comment below, we would need a build_minus_one_cst

that handles not only scalar types but also vector types.


[Bug tree-optimization/55761] process_assignment assumes -1 can be created

2012-12-20 Thread pa...@matos-sorge.com


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55761



--- Comment #5 from Paulo J. Matos  2012-12-20 17:06:04 
UTC ---

As per previous comments, I looked at build_one_cst and implemented

build_minus_one_cst:

tree
build_minus_one_cst (tree type)
{
  switch (TREE_CODE (type))
    {
    case INTEGER_TYPE: case ENUMERAL_TYPE: case BOOLEAN_TYPE:
    case POINTER_TYPE: case REFERENCE_TYPE:
    case OFFSET_TYPE:
      return build_int_cst (type, -1);

    case REAL_TYPE:
      return build_real (type, dconstm1);

    case VECTOR_TYPE:
      {
        tree scalar = build_minus_one_cst (TREE_TYPE (type));

        return build_vector_from_val (type, scalar);
      }

    case COMPLEX_TYPE:
      return build_complex (type,
                            build_minus_one_cst (TREE_TYPE (type)),
                            build_zero_cst (TREE_TYPE (type)));

    case FIXED_POINT_TYPE:
    default:
      gcc_unreachable ();
    }
}





However, I am unsure on how to best model the FIXED_POINT_TYPE case, if at all

possible.
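
At the source level, the constant that the VECTOR_TYPE case builds corresponds to a vector with every element set to -1. The following sketch uses GCC's vector extension with the V4H type from the original report; the function name negate_by_multiply is hypothetical. It only illustrates the idea that multiplying by an all-minus-one vector negates each element, and says nothing about how tree-tailcall.c uses the constant.

```c
typedef short V4H __attribute__ ((vector_size (8)));

/* Negate V element-wise by multiplying with a vector whose four
   elements are all -1.  */
V4H
negate_by_multiply (V4H v)
{
  const V4H minus_one = { -1, -1, -1, -1 };
  return v * minus_one;
}
```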


[Bug tree-optimization/55761] process_assignment assumes -1 can be created

2013-01-22 Thread pa...@matos-sorge.com


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55761



--- Comment #6 from Paulo J. Matos  2013-01-22 15:30:48 
UTC ---

I have some further patches that replace the previously posted ones that I will

upload soon. Should these also be sent to gcc-patches or it's unnecessary since

they're being posted here?


[Bug tree-optimization/55761] process_assignment assumes -1 can be created

2013-01-22 Thread pa...@matos-sorge.com


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55761



Paulo J. Matos  changed:



   What|Removed |Added



  Attachment #29014|0   |1

is obsolete||



--- Comment #7 from Paulo J. Matos  2013-01-22 16:03:25 
UTC ---

Created attachment 29251

  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=29251

Proposed fix



This is based on Richard's suggestion and it seems to fix the bug.

Should this be submitted to gcc-patches?


[Bug tree-optimization/55761] process_assignment assumes -1 can be created

2013-01-22 Thread pa...@matos-sorge.com


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55761



Paulo J. Matos  changed:



   What|Removed |Added



  Attachment #29251|0   |1

is obsolete||



--- Comment #8 from Paulo J. Matos  2013-01-22 16:29:08 
UTC ---

Created attachment 29252

  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=29252

Proposed fix



Previous patch was removing trailing whitespace from lines I didn't touch. Now fixed

in this patch.


[Bug tree-optimization/59999] New: Sign extension in loop regression blocks generation of zero overhead loop

2014-01-30 Thread pa...@matos-sorge.com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=5

Bug ID: 5
   Summary: Sign extension in loop regression blocks generation of
zero overhead loop
   Product: gcc
   Version: 4.9.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: pa...@matos-sorge.com

Following the discussion in the mailing list thread:
http://gcc.gnu.org/ml/gcc/2014-01/msg00319.html

I removed the undefined behaviour mentioned by Andreas. This code:

extern short delayLength;
typedef int Sample;
extern Sample *temp_ptr;
extern Sample x;

void
foo (short blockSize)
{
  short i;
  unsigned short loopCount;

  loopCount = (unsigned short)(blockSize + delayLength) % 8;

  for (i = 0; (int)i < (int)loopCount; i++)
{
  *temp_ptr = x ^ *temp_ptr;
  temp_ptr++;
}
}

displays the same regression.
v850 on trunk with -O2 -mv850e3v5:
_foo:
.LFB0:
mov hilo(_delayLength),r10
ld.h 0[r10],r10
add r10,r6
andi 7,r6,r6
be .L1
mov hilo(_temp_ptr),r15
mov 0,r10
ld.w 0[r15],r11
mov hilo(_x),r14
.L4:
ld.w 0[r11],r13
ld.w 0[r14],r12
add 1,r10
add 4,r11
xor r13,r12
sxh r10
st.w r12,-4[r11]
cmp r6,r10
blt .L4
st.w r11,0[r15]
.L1:
jmp [r31]
.LFE0:
.size   _foo, .-_foo
.section.debug_frame,"",@progbits


GCC until 
commit e0ae2fe2a0bebe9de31e3d8eb4feace4909ef009
Author: vries 
Date:   Fri May 20 19:32:30 2011 +

2011-05-20  Tom de Vries  

PR target/45098
* tree-ssa-loop-ivopts.c: Include expmed.h.
(get_shiftadd_cost): New function.
(force_expr_to_var_cost): Declare forward.  Use get_shiftadd_cost.


git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@173976
138bc75d-0d04-0410-961f-82ee72b054a4

could do better by avoiding the sign extend inside the loop.
At the time it was not much of a problem. Nowadays we support zero overhead
loops for v850, and one is not generated here because of the sign extend. The
situation is similar for the mep backend.
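
For comparison, here is a self-contained sketch of the loop with a short counter next to a variant with an int counter. Both functions are hypothetical rewrites of foo with the globals turned into parameters, and both assume loop_count fits in a short (in the report it is at most 7). Whether the int version avoids the in-loop sign extend, or enables a zero overhead loop, depends on the target and compiler version; this only shows that the two forms compute the same result.

```c
/* Same shape as the report: a short counter compared as int.
   Assumes loop_count fits in a short.  */
int *
xor_block_short (int *p, int x, unsigned short loop_count)
{
  short i;

  for (i = 0; (int) i < (int) loop_count; i++)
    {
      *p = x ^ *p;
      p++;
    }
  return p;
}

/* Counter declared as int, so the increment is not narrowed back
   to short on each iteration.  */
int *
xor_block_int (int *p, int x, unsigned short loop_count)
{
  int i;

  for (i = 0; i < (int) loop_count; i++)
    {
      *p = x ^ *p;
      p++;
    }
  return p;
}
```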


[Bug tree-optimization/59999] [4.9 Regression] Sign extension in loop regression blocks generation of zero overhead loop

2014-01-31 Thread pa...@matos-sorge.com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=5

--- Comment #2 from Paulo J. Matos  ---
(In reply to Richard Biener from comment #1)
> I guess pure co-incidence

If I understand you correctly you think that the patch I mentioned is not the
culprit but simply triggered this to happen. It might be true indeed. The patch
was obtained with a simple bisect from the git repository using the v850 as
testing backend since the mep backend is much more recent so I couldn't really
test it.


[Bug rtl-optimization/59999] [4.9 Regression] Sign extension in loop regression blocks generation of zero overhead loop

2014-01-31 Thread pa...@matos-sorge.com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=5

--- Comment #4 from Paulo J. Matos  ---
(In reply to Richard Biener from comment #3)
> Yes, I think that the IV choice merely shows that we miss to optimize the
> extension - which would be somewhere in the RTL opt pipeline.

Makes sense. My first instinct was to do it in expand but since expand does one
gimple statement at a time it might be too much for it to handle since it
probably has to detect the sign extend and promote the type of the register if
there are no conflicting conditions. 

If you suggest where to do this kind of thing I can give it a try.

Thanks.


[Bug rtl-optimization/59999] [4.9 Regression] Sign extension in loop regression blocks generation of zero overhead loop

2014-01-31 Thread pa...@matos-sorge.com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=5

--- Comment #6 from Paulo J. Matos  ---
Hmm, ree is no good because by then we have already missed the generation of
zero overhead loops. Do you think this is something that could be added to
expand?


[Bug rtl-optimization/59999] [4.9 Regression] Sign extension in loop regression blocks generation of zero overhead loop

2014-02-05 Thread pa...@matos-sorge.com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=5

--- Comment #7 from Paulo J. Matos  ---
(In reply to Richard Biener from comment #5)
> Apart from expand there is the redundant-extension-elimination, ree.c.

In expand we get the following gimple for the loop:
;;   basic block 4, loop depth 0
;;pred:   2
;;4
  # i_15 = PHI <0(2), i_12(4)>
  # _18 = PHI <0(2), _4(4)>
  _6 = arr[_18];
  _7 = _6 + 1;
  arr[_18] = _7;
  _17 = (unsigned short) i_15;
  _13 = _17 + 1;
  i_12 = (short int) _13;
  _4 = (int) i_12;
  if (_4 < limit_5(D))
    goto <bb 4>;
  else
    goto <bb 3>;
;;succ:   4
;;3


Where _13 is an unsigned short and what we want to eliminate is this sign
extend:
  _4 = (int) i_12;

This doesn't seem trivial in the expand phase because to eliminate the sign
extend, you promote i_12 to int and then have to promote a bunch of other
variables, whose insns have already been emitted when you get here. Shouldn't
this be ivopts noticing that if it generates an int IV, it saves a sign extend
and is therefore better?


[Bug rtl-optimization/59999] [4.9 Regression] Sign extension in loop regression blocks generation of zero overhead loop

2014-02-05 Thread pa...@matos-sorge.com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=5

--- Comment #8 from Paulo J. Matos  ---
(In reply to Paulo J. Matos from comment #7)

Made a mistake. With the attached test, the final gimple before expand for the
loop basic block is:
;;   basic block 5, loop depth 0
;;pred:   5
;;4
  # i_26 = PHI 
  # ivtmp.24_18 = PHI 
  _28 = (void *) ivtmp.24_18;
  _13 = MEM[base: _28, offset: 0B];
  x.4_14 = x;
  _15 = _13 ^ x.4_14;
  MEM[base: _28, offset: 0B] = _15;
  ivtmp.24_12 = ivtmp.24_18 + 4;
  temp_ptr.5_17 = (Sample *) ivtmp.24_12;
  _11 = (unsigned short) i_26;
  _2 = _11 + 1;
  i_1 = (short int) _2;
  _10 = (int) i_1;
  if (_10 < _25)
    goto <bb 5>;
  else
    goto <bb 6>;
;;succ:   5
;;6

However, the point is the same. IVOPTS should probably generate an int IV
instead of a short int IV to avoid the sign extend since removing the sign
extend during RTL seems to be quite hard.

What do you think?


[Bug rtl-optimization/59999] [4.9 Regression] Sign extension in loop regression blocks generation of zero overhead loop

2014-02-05 Thread pa...@matos-sorge.com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=5

--- Comment #9 from Paulo J. Matos  ---
Created attachment 32044
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=32044&action=edit
Testcase


[Bug rtl-optimization/59999] [4.9 Regression] Sign extension in loop regression blocks generation of zero overhead loop

2014-02-05 Thread pa...@matos-sorge.com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=5

--- Comment #10 from Paulo J. Matos  ---
(In reply to Paulo J. Matos from comment #8)

For >= 4.8 the scalar evolution of _10 is deemed not simple, because it looks
like the following:
 
unit size 
align 32 symtab 0 alias set 3 canonical type 0x2ab16690 precision
32 min  max  context 
pointer_to_this >

arg 0 
unit size 
align 16 symtab 0 alias set 4 canonical type 0x2ab16540
precision 16 min  max 
pointer_to_this >

arg 0 
arg 1  arg 2 >>

This is something like: (int) (short int) {1, +, 1}_1. Since these are signed
integers, we can assume they don't overflow. Can't we then simplify the scalar
evolution to a polynomial_chrec over 32-bit integers and drop the nop_expr
that represents the sign extend?
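
To make the question above concrete, here is a minimal C sketch that evaluates
the two candidate evolutions iteration by iteration. It is an illustration
only and not GCC code: the helper names (chrec_apply, narrow_then_extend,
wide_chrec) are made up for this example, and it assumes a 16-bit short and a
non-negative iteration count.

```c
#include <stdint.h>

/* Value of the affine chrec {base, +, step} at iteration n, computed in a
   64-bit type so that the illustration itself cannot overflow.  */
static int64_t
chrec_apply (int64_t base, int64_t step, int64_t n)
{
  return base + step * n;
}

/* (int) (short int) {1, +, 1}_1 at iteration n (n >= 0 assumed): reduce to
   16 bits first, then sign extend.  The reduction is spelled out
   arithmetically so the sketch does not depend on an implementation-defined
   conversion.  */
static int32_t
narrow_then_extend (int64_t n)
{
  int64_t low16 = chrec_apply (1, 1, n) & 0xFFFF;   /* keep the low 16 bits */
  return (int32_t) (low16 >= 0x8000 ? low16 - 0x10000 : low16);
}

/* {1, +, 1}_1 taken directly as a 32-bit chrec, with no casts.  */
static int32_t
wide_chrec (int64_t n)
{
  return (int32_t) chrec_apply (1, 1, n);
}
```

Under these assumptions the two functions agree for n up to 32766 and differ
from n == 32767 onwards, which is the case the casts in the scalar evolution
have to account for.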


[Bug rtl-optimization/59999] [4.9 Regression] Sign extension in loop regression blocks generation of zero overhead loop

2014-02-05 Thread pa...@matos-sorge.com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=5

--- Comment #11 from Paulo J. Matos  ---
(In reply to Paulo J. Matos from comment #10)
> For >= 4.8 the scalar evolution of _10 is deemed not simple, because it
> looks like the following:
>   type  size 
> unit size 
> align 32 symtab 0 alias set 3 canonical type 0x2ab16690
> precision 32 min  max  0x2ab12fa0 2147483647> context  D.2881>
> pointer_to_this >
>
> arg 0  type  HI
> size 
> unit size 
> align 16 symtab 0 alias set 4 canonical type 0x2ab16540
> precision 16 min  max  0x2ab12ee0 32767>
> pointer_to_this >
>
> arg 0 
> arg 1  arg 2  0x2acc9140 1>>>
> 
> This is something like: (int) (short int) {1, +, 1}_1. Since these are
> signed integers, we can assume they don't overflow, can't we simplify the
> scalar evolution to a polynomial_chrec over 32bit integers and forget the
> nop_expr that represents the sign extend?

This chain of nop_expr in the scalar evolution is due to Richard's fix for
PR53676. It is still not clear to me what the fix is for, and whether it needs
tweaking or whether a later pass needs to remove the widening from the loop.
I am investigating.


[Bug rtl-optimization/59999] [4.9 Regression] Sign extension in loop regression blocks generation of zero overhead loop

2014-02-06 Thread pa...@matos-sorge.com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=5

--- Comment #13 from Paulo J. Matos  ---
(In reply to Richard Biener from comment #12)
> 
> Note that {1, +, 1}_1 is unsigned.  The issue is that while i is short
> i++ is really i = (short)((int) i + 1) and thus only the operation in
> type 'int' is known to not overflow and thus the IV in short _can_
> overflow and the loop can loop infinitely for example for loopCount
> == SHORT_MAX + 1.
> 
> The fix to SCEV analysis was to still be able to analyze the evolution at
> all.
> 
> The testcase is simply very badly written (unsigned short upper bound,
> signed short IV and IV comparison against upper bound in signed int).

I thought a signed operation cannot overflow, independently of its width, and
that therefore (short) (int + 1) shouldn't overflow.

I agree with you on the testcase. However, it is taken from customer code and,
even if badly written, it's acceptable C. GCC 4.5.4 generates the scalar
evolution for the integer variable: {1, +, 1}_1 without the casts (therefore a
simple_iv). This allows GCC to use an int for an IV, which helps discard the
sign extend in the loop body and later on allows the zero overhead loop to be
generated. This case happens again and again and causes serious performance
regressions on customer code.


[Bug rtl-optimization/59999] [4.9 Regression] Sign extension in loop regression blocks generation of zero overhead loop

2014-02-06 Thread pa...@matos-sorge.com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=5

--- Comment #14 from Paulo J. Matos  ---
Something like this which looks much simpler hits the same problem:
extern int arr[];

void
foo32 (int limit)
{
  short i;
  for (i = 0; (int)i < limit; i++)
arr[i] += 1;
}
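
As a side note on why the increment in this loop is not a plain short
addition, the following small sketch shows the type the addition is carried
out in and what is stored back. It is illustrative only: the macro and
function names are invented for this example, it needs C11 for _Generic, and
the wrap-around result assumes GCC's documented behaviour for out-of-range
conversions to a signed type together with a short that is narrower than int.

```c
#include <limits.h>

/* 1 if the expression has type int, 0 otherwise (C11 _Generic).  */
#define HAS_TYPE_INT(e) _Generic ((e), int: 1, default: 0)

/* Is the addition in i++ carried out in int when i is a short?  */
static int
addition_is_done_in_int (void)
{
  short i = 0;
  return HAS_TYPE_INT (i + 1);
}

/* What i++ stores back: the int result converted to short.  */
static short
increment_short (short i)
{
  return (short) (i + 1);
}

/* The increment applied to the largest short value.  The conversion back to
   short is implementation-defined here; GCC reduces the value modulo 2^N.  */
static short
increment_at_short_max (void)
{
  return increment_short (SHRT_MAX);
}
```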


[Bug rtl-optimization/59999] [4.9 Regression] Sign extension in loop regression blocks generation of zero overhead loop

2014-02-06 Thread pa...@matos-sorge.com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=5

--- Comment #16 from Paulo J. Matos  ---
(In reply to rguent...@suse.de from comment #15)
> Exactly the same problem.  C integral type promotion rules make
> that i = (short)((int)i + 1) again.  Note that (int)i + 1
> does not overflow, (short) ((int)i + 1) invokes implementation-defined
> behavior which in our case is modulo-2 reduction.
> 
> Nothing guarantees that (short)i + 1 does not overflow.

OK, that makes sense. But in GCC 4.8 that doesn't seem to be what happens.
It seems to be i = (short) ((unsigned short) i + 1)
Later i is cast to int for comparison.

Before ivopts this is the end of the loop body:
  i.7_19 = (unsigned short) i_26;
  _20 = i.7_19 + 1;
  i_21 = (short intD.8) _20;
  _10 = (intD.1) i_21;
  if (_10 < _25)
goto ;
  else
goto ;

i is initially a short, then converted to unsigned short. The addition is
performed and the result converted back to short. It is then cast to int for
the comparison.

For GCC 4.5.4 the end of loop body is:
  iD.2767_18 = iD.2767_26 + 1;
  D.5046_9 = (intD.0) iD.2767_18;
  if (D.5046_9 < D.5047_25)
goto ;
  else
goto ;

Here the addition is made in short int and then there's only one cast to int.
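
The two spellings of the increment discussed here, (short) ((int) i + 1) and
the (short) ((unsigned short) i + 1) seen in the GCC 4.8 dump, can be compared
exhaustively in plain C. This is only a sketch with made-up function names; it
assumes a short narrower than int and GCC's modulo reduction for the
implementation-defined conversion back to short.

```c
#include <limits.h>

/* i++ as the C promotion rules describe it: add in int, convert back.  */
static short
inc_via_int (short i)
{
  return (short) ((int) i + 1);
}

/* i++ as in the GCC 4.8 dump: add in unsigned short (which wraps), then
   convert back to short.  */
static short
inc_via_unsigned_short (short i)
{
  unsigned short u = (unsigned short) i;
  u = (unsigned short) (u + 1u);
  return (short) u;
}

/* Number of short values for which the two forms give different results.  */
static long
count_mismatches (void)
{
  long mismatches = 0;
  for (int v = SHRT_MIN; v <= SHRT_MAX; v++)
    if (inc_via_int ((short) v) != inc_via_unsigned_short ((short) v))
      mismatches++;
  return mismatches;
}
```

With the assumptions above, count_mismatches () is expected to return 0, since
both forms produce the same value modulo 2^16.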


[Bug rtl-optimization/59999] [4.9 Regression] Sign extension in loop regression blocks generation of zero overhead loop

2014-02-06 Thread pa...@matos-sorge.com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=5

--- Comment #17 from Paulo J. Matos  ---
(In reply to rguent...@suse.de from comment #15)
> Exactly the same problem.  C integral type promotion rules make
> that i = (short)((int)i + 1) again.  Note that (int)i + 1
> does not overflow, (short) ((int)i + 1) invokes implementation-defined
> behavior which in our case is modulo-2 reduction.
> 
> Nothing guarantees that (short)i + 1 does not overflow.

I am being thick... indeed I failed to notice that i++ also invokes
implementation-defined behaviour. I guess GCC then sorts that out by casting i
to unsigned short for the addition, and all the remaining issues unfold from
there.


[Bug middle-end/60102] New: powerpc fp-bit ices at dwf_regno

2014-02-06 Thread pa...@matos-sorge.com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60102

Bug ID: 60102
   Summary: powerpc fp-bit ices at dwf_regno
   Product: gcc
   Version: 4.9.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: middle-end
  Assignee: unassigned at gcc dot gnu.org
  Reporter: pa...@matos-sorge.com

Created attachment 32073
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=32073&action=edit
Testcase

This might be a dup of PR52372 or PR57933 but since I am not sure I am opening
a new bug.

When trying to compile powerpc libgcc fp-bit I get an ICE using trunk:

$ /home/pmatos/projects/EXTERNAL/GCC/builds/gcc-trunk_powerpc/./gcc/cc1
-fpreprocessed fp-bit.i -quiet -dumpbase fp-bit.c -msoft-float -mcpu=8540
-auxbase-strip _addsub_df.o -g -g -g -O2 -O2 -O2 -Wextra -Wall -Wno-narrowing
-Wwrite-strings -Wcast-qual -Wstrict-prototypes -Wmissing-prototypes
-Wold-style-definition -version -fbuilding-libgcc -fno-stack-protector
-fvisibility=hidden -o fp-bit.s
GNU C (GCC) version 4.9.0 20140205 (experimental) (powerpc-eabispe)
compiled by GNU C version 4.8.1, GMP version 5.1.2, MPFR version
3.1.1-p2, MPC version 1.0.1
GGC heuristics: --param ggc-min-expand=30 --param ggc-min-heapsize=4096
GNU C (GCC) version 4.9.0 20140205 (experimental) (powerpc-eabispe)
compiled by GNU C version 4.8.1, GMP version 5.1.2, MPFR version
3.1.1-p2, MPC version 1.0.1
GGC heuristics: --param ggc-min-expand=30 --param ggc-min-heapsize=4096
Compiler executable checksum: 52229a64c376a95d997a16c551e2c79f
fp-bit.i: In function ‘fn3’:
fp-bit.i:27:1: internal compiler error: in dwf_regno, at dwarf2cfi.c:909
 }
 ^
0x786749 dwf_regno
../../../gcc-trunk/gcc/dwarf2cfi.c:909
0x78696b dwarf2out_flush_queued_reg_saves
../../../gcc-trunk/gcc/dwarf2cfi.c:988
0x789981 scan_trace
../../../gcc-trunk/gcc/dwarf2cfi.c:2522
0x789ab2 create_cfi_notes
../../../gcc-trunk/gcc/dwarf2cfi.c:2565
0x78a553 execute_dwarf2_frame
../../../gcc-trunk/gcc/dwarf2cfi.c:2925
0x78b402 execute
../../../gcc-trunk/gcc/dwarf2cfi.c:3421
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See <http://gcc.gnu.org/bugs.html> for instructions.



I will attach the fp-bit.i reduced version.

[Bug rtl-optimization/59999] [4.9 Regression] Sign extension in loop regression blocks generation of zero overhead loop

2014-02-07 Thread pa...@matos-sorge.com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=5

--- Comment #20 from Paulo J. Matos  ---
OK, I was trying to make sense of all this and there are two things that stick
out.

One is when you say that the C integer promotion rules make i++ equivalent to
i = (short)((int)i + 1). However, GCC is doing i = (short) ((unsigned short) i
+ 1). Am I missing something that allows this or makes the addition in int
equivalent to the addition in unsigned short?

Secondly we still have a dangling sign_extend later on that we could possibly
optimize. I find it hard to understand if this can be done properly in expand
or if a small pass like ree but before zero overhead loop generation is better.
What do you think?


[Bug rtl-optimization/59999] [4.9 Regression] Sign extension in loop regression blocks generation of zero overhead loop

2014-02-12 Thread pa...@matos-sorge.com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=5

--- Comment #22 from Paulo J. Matos  ---
After some thought, I am concluding this cannot actually be optimized and that
GCC 4.5.4 was better because it was taking advantage of an undefined behaviour
that doesn't exist.

The thought process is as follows. The whole process has to do with this type
of loop:
void foo (int loopCount)
{
  short i;
  for (i = 0; (int)i < loopCount; i++)
...
}

GCC 4.5.4 was assuming i++ could have undefined behaviour and the increment was
done in type short. Then i was promoted to int through a sign_extend and
compared to loopCount. This undefined behaviour allows GCC 4.5.4 to generate an
int scev for the loop.

In GCC 4.8 or later (haven't tested with 4.6 or 4.7), i++ is known not to have
undefined behaviour. i++ due to C integer promotion rules is: i = (short)
((int) i + 1). GCC validly simplifies to i = (short) ((unsigned short)i + 1).
This is then sign extended to int for comparison. GCC cannot generate an int
scev because it's not simple: (int) (short) {1, +, 1}_1.

This can validly loop forever if loopCount > SHORT_MAX.
For example, if loopCount is SHORT_MAX + 1, then when i reaches SHORT_MAX and
is incremented by one, the addition is fine because it is done in (unsigned
short). The result is then reduced modulo 2^16 (implementation-defined
behaviour) to short, so i never reaches loopCount and the loop runs forever.

In RTL we get the following sequence:
r4:SI <- [loopCount]
r0:HI <- 0

code label...

...

r2:HI <- r1:HI + 1
r3:SI <- sign_extend r2:HI

p0:BI <- r3:SI < r4:SI
loop to code label if p0:BI

I was tempted to simplify this to:
r4:SI <- [loopCount]
r0:SI <- 0

code label...

...

r2:SI <- r1:SI + 1

p0:BI <- r2:SI < r4:SI
loop to code label if p0:BI

However, this never loops forever when r4:SI > SHORT_MAX, whereas the original
sequence does, so I think that at least in this case this cannot be optimized.

I am tempted to close the bug report. Richard?
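
The difference between the two RTL sequences above can be reproduced with a
small C model. This is a sketch only: the function names and the iteration cap
are invented for the illustration (the cap lets a non-terminating loop be
observed), and the short counter assumes a 16-bit short with GCC's modulo
reduction on conversion.

```c
#include <limits.h>

/* Trip count of `for (short i = 0; (int) i < loop_count; i++)`, giving up
   after `cap` iterations.  Returns -1 when the cap is hit.  */
static long
trips_short_counter (int loop_count, long cap)
{
  long trips = 0;
  short i = 0;
  while ((int) i < loop_count)
    {
      if (trips == cap)
        return -1;
      trips++;
      /* Same effect as i++: add in a wrapping 16-bit type, store as short.  */
      i = (short) (unsigned short) ((unsigned short) i + 1u);
    }
  return trips;
}

/* Trip count of the "simplified" loop that keeps the counter in int.  */
static long
trips_int_counter (int loop_count, long cap)
{
  long trips = 0;
  int i = 0;
  while (i < loop_count)
    {
      if (trips == cap)
        return -1;
      trips++;
      i++;
    }
  return trips;
}
```

For loop_count up to SHRT_MAX both models run the same number of iterations.
For loop_count == SHRT_MAX + 1 the short counter hits the cap while the int
counter stops after SHRT_MAX + 1 iterations, which is why the two sequences
are not interchangeable.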