Identifying Pure and Const Functions

2015-05-29 Thread Pritam Gharat
Hi,


How do we identify whether a function is a pure or a const function?
Is there any flag associated with its cgraph_node or the tree node
(decl of cgraph_node)?


Thanks,
Pritam Gharat


Re: Identifying Pure and Const Functions

2015-05-29 Thread Marek Polacek
On Fri, May 29, 2015 at 01:16:32PM +0530, Pritam Gharat wrote:
> How do we identify whether a function is a pure or a const function?
> Is there any flag associated with its cgraph_node or the tree node
> (decl of cgraph_node)?

You'll want to look into ipa-pure-const.c.

Marek


Re: Better info for combine results in worse code generated

2015-05-29 Thread Segher Boessenkool
On Fri, May 29, 2015 at 12:41:20PM +0930, Alan Modra wrote:
> I'll tell you one of the reasons why they are
> slower, as any decent hardware engineer could probably figure this out
> themselves anyway.  The record form instructions are cracked into two
> internal ops, the basic arithmetic/logic op, and a compare.  There's a
> limit to how much hardware can do in one clock cycle, or conversely,
> if you try to do more your clock must be slower.

Logical and simple arithmetic record-form ops aren't cracked, according
to our pipeline descriptions, and some simple testing (which could well
be flawed :-) ).

Of course I agree that cmp is better than and., but it will execute
pretty much the same in most code as far as I can tell.

> > > one of the aims of the wider patch I was working
> > > on was to remove patterns like rotlsi3_64, ashlsi3_64, lshrsi3_64 and
> > > ashrsi3_64.
> > 
> > We will need such patterns no matter what; the compiler cannot magically
> > know what machine insns set the high bits of a 64-bit reg to zero.
> 
> No, not by magic.  I define EXTEND_OP in rs6000.h and use it in
> record_value_for_reg.  Full patch follows.  I see enough code gen
> improvements on powerpc64le to make this patch worth pursuing,
> things like "rlwinm 0,5,6,0,25; extsw 0,0" being converted to
> "rldic 0,5,6,52".  No doubt due to being able to prove an int var
> doesn't have the sign bit set.  Hmm, in fact the 52 says it is
> known to be only 6 bits before shifting.

Ah, interesting.  So you let reg_stat know about the full register
result in cases where the RTL instruction does not mention the full
register at all.  That sounds like a worthwhile direction to explore :-)

> +/* Describe how rtl operations on registers behave on this target when
> +   operating on less than the entire register.  */
> +#define EXTEND_OP(OP) \
> +  (GET_MODE (OP) != SImode   \
> +   || !TARGET_POWERPC64  \
> +   ? UNKNOWN \
> +   : (GET_CODE (OP) == AND   \
> +  || GET_CODE (OP) == ZERO_EXTEND\
> +  || GET_CODE (OP) == ASHIFT \
> +  || GET_CODE (OP) == ROTATE \
> +  || GET_CODE (OP) == LSHIFTRT)  \
> +   ? ZERO_EXTEND \
> +   : (GET_CODE (OP) == SIGN_EXTEND   \
> +  || GET_CODE (OP) == ASHIFTRT)  \
> +   ? SIGN_EXTEND \
> +   : UNKNOWN)

I think this is too simplistic though.  For example, AND with -7 is not
zero-extended (rlwinm rD,rA,0,31,28 sets the high 32 bits of rD to the low
32 bits of rA).

In general, everything depends on what exact machine insn is used; basing
the decision on the RTL leads to duplication, is fragile, _will_ get out
of synch.


Segher


Re: Better info for combine results in worse code generated

2015-05-29 Thread Alan Modra
On Fri, May 29, 2015 at 07:58:38AM -0500, Segher Boessenkool wrote:
> On Fri, May 29, 2015 at 12:41:20PM +0930, Alan Modra wrote:
> > +/* Describe how rtl operations on registers behave on this target when
> > +   operating on less than the entire register.  */
> > +#define EXTEND_OP(OP) \
> > +  (GET_MODE (OP) != SImode \
> > +   || !TARGET_POWERPC64\
> > +   ? UNKNOWN   \
> > +   : (GET_CODE (OP) == AND \
> > +  || GET_CODE (OP) == ZERO_EXTEND  \
> > +  || GET_CODE (OP) == ASHIFT   \
> > +  || GET_CODE (OP) == ROTATE   \
> > +  || GET_CODE (OP) == LSHIFTRT)\
> > +   ? ZERO_EXTEND   \
> > +   : (GET_CODE (OP) == SIGN_EXTEND \
> > +  || GET_CODE (OP) == ASHIFTRT)\
> > +   ? SIGN_EXTEND   \
> > +   : UNKNOWN)
> 
> I think this is too simplistic though.  For example, AND with -7 is not
> zero-extended (rlwinm rD,rA,0,31,28 sets the high 32 bits of rD to the low
> 32 bits of rA).

We take some pains in rs6000.md to ensure that the wrap-around case
for rlwinm does not occur for TARGET_POWERPC64.  You'll find that an
SImode AND with any value is in fact zero extending.

-- 
Alan Modra
Australia Development Lab, IBM


Re: Better info for combine results in worse code generated

2015-05-29 Thread Segher Boessenkool
On Fri, May 29, 2015 at 11:20:08PM +0930, Alan Modra wrote:
> On Fri, May 29, 2015 at 07:58:38AM -0500, Segher Boessenkool wrote:
> > On Fri, May 29, 2015 at 12:41:20PM +0930, Alan Modra wrote:
> > > +/* Describe how rtl operations on registers behave on this target when
> > > +   operating on less than the entire register.  */
> > > +#define EXTEND_OP(OP) \
> > > +  (GET_MODE (OP) != SImode   \
> > > +   || !TARGET_POWERPC64  \
> > > +   ? UNKNOWN \
> > > +   : (GET_CODE (OP) == AND   \
> > > +  || GET_CODE (OP) == ZERO_EXTEND\
> > > +  || GET_CODE (OP) == ASHIFT \
> > > +  || GET_CODE (OP) == ROTATE \
> > > +  || GET_CODE (OP) == LSHIFTRT)  \
> > > +   ? ZERO_EXTEND \
> > > +   : (GET_CODE (OP) == SIGN_EXTEND   \
> > > +  || GET_CODE (OP) == ASHIFTRT)  \
> > > +   ? SIGN_EXTEND \
> > > +   : UNKNOWN)
> > 
> > I think this is too simplistic though.  For example, AND with -7 is not
> > zero-extended (rlwinm rD,rA,0,31,28 sets the high 32 bits of rD to the low
> > 32 bits of rA).
> 
> We take some pains in rs6000.md to ensure that the wrap-around case
> for rlwinm does not occur for TARGET_POWERPC64.

I consider that a bug; it pessimises code.

> You'll find that an
> SImode AND with any value is in fact zero extending.

int f(int x) { return x & 0xc000; }

is a counter-example with current trunk (it does a rldicr).


Segher


Re: Identifying Pure and Const Functions

2015-05-29 Thread Pritam Gharat
I had a look at the file ipa-pure-const.c. I have written a Simple IPA
Pass which identifies pure and const functions in the program. I have
inserted this pass after ipa-pta pass which is executed after
ipa-pure-const pass. Also, the option -fipa-pure-const is enabled.
I am using DECL_PURE_P(t) and TREE_READONLY(t) to identify whether t
is a pure or a const function.

I tested this code on a sample program which consists of a function
"error" which returns a constant value. gcc fails to identify it as a
pure function. However, if I specify the function as pure as shown
below, gcc identifies it as a pure function:

__attribute__((pure))
int error()
{
  return 2;
}

I am unable to understand why the error function is not identified as a
pure function by gcc. Are the checks to identify pure and const
functions correct?
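
For reference, a minimal sketch of the difference between the two attributes as GCC documents them: a pure function has no side effects but may read global memory, while a const function depends only on its arguments. The names scaled, twice and set_scale are hypothetical. A function like error() above reads no memory at all, so it fits the stricter const definition, and it may be worth checking the const flag as well as the pure one.

```c
/* Global state that the pure function is allowed to read.  */
static int scale = 3;

/* pure: no side effects, but the result may depend on global memory.  */
__attribute__((pure)) int scaled(int x)
{
  return x * scale;
}

/* const: the result depends on the arguments only.  */
__attribute__((const)) int twice(int x)
{
  return 2 * x;
}

/* Neither pure nor const: it writes global memory.  */
void set_scale(int s)
{
  scale = s;
}
```

Two calls to scaled(4) separated by set_scale(5) may return different values, which is why a pure call cannot be moved across a store the way a const call can.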

Thanks,
Pritam Gharat

On Fri, May 29, 2015 at 1:26 PM, Marek Polacek  wrote:
> On Fri, May 29, 2015 at 01:16:32PM +0530, Pritam Gharat wrote:
>> How do we identify whether a function is a pure or a const function?
>> Is there any flag associated with its cgraph_node or the tree node
>> (decl of cgraph_node)?
>
> You'll want to look into ipa-pure-const.c.
>
> Marek


Re: Identifying Chain of Recurrence

2015-05-29 Thread Pritam Gharat
I am writing a Simple IPA Pass which is inserted after ipa-pta. This
pass identifies whether a chain of recurrence is built by gcc or not.
I have used tree_is_chrec(t) to check if t represents a chain of
recurrence and the function find_var_scev_info(bb, var) to obtain the
scev information generated (which includes the chrec information).
However, find_var_scev_info() is a static function and cannot be
called from my plugin.

Is there any alternative way to get the chain of recurrence
information? Or do we need to change the gcc source code by making the
function find_var_scev_info() non-static and recompiling the source
code?
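
As a reminder of what is being looked for, here is a stand-alone model of an affine chain of recurrence, written {base, +, step}: the value at iteration i is base + step * i. This is only a sketch with made-up names (affine_chrec, chrec_value_at, index_to_offset); it does not use or mirror GCC's internal tree representation.

```c
#include <stdint.h>

/* Hypothetical model of an affine chrec {base, +, step}.  */
struct affine_chrec {
  int64_t base;  /* value on loop entry (iteration 0) */
  int64_t step;  /* amount added on every iteration */
};

/* Value of the chrec at iteration i.  */
int64_t chrec_value_at(struct affine_chrec c, int64_t i)
{
  return c.base + c.step * i;
}

/* Turn the chrec of an array index into the chrec of the byte offset
   accessed, for elements of elem_size bytes.  */
struct affine_chrec index_to_offset(struct affine_chrec idx, int64_t elem_size)
{
  struct affine_chrec off = { idx.base * elem_size, idx.step * elem_size };
  return off;
}
```

For an access a[2*i + 1] with 4-byte elements, the index evolves as {1, +, 2} and the byte offset as {4, +, 8}.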

Thanks,
Pritam Gharat

On Fri, May 29, 2015 at 10:34 AM, Bin.Cheng  wrote:
> On Fri, May 29, 2015 at 12:41 PM, Pritam Gharat
>  wrote:
>> GCC builds a chain of recurrence to capture a pattern in which an
>> array is accessed in a loop. Is there any function which identifies
>> that gcc has built a chain of recurrence? Is this information
>> associated to the gimple assignment which accesses the array elements?
>>
> GCC analyzes evolution of scalar variables on SSA representation.
> Each ssa var is treated as unique and can be analyzed.  If the address
> expression itself is a ssa var, it can be analyzed by scev; otherwise,
> the users of scev have to compute by themselves on the bases of other
> scev vars.  For example, address of MEM[scev_var] can be analyzed by
> scev; while address of MEM[invariant_base+scev_iv] is computed by
> users.  Well, at least this is how IVOPT works.  You can refer to
> top-file comment in tree-scalar-evolution.c for detailed information.
> General routines for chain of recurrence is in file tree-chrec.c.
>
> Thanks,
> bin
>>
>> Thanks,
>> Pritam Gharat


Re: Relocations to use when eliding plts

2015-05-29 Thread Richard Henderson
On 05/28/2015 01:36 PM, Rich Felker wrote:
> On Thu, May 28, 2015 at 09:40:57PM +0200, Jakub Jelinek wrote:
>> On Thu, May 28, 2015 at 03:29:02PM -0400, Rich Felker wrote:
>>>> You're not missing anything.  But do you want the performance of a
>>>> library to depend on how the main executable is compiled?
>>>
>>> Not directly. But I'd rather be in that situation than have
>>> pessimizations in library codegen to avoid it. I'm worried about cases
>>> where code both loads the address of a function and calls it, such as
>>> this (stupid) example:
>>>
>>> a((void *)a);
>>
>> That can be handled by using just one GOT slot, the non-.got.plt one;
>> only if there are only relocations that guarantee that address equality is
>> not important it would use the faster (*_JUMP_SLOT?) relocations.
> 
> How far would this extend, e.g. in the case of LTO or compiling the
> whole library at once?

It depends on how difficult that becomes, I suppose.  It's certainly something
that we can look for during LTO.

I did in fact mention this exact point in the original message:

> This does leave open other optimization questions, mostly around weak
> functions.  Consider constructs like
> 
>   if (foo) foo();
> 
> Do we, within the compiler, try to CSE GOTPCREL and GOTPLTPCREL, accepting the
> possibility (not certainty) of jump-to-jump but definitely avoiding a separate
> load insn and the latency implied by that?

As a last resort the two can always be unified at static link time, so that
only one got slot is created, and only one runtime relocation exists.  At which
point we'd still have two loads in the insn stream.  But barring preemption,
the second load will be from cache and cost a single cycle.
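
A minimal, self-contained version of the "if (foo) foo();" construct, assuming an ELF target and GCC's weak attribute. foo is deliberately left undefined so that its address is null at run time, and call_if_present is a name made up for this sketch. The test of foo needs the function's address while the call only needs a place to jump to, which is where the two kinds of GOT reference discussed above would come from.

```c
/* Weak declaration: if nothing defines foo, its address resolves to null.  */
extern void foo(void) __attribute__((weak));

/* Returns 1 if foo was present and called, 0 otherwise.  */
int call_if_present(void)
{
  if (foo) {   /* address-of use */
    foo();     /* call use */
    return 1;
  }
  return 0;
}
```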

So which is less likely, this double-use of a function pointer, or a non-PIE
executable?


r~


Re: [RFC] Design and Implementation for Path Splitting for Loop with Conditional IF-THEN-ELSE

2015-05-29 Thread Jeff Law

On 05/16/2015 06:49 AM, Ajit Kumar Agarwal wrote:

> I have Designed and  implemented with the following design for the path
> splitting of the loops with conditional IF-THEN-ELSE.
> The implementation has gone through the bootstrap for Microblaze target along
> DEJA GNU regressions tests and
> running the MIBench/EEMBC benchmarks. There is no regression seen in Deja GNU
> tests and Deja C++ tests results are
> better with the path splitting changes(lesser failures). The Mibench/EEMBC
> benchmarks are run and no performance degradation is
> seen and the performance improvement in telcom_gsm( Mibench benchmarks) of
> 9.15% and rgbcmy01_lite(EEMBC
> benchmarks) the performance improvement of 9.4% is observed.
>
> Proposal on the below were made earlier on path splitting and the reference is
> attached.

The first thing I notice is you haven't included any tests for the new 
optimization.  We're trying really hard these days to have tests in the 
suite for this kind of new optimization/feature work.


I don't offhand know if any of the benchmarks you cite above are 
free-enough to derive a testcase from.  But one trick many of us use is 
to instrument the pass and compile some known free software (often gcc 
itself) to find triggering code and use that to generate tests for the 
new transformation.





> Design changes.
>
> 1. The dominators of the block with conditional IF statements say BB1 are found
> and the join node of the IF-THEN-ELSE
> inside the loops is found on the blocks dominated by the BB1 and are not
> successor of BB1 are the join node.

This isn't necessarily my area of strength, but it doesn't seem right to 
me.   I guess it would work if there aren't any more nodes in the loop 
after the join block, except for the loop latch, but it doesn't seem 
right in the general case.  It might work if you also verify that the IF 
block is the immediate dominator of the potential join node.






> 2. The Join node is same as the source of the loops latch basic blocks.

OK.  This is why your initial algorithm in #1 works.



> 3. If the above conditional in (1) and (2) are found the Join node  same as the
> source of the Loop latch node is
> moved into predecessors and the Join node ( Source of the Loop latch node) is
> made empty statement block with only
> the phi nodes.

Hopefully you do this by duplicating the join node and manipulating the 
CFG so that the original predecessors of the join each go to their copy 
of the join node.  The copied join nodes unconditionally then pass 
control to the original join node.
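
A source-level analogy of the transformation may help here. The actual pass works on the GIMPLE CFG and PHI nodes, so this is only a sketch with hypothetical function names: sum_before has a shared join statement after the IF-THEN-ELSE, and sum_after shows the same loop with a copy of the join statement in each arm and the PHI result (t) replaced by the value from that arm.

```c
/* Before: both arms of the IF flow into a shared join block.  */
int sum_before(const int *a, int n)
{
  int s = 0;
  for (int i = 0; i < n; i++) {
    int t;
    if (a[i] & 1)
      t = a[i] * 3;
    else
      t = a[i] / 2;
    s += t + 1;   /* join block, which is also the source of the latch edge */
  }
  return s;
}

/* After: each arm has its own copy of the join statement, and the
   original join is left with nothing to do.  */
int sum_after(const int *a, int n)
{
  int s = 0;
  for (int i = 0; i < n; i++) {
    if (a[i] & 1)
      s += a[i] * 3 + 1;
    else
      s += a[i] / 2 + 1;
  }
  return s;
}
```

Both functions compute the same result; the second form gives later passes a chance to simplify each copy of the join using what is known on that path.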





> 4. In the process of moving the Join node into its predecessors the result of
> the phi node in the Join node propagated
> with the corresponding phi arguments  based on which predecessor it came from
> in the Join blocks and move into its predecessors.

Right.  Note there's a phi-only cprop pass which is designed to do this 
propagation and in theory should be very fast.



> 5. The Dominator INFO is updated after performing the steps of (1) (2) (3) and
> (4).

Right.  Getting the sequencing of the various updates can be tricky. 
Even more so if you try to do it as a part of jump threading because you 
have to (for example) deal with unreachable blocks in the CFG.





> 6. The Join which is same as the source of the Loop latch node is made empty
> with only the phi node in the Join node.

Right.



> 7. The implementation is done in Jump threading phase of the machine
> independent optimization on tree based
> representation. The path splitting is called after the Jump threading
> optimization is performed.
> The following files are modified.
>
>  a) tree-vrp.c
>  b) tree-ssa-threadedge.c
>  c) tree-cfg.c
>  d) tree-ssa-threadedge.h
>  e) tree-cfg.h
>  f) cfghooks.c
>
> The diff is attached along with this mail and pasted below.

I would recommend this as a separate pass.  One of the goals for GCC 6 
is to break out the threader into its own pass, if you're piggybacking 
on the threader it just makes this task harder.


Also by implementing this as a distinct pass you have more freedom to 
put it where it really belongs in the optimization pipeline.  Given the 
duplication may end up exposing partial redundancies, dead code & 
propagation opportunities, it's probably best placed before DOM.  That 
also eliminates the need for running phi-only cprop because DOM will 
take care of it for you.


When jump threading is broken out into its own distinct pass, it'll 
probably end up just before or just after your path splitting pass.





> Please share your thoughts and feedbacks on the above optimization and the
> design and the coding and implementation
> done for the given above design.

Some general notes.  Every function should have a block comment which 
describes what the function does, its arguments & return value.  There 
are many examples throughout the code you can use to understand what 
we're looking for in those function comments.


Those comments are particularly useful during initial review and later 
maintenance, so in the future, plea

Re: Relocations to use when eliding plts

2015-05-29 Thread H.J. Lu
On Fri, May 29, 2015 at 8:38 AM, Richard Henderson  wrote:
> On 05/28/2015 01:36 PM, Rich Felker wrote:
>> On Thu, May 28, 2015 at 09:40:57PM +0200, Jakub Jelinek wrote:
>>> On Thu, May 28, 2015 at 03:29:02PM -0400, Rich Felker wrote:
>>>>> You're not missing anything.  But do you want the performance of a
>>>>> library to depend on how the main executable is compiled?
>>>>
>>>> Not directly. But I'd rather be in that situation than have
>>>> pessimizations in library codegen to avoid it. I'm worried about cases
>>>> where code both loads the address of a function and calls it, such as
>>>> this (stupid) example:
>>>>
>>>> a((void *)a);
>>>
>>> That can be handled by using just one GOT slot, the non-.got.plt one;
>>> only if there are only relocations that guarantee that address equality is
>>> not important it would use the faster (*_JUMP_SLOT?) relocations.
>>
>> How far would this extend, e.g. in the case of LTO or compiling the
>> whole library at once?
>
> It depends on how difficult that becomes, I suppose.  It's certainly something
> that we can look for during LTO.
>
> I did in fact mention this exact point in the original message:
>
>> This does leave open other optimization questions, mostly around weak
>> functions.  Consider constructs like
>>
>>   if (foo) foo();
>>
>> Do we, within the compiler, try to CSE GOTPCREL and GOTPLTPCREL, accepting 
>> the
>> possibility (not certainty) of jump-to-jump but definitely avoiding a 
>> separate
>> load insn and the latency implied by that?
>
> As a last resort the two can always be unified at static link time, so that
> only one got slot is created, and only one runtime relocation exists.  At 
> which
> point we'd still have two loads in the insn stream.  But barring preemption,
> the second load will be from cache and cost a single cycle.
>
> So which is less likely, this double-use of a function pointer, or a non-PIE
> executable?

Can you try hjl/no-plt branch in GCC git mirror with -fno-plt?
I got

[hjl@gnu-6 pr18458]$ make
/export/build/gnu/gcc/build-x86_64-linux/gcc/xgcc
-B/export/build/gnu/gcc/build-x86_64-linux/gcc -O2 -g -fno-plt   -c -o
main.o main.c
/export/build/gnu/gcc/build-x86_64-linux/gcc/xgcc
-B/export/build/gnu/gcc/build-x86_64-linux/gcc -O2 -g -fno-plt -fpic
-c -o a.o a.c
/export/build/gnu/gcc/build-x86_64-linux/gcc/xgcc
-B/export/build/gnu/gcc/build-x86_64-linux/gcc -O2 -g -fno-plt
-Wl,-z,now -shared -o a.so a.o
/export/build/gnu/gcc/build-x86_64-linux/gcc/xgcc
-B/export/build/gnu/gcc/build-x86_64-linux/gcc -O2 -g -fno-plt -fpic
-c -o b.o b.c
/export/build/gnu/gcc/build-x86_64-linux/gcc/xgcc
-B/export/build/gnu/gcc/build-x86_64-linux/gcc -O2 -g -fno-plt
-Wl,-z,now -shared -o b.so b.o a.so
/export/build/gnu/gcc/build-x86_64-linux/gcc/xgcc
-B/export/build/gnu/gcc/build-x86_64-linux/gcc -Wl,-rpath=. -Wl,-z,now
-o main main.o a.so b.so
./main
PASS
[hjl@gnu-6 pr18458]$ readelf -r main

Relocation section '.rela.dyn' at offset 0x4b0 contains 4 entries:
  Offset  Info   Type   Sym. Value  Sym. Name + Addend
00600a20  00020006 R_X86_64_GLOB_DAT  b + 0
00600a28  00050006 R_X86_64_GLOB_DAT  __libc_start_main@GLIBC_2.2.5 + 0
00600a30  00060006 R_X86_64_GLOB_DAT  __gmon_start__ + 0
00600a38  00080006 R_X86_64_GLOB_DAT  a + 0
[hjl@gnu-6 pr18458]$ gdb main
GNU gdb (GDB) Fedora 7.7.1-21.fc20
Copyright (C) 2014 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later 
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
.
Find the GDB manual and other documentation resources online at:
.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from main...done.
(gdb) r
Starting program: /export/home/hjl/bugs/binutils/pr18458/main
PASS
[Inferior 1 (process 10663) exited normally]
Missing separate debuginfos, use: debuginfo-install glibc-2.18-19.2.fc20.x86_64
(gdb) b b
Breakpoint 1 at 0x77bf75f0: file b.c, line 5.
(gdb) r
Starting program: /export/home/hjl/bugs/binutils/pr18458/main

Breakpoint 1, b () at b.c:5
5  a();
(gdb) si
a () at a.c:5
5  printf("PASS\n");
(gdb)


-- 
H.J.


Re: Relocations to use when eliding plts

2015-05-29 Thread H.J. Lu
On Fri, May 29, 2015 at 10:59 AM, H.J. Lu  wrote:
> On Fri, May 29, 2015 at 8:38 AM, Richard Henderson  wrote:
>> On 05/28/2015 01:36 PM, Rich Felker wrote:
>>> On Thu, May 28, 2015 at 09:40:57PM +0200, Jakub Jelinek wrote:
>>>> On Thu, May 28, 2015 at 03:29:02PM -0400, Rich Felker wrote:
>>>>>> You're not missing anything.  But do you want the performance of a
>>>>>> library to depend on how the main executable is compiled?
>>>>>
>>>>> Not directly. But I'd rather be in that situation than have
>>>>> pessimizations in library codegen to avoid it. I'm worried about cases
>>>>> where code both loads the address of a function and calls it, such as
>>>>> this (stupid) example:
>>>>>
>>>>> a((void *)a);
>>>>
>>>> That can be handled by using just one GOT slot, the non-.got.plt one;
>>>> only if there are only relocations that guarantee that address equality is
>>>> not important it would use the faster (*_JUMP_SLOT?) relocations.
>>>
>>> How far would this extend, e.g. in the case of LTO or compiling the
>>> whole library at once?
>>
>> It depends on how difficult that becomes, I suppose.  It's certainly 
>> something
>> that we can look for during LTO.
>>
>> I did in fact mention this exact point in the original message:
>>
>>> This does leave open other optimization questions, mostly around weak
>>> functions.  Consider constructs like
>>>
>>>   if (foo) foo();
>>>
>>> Do we, within the compiler, try to CSE GOTPCREL and GOTPLTPCREL, accepting 
>>> the
>>> possibility (not certainty) of jump-to-jump but definitely avoiding a 
>>> separate
>>> load insn and the latency implied by that?
>>
>> As a last resort the two can always be unified at static link time, so that
>> only one got slot is created, and only one runtime relocation exists.  At 
>> which
>> point we'd still have two loads in the insn stream.  But barring preemption,
>> the second load will be from cache and cost a single cycle.
>>
>> So which is less likely, this double-use of a function pointer, or a non-PIE
>> executable?
>
> Can you try hjl/no-plt branch in GCC git mirror with -fno-plt?
> I got
>
> [hjl@gnu-6 pr18458]$ make
> /export/build/gnu/gcc/build-x86_64-linux/gcc/xgcc
> -B/export/build/gnu/gcc/build-x86_64-linux/gcc -O2 -g -fno-plt   -c -o
> main.o main.c
> /export/build/gnu/gcc/build-x86_64-linux/gcc/xgcc
> -B/export/build/gnu/gcc/build-x86_64-linux/gcc -O2 -g -fno-plt -fpic
> -c -o a.o a.c
> /export/build/gnu/gcc/build-x86_64-linux/gcc/xgcc
> -B/export/build/gnu/gcc/build-x86_64-linux/gcc -O2 -g -fno-plt
> -Wl,-z,now -shared -o a.so a.o
> /export/build/gnu/gcc/build-x86_64-linux/gcc/xgcc
> -B/export/build/gnu/gcc/build-x86_64-linux/gcc -O2 -g -fno-plt -fpic
> -c -o b.o b.c
> /export/build/gnu/gcc/build-x86_64-linux/gcc/xgcc
> -B/export/build/gnu/gcc/build-x86_64-linux/gcc -O2 -g -fno-plt
> -Wl,-z,now -shared -o b.so b.o a.so
> /export/build/gnu/gcc/build-x86_64-linux/gcc/xgcc
> -B/export/build/gnu/gcc/build-x86_64-linux/gcc -Wl,-rpath=. -Wl,-z,now
> -o main main.o a.so b.so
> ./main
> PASS
> [hjl@gnu-6 pr18458]$ readelf -r main
>
> Relocation section '.rela.dyn' at offset 0x4b0 contains 4 entries:
>   Offset  Info   Type   Sym. ValueSym. Name + 
> Addend
> 00600a20  00020006 R_X86_64_GLOB_DAT  b + 0
> 00600a28  00050006 R_X86_64_GLOB_DAT 
> __libc_start_main@GLIBC_2.2.5 + 0
> 00600a30  00060006 R_X86_64_GLOB_DAT  __gmon_start__ 
> + 0
> 00600a38  00080006 R_X86_64_GLOB_DAT  a + 0
> [hjl@gnu-6 pr18458]$ gdb main
> GNU gdb (GDB) Fedora 7.7.1-21.fc20
> Copyright (C) 2014 Free Software Foundation, Inc.
> License GPLv3+: GNU GPL version 3 or later 
> This is free software: you are free to change and redistribute it.
> There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
> and "show warranty" for details.
> This GDB was configured as "x86_64-redhat-linux-gnu".
> Type "show configuration" for configuration details.
> For bug reporting instructions, please see:
> .
> Find the GDB manual and other documentation resources online at:
> .
> For help, type "help".
> Type "apropos word" to search for commands related to "word"...
> Reading symbols from main...done.
> (gdb) r
> Starting program: /export/home/hjl/bugs/binutils/pr18458/main
> PASS
> [Inferior 1 (process 10663) exited normally]
> Missing separate debuginfos, use: debuginfo-install 
> glibc-2.18-19.2.fc20.x86_64
> (gdb) b b
> Breakpoint 1 at 0x77bf75f0: file b.c, line 5.
> (gdb) r
> Starting program: /export/home/hjl/bugs/binutils/pr18458/main
>
> Breakpoint 1, b () at b.c:5
> 5  a();
> (gdb) si
> a () at a.c:5
> 5  printf("PASS\n");
> (gdb)
>

I built GCC with -fno-plt on hjl/no-plt branch with binutils users/hjl/relax
branch.  I got

[hjl@gnu-mic-2 gcc]$ objdump -dw cc1plus | grep addr32 | wc -l
204864
[hjl@gnu

s390: SImode pointers vs LR

2015-05-29 Thread DJ Delorie

In config/s390/s390.c we accept addresses that are SImode:

  if (!REG_P (base)
      || (GET_MODE (base) != SImode
          && GET_MODE (base) != Pmode))
    return false;

However, there doesn't seem to be anything in the s390's opcodes that
masks the top half of address registers in 64-bit mode; the SImode
convention seems to be just a convention for addresses in the first
4GB.

So... what happens if gcc uses a subreg to load the lower half of a
register (via LR), leaving the upper half with random bits in it, then
uses that register as an address?  I could see no code that checked
for this, and I have a fairly large and ungainly test case that says
it breaks :-(
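
The failure mode can be modelled in a few lines of C. This sketch assumes the usual z/Architecture behaviour, where LR replaces only the low 32 bits of the target register and LLGFR zero-extends a 32-bit value to 64 bits. The function names s390_lr and s390_llgfr are made up for illustration.

```c
#include <stdint.h>

/* Model of "LR dst,src": the low 32 bits of dst are replaced by the low
   32 bits of src, and the high 32 bits of dst are left unchanged.  */
uint64_t s390_lr(uint64_t dst, uint64_t src)
{
  return (dst & UINT64_C(0xFFFFFFFF00000000)) | (src & UINT64_C(0xFFFFFFFF));
}

/* Model of "LLGFR dst,src": the low 32 bits of src, zero-extended.  */
uint64_t s390_llgfr(uint64_t src)
{
  return src & UINT64_C(0xFFFFFFFF);
}
```

If a register holding stale upper bits receives a 32-bit pointer via LR and is then used as a 64-bit base address, the effective address keeps the stale upper half. An explicit SI->DI zero extension removes it.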

My local solution was to just disallow "SImode" as an address in
s390_decompose_address, which forces gcc to do an explicit SI->DI
conversion to clear the upper bits, and it seems to work, but I wonder
if it's the ideal solution...


Re: Better info for combine results in worse code generated

2015-05-29 Thread Alan Modra
On Fri, May 29, 2015 at 10:00:04AM -0500, Segher Boessenkool wrote:
> On Fri, May 29, 2015 at 11:20:08PM +0930, Alan Modra wrote:
> > On Fri, May 29, 2015 at 07:58:38AM -0500, Segher Boessenkool wrote:
> > > On Fri, May 29, 2015 at 12:41:20PM +0930, Alan Modra wrote:
> > > > +/* Describe how rtl operations on registers behave on this target when
> > > > +   operating on less than the entire register.  */
> > > > +#define EXTEND_OP(OP) \
> > > > +  (GET_MODE (OP) != SImode \
> > > > +   || !TARGET_POWERPC64\
> > > > +   ? UNKNOWN   \
> > > > +   : (GET_CODE (OP) == AND \
> > > > +  || GET_CODE (OP) == ZERO_EXTEND  \
> > > > +  || GET_CODE (OP) == ASHIFT   \
> > > > +  || GET_CODE (OP) == ROTATE   \
> > > > +  || GET_CODE (OP) == LSHIFTRT)\
> > > > +   ? ZERO_EXTEND   \
> > > > +   : (GET_CODE (OP) == SIGN_EXTEND \
> > > > +  || GET_CODE (OP) == ASHIFTRT)\
> > > > +   ? SIGN_EXTEND   \
> > > > +   : UNKNOWN)
> > > 
> > > I think this is too simplistic though.  For example, AND with -7 is not
> > > zero-extended (rlwinm rD,rA,0,31,28 sets the high 32 bits of rD to the low
> > > 32 bits of rA).
> > 
> > We take some pains in rs6000.md to ensure that the wrap-around case
> > for rlwinm does not occur for TARGET_POWERPC64.
> 
> I consider that a bug; it pessimises code.

At the time I added the checks for wrap-around, I recall that gcc
generated wrong code without the fix.

> > You'll find that an
> > SImode AND with any value is in fact zero extending.
> 
> int f(int x) { return x & 0xc000; }
> 
> is a counter-example with current trunk (it does a rldicr).

Huh, that does look like you've destroyed my claim about SImode AND.

-- 
Alan Modra
Australia Development Lab, IBM


Re: Identifying Chain of Recurrence

2015-05-29 Thread Bin.Cheng
On Fri, May 29, 2015 at 11:14 PM, Pritam Gharat
 wrote:
> I am writing a Simple IPA Pass which is inserted after ipa-pta. This
> pass identifies whether a chain of recurrence is built by gcc or not.
> I have used tree_is_chrec(t) to check if t represents chain of
> recurrence and function find_var_scev_info(bb, var)  to obtain the
> scev information generated (which comprises of chrec information).
> However, find_var_scev_info() is a static function and cannot be
> called from my plugin.
find_var_scev_info is for internal use only.   I think the proper
interface is simple_iv.  You may refer to other users of that
interface in GCC.
Also, please don't top-reply the message.

Thanks,
bin
>
> Is there any alternative way to get the chain of recurrence
> information? Or do we need to change the gcc source code by making the
> function find_var_scev_info() non-static and recompiling the source
> code?


>
> Thanks,
> Pritam Gharat
>
> On Fri, May 29, 2015 at 10:34 AM, Bin.Cheng  wrote:
>> On Fri, May 29, 2015 at 12:41 PM, Pritam Gharat
>>  wrote:
>>> GCC builds a chain of recurrence to capture a pattern in which an
>>> array is accessed in a loop. Is there any function which identifies
>>> that gcc has built a chain of recurrence? Is this information
>>> associated to the gimple assignment which accesses the array elements?
>>>
>> GCC analyzes evolution of scalar variables on SSA representation.
>> Each ssa var is treated as unique and can be analyzed.  If the address
>> expression itself is a ssa var, it can be analyzed by scev; otherwise,
>> the users of scev have to compute by themselves on the bases of other
>> scev vars.  For example, address of MEM[scev_var] can be analyzed by
>> scev; while address of MEM[invariant_base+scev_iv] is computed by
>> users.  Well, at least this is how IVOPT works.  You can refer to
>> top-file comment in tree-scalar-evolution.c for detailed information.
>> General routines for chain of recurrence is in file tree-chrec.c.
>>
>> Thanks,
>> bin
>>>
>>> Thanks,
>>> Pritam Gharat


Re: s390: SImode pointers vs LR

2015-05-29 Thread Jeff Law

On 05/29/2015 06:57 PM, DJ Delorie wrote:

> In config/s390/s390.c we accept addresses that are SImode:
>
>   if (!REG_P (base)
>       || (GET_MODE (base) != SImode
>           && GET_MODE (base) != Pmode))
>     return false;
>
> However, there doesn't seem to be anything in the s390's opcodes that
> masks the top half of address registers in 64-bit mode, the SImode
> convention seems to just be a convention for addresses in the first
> 4Gb.
>
> So... what happens if gcc uses a subreg to load the lower half of a
> register (via LR), leaving the upper half with random bits in it, then
> uses that register as an address?  I could see no code that checked
> for this, and I have a fairly large and ungainly test case that says
> it breaks :-(
>
> My local solution was to just disallow "SImode" as an address in
> s390_decompose_address, which forces gcc to do an explicit SI->DI
> conversion to clear the upper bits, and it seems to work, but I wonder
> if it's the ideal solution...

Looks wrong to me -- but the s390 maintainers ought to chime in.

jeff