Re: Function attribute((optimize(...))) ignored on inline functions?

2015-07-31 Thread Richard Biener
On Thu, Jul 30, 2015 at 6:40 PM, Matt Turner  wrote:
> I'd like to tell gcc that it's okay to inline functions (such as
> rintf(), to get the SSE4.1 roundss instruction) at particular call
> sights without compiling the entire source file or calling function
> with different CFLAGS.
>
> I attempted this by making inline wrapper functions annotated with
> attribute((optimize(...))), but it appears that the annotation does
> not apply to inline functions? Take for example, ex.c:
>
> #include 
>
> static inline float __attribute__((optimize("-fno-trapping-math")))
> rintf_wrapper_inline(float x)
> {
>return rintf(x);
> }
>
> float
> rintf_wrapper_inline_call(float x)
> {
>return rintf(x);
> }
>
> float __attribute__((optimize("-fno-trapping-math")))
> rintf_wrapper(float x)
> {
>return rintf(x);
> }
>
> % gcc -O2 -msse4.1 -c ex.c
> % objdump -d ex.o
>
> ex.o: file format elf64-x86-64
>
>
> Disassembly of section .text:
>
>  :
>0: e9 00 00 00 00   jmpq   5 
>5: 66 66 2e 0f 1f 84 00 data32 nopw %cs:0x0(%rax,%rax,1)
>c: 00 00 00 00
>
> 0010 :
>   10: 66 0f 3a 0a c0 04 roundss $0x4,%xmm0,%xmm0
>   16: c3   retq
>
> whereas I expected that rintf_wrapper_inline_call would be the same as
> rintf_wrapper.
>
> I've read that per-function optimization is broken [1]. Is this still
> the case? Is there a way to accomplish what I want?

Not in this way.  Once rintf would be inlined the no-trapping-math flag
would be gone.

The only way is to use SSE intrinsics directly here or have the optimized
variant not inlined.

Richard.

> [1] https://gcc.gnu.org/ml/gcc/2012-07/msg00201.html


Re: making the new if-converter not mangle IR that is already vectorizer-friendly

2015-07-31 Thread Richard Biener
On Wed, Jul 29, 2015 at 6:15 PM, Abe  wrote:
> [Richard wrote:]
>>
>> Yes, the GIMPLE if-converter should strictly be a vectorization enabler.
>> If the "new" if-converter produces code that the vectorizer does not
>> handle (on the target targeted) then it has done a bad job.
>
>
> Understood [or at least I _think_ I did ;-)] and agreed.
>
> OTOH, there _are_ things that apparently the vectorizer
> _could_ do better, esp. handling scatter/gather.
> That improvement might be pending new vector
> instructions in the target ISA[s], though...
>
> ... YMMV.  ;-)
>
>
> [Richard wrote:]
>>
>> I think I fixed this bug.  It was because of an inverted test.
>> 2015-07-10  Richard Biener  
>>  PR tree-optimization/66823
>>  * tree-if-conv.c (memrefs_read_or_written_unconditionally):
>>Fix inverted predicate.
>
>
> I can confirm that as of that just after that commit, the bug is gone.
>
> If I understand correctly, the bug was in the analysis leading to
> the decision to convert or not, such that the fix makes the old
> [pre-scratchpad] converter decide "don`t convert".  Is that correct?

Yes.

> Would you like me to include the DejaGNUified form of the related new test
> case
> in the patch even though this bug no longer exists in the "old" converter?
> Maybe it`s nice to keep the test for catching possible future regressions.

Yes.

>
> [Richard wrote:]
>>
>> It was known to produce store data-races with
>
>> -ftree-loop-if-convert-stores.
>>
>> That is something the scratchpad use avoids
>> (at the expense of a lot(?) of vectorization).
>
>
> TTBOMK, that "expense" is temporary, pending fixes/improvement
> to the new converter.  IOW, I don`t think any of the relevant
> regressions are as a result of "essential complexity".

I don't see how you can fix this without scatter support in the CPU.

> [Richard wrote:]
>>
>> First of all to fix regressions on the load if-conversion side
>
>> you should stop using a scratchpad for loads (do you have one?
>> I seem to remember you do from the design description).
>
> Yes, we do have one.
>
> I respectfully disagree with the suggestion to remove all scratchpad use for
> loads.
> That use is what allows us to if-convert -- and therefor vectorize --
> expressions that
> may result in invalid pointer dereferences for "condition is false"
> iterations/columns.
> AFAIK the old converter -- due to lack of a scratchpad -- must give up and
> not
> convert such cases, thus resulting in code of that kind not getting
> vectorized.

Correct.  But then you should restrict the if-conversion to targets that
can do gather vectorization.

> As of now, the relevant "improvement" in/due-to the new converter is not
> always
> effective, i.e. it does not always result in a vectorization, due to the
> current implementation of the scratchpad mechanism being too generic and
> adding indirection that confuses the vectorizer.  We intend to fix this.

This should be fixed before the patch can go in.

>
> [Richard wrote:]
>>
>> Then you should IMHO keep the old store path
>
>> for --param allow-store-data-races=1.
>
> I have discussed this with Sebastian;
> he and I formed a plan to allow the following modes:
>
>   * no if conversion [note: %%]
>   * if-conversion of loadsonly, the old way
>   * if-conversion of   stores only, the old way
>   * if-conversion of loads and stores,  the old way
>   * if-conversion of loadsonly, with scratchpads
>   * if-conversion of   stores only, with scratchpads
>   * if-conversion of loads and stores,  with scratchpads
>
> "%%": to retain as the default in trunk GCC until we have
>   enough improvement, e.g. a cost model to decide whether
>   or not it`s good to if-convert a particular loop

There is the existing mechanism to if-convert only a copy
of the loop guarded with IFN_LOOP_VECOTRIZED so the
cost model is that of the vectorizer when it tries to vectorize
the if-converted copy.  If it doesn't vectorize that copy the
non-if-converted loop is preserved, otherwise it is deleted.
I've argued in the past that the GIMPLE if-converter should
always go that path (thus strictly be a vectorization enabler only).

> Once all the known regressions in the new converter have been
> fixed and the numbers [e.g. from SPEC] of loops vectorized
> looks good, I expect/hope that it will be possible to
> remove the old if converter in favor of the new one.
>
> I expect/hope that we will {enable {if conversion} by default
> when autovectorization is enabled} once we have a cost model that
> prevents {if conversion that is inadvisable due to it promoting
> from conditional to unconditional the evaluation of [pure]
> expression[s] that is/are too expensive, thereby making the
> conversion a "pessimization" instead of a true optimization}.

if-conversion is already on by default if vectorizing.  We only
do not introduce store-data-races by default.

>
> [Richard wrote:]
>>
>> I'd like to see a compariso

Re: ira.c update_equiv_regs patch causes gcc/testsuite/gcc.target/arm/pr43920-2.c regression

2015-07-31 Thread Alex Velenko

On 29/07/15 23:14, Jeff Law wrote:

On 07/28/2015 12:18 PM, Alex Velenko wrote:

On 21/04/15 06:27, Jeff Law wrote:

On 04/20/2015 01:09 AM, Shiva Chen wrote:

Hi, Jeff

Thanks for your advice.

can_replace_by.patch is the new patch to handle both cases.

pr43920-2.c.244r.jump2.ori is the original  jump2 rtl dump

pr43920-2.c.244r.jump2.patch_can_replace_by is the jump2 rtl dump
after patch  can_replace_by.patch

Could you help me to review the patch?

Thanks.  This looks pretty good.

I expanded the comment for the new function a bit and renamed the
function in an effort to clarify its purpose.  From reviewing
can_replace_by, it seems it should have been handling this case, but
clearly wasn't due to implementation details.

I then bootstrapped and regression tested the patch on x86_64-linux-gnu
where it passed.  I also instrumented that compiler to see how often
this code triggers.  During a bootstrap it triggers a couple hundred
times (which is obviously a proxy for cross jumping improvements).  So
it's triggering regularly on x86_64, which is good.

I also verified that this fixes BZ64916 for an arm-non-eabi toolchain
configured with --with-arch=armv7.

Installed on the trunk.  No new testcase as it's covered by existing
tests.

Thanks,,
jeff



Hi,
I see this patch been committed in r56 on trunk. Is it okay to port
this to fsf-5?

It's not a regression, so backporting it would be generally frowned
upon.  If you feel strongly about it, you should ask Jakub, Joseph or
Richi (the release managers) for an exception to the general policy.

jeff


Hi Jakub,
Can this commit be ported to fsf-5? It fixed gcc.target/arm/pr43920-2.c
at the time, so I think it is a good idea to port. Please, see
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64916
Kind regards,
Alex



Nathan Sidwell appointed nvptx maintainer

2015-07-31 Thread David Edelsohn
I am pleased to announce that the GCC Steering Committee has
appointed Nathan Sidwell as nvptx maintainer.

Please join me in congratulating Nathan on his new role.
Nathan, please update your listing in the MAINTAINERS file.

Happy hacking!
David



Re: Nathan Sidwell appointed nvptx maintainer

2015-07-31 Thread Nathan Sidwell

On 07/31/15 13:37, David Edelsohn wrote:

I am pleased to announce that the GCC Steering Committee has
appointed Nathan Sidwell as nvptx maintainer.

Please join me in congratulating Nathan on his new role.
Nathan, please update your listing in the MAINTAINERS file.


thanks!

nathan



Re: making the new if-converter not mangle IR that is already vectorizer-friendly

2015-07-31 Thread Abe

[Abe wrote:]

TTBOMK, that "expense" is temporary, pending fixes/improvement
to the new converter.  IOW, I don`t think any of the relevant
regressions are as a result of "essential complexity".


[Richard wrote:]

I don't see how you can fix this without scatter support in the CPU.


Well, for some test cases, I can see in the GIMPLE dumps that
a/the scratchpad wasn`t needed here and wasn`t needed there.
IOW, the converter that "knows how" to indirect via scratchpads
is doing so _always_ rather than only when it`s truly needed.
This is causing additional indirection which the vectorizer
rejects.  Making the new converter "smarter" should fix this.

Overall, however, I see your point and agree with it.  Gather support
is needed for the loads and scatter for the stores, in the general case.


[Richard wrote:]

But then you should restrict the if-conversion
to targets that can do gather vectorization.


I think adding this requirement -- to generalize, _two_ req.s:
_gather_ support for _load_ if conversion with [a] scratchpad[s],
and scatter support for _store_ if conversion with [a] scratchpad[s] --
is a reasonable and probably wise goal.  Doing so should eventually
enable us to conclude automatically "it is safe to enable this kind of
if conversion for/on this target" and enable the relevant transformations
automatically when given e.g. "-O3" and autovectorization.


[Abe wrote]

As of now, the relevant "improvement" in/due-to the new
converter is not always effective [...] We intend to fix this.


[Richard wrote:]

This should be fixed before the patch can go in.


The current plan is to add new switches that control which [if any]
if converter is active, and to have the old one _and_ the new one
be included in GCC trunk for the time being.  Then I/we can have
more eyes on the new code, helping with bugs etc., without having
any impact at all on code that only either is compiled with
the old flags [which shall retain their old semantics
for now] or with no relevant flags whatsoever.

Is that plan OK?


[Richard wrote:]

There is the existing mechanism to if-convert only a copy
of the loop guarded with IFN_LOOP_VECOTRIZED so the
cost model is that of the vectorizer when it tries to vectorize
the if-converted copy.  If it doesn't vectorize that copy the
non-if-converted loop is preserved, otherwise it is deleted.


I think I found the code to which you are referring here,
and I tried to use it [so as to not reduplicate efforts],
but my first effort at doing so was unsuccessful AFAIK.
Trying again -- this time with help from Jakub,
I hope -- is on my work agenda.


[Richard wrote:]

I've argued in the past that the GIMPLE if-converter should
always go that path (thus strictly be a vectorization enabler only).


If I understand correctly what you mean by that, then I agree.



if-conversion is already on by default if vectorizing.

> We only do not introduce store-data-races by default.

Thanks.

Do you know if the e.g. GCC 5.1 [and 5.2, if bug fix not included]
release[s] were "safe" even without your recent bug fix [July 10]
for code _not_ compiled with "-ftree-loop-if-convert-stores"?
I ask b/c the test case I wrote which was able to cause GCC to
generate crashing code [for a correct-under-low-optimization test
case] intentionally had a crash due to dereferencing a null pointer
in a _load_.  IIRC, getting GCC to generate that bad code required
"-ftree-loop-if-convert-stores" despite it being a faulty _load_.

FYI: I have made some progress on the SPEC testing, but I do not
yet have the work at a point where I`m ready to report numbers.


Regards,

Abe


Bin Cheng as Loop Induction Variable Optimizations maintainer

2015-07-31 Thread Jeff Law
I am pleased to announce that the GCC Steering Committee has appointed 
Bin Cheng as the IVopts maintainer.


Please join me in congratulating Bin on his new role.

Bin, please update your entry in the MAINTAINERS file.  I also believe 
you have some patches to self-approve :-)


Thanks,

Jeff