[GSoC] the last code review

2014-08-18 Thread Roman Gareev
Dear gcc contributors,

Removing the CLooG library installation dependency is almost
finished. The code review of these patches
(https://gcc.gnu.org/ml/gcc-patches/2014-08/msg01564.html) is the only
thing holding it up. Could you please review them? My mentor has
already accepted them, but we still need a non-graphite reviewer to
OK the changes
(https://gcc.gnu.org/ml/gcc-patches/2014-08/msg01655.html). I will
not be able to commit this patch after “18 August: 19:00 UTC” because
of GSoC’s 'pencils down'. We would be very glad to have your comments.

-- 
Cheers, Roman Gareev.


Re: Conditional negation elimination in tree-ssa-phiopt.c

2014-08-18 Thread Richard Earnshaw
On 14/08/14 09:45, Kyrill Tkachov wrote:
> 
> On 13/08/14 18:32, Segher Boessenkool wrote:
>> On Wed, Aug 13, 2014 at 03:57:31PM +0100, Richard Earnshaw wrote:
>>> The problem with the frankenmonster patterns is that they tend to
>>> proliferate into the machine description, and before you know where you
>>> are the back-end is full of them.
>>>
>>> Furthermore, they are very sensitive to the greedy first-match nature of
>>> combine: a better, later, combination is missed because a less good,
>>> earlier, optimization matched.  If the first insn in the sequence is
>>> merged into an earlier instruction then you can end up with a junk
>>> sequence that completely fails to simplify.  That ends up with
>>> super-frankenmonster patterns to deal with all the subcases and the
>>> problems grow exponentially from there.
>> Right.  Of course, combine should be fixed, yadda yadda.
>>
>>> I really do think that the best solution would be to try and catch this
>>> during expand if possible and generate the right pattern from the start;
>>> then you don't risk combine failing to come to the rescue after several
>>> intermediate transformations have taken place.
>> I think ssa-phiopt should simply not do this obfuscation at all.  Without
>> it, RTL ifcvt picks it up just fine on targets with conditional assignment
>> instructions.  I agree on targets without expand should do a better job
>> (also for more generic conditional assignment).
> 
> That particular transformation was added to tree-ssa-phiopt.c for PR 
> 45685, the problem it was trying to solve was a missed vectorisation 
> opportunity and transforming it made it into straightline code that was 
> more amenable to vectorisation, that's why I'm somewhat reluctant to 
> completely disable it.
> 
> Hmm... I noticed in the midend we guard some optimisations on 
> HAVE_conditional_move. Maybe we can guard this one on something like 
> !HAVE_conditional_negation ?
> 

Can't we just guard it on HAVE_conditional_move?  With such an
instruction expand would then generate

t1 = -a
r =  ? b : t1

and combine will do the rest.
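
For readers following along, here is a minimal sketch (not taken from GCC sources) of the two source-level shapes under discussion: a conditional negation written with a select, and the branch-free straight-line form that the phiopt transformation is understood to produce. The function names are made up for illustration, and two's complement integers are assumed.

```cpp
#include <cassert>

// Select form: negate `a` only when `cond` is true.  On a target with a
// conditional move this maps naturally onto "t1 = -a; r = cond ? t1 : a".
int cond_neg_select(int a, bool cond) {
  return cond ? -a : a;
}

// Straight-line form: (a ^ -c) + c with c in {0, 1}.
//   c == 0: (a ^  0) + 0 == a
//   c == 1: (a ^ -1) + 1 == ~a + 1 == -a
// Hypothetical helper name; like plain negation, it overflows for INT_MIN.
int cond_neg_straight(int a, bool cond) {
  int c = cond ? 1 : 0;
  return (a ^ -c) + c;
}
```

Both functions compute the same value; the second has no control flow, which is what makes it friendlier to the vectorizer and less convenient for a back end that would rather see the select.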

R.

> Kyrill
> 
>>
>> Instruction selection belongs in RTL land.
>>
>>
>> Segher
>>
> 
> 




Re: Is escaping of a temp variable valid?

2014-08-18 Thread Joey Ye
On Fri, Aug 15, 2014 at 6:33 PM, Richard Biener
 wrote:
> On Fri, Aug 15, 2014 at 10:45 AM, Joey Ye  wrote:
>> Running into an unexpected result with GCC with following case, but
>> not sure if it is a valid C++ case.
>>
>> #define nullptr 0
>> enum nonetype { none };
>>
>> template
>> class class_zoo {
>>   public:
>> const T *data;
>> int length;
>>
>> class_zoo (nonetype) : data (nullptr), length (0) {}
>> class_zoo (const T &e) : data (&e), length (1) {}
>
> Capturing a const reference via a pointer is error-prone as for
> example literal constants class_zoo zoo(0)
> have associated objects that live only throughout the function
> call.
Thanks for confirming this. But are you implying that capturing a non-const
reference via a pointer is safe? I would assume it is unsafe
as well.

- Joey
>
> So clearly your testcase is invalid.
>
> Richard.


Re: Conditional negation elimination in tree-ssa-phiopt.c

2014-08-18 Thread Kyrill Tkachov


On 18/08/14 10:19, Richard Earnshaw wrote:
> On 14/08/14 09:45, Kyrill Tkachov wrote:
>>
>> On 13/08/14 18:32, Segher Boessenkool wrote:
>>> On Wed, Aug 13, 2014 at 03:57:31PM +0100, Richard Earnshaw wrote:
>>>> The problem with the frankenmonster patterns is that they tend to
>>>> proliferate into the machine description, and before you know where you
>>>> are the back-end is full of them.
>>>>
>>>> Furthermore, they are very sensitive to the greedy first-match nature of
>>>> combine: a better, later, combination is missed because a less good,
>>>> earlier, optimization matched.  If the first insn in the sequence is
>>>> merged into an earlier instruction then you can end up with a junk
>>>> sequence that completely fails to simplify.  That ends up with
>>>> super-frankenmonster patterns to deal with all the subcases and the
>>>> problems grow exponentially from there.
>>> Right.  Of course, combine should be fixed, yadda yadda.
>>>
>>>> I really do think that the best solution would be to try and catch this
>>>> during expand if possible and generate the right pattern from the start;
>>>> then you don't risk combine failing to come to the rescue after several
>>>> intermediate transformations have taken place.
>>> I think ssa-phiopt should simply not do this obfuscation at all.  Without
>>> it, RTL ifcvt picks it up just fine on targets with conditional assignment
>>> instructions.  I agree on targets without expand should do a better job
>>> (also for more generic conditional assignment).
>>
>> That particular transformation was added to tree-ssa-phiopt.c for PR
>> 45685, the problem it was trying to solve was a missed vectorisation
>> opportunity and transforming it made it into straightline code that was
>> more amenable to vectorisation, that's why I'm somewhat reluctant to
>> completely disable it.
>>
>> Hmm... I noticed in the midend we guard some optimisations on
>> HAVE_conditional_move. Maybe we can guard this one on something like
>> !HAVE_conditional_negation ?
>>
>
> Can't we just guard it on HAVE_conditional_move?  With such an
> instruction expand would then generate
>
> t1 = -a
> r =  ? b : t1
>
> and combine will do the rest.

That was my first idea, but then it disables this transformation for
x86, for which it was added specifically to solve PR45685...

Kyrill

>
> R.
>
>> Kyrill
>>
>>>
>>> Instruction selection belongs in RTL land.
>>>
>>>
>>> Segher
>>>






Re: RFD: selective linking of floating point support for *printf / *scanf

2014-08-18 Thread Joey Ye
Joern, there is https://sourceware.org/ml/newlib/2014/msg00166.html,
which is already in newlib mainline. I think it solves the same issue
with a slightly different approach.

Does it work for you?

Thanks,
Joey

On Thu, Aug 14, 2014 at 4:52 PM, Joern Rennecke
 wrote:
> For embedded targets with small memories and static linking, the size of
> functions like *printf and their dependencies is painful, particularly for
> targets that need software floating point.
>
> avr-libc has long had a printf / scanf implementation that by default does not
> include floating point support.  There's a library that can be linked to
> provide the floating-point enabled functions, but the required functions have
> to be pulled in manually with -Wl,-u if they are otherwise only referenced
> from libc, lest these symbols get resolved with the integer-only
> implementations from libc itself.
> All in all, a rather unsatisfying state of affairs when trying to run the
> gcc regression test suite.
>
> Newlib also has an integer-only printf implementation, but in this case,
> the default is the other way round - you have to use functions with
> nonstandard names to use the integer-only implementations.  And a half-hearted
> approach to using this can easily end up linking in both the integer-only
> version and the floating-point enabled one, resulting in increased executable
> size instead of a saving.
>
> I think we can do better with integrated compiler/linker support.
> Trying to do a perfect job is of course impossible because of Rice's theorem,
> but it doesn't have to be perfect to be useful.
> Just looking statically at each *printf statement, we can look at the format
> strings and/or the passed arguments.
> Floating point arguments are easier to check for by the compiler than parsing
> the format string.  There is already code that parses the format strings for
> the purpose of warnings, but it would be a somewhat intrusive change to add
> this functionality there, and the information is not available where a
> variable format is used anyway.
> In a standards-conformant application, floating point formats can only be used
> with floating point arguments, so checking for the latter seems most
> effective.
>
> So my idea is to make the compiler emit special calls when there are no
> floating point arguments.  A library that provides the floating point enabled
> *printf/*scanf precedes libc in link order.
> Libc contains the integer-only implementations of *scanf/*printf, in two
> parts: entry points with the special function name, which in the same object
> file also contain a reference to the ordinary function name, and another
> object file with the ordinary symbol and the integer-only implementation.
> Thus, if any application translation unit has pulled in a floating-point
> enabled implementation, this is the one that'll be used.  Otherwise, the
> integer-only one will be used.
> Use of special sections and alphasorting of these in the linker script
> ensures that the integer-only entry points appear in the right place at
> the start of the chosen implementation.
> If vfprintf is used
>
> I've implemented this for AVR with these commits:
> https://github.com/embecosm/avr-gcc/commit/3b3bfe33fe29b6d29d8fb96e5d57ee025adf7af0
> https://github.com/embecosm/avr-libc/commit/c55eba74838635613c8b80d86a85ed605a79d337
> https://github.com/embecosm/avr-binutils-gdb/commit/72b3a1ea3659577198838a7149c6882a079da403
>
> Although it could use some more testing, and thought on how best to
> introduce the change so as to avoid getting broken toolchains when components
> are out-of-sync.
>
> Now Joerg Wunsch suggested we might want to factor out more pieces, like the
> long long support.  This quickly leads to a combinatorial explosion.
> If we want to support a more modular *printf / *scanf, then maybe a different
> approach is warranted.
> Say, if we could give a symbol and section attribute and/or pragma to
> individual case labels of a switch, and put the different pieces into separate
> object files (maybe with a bit of objcopy massaging).
> The symbol references to trigger the inclusion of the case objects could be
> generated by the gcc backend by processing suitably annotated function calls.
> E.g. we might put something into CALL_FUNCTION_USAGE, or play with
> TARGET_ENCODE_SECTION_INFO.
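
The argument-based check described in the quoted proposal can be imitated at the source level. What follows is only a sketch under stated assumptions: it uses C++ templates instead of compiler internals, and the names `any_floating`, `pick_printf_variant`, `"printf_int"` and `"printf_float"` are invented for illustration. None of them exist in avr-libc, newlib or GCC.

```cpp
#include <cassert>
#include <cstring>
#include <type_traits>

// True if at least one type in the pack is a floating point type.
template <typename...>
struct any_floating : std::false_type {};

template <typename T, typename... Rest>
struct any_floating<T, Rest...>
    : std::integral_constant<
          bool,
          std::is_floating_point<typename std::decay<T>::type>::value ||
              any_floating<Rest...>::value> {};

// Decide from the argument types alone (the format string is ignored,
// as in the proposal) which hypothetical entry point a call would use.
template <typename... Args>
const char *pick_printf_variant(const char * /*format*/, const Args &...) {
  return any_floating<Args...>::value ? "printf_float" : "printf_int";
}
```

A call such as `pick_printf_variant("%d %s", 1, "x")` selects the integer-only name, while passing any `float` or `double` selects the floating-point one. In the scheme described above the equivalent decision is made by the compiler, and the linker then resolves the special name against whichever implementation was pulled in.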


Re: LTO inhibiting dwarf lexical blocks output

2014-08-18 Thread Richard Biener
On Fri, Aug 15, 2014 at 9:59 PM, Aldy Hernandez  wrote:
> So... I've been getting my feet wet with LTO and debugging and I noticed a
> seemingly unrelated yet annoying problem.  On x86-64,
> gcc.dg/guality/pr48437.c fails when run in LTO mode.
>
> I've compared the dwarf output with and without LTO, and I noticed that the
> DW_TAG_lexical_block is missing from the LTO case.
>
> The relevant bit is that without LTO, we have a DW_TAG_lexical_block for
> lines 3-6, which is not present in the LTO case:
>
> 1  volatile int i;
> 2  for (i = 3; i < 7; ++i)
> 3{
> 4  extern int i;
> 5  asm volatile (NOP : : : "memory");
> 6}
>
> The reason this tag is not generated is because gen_block_die() unsets
> must_output_die because there are no BLOCK_VARS associated with the BLOCK.
>
> must_output_die = ((BLOCK_VARS (stmt) != NULL
>                     || BLOCK_NUM_NONLOCALIZED_VARS (stmt))
>                    && (TREE_USED (stmt)
>                        || TREE_ASM_WRITTEN (stmt)
>                        || BLOCK_ABSTRACT (stmt)));
>
> And there is no block var because the streamer purposely avoided streaming
> an extern block var:
>
>   /* We avoid outputting external vars or functions by reference
>      to the global decls section as we do not want to have them
>      enter decl merging.  This is, of course, only for the call
>      for streaming BLOCK_VARS, but other callers are safe.  */
>   /* ???  FIXME wrt SCC streaming.  Drop these for now.  */
>   if (VAR_OR_FUNCTION_DECL_P (t)
>       && DECL_EXTERNAL (t))
>     ; /* stream_write_tree_shallow_non_ref (ob, t, ref_p); */
>   else
>     stream_write_tree (ob, t, ref_p);
>
> I naively tried to uncomment the offending line, but that brought about
> other problems in DFS assertions.
>
> I wasn't on the hunt for this, but I'm now curious.  Can you (or anyone
> else) pontificate on this? Do we avoid streaming extern block variables by
> design?

Apart from the other comments about emitting DIEs early: the commented-out
code above tried to "force" 't' to stay out of the global decls table
and remain a local tree, to avoid (as Honza says) merging it with
other entities and thus screwing up DECL_CHAIN.

With the SCC way this didn't work out (you can't simply do
stream_write_tree_shallow_non_ref here for reasons I don't remember).
The ??? comment means I've wanted to come back to this... ;)
"shallow non-ref" means treat 't' as !ref but not the trees it references.
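
To make the connection with the missing DW_TAG_lexical_block explicit, here is a small standalone model of the must_output_die condition quoted above. It is only a sketch: `block_flags` is a hypothetical stand-in for the properties the real macros read from a BLOCK tree, not a GCC type.

```cpp
#include <cassert>

// Hypothetical summary of what the quoted condition inspects on a BLOCK.
struct block_flags {
  bool has_block_vars;         // BLOCK_VARS (stmt) != NULL
  int num_nonlocalized_vars;   // BLOCK_NUM_NONLOCALIZED_VARS (stmt)
  bool used;                   // TREE_USED (stmt)
  bool asm_written;            // TREE_ASM_WRITTEN (stmt)
  bool is_abstract;            // BLOCK_ABSTRACT (stmt)
};

// Same boolean structure as the must_output_die expression in the thread.
bool must_output_die(const block_flags &b) {
  return (b.has_block_vars || b.num_nonlocalized_vars != 0) &&
         (b.used || b.asm_written || b.is_abstract);
}
```

With the `extern int i` declaration present the first half of the condition holds; once the streamer drops that only variable, the block has no vars left and the predicate turns false regardless of the other flags.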

Note that the biggest "hack" wrt lexical scopes is that we don't stream
any abstract origins:

/* Write all pointer fields in the TS_BLOCK structure of EXPR to output
   block OB.  If REF_P is true, write a reference to EXPR's pointer
   fields.  */

static void
write_ts_block_tree_pointers (struct output_block *ob, tree expr, bool ref_p)
{
  streamer_write_chain (ob, BLOCK_VARS (expr), ref_p);

  stream_write_tree (ob, BLOCK_SUPERCONTEXT (expr), ref_p);

  /* Stream BLOCK_ABSTRACT_ORIGIN for the limited cases we can handle - those
     that represent inlined function scopes.
     For the rest drop them on the floor instead of ICEing in dwarf2out.c.  */
  if (inlined_function_outer_scope_p (expr))
    {
      tree ultimate_origin = block_ultimate_origin (expr);
      stream_write_tree (ob, ultimate_origin, ref_p);
    }
  else
    stream_write_tree (ob, NULL_TREE, ref_p);

which makes early inlined functions behave differently (?) in the
debugger with LTO than without (you still get blocks, but they
do not refer to the out-of-line copy by reference but get fully
re-created with DIEs for each inline instance).  But maybe the
abstract origins are only a dwarf-size optimization here.

Richard.

> Thanks.
> Aldy


Re: Is escaping of a temp variable valid?

2014-08-18 Thread Richard Biener
On Mon, Aug 18, 2014 at 11:39 AM, Joey Ye  wrote:
> On Fri, Aug 15, 2014 at 6:33 PM, Richard Biener
>  wrote:
>> On Fri, Aug 15, 2014 at 10:45 AM, Joey Ye  wrote:
>>> Running into an unexpected result with GCC with following case, but
>>> not sure if it is a valid C++ case.
>>>
>>> #define nullptr 0
>>> enum nonetype { none };
>>>
>>> template
>>> class class_zoo {
>>>   public:
>>> const T *data;
>>> int length;
>>>
>>> class_zoo (nonetype) : data (nullptr), length (0) {}
>>> class_zoo (const T &e) : data (&e), length (1) {}
>>
>> Capturing a const reference via a pointer is error-prone as for
>> example literal constants class_zoo zoo(0)
>> have associated objects that live only throughout the function
>> call.
> Thanks for confirming this. But do you imply capturing a non-const
> reference via a pointer is safe, which I would assume it unsafe
> either?

Well, "more" safe at least ;)

Richard.
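
As a sketch of why the non-const variant is "more" safe: a non-const lvalue reference cannot bind to a temporary, so the most obvious misuse no longer compiles. The code below reuses the shape of the test case; the `template <typename T>` header is an assumption (the parameter list was lost in the quoted text), and `strict_zoo` and `read_first` are invented for illustration.

```cpp
#include <cassert>

enum nonetype { none };

template <typename T>
class class_zoo {
public:
  const T *data;
  int length;

  class_zoo(nonetype) : data(nullptr), length(0) {}
  // Accepts temporaries too: class_zoo<int> z(0) stores the address of an
  // object that dies at the end of the constructor call.
  class_zoo(const T &e) : data(&e), length(1) {}
};

// Hypothetical stricter variant: T& only binds to named (lvalue) objects,
// so strict_zoo<int> z(0) is rejected at compile time.
template <typename T>
class strict_zoo {
public:
  const T *data;
  int length;

  explicit strict_zoo(T &e) : data(&e), length(1) {}
};

// Reads the element if there is one, otherwise returns -1.
int read_first(const class_zoo<int> &z) {
  return z.length ? *z.data : -1;
}
```

The stricter variant still dangles if the referenced object goes out of scope before the zoo does, so it narrows the problem rather than removing it.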

> - Joey
>>
>> So clearly your testcase is invalid.
>>
>> Richard.


Re: [GSoC] constant-folding pattern not fired

2014-08-18 Thread Richard Biener
On Sun, Aug 17, 2014 at 9:50 PM, Prathamesh Kulkarni
 wrote:
> Hi,
>Apparently this pattern is not getting fired (even in isolation).
>
> /* x % 1 -> 0 */
> (simplify
>   (trunc_mod @0 integer_onep)
>   { build_zero_cst (type); })
>
> I tried with this test-case:
> int f(int x)
> {
>   int t1 = 1;
>   int t2 = x % t1;
>   return t2;
> }
>
> I get the following output in ccp1 dump file:
> http://pastebin.com/B6HjptkC

It shows (and I also see that):

Visiting statement:
t2_3 = x_2(D) % t1_1;
which is likely CONSTANT
Match-and-simplified x_2(D) % t1_1 to 0
Lattice value changed to CONSTANT 0.  Adding SSA edges to worklist.

so it works as expected?  Or what do you miss?

Note that the function is simplified all the way to return 0; and
intermediate statements are removed.
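
The end result can also be checked from the outside, independently of any dump file. This is just the test function from the thread in a compilable form; whether the folding is done by this particular pattern or by another pass is not visible at this level.

```cpp
#include <cassert>

// Test function from the thread: x % 1 is 0 for every int x, so the
// whole body is expected to fold to "return 0".
int f(int x) {
  int t1 = 1;
  int t2 = x % t1;
  return t2;
}
```

To see which pass did the work, the ccp1 dump can be requested with -fdump-tree-ccp1 and searched for the "Match-and-simplified" line shown above (that message is assumed to be specific to the branch under discussion).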

Thanks,
Richard.

> and the following output is generated in gimple-match.c:
> http://pastebin.com/tmi0cpxv
>
> I guess the generated code appears to be correct for the above pattern,
> so we are not doing anything wrong in genmatch ?
>
> Thanks,
> Prathamesh


Re: What are open tasks about GIMPLE loop optimizations?

2014-08-18 Thread Ilya Palachev

Dear Evgeniya,

Maybe the missed optimizations in the vectorizer will be interesting to you:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53947

It has a lot of open tasks that can strongly influence performance,
but many of them have remained unsolved for years.
For now the gcc vectorizer handles a certain number of patterns, but there are
many that are implemented in icc or llvm and not in gcc.


Best regards.
Ilya



*From:* Evgeniya Maenkova 
*Sent:* Friday, August 15, 2014 4:45PM
*To:* gcc@gcc.gnu.org
*Subject:* What are open tasks about GIMPLE loop optimizations?

Dear GCC Developers,

Nobody has answered my question below, so perhaps something was wrong with my email :)

So let me clarify in more detail what I’m asking about.

I’ve made some very very very basic evaluation of GCC code ([1]) and
started to think about concrete task to contribute to GCC (language
and machine optimization would be interesting to me, in particular,
loop optimization).

I cannot invent this task myself because my knowledge of GCC and
compilers in general is not enough for this.  And even if I could
think out something perhaps GCC developers have their own
understanding of the world.

Then I have looked at GCC site to answer my question. What I could
find about loop optimizations is information from GNU Tools Cauldron
2012, “Status of High level Loop Optimizations”.  So perhaps this is
out-of-date in 2014.

Unfortunately, I do not have much time, so I would not commit to managing
a task which is on the critical path.  (Are you interested only in
full-time developers?)

So it would be great if you could suggest some tasks that could be
useful to gcc in the future, but that nobody will miss if I cannot do
them (as you had no time/people for these tasks anyway :) ).

What do you think?

Thanks,

Evgeniya



[1] Used GDB to look inside GCC. Wrote some notes in my blog which
could be useful to other newbies
(http://perfstories.wordpress.com/2013/11/17/compiler-internals-introduction-to-a-new-post-series/).




-- Forwarded message --
From: Evgeniya Maenkova 
Date: Fri, Aug 8, 2014 at 6:50 PM
Subject: GIMPLE optimization passes: any Road Map?
To: gcc@gcc.gnu.org


Dear GCC Developers!

Could you please clarify about GIMPLE loop passes?

Where could I find the latest changes in these passes? Is it trunk or
some of the branches? May I look at some RoadMap on GIMPLE loop
optimizations?

Actually, I ask these questions because I would like to contribute to
GCC. GIMPLE optimizations would be interesting to me (in particular,
loop optimizations).

However, I’m a newbie at GCC and do not have much time, so I would not
commit to managing a task which is on the critical path.

So it would be great if you could suggest some tasks that could be
useful to gcc in the future, but that nobody will miss if I can’t do
them (as you had no time/people for these tasks anyway :) ).

Thank you!

Evgeniya





Re: What are open tasks about GIMPLE loop optimizations?

2014-08-18 Thread Manuel López-Ibáñez
> *From:* Evgeniya Maenkova 
> *Sent:* Friday, August 15, 2014 4:45PM
> *To:* gcc@gcc.gnu.org
> *Subject:* What are open tasks about GIMPLE loop optimizations?
>
> Dear GCC Developers,
>
> Nobody answers my question below, so perhaps something wrong with my email
> :)
>

Starting as a newbie in GCC requires a lot of self-motivation. The
general answer to your question is to try. If something is wrong or
not what the GCC devs want, don't worry they will tell you.

See also the general advice here on how to interact with the GCC
community: https://gcc.gnu.org/wiki/GCC_Research

I would say your email falls into: "too long", "too general", "not
specific question", "not aimed at anyone in particular". :-)

For newbie tasks, the Summer of Code page has many ideas, some of them
with specific contact persons: https://gcc.gnu.org/wiki/SummerOfCode

See also the links under "Getting Started with GCC Development" at
https://gcc.gnu.org/wiki/

And also https://gcc.gnu.org/wiki/ImprovementProjects

I would suggest starting by fixing bugs in the areas that interest you. If
you search GCC's bugzilla, there must be plenty of bugs about
anything you can imagine. Even if you don't fix a bug, analyzing it would
already be helpful, both for you (to learn how to debug GCC, modify it and
rebuild) and for us (it saves us time).

Once you get enough knowledge, you will also get ideas of what
features are actually missing or could be improved.

Cheers,

Manuel.


Re: [GSoC] constant-folding pattern not fired

2014-08-18 Thread Prathamesh Kulkarni
On Mon, Aug 18, 2014 at 4:37 PM, Richard Biener
 wrote:
> On Sun, Aug 17, 2014 at 9:50 PM, Prathamesh Kulkarni
>  wrote:
>> Hi,
>>Apparently this pattern is not getting fired (even in isolation).
>>
>> /* x % 1 -> 0 */
>> (simplify
>>   (trunc_mod @0 integer_onep)
>>   { build_zero_cst (type); })
>>
>> I tried with this test-case:
>> int f(int x)
>> {
>>   int t1 = 1;
>>   int t2 = x % t1;
>>   return t2;
>> }
>>
>> I get the following output in ccp1 dump file:
>> http://pastebin.com/B6HjptkC
>
> It shows (and I also see that):
>
> Visiting statement:
> t2_3 = x_2(D) % t1_1;
> which is likely CONSTANT
> Match-and-simplified x_2(D) % t1_1 to 0
> Lattice value changed to CONSTANT 0.  Adding SSA edges to worklist.
>
> so it works as expected?  Or what do you miss?
>
> Note that the function is simplified all the way to return 0; and
> intermediate statements are removed.
oops, I got mixed up -:)
sorry for the noise.

This patch adds the test-case for that pattern.
testsuite/
  * match-constant-folding.c: Add new test-case.

Thanks,
Prathamesh

>
> Thanks,
> Richard.
>
>> and the following output is generated in gimple-match.c:
>> http://pastebin.com/tmi0cpxv
>>
>> I guess the generated code appears to be correct for the above pattern,
>> so we are not doing anything wrong in genmatch ?
>>
>> Thanks,
>> Prathamesh
Index: match-constant-folding.c
===
--- match-constant-folding.c	(revision 214095)
+++ match-constant-folding.c	(working copy)
@@ -118,5 +118,13 @@ int c13(int x)
 }
 /* { dg-final { scan-tree-dump "Match-and-simplified x_\\d\+\\(D\\) \\^ t1_\\d\+ to x_\\d\+\\(D\\)" "ccp1" } } */
 
+/* x % 1 -> 0 */
+int c14(int x)
+{
+  int t1 = 1;
+  int c14_val = x % t1;
+  return c14_val;
+}
+/* { dg-final { scan-tree-dump "Match-and-simplified x_\\d\+\\(D\\) % t1_\\d\+ to 0" "ccp1" } } */
 
 /* { dg-final { cleanup-tree-dump "forwprop2" } } */


Re: What are open tasks about GIMPLE loop optimizations?

2014-08-18 Thread Manuel López-Ibáñez
The wiki also contains the following: https://gcc.gnu.org/wiki/LoopOptTasks

Probably very outdated, but updating it might be a helpful learning
experience. Don't be afraid to edit the wiki, we can always revert
your changes ;-)

Cheers,

Manuel.


On 18 August 2014 13:43, Manuel López-Ibáñez  wrote:
>> *From:* Evgeniya Maenkova 
>> *Sent:* Friday, August 15, 2014 4:45PM
>> *To:* gcc@gcc.gnu.org
>> *Subject:* What are open tasks about GIMPLE loop optimizations?
>>
>> Dear GCC Developers,
>>
>> Nobody answers my question below, so perhaps something wrong with my email
>> :)
>>
>
> Starting as a newbie in GCC requires a lot of self-motivation. The
> general answer to your question is to try. If something is wrong or
> not what the GCC devs want, don't worry they will tell you.
>
> See also the general advice here on how to interact with the GCC
> community: https://gcc.gnu.org/wiki/GCC_Research
>
> I would say your email falls into: "too long", "too general", "not
> specific question", "not aimed at anyone in particular". :-)
>
> For newbie tasks, the Summer of Code page has many ideas, some of them
> with specific contact persons: https://gcc.gnu.org/wiki/SummerOfCode
>
> See also the links under "Getting Started with GCC Development" at
> https://gcc.gnu.org/wiki/
>
> And also https://gcc.gnu.org/wiki/ImprovementProjects
>
> I would suggest to start fixing bug in the areas that interest you. If
> you search in GCC's bugzilla, there must be plenty of bugs about
> anything you can imagine. Even if you don't fix it, analyzing it would
> be already helpful for you (to learn how to debug GCC, modify it and
> rebuild) and for us to save us time.
>
> Once you get enough knowledge, you will also get ideas of what
> features are actually missing or could be improved.
>
> Cheers,
>
> Manuel.


Re: [gomp4] openacc kernels directive support

2014-08-18 Thread Tom de Vries

On 06-08-14 17:10, Tom de Vries wrote:
> The place after build_ealias is early enough to be before the lto-stream
> write/read. I don't see how we can do this earlier. Before ealias, there's no
> alias info, and one of the loops fails to be recognized as parallel.
> Furthermore, pass_ch, pass_ccp, pass_lim_aux and pass_parloops are written to
> work on cfg/ssa code, which we don't have at omp_low/omp_exp time.

Slight correction: we do have cfg at omp_exp time.

> We could insert a pass-group here that only deals with functions that have the
> kernels directive, and do the auto-par thing in a pass_oacc_kernels (which
> should share the majority of the infrastructure with the parloops pass):
> ...
>    NEXT_PASS (pass_build_ealias);
>    INSERT_PASSES_AFTER/WITHIN (passes_oacc_kernels)
>       NEXT_PASS (pass_ch);
>       NEXT_PASS (pass_ccp);
>       NEXT_PASS (pass_lim_aux);
>       NEXT_PASS (pass_oacc_par);
>    POP_INSERT_PASSES ()
> ...
>
> Any comments, ideas or suggestions ?


I've experimented with implementing this on top of gomp-4_0-branch, and I ran 
into PR46032.


PR46032 is about vectorization failure on a function split off by omp 
parallelization. The vectorization fails due to aliasing constraints in the 
split off function, which are not present in the original code.


In the gomp-4_0-branch, the code marked by the openacc kernels directive is 
split off during omp_expand. The generated code has the same additional aliasing 
constraints, and in pass_oacc_par the parallelization fails.
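
To illustrate the kind of information that gets lost, here is a source-level sketch of what outlining does to a loop. It is an assumption-laden model, not the code GCC generates: `outlined_data` and `outlined_body` are hypothetical names standing in for the data-sharing record and the split-off function.

```cpp
#include <cassert>

const int N = 8;

// Before outlining: a and b are two distinct local arrays, so it is
// evident from the function itself that the stores to a[] cannot
// modify b[].
int sum_inline() {
  int a[N], b[N];
  for (int i = 0; i < N; ++i) b[i] = 1;
  for (int i = 0; i < N; ++i) a[i] = b[i] + i;
  int s = 0;
  for (int i = 0; i < N; ++i) s += a[i];
  return s;
}

// After outlining (hypothetical shape): the loop body only sees two
// pointers loaded from a record.  Looking at this function alone,
// nothing says that d->a and d->b point to different objects.
struct outlined_data {
  int *a;
  int *b;
  int n;
};

void outlined_body(outlined_data *d) {
  for (int i = 0; i < d->n; ++i) d->a[i] = d->b[i] + i;
}

int sum_outlined() {
  int a[N], b[N];
  for (int i = 0; i < N; ++i) b[i] = 1;
  outlined_data d = {a, b, N};
  outlined_body(&d);
  int s = 0;
  for (int i = 0; i < N; ++i) s += a[i];
  return s;
}
```

Both versions compute the same sum, but proving the loop in `outlined_body` free of dependences needs either interprocedural points-to information or splitting the function only after the analysis has run.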


PR46032 contains a tentative patch by Richard Biener, which applies cleanly
on top of 4.6 (I haven't yet reached a level of understanding of
tree-ssa-structalias.c to be able to resolve the conflict in
intra_create_variable_infos when applying on 4.7). The tentative patch involves
running ipa-pta, which is also a pass run after the point where we write out the
lto stream. I'm not sure whether it makes sense to run the ipa-pta pass as part
of the pass_oacc_kernels pass list.


I see three ways of continuing from here:
- take the tentative patch and make it work, including running ipa-pta during
  passes_oacc_kernels
- same, but try somehow to manage without running ipa-pta.
- try to postpone splitting of the function until the end of pass_oacc_par.

Some advice on how to continue from here would be *highly* appreciated. My hunch 
atm is to investigate the last option.


Thanks,
- Tom



Re: [GSoC] replacing op in c_expr

2014-08-18 Thread Richard Biener
On Sat, Aug 16, 2014 at 3:46 PM, Prathamesh Kulkarni
 wrote:
> On Mon, Aug 11, 2014 at 4:58 PM, Richard Biener
>  wrote:
>> On Sun, Aug 10, 2014 at 11:17 PM, Prathamesh Kulkarni
>>  wrote:
>>> On Mon, Aug 4, 2014 at 2:13 PM, Richard Biener
>>>  wrote:
 On Sun, Aug 3, 2014 at 6:58 PM, Prathamesh Kulkarni
  wrote:
> On Tue, Jul 29, 2014 at 4:29 PM, Richard Biener
>  wrote:
>> On Mon, Jul 28, 2014 at 10:02 PM, Prathamesh Kulkarni
>>  wrote:
>>> I am having few issues replacing op in c_expr.
>>> I thought of following possibilities:
>>>
>>> a) create a new vec vector new_code.
>>> for each token in code
>>>   {
>>> if token.type is not CPP_NAME
>>>   new_code.safe_push (token);
>>> else
>>>  {
>>> cpp_token new_token =
>>> ??? create new token of type CPP_NAME
>>>   with contents as name of operator ???
>>>  }
>>>   }
>>>
>>> I tried to go this way, but am stuck with creating a new token type.
>>> i started by:
>>> cpp_token new_token = token;  // get same attrs as token.
>>> CPP_HASHNODE (new_token.val.node.node)->ident.str = name of operator.
>>> CPP_HASHNODE (new_token.val.node.node)->ident.len = len of operator 
>>> name.
>>> name of operator is obtained from opers[i] in parse_for.
>>>
>>> however this does not work because I guess
>>>  new_token = token, shallow copies
>>> the token (default assignment operator, i didn't find an overloaded 
>>> version).
>>>
>>> b) create new struct c_expr_elem and use
>>> vec code, instead of vec code;
>>>
>>> sth like:
>>> struct c_expr_elem
>>> {
>>>enum c_expr_elem_type { ID, TOKEN };
>>>enum c_expr_elem_type type;
>>>
>>>union {
>>>  cpp_token token;
>>>  const char *id;
>>>};
>>> };
>>>
>>> while replacing op, compare token with op, and if it matches,
>>> create a new c_expr_elem with type = ID, and id = name of operator.
>>> This shall probably work, but shall require many changes to other parts
>>> since we change c_expr::code.
>>>
>>> I would like to hear any other suggestions.
>>
>> Together with the vector of tokens recorded at parse_c_expr time
>> record a vector of token mappings (op -> plus, op2 -> ...) and do
>> the replacement at code-generation time where we also special-case
>> captures.
>>
>> Yeah, it's a but unfortunate that c_expr parsing is done the way it
>> is done
> Thanks. I guess we would require a multi-map for this since there can
> be many operators
> (op -> [plus, minus], op2 -> [negate]) ?

 Well, it would be enough to attach the mapping to c_expr()s after the
 AST lowering when there is at most one?  Because obviously
 code-generation cannot know which to replace.

> Unfortunately, I somehow seem to have missed your response and ended up 
> with a
> hackish way of doing it, although it works. I will soon change that to
> use token mappings.
>
> I mostly followed b), except i made it sub-class of cpp_token, so the
> other code using c_expr::code
> (outline_c_expr, c_expr::gen_transform) did not require changes except
> for special-casing op.

 Indeed not too ugly.  Still at the point where you replace in the for()
 processing

 - operand *result_op = replace_id (s->result, user_id, 
 opers[i]);
 +
 + operand *result_op;
 + if (is_a (s->result))
 +   result_op = replace_op_in_c_expr (s->result, user_id, 
 opers[i]);
 + else
 +   result_op = replace_id (s->result, user_id, opers[i]);
 +
 +

 it should be "easy" to attach a replacemet vector/map to the c_expr
 and use that duing code-generation.

 Note that sub-expressions can also be c_exprs, thus

 (match-and-simplify
()
(plus { ... } @2))

 I don't think your patch covers that.  That is, you should add
 c_expr handing to replace_id instead.
>>> Thanks, this patch covers that case.
>>> For now, I have still kept the old way, since the change was one-liner.
>>> I will change it after I am done with conditional convert
>>
>> I'll wait for that - the patch introduces extra warnings which will break
>> bootstrap.
>>
> Hi,
> This patch replaces op in c_expr, by using vector in c_expr to record
>  mapping.
> Sorry for late response.
>
> I needed to clone c_expr, so added clone member function to operand hierarchy.
> Ideally it should be a pure member function (= 0) in operand, however
> for simplicity I have
> put gcc_unreachable (), since I only want it used for c_expr (not
> required so far for other classes).
> Is that okay for now ? Eventually I will implement clone for other classes...

Hmm, I wonder
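
For reference, the token-mapping idea discussed in the quoted text (keep the recorded tokens untouched, attach an id -> operator mapping, and substitute only when the code is emitted) can be sketched as follows. This is a simplified model with plain strings instead of cpp_token, and `emit_c_expr` is a made-up name, not a genmatch function.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Emit the recorded tokens separated by single spaces, replacing any
// token that has an entry in `ids` by its mapped operator name.
std::string emit_c_expr(const std::vector<std::string> &tokens,
                        const std::map<std::string, std::string> &ids) {
  std::string out;
  for (const std::string &tok : tokens) {
    std::map<std::string, std::string>::const_iterator it = ids.find(tok);
    if (!out.empty())
      out += ' ';
    out += (it != ids.end()) ? it->second : tok;
  }
  return out;
}
```

Because the token vector itself is never modified, the same recorded c_expr can be emitted once per operator of a for() simply by passing a different mapping.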

Re: [GSoC] constant-folding pattern not fired

2014-08-18 Thread Richard Biener
On Mon, Aug 18, 2014 at 1:45 PM, Prathamesh Kulkarni
 wrote:
> On Mon, Aug 18, 2014 at 4:37 PM, Richard Biener
>  wrote:
>> On Sun, Aug 17, 2014 at 9:50 PM, Prathamesh Kulkarni
>>  wrote:
>>> Hi,
>>>Apparently this pattern is not getting fired (even in isolation).
>>>
>>> /* x % 1 -> 0 */
>>> (simplify
>>>   (trunc_mod @0 integer_onep)
>>>   { build_zero_cst (type); })
>>>
>>> I tried with this test-case:
>>> int f(int x)
>>> {
>>>   int t1 = 1;
>>>   int t2 = x % t1;
>>>   return t2;
>>> }
>>>
>>> I get the following output in ccp1 dump file:
>>> http://pastebin.com/B6HjptkC
>>
>> It shows (and I also see that):
>>
>> Visiting statement:
>> t2_3 = x_2(D) % t1_1;
>> which is likely CONSTANT
>> Match-and-simplified x_2(D) % t1_1 to 0
>> Lattice value changed to CONSTANT 0.  Adding SSA edges to worklist.
>>
>> so it works as expected?  Or what do you miss?
>>
>> Note that the function is simplified all the way to return 0; and
>> intermediate statements are removed.
> oops, I got mixed up -:)
> sorry for the noise.
>
> This patch adds the test-case for that pattern.
> testsuite/
>   * match-constant-folding.c: Add new test-case.

Thanks - committed.

Richard.

> Thanks,
> Prathamesh
>
>>
>> Thanks,
>> Richard.
>>
>>> and the following output is generated in gimple-match.c:
>>> http://pastebin.com/tmi0cpxv
>>>
>>> I guess the generated code appears to be correct for the above pattern,
>>> so we are not doing anything wrong in genmatch ?
>>>
>>> Thanks,
>>> Prathamesh
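
For reference, a minimal sketch of why the fold discussed above is valid.
It assumes trunc_mod has the semantics of C's % on integers (quotient
truncated toward zero); the helper name and the Python model are
illustrative only and are not part of match.pd or genmatch.

```python
def trunc_mod(a, b):
    """Remainder with the quotient truncated toward zero (like C's % on ints)."""
    q = abs(a) // abs(b)
    if (a < 0) != (b < 0):
        q = -q          # truncation: the quotient takes the sign of a / b
    return a - b * q


if __name__ == "__main__":
    # x % 1 is 0 for every x, negative values included, so the pattern
    # (trunc_mod @0 integer_onep) can be replaced by a zero constant.
    print(all(trunc_mod(x, 1) == 0 for x in range(-1000, 1001)))
```

With a divisor of 1 the truncated quotient equals x itself, so the
remainder x - 1 * x is always zero, which matches the "Match-and-simplified
... to 0" line in the ccp1 dump.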


Re: LTO inhibiting dwarf lexical blocks output

2014-08-18 Thread Richard Biener
On Mon, Aug 18, 2014 at 12:46 PM, Richard Biener
 wrote:
> On Fri, Aug 15, 2014 at 9:59 PM, Aldy Hernandez  wrote:
>> So... I've been getting my feet wet with LTO and debugging and I noticed a
>> seemingly unrelated yet annoying problem.  On x86-64,
>> gcc.dg/guality/pr48437.c fails when run in LTO mode.
>>
>> I've compared the dwarf output with and without LTO, and I noticed that the
>> DW_TAG_lexical_block is missing from the LTO case.
>>
>> The relevant bit is that without LTO, we have a DW_TAG_lexical_block for
>> lines 3-6, which is not present in the LTO case:
>>
>> 1  volatile int i;
>> 2  for (i = 3; i < 7; ++i)
>> 3{
>> 4  extern int i;
>> 5  asm volatile (NOP : : : "memory");
>> 6}
>>
>> The reason this tag is not generated is because gen_block_die() unsets
>> must_output_die because there are no BLOCK_VARS associated with the BLOCK.
>>
>> must_output_die = ((BLOCK_VARS (stmt) != NULL
>> || BLOCK_NUM_NONLOCALIZED_VARS (stmt))
>>&& (TREE_USED (stmt)
>>|| TREE_ASM_WRITTEN (stmt)
>>|| BLOCK_ABSTRACT (stmt)));
>>
>> And there is no block var because the streamer purposely avoided streaming
>> an extern block var:
>>
>>   /* We avoid outputting external vars or functions by reference
>>  to the global decls section as we do not want to have them
>>  enter decl merging.  This is, of course, only for the call
>>  for streaming BLOCK_VARS, but other callers are safe.  */
>>   /* ???  FIXME wrt SCC streaming.  Drop these for now.  */
>>   if (VAR_OR_FUNCTION_DECL_P (t)
>>   && DECL_EXTERNAL (t))
>> ; /* stream_write_tree_shallow_non_ref (ob, t, ref_p); */
>>   else
>> stream_write_tree (ob, t, ref_p);
>>
>> I naively tried to uncomment the offending line, but that brought about
>> other problems in DFS assertions.
>>
>> I wasn't on the hunt for this, but I'm now curious.  Can you (or anyone
>> else) pontificate on this? Do we avoid streaming extern block variables by
>> design?
>
> Apart from other comments about emitting DIEs early the commented
> code above tried to "force" to not put 't' into the global decls table
> but retain it as local tree to avoid (as Honza says) merging it with
> other entities and thus screwing up DECL_CHAIN.
>
> With the SCC way this didn't work out (you can't simply do
> stream_write_tree_shallow_non_ref here for reasons I don't remember).
> The ??? comment means I've wanted to come back to this... ;)
> "shallow non-ref" means treat 't' as !ref but not the trees it references.
>
> Note that the biggest "hack" wrt lexical scopes is that we don't stream
> any abstract origins
>
> /* Write all pointer fields in the TS_BLOCK structure of EXPR to output
>block OB.  If REF_P is true, write a reference to EXPR's pointer
>fields.  */
>
> static void
> write_ts_block_tree_pointers (struct output_block *ob, tree expr, bool ref_p)
> {
>   streamer_write_chain (ob, BLOCK_VARS (expr), ref_p);
>
>   stream_write_tree (ob, BLOCK_SUPERCONTEXT (expr), ref_p);
>
>   /* Stream BLOCK_ABSTRACT_ORIGIN for the limited cases we can handle - those
>  that represent inlined function scopes.
>  For the rest them on the floor instead of ICEing in dwarf2out.c.  */
>   if (inlined_function_outer_scope_p (expr))
> {
>   tree ultimate_origin = block_ultimate_origin (expr);
>   stream_write_tree (ob, ultimate_origin, ref_p);
> }
>   else
> stream_write_tree (ob, NULL_TREE, ref_p);
>
> which makes early inlined functions behave differently (?) in the
> debugger with LTO than without (you still get blocks, but they
> do not refer to the out-of-line copy by reference but get fully
> re-created with DIEs for each inline instance).  But maybe the
> abstract origins are only a dwarf-size optimization here.

The following seems to fix it.  In testing now.

Richard.

> Richard.
>
>> Thanks.
>> Aldy


p5
Description: Binary data


Re: ASAN test failures make compare_tests useless

2014-08-18 Thread Alexander Potapenko
Not sure I understand what the problem is. Responded inline.

On Mon, Aug 18, 2014 at 9:43 AM, Yury Gribov  wrote:
> On 08/18/2014 09:42 AM, Yury Gribov wrote:
>>
>> On 08/16/2014 04:37 AM, Manuel López-Ibáñez wrote:
>>>
>>> On the compile farm, ASAN tests seem to fail a lot like:
>>>
>>> FAIL: c-c++-common/asan/global-overflow-1.c   -O0  output pattern
>>> test, is ==31166==ERROR: AddressSanitizer failed to allocate
>>> 0xdfff0001000 (15392894357504) bytes at address 2008fff7000 (errno:
>>> 12)
>>> ==31166==ReserveShadowMemoryRange failed while trying to map
>>> 0xdfff0001000 bytes. Perhaps you're using ulimit -v
>>> , should match READ of size 1 at 0x[0-9a-f]+ thread T0.*(
Sounds like the tests do not even start up properly. No mmap failures
should be reported.

>>> The problem is that those addresses and sizes are very random,
The output pattern that must be printed has these addresses masked out
(note "0x[0-9a-f]+" in your report).
No other lines with varying addresses should be printed.

>>> so when
>>> I compare the test results of a pristine trunk with a patched one, I
>>> get:
>>>
>>> New tests that FAIL:
>>>
>>> unix//-m64: c-c++-common/asan/global-overflow-1.c   -O0  output
>>> pattern test, is ==12875==ERROR: AddressSanitizer failed to allocate
>>> 0xdfff0001000 (15392894357504) bytes at address 2008fff7000 (errno:
>>> 12)
>>> unix//-m64: c-c++-common/asan/global-overflow-1.c   -O0  output
>>> pattern test, is ==18428==ERROR: AddressSanitizer failed to allocate
>>> 0xdfff0001000 (15392894357504) bytes at address 2008fff7000 (errno:
>>> 12)
>>> [... hundreds of ASAN tests that failed...]
>>>
>>> Old tests that failed, that have disappeared: (Eeek!)
>>>
>>> unix//-m64: c-c++-common/asan/global-overflow-1.c   -O0  output
>>> pattern test, is ==30142==ERROR: AddressSanitizer failed to allocate
>>> 0xdfff0001000 (15392894357504) bytes at address 2008fff7000 (errno:
>>> 12)
>>> unix//-m64: c-c++-common/asan/global-overflow-1.c   -O0  output
>>> pattern test, is ==31166==ERROR: AddressSanitizer failed to allocate
>>> 0xdfff0001000 (15392894357504) bytes at address 2008fff7000 (errno:
>>> 12)
>>> [... the same hundreds of tests that already failed before...]
>>>
>>> The above makes it very difficult to identify failures caused by my patch.
>>>
>>> Can we remove the "==" part of the error? This way compare_tests
>>> will ignore the failures.
Am I understanding correctly that "==" in the test stdout has some
special meaning for compare_tests (whatever they are, I'm not really
familiar with GCC testing infrastructure)?
If so, this is quite a questionable choice (e.g. Valgrind also
prefixes the report lines with "==12345=="), and I don't see the point
in removing PIDs/addresses to please this script.


>>> Alternatively, I could patch compare_tests to sed out that part before
>>> comparing. Would that be acceptable?
>>>
>>> Cheers,
>>>
>>> Manuel.
>>>
>>
>> Added Sanitizer folks. Frankly it'd be cool if dumping PIDs and
>> addresses could be turned off.
>>
>
> Ok, this time actually added them.
>



-- 
Alexander Potapenko
Software Engineer
Google Moscow


Re: ASAN test failures make compare_tests useless

2014-08-18 Thread Alexander Potapenko
On Mon, Aug 18, 2014 at 9:42 AM, Yury Gribov  wrote:
> On 08/16/2014 04:37 AM, Manuel López-Ibáñez wrote:
>>
>> On the compile farm, ASAN tests seem to fail a lot like:
>>
>> FAIL: c-c++-common/asan/global-overflow-1.c   -O0  output pattern
>> test, is ==31166==ERROR: AddressSanitizer failed to allocate
>> 0xdfff0001000 (15392894357504) bytes at address 2008fff7000 (errno:
>> 12)
>> ==31166==ReserveShadowMemoryRange failed while trying to map
>> 0xdfff0001000 bytes. Perhaps you're using ulimit -v
>> , should match READ of size 1 at 0x[0-9a-f]+ thread T0.*(
>>
>> The problem is that those addresses and sizes are very random, so when
>> I compare the test results of a pristine trunk with a patched one, I
>> get:
>>
>> New tests that FAIL:
>>
>> unix//-m64: c-c++-common/asan/global-overflow-1.c   -O0  output
>> pattern test, is ==12875==ERROR: AddressSanitizer failed to allocate
>> 0xdfff0001000 (15392894357504) bytes at address 2008fff7000 (errno:
>> 12)
>> unix//-m64: c-c++-common/asan/global-overflow-1.c   -O0  output
>> pattern test, is ==18428==ERROR: AddressSanitizer failed to allocate
>> 0xdfff0001000 (15392894357504) bytes at address 2008fff7000 (errno:
>> 12)
>> [... hundreds of ASAN tests that failed...]
>>
>> Old tests that failed, that have disappeared: (Eeek!)
>>
>> unix//-m64: c-c++-common/asan/global-overflow-1.c   -O0  output
>> pattern test, is ==30142==ERROR: AddressSanitizer failed to allocate
>> 0xdfff0001000 (15392894357504) bytes at address 2008fff7000 (errno:
>> 12)
>> unix//-m64: c-c++-common/asan/global-overflow-1.c   -O0  output
>> pattern test, is ==31166==ERROR: AddressSanitizer failed to allocate
>> 0xdfff0001000 (15392894357504) bytes at address 2008fff7000 (errno:
>> 12)
>> [... the same hundreds of tests that already failed before...]
>>
>> The above makes it very difficult to identify failures caused by my patch.
>>
>> Can we remove the "==" part of the error? This way compare_tests
>> will ignore the failures.
>>
>> Alternatively, I could patch compare_tests to sed out that part before
>> comparing. Would that be acceptable?
>>
>> Cheers,
>>
>> Manuel.
>>
>
> Added Sanitizer folks. Frankly it'd be cool if dumping PIDs and addresses
> could be turned off.
>

Could you please name a reason for that?
Doing so complicates the debugging of multi-process applications but
doesn't bring any obvious advantages.

-- 
Alexander Potapenko
Software Engineer
Google Moscow


Re: ASAN test failures make compare_tests useless

2014-08-18 Thread Yury Gribov

On 08/18/2014 06:36 PM, Alexander Potapenko wrote:

Added Sanitizer folks. Frankly it'd be cool if dumping PIDs and addresses
could be turned off.


Could you please name a reason for that?


Reproducibility?

-Y


Re: ASAN test failures make compare_tests useless

2014-08-18 Thread Manuel López-Ibáñez
On 18 August 2014 16:34, Alexander Potapenko  wrote:
> Not sure I understand what the problem is. Responded inline.
>
> On Mon, Aug 18, 2014 at 9:43 AM, Yury Gribov  wrote:
>> On 08/18/2014 09:42 AM, Yury Gribov wrote:
>>>
>>> On 08/16/2014 04:37 AM, Manuel López-Ibáñez wrote:

 On the compile farm, ASAN tests seem to fail a lot like:

 FAIL: c-c++-common/asan/global-overflow-1.c   -O0  output pattern
 test, is ==31166==ERROR: AddressSanitizer failed to allocate
 0xdfff0001000 (15392894357504) bytes at address 2008fff7000 (errno:
 12)
 ==31166==ReserveShadowMemoryRange failed while trying to map
 0xdfff0001000 bytes. Perhaps you're using ulimit -v
 , should match READ of size 1 at 0x[0-9a-f]+ thread T0.*(
> Sounds like the tests do not even start up properly. No mmap failures
> should be reported.
>
 The problem is that those addresses and sizes are very random,
> The output pattern that must be printed has these addresses masked out
> (note "0x[0-9a-f]+" in your report).
> No other lines with varying addresses should be printed.

For the record, I think the fault lies in the GCC testing
infrastructure and not in ASAN. It is wrong to print as the test error
message the output of ASAN. It should print

FAIL: c-c++-common/asan/global-overflow-1.c   -O0  output pattern
test, is  ERROR

This is enough to see that something failed. For details one can go to
the detailed logs. But I didn't add the asan testing infrastructure
and I couldn't figure out how to fix this.

Any suggestions?

Manuel.
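
The alternative raised earlier in this thread, stripping the PID marker
before comparing, could look roughly like the sketch below. It is only an
illustration in Python and not how contrib/compare_tests is implemented;
the function names, the "==PID==" placeholder and the assumption that each
failure is a single line starting with "FAIL:" are all hypothetical.

```python
import re

# "==<pid>==" is the prefix sanitizer runtimes put on each report line.
PID_MARKER = re.compile(r"==\d+==")


def normalize(line):
    """Replace every ==<pid>== marker with a fixed placeholder."""
    return PID_MARKER.sub("==PID==", line)


def compare(old_lines, new_lines):
    """Return (newly failing, no longer failing) after normalization.

    Assumption: each failure is a single line starting with "FAIL:".
    """
    old = {normalize(l) for l in old_lines if l.startswith("FAIL:")}
    new = {normalize(l) for l in new_lines if l.startswith("FAIL:")}
    return sorted(new - old), sorted(old - new)


if __name__ == "__main__":
    test = "FAIL: c-c++-common/asan/global-overflow-1.c   -O0  output pattern test, is "
    old = [test + "==30142==ERROR: AddressSanitizer failed to allocate"]
    new = [test + "==12875==ERROR: AddressSanitizer failed to allocate"]
    print(compare(old, new))
```

Once the PIDs are masked, the same failure in both runs no longer shows up
as one new and one disappeared test, so both lists come back empty.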


Re: LTO inhibiting dwarf lexical blocks output

2014-08-18 Thread Jan Hubicka
> 
> The following seems to fix it.  In testing now.

Will streaming as non-reference prevent the DECL from being merged and the
tails of BLOCK_VAR chains from being corrupted?


Honza
> 
> Richard.
> 
> > Richard.
> >
> >> Thanks.
> >> Aldy




Re: LTO inhibiting dwarf lexical blocks output

2014-08-18 Thread Richard Biener
On August 18, 2014 8:46:00 PM CEST, Jan Hubicka  wrote:
>> 
>> The following seems to fix it.  In testing now.
>
>Will streaming as non-reference prevent the DECL from being merged and
>the tails of BLOCK_VAR chains from being corrupted?

Yes, the decl ends up in the function section then, not the global types and 
decls one.

Richard.

>
>Honza
>> 
>> Richard.
>> 
>> > Richard.
>> >
>> >> Thanks.
>> >> Aldy




Re: LTO inhibiting dwarf lexical blocks output

2014-08-18 Thread Aldy Hernandez

On 08/18/14 07:31, Richard Biener wrote:

On Mon, Aug 18, 2014 at 12:46 PM, Richard Biener
 wrote:

On Fri, Aug 15, 2014 at 9:59 PM, Aldy Hernandez  wrote:



  For the rest them on the floor instead of ICEing in dwarf2out.c.  */


Should that read "For the rest, drop them on the floor..."???

I'm having a hard time parsing the above.


The following seems to fix it.  In testing now.


Sweet!  Thanks a lot!

And thanks for the explanations.

Aldy
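
As a recap of the analysis in this thread, here is a toy model of the
must_output_die condition and of the streamer dropping extern block vars.
It is a sketch only: the dictionaries stand in for trees, the function
names mirror the GCC ones purely for readability, and treating the block
as TREE_USED is an assumption made for the example.

```python
def must_output_die(block_vars, num_nonlocalized_vars,
                    used=False, asm_written=False, abstract=False):
    """Toy model of the must_output_die condition quoted in this thread."""
    has_vars = bool(block_vars) or num_nonlocalized_vars > 0
    is_live = used or asm_written or abstract
    return has_vars and is_live


def stream_block_vars(block_vars):
    """Model of the streamer dropping extern vars/functions from BLOCK_VARS."""
    return [v for v in block_vars if not v["external"]]


if __name__ == "__main__":
    # The inner block of the test case only declares 'extern int i;'.
    block = [{"name": "i", "external": True}]
    print(must_output_die(block, 0, used=True))                     # without LTO
    print(must_output_die(stream_block_vars(block), 0, used=True))  # after streaming
```

In this model the block keeps its DIE without LTO, while after streaming
the variable list is empty and the lexical block is no longer emitted.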



-mkernel argument documentation question

2014-08-18 Thread Joel Sherrill
Hi

-mkernel is documented as:

 Enable kernel development mode.  The '-mkernel' option sets
 '-static', '-fno-common', '-fno-cxa-atexit', '-fno-exceptions',
 '-fno-non-call-exceptions', '-fapple-kext', '-fno-weak' and
 '-fno-rtti' where applicable.  This mode also sets '-mno-altivec',
 '-msoft-float', '-fno-builtin' and '-mlong-branch' for PowerPC
 targets.


Unfortunately, -fapple-kext does not appear anywhere in the
manual.  It is listed in darwin.opt with this:

fapple-kext
Target Report C++ Var(flag_apple_kext)
Generate code for darwin loadable kernel extensions

Who is comfortable fixing that?

-- 
Joel Sherrill, Ph.D.             Director of Research & Development
joel.sherr...@oarcorp.com        On-Line Applications Research
Ask me about RTEMS: a free RTOS  Huntsville AL 35805
Support Available                (256) 722-9985