Re: Parallelize the compilation using Threads

2018-12-13 Thread Bin.Cheng
On Wed, Dec 12, 2018 at 11:46 PM Giuliano Augusto Faulin Belinassi
 wrote:
>
> Hi, I have some news. :-)
>
> I replicated the Martin Liška experiment [1] on a 64-cores machine for
> gcc [2] and Linux kernel [3] (Linux kernel was fully parallelized),
> and I am excited to dive into this problem. As a result, I want to
> propose a GSoC project on this issue, starting with something like:
> 1- Systematically create a benchmark for easy information
> gathering. Martin Liška already made the first version of it, but I
> need to improve it.
> 2- Find and document the global states (Try to reduce the gcc's
> global states as well).
> 3- Define the parallelization strategy.
> 4- First parallelization attempt.
Hi Giuliano,

Thanks very much for working on this.  It could be very useful.  For
example, one bottleneck we have is the slow compilation of a single big
source file, which remains even after making intensive use of distributed
compilation.  Of course, a good parallelization strategy is needed.

Thanks,
bin
>
> I also proposed this issue as a research project to my advisor and he
> supported me on this idea. So I can work for at least one year on
> this, and other things related to it.
>
> Would anyone be willing to mentor me on this?
>
> [1] https://gcc.gnu.org/bugzilla/attachment.cgi?id=43440
> [2] https://www.ime.usp.br/~belinass/64cores-experiment.svg
> [3] https://www.ime.usp.br/~belinass/64cores-kernel-experiment.svg
> On Mon, Nov 19, 2018 at 8:53 AM Richard Biener
>  wrote:
> >
> > On Fri, Nov 16, 2018 at 8:00 PM Giuliano Augusto Faulin Belinassi
> >  wrote:
> > >
> > > Hi! Sorry for the late reply again :P
> > >
> > > On Thu, Nov 15, 2018 at 8:29 AM Richard Biener
> > >  wrote:
> > > >
> > > > On Wed, Nov 14, 2018 at 10:47 PM Giuliano Augusto Faulin Belinassi
> > > >  wrote:
> > > > >
> > > > > As a brief introduction, I am a graduate student who got interested
> > > > > in the "Parallelize the compilation using threads" (GSoC 2018 [1]). I
> > > > > am a newcomer to GCC, but have already sent some patches, some of
> > > > > which have already been accepted [2].
> > > > >
> > > > > I brought this subject up in IRC, but maybe here is a proper place to
> > > > > discuss this topic.
> > > > >
> > > > > From my point of view, parallelizing GCC itself will only speed up the
> > > > > compilation of projects which have a big file that creates a
> > > > > bottleneck in the whole project compilation (note: by big, I mean the
> > > > > amount of code to generate).
> > > >
> > > > That's true.  During GCC bootstrap there are some of those (see PR84402).
> > > >
> > >
> > > > One way to improve parallelism is to use link-time optimization where
> > > > even single source files can be split up into multiple link-time units.
> > > > But then there's the serial whole-program analysis part.
> > >
> > > Did you mean this: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84402 ?
> > > That is a lot of data :-)
> > >
> > > It seems that 'phase opt and generate' is the most time-consuming
> > > part. Is that the 'GIMPLE optimization pipeline' you were talking
> > > about in this thread:
> > > https://gcc.gnu.org/ml/gcc/2018-03/msg00202.html
> >
> > It's everything that comes after the frontend parsing bits, thus this
> > includes in particular RTL optimization and early GIMPLE optimizations.
> >
> > > > > Additionally, I know that GCC must not
> > > > > change the project layout, but from the software engineering
> > > > > perspective, this may be a bad smell that indicates that the file
> > > > > should be broken into smaller files. Finally, the Makefiles will
> > > > > take care of the parallelization task.
> > > >
> > > > What do you mean by GCC must not change the project layout?  GCC
> > > > happily re-orders functions and link-time optimization will reorder
> > > > TUs (well, linking may as well).
> > > >
> > >
> > > That was a response to a comment made on IRC:
> > >
> > > On Thu, Nov 15, 2018 at 9:44 AM Jonathan Wakely wrote:
> > > >I think this is in response to a comment I made on IRC. Giuliano said
> > > >that if a project has a very large file that dominates the total build
> > > >time, the file should be split up into smaller pieces. I said  "GCC
> > > >can't restructure people's code. it can only try to compile it
> > > >faster". We weren't referring to code transformations in the compiler
> > > >like re-ordering functions, but physically refactoring the source
> > > >code.
> > >
> > > Yes. But from one of the attachments from PR84402, it seems that such
> > > files exist on GCC,
> > > https://gcc.gnu.org/bugzilla/attachment.cgi?id=43440
> > >
> > > > > My questions are:
> > > > >
> > > > >  1. Is there any project whose compilation will be significantly
> > > > > improved if GCC runs in parallel? Does anyone have data about something
> > > > > related to that? How about the Linux kernel? If not, I can try to bring some.
> > > >
> > > > We do not have any data about this

just_select in combine.c:force_to_mode

2018-12-13 Thread SenthilKumar.Selvaraj
Hi,

  When debugging PR 88253, I found that force_to_mode uses a parameter
  (just_select) to prevent the function from returning a const0_rtx even
  if none of the bits set by the rtx are needed. The comment says

   "If JUST_SELECT is nonzero, don't optimize by noticing that bits in MASK
   are all off in X.  This is used when X will be complemented, by either
   NOT, NEG, or XOR."

   and the code behaves the same way, but could someone help me
   understand why?

   I ran into this, when I found that force_to_mode converts
   
   (ior:QI (subreg:QI (ior:HI (ashift:HI (zero_extend:HI (reg/v:QI 46 [ h ]))
                                         (const_int 8 [0x8]))
                              (reg:HI 55 [ D.1627 ])) 0)
           (const_int 42 [0x2a]))

into

   (set (reg:QI 44 [ D.1626 ])
  (ior:QI (subreg:QI (reg:HI 55 [ D.1627 ]) 0)
  (const_int 42 [0x2a])))

   but is unable to do the same thing for the below rtx.

   (xor:QI (subreg:QI (ior:HI (ashift:HI (zero_extend:HI (reg/v:QI 46 [ h ]))
                                         (const_int 8 [0x8]))
                              (reg:HI 55 [ D.1627 ])) 0)
           (reg:QI 58))

   The only difference is the xor instead of ior at the outermost sexp,
   and force_to_mode returns

   (set (reg:QI 44 [ D.1626 ])
        (xor:QI (ior:QI (subreg:QI (ashift:HI (zero_extend:HI (reg/v:QI 46 [ h ]))
                                              (const_int 8 [0x8])) 0)
                        (subreg:QI (reg:HI 55 [ D.1627 ]) 0))
                (reg:QI 58)))

   This more complicated pattern doesn't match, and combine
   moves on to another combination of insns, eventually resulting in PR 88253.

   Isn't the simplification of 

   (subreg:QI (ior:HI (ashift:HI (zero_extend:HI (reg/v:QI 46 [ h ]))
                                 (const_int 8 [0x8]))
                      (reg:HI 55 [ D.1627 ]))

   to

   (subreg:QI (reg:HI 55 [ D.1627 ])

   safe to do, whether the outer insn code is XOR or IOR?

Regards
Senthil
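
The question above can be checked by brute force at the bit level. What
follows is only a minimal C++ sketch, not GCC code: it models HImode as
uint16_t and QImode as uint8_t, and it assumes that (subreg:QI ... 0)
selects the low-order 8 bits of the HImode value. The function names are
made up for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Model of (subreg:QI (ior:HI (ashift:HI (zero_extend:HI h) 8) r55) 0),
// assuming subreg byte 0 is the low-order byte.
uint8_t low_byte_of_full_expr(uint8_t h, uint16_t r55) {
    uint16_t shifted = static_cast<uint16_t>(static_cast<unsigned>(h) << 8);
    uint16_t ored = static_cast<uint16_t>(shifted | r55);
    return static_cast<uint8_t>(ored & 0xFF);
}

// Model of (subreg:QI r55 0) under the same assumption.
uint8_t low_byte_of_reg(uint16_t r55) {
    return static_cast<uint8_t>(r55 & 0xFF);
}

// Compare both forms for every h and every r55.
bool simplification_is_safe() {
    for (unsigned h = 0; h < 256; ++h) {
        for (unsigned r = 0; r < 65536; ++r) {
            uint8_t full = low_byte_of_full_expr(static_cast<uint8_t>(h),
                                                 static_cast<uint16_t>(r));
            uint8_t simple = low_byte_of_reg(static_cast<uint16_t>(r));
            if (full != simple)
                return false;
        }
    }
    return true;
}

// Convenience wrapper: aborts if the two forms ever differ.
void self_test() {
    assert(simplification_is_safe());
}
```

Because the two QImode values are compared directly, the outer operation
does not need to be modelled: if the operands are equal for all inputs,
then XOR, IOR or any other outer code applied to them gives equal results
in this model.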




Re: Optimizing C++ Move Functions in Stl

2018-12-13 Thread Jonathan Wakely

On 12/12/18 15:05 -0500, nick wrote:



On 2018-12-12 10:24 a.m., Jonathan Wakely wrote:

On 12/12/18 17:17 +0200, Ville Voutilainen wrote:

On Wed, 12 Dec 2018 at 17:14, nick  wrote:


> I think there's an attempt to ascertain that mostly constructors and
> assignment operators need noexcept-fixes,
> because that noexcept-ness is directly trait-detectable.
> That would match my current understanding of the situation for at
> least pair and tuple.
>


Yes that's true. I was also asking whether there is a TODO list for the current
release, gcc 9. Jonathan mentioned that this work is a stage 1 fix or feature and
should wait until gcc 10 stage 1, so I was wondering what work is needed in the
current stage 3.

Sorry for the confusion with the previous email and hopefully this makes more
sense,

We don't have a specific TODO list for gcc 9. For general stuff, we have
https://gcc.gnu.org/wiki/LibstdcxxTodo
which is a bit out of date...


I think he's asking about GCC in general, not just libstdc++. The
answer is that fixing bugs is appropriate for stage 3, so pick any
open bugs from Bugzilla.




That's right, I was asking about all of gcc. Sorry, I thought I CCed the gcc devel
list, so no wonder I was so confused.


You did CC the GCC list but I removed it from the CC because I was
only responding to the part of your email about libstdc++.

I see no reason to use a single thread for "what is libstdc++'s
noexcept policy" and "what EasyHacks are there in the rest of the
compiler". They should be two separate threads.



Re: just_select in combine.c:force_to_mode

2018-12-13 Thread Segher Boessenkool
Hi!

On Thu, Dec 13, 2018 at 09:39:52AM +, senthilkumar.selva...@microchip.com wrote:
>   When debugging PR 88253, I found that force_to_mode uses a parameter
>   (just_select) to prevent the function from returning a const0_rtx even
>   if none of the bits set by the rtx are needed. The comment says
> 
>"If JUST_SELECT is nonzero, don't optimize by noticing that bits in MASK
>are all off in X.  This is used when X will be complemented, by either
>NOT, NEG, or XOR."
> 
>and the code behaves the same way, but could someone help me
>understand why?

This was introduced in https://gcc.gnu.org/r6342 .


>I ran into this, when I found that force_to_mode converts
>
>(ior:QI (subreg:QI (ior:HI (ashift:HI (zero_extend:HI (reg/v:QI 46 [ h ]))
> (const_int 8 [0x8]))
> (reg:HI 55 [ D.1627 ])) 0)
> (const_int 42 [0x2a]))
> 
> into
> 
>(set (reg:QI 44 [ D.1626 ])
>   (ior:QI (subreg:QI (reg:HI 55 [ D.1627 ]) 0)
>   (const_int 42 [0x2a])))
> 
>but is unable to do the same thing for the below rtx.
> 
>(xor:QI (subreg:QI (ior:HI (ashift:HI (zero_extend:HI (reg/v:QI 46 [ h ]))
> (const_int 8 [0x8]))
> (reg:HI 55 [ D.1627 ])) 0)
> (reg:QI 58))
> 
>The only difference is the xor instead of ior at the outermost sexp,
>and force_to_mode returns
> 
>(set (reg:QI 44 [ D.1626 ])
> (xor:QI (ior:QI (subreg:QI (ashift:HI (zero_extend:HI (reg/v:QI 46 [ h ]))
> (const_int 8 [0x8])) 0)
> (subreg:QI (reg:HI 55 [ D.1627 ]) 0))
> (reg:QI 58)))
> 
>This more complicated pattern doesn't match, and combine
>moves on to another combination of insns, eventually resulting in PR 88253.

Combine not doing some combination is never incorrect, of course (just not
what you want, possibly, but not the cause of a bug :-) )

>Isn't the simplification of 
> 
>(subreg:QI (ior:HI (ashift:HI (zero_extend:HI (reg/v:QI 46 [ h ]))
> (const_int 8 [0x8]))
> (reg:HI 55 [ D.1627 ]))
> 
>to
> 
>(subreg:QI (reg:HI 55 [ D.1627 ])
> 
>safe to do, whether the outer insn code is XOR or IOR?

Probably.

(Please don't paste lines full of spaces in your emails, it is hard to read).


Segher


Not Sure about best way to fix the Null Pointer

2018-12-13 Thread nick
Greetings All,

I seem to have traced this bug down, but being new here I am not sure what the
best way to fix it is:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88395#add_comment

The issue seems to be in va_heap::reserve in vec.h, as we aren't checking whether
v is equal to NULL like in va_heap::release. It seems a bit odd to me that it's
not checking that, as it calculates an allocation afterwards, and it seems more
than likely that it crashes due to v being NULL. I am assuming this is the proper
fix, but let me know if I am missing something, such as the allocation being
known to be good.

Before the vec_prefix::calculate_allocation call do:

  if (v == NULL)
    return;
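
To make the effect of such a guard concrete, here is a small toy sketch
in C++. It is not GCC's vec.h: toy_vec, grow, reserve_with_guard and
reserve_allocating are hypothetical names, and the sketch assumes a
design in which a null vector pointer means "empty, nothing allocated
yet". Whether the real va_heap::reserve is meant to be called with a null
v should be checked against vec.h itself.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Toy stand-in for a heap-allocated vector; NOT GCC's vec<T, va_heap, vl_embed>.
struct toy_vec {
    std::size_t alloc;  // capacity in elements
    std::size_t num;    // elements in use
    int *data;          // storage, or nullptr when alloc == 0
};

// Make room for `extra` more elements in an existing vector.
void grow(toy_vec *v, std::size_t extra) {
    assert(v != nullptr);
    std::size_t wanted = v->num + extra;
    if (wanted <= v->alloc)
        return;
    int *p = static_cast<int *>(std::realloc(v->data, wanted * sizeof(int)));
    if (p == nullptr)
        std::abort();
    v->data = p;
    v->alloc = wanted;
}

// Variant A: the early return proposed above. A null vector stays null.
void reserve_with_guard(toy_vec *&v, std::size_t extra) {
    if (v == nullptr)
        return;
    grow(v, extra);
}

// Variant B: treat null as "not allocated yet" and create the vector first.
void reserve_allocating(toy_vec *&v, std::size_t extra) {
    if (v == nullptr)
        v = new toy_vec{0, 0, nullptr};
    grow(v, extra);
}
```

In this toy model the guard avoids the null dereference, but a caller
that starts from a null pointer and then writes elements would still have
no storage. So before adding the early return it is worth checking which
of the two behaviours the callers of the real function rely on.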

Thanks,

Nick


gcc-7-20181213 is now available

2018-12-13 Thread gccadmin
Snapshot gcc-7-20181213 is now available on
  ftp://gcc.gnu.org/pub/gcc/snapshots/7-20181213/
and on various mirrors, see http://gcc.gnu.org/mirrors.html for details.

This snapshot has been generated from the GCC 7 SVN branch
with the following options: svn://gcc.gnu.org/svn/gcc/branches/gcc-7-branch revision 267110

You'll find:

 gcc-7-20181213.tar.xz                Complete GCC

  SHA256=e199937fc3cf8f4bdd5d3efa4aac8fb6efb0dc3f3e2ecfef25433198d0edef87
  SHA1=efcba495e1368374ec9e4c2364837bca18ad8325

Diffs from 7-20181206 are available in the diffs/ subdirectory.

When a particular snapshot is ready for public consumption the LATEST-7
link is updated and a message is sent to the gcc list.  Please do not use
a snapshot before it has been announced that way.


Re: just_select in combine.c:force_to_mode

2018-12-13 Thread SenthilKumar.Selvaraj


Segher Boessenkool writes:

> Hi!
>
> On Thu, Dec 13, 2018 at 09:39:52AM +, senthilkumar.selva...@microchip.com wrote:
>>   When debugging PR 88253, I found that force_to_mode uses a parameter
>>   (just_select) to prevent the function from returning a const0_rtx even
>>   if none of the bits set by the rtx are needed. The comment says
>>
>>"If JUST_SELECT is nonzero, don't optimize by noticing that bits in MASK
>>are all off in X.  This is used when X will be complemented, by either
>>NOT, NEG, or XOR."
>>
>>and the code behaves the same way, but could someone help me
>>understand why?
>
> This was introduced in https://gcc.gnu.org/r6342 .

Yep :). Was hoping someone would have run into a similar situation,
although trawling the gcc mailing lists didn't turn up anything useful.
>
>
>>I ran into this, when I found that force_to_mode converts
>>
>>(ior:QI (subreg:QI (ior:HI (ashift:HI (zero_extend:HI (reg/v:QI 46 [ h ]))
>> (const_int 8 [0x8]))
>> (reg:HI 55 [ D.1627 ])) 0)
>> (const_int 42 [0x2a]))
>>
>> into
>>
>>(set (reg:QI 44 [ D.1626 ])
>>   (ior:QI (subreg:QI (reg:HI 55 [ D.1627 ]) 0)
>>   (const_int 42 [0x2a])))
>>
>>but is unable to do the same thing for the below rtx.
>>
>>(xor:QI (subreg:QI (ior:HI (ashift:HI (zero_extend:HI (reg/v:QI 46 [ h ]))
>> (const_int 8 [0x8]))
>> (reg:HI 55 [ D.1627 ])) 0)
>> (reg:QI 58))
>>
>>The only difference is the xor instead of ior at the outermost sexp,
>>and force_to_mode returns
>>
>>(set (reg:QI 44 [ D.1626 ])
>> (xor:QI (ior:QI (subreg:QI (ashift:HI (zero_extend:HI (reg/v:QI 46 [ h ]))
>> (const_int 8 [0x8])) 0)
>> (subreg:QI (reg:HI 55 [ D.1627 ]) 0))
>> (reg:QI 58)))
>>
>>This more complicated pattern doesn't match, and combine
>>moves on to another combination of insns, eventually resulting in PR 88253.
>
> Combine not doing some combination is never incorrect, of course (just not
> what you want, possibly, but not the cause of a bug :-) )

Understood. The bug report said the PR does not show up for a bitwise
OR, so just wanted to convey that this is where the divergence
between the RTL for a bitwise OR vs XOR starts.
>
>>Isn't the simplification of
>>
>>(subreg:QI (ior:HI (ashift:HI (zero_extend:HI (reg/v:QI 46 [ h ]))
>> (const_int 8 [0x8]))
>> (reg:HI 55 [ D.1627 ]))
>>
>>to
>>
>>(subreg:QI (reg:HI 55 [ D.1627 ])
>>
>>safe to do, whether the outer insn code is XOR or IOR?
>
> Probably.

I was wondering if I should put out a patch that resets just_select for
(the nested) IOR's operands, but maybe I should leave this untouched. As
you said, this is more a missed optimization, and shouldn't really cause
any bugs.
>
> (Please don't paste lines full of spaces in your emails, it is hard to read).

Apologies, didn't notice that.

Regards
Senthil


Re: just_select in combine.c:force_to_mode

2018-12-13 Thread Segher Boessenkool
On Fri, Dec 14, 2018 at 06:32:32AM +, senthilkumar.selva...@microchip.com wrote:
> Segher Boessenkool writes:
> > On Thu, Dec 13, 2018 at 09:39:52AM +, senthilkumar.selva...@microchip.com wrote:
> >>   When debugging PR 88253, I found that force_to_mode uses a parameter
> >>   (just_select) to prevent the function from returning a const0_rtx even
> >>   if none of the bits set by the rtx are needed. The comment says
> >>
> >>"If JUST_SELECT is nonzero, don't optimize by noticing that bits in MASK
> >>are all off in X.  This is used when X will be complemented, by either
> >>NOT, NEG, or XOR."
> >>
> >>and the code behaves the same way, but could someone help me
> >>understand why?
> >
> > This was introduced in https://gcc.gnu.org/r6342 .
> 
> Yep :). Was hoping someone would have run into a similar situation,
> although trawling the gcc mailing lists didn't turn up anything useful.

The core problem is that nonzero_bits isn't symmetrical: it tells you
which bits are not guaranteed to be zero, but nothing about which bits
are one (other than that the bits that *are* zero are not one, of course).
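
The asymmetry can be illustrated with a simplified model. This is only a
sketch in C++, not GCC's nonzero_bits implementation: a 16-bit
"possibly nonzero" mask stands in for the real analysis, and all function
names here are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// A "possibly nonzero" mask: a 0 bit means the value's bit is known to be 0,
// a 1 bit only means "unknown" (it may be 0 or 1).

// Mask for an 8-bit value zero-extended to 16 bits: the high byte is known zero.
uint16_t nz_zero_extend8() { return 0x00FF; }

// Shifting left moves the unknown bits up and shifts known zeros in.
uint16_t nz_shift_left(uint16_t nz, unsigned n) {
    assert(n < 16);
    return static_cast<uint16_t>(static_cast<unsigned>(nz) << n);
}

// A bit of (a | b) can be nonzero if it can be nonzero in either operand.
uint16_t nz_ior(uint16_t a, uint16_t b) {
    return static_cast<uint16_t>(a | b);
}

// True if every bit selected by `mask` is known to be zero.
bool known_zero_under(uint16_t nz, uint16_t mask) {
    return (nz & mask) == 0;
}

// True if `value` is one of the values the mask `nz` allows.
bool consistent_with(uint16_t value, uint16_t nz) {
    return (value & static_cast<uint16_t>(~nz)) == 0;
}
```

In this model, nz_shift_left(nz_zero_extend8(), 8) is 0xFF00, so the
shifted operand is known to be zero under the mask 0x00FF. The same mask
0xFF00 is consistent with both 0x0000 and 0xFF00, which is the asymmetry:
there is no way to conclude from it that any bit is one.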

> >>This more complicated pattern doesn't match, and combine
> >>moves on to another combination of insns, eventually resulting in PR 88253.
> >
> > Combine not doing some combination is never incorrect, of course (just not
> > what you want, possibly, but not the cause of a bug :-) )
> 
> Understood. The bug report said the PR does not show up for a bitwise
> OR, so just wanted to convey that this is where the divergence
> between the RTL for a bitwise OR vs XOR starts.

Ah, okay.

> >>Isn't the simplification of
> >>
> >>(subreg:QI (ior:HI (ashift:HI (zero_extend:HI (reg/v:QI 46 [ h ]))
> >> (const_int 8 [0x8]))
> >> (reg:HI 55 [ D.1627 ]))
> >>
> >>to
> >>
> >>(subreg:QI (reg:HI 55 [ D.1627 ])
> >>
> >>safe to do, whether the outer insn code is XOR or IOR?
> >
> > Probably.
> 
> I was wondering if I should put out a patch that resets just_select for
> (the nested) IOR's operands, but maybe I should leave this untouched. As
> you said, this is more a missed optimization, and shouldn't really cause
> any bugs.

It seems to me that the latter expression always is a correct simplification,
so a patch to make that happen is welcome.  If it is simple enough it may
go in for GCC 9 still, otherwise, it will have to wait for GCC 10.

Thanks,


Segher