Re: ChangeLog's: do we have to?

2018-07-23 Thread Richard Biener
On Fri, Jul 20, 2018 at 11:04 PM Joseph Myers  wrote:
>
> On Thu, 5 Jul 2018, Aldy Hernandez wrote:
>
> > However, even if you could "git log --grep" the commit messages, I assume
> > your current use is grepping for function names and such, right?  Being
> > able to grep a commit message won't solve that problem, or am I missing
> > something?
>
> If you know what function and file you're interested in, you can use git
> log -L to find changes to it (I've used that in the GCC context before),
> or of course other tools such as git blame.

Does -L "work" with codebases that become more and more C++?  I do realize
that grepping ChangeLogs gets more difficult here as well given there are no
clear rules how to "mangle" C++ function/entity names in ChangeLog entries.

Fortunately we're still mostly C 

Richard.

>
> --
> Joseph S. Myers
> jos...@codesourcery.com


Re: O2 Aggressive Optimisation by GCC

2018-07-23 Thread David Brown
Hi,

This is nothing to do with undefined behaviour, but a matter of
scheduling of effects that are visible in different circumstances.  In
particular, i and j are declared in a way that tells the compiler that
it, in its current thread of execution, has full control of them.  The
compiler knows that while it is executing the code in test(), nothing
else can affect the values of i and j, nor can anything else be
affected by them.  The compiler knows that code from elsewhere may
read or write them, but only before test() is called, during functions
called from test(), or after test() returns.  It knows for sure that
there are no other threads of execution that interact via i and j.

So how do you inhibit these kinds of optimisations?  Stop lying to your
compiler.

If you want them to be visible in other threads, tell your compiler that
they are visible in other threads.  You already know how to do that -
using volatile accesses, atomic accesses, other language features, or
operating system features.  You can also use somewhat "hacky"
techniques such as Linux's "ACCESS_ONCE" macro or inline assembly
dependency controls, but it would be better to define and declare the
data correctly.
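
A minimal sketch of what that looks like with C11 atomics (the
declarations are illustrative, since the original translation unit is
not shown here):

#include <stdatomic.h>

/* Shared with other threads: every access is now an observable side
   effect that the compiler must preserve.  */
extern atomic_int i;
extern atomic_int j;

int test(void)
{
    while (1)
    {
        atomic_fetch_add(&i, 1);   /* i++ as an atomic read-modify-write */
        atomic_store(&j, 20);      /* store other threads may observe */
    }
    return 0;                      /* unreachable, kept to mirror the original */
}

With those declarations GCC can no longer assume test() has exclusive
access to i and j, so the loop body and its stores survive at -O2.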

Messing around with optimisation settings is just a way of hiding your
coding and design errors, leaving them subtler and harder to spot in
the future.

mvh.,

David


On 22/07/18 17:00, Umesh Kalappa wrote:
> Hi Richard,
> 
> Making i unsigned, the optimization still takes place; no luck.  And
> yes, test() is the threaded routine, and since i and j are global we
> need the side effects (the assignments and so on) to take place, so
> that they are observed by other threads.
> 
> Making them volatile, or using thread-safe or atomic operations,
> inhibits the optimization, but we still didn't get why this is a
> valid optimization of UB.  We tried -fno-strict-overflow too; no luck
> there.
> 
> Jakub, and anyone else: can we inhibit these kinds of optimizations
> that exploit the UB?
> 
> Thank you
> ~Umesh
> 
> On Fri, Jul 20, 2018 at 11:47 PM, Richard Biener
>  wrote:
>> On July 20, 2018 7:59:10 PM GMT+02:00, Martin Sebor  wrote:
>>> On 07/20/2018 06:19 AM, Umesh Kalappa wrote:
 Hi All,

 We are looking at this C sample:

 extern int i, j;

 int test()
 {
   while (1)
   {
     i++;
     j = 20;
   }
   return 0;
 }

 Command used (gcc 8.1.0):

 gcc -S test.c -O2

 The generated asm for x86:

 .L2:
 jmp .L2

 We understand that the infinite loop never terminates, so the compiler
 is free to treat it as UB and optimize aggressively, but we need to
 keep the side effects like j=20 untouched by the optimization.

 Please note that using the volatile qualifier for i and j, or an empty
 asm("") in the while loop, will stop the optimizer, but we don't want
 to do that.

 Could anyone from the community please share their insights on why the
 above transformation is valid?
>>>
>>> The loop isn't necessarily undefined (and compilers don't look
>>> for undefined behavior as opportunities to optimize code), but
>>
>> The variable i overflows.
>>
>>> because it doesn't terminate it's not possible for a conforming
>>> C program to detect the side-effects in its body.  The only way
>>> to detect it is to examine the object code as you did.
>>
>> I'm not sure we perform this kind of dead code elimination but yes, we 
>> could. Make i unsigned and check whether that changes behavior.
>>
>>> Compilers are allowed (and expected) to transform source code
>>> into efficient object code as long as the transformations don't
>>> change the observable effects of the program.  That's just what
>>> happens in this case.
>>>
>>> Martin
>>
> 



Re: ChangeLog's: do we have to?

2018-07-23 Thread Segher Boessenkool
On Mon, Jul 23, 2018 at 11:52:03AM +0200, Richard Biener wrote:
> Does -L "work" with codebases that become more and more C++?  I do realize
> that grepping ChangeLogs gets more difficult here as well given there are no
> clear rules how to "mangle" C++ function/entity names in ChangeLog entries.
> 
> Fortunately we're still mostly C 

For example for .md files you can use

[diff "md"]
xfuncname = "^\\(define.*$"

in your local clone's .git/config

and

*.md  diff=md

in .gitattributes (somewhere in the source tree).
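
With that in place, hunk headers for changes inside a define show the
enclosing pattern, along the lines of (the offsets and pattern name
here are invented for illustration):

@@ -4321,7 +4321,8 @@ (define_insn "addsi3"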

I should make a patch for that.  Hrm.


Segher


Re: O2 Aggressive Optimisation by GCC

2018-07-23 Thread Segher Boessenkool
Hi!

On Mon, Jul 23, 2018 at 12:36:50PM +0200, David Brown wrote:
> This is nothing to do with undefined behaviour, but a matter of
> scheduling of effects that are visible in different circumstances.  In
> particular, i and j are declared in a way that tells the compiler that
> it, in its current thread of execution, has full control of them.  The
> compiler knows that while it is executing the code in test(), nothing
> else can affect the values of i and j, nor can anything else be
> affected by them.  The compiler knows that code from elsewhere may
> read or write them, but only before test() is called, during functions
> called from test(), or after test() returns.  It knows for sure that
> there are no other threads of execution that interact via i and j.

It could in theory know that, yes, but in this case it just hoists the
assignment to after the loop.  And it's an infinite loop, so it just
disappears.

> So how do you inhibit these kinds of optimisations?  Stop lying to your
> compiler.

Yup :-)


Segher


Re: ChangeLog's: do we have to?

2018-07-23 Thread Alexander Monakov
On Mon, 23 Jul 2018, Segher Boessenkool wrote:

> For example for .md files you can use
> 
> [diff "md"]
> xfuncname = "^\\(define.*$"
> 
> in your local clone's .git/config
> 
> and
> 
> *.md  diff=md
> 
> in .gitattributes (somewhere in the source tree).

Not necessarily in the source tree: individual users can put that into their
$XDG_CONFIG_HOME/git/attributes or $HOME/.config/git/attributes.

Likewise the previous snippet can go into $HOME/.gitconfig rather than each
individual cloned tree.

(the point is, there's no need to split this quality-of-life change between the
repository and the user's setup - it can be done once by a user and will work
for all future checkouts)
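
Concretely, that is (contents mirroring Segher's snippets; paths per
the XDG base directory spec):

# in ~/.config/git/attributes (or $XDG_CONFIG_HOME/git/attributes)
*.md  diff=md

# in ~/.gitconfig
[diff "md"]
	xfuncname = "^\\(define.*$"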

Alexander


Re: That light at the end of the tunnel?

2018-07-23 Thread Richard Earnshaw (lists)
On 21/07/18 03:04, Eric S. Raymond wrote:
> That light at the end of the tunnel turned out to be an oncoming train.
> 
> Until recently I thought the conversion was near finished.  I had
> verified clean conversions across trunk and all branches, except for
> one screwed-up branch that the management agreed we could discard.
> 
> I had some minor issues left with execute-permission propagation and how
> to interpret mid-branch deletes.  I solved the former and was working
> on the latter.  I expected to converge on a final result well before
> the end of the year, probably in August or September.
> 
> Then, as I reported here, my most recent test conversion produced
> incorrect content on trunk.  That's very bad, because the sheer size
> of the GCC repository makes bug forensics extremely slow. Just loading
> the SVN dump file for examination in reposurgeon takes 4.5 hours; full
> conversions are back up to 9 hours now.  The repository is growing
> about as fast as my ability to find speed optimizations.
> 
> Then it got worse. I backed up to a commit that I remembered as
> producing a clean conversion, and it didn't. This can only mean that
> the reposurgeon changes I've been making to handle weird branch-copy
> cases have been fighting each other.
> 
> For those of you late to the party, interpreting the operation
> sequences in Subversion dump files is simple and produces results that
> are easy to verify - except near branch copy operations. The way those
> interact with each other and other operations is extremely murky.
> 
> There is *a* correct semantics defined by what the Subversion code
> does.  But if any of the Subversion devs ever fully understood it,
> they no longer do. The dump format was never documented by them. It is
> only partly documented now because I reverse-engineered it.  But the
> document I wrote has questions in it that the Subversion devs can't
> answer.
> 
> It's not unusual for me to trip over a novel branch-copy-related
> weirdness while converting a repo.  Normally the way I handle this is
> by performing a bisection procedure to pin down the bad commit.  Then I:
> 
> (1) Truncate the dump to the shortest leading segment that
> reproduces the problem.
> 
> (2) Perform a strip operation that replaces all content blobs with
> unique small cookies that identify their source commit. Verify that it still
> reproduces...
> 
> (3) Perform a topological reduce that drops out all uninteresting
> commits, that is, pure content changes not adjacent to any branch
> copies or property changes. Verify that it still reproduces...
> 
> (4) Manually remove irrelevant branches with reposurgeon.
> Verify that it still reproduces...
> 
> At this point I normally have a fairly small test repository (never,
> previously, more than 200 or so commits) that reproduces
> the issue. I watch conversions at increasing debug levels until I
> figure out what is going on. Then I fix it and the reduced dump
> becomes a new regression test.
> 
> In this way I make monotonic progress towards a dumpfile analyzer
> that ever more closely emulates what the Subversion code is doing.
> It's not anything like easy, and gets less so as the edge cases I'm
> probing get more recondite.  But until now it worked.
> 
> The size of the GCC repository defeats this strategy. By a
> back-of-the-envelope calculation, a single full bisection would take a
> minimum of 18 days.  Realistically it would probably be closer to a
> month.
> 

So traditional git bisect is inherently serial, but we can be more
creative here, surely.  A single run halves the search space each time.
But three machines working together can split it into 4 each run, 7
machines into 8, etc.  You don't even need a precise 2^N - 1 to get a
speedup.
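
To put rough numbers on it: k machines testing k probe points per round
split the space into k+1 parts, so the number of rounds drops from
ceil(log2 N) to ceil(log_(k+1) N).  With, say, N = 260000 revisions (a
ballpark figure, not an exact count), that is 18 serial rounds versus
6 rounds for 9 machines.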

It's not as efficient computationally as running on a single machine,
but it can be more efficient in terms of elapsed time.

We just need to find some way of divvying up the work and then machines
that are capable of running the job.  They don't have to be 'big
beasts', just have enough RAM and not be so puny that they overall hold
up the main process.

Think seti@home for git bisect.

Surely collectively we can solve this problem...

R.

> That means that, under present assumptions, it's game over
> and we've lost.  The GCC repo is just too large and weird.
> 
> My tools need to get a lot faster, like more than an order of
> magnitude faster, before digging out of the bad situation the
> conversion is now in will be practical.
> 
> Hardware improvements won't do that.  Nobody knows how to build a
> machine that can crank a single process enough faster than 1.3GHz.
> And the problem doesn't parallelize.
> 
> There is a software change that might do it.  I have been thinking
> about translating reposurgeon from Python to Go. Preliminary
> experiments with a Go version of repocutter show that it has a
> 40x speed advantage over the Python version.  I don't think I'll
> get quite that much speedup on reposurgeon, but

Re: That light at the end of the tunnel?

2018-07-23 Thread Richard Biener
On Sat, Jul 21, 2018 at 1:39 PM Eric S. Raymond  
wrote:
>
> On Sat, Jul 21, 2018 at 09:26:10AM +0200, Richard Biener wrote:
> > Can you summarize what is wrong with our current git mirror which was IIRC 
> > created by git-svn importing?
>
> git-svn tends to do subtle damage to the back history.  See my PSA at
> http://esr.ibiblio.org/?p=6778
>
> Partial quote:
>
> The problem with git-svn as a full importer is that it is not robust
> in the presence of repository malformations and edge cases – and
> these are all too common, both as a result of operator errors and
> scar tissue left by previous conversions from CVS.  If anyone on your
> project has ever done a plain cp rather than “svn cp” when creating a
> tag directory, or deleted a branch or tag and then recreated it, or
> otherwise offended against the gods of the Subversion data model,
> git-svn will cheerfully, silently seize on that flaw and amplify the
> hell out of it in your git translation.
>
> The result is likely to be a repository that looks just right enough
> at the head end to hide damage further back in the history.  People
> often fail to notice this because they don’t actually spend much time
> looking at old revisions after a repository conversion – but on the
> rare occasions when history damage bites you it’s going to bite hard.
>
> Since I wrote that I have learned that git-svn full conversions also
> have a tendency to screw up the location of branch joins.

Ok, so let me ask whether you can currently convert trunk and gcc-{6,7,8}-branch
successfully, ignoring "merges" into them (shouldn't have happened).  All other
branches can in theory be converted later if required, right?

Richard.


Re: [RFC] Adding Python as a possible language and its usage

2018-07-23 Thread Joseph Myers
On Tue, 17 Jul 2018, Martin Liška wrote:

> I've recently touched the AWK option-generation machinery and it's
> quite unpleasant to make any adjustments.  My question is simple: can
> we start using a scripting language like Python and replace the usage
> of the AWK scripts?  It's probably a question for the Steering
> Committee, but I would like to see feedback from the community.

I'd prefer Python to Awk for this code.

> 4) we can come up with new sanity checks of options:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81397

More generally, I don't think there are any checks that flags specified 
for options are known flags at all; I expect a typo in a flag to result in 
it being silently ignored.

Common code that reads .opt files into some logical datastructure, 
complete with validation including that all flags specified are in the 
list of valid flags, followed by converting those structures to whatever 
output is required, seems appropriate to me.
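
As a sketch of the shape such common code might take in Python (the
property set below is an invented subset, and the record handling is
simplified; the real .opt grammar has more cases):

import sys

# Illustrative subset of valid option properties; the real list would
# come from the option-processing machinery itself.
VALID_FLAGS = {"Common", "Driver", "Joined", "Separate", "Report",
               "Negative", "Var", "Init"}

def parse_opt(path):
    """Read a .opt file into (name, properties, help) records,
    rejecting unknown property names instead of silently ignoring them."""
    records, block = [], []
    for raw in open(path):
        line = raw.rstrip("\n")
        if line.startswith(";"):          # comment line
            continue
        if not line.strip():              # blank line ends a record
            if block:
                records.append(block)
                block = []
            continue
        block.append(line)
    if block:
        records.append(block)

    parsed = []
    for rec in records:
        name = rec[0]
        props = rec[1].split() if len(rec) > 1 else []
        for p in props:
            bare = p.split("(", 1)[0]     # strip arguments, e.g. Var(flag_foo)
            if bare not in VALID_FLAGS:
                sys.exit("%s: unknown property %r on option %s"
                         % (path, p, name))
        parsed.append((name, props, rec[2:]))
    return parsed

Output generation would then be separate passes over the parsed
records.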

-- 
Joseph S. Myers
jos...@codesourcery.com

Re: ICE building a libsupc++ file, pdp11 target

2018-07-23 Thread Joseph Myers
On Tue, 17 Jul 2018, Paul Koning wrote:

> That reveals some things but nothing jumps out at me.  However... pdp11 
> is an a.out target, not an ELF target.  Would that explain the problem?  
> If yes, is there a workaround (short of implementing ELF)?

As there are hardly any targets left without named section support, using 
ELF might be a good idea so you don't have to deal with the 
no-named-sections issues.

The ELF e_machine value EM_PDP11 was assigned to Lars Brinkoff, 
l...@nocrew.org, 30 May 2002, according to the comments in 
ch4.eheader.html.  I don't know if an actual ELF ABI has been defined.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [RFC] Adding Python as a possible language and its usage

2018-07-23 Thread Joseph Myers
On Wed, 18 Jul 2018, David Malcolm wrote:

> Python 3.3 reintroduced the 'u' prefix for unicode string literals (PEP
> 414), which makes it much easier to write scripts that work with both
> 2.* and 3.*.  Python 3.3 is almost 6 years old.

I can't see u'' being of any relevance to .opt parsing.  Both the .opt
files, and the generated output from them, should be pure ASCII, and
using native str throughout (never using Python 2 unicode) should work
fine.

(I don't see much value in declaring support for EOL versions of Python, 
i.e. anything before 2.7 and 3.4, but if we do, I don't think u'' will be 
a feature that controls which versions are supported.)

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: ICE building a libsupc++ file, pdp11 target

2018-07-23 Thread Paul Koning



> On Jul 23, 2018, at 10:21 AM, Joseph Myers  wrote:
> 
> On Tue, 17 Jul 2018, Paul Koning wrote:
> 
>> That reveals some things but nothing jumps out at me.  However... pdp11 
>> is an a.out target, not an ELF target.  Would that explain the problem?  
>> If yes, is there a workaround (short of implementing ELF)?
> 
> As there are hardly any targets left without named section support, using 
> ELF might be a good idea so you don't have to deal with the 
> no-named-sections issues.
> 
> The ELF e_machine value EM_PDP11 was assigned to Lars Brinkoff, 
> l...@nocrew.org, 30 May 2002, according to the comments in 
> ch4.eheader.html.  I don't know if an actual ELF ABI has been defined.

I don't know of pdp11 ELF code in binutils.  

The named-section stuff itself doesn't seem to be directly related.  If I run 
the test with the target flag -mdec-asm, it still fails.  That mode does 
support named sections.

I can easily see the issue with the debugger and compare with a target that 
works (vax).  So I should be able to find this, at least once I figure out how 
to turn off address space randomization on my host.
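
(On Linux, for instance, randomization can be disabled per process with

setarch $(uname -m) -R gdb ./cc1plus

or system-wide via /proc/sys/kernel/randomize_va_space; gdb also
disables it for its inferior by default.  The binary name here is just
an example.)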

paul



Re: ChangeLog's: do we have to?

2018-07-23 Thread Segher Boessenkool
On Mon, Jul 23, 2018 at 04:03:31PM +0300, Alexander Monakov wrote:
> Not necessarily in the source tree: individual users can put that into their
> $XDG_CONFIG_HOME/git/attributes or $HOME/.config/git/attributes.

And then that user has *all* .md files treated as GCC machine description
files.  But .md is a quite common extension.


Segher


Re: ChangeLog's: do we have to?

2018-07-23 Thread Eric Gallager
On 7/23/18, Segher Boessenkool  wrote:
> On Mon, Jul 23, 2018 at 04:03:31PM +0300, Alexander Monakov wrote:
>> Not necessarily in the source tree: individual users can put that into
>> their
>> $XDG_CONFIG_HOME/git/attributes or $HOME/.config/git/attributes.
>
> And then that user has *all* .md files treated as GCC machine description
> files.  But .md is a quite common extension.
>
>
> Segher
>

Yeah, it's usually used for markdown. This file extension clash came
up previously when trying to generate doxygen documentation for gcc
sources: https://gcc.gnu.org/ml/gcc/2017-06/msg00063.html


Re: ChangeLog's: do we have to?

2018-07-23 Thread Joseph Myers
On Mon, 23 Jul 2018, Richard Biener wrote:

> Does -L "work" with codebases that become more and more C++?  I do realize

Well, you can specify an arbitrary regular expression for your funcname 
line with -L if you need to.
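
For instance (the function and file are chosen purely as an
illustration):

git log -L '/^rs6000_emit_move/,/^}/:gcc/config/rs6000/rs6000.c'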

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: That light at the end of the tunnel?

2018-07-23 Thread Joseph Myers
On Mon, 23 Jul 2018, Richard Earnshaw (lists) wrote:

> So traditional git bisect is inherently serial, but we can be more
> creative here, surely.  A single run halves the search space each time.
> But three machines working together can split it into 4 each run, 7
> machines into 8, etc.  You don't even need a precise 2^N - 1 to get a
> speedup.

Exactly.  Given an appropriate recipe for testing whether the conversion 
of history up to a given revision is OK or not, I can run tests in 
parallel for nine different revisions on nine different machines (each 
with 128 GB memory) at the same time as easily as running one such test.  
(And the conversions for shorter initial segments of history should be 
faster, so if the bug turns out to relate to conversion of cvs2svn scar 
tissue, you don't even need to wait for the conversions of longer portions 
of history to complete before you've narrowed down where the bug is and 
can start another such bisection.)

I think parallelising the bisection process is a better approach than 
trying to convert only a subset of branches (which I don't think would 
help with the present problem - though we can always consider killing 
selected branches with too many mid-branch deletealls, if appropriate) or 
waiting for a move to Go.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: That light at the end of the tunnel?

2018-07-23 Thread Jeff Law
On 07/23/2018 08:53 AM, Joseph Myers wrote:
> On Mon, 23 Jul 2018, Richard Earnshaw (lists) wrote:
> 
>> So traditional git bisect is inherently serial, but we can be more
>> creative here, surely.  A single run halves the search space each time.
>> But three machines working together can split it into 4 each run, 7
>> machines into 8, etc.  You don't even need a precise 2^N - 1 to get a
>> speedup.
> 
> Exactly.  Given an appropriate recipe for testing whether the conversion 
> of history up to a given revision is OK or not, I can run tests in 
> parallel for nine different revisions on nine different machines (each 
> with 128 GB memory) at the same time as easily as running one such test.  
> (And the conversions for shorter initial segments of history should be 
> faster, so if the bug turns out to relate to conversion of cvs2svn scar 
> tissue, you don't even need to wait for the conversions of longer portions 
> of history to complete before you've narrowed down where the bug is and 
> can start another such bisection.)
> 
> I think parallelising the bisection process is a better approach than 
> trying to convert only a subset of branches (which I don't think would 
> help with the present problem - though we can always consider killing 
> selected branches with too many mid-branch deletealls, if appropriate) or 
> waiting for a move to Go.
Hell, I'd live with doing a "reasonable effort" for the vast majority of
our branches.  Other than the trunk, active release branches and a few
active development branches I don't think we really care about 'em.

jeff



Re: That light at the end of the tunnel?

2018-07-23 Thread Paul Koning



> On Jul 23, 2018, at 12:21 PM, Jeff Law  wrote:
> 
>> 
> Hell, I'd live with doing a "reasonable effort" for the vast majority of
> our branches.  Other than the trunk, active release branches and a few
> active development branches I don't think we really care about 'em.
> 
> jeff

There are two approaches to conversion: (1) convert what's active and preserve 
the old system indefinitely for reference access; (2) convert everything 100% 
so the old system can be retired.

It seems that Eric has been trying for #2, which is fine if doable.  But #1
is also a reasonable option, and if the nature of the beast makes #2
unattainable, going for #1 is a plan I would definitely support.

paul



Re: That light at the end of the tunnel?

2018-07-23 Thread Jeff Law
On 07/23/2018 10:29 AM, Paul Koning wrote:
> 
> 
>> On Jul 23, 2018, at 12:21 PM, Jeff Law  wrote:
>>
>>>
>> Hell, I'd live with doing a "reasonable effort" for the vast majority of
>> our branches.  Other than the trunk, active release branches and a few
>> active development branches I don't think we really care about 'em.
>>
>> jeff
> 
> There are two approaches to conversion: (1) convert what's active and 
> preserve the old system indefinitely for reference access; (2) convert 
> everything 100% so the old system can be retired.
> 
> It seems that Eric has been trying for #2, which is fine if doable.  But #1
> is also a reasonable option, and if the nature of the beast makes #2
> unattainable, going for #1 is a plan I would definitely support.
Yea.  I suspect we'll keep the SVN repo around read-only essentially
forever so the links in bugzilla continue to work.  There's probably
other uses of the SVN version #s that we'd like to preserve.

jeff


Re: That light at the end of the tunnel?

2018-07-23 Thread Joseph Myers
On Mon, 23 Jul 2018, Jeff Law wrote:

> > There are two approaches to conversion: (1) convert what's active and 
> > preserve the old system indefinitely for reference access; (2) convert 
> > everything 100% so the old system can be retired.
> > 
> > It seems that Eric has been trying for #2, which is fine if doable.
> > But #1 is also a reasonable option, and if the nature of the beast
> > makes #2 unattainable, going for #1 is a plan I would definitely
> > support.
> Yea.  I suspect we'll keep the SVN repo around read-only essentially
> forever so the links in bugzilla continue to work.  There's probably
> other uses of the SVN version #s that we'd like to preserve.

We'll obviously keep SVN around readonly just as with CVS.

The problem described is not one for which only keeping a few selected 
branches would actually help at all.  If we get to a point where 
everything converts OK except for a few obscure non-trunk branches that 
have problems (possibly mid-branch deletealls), then we can consider 
excluding those branches (starting from a baseline of keeping all branches 
that are still present in SVN, then identify particular problem branches 
to exclude).

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: ChangeLog's: do we have to?

2018-07-23 Thread Trevor Saunders
On Mon, Jul 23, 2018 at 02:48:12PM +, Joseph Myers wrote:
> On Mon, 23 Jul 2018, Richard Biener wrote:
> 
> > Does -L "work" with codebases that become more and more C++?  I do realize
> 
> Well, you can specify an arbitrary regular expression for your funcname 
> line with -L if you need to.

or use -L start_line,end_line:file to get the history of some arbitrary
chunk of a file.

Trev

> 
> -- 
> Joseph S. Myers
> jos...@codesourcery.com