Re: ChangeLog's: do we have to?
On Fri, Jul 20, 2018 at 11:04 PM Joseph Myers wrote:
>
> On Thu, 5 Jul 2018, Aldy Hernandez wrote:
>
> > However, even if you could "git log --grep" the commit messages, I assume
> > your current use is grepping for function names and such, right? Being
> > able to grep a commit message won't solve that problem, or am I missing
> > something?
>
> If you know what function and file you're interested in, you can use git
> log -L to find changes to it (I've used that in the GCC context before),
> or of course other tools such as git blame.

Does -L "work" with codebases that become more and more C++? I do realize
that grepping ChangeLogs gets more difficult here as well, given there are
no clear rules for how to "mangle" C++ function/entity names in ChangeLog
entries.

Fortunately we're still mostly C.

Richard.

> --
> Joseph S. Myers
> jos...@codesourcery.com
Re: O2 Aggressive Optimisation by GCC
Hi,

This is nothing to do with undefined behaviour, but a matter of scheduling
of effects that are visible in different circumstances. In particular, i
and j are declared in a way that tells the compiler that it, in its current
thread of execution, has full control of them. The compiler knows that
while it is executing the code in test(), nothing else can affect the value
of i or j, nor can anything else be affected by the values of i and j. The
compiler knows that code from elsewhere may read or write them, but only
before test() is called, during functions called from test(), or after
test() returns. It knows for sure that there are no other threads of
execution that interact via i and j.

So how do you inhibit these kinds of optimisations? Stop lying to your
compiler. If you want them to be visible in other threads, tell your
compiler that they are visible in other threads. You already know how to do
that - using volatile accesses, atomic accesses, other language features,
or operating system features. You can also use somewhat "hack" techniques
such as Linux's "ACCESS_ONCE" macro or inline assembly dependency controls,
but it would be better to define and declare the data correctly.

Messing around with optimisation settings is just a way of hiding your
coding and design errors until they get more subtle and harder to spot in
the future.

mvh.,

David

On 22/07/18 17:00, Umesh Kalappa wrote:
> Hi Richard,
>
> Making i unsigned, the optimization is still effective - no luck.
> And yes, test() is the threaded routine, and since i and j are global
> we need the side effects, like the assignment, to take place so that they
> are observed by other threads.
>
> By making them volatile, or thread safe, or using atomic operations, the
> optimization is inhibited, but we still didn't get why it's a valid
> optimization for UB, and we tried with -fno-strict-overflow too - no luck
> there.
>
> Jakub and anyone: can we inhibit these kinds of optimizations that assume
> UB and optimize on that basis?
>
> Thank you
> ~Umesh
>
> On Fri, Jul 20, 2018 at 11:47 PM, Richard Biener wrote:
>> On July 20, 2018 7:59:10 PM GMT+02:00, Martin Sebor wrote:
>>> On 07/20/2018 06:19 AM, Umesh Kalappa wrote:
>>>> Hi All,
>>>>
>>>> We are looking at the C sample, i.e.
>>>>
>>>>   extern int i,j;
>>>>
>>>>   int test()
>>>>   {
>>>>     while (1)
>>>>     {
>>>>       i++;
>>>>       j = 20;
>>>>     }
>>>>     return 0;
>>>>   }
>>>>
>>>> command used (gcc 8.1.0): gcc -S test.c -O2
>>>>
>>>> The generated asm for x86 is
>>>>
>>>>   .L2:
>>>>           jmp .L2
>>>>
>>>> We understand that the infinite loop is not deterministic and the
>>>> compiler is free to treat it as UB and do aggressive optimization, but
>>>> we need to keep side effects like j=20 untouched by optimization.
>>>>
>>>> Please note that using the volatile qualifier for i and j, or an empty
>>>> asm("") in the while loop, will stop the optimizer, but we don't want
>>>> to do that.
>>>>
>>>> Can anyone from the community please share their insights on why the
>>>> above transformation is right?
>>>
>>> The loop isn't necessarily undefined (and compilers don't look
>>> for undefined behavior as opportunities to optimize code), but
>>
>> The variable i overflows.
>>
>>> because it doesn't terminate it's not possible for a conforming
>>> C program to detect the side-effects in its body. The only way
>>> to detect it is to examine the object code as you did.
>>
>> I'm not sure we perform this kind of dead code elimination but yes, we
>> could. Make i unsigned and check whether that changes behavior.
>>
>>> Compilers are allowed (and expected) to transform source code
>>> into efficient object code as long as the transformations don't
>>> change the observable effects of the program. That's just what
>>> happens in this case.
>>>
>>> Martin
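As a minimal sketch of the "declare the data correctly" approach described
above (not code from the thread, and assuming C11 <stdatomic.h> is
available), declaring i and j atomic tells the compiler that other threads
may observe them, so the stores in the loop are no longer removable as dead;
a plain volatile qualifier would likewise keep the accesses, but gives no
inter-thread ordering guarantees:

    #include <stdatomic.h>

    /* Hypothetical rewrite of the test case with the globals made atomic. */
    extern _Atomic int i, j;

    int test(void)
    {
        while (1) {
            /* Atomic accesses are visible to other threads, so the
               optimizer cannot delete them as dead stores. */
            atomic_fetch_add_explicit(&i, 1, memory_order_relaxed);
            atomic_store_explicit(&j, 20, memory_order_relaxed);
        }
        return 0;
    }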
Re: ChangeLog's: do we have to?
On Mon, Jul 23, 2018 at 11:52:03AM +0200, Richard Biener wrote:
> Does -L "work" with codebases that become more and more C++? I do realize
> that grepping ChangeLogs gets more difficult here as well given there are no
> clear rules how to "mangle" C++ function/entity names in ChangeLog entries.
>
> Fortunately we're still mostly C

For example, for .md files you can use

[diff "md"]
	xfuncname = "^\\(define.*$"

in your local clone's .git/config, and

*.md diff=md

in .gitattributes (somewhere in the source tree).

I should make a patch for that. Hrm.


Segher
Re: O2 Aggressive Optimisation by GCC
Hi!

On Mon, Jul 23, 2018 at 12:36:50PM +0200, David Brown wrote:
> This is nothing to do with undefined behaviour, but a matter of
> scheduling of effects that are visible in different circumstances. In
> particular, i and j are declared in a way that tells the compiler that
> the compiler, in its current thread of execution, has full control of
> them. The compiler knows that while it is executing the code in test,
> nothing else can affect the value of i or j, nor can they be affected by
> the values of i and j. The compiler knows that code from elsewhere may
> read or write them, but only before test() is called, during functions
> called from test(), or after test() returns. It knows for sure that
> there are no other threads of execution that interact via i and j.

It could in theory know that, yes, but in this case it just hoists the
assignment to after the loop. And it's an infinite loop, so it just
disappears.

> So how do you inhibit these kinds of optimisations? Stop lying to your
> compiler.

Yup :-)


Segher
Re: ChangeLog's: do we have to?
On Mon, 23 Jul 2018, Segher Boessenkool wrote:

> For example for .md files you can use
>
> [diff "md"]
> 	xfuncname = "^\\(define.*$"
>
> in your local clone's .git/config
>
> and
>
> *.md diff=md
>
> in .gitattributes (somewhere in the source tree).

Not necessarily in the source tree: individual users can put that into their
$XDG_CONFIG_HOME/git/attributes or $HOME/.config/git/attributes. Likewise the
previous snippet can go into $HOME/.gitconfig rather than each individual
cloned tree.

(the point is, there's no need to split this quality-of-life change between
the repository and the user's setup - it can be done once by a user and will
work for all future checkouts)

Alexander
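For concreteness, the per-user variant described above might look like this
(the contents are the same as Segher's snippet, only the locations differ;
paths as Alexander lists them):

    # ~/.config/git/attributes  (or $XDG_CONFIG_HOME/git/attributes)
    *.md diff=md

    # ~/.gitconfig
    [diff "md"]
    	xfuncname = "^\\(define.*$"

With that in place git uses the matching (define... line as the "function
name" in diff hunk headers for .md files in any clone.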
Re: That light at the end of the tunnel?
On 21/07/18 03:04, Eric S. Raymond wrote:
> That light at the end of the tunnel turned out to be an oncoming train.
>
> Until recently I thought the conversion was near finished. I'd verified
> clean conversions across trunk and all branches, except for one
> screwed-up branch that the management agreed we could discard.
>
> I had some minor issues left with execute-permission propagation and how
> to interpret mid-branch deletes. I solved the former and was working
> on the latter. I expected to converge on a final result well before
> the end of the year, probably in August or September.
>
> Then, as I reported here, my most recent test conversion produced
> incorrect content on trunk. That's very bad, because the sheer size
> of the GCC repository makes bug forensics extremely slow. Just loading
> the SVN dump file for examination in reposurgeon takes 4.5 hours; full
> conversions are back up to 9 hours now. The repository is growing
> about as fast as my ability to find speed optimizations.
>
> Then it got worse. I backed up to a commit that I remembered as
> producing a clean conversion, and it didn't. This can only mean that
> the reposurgeon changes I've been making to handle weird branch-copy
> cases have been fighting each other.
>
> For those of you late to the party, interpreting the operation
> sequences in Subversion dump files is simple and produces results that
> are easy to verify - except near branch copy operations. The way those
> interact with each other and other operations is extremely murky.
>
> There is *a* correct semantics defined by what the Subversion code
> does. But if any of the Subversion devs ever fully understood it,
> they no longer do. The dump format was never documented by them. It is
> only partly documented now because I reverse-engineered it. But the
> document I wrote has questions in it that the Subversion devs can't
> answer.
>
> It's not unusual for me to trip over a novel branch-copy-related
> weirdness while converting a repo. Normally the way I handle this is
> by performing a bisection procedure to pin down the bad commit. Then I:
>
> (1) Truncate the dump to the shortest leading segment that
> reproduces the problem.
>
> (2) Perform a strip operation that replaces all content blobs with
> unique small cookies that identify their source commit. Verify that it
> still reproduces...
>
> (3) Perform a topological reduce that drops out all uninteresting
> commits, that is, pure content changes not adjacent to any branch
> copies or property changes. Verify that it still reproduces...
>
> (4) Manually remove irrelevant branches with reposurgeon.
> Verify that it still reproduces...
>
> At this point I normally have a fairly small test repository (never,
> previously, more than 200 or so commits) that reproduces
> the issue. I watch conversions at increasing debug levels until I
> figure out what is going on. Then I fix it and the reduced dump
> becomes a new regression test.
>
> In this way I make monotonic progress towards a dumpfile analyzer
> that ever more closely emulates what the Subversion code is doing.
> It's not anything like easy, and gets less so as the edge cases I'm
> probing get more recondite. But until now it worked.
>
> The size of the GCC repository defeats this strategy. By back of the
> envelope calculation, a single full bisection would take a minimum of
> 18 days. Realistically it would probably be closer to a month.
>

So traditional git bisect is inherently serial, but we can be more creative
here, surely. A single run halves the search space each time. But three
machines working together can split it into 4 each run, 7 machines into 8,
etc. You don't even need a precise 2^N - 1 to get a speedup. It's not as
efficient computationally as running on a single machine, but it can be
more efficient in terms of elapsed time.

We just need to find some way of divvying up the work and then machines
that are capable of running the job. They don't have to be 'big beasts',
just have enough RAM and not be so puny that they overall hold up the main
process.

Think seti@home for git bisect. Surely collectively we can solve this
problem...

R.

> That means that, under present assumptions, it's game over
> and we've lost. The GCC repo is just too large and weird.
>
> My tools need to get a lot faster, like more than an order of
> magnitude faster, before digging out of the bad situation the
> conversion is now in will be practical.
>
> Hardware improvements won't do that. Nobody knows how to build a
> machine that can crank a single process enough faster than 1.3GHz.
> And the problem doesn't parallelize.
>
> There is a software change that might do it. I have been thinking
> about translating reposurgeon from Python to Go. Preliminary
> experiments with a Go version of repocutter show that it has a
> 40x speed advantage over the Python version. I don't think I'll
> get quite that much speedup on reposurgeon, but
Re: That light at the end of the tunnel?
On Sat, Jul 21, 2018 at 1:39 PM Eric S. Raymond wrote:
>
> On Sat, Jul 21, 2018 at 09:26:10AM +0200, Richard Biener wrote:
> > Can you summarize what is wrong with our current git mirror which was IIRC
> > created by git-svn importing?
>
> git-svn tends to do subtle damage to the back history. See my PSA at
> http://esr.ibiblio.org/?p=6778
>
> Partial quote:
>
>     The problem with git-svn as a full importer is that it is not robust
>     in the presence of repository malformations and edge cases – and these
>     are all too common, both as a result of operator errors and scar tissue
>     left by previous conversions from CVS. If anyone on your project has
>     ever done a plain cp rather than “svn cp” when creating a tag directory,
>     or deleted a branch or tag and then recreated it, or otherwise offended
>     against the gods of the Subversion data model, git-svn will cheerfully,
>     silently seize on that flaw and amplify the hell out of it in your git
>     translation.
>
>     The result is likely to be a repository that looks just right enough at
>     the head end to hide damage further back in the history. People often
>     fail to notice this because they don't actually spend much time looking
>     at old revisions after a repository conversion – but on the rare
>     occasions when history damage bites you it's going to bite hard.
>
> Since I wrote that I have learned that git-svn full conversions also have
> a tendency to screw up the location of branch joins.

Ok, so let me ask whether you can currently convert trunk and
gcc-{6,7,8}-branch successfully, ignoring "merges" into them (shouldn't have
happened). All other branches can in theory be converted later if required,
right?

Richard.
Re: [RFC] Adding Python as a possible language and its usage
On Tue, 17 Jul 2018, Martin Liška wrote:

> I've recently touched the AWK option-generation machinery and it's quite
> unpleasant to make any adjustments. My question is simple: can we start
> using a scripting language like Python and replace usage of the AWK
> scripts? It's probably a question for the Steering Committee, but I would
> like to see feedback from the community.

I'd prefer Python to Awk for this code.

> 4) we can come up with new sanity checks of options:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81397

More generally, I don't think there are any checks that flags specified for
options are known flags at all; I expect a typo in a flag to result in it
being silently ignored. Common code that reads .opt files into some logical
data structure, complete with validation including that all flags specified
are in the list of valid flags, followed by converting those structures to
whatever output is required, seems appropriate to me.

-- 
Joseph S. Myers
jos...@codesourcery.com
Re: ICE building a libsupc++ file, pdp11 target
On Tue, 17 Jul 2018, Paul Koning wrote:

> That reveals some things but nothing jumps out at me. However... pdp11
> is an a.out target, not an ELF target. Would that explain the problem?
> If yes, is there a workaround (short of implementing ELF)?

As there are hardly any targets left without named section support, using
ELF might be a good idea so you don't have to deal with the
no-named-sections issues.

The ELF e_machine value EM_PDP11 was assigned to Lars Brinkoff,
l...@nocrew.org, 30 May 2002, according to the comments in
ch4.eheader.html. I don't know if an actual ELF ABI has been defined.

-- 
Joseph S. Myers
jos...@codesourcery.com
Re: [RFC] Adding Python as a possible language and its usage
On Wed, 18 Jul 2018, David Malcolm wrote:

> Python 3.3 reintroduced the 'u' prefix for unicode string literals (PEP
> 414), which makes it much easier to write scripts that work with both
> 2.* and 3.*. Python 3.3 is almost 6 years old.

I can't see u'' as of any relevance to .opt parsing. Both the .opt files,
and the generated output from them, should be pure ASCII, and using native
str throughout (never using Python 2 unicode) should work fine.

(I don't see much value in declaring support for EOL versions of Python,
i.e. anything before 2.7 and 3.4, but if we do, I don't think u'' will be a
feature that controls which versions are supported.)

-- 
Joseph S. Myers
jos...@codesourcery.com
Re: ICE building a libsupc++ file, pdp11 target
> On Jul 23, 2018, at 10:21 AM, Joseph Myers wrote:
>
> On Tue, 17 Jul 2018, Paul Koning wrote:
>
>> That reveals some things but nothing jumps out at me. However... pdp11
>> is an a.out target, not an ELF target. Would that explain the problem?
>> If yes, is there a workaround (short of implementing ELF)?
>
> As there are hardly any targets left without named section support, using
> ELF might be a good idea so you don't have to deal with the
> no-named-sections issues.
>
> The ELF e_machine value EM_PDP11 was assigned to Lars Brinkoff,
> l...@nocrew.org, 30 May 2002, according to the comments in
> ch4.eheader.html. I don't know if an actual ELF ABI has been defined.

I don't know of pdp11 ELF code in binutils.

The named-section stuff itself doesn't seem to be directly related. If I
run the test with the target flag -mdec-asm, it still fails. That mode does
support named sections.

I can easily see the issue with the debugger and compare with a target that
works (vax). So I should be able to find this, at least once I figure out
how to turn off address space randomization on my host.

	paul
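On turning off address space randomization: a few of the usual ways to do
it, assuming a Linux host (none of this is from the thread):

    # gdb disables randomization for its inferior by default on Linux;
    # it can also be controlled explicitly from the gdb prompt:
    (gdb) set disable-randomization on

    # run a single command with ASLR disabled:
    setarch $(uname -m) -R ./a.out

    # or disable it system-wide until reboot (as root):
    echo 0 > /proc/sys/kernel/randomize_va_space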
Re: ChangeLog's: do we have to?
On Mon, Jul 23, 2018 at 04:03:31PM +0300, Alexander Monakov wrote:
> Not necessarily in the source tree: individual users can put that into their
> $XDG_CONFIG_HOME/git/attributes or $HOME/.config/git/attributes.

And then that user has *all* .md files treated as GCC machine description
files. But .md is a quite common extension.


Segher
Re: ChangeLog's: do we have to?
On 7/23/18, Segher Boessenkool wrote:
> On Mon, Jul 23, 2018 at 04:03:31PM +0300, Alexander Monakov wrote:
>> Not necessarily in the source tree: individual users can put that into
>> their $XDG_CONFIG_HOME/git/attributes or $HOME/.config/git/attributes.
>
> And then that user has *all* .md files treated as GCC machine description
> files. But .md is a quite common extension.
>
>
> Segher
>

Yeah, it's usually used for markdown. This file extension clash came up
previously when trying to generate doxygen documentation for gcc sources:
https://gcc.gnu.org/ml/gcc/2017-06/msg00063.html
Re: ChangeLog's: do we have to?
On Mon, 23 Jul 2018, Richard Biener wrote:

> Does -L "work" with codebases that become more and more C++? I do realize

Well, you can specify an arbitrary regular expression for your funcname
line with -L if you need to.

-- 
Joseph S. Myers
jos...@codesourcery.com
Re: That light at the end of the tunnel?
On Mon, 23 Jul 2018, Richard Earnshaw (lists) wrote:

> So traditional git bisect is inherently serial, but we can be more
> creative here, surely. A single run halves the search space each time.
> But three machines working together can split it into 4 each run, 7
> machines into 8, etc. You don't even need a precise 2^N - 1 to get a
> speedup.

Exactly. Given an appropriate recipe for testing whether the conversion
of history up to a given revision is OK or not, I can run tests in
parallel for nine different revisions on nine different machines (each
with 128 GB memory) at the same time as easily as running one such test.
(And the conversions for shorter initial segments of history should be
faster, so if the bug turns out to relate to conversion of cvs2svn scar
tissue, you don't even need to wait for the conversions of longer portions
of history to complete before you've narrowed down where the bug is and
can start another such bisection.)

I think parallelising the bisection process is a better approach than
trying to convert only a subset of branches (which I don't think would
help with the present problem - though we can always consider killing
selected branches with too many mid-branch deletealls, if appropriate) or
waiting for a move to Go.

-- 
Joseph S. Myers
jos...@codesourcery.com
Re: That light at the end of the tunnel?
On 07/23/2018 08:53 AM, Joseph Myers wrote:
> On Mon, 23 Jul 2018, Richard Earnshaw (lists) wrote:
>
>> So traditional git bisect is inherently serial, but we can be more
>> creative here, surely. A single run halves the search space each time.
>> But three machines working together can split it into 4 each run, 7
>> machines into 8, etc. You don't even need a precise 2^N - 1 to get a
>> speedup.
>
> Exactly. Given an appropriate recipe for testing whether the conversion
> of history up to a given revision is OK or not, I can run tests in
> parallel for nine different revisions on nine different machines (each
> with 128 GB memory) at the same time as easily as running one such test.
> (And the conversions for shorter initial segments of history should be
> faster, so if the bug turns out to relate to conversion of cvs2svn scar
> tissue, you don't even need to wait for the conversions of longer portions
> of history to complete before you've narrowed down where the bug is and
> can start another such bisection.)
>
> I think parallelising the bisection process is a better approach than
> trying to convert only a subset of branches (which I don't think would
> help with the present problem - though we can always consider killing
> selected branches with too many mid-branch deletealls, if appropriate) or
> waiting for a move to Go.

Hell, I'd live with doing a "reasonable effort" for the vast majority of
our branches. Other than the trunk, active release branches and a few
active development branches I don't think we really care about 'em.

jeff
Re: That light at the end of the tunnel?
> On Jul 23, 2018, at 12:21 PM, Jeff Law wrote:
>
>>
> Hell, I'd live with doing a "reasonable effort" for the vast majority of
> our branches. Other than the trunk, active release branches and a few
> active development branches I don't think we really care about 'em.
>
> jeff

There are two approaches to conversion: (1) convert what's active and
preserve the old system indefinitely for reference access; (2) convert
everything 100% so the old system can be retired.

It seems that Eric has been trying for #2, which is fine if doable. But #1
is also a reasonable option, and if the nature of the beast makes #2
unacceptable, going for #1 is a plan I would definitely support.

	paul
Re: That light at the end of the tunnel?
On 07/23/2018 10:29 AM, Paul Koning wrote:
>
>> On Jul 23, 2018, at 12:21 PM, Jeff Law wrote:
>>
>>>
>> Hell, I'd live with doing a "reasonable effort" for the vast majority of
>> our branches. Other than the trunk, active release branches and a few
>> active development branches I don't think we really care about 'em.
>>
>> jeff
>
> There are two approaches to conversion: (1) convert what's active and
> preserve the old system indefinitely for reference access; (2) convert
> everything 100% so the old system can be retired.
>
> It seems that Eric has been trying for #2, which is fine if doable. But
> #1 is also a reasonable option, and if the nature of the beast makes #2
> unacceptable, going for #1 is a plan I would definitely support.

Yea. I suspect we'll keep the SVN repo around read-only essentially
forever so the links in bugzilla continue to work. There's probably
other uses of the SVN version #s that we'd like to preserve.

jeff
Re: That light at the end of the tunnel?
On Mon, 23 Jul 2018, Jeff Law wrote:

> > There are two approaches to conversion: (1) convert what's active and
> > preserve the old system indefinitely for reference access; (2) convert
> > everything 100% so the old system can be retired.
> >
> > It seems that Eric has been trying for #2, which is fine if doable.
> > But #1 is also a reasonable option, and if the nature of the beast
> > makes #2 unacceptable, going for #1 is a plan I would definitely
> > support.
>
> Yea. I suspect we'll keep the SVN repo around read-only essentially
> forever so the links in bugzilla continue to work. There's probably
> other uses of the SVN version #s that we'd like to preserve.

We'll obviously keep SVN around readonly just as with CVS.

The problem described is not one for which only keeping a few selected
branches would actually help at all. If we get to a point where everything
converts OK except for a few obscure non-trunk branches that have problems
(possibly mid-branch deletealls), then we can consider excluding those
branches (starting from a baseline of keeping all branches that are still
present in SVN, then identify particular problem branches to exclude).

-- 
Joseph S. Myers
jos...@codesourcery.com
Re: ChangeLog's: do we have to?
On Mon, Jul 23, 2018 at 02:48:12PM +0000, Joseph Myers wrote:
> On Mon, 23 Jul 2018, Richard Biener wrote:
>
> > Does -L "work" with codebases that become more and more C++? I do realize
>
> Well, you can specify an arbitrary regular expression for your funcname
> line with -L if you need to.

Or use -L start_line,end_line:file to get the history of some arbitrary
chunk of a file.

Trev

> -- 
> Joseph S. Myers
> jos...@codesourcery.com
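For reference, the two -L forms mentioned in this thread look like the
following (the file and function names are placeholders, not taken from the
thread):

    # history of an explicit line range in a file
    git log -L 120,180:gcc/tree-ssa-pre.c

    # history of a function, located via the diff driver's funcname pattern
    git log -L :build_call_expr:gcc/tree.c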