Developing GCC
What's the best way to learn about developing GCC? Not developing in GCC, but understanding and extending the compiler's design itself? Thanks, Rick C. Hodgin
Re: [Bulk] Re: Edit-and-continue
Terrence,

Procedure entry points, global and local variable locations in memory, structure definitions and offsets, etc. These would all have to be updated as changes are made, which means each reference used in the executable would need to be updated, and a single change could force several source files to be recompiled. If shared resources are loaded, each of those would have to be updated as well, that is, if we wanted to go "whole hog" like that. Otherwise, we could limit the changes to only the currently-executing program.

Function entry points would be required across the board for all executed code, allowing those links to be updated dynamically at run-time. We could even use a memory-based lookup table that gdb updates with the new entry points for the executable code. It would be slower for execution, but for development the time savings would be there, because changes could be made on the fly, recompiled, memory variables changed as needed, and execution continued without restarting the entire app.

The ability would also have to be created to re-map local variables across these updates, so that if the stack layout changed, the data on the stack is migrated from its old locations to the new ones. This would be a table built by gcc and passed to gdb for each function, which gdb would use to update the stack. If parent functions on the call stack were updated, their frames would have to be altered on the stack as well. This could be accommodated by automatically including a specified amount of "extra space" in the local variable space of each function, something like 32 bytes (eight 4-byte variables) by default, with a compiler switch to increase that block size. This extra space would allow for a certain number of new variables before a full recompile is required again.

We could also have #pragma-like statements for individual functions where we know some heavy changes will be made, to give them an extra 512 bytes, or whatever's specified.

- Rick C. Hodgin

On Sun, 2010-07-18 at 12:36 -0700, Terrence Miller wrote:
> If you are willing to restrict edit-and-continue to whole procedures
> then minimal changes to the compiled
> code for procedure entry points is all that is required (well that and
> dlopen).
>
> Terrence Miller
>
> On 7/18/2010 12:14 PM, Dave Korn wrote:
> > On 18/07/2010 16:28, Robert Dewar wrote:
> >
> >> Rick Hodgin wrote:
> >>
> >>> Ian,
> >>>
> >>> The idea is to create a program database of the compiled program on a
> >>> full compile. Then when asked to re-compile with the
> >>> edit-and-continue switch, it only looks for changed code and compiles
> >>> those few lines. Everything else it needs to carry out compilation is
> >>> there from previous full-compile as was originally parsed, or from
> >>> subsequent edit-and-continue compiles which updated the database.
> >>>
> >> Unlikely to be feasible in my view without slowing down compilation
> >> substantially.
> >>
> > I think you're probably assuming too much. Tom T. is working on an
> > incremental compiler, isn't he? I expect that that and LTO between them
> > would / could give us all the tools we needed to make an EAC-friendly
> > compiler.
> >
> > But yes, OP, it's a long-term project.
> >
> > cheers,
> > DaveK
Re: Edit-and-continue
On Sun, 2010-07-18 at 19:46 +0100, Jonathan Wakely wrote:
> On 18 July 2010 16:25, Rick Hodgin wrote:
> > Ian,
> >
> > The idea is to create a program database of the compiled program on a full
> > compile. Then when asked to re-compile with the edit-and-continue switch,
> > it only looks for changed code and compiles those few lines. Everything
> > else it needs to carry out compilation is there from previous full-compile
> > as was originally parsed, or from subsequent edit-and-continue compiles
> > which updated the database.
> >
> > The resulting changes are passed to gdb for insertion into the running
> > program's memory in real-time.
>
> That might be harder to do for optimised code, where there isn't
> necessarily a direct correspondence between individual source lines
> and executable code. IIUC Visual Studio will only debug unoptimised
> code, gcc doesn't have the same distinction between "debug build" and
> "release build" - you can debug optimised code. It would also need
> more integration between gcc and gdb than currently exists.

Jonathan,

Visual Studio will debug optimized code, but it is difficult to do because the 1:1 correspondence between source code and executable instructions is lost. This is especially difficult in mixed mode, where you see source code alongside disassembled machine code (assembly instructions). Plus, the VS optimizations move loop tests to unusual locations for the target CPU, etc. But edit-and-continue is always available in Visual Studio if the original program database was specified when the program was last compiled.

The integration would have to be added, but if we can produce through gcc (and ultimately g++) a fixed kind of output that describes "what's changed" since the last compile, then it would be easy to add those features to gdb, because the only ones required would be:

1) Update to a memory table for function offsets
2) Update to global memory space to move old variables to new locations
3) Update to local memory space to move old variables to new locations
4) Ability to add new global, local memory variables.
5) Ability to add new functions to the table.

Everything else should just be a matter of loading the newly changed functions to some location in memory that gdb will likely assign (so that it knows the location to record in the memory-based table), and leaving everything that did not change where it was.

To be clear: I'm talking about generating function call code that does not call its target function's offset directly, but instead calls a known location in memory that never changes. That location either refers to dynamically updated code as needed, or goes through a memory-based table which points to the new function locations.

Just my thoughts.

- Rick C. Hodgin
Re: [Bulk] Re: Edit-and-continue
Jonathan,

If you run Linux, you can download VMware, install a version of Windows (XP or later), and download Visual Studio Express from Microsoft for free. You can experiment with it and see how useful it is. It's pretty darned amazing actually. Once you use it, you'll always miss it.

Simple little mistakes like:

for (i=0; i

On 18 July 2010 20:52, Rick C. Hodgin wrote:
> >
> > The idea of having function entry points across the board for all
> > executed code would be required, allowing those links to be updated
> > dynamically at run-time. We could even use a memory-based lookup table
> > that's updated by gdb to the new entry points for the executable code.
> > It would be slower for execution, but for development the time savings
> > would be there because changes could be made on the fly, recompiled,
> > memory variables changed as needed, and then continue execution without
> > restarting the entire app.
>
> I run the compiler a lot more than I run the debugger, so I'm not
> entirely sold on the idea of development time savings, but I haven't
> used Edit-and-Continue so I can't say how useful it is.
x86 assembler syntax
All, Is there an Intel-syntax compatible option for GCC or G++? And if not, why not? It's so much cleaner than AT&T's. - Rick C. Hodgin
Re: [Bulk] Re: x86 assembler syntax
Tim,

Nice. It reads:

"3.2.3. Intel syntax - Good news are that starting from binutils 2.10 release, GAS supports Intel syntax too. It can be triggered with .intel_syntax directive. Unfortunately this mode is not documented (yet?) in the official binutils manual, so if you want to use it, try to examine http://www.lxhp.in-berlin.de/lhpas86.html, which is an extract from AMD 64bit port of binutils 2.11."

I tried a sample with asm(".intel_syntax; int 3") and it seemed to compile/assemble that line correctly, instead of asm("int $0x3"). But my other AT&T syntax commands all failed after that. So, this directive must be a global setting, and not an instance-by-instance setting.

Thanks for the search, Tim. :-)

- Rick

On Sun, 2010-08-08 at 23:37 -0700, Tim Prince wrote:
> On 8/8/2010 10:21 PM, Rick C. Hodgin wrote:
> > All,
> >
> > Is there an Intel-syntax compatible option for GCC or G++? And if not,
> > why not? It's so much cleaner than AT&T's.
> >
> > - Rick C. Hodgin
> >
> I don't know how you get along without a search engine. What about
> http://tldp.org/HOWTO/Assembly-HOWTO/gas.html ?
Re: x86 assembler syntax
> "Rick C. Hodgin" writes:
> > Is there an Intel-syntax compatible option for GCC or G++? And if not,
> > why not? It's so much cleaner than AT&T's.
> -masm=intel
> This question would have been more appropriate on the gcc-help mailing
> list.
-Ian Lance Taylor

My apologies to everyone. I did not know such a list existed.

- Rick C. Hodgin
Simple development GCC and G++
A while back I posted a question about adding edit-and-continue abilities. Since then I've begun work on a rapid development compiler (rdc) which has these abilities, but I've had some thoughts I'd like to share, and I'd like some feedback from the GCC experts:

What if a subset of GCC could be used for this? A portion which uses the parser to push properly parsed instructions to a new code generator module which handles the entirety of the edit-and-continue requirements? I would offer to write these.

These abilities would be one half of the work. The other half would be changes to GDB which allow the changes to be applied to the executable code image in memory.

The goal here would not be to have ANY code speed, code size, or anything glorious in the generated code. The goal would be only a true and wholly accurate representation in execution of what the original source code dictates, with all traditional step-through, line-by-line debugging abilities being generated, plus the new ability to compile changed source code and merge it into the existing executable and its image in memory without leaving or restarting the app.

The product I've been developing is a simple C/C++ compiler that basically does what I've described. It's in early stages, and it occurred to me that it might be easy or easier for me to add this new edit-and-continue module to both GCC and GDB, and then have the existing GCC community reap the benefits of this added feature, being maintained from here on out -- by me at first, but by whoever else would want to do it at some point.

Any thoughts? Thank you in advance.

- Rick C. Hodgin
- Email me for phone contact info if you want to talk to me personally.
Re: Simple development GCC and G++
> Most of the interesting bits happen in the linker+debugger...

Agreed, which is why it doesn't matter much how much the compiler side does, just so long as it is correct and compliant. This is the hard part I'm finding, which is why I'm looking to the GCC community, which spends a great deal of effort maintaining that compliance.

The GCC portion would generate the input to this process. A new "managing back-end" would then write an alternate kind of output carrying everything needed to feed the linker and program database, with enough information to handle multiple compiles over time, etc. Those two pieces, coupled with changes in gdb which allow it, would enable the ability.

> What you then need is a linker that will relink an image, changing it as
> little as possible relative to a previous run. Your debugger then needs to
> be
> able to replace these images in-place (which may require interaction with the
> dynamic linker).

Exactly. That is exactly it.

> Be warned that this ends up being extremely complex, even for relatively
> simple changes.

If the design were implemented in such a way that each source code line or function was contained in its own logical unit, then it becomes a simple matter of replacement. The linker would check the size of the newly generated code to see if it could overlay the block that was at that location before the code change. If so, it would overwrite the block, adjusting the instruction pointer to the new instruction location in the process. If not, it would NOP-out everything that's there and insert a branch to some new location tacked on to the end of the executable, which executes the new, larger amount of code before branching back to continue in the program where it would've been originally.

If you look at the way Microsoft handles this by tracing through the disassembly view, it is very straightforward, though I agree with you that it can be very complex.

- Rick
Re: atomicity of x86 bt/bts/btr/btc?
> ;; %%% bts, btr, btc, bt.
> ;; In general these instructions are *slow* when applied to memory,
> ;; since they enforce atomic operation. When applied to registers,
>
> I haven't found documented confirmation that these instructions are atomic
> without a lock prefix,
> having checked Intel and AMD documentation and random web searching.
> They are mentioned as instructions that can be used with lock prefix.

They do not automatically lock the bus. They lock the bus only with the explicit LOCK prefix, and LOCK BTS is typically used for an atomic read-modify-write (test-and-set) operation.

- Rick
RE: atomicity of x86 bt/bts/btr/btc?
> > They do not automatically lock the bus. They will lock the bus with the
> > explicit LOCK prefix, and BTS is typically used for an atomic read/write
> > operation.
> Thanks Rick.
> I'll go back to using them.
> I'm optimizing mainly for size.
> The comment should perhaps be amended.
> The "since they enforce atomic operation" part seems wrong.

Np. For citation, see here (page 166).
http://www.intel.com/Assets/PDF/manual/253666.pdf

- Rick
"self" keyword
How hard would it be to implement a "self" keyword extension which references the contextual function name wherein it was referenced?

int foo(int a)
{
    // recursion
    self(a + 1);
}

int food(int a)
{
    // recursion
    self(a + 1);
}

Obviously not a useful example, but it demonstrates that each function can call itself again without knowing its own name.

Best regards,
Rick C. Hodgin
Re: "self" keyword
Ian,

I was thinking C and C++.

int myclass::foo(int a)
{
    // recursion
    self(a + 1);
}

Just out of curiosity, why wouldn't it be accepted back into mainline?

Thanks for your help. :-)

Best regards,
Rick C. Hodgin

On 06/14/2012 12:48 PM, Ian Lance Taylor wrote:
> "Rick C. Hodgin" writes:
> > How hard would it be to implement a "self" keyword extension which
> > references the contextual function name wherein it was referenced?
> >
> > int foo(int a)
> > {
> >     // recursion
> >     self(a + 1);
> > }
> >
> > int food(int a)
> > {
> >     // recursion
> >     self(a + 1);
> > }
> >
> > Obviously not a useful example, but demonstrates that to call each
> > function it's in again that it can be done without knowing the function
> > name.
>
> I assume you are asking about C? It would be easy to implement. The
> compiler always knows what function it is compiling. But I don't think
> the extension would be accepted back into GCC mainline.
>
> Ian
Re: "self" keyword
David,

Oh! Well, it doesn't have to be called self. :-) It could be __self__ or whatever would be fine. I see C99 has __func__ for the current function name used in strings. But I was thinking more of an actual reference to the current function as a function entity, sort of like a name substitution.

Best regards,
Rick C. Hodgin

On 06/14/2012 01:08 PM, David Malcolm wrote:
> FWIW "self" today is a perfectly good variable name, and practically all
> C and C++ code that interacts with Python (including the C
> implementation of Python itself) uses "self" to name variables
> throughout: many thousands of projects, many millions of lines of code.
> Having this snatched away as a keyword under some compiler settings
> would be a major PITA.
>
> On Thu, 2012-06-14 at 12:53 -0400, Rick C. Hodgin wrote:
> > Ian,
> >
> > I was thinking C and C++.
> >
> > int myclass::foo(int a)
> > {
> >     // recursion
> >     self(a + 1);
> > }
> >
> > Just out of curiosity, why wouldn't it be accepted back into mainline?
> >
> > Thanks for your help. :-)
> >
> > Best regards,
> > Rick C. Hodgin
> >
> > On 06/14/2012 12:48 PM, Ian Lance Taylor wrote:
> > > "Rick C. Hodgin" writes:
> > > > How hard would it be to implement a "self" keyword extension which
> > > > references the contextual function name wherein it was referenced?
> > > >
> > > > int foo(int a)
> > > > {
> > > >     // recursion
> > > >     self(a + 1);
> > > > }
> > > >
> > > > int food(int a)
> > > > {
> > > >     // recursion
> > > >     self(a + 1);
> > > > }
> > > >
> > > > Obviously not a useful example, but demonstrates that to call each
> > > > function it's in again that it can be done without knowing the
> > > > function name.
> > >
> > > I assume you are asking about C? It would be easy to implement. The
> > > compiler always knows what function it is compiling. But I don't think
> > > the extension would be accepted back into GCC mainline.
> > >
> > > Ian
Re: "self" keyword
David,

Well, I probably don't have a NEED for it. I've gotten along for 25+ years without it. :-)

However, what prompted my inquiry is that using it would've saved me tracking down a few bugs in recent weeks. Some prior code was re-used for a similar function, but the names of the recursive calls weren't updated in every case. It didn't take long to debug, but I realized that had it always been written as self() it never would've been an issue.

I can also see a use for generated code where there's a base source code template in use with an embedded include file reference that changes as it's generated per pass, such as:

int step1(int a, int b)
{
#include "\current_task\step1.cpp"
}

int step2(int a, int b)
{
#include "\current_task\step2.cpp"
}

Using the self() reference for recursion, one could modify stepN.cpp's generator algorithms without having to know or care about anything in the wrapper code. Likewise, the wrapper could be modified without having to concern itself with anything in the generated code, save some requirements of an API like a "print_notice()" or "print_error()" message function, which could just be a requirement of the app to always be there. The rest, however, could be fluid.

A few other uses I can think of as well. Minor ones.

Best regards,
Rick C. Hodgin

On 06/14/2012 04:24 PM, David Brown wrote:
> On 14/06/12 19:31, Joe Buck wrote:
> > It only saves one character in any case: your "self" is just "*this".
>
> No, "this" points to the object in C++. The OP's "self" is referring to
> the function being compiled. So here "self" would be the same as "foo".
> I don't think there is any way to get this without making a language
> extension, unless there is some way of turning the string __func__ into
> the function.
>
> But I also don't see any advantage over simply using the function name
> directly. After all, how often do you need recursion - and what is the
> problem with writing out the function name in full on those occasions?
>
> mvh.,
>
> David
>
> > ________
> > From: gcc-ow...@gcc.gnu.org [gcc-ow...@gcc.gnu.org] on behalf of Ian Lance Taylor [i...@google.com]
> > Sent: Thursday, June 14, 2012 10:19 AM
> > To: Rick C. Hodgin
> > Cc: gcc@gcc.gnu.org
> > Subject: Re: "self" keyword
> >
> > "Rick C. Hodgin" writes:
> > > I was thinking C and C++.
> > >
> > > int myclass::foo(int a)
> > > {
> > >     // recursion
> > >     self(a + 1);
> > > }
> > >
> > > Just out of curiosity, why wouldn't it be accepted back into mainline?
> >
> > In general these days GCC discourages language extensions. They would
> > have to have a compelling advantage. I don't see that here. Even if I
> > did, I would recommend running it through a language standards body
> > first.
> >
> > Ian
Re: "self" keyword
Andreas,

That would work. But now I'm back to remembering to fix something when I copy / re-use code. I'll admit it's minor. But we have tools to help us for a reason, right? :-)

Best regards,
Rick C. Hodgin

On 06/14/2012 04:38 PM, Andreas Schwab wrote:
> "Rick C. Hodgin" writes:
> > I can also see a use for generated code where there's a base source code
> > template in use with an embedded include file reference that changes as
> > it's generated per pass, such as:
>
> int step1(int a, int b)
> {
> #define self step1
> #include "\current_task\step1.cpp"
> #undef self
> }
>
> int step2(int a, int b)
> {
> #define self step2
> #include "\current_task\step2.cpp"
> #undef self
> }
>
> Andreas.
Re: "self" keyword
That would work. Yet now I'm back to remembering to update that line of code equating self to its function name at each use.

My desire for "self" as a keyword is in looking for a way to use contextual information the compiler already knows about and can easily employ.

Best regards,
Rick C. Hodgin

Original Message
From: Václav Zeman
Sent: Fri, Jun 15, 2012 08:08 AM
To: Oleg Endo
CC: Rick C. Hodgin ; David Brown ; Joe Buck ; Ian Lance Taylor ; gcc@gcc.gnu.org
Subject: Re: "self" keyword

>On 14 June 2012 22:42, Oleg Endo wrote:
>> On Thu, 2012-06-14 at 16:34 -0400, Rick C. Hodgin wrote:
>>> David,
>>>
>>> Well, I probably don't have a NEED for it. I've gotten along for 25+
>>> years without it. :-)
>>>
>>> However, what prompted my inquiry is using it would've saved me tracking
>>> down a few bugs in recent weeks. Some prior code was re-used for a
>>> similar function, but the name of the recursive calls weren't updated in
>>> every case. It didn't take long to debug, but I realized that had it
>>> always been written as self() it never would've been an issue.
>>>
>>> I can also see a use for generated code where there's a base source code
>>> template in use with an embedded include file reference that changes as
>>> it's generated per pass, such as:
>>>
>>> int step1(int a, int b)
>>> {
>>> #include "\current_task\step1.cpp"
>>> }
>>>
>>> int step2(int a, int b)
>>> {
>>> #include "\current_task\step2.cpp"
>>> }
>>>
>>> Using the self() reference for recursion, one could modify stepN.cpp's
>>> generator algorithms without having to know or care anything in the
>>> wrapper code.
>>
>> Wouldn't this do?
>>
>> #define __self__ step1
>> int __self__ (int a, int b)
>> {
>> #include "something"
>>     __self__ (x, y);
>> }
>> #undef __self__
>You can already do this with GCC in C and C++ (minus problems with
>overloaded functions) like this:
>
>#define DECLSELF(f,self) __typeof__ (&f) self = f
>
>int foo (int n)
>{
>    DECLSELF(foo, self);
>
>    if (n == 0)
>        return 0;
>    else
>    {
>        return 1 + self (n - 1);
>    }
>}
>
>--
>VZ
Re: Add corollary extension
How would you handle:

isSystemClosed = true;

You're getting into nasty looking/non-obvious code to use language-existing features for an ability that 1) is fundamental to software and 2) should have native support without kludges.

Many CPUs even support the write-back NOT of a flag condition natively. x86 has the SETcc instructions, for example, which are native to this concept. ARM has predicated (conditional) execution. It is already there at the CPU level.

Best regards,
Rick C. Hodgin

Original Message
From: James Dennett
Sent: Thu, Jun 28, 2012 04:14 PM
To: Rick Hodgin
CC: Jonathan Wakely ; gcc
Subject: Re: Add corollary extension

>On Thu, Jun 28, 2012 at 12:39 PM, Rick Hodgin wrote:
>>> Why do you want to bother with a non-standard,
>>> unportable extension instead of just writing:
>>>
>>> inline bool isSystemClosed()
>>> { return !isSystemOpen; }
>>>
>>> Which is simple, conventional, easy to understand
>>> and portable.
>>>
>>> Or in C++ just define a suitable type, instead of
>>> needing changes to the core language:
>>>
>>> struct inv_bool {
>>>     bool& b;
>>>     operator bool() const { return !b; }
>>> };
>>>
>>> inv_bool isSystemClosed = { isSystemOpen };
>>
>> There are certain fundamentals in data processing. The inverse bool is one
>> of them. Why not be able to reference it more naturally in code utilizing
>> something the compiler already knows about and can wield effortlessly?
>>
>> I've thought more about the syntax, and I see this making more sense:
>> bool isSystemOpen[!isSystemClosed];
>>
>> As the inverse bool relationship is fundamental in software, I hope this
>> will become a C/C++ standard.
>
>I really can't imagine that happening. As the logical not operation
>is fundamental, we already have notation for it. Why would we add the
>complexity of something that looks like a variable but acts like a
>function?
>
>In C++ you can already write
>    auto isSystemOpen = [&isSystemClosed] { return !isSystemClosed; };
>and then use isSystemOpen(), without doing violence to variable
>notation. You can obscure that behind a macro in your own code if you
>wish, as in
>    #define OPPOSITE(realVariable) ([&realVariable] { return !realVariable; })
>    auto isSystemOpen = OPPOSITE(isSystemClosed);
>but wanting to make something that's a function look like a variable
>isn't likely to get much traction in either language.
>
>-- James
Re: Add corollary extension
In a boolean variable, there are two fundamental ways to examine: as it is, or !(as it is). Using the same memory location to access those two base / fundamental extents of its very nature, while new in concept to C/C++ hackers, is not new in any degree of concept. Five-year-olds use this in speech every day.

The only thing I propose is to give boolean variables their full roundness of use in the C and C++ languages. As they are today, half of their abilities are natively exposed to the developer, the other half are suppressed and hidden behind reverse logic, yielding more icky code than need be.

This is my last comment on the post. It's just that I believe that anything worth doing is worth doing rightly, and completely. Not having native inverse bool support seems incomplete to me.

Best regards,
Rick C. Hodgin

Original Message
From: James Dennett
Sent: Thu, Jun 28, 2012 06:24 PM
To: Rick C. Hodgin
CC: Jonathan Wakely ; gcc
Subject: Re: Add corollary extension

>On Thu, Jun 28, 2012 at 3:08 PM, Rick C. Hodgin wrote:
>> How would you handle:
>>
>> isSystemClosed = true;
>
>A good clean error message is ideal, and should be easy. (A proxy
>object such as inv_bool can do this easily enough, but it's still
>going to hurt readability.)
>
>> You're getting into nasty looking/non-obvious code to use language-existing
>> features for an ability that 1) is fundamental to software and 2) should
>> have native support without kludges.
>
>No, the ability to have two different variables with different
>semantics but shared backing store is not fundamental, it's rather a
>violation of the law of least surprise.
>
>(I'm not buying that lambdas are "nasty looking" or "non-obvious" for
>C++ users. They're a fundamental part of the language.)
>
>I understand that you really like your notation, and think that it's a
>clear win. I don't think you'll have luck persuading compiler writers
>or language committees, but maybe you'll prove me wrong.
>
>-- James