On Tue, Jul 14, 2015 at 9:55 AM, Adrian Prantl <[email protected]> wrote:
>
> On Jul 14, 2015, at 8:25 AM, David Blaikie <[email protected]> wrote:
>
> On Mon, Jul 13, 2015 at 7:25 PM, Richard Smith <[email protected]>
> wrote:
>
>> On Mon, Jul 13, 2015 at 6:02 PM, Adrian Prantl <[email protected]> wrote:
>>
>>> On Jul 13, 2015, at 5:47 PM, Richard Smith <[email protected]>
>>> wrote:
>>>
>>> On Mon, Jul 13, 2015 at 3:06 PM, Adrian Prantl <[email protected]>
>>> wrote:
>>>
>>>> > On Jul 13, 2015, at 2:00 PM, Eric Christopher <[email protected]>
>>>> > wrote:
>>>> >
>>>> > Hi Adrian,
>>>> >
>>>> > Finally getting around to looking at some of this and I think it's
>>>> > going in slightly the wrong direction. In general I think being -able- to
>>>> > put modules in object files to simplify wrapping, use, etc. is a good thing.
>>>> > I think being required to do so is somewhat problematic.
>>>> >
>>>>
>>>> Let me start by saying that the current infrastructure already allows
>>>> selecting whether you want wrapped modules or not by passing the
>>>> appropriate PCHContainerOperations object to CompilerInstance. Clang
>>>> currently unconditionally uses an object file wrapper; all of
>>>> clang-tools-extra doesn’t. We could easily control the behavior of clang
>>>> based on a (new) command line option.
>>>>
>>>> But on a platform with a shared module cache you always have to
>>>> assume that a module once built will eventually be used by a client that
>>>> wants to read the debug info. Think llvm-dsymutil: it does not know and
>>>> does not want to know how to build clang modules, but does want to read all
>>>> the debug info from a clang module.
>>>>
>>>> > Imagine, for example, you have a giant distributed build system...
>>>> >
>>>> > You'd want to create a pile of modules (that may
>>>> > reference/include/etc other modules) that don't or may not have
>>>> > debug information as part of them (because you might want to build without
>>>> > it or have the debug info alongside it as a separate compilation). Waiting
>>>> > on the full build of the module including debug is going to adversely
>>>> > affect your overall build time and so shouldn't be necessary - especially
>>>> > if you want to be able to have information separate ultimately.
>>>> >
>>>> > Make sense?
>>>>
>>>> Not sure if you would be saving much by having the debug info
>>>> separately; from what I’ve measured so far the debug info for a module
>>>> makes up less than 10% of the total size. Admittedly, build-time-wise going
>>>> through the backend to emit the object file is a lot more expensive than
>>>> just dumping the raw PCH. [1]
>>>>
>>>> Yeah, I think wanting to be able to control the behavior is reasonable,
>>>> we just need to be careful what the implications for consumers are. If we
>>>> add, e.g., an “-fraw-modules” [2] switch to clang to turn off the
>>>> object file wrapping, I’d strongly suggest that we add the value of this
>>>> switch to the module hash (or add an optional “-g” to the module file
>>>> name after the hash or something like that) to avoid ugly race conditions
>>>> between debug info and non-debug-info builds of the same module. This way
>>>> we’d have essentially two separate module caches, with and without debug
>>>> info.
>>>>
>>>
>>> That's fine, I think (we don't use a module cache at all in our build
>>> system; it doesn't really make much sense for a distributed build) and most
>>> command-line flag changes already have this effect.
>>>
>>> Great!
>>>
>>>> would that work for you?
>>>> -- adrian
>>>>
>>>> [1] If you want to be serious about building the module debug info in
>>>> parallel to the rest of the build, you could even have a clang-based tool
>>>> import the just-built raw clang module and emit the debug info without
>>>> having to parse the headers again :-)
>>>>
>>>
>>> That is what we intend to do :) (Assuming this turns out to actually be
>>> faster than re-parsing; faulting in the entire contents of a module has
>>> much worse locality than parsing.)
>>>
>>>> [2] -fraw-modules, -fmodule-format-raw, -fmodule-debug-info, ...?
>>>> I would imagine that the driver enables module debug info when
>>>> “-gmodules” is present and by default on Darwin.
>>>
>>> That seems reasonable to me. For the frontend flag, I think a flag to
>>> turn this on or to select the module format makes more sense than a flag to
>>> switch to the raw format.
>>>
>>> Okay then let’s narrow this down. Other possibilities in that direction
>>> include (sorted from subjectively best to worst)
>>>
>>> -fmodule-format=obj
>>> -fmodule-debug-info
>>> -ffat-modules
>>> -fmodule-container
>>> -fmodule-container-object
>>>
>>
>> It's a -cc1 flag, so it doesn't really matter much. If this will
>> eventually govern whether we put code for inline functions into the module,
>> then I think we should avoid names like -fmodule-debug-info. Other than
>> that, I don't really have a preference.
>>
>
> Unless the “=” part turns out to be an implementation nightmare, I think
> I’ll be going with -fmodule-format=[raw,obj] then and implicitly emit debug
> info in the obj case. If necessary, we can make this more fine grained
> later.
>
> What you're picturing there is essentially a flag that would indicate if
> we should build all module-related-object-things into the module, or not?
> That seems like a useful broad flag (with an eventual corresponding
> compiler mode where we pass another flag and explicitly pass just the
> module and say "build a separate object with all the
> module-related-object-things" - for use in a non-implicit-cache build)
>
> (Hmm, we're going to have a weird middle ground in here - where the IR for
> the inline functions needs to go in the module itself (as an
> available_externally definition for use in non-LTO compilations of
> dependent object files) and then the
> build-separate-module-related-object-things would turn those into (weak?)
> definitions, compile them (& the debug info) into a separate object file,
> to be linked in at the end)
>
> Can you elaborate on this use-case?
>

So the use cases that have often been bandied about, and that I'm referring
to here, are:

1) including inline function IR in the module to be used by each
   compilation that depends on the module - so each inline function doesn't
   have to be IRGen'd in every /use/ of a module, just once when the module
   is built

2) including a single definition of the actual machine code for inline
   functions from modules and linking that into the final program (so that
   the functions in (1) can be available_externally, used for inlining
   opportunities during compilation, but never generate machine code in the
   object files that depend on the module)

> Are you saying you’d want a module object file with ast+bitcode and
> another one with bitcode'+debug info built from the first one? Or one raw
> ast file and two object files?
>

What I'd be picturing would be ast+bitcode and object code+(optional debug
info (if it's a debug build)).

>
> Should this just be keyed/defaulted off implicit/explicit modules, or
> orthogonal to that choice?
>
>>> [One other thing... I think we may have made a mistake by putting the
>>> reader and writer code behind the same interface: it forces tools that want
>>> to read the module format to link against all of LLVM IR, code generation,
>>> and so on, when all they really need is something like libObject.]
>>>
>>> We can always split it into two implementations of the interface or two
>>> interfaces, that’s not a very big deal. My assumption was that every tool
>>> that wants to read the clang module format also wants to create modules
>>> (because module cache... but as you noted that’s a Darwin-centric view) and
>>> more low-level tools like llvm-bcanalyzer could be piped through
>>> llvm-objdump.
>>>
>>
>
> -- adrian
>
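
As a rough illustration of the module-hash suggestion quoted above (adding the value of the module-format switch to the hash so that raw and obj builds of the same module never share a cache entry), here is a minimal Python sketch. It is not how Clang computes its module hash; `module_cache_path`, its parameters, and the `<hash>/<name>.pcm` layout are hypothetical and only model the idea.

```python
import hashlib


def module_cache_path(module_name, flags, module_format):
    """Return a hypothetical cache path for a built module.

    The module format ("raw" or "obj") is hashed together with the other
    flags, so the two formats can never collide on the same cache entry.
    """
    if module_format not in ("raw", "obj"):
        raise ValueError("unknown module format: " + module_format)
    digest = hashlib.sha1()
    for flag in flags:
        digest.update(flag.encode("utf-8"))
        digest.update(b"\0")  # separator so flag boundaries stay unambiguous
    # Assumption: the format is treated like any other hashed option.
    digest.update(("module-format=" + module_format).encode("utf-8"))
    return digest.hexdigest()[:12] + "/" + module_name + ".pcm"
```

With this model, asking for the same module with "raw" and with "obj" yields two different directories, which is the "two separate module caches" effect described in the quoted text, while repeated requests with identical inputs map to the same entry.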
_______________________________________________ cfe-commits mailing list [email protected] http://lists.cs.uiuc.edu/mailman/listinfo/cfe-commits
