Re: r241620 - Wrap clang modules and pch files in an object file container.

Adrian Prantl Tue, 14 Jul 2015 10:00:29 -0700

> On Jul 14, 2015, at 8:25 AM, David Blaikie <[email protected]> wrote:
> 
> 
> 
> On Mon, Jul 13, 2015 at 7:25 PM, Richard Smith <[email protected]> wrote:
> On Mon, Jul 13, 2015 at 6:02 PM, Adrian Prantl <[email protected]> wrote:
> 
>> On Jul 13, 2015, at 5:47 PM, Richard Smith <[email protected]> wrote:
>> 
>> On Mon, Jul 13, 2015 at 3:06 PM, Adrian Prantl <[email protected]> wrote:
>> > On Jul 13, 2015, at 2:00 PM, Eric Christopher <[email protected]> wrote:
>> >
>> > Hi Adrian,
>> >
>> > Finally getting around to looking at some of this and I think it's going 
>> > in slightly the wrong direction. In general I think being -able- to put 
>> > modules in object files to simplify wrapping, use, etc is a good thing. I 
>> > think being required to do so is somewhat problematic.
>> >
>> 
>> Let me start by noting that the current infrastructure already allows 
>> selecting whether you want wrapped modules or not by passing the 
>> appropriate PCHContainerOperations object to CompilerInstance. Clang 
>> currently uses an object file wrapper unconditionally, while none of 
>> clang-tools-extra does. We could easily control the behavior of clang 
>> based on a (new) command line option.
>> 
>> But.. on a platform with a shared module cache you always have to assume 
>> that a module once built will eventually be used by a client that wants to 
>> read the debug info. Think llvm-dsymutil — it does not know and does not 
>> want to know how to build clang modules, but does want to read all the debug 
>> info from a clang module.
>> 
>> > Imagine, for example, you have a giant distributed build system...
>> >
>> > You'd want to create a pile of modules (that may reference/include/etc 
>> > other modules) that don't or may not have debug information as part 
>> > of them (because you might want to build without it or have the debug info 
>> > alongside it as a separate compilation). Waiting on the full build of the 
>> > module including debug is going to adversely affect your overall build 
>> > time and so shouldn't be necessary - especially if you want to be able to 
>> > have information separate ultimately.
>> >
>> > Make sense?
>> 
>> Not sure if you would be saving much by having the debug info separately; 
>> from what I’ve measured so far the debug info for a module makes up less 
>> than 10% of the total size. Admittedly, build-time-wise going through the 
>> backend to emit the object file is a lot more expensive than just dumping 
>> the raw PCH. [1]
>> 
>> Yeah, I think wanting to be able to control the behavior is reasonable; we 
>> just need to be careful what the implications for consumers are. If we add, 
>> e.g., an “-fraw-modules” [2] switch to clang to turn off the object 
>> file wrapping, I’d strongly suggest that we add the value of this switch to 
>> the module hash (or add an optional “-g” to the module file name after the 
>> hash or something like that) to avoid ugly race conditions between debug 
>> info and non-debug-info builds of the same module. This way we’d have 
>> essentially two separate module caches, with and without debug info.
>> 
>> That's fine, I think (we don't use a module cache at all in our build 
>> system; it doesn't really make much sense for a distributed build) and most 
>> command-line flag changes already have this effect.
> 
> Great!
>>  
>> would that work for you?
>> -- adrian
>> 
>> [1] If you want to be serious about building the module debug info in 
>> parallel to the rest of the build, you could even have a clang-based tool 
>> import the just-built raw clang module and emit the debug info without 
>> having to parse the headers again :-)
>> 
>> That is what we intend to do :) (Assuming this turns out to actually be 
>> faster than re-parsing; faulting in the entire contents of a module has much 
>> worse locality than parsing.)
>> 
>> [2] -fraw-modules, -fmodule-format-raw, -fmodule-debug-info, ...?
>>     I would imagine that the driver enables module debug info when 
>> “-gmodules” is present and by default on Darwin.
>> 
>> That seems reasonable to me. For the frontend flag, I think a flag to turn 
>> this on or to select the module format makes more sense than a flag to 
>> switch to the raw format.
> 
> Okay then let’s narrow this down. Other possibilities in that direction 
> include (sorted from subjectively best to worst)
> 
> -fmodule-format=obj
> -fmodule-debug-info
> -ffat-modules
> -fmodule-container
> -fmodule-container-object
> 
> It's a -cc1 flag, so it doesn't really matter much. If this will eventually 
> govern whether we put code for inline functions into the module, then I think 
> we should avoid names like -fmodule-debug-info. Other than that, I don't 
> really have a preference.
>


Unless the “=” part turns out to be an implementation nightmare, I think I’ll 
be going with -fmodule-format=[raw,obj] then, and implicitly emit debug info in 
the obj case. If necessary, we can make this more fine-grained later.
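
As a rough illustration only (written in Python rather than clang's C++, and 
not the actual implementation), the sketch below shows how a 
-fmodule-format=[raw,obj] value could be parsed and then mixed into the module 
hash, as suggested earlier in this thread, so that raw and obj builds of the 
same module land in separate cache entries. The names ModuleFormat, 
parse_module_format and module_cache_key, as well as the hashing scheme itself, 
are hypothetical and chosen just for this example.

```python
import hashlib
from enum import Enum


class ModuleFormat(Enum):
    """The two container formats discussed in this thread."""
    RAW = "raw"  # plain serialized AST, no object file wrapper
    OBJ = "obj"  # AST wrapped in an object file (debug info implied)


def parse_module_format(arg):
    """Parse a '-fmodule-format=<value>' argument (hypothetical helper)."""
    prefix = "-fmodule-format="
    if not arg.startswith(prefix):
        raise ValueError("not a module format flag: %r" % arg)
    value = arg[len(prefix):]
    try:
        return ModuleFormat(value)
    except ValueError:
        raise ValueError("unknown module format: %r" % value) from None


def module_cache_key(module_name, other_flags, fmt):
    """Hash the format together with the other inputs (illustrative scheme)."""
    h = hashlib.sha256()
    for part in [module_name, *other_flags, fmt.value]:
        h.update(part.encode("utf-8"))
        h.update(b"\0")  # separator so adjacent parts cannot run together
    return h.hexdigest()[:16]
```

With a scheme like this, module_cache_key("Foo", ["-O2"], ModuleFormat.RAW) and 
the same call with ModuleFormat.OBJ yield different keys, which is the "two 
separate module caches" effect described above.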

> What you're picturing there is essentially a flag that would indicate if we 
> should build all module-related-object-things into the module, or not? That 
> seems like a useful broad flag (with an eventual corresponding compiler mode 
> where we pass another flag and explicitly pass just the module and say "build 
> a separate object with all the module-related-object-things" - for use in a 
> non-implicit-cache build)
> 
> (Hmm, we're going to have a weird middle ground in here - where the IR for 
> the inline functions needs to go in the module itself (as an 
> available_externally definition for use in non-LTO compilations of dependent 
> object files) and then the build-separate-module-related-object-things would 
> turn those into (weak?) definitions, compile them (& the debug info) into a 
> separate object file, to be linked in at the end)

Can you elaborate on this use-case?
Are you saying you’d want a module object file with ast+bitcode and another one 
with bitcode'+debug info built from the first one? Or one raw ast file and two 
object files?

> 
> Should this just be keyed/defaulted off implicit/explicit modules, or 
> orthogonal to that choice?
>> [One other thing... I think we may have made a mistake by putting the reader 
>> and writer code behind the same interface: it forces tools that want to read 
>> the module format to link against all of LLVM IR, code generation, and so 
>> on, when all they really need is something like libObject.]
> 
> We can always split it into two implementations of the interface or two 
> interfaces, that’s not a very big deal. My assumption was that every tool 
> that wants to read the clang module format also wants to create modules 
> (because module cache... but as you noted that’s a Darwin-centric view) and 
> more low-level tools like llvm-bcanalyzer could be piped through llvm-objdump.


-- adrian

_______________________________________________
cfe-commits mailing list
[email protected]
http://lists.cs.uiuc.edu/mailman/listinfo/cfe-commits
