lhames added a comment.

> The other aspect of this is that upon unloading of these pieces of code we 
> need to run the destructors (that's why we need some non-canonical handling 
> of when we run the atexit handlers).

I just noticed this comment. I think long term you could handle this by 
introducing an "initialization generation" -- each time you run 
`jit_dlopen_repl` you would increment the generation. You'd point the 
`__cxa_atexit` alias at a custom function that keeps a map: `__dso_handle -> 
(generation -> [ atexits ])`. Then you could selectively run atexits for each 
generation before removing them.
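A minimal, self-contained sketch of such a generation-aware registry (all names here are hypothetical -- the real implementation would sit behind the interposed `__cxa_atexit` inside the JIT):

```cpp
#include <functional>
#include <map>
#include <vector>

// Hypothetical sketch: a registry keyed by DSO handle, then by
// "initialization generation", so atexit handlers can be run and
// discarded per-generation when a chunk of JITed code is unloaded.
class AtExitRegistry {
  using Handler = std::function<void()>;
  // __dso_handle -> (generation -> [atexits])
  std::map<void *, std::map<unsigned, std::vector<Handler>>> Registry;
  unsigned CurrentGen = 0;

public:
  // Incremented on each jit_dlopen_repl-style call.
  void nextGeneration() { ++CurrentGen; }

  // Stands in for the interposed __cxa_atexit.
  void registerAtExit(void *DSOHandle, Handler H) {
    Registry[DSOHandle][CurrentGen].push_back(std::move(H));
  }

  // Run and remove the handlers of one generation of one DSO, in
  // reverse registration order (matching atexit semantics).
  void runGeneration(void *DSOHandle, unsigned Gen) {
    auto DSOIt = Registry.find(DSOHandle);
    if (DSOIt == Registry.end())
      return;
    auto GenIt = DSOIt->second.find(Gen);
    if (GenIt == DSOIt->second.end())
      return;
    for (auto It = GenIt->second.rbegin(); It != GenIt->second.rend(); ++It)
      (*It)();
    DSOIt->second.erase(GenIt);
  }
};
```

In this sketch `jit_dlopen_repl` would call `nextGeneration()` before running newly discovered initializers, and unloading a chunk of code becomes a `runGeneration(DSO, G)` call for each generation being discarded.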



================
Comment at: clang/lib/CodeGen/CodeGenAction.cpp:908
 
+CodeGenerator *CodeGenAction::getCodeGenerator() const {
+  return BEConsumer->getCodeGenerator();
----------------
v.g.vassilev wrote:
> lhames wrote:
> > v.g.vassilev wrote:
> > > sgraenitz wrote:
> > > > v.g.vassilev wrote:
> > > > > @rjmccall, we were wondering if there is a better way to ask CodeGen 
> > > > > to start a new module. The current approach seems to be drilling a 
> > > > > hole through a number of abstraction layers.
> > > > > 
> > > > > In the past we have touched that area a little in 
> > > > > https://reviews.llvm.org/D34444 and the answer may be already there 
> > > > > but I fail to connect the dots.
> > > > > 
> > > > > Recently, we thought about having a new FrontendAction callback for 
> > > > > beginning a new phase when compiling incremental input. We need to 
> > > > > keep track of the created objects (needed for error recovery) in our 
> > > > > Transaction. We can have a map of `Transaction*` to `llvm::Module*` 
> > > > > in CodeGen. The issue is that new JITs take ownership of the 
> > > > > `llvm::Module*` which seems to make it impossible to support jitted 
> > > > > code removal with that model (cc: @lhames, @rsmith).
> > > > When compiling incrementally, does a "new phase" mean that all 
> > > > subsequent code will go into a new module from then on? How will 
> > > > dependencies on previous symbols be handled? Are all symbols external?
> > > > 
> > > > > The issue is that new JITs take ownership of the llvm::Module*
> > > > 
> > > > That's true, but you can still keep a raw pointer to it, which will be 
> > > > valid at least as long as the module wasn't linked. Afterwards it 
> > > > depends on the linker:
> > > > * RuntimeDyld can return ownership of the object's memory range via 
> > > > `NotifyEmittedFunction`
> > > > * JITLink provides the `ReturnObjectBufferFunction` for the same purpose
> > > > 
> > > > > seems to make it impossible to support jitted code removal with that 
> > > > > model
> > > > 
> > > > Can you figure out what symbols are affected and remove these? A la: 
> > > > https://github.com/llvm/llvm-project/blob/13f4448ae7db1a47/llvm/include/llvm/ExecutionEngine/Orc/Core.h#L1020
> > > > 
> > > > I think @anarazel has ported a client with code removal to OrcV2 
> > > > successfully in the past. Maybe there's something we can learn from it.
> > > > When compiling incrementally, does a "new phase" mean that all 
> > > > subsequent code will go into a new module from then on? How will 
> > > > dependencies on previous symbols be handled? Are all symbols external?
> > > 
> > > There is some discussion on this here 
> > > https://reviews.llvm.org/D34444#812418
> > > 
> > > I think the relevant bit is that 'we have just one ever growing TU [...] 
> > > which we send to the RuntimeDyLD allowing only JIT to resolve symbols 
> > > from it.  We aid the JIT when resolving symbols with internal linkage by 
> > > changing all internal linkage to external (We haven't seen issues with 
> > > that approach)'.
> > > 
> > > > 
> > > > > The issue is that new JITs take ownership of the llvm::Module*
> > > > 
> > > > That's true, but you can still keep a raw pointer to it, which will be 
> > > > valid at least as long as the module wasn't linked. 
> > > 
> > > That was my first implementation when I upgraded cling to llvm9, where 
> > > the `shared_ptr`s became `unique_ptr`s. This was quite problematic for 
> > > many of the use cases we support, as the JIT is somewhat unpredictable 
> > > to the high-level API user. 
> > > 
> > > 
> > > >Afterwards it depends on the linker:
> > > > * RuntimeDyld can return ownership of the object's memory range via 
> > > > `NotifyEmittedFunction`
> > > > * JITLink provides the `ReturnObjectBufferFunction` for the same purpose
> > > > 
> > > 
> > > That's exactly what we ended up doing (I would like to thank Lang here 
> > > who gave a similar advice).
> > > 
> > > > > seems to make it impossible to support jitted code removal with that 
> > > > > model
> > > > 
> > > > Can you figure out what symbols are affected and remove these? A la: 
> > > > https://github.com/llvm/llvm-project/blob/13f4448ae7db1a47/llvm/include/llvm/ExecutionEngine/Orc/Core.h#L1020
> > > > 
> > > > I think @anarazel has ported a client with code removal to OrcV2 
> > > > successfully in the past. Maybe there's something we can learn from it.
> > > 
> > > Indeed. That's not yet on my radar, as it seemed somewhat distant in time.
> > > 
> > > Recently, we thought about having a new FrontendAction callback for 
> > > beginning a new phase when compiling incremental input. We need to keep 
> > > track of the created objects (needed for error recovery) in our 
> > > Transaction. We can have a map of Transaction* to llvm::Module* in 
> > > CodeGen. The issue is that new JITs take ownership of the llvm::Module* 
> > > which seems to make it impossible to support jitted code removal with 
> > > that model (cc: @lhames, @rsmith).
> > 
> > In the new APIs, in order to enable removable code, you can associate 
> > Modules with ResourceTrackers when they're added to the JIT. The 
> > ResourceTrackers then allow for removal. Idiomatic usage looks like:
> > 
> >   auto Mod = /* create module */;
> >   auto RT = JD.createResourceTracker();
> >   J.addModule(RT, std::move(Mod));
> >   //...
> >   if (auto Err = RT.remove())
> >     /* handle Err */;
> > 
> > > we have just one ever growing TU [...] which we send to RuntimeDyld...
> > 
> > So is a TU the same as an llvm::Module in this context? If so, how do you 
> > reconcile that with the JIT taking ownership of modules? Are you just 
> > copying the Module each time before adding it?
> > 
> > > We need to keep track of the created objects (needed for error recovery) 
> > > in our Transaction.
> > 
> > Do you need the Module* for error recovery? Or just the Decls?
> > > Recently, we thought about having a new FrontendAction callback for 
> > > beginning a new phase when compiling incremental input. We need to keep 
> > > track of the created objects (needed for error recovery) in our 
> > > Transaction. We can have a map of Transaction* to llvm::Module* in 
> > > CodeGen. The issue is that new JITs take ownership of the llvm::Module* 
> > > which seems to make it impossible to support jitted code removal with 
> > > that model (cc: @lhames, @rsmith).
> > 
> > In the new APIs, in order to enable removable code, you can associate 
> > Modules with ResourceTrackers when they're added to the JIT. The 
> > ResourceTrackers then allow for removal. Idiomatic usage looks like:
> > 
> >   auto Mod = /* create module */;
> >   auto RT = JD.createResourceTracker();
> >   J.addModule(RT, std::move(Mod));
> >   //...
> >   if (auto Err = RT.remove())
> >     /* handle Err */;
> 
> Nice, thanks!
> 
> > 
> > > we have just one ever growing TU [...] which we send to RuntimeDyld...
> > 
> > So is a TU the same as an llvm::Module in this context? If so, how do you 
> > reconcile that with the JIT taking ownership of modules? Are you just 
> > copying the Module each time before adding it?
> 
> Each incremental chunk with which the TU grows has a corresponding 
> `llvm::Module`. Once clang's CodeGen is done with a particular module, it 
> transfers ownership to the `Transaction`, which, in turn, hands it to the 
> JIT; once the JIT is done, the `Transaction` regains ownership.
> 
> > 
> > > We need to keep track of the created objects (needed for error recovery) 
> > > in our Transaction.
> > 
> > Do you need the Module* for error recovery? Or just the Decls?
> 
> Yes, we need a `llvm::Module` that corresponds to the Decls as sometimes 
> CodeGen will decide not to emit a Decl.
> Each incremental chunk with which the TU grows has a corresponding 
> llvm::Module. Once clang's CodeGen is done with a particular module, it 
> transfers ownership to the Transaction which, in turn, hands it to the 
> JIT; once the JIT is done, the Transaction regains ownership.

> Yes, we need a llvm::Module that corresponds to the Decls as sometimes 
> CodeGen will decide not to emit a Decl.

Can you elaborate on this? (Or point me to the relevant discussion / code?)

Does CodeGen aggregate code into the Module as you CodeGen each incremental 
chunk? Or do you Link the previously CodeGen'd module into a new one?


================
Comment at: clang/lib/Interpreter/IncrementalExecutor.cpp:56-59
+llvm::Error IncrementalExecutor::addModule(std::unique_ptr<llvm::Module> M) {
+  llvm::orc::ThreadSafeContext TSCtx(std::make_unique<llvm::LLVMContext>());
+  return Jit->addIRModule(llvm::orc::ThreadSafeModule(std::move(M), TSCtx));
+}
----------------
v.g.vassilev wrote:
> lhames wrote:
> > This doesn't look right. The ThreadSafeContext has to contain the 
> > LLVMContext for the module, but here you're creating a new unrelated 
> > ThreadSafeContext.
> Thanks. I think I fixed it now. Can you take a look?
Yep -- This looks right now.


================
Comment at: clang/lib/Interpreter/IncrementalExecutor.h:36
+  llvm::Error addModule(std::unique_ptr<llvm::Module> M);
+  llvm::Error runCtors() const;
+};
----------------
v.g.vassilev wrote:
> lhames wrote:
> > v.g.vassilev wrote:
> > > sgraenitz wrote:
> > > > v.g.vassilev wrote:
> > > > > teemperor wrote:
> > > > > > Should we maybe merge `runCtors` and `addModule`? Not sure if there 
> > > > > > is a use case for adding a Module but not running Ctors. Also 
> > > > > > documentation.
> > > > > The case we have is when there is no JIT -- currently we have such a 
> > > > > case in IncrementalProcessingTest I think. Another example, which 
> > > > > will show up in future patches, is the registration of atexit 
> > > > > handlers. That is, before we `runCtors` we make a pass over the LLVM 
> > > > > IR, collect some specific details, and possibly change the IR before 
> > > > > running.
> > > > > 
> > > > > I'd rather keep it separate for now if that's okay.
> > > > > Should we maybe merge runCtors and addModule?
> > > > 
> > > > +1 even though there may be open questions regarding incremental 
> > > > initialization.
> > > > 
> > > > > The case we have is when there is no JIT -- currently we have such a 
> > > > > case in IncrementalProcessingTest
> > > > 
> > > > Can you run anything if there is no JIT? I think what you have in 
> > > > `IncrementalProcessing.EmitCXXGlobalInitFunc` is 
> > > > `getGlobalInit(llvm::Module*)`, which checks for symbol names with a 
> > > > specific prefix.
> > > > 
> > > > > before we runCtors we make a pass over the LLVM IR and collect some 
> > > > > specific details and (possibly change the IR and then run).
> > > > 
> > > > The idiomatic solution for such modifications would use an 
> > > > IRTransformLayer as in:
> > > > https://github.com/llvm/llvm-project/blob/13f4448ae7db1a47/llvm/examples/OrcV2Examples/LLJITWithOptimizingIRTransform/LLJITWithOptimizingIRTransform.cpp#L108
> > > > 
> > > > > Another example, which will show up in future patches, is the 
> > > > > registration of atexit handlers
> > > > 
> > > > `atexit` handlers as well as global ctors/dtors should be covered by 
> > > > LLJIT PlatformSupport. The LLJITBuilder will inject a 
> > > > GenericLLVMIRPlatformSupport instance by default:
> > > > https://github.com/llvm/llvm-project/blob/13f4448ae7db1a47/llvm/lib/ExecutionEngine/Orc/LLJIT.cpp#L125
> > > > 
> > > > It's not as comprehensive as e.g. the MachO implementation, but should 
> > > > be sufficient for your use-case as you have IR for all your JITed code. 
> > > > (It would NOT work if you cached object files, reloaded them in a 
> > > > subsequent session and wanted to run their ctors.) So, your below call 
> > > > to `initialize()` should do it already.
> > > > 
> > > > > Should we maybe merge runCtors and addModule?
> > > > 
> > > > +1 even though there may be open questions regarding incremental 
> > > > initialization.
> > > > 
> > > > > The case we have is when there is no JIT -- currently we have such a 
> > > > > case in IncrementalProcessingTest
> > > > 
> > > > Can you run anything if there is no JIT? I think what you have in 
> > > > `IncrementalProcessing.EmitCXXGlobalInitFunc` is 
> > > > `getGlobalInit(llvm::Module*)`, which checks for symbol names with a 
> > > > specific prefix.
> > > 
> > > Yes, I'd think such a mode is useful for testing, but also for other 
> > > cases where the user is handed a Transaction* and allowed to make some 
> > > modifications before processing the `llvm::Module`.
> > > 
> > > > 
> > > > > before we runCtors we make a pass over the LLVM IR and collect some 
> > > > > specific details and (possibly change the IR and then run).
> > > > 
> > > > The idiomatic solution for such modifications would use an 
> > > > IRTransformLayer as in:
> > > > https://github.com/llvm/llvm-project/blob/13f4448ae7db1a47/llvm/examples/OrcV2Examples/LLJITWithOptimizingIRTransform/LLJITWithOptimizingIRTransform.cpp#L108
> > > 
> > > That looks very nice. It assumes the JIT is exposed to the user; here we 
> > > expose only the `llvm::Module` (I am not arguing whether that's a good 
> > > idea in general).
> > > 
> > > > 
> > > > > Another example, which will show up in future patches, is the 
> > > > > registration of atexit handlers
> > > > 
> > > > `atexit` handlers as well as global ctors/dtors should be covered by 
> > > > LLJIT PlatformSupport. The LLJITBuilder will inject a 
> > > > GenericLLVMIRPlatformSupport instance by default:
> > > > https://github.com/llvm/llvm-project/blob/13f4448ae7db1a47/llvm/lib/ExecutionEngine/Orc/LLJIT.cpp#L125
> > > 
> > > Does that give me control over when the `atexit` handlers are called? Can 
> > > the interpreter call them at its choice?
> > > 
> > > > 
> > > > It's not as comprehensive as e.g. the MachO implementation, but should 
> > > > be sufficient for your use-case as you have IR for all your JITed code. 
> > > > (It would NOT work if you cached object files, reloaded them in a 
> > > > subsequent session and wanted to run their ctors.) So, your below call 
> > > > to `initialize()` should do it already.
> > > > 
> > > 
> > > 
> > >> Should we maybe merge runCtors and addModule?
> > > +1 even though there may be open questions regarding incremental 
> > > initialization.
> > 
> > In the long term constructors should be run via the Orc runtime (currently 
> > planned for initial release in LLVM 13 later this year). I like the idea of 
> > keeping "add module" and "run initializers" as two separate steps, with 
> > initializers being run only when you execute a top level expression. It 
> > would allow for workflows like this:
> > 
> >   interpreter% :load a.cpp
> >   interpreter% :load b.cpp
> > 
> > where an initializer in a.cpp depends on code in b.cpp. It would also allow 
> > for defining constructors with forward references in the REPL itself. 
> > 
> > The Orc runtime is currently focused on emulating the usual execution 
> > environment: The canonical way to execute initializers is by calling 
> > jit_dlopen on the target JITDylib. I think the plan should be to generalize 
> > this behavior (either in the jit_dlopen contract, or by introducing a 
> > jit_dlopen_repl function) to allow for repeated calls to dlopen, with each 
> > subsequent dlopen call executing any discovered-but-not-yet-run 
> > initializers.
> > 
> > 
> > 
> > >> Does that give me control over when the atexit handlers are called? Can 
> > >> the interpreter call them at its choice?
> > > 
> > > It's not as comprehensive as e.g. the MachO implementation, but should be 
> > > sufficient for your use-case as you have IR for all your JITed code. (It 
> > > would NOT work if you cached object files, reloaded them in a subsequent 
> > > session and wanted to run their ctors.) So, your below call to 
> > > initialize() should do it already.
> > 
> > Yep -- initialize should run the constructors, which should call 
> > `__cxa_atexit`. The `__cxa_atexit` calls should be interposed by 
> > GenericLLVMIRPlatform, and the atexits run when you call 
> > LLJIT::deinitialize on the JITDylib. There are some basic regression tests 
> > for this, but it hasn't been stress tested yet.
> > 
> > GenericLLVMIRPlatform should actually support initializers in cached object 
> > files that were compiled from Modules added to LLJIT: The platform replaces 
> > llvm.global_ctors with an init function with a known name, then looks for 
> > that name in objects loaded for the cache. At least that was the plan, I 
> > don't recall whether it has actually been tested. What definitely doesn't 
> > work is running initializers in objects produced outside LLJIT. That will 
> > be fixed by JITLink/ELF and the Orc Runtime though (and already works for 
> > MachO in the runtime prototype).
> @sgraenitz, @lhames, thanks for the clarifications.
> 
> I am marking your comments as resolved (for easier tracking on my end). If 
> the intent was to change something in this patch could you elaborate a little 
> more what specifically I need to do here?
I don't think there's anything to do here -- those notes were just background 
info.
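The `__cxa_atexit` interposition described above can be pictured with a small stand-alone sketch (names like `interposed_cxa_atexit` are hypothetical; the real logic lives in GenericLLVMIRPlatformSupport and LLJIT):

```cpp
#include <algorithm>
#include <vector>

// Hypothetical stand-in for the interposition: JITed constructors call
// this instead of the real __cxa_atexit, so handlers are recorded rather
// than registered with the host process. They run later, when
// "deinitialize" is called for the owning JITDylib.
struct AtExitEntry {
  void (*Func)(void *);
  void *Arg;
  void *DSOHandle; // identifies the JITDylib
};

static std::vector<AtExitEntry> InterposedAtExits;

// Called by JITed code in place of the real __cxa_atexit.
extern "C" int interposed_cxa_atexit(void (*Func)(void *), void *Arg,
                                     void *DSOHandle) {
  InterposedAtExits.push_back({Func, Arg, DSOHandle});
  return 0;
}

// Stand-in for LLJIT::deinitialize(JD): run the recorded handlers for
// one DSO in reverse registration order, then drop them.
void deinitialize(void *DSOHandle) {
  for (auto It = InterposedAtExits.rbegin(); It != InterposedAtExits.rend();
       ++It)
    if (It->DSOHandle == DSOHandle)
      It->Func(It->Arg);
  InterposedAtExits.erase(
      std::remove_if(
          InterposedAtExits.begin(), InterposedAtExits.end(),
          [&](const AtExitEntry &E) { return E.DSOHandle == DSOHandle; }),
      InterposedAtExits.end());
}
```

This is only a sketch of the bookkeeping; the actual implementation also has to alias the `__cxa_atexit` symbol inside the JITDylib so JITed code resolves to the interposed function.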


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D96033/new/

https://reviews.llvm.org/D96033
