[PATCH] D51314: Parse compile commands lazily in InterpolatingCompilationDatabase

Sam McCall via Phabricator via cfe-commits Tue, 28 Aug 2018 01:37:17 -0700

sammccall added a comment.

Thanks for finding this problem, this fix *mostly* looks good (though I think 
we can probably drop memoization).


I'm a bit uncomfortable about the places where we need the type, because this 
is the only thing forcing us to parse before we've picked a command to 
transfer, and the number of commands we need to parse is data-dependent and 
hard to reason about.

Let me think about this a little - I suspect slightly more invasive changes 
(change the concept of type, tweak the heuristics, or do a "lightweight parse" 
to get the type) might make this cleaner and performance more predictable.



================
Comment at: lib/Tooling/InterpolatingCompilationDatabase.cpp:440
       Result.emplace_back(std::move(Command));
-      if (Result.back().Type == types::TY_INVALID)
-        Result.pop_back();
----------------
ilya-biryukov wrote:
> We can't look at 'Type' at this point anymore, because it needs parsing of 
> TranserableCommands. Not sure what's the best way to replace it. @sammccall, 
> any ideas?
> 
So filtering out this has a couple of effects
 - it's a performance optimization (don't bother indexing filenames for useless 
files). We don't need this
 - it prevents a TY_INVALID command being chosen for transfer. I'm not sure 
whether this would occur often enough to be a problem in practice.


================
Comment at: lib/Tooling/InterpolatingCompilationDatabase.cpp:133
+    assert(TraitsComputed && "calling transferTo on moved-from object");
+    const CommandTraits &T = getTraits();
+    CompileCommand Result = T.Cmd;
----------------
I think you're overthinking things with the memoization here (of course I say 
this as the person who underthought it!)

AIUI, the problem is that *eagerly* parsing all the compile commands takes 3x 
as long as reading them, which hurts startup time with big 
`compile_commands.json`.

But I think we can afford to just parse them when `transferTo` is called, 
without memoization. (Remember we only hit this code path when querying a file 
*not in the CDB*, so it should never get called in a tight loop).

The benefit of slightly reducing the latency of`getCompileCommand` for unknown 
files when we happen to pick the same template file for the second time... it's 
unclear to me, and the code would be easier to follow without it.


================
Comment at: lib/Tooling/InterpolatingCompilationDatabase.cpp:159
+  llvm::StringRef filename() const {
+    assert(TraitsComputed && "calling filename on moved-from object");
+    return OriginalCmd.Filename;
----------------
Is this so important to dynamically check? Most types don't.


================
Comment at: lib/Tooling/InterpolatingCompilationDatabase.cpp:171
+  // compute, so we only compute on first access.
+  struct CommandTraits {
+    // Flags that should not apply to all files are stripped from CommandLine.
----------------
Traits is a bit vague, and a bit template-nightmare-y!
maybe ParsedCommand?



================
Comment at: lib/Tooling/InterpolatingCompilationDatabase.cpp:383
       S.Preferred = PreferredLanguage == types::TY_INVALID ||
-                    PreferredLanguage == Commands[S.Index].Type;
+                    PreferredLanguage == Commands[S.Index].type();
       S.Points = Candidate.second;
----------------
I think this is going to force parsing of all candidates that get any points at 
all, with a flat directory structure this could be quite a lot :-(


================
Comment at: lib/Tooling/InterpolatingCompilationDatabase.cpp:383
       S.Preferred = PreferredLanguage == types::TY_INVALID ||
-                    PreferredLanguage == Commands[S.Index].Type;
+                    PreferredLanguage == Commands[S.Index].type();
       S.Points = Candidate.second;
----------------
sammccall wrote:
> I think this is going to force parsing of all candidates that get any points 
> at all, with a flat directory structure this could be quite a lot :-(
ah, now I see that the memoization also allows us to pretend that this is an 
eagerly computed value, without considering exactly when it's computed.

I'm not sure I like this if we do consider it performance sensitive - it 
obfuscates exactly which set of commands we parse, it would be nice if we were 
upfront about this.


Repository:
  rC Clang

https://reviews.llvm.org/D51314



_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D51314: Parse compile commands lazily in InterpolatingCompilationDatabase

Reply via email to