[PATCH] D94472: [WIP][clang][cli] Command line round-trip for HeaderSearch options

Duncan P. N. Exon Smith via Phabricator via cfe-commits Tue, 19 Jan 2021 14:01:09 -0800

dexonsmith added a comment.

In D94472#2507361 <https://reviews.llvm.org/D94472#2507361>, @jansvoboda11 
wrote:


> Thanks for putting your time into the comment, @dexonsmith.
>
> I agree that adding a `-round-trip-args` flag to enable/disable 
> round-tripping is more sensible than hard-coding it via something like 
> `#ifndef NDEBUG`.
> Providing `-round-trip-args-debug-mode` to help with debugging would be a 
> great addition too. Thanks for pointing that out.
>
> I would be interested to hear why exactly are you concerned by the current 
> approach being too low level.
> From my point of view, having to opt-in into round-tripping a part of 
> `CompilerInvocation` is a feature and I think it should work relatively well 
> with the current roll-out plan:
>
> 1. Write documentation on porting existing options to the new marshalling 
> system.
> 2. Send a message to the mailing list pointing out that we'll be enabling 
> round-tripping in assert builds by default and what that means. Downstream 
> projects can prepare to port their custom options or disable the 
> round-tripping in assert builds.
> 3. Some time later, commit this patch. Downstream projects already have 
> enough information how to deal with the change. Because we've enabled 
> round-tripping only for `HeaderSearchOptions`, adopting it isn't that 
> time-consuming if a downstream project decides to do so.
> 4. (Any patches that add a command line option affecting 
> `HeaderSearchOptions` will from now on be forced to include the 
> generating/marshalling code.)
> 5. Slowly commit following patches, each enabling round-tripping for another 
> chunk of `CompilerInvocation`.
> 6. After round-tripping has been enabled for all of `CompilerInvocation`, 
> simplify the code (remove `ParseHS`, `GenerateHS`, `ResetHS`, etc.) and 
> round-trip everything (when told to do so with `-round-trip-args`).
>
> I'd be interested in hearing your thoughts on that ^.

I have three concerns with the approach:

1. Churn. This adds a lot of code that will eventually be removed / refactored 
away. One example is that shifting the `NDEBUG` logic to the driver requires 
threading a Boolean (or tristate/enum) for round-trip mode through a few 
layers; once we're finished, we'll end up stripping that argument back out. A 
second example is that this seems to require splitting up 
`OPTIONS_WITH_MARSHALLING` for each option. Once this work is finished, someone 
will need to clean them up / unify them, or maybe they'll unnecessarily stay 
split / duplicated.
2. Boilerplate. Somewhat related to churn; there's a fair bit of additional 
boilerplate required in this approach. This makes it harder to read / 
understand / modify the code.
3. Correctness. I'm not sure ResetHS is quite right. It'll probably work for a 
normal command-line `-cc1` invocation, but perhaps not for all tooling 
applications, since it's changing the pointer identity of `HeaderSearchOptions`.

On the other hand, one thing I really like about your approach, which I don't 
see a way of getting with a top-level check, is the ability to turn on "strict" 
mode for subsets of the command-line, which helps to hold the line in the face 
of new options being added (I think this is the feature you're (rightly) 
calling out). I'm not sure how important it is in practice, as long as we still 
think getting them all is going to happen in a reasonable time frame. There 
aren't that many new options being added: filtering out `[cli]` (your patches), 
`[flang]`, `[[D]driver]`, reverts, and relandings, there are 39 commits to 
Options.td in the last three months (3 / week). Some of those are deleting 
options or changing values, and not important to catch. Even for new options, I 
imagine most people copy/paste nearby options. I think it's critical to hold 
the line once we have them all, but until then I'm not sure.

Both approaches seem possible for downstreams to adapt / adopt in their own 
time, although reverting a one line patch switching from "relaxed" to "strict" 
would be easier than maintaining all the Parse/Generate/Reset infrastructure 
after upstream deletes it.

(I'm mostly concerned about #1 and #2; I assume #3 can be fixed somehow, 
although I haven't thought about how.)

> To help me understand the reasoning behind your proposal, can you answer 
> these questions?
>
> 2. What does `GeneratedArgs1 != GeneratedArgs2` prove? If we forget to 
> generate some option, it will be in neither `GeneratedArgs1` nor 
> `GeneratedArgs2`. We'll catch that only in "strict" mode anyways. I guess 
> this can help us discover other kinds of lossy generation, but I think the 
> guarantees of "strict" mode are nicer.

Firstly, `GeneratedArgs1 == GeneratedArgs2` proves that generate is an exact 
inversion of parse, for the subset of args that are handled by the generator 
(eventually, this will be all args). IOW, it's adding coverage (in one 
direction) that parse+generate "match" for every option that has generation 
code. Because of the automation provided by the marshalling infrastructure, I'd 
expect this to mostly catch bugs in hand-written code (custom marshalling code, 
manual generation code, or end-of-parse cleanup / follow-up code). Secondly, 
this adds coverage that command-line generation is deterministic.

`strict` mode additionally uses the `GeneratedArgs1` to fill 
CompilerInvocation, indirectly checking both directions by requiring tests to 
pass both with and without this round-trip. However, during development people 
generally only run the tests one way and the failure mode won't be ideal. If a 
new option has been left out entirely, it'll usually be clear enough what went 
wrong and how to fix it; most people build with asserts in development, and the 
option won't work until they add generation code. But failures from a 
mismatched parse and generate might not get caught until you compare an Asserts 
build to a NoAsserts build, which may not happen until the bots run. This kind 
of failure is also difficult to debug / understand, compared to the assertion 
on once- and twice-generated args. IOW, even though `GeneratedArgs1 == 
GeneratedArgs2` doesn't catch all issues, it's probably an easier failure to 
respond to when it does fail.

> 1. What is the benefit or "relaxed" round-tripping compared to the selective 
> strict round-tripping in this patch?

Mainly, less churn, less boilerplate, and no correctness problem to solve for 
Reset. I assume you can adapt what you have to also check `GeneratedArgs1 == 
GeneratedArgs2` (it'd be nice to have this at the top level, so you get this 
check even for options where selective strict mode hasn't yet been enabled), 
but if that's harder than it sounds then lack of that would miss some coverage 
and make debugging harder, as described above.



================
Comment at: clang/lib/Frontend/CompilerInvocation.cpp:3211-3214
+  auto ResetHS = [](CompilerInvocation &Res) {
+    auto EmptyHeaderSearchOpts = std::make_shared<HeaderSearchOptions>();
+    Res.HeaderSearchOpts.swap(EmptyHeaderSearchOpts);
+  };
----------------
I'm not sure this works correctly. Callers may depend on the identity of the 
`HeaderSearchOptions`. It's a shared pointer that could have other references. 
This will only update / reset one reference.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D94472/new/

https://reviews.llvm.org/D94472

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D94472: [WIP][clang][cli] Command line round-trip for HeaderSearch options

Reply via email to