ken-matsui added inline comments.
================ Comment at: clang/test/Preprocessor/suggest-typoed-directive.c:10 +// expected-warning@+11 {{'#elfidef' directive not found, did you mean '#elifdef'?}} +// expected-warning@+11 {{'#elfindef' directive not found, did you mean '#elifdef'?}} +// expected-warning@+11 {{'#elsi' directive not found, did you mean '#else'?}} ---------------- aaron.ballman wrote: > ken-matsui wrote: > > aaron.ballman wrote: > > > erichkeane wrote: > > > > aaron.ballman wrote: > > > > > ken-matsui wrote: > > > > > > aaron.ballman wrote: > > > > > > > ken-matsui wrote: > > > > > > > > aaron.ballman wrote: > > > > > > > > > ken-matsui wrote: > > > > > > > > > > aaron.ballman wrote: > > > > > > > > > > > It's interesting that this one suggested `#elifdef` > > > > > > > > > > > instead of `#elifndef` -- is there anything that can be > > > > > > > > > > > done for that? > > > > > > > > > > > > > > > > > > > > > > Also, one somewhat interesting question is whether we > > > > > > > > > > > want to recommend `#elifdef` and `#elifndef` outside of > > > > > > > > > > > C2x/C++2b mode. Those directives only exist in the latest > > > > > > > > > > > language standard, but Clang supports them as a > > > > > > > > > > > conforming extension in all language modes. Given that > > > > > > > > > > > this diagnostic is about typos, I think I'm okay > > > > > > > > > > > suggesting the directives even in older language modes. > > > > > > > > > > > That's as likely to be a correct suggestion as not, IMO. > > > > > > > > > > > It's interesting that this one suggested `#elifdef` > > > > > > > > > > > instead of `#elifndef` -- is there anything that can be > > > > > > > > > > > done for that? > > > > > > > > > > > > > > > > > > > > I found I have to use `std::min_element` instead of > > > > > > > > > > `std::max_element` because we are finding the nearest (most > > > > > > > > > > minimum distance) string. Meanwhile, `#elfindef` has 2 > > > > > > > > > > distance with both `#elifdef` and `#elifndef`. After > > > > > > > > > > replacing `std::max_element` with `std::min_element`, I > > > > > > > > > > could suggest `#elifndef` from `#elfinndef`. > > > > > > > > > > > > > > > > > > > > > Also, one somewhat interesting question is whether we > > > > > > > > > > > want to recommend `#elifdef` and `#elifndef` outside of > > > > > > > > > > > C2x/C++2b mode. Those directives only exist in the latest > > > > > > > > > > > language standard, but Clang supports them as a > > > > > > > > > > > conforming extension in all language modes. Given that > > > > > > > > > > > this diagnostic is about typos, I think I'm okay > > > > > > > > > > > suggesting the directives even in older language modes. > > > > > > > > > > > That's as likely to be a correct suggestion as not, IMO. > > > > > > > > > > > > > > > > > > > > I agree with you because Clang implements those directives, > > > > > > > > > > and the suggested code will also be compilable. I > > > > > > > > > > personally think only not compilable suggestions should be > > > > > > > > > > avoided. (Or, we might place additional info for outside of > > > > > > > > > > C2x/C++2b mode like `this is a C2x/C++2b feature but > > > > > > > > > > compilable on Clang`?) > > > > > > > > > > > > > > > > > > > > --- > > > > > > > > > > > > > > > > > > > > According to the algorithm of `std::min_element`, we only > > > > > > > > > > get an iterator of the first element even if there is > > > > > > > > > > another element that has the same distance. So, `#elfindef` > > > > > > > > > > only suggests `#elifdef` in accordance with the order of > > > > > > > > > > `Candidates`, and I don't think it is beautiful to depend > > > > > > > > > > on the order of candidates. I would say that we can suggest > > > > > > > > > > all the same distance like the following, but I'm not sure > > > > > > > > > > this is preferable: > > > > > > > > > > > > > > > > > > > > ``` > > > > > > > > > > #elfindef // diag: unknown directive, did you mean #elifdef > > > > > > > > > > or #elifndef? > > > > > > > > > > ``` > > > > > > > > > > > > > > > > > > > > I agree with you because Clang implements those directives, > > > > > > > > > > and the suggested code will also be compilable. I > > > > > > > > > > personally think only not compilable suggestions should be > > > > > > > > > > avoided. (Or, we might place additional info for outside of > > > > > > > > > > C2x/C++2b mode like this is a C2x/C++2b feature but > > > > > > > > > > compilable on Clang?) > > > > > > > > > > > > > > > > > > I may be changing my mind on this a bit. I now see we don't > > > > > > > > > issue an extension warning when using `#elifdef` or > > > > > > > > > `#elifndef` in older language modes. That means suggesting > > > > > > > > > those will be correct but only for Clang, and the user won't > > > > > > > > > have any other diagnostics to tell them about the portability > > > > > > > > > issue. And those particular macros are reasonably likely to > > > > > > > > > be used in a header where the user is trying to aim for > > > > > > > > > portability. So I'm starting to think we should only suggest > > > > > > > > > those two in C2x mode (and we should probably add a > > > > > > > > > portability warning for #elifdef and #elifndef, so I filed: > > > > > > > > > https://github.com/llvm/llvm-project/issues/55306) > > > > > > > > > > > > > > > > > > > I would say that we can suggest all the same distance like > > > > > > > > > > the following, but I'm not sure this is preferable: > > > > > > > > > > > > > > > > > > The way we typically handle this is to attach FixIt hints to > > > > > > > > > a note instead of to the diagnostic. This way, automatic > > > > > > > > > fixes aren't applied (because there are multiple choices to > > > > > > > > > pick from) but the user is still able to apply whichever fix > > > > > > > > > they want in an IDE or other tool. It might be worth trying > > > > > > > > > that approach (e.g., if there's only one candidate, attach it > > > > > > > > > to the warning, and if there are two or more, emit a warning > > > > > > > > > without a "did you mean" in it and use a new note for the > > > > > > > > > fixit. e.g., > > > > > > > > > ``` > > > > > > > > > #elfindef // expected-warning {{invalid preprocessing > > > > > > > > > directive}} \ > > > > > > > > > expected-note {{did you mean '#elifdef'?}} \ > > > > > > > > > expected-note {{did you mean '#elifndef'?}} > > > > > > > > > ``` > > > > > > > > > WDYT? > > > > > > > > > I may be changing my mind on this a bit. I now see we don't > > > > > > > > > issue an extension warning when using `#elifdef` or > > > > > > > > > `#elifndef` in older language modes. That means suggesting > > > > > > > > > those will be correct but only for Clang, and the user won't > > > > > > > > > have any other diagnostics to tell them about the portability > > > > > > > > > issue. And those particular macros are reasonably likely to > > > > > > > > > be used in a header where the user is trying to aim for > > > > > > > > > portability. So I'm starting to think we should only suggest > > > > > > > > > those two in C2x mode (and we should probably add a > > > > > > > > > portability warning for #elifdef and #elifndef, so I filed: > > > > > > > > > https://github.com/llvm/llvm-project/issues/55306) > > > > > > > > > > > > > > > > > > > > > > > > > Certainly, it would be less confusing to the user to avoid > > > > > > > > suggestions regarding those two. > > > > > > > > I'm going to fix my code to avoid suggesting them in not C2x > > > > > > > > mode. > > > > > > > > > > > > > > > > Thank you for submitting the issue, I also would like to work > > > > > > > > on it. > > > > > > > > > > > > > > > > > The way we typically handle this is to attach FixIt hints to > > > > > > > > > a note instead of to the diagnostic. This way, automatic > > > > > > > > > fixes aren't applied (because there are multiple choices to > > > > > > > > > pick from) but the user is still able to apply whichever fix > > > > > > > > > they want in an IDE or other tool. It might be worth trying > > > > > > > > > that approach (e.g., if there's only one candidate, attach it > > > > > > > > > to the warning, and if there are two or more, emit a warning > > > > > > > > > without a "did you mean" in it and use a new note for the > > > > > > > > > fixit. e.g., > > > > > > > > > ``` > > > > > > > > > #elfindef // expected-warning {{invalid preprocessing > > > > > > > > > directive}} \ > > > > > > > > > expected-note {{did you mean '#elifdef'?}} \ > > > > > > > > > expected-note {{did you mean '#elifndef'?}} > > > > > > > > > ``` > > > > > > > > > WDYT? > > > > > > > > > > > > > > > > This is really cool, but don't you care about the redundancy of > > > > > > > > `did you mean` in terms of the llvm team culture? > > > > > > > > If not, I will implement notes like the above. > > > > > > > > Certainly, it would be less confusing to the user to avoid > > > > > > > > suggestions regarding those two. I'm going to fix my code to > > > > > > > > avoid suggesting them in not C2x mode. > > > > > > > > > > > > > > +1, thank you! > > > > > > > > > > > > > > > This is really cool, but don't you care about the redundancy of > > > > > > > > did you mean in terms of the llvm team culture? If not, I will > > > > > > > > implement notes like the above. > > > > > > > > > > > > > > I would care if the list were potentially unbounded (like, say, > > > > > > > with identifiers in general), but because we know this list will > > > > > > > only have a max of two entries on it in this case, it seems > > > > > > > reasonable to me. I double-checked with @erichkeane to see if he > > > > > > > thought it would be an issue, and he agreed that it being a fixed > > > > > > > list makes it pretty reasonable. > > > > > > > > > > > > > > However, that discussion did raise a question -- why are there > > > > > > > two suggestions? elifdef requires a swap + delete and elifndef > > > > > > > requires just a swap, so we would have thought that it would have > > > > > > > been the only option in the list. > > > > > > With the implementation of Lev distances used in llvm, I could > > > > > > simply suggest `#elifdef` from `#elfidef` and `#elifndef` from > > > > > > `#elfindef`. > > > > > > > > > > > > So, in this situation, do you think that we still need to add two > > > > > > notes for conflicted distances? > > > > > No, let's skip the two note behavior. If we find ourselves with > > > > > multiple suggestions, we'll just leave off the "did you mean?" part > > > > > of the diagnostic entirely. > > > > Might I suggest we just emit the 'first' suggestion? This isn't really > > > > perfect, but I would think that our users put very little thought into > > > > it when we suggest the wrong thing. > > > Notionally that makes sense, but the situation I'm worried about is when > > > users pass `-fixit` to the compiler -- then it automatically does > > > whatever, but we have no idea whether that whatever is reasonable or not > > > when we have more than one option. > > I could suggest `#if` from `#id`, but I had to change `#elfindef` to > > `#elfinndef` to suggest `#elifndef`. > > After setting the `AllowReplacements` option to `true`, `#elfindef` > > suggested `#elifdef` (I think it might have the same distance with > > `#elifndef`). > Thanks! I'm coming around more to the idea that we'll never get the "perfect" > set of suggestions, and so whatever suggestions we get are quite likely good > enough. However, I think it would be helpful to also add the tests where the > behavior seems a bit strange, like `#elfindef`, just to show that we know the > behavior is what it is and it doesn't do bad things like crash or whatnot. Thank you :) Ok, I'll add the test as well. ================ Comment at: llvm/lib/Support/StringRef.cpp:102 +// Find a similar string in `Candidates`. +Optional<std::string> StringRef::find_similar_str(const std::vector<StringRef> &Candidates, size_t Dist) const { + // We need to check if `rng` has the exact case-insensitive string because the Levenshtein distance match does not ---------------- aaron.ballman wrote: > I am less convinced that we want this as a general interface on `StringRef`. > I think callers need to decide whether something is similar enough or not. > For example, case sensitivity may not matter for this case but it might > matter to another case. Others may wish to limit the max edit_distance for > each of the candidates for performance reasons, others may not. We already > saw there were questions about whether to allow replacements or not. That > sort of thing. > > I think this functionality should be moved to PPDirectives.cpp as a static > helper function that's specific to this use. Also, because this returns one > of the candidates passed in, and those are a `StringRef`, I think this > function should return a `StringRef` -- this removes allocation costs. I'd > also drop the `Dist` parameter as the function is no longer intended to be a > general purpose one. I am also in favor of it, but where should I put tests for this functionality? Is it possible to write unit tests for static functions implemented in `.cpp` files? ================ Comment at: llvm/lib/Support/StringRef.cpp:118-135 + std::vector<std::pair<std::string, size_t>> Cand; + for (StringRef C : Candidates) { + size_t CurDist = edit_distance(C, true); + if (CurDist <= MaxDist) { + Cand.emplace_back(C, CurDist); + } + } ---------------- aaron.ballman wrote: > I think we can skip the vector entirely by keeping track of the min edit > distance we've seen thus far and which candidate that corresponds to when > looping over the candidates. :+1: Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D124726/new/ https://reviews.llvm.org/D124726 _______________________________________________ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits