sammccall added inline comments.
================
Comment at: clangd/FuzzyMatch.cpp:96
return None;
return ScoreScale * std::min(PerfectBonus * PatN, std::max<int>(0, Best));
}
----------------
MaskRay wrote:
> sammccall wrote:
> > MaskRay wrote:
> > > I also don't understand why it clamps the value to zero here. Negative
> > > values are also meaningful to me. Given that perfectBonus is only 3 it is
> > > very easy to get a negative value here.
> > An important part of the contract of `match()` is that it returns a value
> > in `[0,1]`.
> > We rely on this range to combine this with other scoring signals - we
> > multiply this by a quality signal in code completion.
> > (Currently the quality signal is just derived from Sema, but the global
> > index will provide the number of uses).
> >
> > It would be possible to use a different squash function here, but I found
> > max(kFloor,x) worked well for the examples I looked at - anything <= some
> > floor value was "not really a useful match at all", and most of the
> > variance below the floor seemed to be noise to me.
> > (Then I tuned the bonuses/penalties so the floor was at zero)
> We could try other criteria in the future. I believe the current one can be
> improved because negative scores may be returned but the scoring shouldn't
> return 0 for all the cases.
Sure, we can try other things, and to gather more data.
(To be clear though - with the data I *did* look at, including the scores <0
did not add more information, only noise)
================
Comment at: clangd/FuzzyMatch.cpp:230
void FuzzyMatcher::buildGraph() {
+ Scores[0][0][Miss] = Scores[0][0][Match] = {0, Miss};
for (int W = 0; W < WordN; ++W) {
----------------
MaskRay wrote:
> sammccall wrote:
> > why this change?
> > this has also been moved from the cheaper constructor to the more expensive
> > per-match call. (also the diagonal assignment added in the next loop)
> >
> > Also, shouldn't [0][0][Match] be AwfulScore?
> >
> "The more expensive per-match call" is just two value assignments.
>
> I have removed the expensive table initialization in the constructor.
>
> [0][0][*] can be any value.
> "The more expensive per-match call" is just two value assignments.
Oops, sorry - by "more expensive" I mean "called thousands of times more often".
> I have removed the expensive table initialization in the constructor.
I don't want to be rude, but I asked why you changed this, and you didn't
answer. Unless there's a strong reason, I'd prefer to revert this change, as I
find this harder to reason about.
(Roughly: in the old version of the code, any data that didn't need to change
for the life of the object was initialized in the constructor. That way I
didn't need to worry what was performance-critical and what wasn't - match()
only did what was strictly necessary).
> [0][0][*] can be any value.
Can you please explain why?
================
Comment at: clangd/FuzzyMatch.cpp:325
+ int W = PatN;
+ for (int I = PatN; ++I <= WordN; )
+ if (Scores[PatN][I][Match].Score > Scores[PatN][W][Match].Score)
----------------
MaskRay wrote:
> sammccall wrote:
> > nit: I -> P, move increment to the increment expression of the for loop?
> I -> P.
>
> > move increment to the increment expression of the for loop?
>
> Not sure about the coding standard here, but if you insist I'll have to
> change it as you are the reviewer. If the loop variable was an iterator, `for
> (It I = std::next(...); I != E; ++I)` would be uglier than `for (It I = ...;
> ++I != E; )`
Uglier is subjective, but side-effects in the condition of a for-loop is
sufficiently unusual and surprising that I'd prefer to avoid it in both cases.
================
Comment at: clangd/FuzzyMatch.cpp:340
+ A[WordN] = Miss;
+ for (int I = WordN, P = PatN; I > 0; I--) {
+ if (I == W)
----------------
MaskRay wrote:
> sammccall wrote:
> > sammccall wrote:
> > > sammccall wrote:
> > > > W is the right name in this file for a variable iterating over word
> > > > indices, please don't change this.
> > > > The new variable above could be EndW or so?
> > > As far as I can see, this loop is setting `A[W+1:...] = Miss` and
> > > populating `A[0...W]` with the exsting logic.
> > > I think this would be clearer as two loops, currently there's a lot of
> > > conditionals around Last that obscure what's actually happening.
> > You've shifted P (and the old W, new I) by 1. This does reduce the number
> > of +1 and -1 in this function, but it's inconsistent with how these are
> > used elsewhere: P should be the index into Pat of the character that we're
> > considering.
> I don't understand the rationale not to use the shifted indices. The code
> actually use `Scores[P][W][*]` to mean the optimal match of the first `P`
> characters of the pattern with the first `W` characters of the word, not the
> position of the character.
>
> On the other hand, C++ reverse iterators use the shifted one for `for (I =
> rend(); I != rbegin(); ++I)`. The shifted one makes ending condition check
> easier.
> I don't understand the rationale not to use the shifted indices
The rationale is entirely consistency with the surrounding code. The
consistency helps avoid off-by-one errors when similar loops have different
conventions.
In this file, when looping over word or pattern dimensions, P and W
respectively are used for loop variables, and can be interpreted as indices
into Pat/Word.
Here the interpretation would be "did we match or miss character Word[W]"
================
Comment at: clangd/FuzzyMatch.cpp:93
+ for (int I = PatN; I <= WordN; I++)
+ Best = std::max(Best, Scores[PatN][I][Match].Score);
if (isAwful(Best))
----------------
MaskRay wrote:
> sammccall wrote:
> > MaskRay wrote:
> > > sammccall wrote:
> > > > this looks like a behavior change - why?
> > > This is a behavior change. Instead of choosing between `Match/Miss` in
> > > the last position, we enumerate the last matching position in `Word`.
> > >
> > > This saves `if (P < PatN - 1) {` check in the main loop at the cost of a
> > > for loop here (use sites of ending values)
> > Ah, I see - the case where we match only part of the word is handled up
> > here now.
> > (I think you mean this is not a behavior change? The result is the same
> > AFAICS)
> >
> > That does make more sense, but it's pretty subtle.
> > Can you add a comment like
> > `// The pattern doesn't have to match the whole word (but the whole
> > pattern must match).`
> Added
> ```
> // Find the optimal prefix of Word to match Pattern.
> ```
>
> I meant this is a behavior change but it makes the first row and the rest
> rows of the score table more consistent.
That comment really doesn't capture what's significant about this line - it's
the *policy*, rather than the mechanism, that needs highlighting here.
(Re: behavior change - I *think* there's no inputs for which we produce a
different match result/score because of this patch, right?)
================
Comment at: clangd/FuzzyMatch.h:61
bool allowMatch(int P, int W) const;
- int skipPenalty(int W, Action Last) const;
- int matchBonus(int P, int W, Action Last) const;
+ int missScore(int W, Action Last) const;
+ int matchScore(int P, int W, Action Last) const;
----------------
MaskRay wrote:
> MaskRay wrote:
> > sammccall wrote:
> > > FWIW, I don't think this is an improvement - I think the clarity of
> > > purpose in names is more useful than having consistent signs in this case.
> > Keep `matchBonus` but rename `skipPenalty` to `missPenalty` ?
> Also note in the original scheme, the skip score does not need to be
> negative. Because Scores[PatN][WordN][] is used and each path takes the same
> number of positions (WordN). We can add an offset to all positional scores to
> make all of them non-negative. In the new scheme it does make sense to make
> them negative, as each path has now different number of positions.
missPenalty SGTM.
(I don't see any particular reason to avoid negative numbers here - it has a
natural interpretation: a positive increment means the match is better than if
that action wasn't taken, negative means it's worse, etc)
Repository:
rCTE Clang Tools Extra
https://reviews.llvm.org/D44720
_______________________________________________
cfe-commits mailing list
[email protected]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits