MaskRay added a comment.

Glad you took another look. I don't want to yield, let's find another reviewer 
:)



================
Comment at: clangd/FuzzyMatch.cpp:230
 void FuzzyMatcher::buildGraph() {
+  Scores[0][0][Miss] = Scores[0][0][Match] = {0, Miss};
   for (int W = 0; W < WordN; ++W) {
----------------
sammccall wrote:
> MaskRay wrote:
> > MaskRay wrote:
> > > sammccall wrote:
> > > > MaskRay wrote:
> > > > > sammccall wrote:
> > > > > > why this change?
> > > > > > this has also been moved from the cheaper constructor to the more 
> > > > > > expensive per-match call. (also the diagonal assignment added in 
> > > > > > the next loop)
> > > > > > 
> > > > > > Also, shouldn't [0][0][Match] be AwfulScore?
> > > > > > 
> > > > > "The more expensive per-match call" is just two value assignments.
> > > > > 
> > > > > I have removed the expensive table initialization in the constructor.
> > > > > 
> > > > > [0][0][*] can be any value.
> > > > > "The more expensive per-match call" is just two value assignments.
> > > > Oops, sorry - by "more expensive" I mean "called thousands of times 
> > > > more often".
> > > > 
> > > > > I have removed the expensive table initialization in the constructor.
> > > > I don't want to be rude, but I asked why you changed this, and you 
> > > > didn't answer. Unless there's a strong reason, I'd prefer to revert 
> > > > this change, as I find this harder to reason about.
> > > > (Roughly: in the old version of the code, any data that didn't need to 
> > > > change for the life of the object was initialized in the constructor. 
> > > > That way I didn't need to worry what was performance-critical and what 
> > > > wasn't - match() only did what was strictly necessary).
> > > > 
> > > > > [0][0][*] can be any value.
> > > > Can you please explain why?
> > > > Oops, sorry - by "more expensive" I mean "called thousands of times 
> > > > more often".
> > > 
> > > It is true that `Scores[0][0][Miss] = Scores[0][0][Match] = {0, Miss};` 
> > > is the cost incurred for each word.
> > > 
> > > But **it is not full table initialization**, it is just two variable 
> > > assignments. And we will assign to other values of the first row 
> > > `Scores[0][*][*]` in the following loop. The old scatters the table 
> > > construction to **two places**, the constructor and this dynamic 
> > > programming site.
> > > [0][0][*] can be any value.
> > 
> > Can you please explain why?
> > 
> > `Scores[0][0][*]` is the initial value which will be propagated to all 
> > other values in the table.
> > The relative difference of pairwise values in the table is a constant 
> > whatever initial value is chosen.
> > 
> > If you ignore the max clamp you used later, the initial value does not 
> > matter.
> > 
> > 
> Is it not possible that we'll choose a best path starting at 
> Scores[0][0][Match]?
> This is invalid, and previously that was signaled by giving that cell 
> AwfulScore, which ensures any path that emerges from it isAwful.
I think `Scores[0][0][Match]` is as valid as `Scores[0][0][Miss]`. The argument 
is that a `Miss` state should ensure a skipped position in Text. When there is 
zero position, it cannot be `Miss`ed. A dynamic programming algorithm does not 
necessarily have only one valid initial state (thinking about the constant term 
in an indefinite integral). I will choose the one which makes more sense if 
such initial state exists or if there is one that simplifies the case. In this 
case treating both of them as valid simplifies the code.


Repository:
  rCTE Clang Tools Extra

https://reviews.llvm.org/D44720



_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

Reply via email to