On Sat, 2016-04-23 at 20:21 +0200, Bernhard Reutner-Fischer wrote:
> On March 7, 2016 3:57:16 PM GMT+01:00, David Malcolm <
> [email protected]> wrote:
> > On Sat, 2016-03-05 at 23:46 +0100, Bernhard Reutner-Fischer wrote:
> > [...]
> >
> > > diff --git a/gcc/fortran/misc.c b/gcc/fortran/misc.c
> > > index 405bae0..72ed311 100644
> > > --- a/gcc/fortran/misc.c
> > > +++ b/gcc/fortran/misc.c
> > [...]
> >
> > > @@ -274,3 +275,41 @@ get_c_kind(const char
> > > *c_kind_name,teropKind_tki
> > > nds_table[])
> > >
> > > return ISOCBINDING_INVALID;
> > > }
> > > +
> > > +
> > > +/* For a given name TYPO, determine the best candidate from
> > > CANDIDATES
> > > + perusing Levenshtein distance. Frees CANDIDATES before
> > > returning. */
> > > +
> > > +const char *
> > > +gfc_closest_fuzzy_match (const char *typo, char **candidates)
> > > +{
> > > + /* Determine closest match. */
> > > + const char *best = NULL;
> > > + char **cand = candidates;
> > > + edit_distance_t best_distance = MAX_EDIT_DISTANCE;
> > > +
> > > + while (cand && *cand)
> > > + {
> > > + edit_distance_t dist = levenshtein_distance (typo, *cand);
> > > + if (dist < best_distance)
> > > + {
> > > + best_distance = dist;
> > > + best = *cand;
> > > + }
> > > + cand++;
> > > + }
> > > + /* If more than half of the letters were misspelled, the
> > > suggestion is
> > > + likely to be meaningless. */
> > > + if (best)
> > > + {
> > > + unsigned int cutoff = MAX (strlen (typo), strlen (best)) /
> > > 2;
> > > +
> > > + if (best_distance > cutoff)
> > > + {
> > > + XDELETEVEC (candidates);
> > > + return NULL;
> > > + }
> > > + XDELETEVEC (candidates);
> > > + }
> > > + return best;
> > > +}
> >
> > FWIW, there are two overloaded variants of levenshtein_distance in
> > gcc/spellcheck.h, the first of which takes a pair of strlen values;
> > your patch uses the second one:
> >
> > extern edit_distance_t
> > levenshtein_distance (const char *s, int len_s,
> > const char *t, int len_t);
> >
> > extern edit_distance_t
> > levenshtein_distance (const char *s, const char *t);
> >
> > So one minor tweak you may want to consider here is to calculate
> > strlen (typo)
> > once at the top of gfc_closest_fuzzy_match, and then pass it in to
> > the
> > 4-arg variant of levenshtein_distance, which would avoid
> > recalculating
> > strlen (typo) for every candidate.
>
> I've pondered this back then but came to the conclusion to use the
> variant without len because to use the 4 argument variant I would
> have stored the candidates strlen in the vector too
Why would you need to do that? You can simply call strlen inside the
loop instead; something like:
size_t strlen_typo = strlen (typo);
while (cand && *cand)
{
edit_distance_t dist = levenshtein_distance (typo, strlen_typo,
*cand, strlen (*cand));
etc
> and was not convinced about the memory footprint for that would be
> justified. Maybe it is, but I would prefer the following tweak in the
> 4 argument variant:
> If you would amend the 4 argument variant with a
>
> if (len_t == -1)
> len_t = strlen (t);
> before the
> if (len_s == 0)
> return len_t;
> if (len_t == 0)
> return len_s;
>
> checks then I'd certainly use the 4 arg variant :)
>
> WDYT?
> >
> > I can't comment on the rest of the patch (I'm not a Fortran
> > expert),
> > though it seems sane to
> >
> > Hope this is constructive
>
> It is, thanks for your thoughts!
>
> cheers,
>