On Sat, 2016-04-23 at 20:21 +0200, Bernhard Reutner-Fischer wrote:
> On March 7, 2016 3:57:16 PM GMT+01:00, David Malcolm <
> dmalc...@redhat.com> wrote:
> > On Sat, 2016-03-05 at 23:46 +0100, Bernhard Reutner-Fischer wrote:
> > [...]
> > 
> > > diff --git a/gcc/fortran/misc.c b/gcc/fortran/misc.c
> > > index 405bae0..72ed311 100644
> > > --- a/gcc/fortran/misc.c
> > > +++ b/gcc/fortran/misc.c
> > [...]
> > 
> > > @@ -274,3 +275,41 @@ get_c_kind(const char
> > > *c_kind_name,teropKind_tki
> > > nds_table[])
> > >  
> > >    return ISOCBINDING_INVALID;
> > >  }
> > > +
> > > +
> > > +/* For a given name TYPO, determine the best candidate from
> > > CANDIDATES
> > > +   perusing Levenshtein distance.  Frees CANDIDATES before
> > > returning.  */
> > > +
> > > +const char *
> > > +gfc_closest_fuzzy_match (const char *typo, char **candidates)
> > > +{
> > > +  /* Determine closest match.  */
> > > +  const char *best = NULL;
> > > +  char **cand = candidates;
> > > +  edit_distance_t best_distance = MAX_EDIT_DISTANCE;
> > > +
> > > +  while (cand && *cand)
> > > +    {
> > > +      edit_distance_t dist = levenshtein_distance (typo, *cand);
> > > +      if (dist < best_distance)
> > > + {
> > > +    best_distance = dist;
> > > +    best = *cand;
> > > + }
> > > +      cand++;
> > > +    }
> > > +  /* If more than half of the letters were misspelled, the
> > > suggestion is
> > > +     likely to be meaningless.  */
> > > +  if (best)
> > > +    {
> > > +      unsigned int cutoff = MAX (strlen (typo), strlen (best)) /
> > > 2;
> > > +
> > > +      if (best_distance > cutoff)
> > > + {
> > > +   XDELETEVEC (candidates);
> > > +   return NULL;
> > > + }
> > > +      XDELETEVEC (candidates);
> > > +    }
> > > +  return best;
> > > +}
> > 
> > FWIW, there are two overloaded variants of levenshtein_distance in
> > gcc/spellcheck.h, the first of which takes a pair of strlen values;
> > your patch uses the second one:
> > 
> > extern edit_distance_t
> > levenshtein_distance (const char *s, int len_s,
> >                   const char *t, int len_t);
> > 
> > extern edit_distance_t
> > levenshtein_distance (const char *s, const char *t);
> > 
> > So one minor tweak you may want to consider here is to calculate
> >  strlen (typo)
> > once at the top of gfc_closest_fuzzy_match, and then pass it in to
> > the
> > 4-arg variant of levenshtein_distance, which would avoid
> > recalculating
> > strlen (typo) for every candidate.
> 
> I've pondered this back then but came to the conclusion to use the
> variant without len because to use the 4 argument variant I would
> have stored the candidates strlen in the vector too

Why would you need to do that?  You can simply call strlen inside the
loop instead; something like:

  size_t strlen_typo = strlen (typo);
  while (cand && *cand)
    {
      edit_distance_t dist = levenshtein_distance (typo, strlen_typo,
                                                   *cand, strlen (*cand));

etc

>  and was not convinced about the memory footprint for that would be
> justified. Maybe it is, but I would prefer the following tweak in the
> 4 argument variant:
> If you would amend the 4 argument variant with a
> 
>   if (len_t == -1)
>     len_t = strlen (t);
> before the
>    if (len_s == 0)
>      return len_t;
>    if (len_t == 0)
>      return len_s;
> 
> checks then I'd certainly use the 4 arg variant :)
> 
> WDYT?
> > 
> > I can't comment on the rest of the patch (I'm not a Fortran
> > expert),
> > though it seems sane to 
> > 
> > Hope this is constructive
> 
> It is, thanks for your thoughts!
> 
> cheers,
> 

Reply via email to