On Sat, 2016-04-23 at 20:21 +0200, Bernhard Reutner-Fischer wrote: > On March 7, 2016 3:57:16 PM GMT+01:00, David Malcolm < > dmalc...@redhat.com> wrote: > > On Sat, 2016-03-05 at 23:46 +0100, Bernhard Reutner-Fischer wrote: > > [...] > > > > > diff --git a/gcc/fortran/misc.c b/gcc/fortran/misc.c > > > index 405bae0..72ed311 100644 > > > --- a/gcc/fortran/misc.c > > > +++ b/gcc/fortran/misc.c > > [...] > > > > > @@ -274,3 +275,41 @@ get_c_kind(const char > > > *c_kind_name,teropKind_tki > > > nds_table[]) > > > > > > return ISOCBINDING_INVALID; > > > } > > > + > > > + > > > +/* For a given name TYPO, determine the best candidate from > > > CANDIDATES > > > + perusing Levenshtein distance. Frees CANDIDATES before > > > returning. */ > > > + > > > +const char * > > > +gfc_closest_fuzzy_match (const char *typo, char **candidates) > > > +{ > > > + /* Determine closest match. */ > > > + const char *best = NULL; > > > + char **cand = candidates; > > > + edit_distance_t best_distance = MAX_EDIT_DISTANCE; > > > + > > > + while (cand && *cand) > > > + { > > > + edit_distance_t dist = levenshtein_distance (typo, *cand); > > > + if (dist < best_distance) > > > + { > > > + best_distance = dist; > > > + best = *cand; > > > + } > > > + cand++; > > > + } > > > + /* If more than half of the letters were misspelled, the > > > suggestion is > > > + likely to be meaningless. */ > > > + if (best) > > > + { > > > + unsigned int cutoff = MAX (strlen (typo), strlen (best)) / > > > 2; > > > + > > > + if (best_distance > cutoff) > > > + { > > > + XDELETEVEC (candidates); > > > + return NULL; > > > + } > > > + XDELETEVEC (candidates); > > > + } > > > + return best; > > > +} > > > > FWIW, there are two overloaded variants of levenshtein_distance in > > gcc/spellcheck.h, the first of which takes a pair of strlen values; > > your patch uses the second one: > > > > extern edit_distance_t > > levenshtein_distance (const char *s, int len_s, > > const char *t, int len_t); > > > > extern edit_distance_t > > levenshtein_distance (const char *s, const char *t); > > > > So one minor tweak you may want to consider here is to calculate > > strlen (typo) > > once at the top of gfc_closest_fuzzy_match, and then pass it in to > > the > > 4-arg variant of levenshtein_distance, which would avoid > > recalculating > > strlen (typo) for every candidate. > > I've pondered this back then but came to the conclusion to use the > variant without len because to use the 4 argument variant I would > have stored the candidates strlen in the vector too
Why would you need to do that? You can simply call strlen inside the loop instead; something like: size_t strlen_typo = strlen (typo); while (cand && *cand) { edit_distance_t dist = levenshtein_distance (typo, strlen_typo, *cand, strlen (*cand)); etc > and was not convinced about the memory footprint for that would be > justified. Maybe it is, but I would prefer the following tweak in the > 4 argument variant: > If you would amend the 4 argument variant with a > > if (len_t == -1) > len_t = strlen (t); > before the > if (len_s == 0) > return len_t; > if (len_t == 0) > return len_s; > > checks then I'd certainly use the 4 arg variant :) > > WDYT? > > > > I can't comment on the rest of the patch (I'm not a Fortran > > expert), > > though it seems sane to > > > > Hope this is constructive > > It is, thanks for your thoughts! > > cheers, >