Thanks, Walter. Are there any open-source spell checkers that implement the Peter Norvig or Damerau-Levenshtein algorithms? I'm short on time, so I have to keep the custom coding to a minimum.
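
For anyone following the thread, the transposition-aware edit distance discussed in the quoted reply below can be sketched in a few lines of Python. This is only a minimal illustration of the "optimal string alignment" variant of Damerau-Levenshtein (an adjacent transposition counts as one edit). It is not code from Solr, Lucene, or the Norvig corrector, and the names osa_distance and suggest, along with the sample word list, are hypothetical.

```python
def osa_distance(a: str, b: str) -> int:
    """Optimal string alignment distance (restricted Damerau-Levenshtein)."""
    rows, cols = len(a) + 1, len(b) + 1
    # dist[i][j] = number of edits needed to turn a[:i] into b[:j]
    dist = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        dist[i][0] = i  # delete every character of a[:i]
    for j in range(cols):
        dist[0][j] = j  # insert every character of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # deletion
                dist[i][j - 1] + 1,         # insertion
                dist[i - 1][j - 1] + cost,  # substitution (or match)
            )
            # Adjacent transposition, e.g. "eh" <-> "he"
            if (i > 1 and j > 1
                    and a[i - 1] == b[j - 2]
                    and a[i - 2] == b[j - 1]):
                dist[i][j] = min(dist[i][j], dist[i - 2][j - 2] + 1)
    return dist[-1][-1]


def suggest(word, dictionary, max_distance=2):
    """Return dictionary terms within max_distance edits, closest first.

    Hypothetical helper: a brute-force scan over the whole word list.
    """
    scored = sorted((osa_distance(word, term), term) for term in dictionary)
    return [term for d, term in scored if d <= max_distance]


if __name__ == "__main__":
    terms = ["the", "robot", "servo"]  # stand-in for the industry word list
    print(suggest("teh", terms))
```

The brute-force scan in suggest is only meant to show the idea. For a very large word list, a dedicated spell checker or an index would be needed to keep lookups fast.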

On Fri, Oct 30, 2015 at 8:02 PM, Walter Underwood <wun...@wunderwood.org> wrote:

> Dedicated spell-checkers have better algorithms than Solr. They usually
> handle transposed characters as well as inserted, deleted, or substituted
> characters. This is an enhanced version of Levenshtein distance. It is
> called Damerau-Levenshtein and is too expensive to use in Solr search.
> Spell correctors can also use a bigger distance than 2, unlike Solr.
>
> The Peter Norvig corrector also handles words that have been run together.
> The Norvig corrector has been translated to many different computer
> languages.
>
> The Norvig corrector is an interesting approach. It is well worth reading
> this short article to learn more about spelling correction.
>
> http://norvig.com/spell-correct.html
>
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/ (my blog)
>
> On Oct 30, 2015, at 4:45 PM, Robert Oschler <robert.osch...@gmail.com> wrote:
> >
> > Hello Walter and Mikhail,
> >
> > Thank you for your answers. Do those spell checkers have the same or
> > better fuzzy matching capability that Solr/Lucene has (Levenshtein, max
> > distance 2)? That's a critical requirement for my application. I take it
> > by your suggestion of these spell checker apps they can easily be extended
> > with a user defined, supplementary dictionary, yes?
> >
> > Thanks.
> >
> > On Fri, Oct 30, 2015 at 3:07 PM, Mikhail Khludnev <mkhlud...@griddynamics.com> wrote:
> >
> >> Perhaps
> >> FileBasedSpellChecker
> >> https://cwiki.apache.org/confluence/display/solr/Spell+Checking
> >>
> >> On Fri, Oct 30, 2015 at 9:37 PM, Robert Oschler <robert.osch...@gmail.com> wrote:
> >>
> >>> Hello everyone,
> >>>
> >>> I have a gigantic list of industry terms that I want to import into a
> >>> Solr/Lucene instance running on an AWS box. What is the fastest way to
> >>> import the list into my Solr/Lucene instance? I have admin/sudo
> >>> privileges on the box.
> >>>
> >>> Also, is there a document that shows me how to set up my Solr/Lucene
> >>> config file to be optimized for fast searches on single word entries
> >>> using fuzzy search? I intend to use this Solr/Lucene instance to do
> >>> spell checking on the big industry word list I mentioned above. Each
> >>> data record will be a single word from the file. I'll want to take a
> >>> single word query and do a fuzzy search on the word against the index
> >>> (Levenshtein, max distance 2 as per Solr/Lucene's fuzzy search
> >>> feature). So what parameters will configure Solr/Lucene to be
> >>> optimized for such a search? Also, if a document shows the best
> >>> index/read parameters to support single word fuzzy searching then
> >>> that would be a big help too. Note, the contents of the index will
> >>> change very infrequently if that affects the optimal parameter mix.
> >>>
> >>> --
> >>> Thanks,
> >>> Robert Oschler
> >>> Twitter -> http://twitter.com/roschler
> >>> http://www.RobotsRule.com/
> >>> http://www.Robodance.com/
> >>
> >> --
> >> Sincerely yours
> >> Mikhail Khludnev
> >> Principal Engineer,
> >> Grid Dynamics
> >>
> >> <http://www.griddynamics.com>
> >> <mkhlud...@griddynamics.com>
> >
> > --
> > Thanks,
> > Robert Oschler
> > Twitter -> http://twitter.com/roschler
> > http://www.RobotsRule.com/
> > http://www.Robodance.com/

--
Thanks,
Robert Oschler
Twitter -> http://twitter.com/roschler
http://www.RobotsRule.com/
http://www.Robodance.com/