Read the links I have sent.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/ (my blog)

> On Oct 30, 2015, at 7:10 PM, Robert Oschler <robert.osch...@gmail.com> wrote:
>
> Thanks Walter. Are there any open source spell checkers that implement the
> Peter Norvig or Damerau-Levenshtein algorithms? I'm short on time so I
> have to keep the custom coding down to a minimum.
>
> On Fri, Oct 30, 2015 at 8:02 PM, Walter Underwood <wun...@wunderwood.org>
> wrote:
>
>> Dedicated spell-checkers have better algorithms than Solr. They usually
>> handle transposed characters as well as inserted, deleted, or substituted
>> characters. This is an enhanced version of Levenshtein distance. It is
>> called Damerau-Levenshtein and is too expensive to use in Solr search.
>> Spell correctors can also use a bigger distance than 2, unlike Solr.
>>
>> The Peter Norvig corrector also handles words that have been run together.
>> The Norvig corrector has been translated to many different computer
>> languages.
>>
>> The Norvig corrector is an interesting approach. It is well worth reading
>> this short article to learn more about spelling correction.
>>
>> http://norvig.com/spell-correct.html
>>
>> wunder
>> Walter Underwood
>> wun...@wunderwood.org
>> http://observer.wunderwood.org/ (my blog)
>>
>>> On Oct 30, 2015, at 4:45 PM, Robert Oschler <robert.osch...@gmail.com> wrote:
>>>
>>> Hello Walter and Mikhail,
>>>
>>> Thank you for your answers. Do those spell checkers have the same or
>>> better fuzzy matching capability that SOLR/Lucene has (Levenshtein, max
>>> distance 2)? That's a critical requirement for my application. I take it
>>> by your suggestion of these spell checker apps they can easily be extended
>>> with a user defined, supplementary dictionary, yes?
>>>
>>> Thanks.
>>>
>>> On Fri, Oct 30, 2015 at 3:07 PM, Mikhail Khludnev <mkhlud...@griddynamics.com> wrote:
>>>
>>>> Perhaps
>>>> FileBasedSpellChecker
>>>> https://cwiki.apache.org/confluence/display/solr/Spell+Checking
>>>>
>>>> On Fri, Oct 30, 2015 at 9:37 PM, Robert Oschler <robert.osch...@gmail.com>
>>>> wrote:
>>>>
>>>>> Hello everyone,
>>>>>
>>>>> I have a gigantic list of industry terms that I want to import into a
>>>>> Solr/Lucene instance running on an AWS box. What is the fastest way to
>>>>> import the list into my Solr/Lucene instance? I have admin/sudo
>>>>> privileges on the box.
>>>>>
>>>>> Also, is there a document that shows me how to set up my Solr/Lucene
>>>>> config file to be optimized for fast searches on single word entries
>>>>> using fuzzy search? I intend to use this Solr/Lucene instance to do
>>>>> spell checking on the big industry word list I mentioned above. Each
>>>>> data record will be a single word from the file. I'll want to take a
>>>>> single word query and do a fuzzy search on the word against the index
>>>>> (Levenshtein, max distance 2 as per Solr/Lucene's fuzzy search
>>>>> feature). So what parameters will configure Solr/Lucene to be
>>>>> optimized for such a search? Also, if a document shows the best
>>>>> index/read parameters to support single word fuzzy searching then
>>>>> that would be a big help too. Note, the contents of the index will
>>>>> change very infrequently if that affects the optimal parameter mix.
>>>>>
>>>>> --
>>>>> Thanks,
>>>>> Robert Oschler
>>>>> Twitter -> http://twitter.com/roschler
>>>>> http://www.RobotsRule.com/
>>>>> http://www.Robodance.com/
>>>>
>>>> --
>>>> Sincerely yours
>>>> Mikhail Khludnev
>>>> Principal Engineer,
>>>> Grid Dynamics
>>>>
>>>> <http://www.griddynamics.com>
>>>> <mkhlud...@griddynamics.com>
>>>
>>> --
>>> Thanks,
>>> Robert Oschler
>>> Twitter -> http://twitter.com/roschler
>>> http://www.RobotsRule.com/
>>> http://www.Robodance.com/
>
> --
> Thanks,
> Robert Oschler
> Twitter -> http://twitter.com/roschler
> http://www.RobotsRule.com/
> http://www.Robodance.com/
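
For anyone following the thread who wants to see what "handles transposed characters" means in practice, here is a minimal sketch in Python. It is not Solr's, Lucene's, or Norvig's code. It implements the optimal string alignment variant of Damerau-Levenshtein (the simpler, restricted form, in which a swapped pair cannot be edited again afterwards), and the names osa_distance and suggest, the sample word list, and the default cutoff of 2 are all assumptions made up for illustration. A linear scan like this is only reasonable for modest word lists.

```python
from typing import Iterable, List


def osa_distance(a: str, b: str) -> int:
    """Optimal string alignment distance (restricted Damerau-Levenshtein).

    Counts insertions, deletions, substitutions, and swaps of two
    adjacent characters, each with cost 1.
    """
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all i leading characters of a
    for j in range(cols):
        d[0][j] = j  # insert all j leading characters of b
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            # Adjacent transposition, e.g. "teh" -> "the"
            if (i > 1 and j > 1
                    and a[i - 1] == b[j - 2]
                    and a[i - 2] == b[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[len(a)][len(b)]


def suggest(word: str, dictionary: Iterable[str],
            max_distance: int = 2) -> List[str]:
    """Return dictionary terms within max_distance of word, closest first."""
    scored = []
    for term in dictionary:
        # A length gap larger than the cutoff can never be within range.
        if abs(len(term) - len(word)) > max_distance:
            continue
        dist = osa_distance(word, term)
        if dist <= max_distance:
            scored.append((dist, term))
    return [term for _, term in sorted(scored)]


if __name__ == "__main__":
    terms = ["gasket", "basket", "flange"]  # hypothetical industry terms
    print(osa_distance("teh", "the"))       # 1: one adjacent swap
    print(suggest("gaskit", terms))         # ['gasket', 'basket']
```

A supplementary dictionary, in this sketch, is just more strings passed in through the dictionary argument.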