Thanks Walter. I believe I have what I need now. Have a great weekend.

On Fri, Oct 30, 2015 at 11:13 PM, Walter Underwood <wun...@wunderwood.org> wrote:
> Read the links I have sent.
>
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/ (my blog)
>
>
> > On Oct 30, 2015, at 7:10 PM, Robert Oschler <robert.osch...@gmail.com> wrote:
> >
> > Thanks Walter. Are there any open source spell checkers that implement the
> > Peter Norvig or Damerau-Levenshtein algorithms? I'm short on time so I
> > have to keep the custom coding down to a minimum.
> >
> >
> > On Fri, Oct 30, 2015 at 8:02 PM, Walter Underwood <wun...@wunderwood.org> wrote:
> >
> >> Dedicated spell-checkers have better algorithms than Solr. They usually
> >> handle transposed characters as well as inserted, deleted, or substituted
> >> characters. This is an enhanced version of Levenshtein distance. It is
> >> called Damerau-Levenshtein and is too expensive to use in Solr search.
> >> Spell correctors can also use a bigger distance than 2, unlike Solr.
> >>
> >> The Peter Norvig corrector also handles words that have been run together.
> >> The Norvig corrector has been translated to many different computer
> >> languages.
> >>
> >> The Norvig corrector is an interesting approach. It is well worth reading
> >> this short article to learn more about spelling correction.
> >>
> >> http://norvig.com/spell-correct.html
> >>
> >> wunder
> >> Walter Underwood
> >> wun...@wunderwood.org
> >> http://observer.wunderwood.org/ (my blog)
> >>
> >>> On Oct 30, 2015, at 4:45 PM, Robert Oschler <robert.osch...@gmail.com> wrote:
> >>>
> >>> Hello Walter and Mikhail,
> >>>
> >>> Thank you for your answers. Do those spell checkers have the same or
> >>> better fuzzy matching capability that SOLR/Lucene has (Levenshtein, max
> >>> distance 2)? That's a critical requirement for my application. I take it
> >>> by your suggestion of these spell checker apps they can easily be extended
> >>> with a user defined, supplementary dictionary, yes?
> >>>
> >>> Thanks.
> >>>
> >>> On Fri, Oct 30, 2015 at 3:07 PM, Mikhail Khludnev <mkhlud...@griddynamics.com> wrote:
> >>>
> >>>> Perhaps
> >>>> FileBasedSpellChecker
> >>>> https://cwiki.apache.org/confluence/display/solr/Spell+Checking
> >>>>
> >>>> On Fri, Oct 30, 2015 at 9:37 PM, Robert Oschler <robert.osch...@gmail.com> wrote:
> >>>>
> >>>>> Hello everyone,
> >>>>>
> >>>>> I have a gigantic list of industry terms that I want to import into a
> >>>>> Solr/Lucene instance running on an AWS box. What is the fastest way to
> >>>>> import the list into my Solr/Lucene instance? I have admin/sudo privileges
> >>>>> on the box.
> >>>>>
> >>>>> Also, is there a document that shows me how to set up my Solr/Lucene config
> >>>>> file to be optimized for fast searches on single word entries using fuzzy
> >>>>> search? I intend to use this Solr/Lucene instance to do spell checking on
> >>>>> the big industry word list I mentioned above. Each data record will be a
> >>>>> single word from the file. I'll want to take a single word query and do a
> >>>>> fuzzy search on the word against the index (Levenshtein, max distance 2 as
> >>>>> per Solr/Lucene's fuzzy search feature). So what parameters will configure
> >>>>> Solr/Lucene to be optimized for such a search? Also, if a document shows
> >>>>> the best index/read parameters to support single word fuzzy searching then
> >>>>> that would be a big help too. Note, the contents of the index will change
> >>>>> very infrequently if that affects the optimal parameter mix.
> >>>>>
> >>>>>
> >>>>> --
> >>>>> Thanks,
> >>>>> Robert Oschler
> >>>>> Twitter -> http://twitter.com/roschler
> >>>>> http://www.RobotsRule.com/
> >>>>> http://www.Robodance.com/
> >>>>>
> >>>>
> >>>> --
> >>>> Sincerely yours
> >>>> Mikhail Khludnev
> >>>> Principal Engineer,
> >>>> Grid Dynamics
> >>>>
> >>>> <http://www.griddynamics.com>
> >>>> <mkhlud...@griddynamics.com>
> >>>>

--
Thanks,
Robert Oschler
Twitter -> http://twitter.com/roschler
http://www.RobotsRule.com/
http://www.Robodance.com/
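
For anyone reading this thread later: the transposition-aware edit distance Walter mentions can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not code from Solr/Lucene or from the Norvig article. It computes the "optimal string alignment" variant of Damerau-Levenshtein (each substring is edited at most once), then scans a small in-memory word list for entries within distance 2, which mirrors the single-word lookup described in the original question. The names osa_distance, suggest, and INDUSTRY_TERMS, and the sample terms themselves, are hypothetical.

```python
from typing import Iterable, List


def osa_distance(a: str, b: str) -> int:
    """Optimal string alignment distance between a and b.

    Counts insertions, deletions, substitutions, and transpositions of
    two adjacent characters, each with cost 1. This is the restricted
    form of Damerau-Levenshtein, chosen here because it is short.
    """
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all i characters of a
    for j in range(cols):
        d[0][j] = j  # insert all j characters of b
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            # Adjacent transposition, e.g. "ut" <-> "tu".
            if (i > 1 and j > 1
                    and a[i - 1] == b[j - 2]
                    and a[i - 2] == b[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[rows - 1][cols - 1]


def suggest(word: str, vocabulary: Iterable[str],
            max_distance: int = 2) -> List[str]:
    """Return vocabulary terms within max_distance, closest first.

    A linear scan: fine for a sketch, likely too slow for a very
    large list without an index of some kind.
    """
    scored = sorted((osa_distance(word, term), term) for term in vocabulary)
    return [term for dist, term in scored if dist <= max_distance]


# Hypothetical stand-in for the industry word list.
INDUSTRY_TERMS = ["actuator", "servo", "gearbox", "encoder"]

if __name__ == "__main__":
    print(suggest("acutator", INDUSTRY_TERMS))
```

The max_distance default of 2 matches the limit discussed in the thread; a dedicated corrector could raise it, at the cost of more candidates to rank.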