Re: Fastest way to import a giant word list into Solr/Lucene?

Robert Oschler Fri, 30 Oct 2015 19:11:23 -0700

Thanks Walter.  Are there any open source spell checkers that implement the
Peter Norvig or Damerau-Levenshtein algorithms?  I'm short on time so I
have to keep the custom coding down to a minimum.
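
For anyone following the thread, here is roughly what the transposition-aware distance Walter describes looks like. This is only a minimal sketch of the "optimal string alignment" variant of Damerau-Levenshtein, not the code of any particular spell checker, and the function name osa_distance is made up for illustration.

```python
def osa_distance(a: str, b: str) -> int:
    """Optimal string alignment distance (restricted Damerau-Levenshtein).

    Counts insertions, deletions, substitutions, and transpositions of
    two adjacent characters, each at a cost of 1.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # d[i][j] holds the distance between a[:i] and b[:j].
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all i characters of a
    for j in range(cols):
        d[0][j] = j  # insert all j characters of b

    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            # Adjacent transposition, e.g. "rl" <-> "lr".
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[rows - 1][cols - 1]
```

With this sketch a swapped pair such as "solr" vs. "sorl" costs one edit, where plain Levenshtein would count two. The table is O(len(a) * len(b)) per comparison, so checking a query against every word in a large list this way would be slow without some form of candidate pruning.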



On Fri, Oct 30, 2015 at 8:02 PM, Walter Underwood <wun...@wunderwood.org>
wrote:

> Dedicated spell-checkers have better algorithms than Solr. They usually
> handle transposed characters as well as inserted, deleted, or substituted
> characters. This is an enhanced version of Levenshtein distance. It is
> called Damerau-Levenshtein and is too expensive to use in Solr search.
> Spell correctors can also use a bigger distance than 2, unlike Solr.
>
> The Peter Norvig corrector also handles words that have been run together.
> The Norvig corrector has been translated to many different computer
> languages.
>
> The Norvig corrector is an interesting approach. It is well worth reading
> this short article to learn more about spelling correction.
>
> http://norvig.com/spell-correct.html
>
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
>
> > On Oct 30, 2015, at 4:45 PM, Robert Oschler <robert.osch...@gmail.com> wrote:
> >
> > Hello Walter and Mikhail,
> >
> > Thank you for your answers.  Do those spell checkers have the same or
> > better fuzzy matching capability as Solr/Lucene has (Levenshtein, max
> > distance 2)?  That's a critical requirement for my application.  I take
> > it from your suggestion of these spell checker apps that they can easily
> > be extended with a user-defined, supplementary dictionary, yes?
> >
> > Thanks.
> >
> > On Fri, Oct 30, 2015 at 3:07 PM, Mikhail Khludnev <
> > mkhlud...@griddynamics.com> wrote:
> >
> >> Perhaps
> >> FileBasedSpellChecker
> >> https://cwiki.apache.org/confluence/display/solr/Spell+Checking
> >>
> >> On Fri, Oct 30, 2015 at 9:37 PM, Robert Oschler <robert.osch...@gmail.com>
> >> wrote:
> >>
> >>> Hello everyone,
> >>>
> >>> I have a gigantic list of industry terms that I want to import into a
> >>> Solr/Lucene instance running on an AWS box.  What is the fastest way to
> >>> import the list into my Solr/Lucene instance?  I have admin/sudo
> >>> privileges on the box.
> >>>
> >>> Also, is there a document that shows me how to set up my Solr/Lucene
> >>> config file to be optimized for fast searches on single word entries
> >>> using fuzzy search?  I intend to use this Solr/Lucene instance to do
> >>> spell checking on the big industry word list I mentioned above.  Each
> >>> data record will be a single word from the file.  I'll want to take a
> >>> single word query and do a fuzzy search on the word against the index
> >>> (Levenshtein, max distance 2 as per Solr/Lucene's fuzzy search
> >>> feature).  So what parameters will configure Solr/Lucene to be
> >>> optimized for such a search?  Also, if a document shows the best
> >>> index/read parameters to support single word fuzzy searching then that
> >>> would be a big help too.  Note, the contents of the index will change
> >>> very infrequently if that affects the optimal parameter mix.
> >>>
> >>>
> >>> --
> >>> Thanks,
> >>> Robert Oschler
> >>> Twitter -> http://twitter.com/roschler
> >>> http://www.RobotsRule.com/
> >>> http://www.Robodance.com/
> >>>
> >>
> >>
> >>
> >> --
> >> Sincerely yours
> >> Mikhail Khludnev
> >> Principal Engineer,
> >> Grid Dynamics
> >>
> >> <http://www.griddynamics.com>
> >> <mkhlud...@griddynamics.com>
> >>
> >
> >
> >
> > --
> > Thanks,
> > Robert Oschler
> > Twitter -> http://twitter.com/roschler
> > http://www.RobotsRule.com/
> > http://www.Robodance.com/
>
>


-- 
Thanks,
Robert Oschler
Twitter -> http://twitter.com/roschler
http://www.RobotsRule.com/
http://www.Robodance.com/
