If you know the misspellings, you can keep them out of the spellcheck dictionary with a StopFilterFactory, like so:

<fieldType name="textSpell" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="misspelled_words.txt"/>
    <filter class="solr.PatternReplaceFilterFactory" pattern="([^a-z])" replacement="" replace="all"/>
    <filter class="solr.LengthFilterFactory" min="2" max="50"/>
  </analyzer>
</fieldType>

where misspelled_words.txt contains the misspellings.

On Mon, Oct 18, 2010 at 5:14 PM, Pradeep Singh <pksing...@gmail.com> wrote:

> I think a spellchecker based on your index has clear advantages. You can
> spellcheck words specific to your domain which may not be available in an
> outside dictionary. You can always dump the list from wordnet to get a
> starter english dictionary.
>
> But then it also means that misspelled words from your domain become the
> suggested correct word. Hmmm ... you'll need to have a way to prune out
> such words. Even then, your own domain based dictionary is a total go.
>
> On Mon, Oct 18, 2010 at 1:55 PM, Jonathan Rochkind <rochk...@jhu.edu>
> wrote:
>
> > In general, the benefit of the built-in Solr spellcheck is that it can
> > use a dictionary based on your actual index.
> >
> > If you want to use some external API, you certainly can, in your actual
> > client app -- but it doesn't really need to involve Solr at all anymore,
> > does it? Is there any benefit I'm not thinking of to doing that on the
> > solr side, instead of just in your client app?
> >
> > I think Yahoo (and maybe Microsoft?) have similar APIs with more generous
> > ToSs, but I haven't looked in a while.
> >
> > Xin Li wrote:
> >
> >> Oops, never mind. Just read Google API policy. 1000 queries per day
> >> limit & for non-commercial use only.
> >>
> >> -----Original Message-----
> >> From: Xin Li
> >> Sent: Monday, October 18, 2010 3:43 PM
> >> To: solr-user@lucene.apache.org
> >> Subject: Spell checking question from a Solr novice
> >>
> >> Hi,
> >> I am looking for a quick solution to improve a search engine's spell
> >> checking performance. I was wondering if anyone tried to integrate
> >> Google SpellCheck API with Solr search engine (if possible). Google
> >> spellcheck came to my mind because of two reasons. First, it is costly
> >> to clean up the data to be used as spell check baseline. Secondly,
> >> google probably has the most complete set of misspelled search terms.
> >> That's why I would like to know if it is a feasible way to go.
> >>
> >> Thanks,
> >> Xin
> >>
> >> This electronic mail message contains information that (a) is or may be
> >> CONFIDENTIAL, PROPRIETARY IN NATURE, OR OTHERWISE PROTECTED BY LAW FROM
> >> DISCLOSURE, and (b) is intended only for the use of the
> >> addressee(s) named herein. If you are not an intended recipient, please
> >> contact the sender immediately and take the steps necessary to delete
> >> the message completely from your computer system.
> >>
> >> Not Intended as a Substitute for a Writing: Notwithstanding the Uniform
> >> Electronic Transaction Act or any other law of similar effect, absent an
> >> express statement to the contrary, this e-mail message, its contents,
> >> and any attachments hereto are not intended to represent an offer or
> >> acceptance to enter into a contract and are not otherwise intended to
> >> bind this sender, barnesandnoble.com llc, barnesandnoble.com inc. or any
> >> other person or entity.
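
P.S. For completeness, here is a rough sketch of how the textSpell field type from the top of this message might be wired into an index-based spellchecker. Treat it as an illustration under assumptions, not a drop-in config: the field names "spell", "title" and "body" are hypothetical, and the index directory and buildOnCommit setting are just one reasonable choice.

```xml
<!-- schema.xml: a field that uses the textSpell type.
     "spell", "title" and "body" are hypothetical names. -->
<field name="spell" type="textSpell" indexed="true" stored="false" multiValued="true"/>

<!-- Copy whatever text should feed the suggestions into that field. -->
<copyField source="title" dest="spell"/>
<copyField source="body" dest="spell"/>

<!-- solrconfig.xml: build the spellcheck dictionary from the "spell" field. -->
<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <!-- Analyze incoming query terms the same way as the indexed terms. -->
  <str name="queryAnalyzerFieldType">textSpell</str>
  <lst name="spellchecker">
    <str name="name">default</str>
    <str name="field">spell</str>
    <str name="spellcheckIndexDir">./spellchecker</str>
    <str name="buildOnCommit">true</str>
  </lst>
</searchComponent>
```

Keep in mind that the stop filter only acts at index time, so after adding words to misspelled_words.txt you would need to reindex the affected documents and rebuild the dictionary (for example with spellcheck.build=true) before the misspellings stop showing up as suggestions.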