OK, I just need to define two spellchecker handlers in solrconfig.xml, one for each field, for my purpose.
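
A rough, untested sketch of what that might look like, reusing the handler class and the parameters (spellcheckerIndexDir, termSourceField, suggestionCount, accuracy) from the config quoted further down in this thread. The handler names, the index directory names, and the "description" / "location" source field names are assumptions made up for illustration:

```xml
<!-- Hypothetical handler for the "description" part of the query.
     The name, index directory and source field are illustrative only. -->
<requestHandler name="spellchecker_description"
                class="solr.SpellCheckerRequestHandler" startup="lazy">
  <lst name="defaults">
    <int name="suggestionCount">1</int>
    <float name="accuracy">0.5</float>
  </lst>
  <!-- Separate index directory so the two dictionaries never mix -->
  <str name="spellcheckerIndexDir">spell_description</str>
  <str name="termSourceField">description</str>
</requestHandler>

<!-- Hypothetical handler for the "location" part of the query -->
<requestHandler name="spellchecker_location"
                class="solr.SpellCheckerRequestHandler" startup="lazy">
  <lst name="defaults">
    <int name="suggestionCount">1</int>
    <float name="accuracy">0.5</float>
  </lst>
  <str name="spellcheckerIndexDir">spell_location</str>
  <str name="termSourceField">location</str>
</requestHandler>
```

With something like this in place, each part of the query would presumably be sent to its own handler through the qt parameter (qt=spellchecker_description or qt=spellchecker_location), and each index would need its own cmd=rebuild before first use.
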
On 8/11/07, climbingrose <[EMAIL PROTECTED]> wrote:
>
> After looking at the SpellChecker code, I realised that it only supports
> single words. I made a very naive modification of SpellCheckerHandler to get
> multi-word support. Now the other problem that I have is how to have
> different fields in the SpellChecker index. For example, since my query has
> two parts, "description" and "location", I don't want to build a spellchecker
> index which combines both "description" and "location" into one
> termSourceField. I want to check the "description" part against the
> "description" field in the spellchecker index and the "location" part against
> the "location" field in the index. Otherwise I might get irrelevant
> suggestions for the "location" part, since the number of terms in "location"
> is generally much smaller than in "description". Any ideas?
>
> Thanks.
>
> On 8/11/07, climbingrose <[EMAIL PROTECTED]> wrote:
> >
> > The spellchecker handler doesn't seem to work with multi-word queries. For
> > example, when I tried to spellcheck "Java developar", it returned nothing,
> > while when I tried "developar", the spellchecker correctly returned
> > "developer". I followed the setup on the wiki.
> >
> > Regards,
> >
> > Cuong Hoang
> >
> > On 7/10/07, Charles Hornberger <[EMAIL PROTECTED]> wrote:
> > >
> > > For what it's worth, I recently did a quick implementation of the
> > > spellchecker feature, and I simply created another field in my schema
> > > (like 'spell' in Tristan's example below). After feeding content into
> > > my search index, I used the spell field to add one single-field
> > > document for every distinct word in my document collection (I'm
> > > assuming the content folks have run spell-checkers :-)). E.g.:
> > >
> > > <doc><field name="spell">aardvark</field></doc>
> > > <doc><field name="spell">abacus</field></doc>
> > > <doc><field name="spell">abbot</field></doc>
> > > <doc><field name="spell">acacia</field></doc>
> > > etc.
> > >
> > > I also added some extra documents for proper names that appear in my
> > > documents. For instance, there are a couple of fields that have
> > > comma-separated lists of names, so for each of those -- in addition
> > > to documents for "john", "doe", and "jane", which were generated by
> > > the naive word-splitting done in the first pass -- I added documents
> > > like so:
> > >
> > > <doc><field name="spell">john doe</field></doc>
> > > <doc><field name="spell">jane doe</field></doc>
> > > etc.
> > >
> > > You could do the same for other searchable multi-word tokens in your
> > > input -- song/album/book/movie titles, publisher names, geographic
> > > names (cities, neighborhoods, etc.), product names, and so on.
> > >
> > > -Charlie
> > >
> > > On 7/9/07, Tristan Vittorio <[EMAIL PROTECTED]> wrote:
> > > > I think there is some confusion regarding how the spell checker actually
> > > > uses the termSourceField. It is suggested that you use a simple field type
> > > > such as "string"; however, since this field type does not tokenize or split
> > > > words, it is only useful in situations where the whole field is considered a
> > > > dictionary "word":
> > > >
> > > > <add>
> > > > <doc>
> > > > <field name="title">Accountant</field>
> > > > <field name="title">Auditor</field>
> > > > <field name="title">Solicitor</field>
> > > > </doc>
> > > > </add>
> > > >
> > > > The following example will not work with the spell checker, since the whole
> > > > field is considered a single word or string:
> > > >
> > > > <add>
> > > > <doc>
> > > > <field name="title">Accountant reveals that Accounting is boring</field>
> > > > </doc>
> > > > </add>
> > > >
> > > > I might suggest that you create an additional field in your schema that
> > > > takes advantage of the StandardTokenizer and StandardFilter, which doesn't
> > > > perform a great deal of processing on the field yet should provide decent
> > > > results when used with the spell checker:
> > > >
> > > > <fieldType name="spell" class="solr.TextField" positionIncrementGap="100">
> > > >   <analyzer type="index">
> > > >     <tokenizer class="solr.StandardTokenizerFactory"/>
> > > >     <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
> > > >     <filter class="solr.StandardFilterFactory"/>
> > > >     <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
> > > >   </analyzer>
> > > >   <analyzer type="query">
> > > >     <tokenizer class="solr.StandardTokenizerFactory"/>
> > > >     <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
> > > >     <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
> > > >     <filter class="solr.StandardFilterFactory"/>
> > > >     <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
> > > >   </analyzer>
> > > > </fieldType>
> > > >
> > > > If you want this field to be automatically populated with the contents of
> > > > the title field when a document is added to the index, simply use a
> > > > copyField:
> > > >
> > > > <copyField source="title" dest="spell"/>
> > > >
> > > > Hope this helps, let me know if this is still not clear, I probably will add
> > > > it to the wiki page soon.
> > > >
> > > > cheers,
> > > > Tristan
> > > >
> > > > On 7/9/07, climbingrose <[EMAIL PROTECTED]> wrote:
> > > > >
> > > > > Thanks for the quick reply. However, I'm still not able to set up the
> > > > > spellchecker. Solr does create the spell directory under data but doesn't
> > > > > seem to build the spellchecker index.
> > > > > Here are snippets of my schema.xml:
> > > > >
> > > > > <field name="title" type="string" indexed="true" stored="true"/>
> > > > >
> > > > > <requestHandler name="spellchecker" class="solr.SpellCheckerRequestHandler" startup="lazy">
> > > > >   <!-- default values for query parameters -->
> > > > >   <lst name="defaults">
> > > > >     <int name="suggestionCount">1</int>
> > > > >     <float name="accuracy">0.5</float>
> > > > >   </lst>
> > > > >
> > > > >   <!-- Main init params for handler -->
> > > > >
> > > > >   <!-- The directory where your SpellChecker Index should live. -->
> > > > >   <!-- May be absolute, or relative to the Solr "dataDir" directory. -->
> > > > >   <!-- If this option is not specified, a RAM directory will be used -->
> > > > >   <str name="spellcheckerIndexDir">spell</str>
> > > > >
> > > > >   <!-- the field in your schema that you want to be able to build -->
> > > > >   <!-- your spell index on. This should be a field that uses a very -->
> > > > >   <!-- simple FieldType without a lot of Analysis (ie: string) -->
> > > > >   <str name="termSourceField">title</str>
> > > > >
> > > > > </requestHandler>
> > > > >
> > > > > I tried this url:
> > > > >
> > > > > http://localhost:8984/solr/select/?q=Accountent&qt=spellchecker&cmd=rebuild
> > > > >
> > > > > and received this:
> > > > >
> > > > > <response>
> > > > > <lst name="responseHeader">
> > > > > <int name="status">0</int>
> > > > > <int name="QTime">2</int>
> > > > > </lst>
> > > > > <str name="cmdExecuted">rebuild</str>
> > > > > <arr name="suggestions"/>
> > > > > </response>
> > > > >
> > > > > On 7/9/07, Tristan Vittorio <[EMAIL PROTECTED]> wrote:
> > > > > >
> > > > > > The spellchecker should be available in the 1.2 release. Your query is
> > > > > > incorrect; try the following:
> > > > > >
> > > > > > http://localhost:8984/solr/select/?q=java&qt=spellchecker&termSourceField=title_text&cmd=rebuild
> > > > > >
> > > > > > The 'q' parameter must only contain the word being checked; you must
> > > > > > specify the field separately. You can set "termSourceField" in your
> > > > > > solrconfig.xml file so you do not need to explicitly set it each time
> > > > > > you want to run a spell check query. Also make sure your field isn't
> > > > > > heavily processed (i.e. with porter stemmer analyzers), otherwise the
> > > > > > suggestions will look a bit weird / mangled. Take a look at the wiki
> > > > > > page for more info:
> > > > > >
> > > > > > http://wiki.apache.org/solr/SpellCheckerRequestHandler
> > > > > >
> > > > > > cheers,
> > > > > > Tristan
> > > > > >
> > > > > > On 7/9/07, climbingrose <[EMAIL PROTECTED]> wrote:
> > > > > > >
> > > > > > > Hi Tristan,
> > > > > > >
> > > > > > > Is this spellchecker available in the 1.2 release, or do I have to
> > > > > > > build the trunk? I tried your instructions but Solr returns nothing:
> > > > > > >
> > > > > > > http://localhost:8984/solr/select/?q=title_text:java&qt=spellchecker&cmd=rebuild
> > > > > > >
> > > > > > > Result:
> > > > > > >
> > > > > > > <response>
> > > > > > > <lst name="responseHeader">
> > > > > > > <int name="status">0</int>
> > > > > > > <int name="QTime">3</int>
> > > > > > > </lst>
> > > > > > > <str name="cmdExecuted">rebuild</str>
> > > > > > > <arr name="suggestions"/>
> > > > > > > </response>
> > > > > > >
> > > > > > > Thanks.
> > > > > > >
> > > > > > > On 7/8/07, Tristan Vittorio <[EMAIL PROTECTED]> wrote:
> > > > > > > >
> > > > > > > > Hi Otis,
> > > > > > > >
> > > > > > > > I have written a draft wiki entry for the spell checker:
> > > > > > > > http://wiki.apache.org/solr/SpellCheckerRequestHandler
> > > > > > > >
> > > > > > > > I've learned that my initial observation about the suggestion ordering
> > > > > > > > was incorrect: it does in fact order the results by popularity (or term
> > > > > > > > frequency) of the word in the termSourceField. The problem I experienced
> > > > > > > > was caused by setting termSourceField to a field of type "text", which
> > > > > > > > heavily stemmed and analyzed the words. I found that using the
> > > > > > > > StandardTokenizer and StandardFilter and removing the PorterStemmer and
> > > > > > > > LowerCaseFilter from the field schema really improved the spell checker
> > > > > > > > performance.
> > > > > > > >
> > > > > > > > I haven't included this info on the wiki page yet, I'll try to update
> > > > > > > > it soon when I have a bit more time.
> > > > > > > >
> > > > > > > > cheers,
> > > > > > > > Tristan
> > > > > > > >
> > > > > > > > On 7/8/07, Otis Gospodnetic <[EMAIL PROTECTED]> wrote:
> > > > > > > > >
> > > > > > > > > Tristan - good summary - want to copy that to the Solr Wiki?
> > > > > > > > >
> > > > > > > > > Thanks,
> > > > > > > > > Otis
> > > > > > > > >
> > > > > > > > > . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
> > > > > > > > > Simpy -- http://www.simpy.com/ - Tag - Search - Share
> > > > > > > > >
> > > > > > > > > ----- Original Message ----
> > > > > > > > > From: Tristan Vittorio <[EMAIL PROTECTED]>
> > > > > > > > > To: solr-user@lucene.apache.org
> > > > > > > > > Sent: Saturday, July 7, 2007 1:51:15 AM
> > > > > > > > > Subject: Re: Spell Check Handler
> > > > > > > > >
> > > > > > > > > I couldn't find any documentation on the spell check handler either,
> > > > > > > > > but found enough information in the solrconfig.xml file; simply search
> > > > > > > > > for "SpellCheckerRequestHandler" (online version here):
> > > > > > > > >
> > > > > > > > > http://svn.apache.org/repos/asf/lucene/solr/trunk/example/solr/conf/solrconfig.xml
> > > > > > > > >
> > > > > > > > > You can view the original development discussion in JIRA (not sure how
> > > > > > > > > helpful that will be for you though):
> > > > > > > > > https://issues.apache.org/jira/browse/SOLR-81
> > > > > > > > >
> > > > > > > > > In a nutshell, the configuration parameters available are:
> > > > > > > > >
> > > > > > > > > suggestionCount: determines how many spelling suggestions are returned.
> > > > > > > > > accuracy: a float value between 0.0 and 1.0 for how closely the
> > > > > > > > > suggested words should match the original word being checked.
> > > > > > > > > spellcheckerIndexDir and termSourceField: check solrconfig.xml for a
> > > > > > > > > full explanation.
> > > > > > > > >
> > > > > > > > > In order to use the spell checking handler for the first time, you need
> > > > > > > > > to explicitly build the spelling index with a sample query something
> > > > > > > > > like this:
> > > > > > > > >
> > > > > > > > > http://localhost:8080/solr/select/?q=macrosoft&qt=spellchecker&cmd=rebuild
> > > > > > > > >
> > > > > > > > > Depending on how large your main index is, this rebuild operation could
> > > > > > > > > take a while. Subsequent queries can omit '&cmd=rebuild' and will return
> > > > > > > > > results much faster:
> > > > > > > > >
> > > > > > > > > http://localhost:8080/solr/select/?q=macrosoft&qt=spellchecker
> > > > > > > > >
> > > > > > > > > The order of the suggestions returned seems to be based on the accuracy
> > > > > > > > > figure (i.e. how closely it matches the original word). It would be
> > > > > > > > > great to be able to sort these suggested results based on term
> > > > > > > > > frequency / document frequency of the suggested word in the main index,
> > > > > > > > > since the most accurate suggestion may not always be the most relevant.
> > > > > > > > >
> > > > > > > > > As far as I can tell there is currently no way of doing this using the
> > > > > > > > > spellchecker handler alone (you could always run separate standard
> > > > > > > > > queries on each word suggestion and order by numDocs, but that would be
> > > > > > > > > very inefficient). Has anybody else tried to achieve this?
> > > > > > > > >
> > > > > > > > > cheers,
> > > > > > > > > Tristan
> > > > > > > > >
> > > > > > > > > On 7/7/07, Andrew Nagy <[EMAIL PROTECTED]> wrote:
> > > > > > > > > >
> > > > > > > > > > Hello, is there any documentation on how to use the new spell check
> > > > > > > > > > module?
> > > > > > > > > >
> > > > > > > > > > Thanks
> > > > > > > > > > Andrew
> > > > > > >
> > > > > > > --
> > > > > > > Regards,
> > > > > > >
> > > > > > > Cuong Hoang
> > > > >
> > > > > --
> > > > > Regards,
> > > > >
> > > > > Cuong Hoang
> >
> > --
> > Regards,
> >
> > Cuong Hoang
>
> --
> Regards,
>
> Cuong Hoang

--
Regards,

Cuong Hoang