On Thu, 2006-12-21 at 21:27 -0800, Otis Gospodnetic wrote:
> Hi,
> I'm trying to integrate the Lucene-based spellchecker
> (http://wiki.apache.org/jakarta-lucene/SpellChecker + contrib/spellchecker
> under Lucene) with Solr (http://issues.apache.org/jira/browse/SOLR-81) in
> order to provide a query spellchecking service (you enter Speers and it
> suggest pant^H^H ... Spears). I've created a generic NGramTokenizer (+
> NGramTokenizerFactory + unit test) that I'll attach to SOLR-81 shortly.
>
> What I'm not yet sure about is:
> 1) integration of this generic n-grammer with that Lucene SpellChecker code -
> SpellChecker & TRStringDistance classes in particular.

Hmm, reading SOLR-81, you actually have everything you need.

> 2) mapping n-gram Tokens that come out of my NGramTokenizer to specific field
> names, like 3start, 4start, gram1, gram2, gram3.... is there is scheme.xml
> trick one can use to accomplish this?

It is in the issue:

    ...
    <!-- Here you map source="word" to dest="gram2".
         What it does is copy the word input to the gram2 field. -->
    <copyField source="word" dest="gram2"/>
    ...
    <!-- Here you define what happens when the field "gram2" gets indexed.
         The solr.NGramTokenizerFactory will return the different
         combinations of tokens. -->
    <fieldtype name="gram2" class="solr.TextField">
      <analyzer>
        <!-- more tokenizer -->
        <tokenizer class="solr.NGramTokenizerFactory" minGram="2" maxGram="2"/>
      </analyzer>
    </fieldtype>

The above shows how to configure the second (spellcheck) index. However, if
you want to update both indexes at the same time, you need to write your own
implementation of the update servlet.

> 3) once 2) is done, getting the.... request handler(?) to n-gram the query
> appropriately and hit the SpellChecker index to try and find alternative
> spelling suggestions.

Hmm, not sure. IMHO that depends heavily on how you plan to use it in the
end; there is more than one way to use spell checking. In the issue they
talked about AJAX suggestions, but IMO that would happen before the actual
search request.

If you want to have it in the request handler, then you need to decide how
and when the spellchecker comes into play: for example, only when the normal
search does not return a result, or in parallel. The parallel approach would
search the spellcheck index for alternatives, use these alternatives to
dispatch the alternative word query, and later pass the result directly into
the output writer. Here again you have different alternatives: you can hit
the Solr index directly (losing all the cool features), or you can go for the
Google-style "Did you mean" thingy.

...
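
As a rough illustration of the "only when the normal search returns nothing" variant described above, here is a minimal sketch in plain Java. It deliberately uses no Solr or Lucene classes: `DidYouMeanSketch`, `Result`, `handle`, and the two lookup functions are hypothetical names standing in for the main index and the spellcheck index, so treat it as an outline of the control flow rather than working handler code.

```java
import java.util.List;
import java.util.function.Function;

/** Hypothetical sketch: run the normal search first, consult the spellchecker only on zero hits. */
class DidYouMeanSketch {

    /** Hits for one request plus an optional "did you mean" word (null if none was used). */
    static final class Result {
        final List<String> hits;
        final String suggestion;

        Result(List<String> hits, String suggestion) {
            this.hits = hits;
            this.suggestion = suggestion;
        }
    }

    /**
     * search:  stand-in for the main index lookup (query -> matching doc ids).
     * suggest: stand-in for the spellcheck index lookup (word -> candidates, best first).
     */
    static Result handle(String query,
                         Function<String, List<String>> search,
                         Function<String, List<String>> suggest) {
        List<String> hits = search.apply(query);
        if (!hits.isEmpty()) {
            return new Result(hits, null);   // normal search worked, no spellcheck needed
        }
        List<String> candidates = suggest.apply(query);
        if (candidates.isEmpty()) {
            return new Result(hits, null);   // nothing found and nothing to suggest
        }
        String best = candidates.get(0);
        // Dispatch the alternative word query and report which word was used.
        return new Result(search.apply(best), best);
    }

    public static void main(String[] args) {
        Function<String, List<String>> search =
            q -> q.equals("spears") ? List.of("doc1", "doc7") : List.<String>of();
        Function<String, List<String>> suggest =
            w -> w.equals("speers") ? List.of("spears") : List.<String>of();

        Result r = handle("speers", search, suggest);
        // prints: Did you mean: spears -> [doc1, doc7]
        System.out.println("Did you mean: " + r.suggestion + " -> " + r.hits);
    }
}
```

The parallel variant would call both lookups up front instead of waiting for an empty result; which one fits better depends on the use case.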
In any case, start with:

    public class NGramRequestHandler extends StandardRequestHandler
        implements SolrRequestHandler, SolrInfoMBean {

      public void handleRequest(SolrQueryRequest req, SolrQueryResponse rsp) {
        // Depending on the use case, do your processing here
      }
    }

This way you just need to implement the class-specific methods.

> Damn, that's a lot of unknowns... on top of that my computer started freezing
> every half an hour. Hi Murphy.
>
> Any pointers will be greatly appreciated. Thanks,

HTH a wee bit.

salu2

> Otis
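
To make the gram2 field from point 2) a bit more concrete, here is a small plain-Java sketch of what character n-grams look like. It is not the NGramTokenizer attached to SOLR-81; I am only assuming that a tokenizer configured with minGram="2" maxGram="2" emits every two-character substring, and `NGramSketch` and `ngrams` are made-up names for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Plain-Java illustration of character n-grams; not the Solr/Lucene tokenizer itself. */
class NGramSketch {

    /** Returns every substring of word with length minGram..maxGram, shortest size first. */
    static List<String> ngrams(String word, int minGram, int maxGram) {
        List<String> grams = new ArrayList<>();
        for (int n = minGram; n <= maxGram; n++) {
            for (int i = 0; i + n <= word.length(); i++) {
                grams.add(word.substring(i, i + n));
            }
        }
        return grams;
    }

    public static void main(String[] args) {
        System.out.println(ngrams("spears", 2, 2)); // prints [sp, pe, ea, ar, rs]
        System.out.println(ngrams("speers", 2, 2)); // prints [sp, pe, ee, er, rs]
    }
}
```

The misspelled "speers" and the correct "spears" share the grams sp, pe and rs, which is the kind of overlap a lookup against the gram2 field can match on.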