Re: Help with spellchecker integration

Otis Gospodnetic Fri, 22 Dec 2006 15:20:41 -0800

Hi Mike,

Thanks, that (what you said in the end) is precisely what I ended up doing.  
I'll post a new patch to SOLR-81 shortly.


Otis

----- Original Message ----
From: Mike Klaas <[EMAIL PROTECTED]>
To: solr-user@lucene.apache.org
Sent: Friday, December 22, 2006 5:23:42 PM
Subject: Re: Help with spellchecker integration

On 12/22/06, Otis Gospodnetic <[EMAIL PROTECTED]> wrote:

> OG: Yes, adding those separate fieldtype definitions was my attempt at
> getting separate sets of n-grams of different sizes: uni-gram,
> bi-gram... But how do I get "3start", "4start", "2end", and "4end"?  It looks 
> like I'd have to do this:
> - To get 3start, pass "query string" to "gram3" type tokenizer, and keep only 
> the first token.
> - To get 3end, pass "query string" to "gram3" type tokenizer, and keep only 
> the last token (this could be the same n-gram if query string is a 3-letter 
> word)
>
> But can this be configured somehow?  I don't see a way to configure Solr to 
> do this.
>
> <!-- Here you map the @source="word" to @dest="gram2"
>      What it does is copy the word input to the gram2 field-->
> <copyField source="word" dest="gram2"/>
> ...
>
> OG: But doesn't this tell Solr to copy the _whole_ "word" into a field 
> _named_ "gram2"?  The above fieldtype is a definition for a field of _type_ 
> "gram2".

Let's say you define a field as follows:
<field type="gram2" name="gram2field"/>

Then you can copy contents into it using:
<copyField source="word" dest="gram2field"/>

The text will then be analyzed according to the field type "gram2".
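
To make the name/type distinction concrete, here is a rough sketch in
plain Python (not Solr code). The dictionaries, the "gram3field" name and
the index_document() helper are all hypothetical stand-ins for what the
schema declares; the only point is that copyField copies the raw text, and
the destination field's own type decides how it gets analyzed.

```python
# Hypothetical model of copyField semantics; none of these names are Solr APIs.

def ngrams(text, n):
    """Return all contiguous character n-grams of text, in order."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]

# Field *types* map to an analyzer function.
FIELD_TYPES = {
    "string": lambda text: [text],
    "gram2": lambda text: ngrams(text, 2),
    "gram3": lambda text: ngrams(text, 3),
}

# Field *names* map to a field type (assumed names, for illustration only).
FIELDS = {"word": "string", "gram2field": "gram2", "gram3field": "gram3"}

# copyField rules as (source field name, destination field name).
COPY_FIELDS = [("word", "gram2field"), ("word", "gram3field")]

def index_document(doc):
    """Apply the copy rules, then analyze each field with its own type."""
    values = dict(doc)
    for source, dest in COPY_FIELDS:
        if source in values:
            values[dest] = values[source]  # the raw, unanalyzed text is copied
    return {name: FIELD_TYPES[FIELDS[name]](text) for name, text in values.items()}

indexed = index_document({"word": "lucene"})
```

Here indexed["gram2field"] holds the bi-grams of "lucene" and
indexed["gram3field"] the tri-grams, while indexed["word"] keeps the whole
word as a single token.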

> What I need to tell Solr is:
> "Take the field named word, analyze it as fieldtype gram2 and index it into a
> field named gram2"
> "Take the field named word, analyze it as fieldtype gram3 and index it into a
> field named gram3"
> ...

This is covered by the above.

> "Take the field named word, analyze it as fieldtype gram2 and index only the
> 1st token into a field named 2start"
>
> "Take the field named word, analyze it as fieldtype gram3 and index only the
> 1st token into a field named 3start"
>
>
> ...
> "Take the field named word, analyze it as fieldtype gram2 and index only the
> last token into a field named 2end"
>
>
> "Take the field named word, analyze it as fieldtype gram3 and index only the
> last token into a field named 3end"
>
> OG: I think :).  Doable?

Hmm, the only way I can think of to do that is to define fieldtypes
firstgram2, lastgram3, etc., each of which discards everything but the
first/last token.  This means you will be re-analyzing the word for every
field, however.
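
As a rough illustration of what such firstgramN/lastgramN types would
have to emit, here is a minimal sketch in plain Python (not a Solr or
Lucene filter). The helper names first_gram() and last_gram() are made up
for this example; the real thing would presumably be a token filter
sitting after the n-gram tokenizer.

```python
def ngrams(text, n):
    """Return all contiguous character n-grams of text, in order."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]

def first_gram(text, n):
    """Keep only the first n-gram (an empty list if text is shorter than n)."""
    return ngrams(text, n)[:1]

def last_gram(text, n):
    """Keep only the last n-gram (an empty list if text is shorter than n)."""
    return ngrams(text, n)[-1:]

# Each "start"/"end" field re-runs the n-gram analysis on the same word,
# which is the repeated-analysis cost mentioned above.
word = "lucene"
fields = {
    "3start": first_gram(word, 3),
    "3end": last_gram(word, 3),
    "4start": first_gram(word, 4),
    "4end": last_gram(word, 4),
}
```

Note that for a 3-letter word the 3start and 3end values are the same
gram, and a word shorter than n yields nothing for that field.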

-Mike
