On 12/22/06, Otis Gospodnetic <[EMAIL PROTECTED]> wrote:

OG: Yes, adding those separate fieldtype definitions was my attempt at
getting separate sets of n-grams of different sizes: uni-gram,
bi-gram... But how do I get "3start", "4start", "2end", and "4end"?  It looks 
like I'd have to do this:
- To get 3start, pass "query string" to "gram3" type tokenizer, and keep only 
the first token.
- To get 3end, pass "query string" to "gram3" type tokenizer, and keep only the 
last token (this could be the same n-gram if query string is a 3-letter word)
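
The two steps above can be sketched outside Solr in a few lines of Python. This is only an illustration, assuming the "gram3" tokenizer emits contiguous character n-grams in left-to-right order; the function names char_ngrams and edge_grams are hypothetical and are not part of Solr or Lucene.

```python
def char_ngrams(word, n):
    """Return all contiguous character n-grams of `word`, left to right."""
    # range() is empty when the word is shorter than n, so the result is [].
    return [word[i:i + n] for i in range(len(word) - n + 1)]


def edge_grams(word, n):
    """Return the (start, end) n-grams of `word`, or (None, None) if too short."""
    grams = char_ngrams(word, n)
    if not grams:
        return None, None
    # First token is the "start" gram, last token is the "end" gram.
    return grams[0], grams[-1]
```

For a word of exactly n letters the start and end grams are the same token, which is the 3-letter case mentioned above.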

But can this be configured somehow?  I don't see a way to configure Solr to do 
this.

<!-- Here you map the @source="word" to @dest="gram2".
     What it does is copy the word input to the gram2 field. -->
<copyField source="word" dest="gram2"/>
...

OG: But doesn't this tell Solr to copy the _whole_ "word" into a field _named_ "gram2"?  
The above fieldtype is a definition for a field of _type_ "gram2".

Let's say you define a field as follows:
<field type="gram2" name="gram2field"/>

Then you can copy contents into it using:
<copyField source="word" dest="gram2field"/>

The text will then be analyzed as field type "gram2".

What I need to tell Solr is:
"Take the field named word, analyze it as fieldtype gram2 and index it into a field
named gram2"
"Take the field named word, analyze it as fieldtype gram3 and index it into a field
named gram3"
...

This is covered by the above.

"Take the field named word, analyze it as fieldtype gram2 and index only the 1st
token into a field named 2start"

"Take the field named word, analyze it as fieldtype gram3 and index only the 1st
token into a field named 3start"


...
"Take the field named word, analyze it as fieldtype gram2 and index only the last
token into a field named 2end"


"Take the field named word, analyze it as fieldtype gram3 and index only the last
token into a field named 3end"

OG: I think :).  Doable?

Hmm, the only way I can think of to do that is to define fieldtypes
firstgram2, lastgram3, etc., which discard everything but the
first/last token.  This means you will be re-analyzing for every
field, however.
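
To make the re-analysis cost concrete, here is a rough Python model of that per-field setup. It is a sketch under assumptions, not Solr code: the FIELD_ANALYSIS table, the selector helpers, and analyze_word are hypothetical names, and the n-grams are assumed to be plain character n-grams of the word.

```python
def char_ngrams(word, n):
    """Return all contiguous character n-grams of `word`, left to right."""
    return [word[i:i + n] for i in range(len(word) - n + 1)]


def keep_all(tokens):
    return tokens


def first_only(tokens):
    # Slicing keeps this safe for an empty token list.
    return tokens[:1]


def last_only(tokens):
    return tokens[-1:]


# Hypothetical mapping: destination field -> (gram size, token selector).
# Each entry plays the role of one fieldtype (gram2, firstgram2, lastgram2, ...).
FIELD_ANALYSIS = {
    "gram2": (2, keep_all),
    "gram3": (3, keep_all),
    "2start": (2, first_only),
    "3start": (3, first_only),
    "2end": (2, last_only),
    "3end": (3, last_only),
}


def analyze_word(word):
    """Build the indexed tokens for every field from the single source word."""
    doc = {}
    for field, (n, select) in FIELD_ANALYSIS.items():
        # The n-grams are recomputed for each field: this is the repeated
        # analysis that separate first/last fieldtypes would imply.
        doc[field] = select(char_ngrams(word, n))
    return doc
```

Calling analyze_word("solr") tokenizes the word six times, once per destination field, even though only two distinct gram sizes are involved.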

-Mike
