Hi Mike,

Thanks, that (what you said at the end) is precisely what I ended up doing. I'll post a new patch to SOLR-81 shortly.

Otis

----- Original Message ----
From: Mike Klaas <[EMAIL PROTECTED]>
To: solr-user@lucene.apache.org
Sent: Friday, December 22, 2006 5:23:42 PM
Subject: Re: Help with spellchecker integration

On 12/22/06, Otis Gospodnetic <[EMAIL PROTECTED]> wrote:
> OG: Yes, adding those separate fieldtype definitions was my attempt at
> getting separate sets of n-grams of different sizes: uni-gram,
> bi-gram... But how do I get "3start", "4start", "2end", and "4end"? It looks
> like I'd have to do this:
> - To get 3start, pass "query string" to "gram3" type tokenizer, and keep only
>   the first token.
> - To get 3end, pass "query string" to "gram3" type tokenizer, and keep only
>   the last token (this could be the same n-gram if query string is a 3-letter
>   word)
>
> But can this be configured somehow? I don't see a way to configure Solr to
> do this.
>
> <!-- Here you map the @source="word" to @dest="gram2"
>      What it does is copy the word input to the gram2 field -->
> <copyField source="word" dest="gram2"/>
> ...
>
> OG: But doesn't this tell Solr to copy the _whole_ "word" into a field
> _named_ "gram2"?

The above fieldtype is a definition for a field of _type_ "gram2". Let's
say you define a field as follows:

    <field type="gram2" name="gram2field"/>

Then you can copy contents into it using:

    <copyField source="word" dest="gram2field"/>

The text will be analyzed as field type "gram2".

> What I need to tell Solr is:
> "Take the field named word, analyze it as fieldtype gram2 and index it into a
> field named gram2"
> "Take the field named word, analyze it as fieldtype gram3 and index it into a
> field named gram3"
> ...

This is covered by the above.

> "Take the field named word, analyze it as fieldtype gram2 and index only the
> 1st token into a field named 2start"
>
> "Take the field named word, analyze it as fieldtype gram3 and index only the
> 1st token into a field named 3start"
>
> ...
> "Take the field named word, analyze it as fieldtype gram2 and index only the
> last token into a field named 2end"
>
> "Take the field named word, analyze it as fieldtype gram3 and index only the
> last token into a field named 3end"
>
> OG: I think :). Doable?

Hmm, the only way I can think of to do that is to define fieldtypes
firstgram2, lastgram3, etc., each of which discards everything but the
first/last token. This means you will be re-analyzing the word for every
field, however.

-Mike
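
For readers following the thread, here is a minimal Python sketch of what the fields discussed above would end up holding for a given word. It is not Solr or Lucene code and is not taken from the SOLR-81 patch. It assumes plain character n-grams over a single word, and the function names (char_ngrams, first_gram, last_gram, spell_fields) are hypothetical, chosen only for illustration. The field names "gramN", "Nstart", and "Nend" follow the ones used in the messages.

```python
def char_ngrams(word, n):
    """Return all contiguous character n-grams of word, in order.

    Yields an empty list when the word is shorter than n.
    """
    return [word[i:i + n] for i in range(len(word) - n + 1)]


def first_gram(word, n):
    """Keep only the first n-gram (the 'Nstart' value), or None if there is none."""
    grams = char_ngrams(word, n)
    return grams[0] if grams else None


def last_gram(word, n):
    """Keep only the last n-gram (the 'Nend' value), or None if there is none."""
    grams = char_ngrams(word, n)
    return grams[-1] if grams else None


def spell_fields(word, sizes=(2, 3)):
    """Build a dict of the per-word fields described in the thread.

    Illustration only: in Solr each of these would be a separate field
    filled by copyField and analyzed by its own (hypothetical) fieldtype.
    """
    fields = {"word": word}
    for n in sizes:
        grams = char_ngrams(word, n)
        fields[f"gram{n}"] = grams
        if grams:
            fields[f"{n}start"] = grams[0]
            fields[f"{n}end"] = grams[-1]
    return fields


if __name__ == "__main__":
    for name, value in spell_fields("lucene").items():
        print(name, value)
```

Note that for a word exactly n letters long, first_gram and last_gram return the same n-gram, which is the case Otis mentions for 3-letter words. The sketch computes the n-grams once per size and reuses them; the fieldtype-per-field approach Mike describes would instead re-analyze the word for each destination field.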