On 12/22/06, Otis Gospodnetic <[EMAIL PROTECTED]> wrote:
OG: Yes, adding those separate fieldtype definitions was my attempt at getting separate sets of n-grams of different sizes: uni-gram, bi-gram... But how do I get "3start", "4start", "2end", and "4end"?

It looks like I'd have to do this:
- To get 3start, pass "query string" to the "gram3" type tokenizer and keep only the first token.
- To get 3end, pass "query string" to the "gram3" type tokenizer and keep only the last token (this could be the same n-gram if the query string is a 3-letter word).

But can this be configured somehow? I don't see a way to configure Solr to do this.

<!-- Here you map the @source="word" to @dest="gram2". What it does is copy the word input to the gram2 field -->
<copyField source="word" dest="gram2"/>
...

OG: But doesn't this tell Solr to copy the _whole_ "word" into a field _named_ "gram2"? The above fieldtype is a definition for a field of _type_ "gram2".
Let's say you define a field as follows:

<field type="gram2" name="gram2field"/>

Then you can copy contents into it using:

<copyField source="word" dest="gram2field"/>

The text will then be analyzed as field type "gram2".
What I need to tell Solr is:

"Take the field named word, analyze it as fieldtype gram2 and index it into a field named gram2"
"Take the field named word, analyze it as fieldtype gram3 and index it into a field named gram3"
...
This is covered by the above.
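As a rough sketch, the same pattern extended to two gram sizes might look like this in schema.xml. It assumes the fieldtypes "gram2" and "gram3" are already defined as discussed earlier in the thread; the field names gram2field and gram3field, the "string" type for the source field, and the indexed/stored settings are illustrative assumptions, not something stated in this thread.

```xml
<!-- Source field holding the original word. The "string" type is an assumption. -->
<field name="word" type="string" indexed="true" stored="true"/>

<!-- One destination field per gram size; each is analyzed by its own fieldtype. -->
<field name="gram2field" type="gram2" indexed="true" stored="false"/>
<field name="gram3field" type="gram3" indexed="true" stored="false"/>

<!-- copyField copies the raw input of "word"; analysis then happens per destination field. -->
<copyField source="word" dest="gram2field"/>
<copyField source="word" dest="gram3field"/>
```

The point of the sketch is that the destination field's type, not its name, decides how the copied text is analyzed.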
"Take the field named word, analyze it as fieldtype gram2 and index only the 1st token into a field named 2start"
"Take the field named word, analyze it as fieldtype gram3 and index only the 1st token into a field named 3start"
...
"Take the field named word, analyze it as fieldtype gram2 and index only the last token into a field named 2end"
"Take the field named word, analyze it as fieldtype gram3 and index only the last token into a field named 3end"

OG: I think :). Doable?
Hmm, the only way I can think of to do that is to define fieldtypes firstgram2, lastgram3, etc., which discard everything but the first/last token. This means the word will be re-analyzed for every field, however.

-Mike
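For illustration only, the approach described above could be sketched like this for the 2-gram case. The analyzer classes com.example.FirstGram2Analyzer and com.example.LastGram2Analyzer are hypothetical: they would have to be written, and nothing in this thread suggests Solr ships them. Each is assumed to produce the same token stream as the gram2 type and then drop every token except the first or the last. The lastgram2 type name and the indexed/stored settings are likewise assumptions.

```xml
<!-- Hypothetical analyzers: emit only the first (or last) 2-gram of the input. -->
<fieldtype name="firstgram2" class="solr.TextField">
  <analyzer class="com.example.FirstGram2Analyzer"/>
</fieldtype>
<fieldtype name="lastgram2" class="solr.TextField">
  <analyzer class="com.example.LastGram2Analyzer"/>
</fieldtype>

<!-- Destination fields named as in the question above. -->
<field name="2start" type="firstgram2" indexed="true" stored="false"/>
<field name="2end" type="lastgram2" indexed="true" stored="false"/>

<!-- Each copyField triggers a separate analysis pass over "word". -->
<copyField source="word" dest="2start"/>
<copyField source="word" dest="2end"/>
```

With one such pair of fieldtypes per gram size, the word is analyzed once per destination field, which is the re-analysis cost mentioned above.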