r 06, 2012 7:35 AM
To: solr-user@lucene.apache.org
Subject: Re: Searching for Partial Words
Yes, that is true. We are looking for partial word matches. It seems like
we can achieve this by using edge n-grams for prefixes and adding a wildcard
at the end to ignore the suffix. If we set the edge n-gram size to 3, "eng" will
match ResidentEng but not ResidentEngineer. But a search for "eng*" will
matc
Look at the normal n-gram tokenizer. "Engine" with n-gram size 3 would yield
"eng", "ngi", "gin", "ine", so a search for engi should match. You can play
around with the min/max values. Edge n-gram is useful for prefix matching,
but it sounds like you want intra-word matching too? ("eng" should match "
Residen
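
To make the n-gram suggestion concrete, here is a rough Python sketch of what a character n-gram tokenizer produces. It is not Solr code: the function name ngrams and its parameters are made up for illustration, and the lowercasing stands in for a lowercase filter that is assumed to sit in the analyzer chain. It also assumes the query is analyzed the same way as the indexed text, so "engi" is itself split into trigrams before matching.

```python
def ngrams(token, min_size=3, max_size=3):
    """Return every character n-gram of token (hypothetical helper, not a Solr API)."""
    text = token.lower()  # assumption: a lowercase filter runs in the analyzer
    grams = []
    for size in range(min_size, max_size + 1):
        # Slide a window of the given size across the whole token.
        for start in range(len(text) - size + 1):
            grams.append(text[start:start + size])
    return grams


print(ngrams("Engine"))  # ['eng', 'ngi', 'gin', 'ine']

# With the same analysis applied at query time, "engi" becomes "eng" + "ngi",
# and both grams occur inside "ResidentEngineer".
indexed = set(ngrams("ResidentEngineer"))
print(all(gram in indexed for gram in ngrams("engi")))  # True
```

The order in which real tokenizers emit grams can differ between versions; only the set of grams matters for matching in this sketch.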
Thanks Jack.
In the configuration below:
What are the possible values for "side"?
If I understand it correctly, with minGramSize=3 and side=front, the index
will include eng* but not en*. Is this correct? So minGramSize is the minimum
number of characters taken from the specified side.
Does it
Add an "edge" n-gram filter (EdgeNGramFilterFactory) to your "index"
analyzer. This will add all the prefixes of words to the index, so that a
query of "engi" will be equivalent to, but much faster than, the wildcard
query engi*. You can specify a minimum size, such as 3 or 4, to eliminate tons of
too-s
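
As a rough illustration of the edge n-gram idea discussed in this thread, the Python sketch below mimics what such a filter adds to the index. It is only a model, not Solr code: edge_ngrams, its parameter names and the max_size default of 15 are hypothetical, and lowercasing is assumed to happen elsewhere in the analyzer. The side values "front" and "back" are, as far as I know, the two that the Solr releases of that period accepted.

```python
def edge_ngrams(token, min_size=3, max_size=15, side="front"):
    """Return the edge n-grams of token (hypothetical helper, not a Solr API)."""
    text = token.lower()  # assumption: a lowercase filter runs in the analyzer
    longest = min(max_size, len(text))
    if side == "front":
        # Prefixes of increasing length: eng, engi, engin, ...
        return [text[:n] for n in range(min_size, longest + 1)]
    if side == "back":
        # Suffixes of increasing length: eer, neer, ...
        return [text[-n:] for n in range(min_size, longest + 1)]
    raise ValueError("side must be 'front' or 'back'")


front = edge_ngrams("Engineer")
print(front)            # ['eng', 'engi', 'engin', 'engine', 'enginee', 'engineer']
print("engi" in front)  # True: a plain term query replaces the wildcard engi*
print("en" in front)    # False: shorter than min_size=3

print(edge_ngrams("Engineer", max_size=4, side="back"))  # ['eer', 'neer']

# Front edge n-grams only cover the start of a token, so "eng" is not
# produced for "ResidentEngineer"; that case needs intra-word n-grams.
print("eng" in edge_ngrams("ResidentEngineer"))  # False
```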