Re: Compound word search not what I expected

lee carroll Tue, 07 Jun 2011 14:54:46 -0700

see
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.WordDelimiterFilterFactory


from the wiki

Example of generateWordParts="1" and catenateWords="1":
"PowerShot" -> 0:"Power", 1:"Shot", 1:"PowerShot"
(where 0, 1, 1 are token positions)
"A's+B's&C's" -> 0:"A", 1:"B", 2:"C", 2:"ABC"
"Super-Duper-XL500-42-AutoCoder!" -> 0:"Super", 1:"Duper", 2:"XL",
2:"SuperDuperXL", 3:"500", 4:"42", 5:"Auto", 6:"Coder", 6:"AutoCoder"

One use for WordDelimiterFilter is to help match words with different
delimiters. One way of doing so is to specify generateWordParts="1"
catenateWords="1" in the analyzer used for indexing, and
generateWordParts="1" in the analyzer used for querying. Given that
the current StandardTokenizer immediately removes many intra-word
delimiters, it is recommended that this filter be used after a
tokenizer that leaves them in place (such as WhitespaceTokenizer).
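
For what it's worth, the setup described above would look roughly like this in schema.xml. This is only a sketch: the fieldType name "text_wdf" is made up for illustration, and a real field type would probably carry more filters (lowercasing, stop words, and so on) that are left out here.

```xml
<!-- Hypothetical field type; the name "text_wdf" is an assumption, not from the wiki. -->
<fieldType name="text_wdf" class="solr.TextField">
  <!-- Index side: keep the word parts and also add the catenated form. -->
  <analyzer type="index">
    <!-- WhitespaceTokenizer leaves intra-word delimiters in place for the filter. -->
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" catenateWords="1"/>
  </analyzer>
  <!-- Query side: only split into word parts, no catenation. -->
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1"/>
  </analyzer>
</fieldType>
```

With something like this, "PowerShot" is indexed as "Power", "Shot" and "PowerShot", while a query is only split into its parts. The analysis page in the Solr admin UI is a quick way to check what tokens each side actually produces for your own input.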

