On 6/8/2017 8:39 PM, Phil Scadden wrote:
> We have important entities referenced in indexed documents which have
> convention naming of geographicname-number. Eg Wainui-8
> I want the tokenizer to treat it as Wainui-8 when indexing, and when I search
> I want to a q of Wainui-8 (must it be specified as Wainui\-8 ??) to return
> docs with Wainui-8 but not with Wainui-9 or plain Wainui.
>
> Docs are pdfs, and I have using tika to extract text.
>
> How do I set up solr for queries like this?

At indexing time, Solr itself does not treat the hyphen as a special character the way the query parser can. Many analysis components do, though. If your analysis chain includes certain components (the standard tokenizer, the ICU tokenizer, and WordDelimiterFilter are on that list), then the hyphen may be treated as a word break character and the analysis could remove it. With the standard tokenizer, for example, Wainui-8 becomes the two tokens Wainui and 8.

At query time, a hyphen in the middle of a word is not treated as a special character. It would need to be at the beginning of the query text or after a space for the query parser to treat it as a negation. So Wainui-8 would not be a problem, but -7 would, and you'd need to specify it as \-7 for it to work like you want.

Thanks,
Shawn
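
For illustration only, here is a minimal Python sketch of the query-time escaping rule described above. The function name escape_leading_hyphens is hypothetical and not part of Solr or any client library. It only handles the hyphen case discussed here and ignores the other characters the query parser treats as special.

```python
import re


def escape_leading_hyphens(query: str) -> str:
    """Backslash-escape a hyphen only where it could be read as negation.

    Assumption (from the explanation above): the query parser treats a
    hyphen as negation only at the start of the query text or right
    after whitespace. A hyphen inside a word is left untouched.
    """
    # (^|\s) captures the start of the string or one whitespace character;
    # the replacement puts it back and adds a backslash before the hyphen.
    return re.sub(r"(^|\s)-", r"\1\\-", query)


if __name__ == "__main__":
    for q in ["Wainui-8", "-7", "Wainui -7"]:
        print(q, "=>", escape_leading_hyphens(q))
```

Under that assumption, Wainui-8 passes through unchanged, while -7 on its own or after a space comes back as \-7.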