Hi Phil,

The WordDelimiterFilterFactory (https://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.WordDelimiterFilterFactory) can be used together with WhitespaceTokenizerFactory to avoid splitting at hyphens etc. Use generateWordParts="0"...
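
A minimal sketch of what such a field type might look like in the schema. The field type name `text_entity` is made up for illustration, and every attribute other than generateWordParts="0" is my own assumption about a reasonable setup rather than something spelled out above, so please check it against your schema and Solr version:

```xml
<!-- Hypothetical field type; the name "text_entity" is illustrative only. -->
<fieldType name="text_entity" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- Split on whitespace only, so "Wainui-8" reaches the filter as one token. -->
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- Assumed settings: emit no sub-parts and keep the original token intact. -->
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="0"
            generateNumberParts="0"
            catenateWords="0"
            catenateNumbers="0"
            catenateAll="0"
            splitOnCaseChange="0"
            splitOnNumerics="0"
            preserveOriginal="1"/>
    <!-- Optional: makes matching case-insensitive (wainui-8 matches Wainui-8). -->
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

With the same analyzer applied at index and query time, Wainui-8 should be indexed as a single term, so it would not match Wainui-9 or plain Wainui. One caveat with this sketch: punctuation attached to the token in the extracted text (for example a trailing comma) would stay attached as well. As far as I know, the standard query parser only treats a leading `-` as the prohibit operator, so q=Wainui-8 should not need the backslash, but that is worth verifying on your setup.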

Thanks

On Thu, Jun 8, 2017 at 10:39 PM, Phil Scadden <p.scad...@gns.cri.nz> wrote:
> We have important entities referenced in indexed documents which follow
> the naming convention geographicname-number, e.g. Wainui-8.
> I want the tokenizer to treat it as Wainui-8 when indexing, and when I
> search I want a q of Wainui-8 (must it be specified as Wainui\-8 ??) to
> return docs with Wainui-8 but not with Wainui-9 or plain Wainui.
>
> Docs are PDFs, and I am using Tika to extract text.
>
> How do I set up Solr for queries like this?
>
> Notice: This email and any attachments are confidential and may not be
> used, published or redistributed without the prior written consent of the
> Institute of Geological and Nuclear Sciences Limited (GNS Science). If
> received in error please destroy and immediately notify GNS Science. Do not
> copy or disclose the contents.