Problem:
Indexed phrase: JetBlue Airlines
Ideal matching queries: jetblue, "jet blue", "jetblue airway", "jetblue company"
I'd like to use synonyms (to convert "airway" to "airline"), stopwords (to drop "company"), strip periods, apply ASCII folding, and split on case changes. I'm close with the following:

***
<tokenizer class="solr.KeywordTokenizerFactory"/>
<filter class="solr.ASCIIFoldingFilterFactory"/>
<filter class="solr.PatternReplaceFilterFactory" pattern="\." replacement="" />
<filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="0" catenateWords="1" catenateNumbers="0" catenateAll="1" splitOnCaseChange="1" preserveOriginal="1"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
<filter class="solr.SynonymFilterFactory" synonyms="syn.txt" ignoreCase="true" expand="true"/>
***

The catch is that I can't use synonyms or stopwords, because the keyword tokenizer doesn't split the input into separate tokens. There's also the problem that a wildcard at the end of an exact-match query returns nothing.

Does anyone have suggestions on how this could be accomplished? The dataset is under 100k entries, and none of the docs is longer than 200 characters.
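For comparison, here is a rough, untested sketch of the tokenizing variant I assume would be needed for the stopword and synonym filters to see individual words. The only change from the chain above is swapping in solr.WhitespaceTokenizerFactory; the example contents of stopwords.txt and syn.txt in the comments are my assumptions, not files I have verified. It is unclear to me whether this still gives the exact-match behaviour I want, and it does nothing for the trailing-wildcard issue.

```xml
<!-- Sketch only: same filters as above, but with a tokenizer that splits on whitespace. -->
<tokenizer class="solr.WhitespaceTokenizerFactory"/>
<filter class="solr.ASCIIFoldingFilterFactory"/>
<!-- Strip periods inside each token. -->
<filter class="solr.PatternReplaceFilterFactory" pattern="\." replacement=""/>
<!-- Split "JetBlue" into "Jet" and "Blue", while also keeping the original and the catenated form. -->
<filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="0" catenateWords="1" catenateNumbers="0" catenateAll="1" splitOnCaseChange="1" preserveOriginal="1"/>
<filter class="solr.LowerCaseFilterFactory"/>
<!-- Assumes stopwords.txt contains a line such as: company -->
<filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
<!-- Assumes syn.txt contains a line such as: airway, airways, airline, airlines -->
<filter class="solr.SynonymFilterFactory" synonyms="syn.txt" ignoreCase="true" expand="true"/>
```

With this chain, "JetBlue Airlines" would be indexed as several tokens rather than one, so a query like "jet blue" could match on the split parts, but whole-field exact matching would no longer come for free.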