Hi guys, I'm on my way to solving it properly.
This is what my field type looks like now:

<fieldType name="text_en" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="(#)|(%23)" replacement="79f20724d6985c5b857d2fa06a3ff8c6"/>
    <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="(((?i)((european parliament)|(parlament europeenne)))|(EP)|(PE))" replacement="0ee062d61f44ae0a2aee145076ca6a69european_parliament"/>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" words="blacklist.txt" ignoreCase="true"/>
    <filter class="solr.StopFilterFactory" words="en" ignoreCase="true"/>
    <filter class="solr.HunspellStemFilterFactory" dictionary="en_GB.dic" affix="en_GB.aff" ignoreCase="true"/>
    <filter class="solr.PatternReplaceFilterFactory" pattern="0ee062d61f44ae0a2aee145076ca6a69european_parliament" replacement="european parliament" replace="all"/>
    <filter class="solr.PatternReplaceFilterFactory" pattern="79f20724d6985c5b857d2fa06a3ff8c6" replacement="#" replace="all"/>
  </analyzer>
</fieldType>

I still have one case where I'm facing issues, because I actually want to preserve the #:
- "#European Parliament" is indexed as one token instead of two: "#European" and "Parliament".

Anyway, I have some ideas on how to handle it. I'll let you know what the final solution is.

--
View this message in context: http://lucene.472066.n3.nabble.com/Facets-termvectors-relevancy-and-Multi-word-tokenizing-tp4120101p4120948.html
Sent from the Solr - User mailing list archive at Nabble.com.
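The single-token result for "#European Parliament" follows from the order of the chain: both char filters run on the raw text, so the hash placeholder and the "european_parliament" placeholder end up glued together before the tokenizer ever sees them. Below is a minimal plain-Java sketch that replays the posted chain, assuming Solr's PatternReplaceCharFilter and PatternReplaceFilter (replace="all") behave like java.util.regex replaceAll. The tokenizer here is only a rough stand-in for StandardTokenizer (it splits on anything that is not a letter, digit or underscore), the stop-word and Hunspell filters are skipped, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class AnalysisChainSketch {

    // Placeholder strings copied from the posted fieldType.
    static final String HASH_MARKER = "79f20724d6985c5b857d2fa06a3ff8c6";
    static final String EP_MARKER = "0ee062d61f44ae0a2aee145076ca6a69european_parliament";

    // Mirrors the two PatternReplaceCharFilterFactory entries, applied to the raw text.
    public static String charFilters(String text) {
        String s = text.replaceAll("(#)|(%23)", HASH_MARKER);
        return s.replaceAll(
            "(((?i)((european parliament)|(parlament europeenne)))|(EP)|(PE))", EP_MARKER);
    }

    // Rough stand-in for StandardTokenizer (assumption): runs of letters, digits and '_'
    // stay together, as UAX#29 does not break between letters/digits joined by '_'.
    public static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.split("[^\\p{L}\\p{N}_]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // LowerCaseFilter followed by the two PatternReplaceFilterFactory entries.
    // The stop-word and Hunspell filters are deliberately left out of this sketch.
    public static String tokenFilters(String token) {
        String s = token.toLowerCase(Locale.ROOT);
        s = s.replaceAll(EP_MARKER, "european parliament");
        return s.replaceAll(HASH_MARKER, "#");
    }

    // Runs char filters, tokenizer and token filters in the same order as the fieldType.
    public static List<String> analyze(String text) {
        List<String> out = new ArrayList<>();
        for (String t : tokenize(charFilters(text))) {
            out.add(tokenFilters(t));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(analyze("#European Parliament"));
        System.out.println(analyze("European Parliament"));
    }
}
```

Running main prints [#european parliament] and then [european parliament]. In the first case the tokenizer receives "79f2...8c60ee0...european_parliament" with no boundary character between the two placeholders, so it has nothing to split on; getting "#European" and "Parliament" as separate tokens would require the two substitutions not to merge into one word before tokenization.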