Hi,

I have the following text field:

<fieldType name="words_ngram" class="solr.TextField" omitNorms="false">
  <analyzer>
    <tokenizer class="solr.PatternTokenizerFactory" pattern="[^\w]+" />
    <filter class="solr.StopFilterFactory" words="url_stopwords.txt" ignoreCase="true" />
    <filter class="solr.LowerCaseFilterFactory" />
  </analyzer>
</fieldType>
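
To make the question concrete, here is a rough Python approximation of that analyzer chain. It is not Solr code, only a sketch. It assumes the stop list is exactly the four words http, https, ftp and www from url_stopwords.txt, that the tokenizer splits on the pattern (no group attribute is set), and that a token removed by the stop filter still consumes a position, leaving a gap. The function name analyze and the (position, term) output format are made up for illustration.

```python
import re

# Assumed contents of url_stopwords.txt (hard-coded for this sketch).
STOPWORDS = {"http", "https", "ftp", "www"}


def analyze(text):
    """Approximate the words_ngram chain; return (position, term) pairs."""
    # Tokenizer step: split on runs of non-word characters and skip
    # empty pieces. re.ASCII keeps \w limited to [a-zA-Z0-9_].
    raw_terms = [t for t in re.split(r"[^\w]+", text, flags=re.ASCII) if t]

    tokens = []
    position = 0
    for term in raw_terms:
        position += 1  # every tokenizer output occupies one position
        # Stop filter step with ignoreCase="true": compare in lower case.
        # Assumption: the removed token leaves a position gap behind.
        if term.lower() in STOPWORDS:
            continue
        # Lower-case filter step.
        tokens.append((position, term.lower()))
    return tokens


if __name__ == "__main__":
    for sample in (
        "twitter.com/testuser",
        "https://twitter.com/testuser",
        "https://www.twitter.com/testuser",
        "www.twitter.com/testuser",
    ):
        print(sample, "->", analyze(sample))
```

Under those assumptions, twitter.com/testuser yields twitter, com, testuser at positions 1, 2, 3, while https://www.twitter.com/testuser yields the same three terms at positions 3, 4, 5. The stop words are gone from the term list, but the positions they occupied are not reused.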

url_stopwords.txt looks like:

http
https
ftp
www

So it is very simple.

In the index I have:

* twitter.com/testuser

All of these queries match:

* twitter.com/testuser
* com/testuser
* testuser

But none of these do:

* https://twitter.com/testuser
* https://www.twitter.com/testuser
* www.twitter.com/testuser

What am I doing wrong? The analysis output makes me think something is wrong with the token positions:

<http://lucene.472066.n3.nabble.com/file/n4153839/oi7o69.jpg>

but I thought StopFilterFactory was supposed to remove the https/http/ftp/www keywords. Why do they show up there at all? That doesn't make much sense.

Regards,
Alexander

--
View this message in context: http://lucene.472066.n3.nabble.com/Help-with-StopFilterFactory-tp4153839.html
Sent from the Solr - User mailing list archive at Nabble.com.