Hi, I have the following field type:

<fieldType name="words_ngram" class="solr.TextField" omitNorms="false">
  <analyzer>
    <tokenizer class="solr.PatternTokenizerFactory" pattern="[^\w]+" />
    <filter class="solr.StopFilterFactory" words="url_stopwords.txt" ignoreCase="true" />
    <filter class="solr.LowerCaseFilterFactory" />
  </analyzer>
</fieldType>

url_stopwords.txt looks like:
http
https
ftp
www

So it is very simple. The index contains:
* twitter.com/testuser

All of these queries match:
* twitter.com/testuser
* com/testuser
* testuser

But none of these do:
* https://twitter.com/testuser
* https://www.twitter.com/testuser
* www.twitter.com/testuser

What am I doing wrong? The analysis output makes me think something is wrong
with the token positions:
<http://lucene.472066.n3.nabble.com/file/n4153839/oi7o69.jpg>
but I thought StopFilterFactory was supposed to remove the
https/http/ftp/www keywords. Why do they show up there at all? That doesn't
make much sense.
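
To make the position question concrete, here is a rough Python sketch of what I understand the analyzer chain to do. It is not Solr code: the function name `analyze` and the position bookkeeping are my own, and it assumes that a removed stopword still takes up a position, so the tokens after it keep their original position numbers.

```python
import re

# Contents of url_stopwords.txt as listed above.
STOPWORDS = {"http", "https", "ftp", "www"}


def analyze(text):
    """Approximate the words_ngram chain; returns (position, token) pairs.

    Assumption: a removed stopword still consumes a position, so the
    tokens that follow it keep their original position numbers.
    """
    # PatternTokenizerFactory with pattern="[^\w]+": split on non-word runs.
    raw = [t for t in re.split(r"[^\w]+", text) if t]
    tokens = []
    for position, token in enumerate(raw, start=1):
        # StopFilterFactory with ignoreCase="true": drop stopwords.
        if token.lower() in STOPWORDS:
            continue
        # LowerCaseFilterFactory: lowercase whatever survives.
        tokens.append((position, token.lower()))
    return tokens


for sample in ("twitter.com/testuser", "https://www.twitter.com/testuser"):
    print(sample, "->", analyze(sample))
```

In this sketch the stopwords are gone from the output, but for the https://www. query the surviving tokens start at position 3 instead of 1. Whether that shift is what prevents the match is exactly the part I am unsure about.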

Regards,
Alexander



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Help-with-StopFilterFactory-tp4153839.html
Sent from the Solr - User mailing list archive at Nabble.com.
