Re: Stopwords magic

Jack Krupansky Tue, 31 Mar 2015 15:06:13 -0700

Use the Solr Admin UI analysis page to see how the text is analyzed at both
index and query time.
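
The same check can also be scripted. The following is only a minimal sketch: it assumes Solr's field analysis request handler is registered at /analysis/field (as in the example solrconfig.xml), and the host, port, and core name "collection1" are placeholders rather than values from this thread. The helper name field_analysis_url is hypothetical. It only builds the request URL, which can then be opened in a browser or fetched with any HTTP client.

```python
from urllib.parse import urlencode


def field_analysis_url(base_url, core, field_type, index_text, query_text):
    """Build a URL for Solr's field analysis request handler.

    base_url and core are assumptions about the local setup;
    field_type must match a fieldType name in the schema.
    """
    params = {
        "analysis.fieldtype": field_type,   # analyze using this fieldType
        "analysis.fieldvalue": index_text,  # text run through the index analyzer
        "analysis.query": query_text,       # text run through the query analyzer
        "wt": "json",
    }
    return f"{base_url}/{core}/analysis/field?{urlencode(params)}"


# Placeholder host and core; field type and texts are taken from the question.
url = field_analysis_url(
    "http://localhost:8983/solr",
    "collection1",
    "text_general",
    "This is the my description",
    "is th",
)
print(url)
```

The response lists the tokens produced after each tokenizer and filter stage, so it shows directly whether the stop filter ever sees "is" or "the" as separate tokens.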


My e-book does have more narrative and examples for stop word processing:
http://www.lulu.com/us/en/shop/jack-krupansky/solr-4x-deep-dive-early-access-release-7/ebook/product-21203548.html

-- Jack Krupansky

On Tue, Mar 31, 2015 at 5:41 PM, Alex Sylka <sylkaa...@gmail.com> wrote:

> My stopwords don't work as expected.
> Here is part of my schema:
>  <fieldType name="text_general" class="solr.TextField">
>         <analyzer type="index">
>             <tokenizer class="solr.KeywordTokenizerFactory"/>
>             <filter class="solr.StopFilterFactory" ignoreCase="true"
> words="stopwords.txt" enablePositionIncrements="true"/>
>             <filter class="solr.LowerCaseFilterFactory"/>
>         </analyzer>
>         <analyzer type="query">
>             <tokenizer class="solr.KeywordTokenizerFactory"/>
>             <filter class="solr.StopFilterFactory" ignoreCase="true"
> words="stopwords.txt" enablePositionIncrements="true"/>
>             <filter class="solr.LowerCaseFilterFactory"/>
>         </analyzer>
>     </fieldType>
>  <fieldType class="solr.TextField" name="text_auto">
>         <analyzer type="index">
>             <charFilter class="solr.HTMLStripCharFilterFactory"/>
>             <tokenizer class="solr.StandardTokenizerFactory"/>
>             <filter class="solr.StopFilterFactory" ignoreCase="true"
> words="stopwords.txt" enablePositionIncrements="false"/>
>             <filter class="solr.LowerCaseFilterFactory"/>
>             <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
>             <filter class="solr.ShingleFilterFactory" maxShingleSize="3"
> outputUnigrams="true" outputUnigramsIfNoShingles="false"/>
>         </analyzer>
>         <analyzer type="query">
>             <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
>             <tokenizer class="solr.StandardTokenizerFactory"/>
>             <filter class="solr.StopFilterFactory" ignoreCase="true"
> words="stopwords.txt" enablePositionIncrements="false"/>
>         </analyzer>
>     </fieldType>
>  <field name="deal_title_terms" type="text_auto" indexed="true"
> stored="false" required="false" multiValued="true"/>
>     <field name="deal_description" type="text_general" indexed="true"
> stored="true" required="false" multiValued="false"/>
> In stopwords.txt I have the following words: the, is, a
> I also have the following data in my fields:
>
> deal_description - This is the my description
> deal_title_terms - This is the deal title a terms (will be split into
> terms)
>
> When I try to search deal_description:
> Example 1: "deal_description: *his is the m*" - I expect that document with
> deal_description "This is the my description" will be returned
> Example 2: "deal_description: *is th*" - I expect that nothing will be
> found because "is" and "the" are stopwords.
>
> When I try to search deal_title_terms:
> Example 1: "deal_title_terms: *is*" - I expect that nothing will be found
> because "is" is a stopword.
> Example 2: "deal_title_terms: *is the deal*" - I expect that "is" and "the"
> will be ignored and term "deal" will be found.
> Example 3: "deal_title_terms: *title a terms*" - I expect that "a" will be
> ignored and term "title terms" will be found.
>
> Question 1: Why don't stopwords work for the "deal_description" field?
> Question 2: Why are stopwords not removed from my query for the
> "deal_title_terms" field? (When I search for *title a terms*, it does not
> find the "title terms" term.)
> Question 3: Is there any way to show stopwords in search results but prevent
> them from being searched? Example:
>
> data: This is cool search engine
> search query : "*is coo*" -> return "This is cool search engine"
> search query : "*is*" -> return nothing
> search query : "*This coll*" -> return "This is cool search engine"
>
> Question 4: *Where can I find a detailed description (maybe with examples)
> of how stopwords work in Solr? Because it looks like magic.*
>
