Re: behavior of solr.KeepWordFilterFactory

Joe Zhang Sun, 02 Dec 2012 23:05:15 -0800

In other words, what I want to achieve is case-sensitive indexing on a
small set of words. Can anybody help?


On Sun, Dec 2, 2012 at 11:56 PM, Joe Zhang <smartag...@gmail.com> wrote:

> To be more specific, this is the data type I was using:
>
>        <fieldType name="textspecial" class="solr.TextField"
>             positionIncrementGap="100">
>             <analyzer>
>                 <tokenizer class="solr.StandardTokenizerFactory"/>
>                 <filter class="solr.KeepWordFilterFactory"
>                     words="tickers.txt" ignoreCase="false"/>
>                 <filter class="solr.StopFilterFactory"
>                     ignoreCase="true" words="stopwords.txt"/>
>                 <filter class="solr.WordDelimiterFilterFactory"
>                     generateWordParts="1" generateNumberParts="1"
>                     catenateWords="1" catenateNumbers="1" catenateAll="0"
>                     splitOnCaseChange="1"/>
>                 <filter class="solr.LowerCaseFilterFactory"/>
>                 <filter class="solr.EnglishPorterFilterFactory"
>                     protected="protwords.txt"/>
>                 <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
>             </analyzer>
>         </fieldType>
>
>
> On Sun, Dec 2, 2012 at 11:51 PM, Joe Zhang <smartag...@gmail.com> wrote:
>
>> Yes, that is the correct behavior. But how do I achieve my goal, i.e.,
>> special treatment of a list of uppercase/special words and normal
>> treatment of everything else?
>>
>>
>> On Sun, Dec 2, 2012 at 11:46 PM, Xi Shen <davidshe...@gmail.com> wrote:
>>
>>> By the definition at
>>>
>>> https://lucene.apache.org/solr/api-3_6_1/org/apache/solr/analysis/KeepWordFilter.html
>>>
>>> I am pretty sure this is the correct behavior of this filter :)
>>>
>>> I guess you are trying to use this filter to index some special words in
>>> Chinese?
>>>
>>>
>>> On Mon, Dec 3, 2012 at 1:54 PM, Joe Zhang <smartag...@gmail.com> wrote:
>>>
>>> > I defined the following data type in my Solr schema.xml:
>>> >
>>> > <fieldtype name="testkeep" class="solr.TextField">
>>> >    <analyzer>
>>> >      <filter class="solr.KeepWordFilterFactory" words="keepwords.txt"
>>> >              ignoreCase="false"/>
>>> >    </analyzer>
>>> > </fieldtype>
>>> >
>>> > When I use the type "testkeep" to index a test field, my true
>>> > expectation was that Solr would index the uppercase form of a small
>>> > list of words in the file, AND TREAT EVERY OTHER WORD AS USUAL. The
>>> > goal of securing the closed list is achieved, but NO OTHER WORD
>>> > outside the list is indexed!
>>> >
>>> > Can anybody help? Thanks in advance!
>>> >
>>> > Joe
>>> >
>>>
>>>
>>>
>>> --
>>> Regards,
>>> David Shen
>>>
>>> http://about.me/davidshen
>>> https://twitter.com/#!/davidshen84
>>>
>>
>>
>
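
As discussed in the thread, KeepWordFilterFactory discards every token that is not in its word list, so on its own it cannot give "special treatment for some words, normal treatment for the rest". One possible way around this, sketched here under assumptions rather than as a confirmed solution, is to index the same text twice: once in a normally analyzed field, and once in a field that keeps only the listed words with their case intact. The field names "body" and "body_tickers", the type name "tickers_exact", and the existence of a general-purpose type called "text" are all hypothetical.

```xml
<!-- Sketch only; goes inside the <types> section of schema.xml. -->
<!-- "tickers_exact" is a hypothetical type name. -->
<fieldType name="tickers_exact" class="solr.TextField"
    positionIncrementGap="100">
    <analyzer>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <!-- Keeps only tokens listed in tickers.txt; with
             ignoreCase="false" the match is case-sensitive.
             No LowerCaseFilterFactory follows, so case is preserved. -->
        <filter class="solr.KeepWordFilterFactory"
            words="tickers.txt" ignoreCase="false"/>
    </analyzer>
</fieldType>

<!-- Sketch only; goes inside the <fields> section of schema.xml. -->
<!-- Assumes a normally analyzed type named "text" already exists. -->
<field name="body" type="text" indexed="true" stored="true"/>
<field name="body_tickers" type="tickers_exact" indexed="true"
    stored="false"/>

<!-- Copies the raw input of "body" into "body_tickers" at index time,
     so each field applies its own analyzer to the same text. -->
<copyField source="body" dest="body_tickers"/>
```

With a layout like this, a case-sensitive lookup of a listed word would query "body_tickers", while ordinary searches would query "body". Whether this fits depends on how the queries are built, which the thread does not describe.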
