Re: behavior of solr.KeepWordFilterFactory

Joe Zhang Sun, 02 Dec 2012 22:56:56 -0800

To be more specific, this is the field type I was using:

        <fieldType name="textspecial" class="solr.TextField"
            positionIncrementGap="100">
            <analyzer>
                <tokenizer class="solr.StandardTokenizerFactory"/>
                <filter class="solr.KeepWordFilterFactory"
                    words="tickers.txt" ignoreCase="false"/>
                <filter class="solr.StopFilterFactory"
                    ignoreCase="true" words="stopwords.txt"/>
                <filter class="solr.WordDelimiterFilterFactory"
                    generateWordParts="1" generateNumberParts="1"
                    catenateWords="1" catenateNumbers="1" catenateAll="0"
                    splitOnCaseChange="1"/>
                <filter class="solr.LowerCaseFilterFactory"/>
                <filter class="solr.EnglishPorterFilterFactory"
                    protected="protwords.txt"/>
                <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
            </analyzer>
        </fieldType>
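
Because the KeepWordFilterFactory sits directly after the tokenizer, every token that is not in tickers.txt is discarded before the later filters ever see it. One possible way to get both behaviors, sketched here under assumptions rather than tested, is to index the same content into two fields: one analyzed normally and one that keeps only the case-sensitive ticker list. The names "tickers_only", "body" and "body_tickers" are hypothetical, and the "text" field type is assumed to be an ordinary text type already defined in the schema.

```xml
<!-- Goes inside <types>: keeps only the exact-case entries from tickers.txt. -->
<fieldType name="tickers_only" class="solr.TextField"
    positionIncrementGap="100">
    <analyzer>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.KeepWordFilterFactory"
            words="tickers.txt" ignoreCase="false"/>
    </analyzer>
</fieldType>

<!-- Goes inside <fields>: "body" gets normal analysis (the "text" type is
     assumed to exist); "body_tickers" holds only the listed words. -->
<field name="body" type="text" indexed="true" stored="true"/>
<field name="body_tickers" type="tickers_only" indexed="true" stored="false"/>

<!-- Top level of the schema: feed the same source text into both fields. -->
<copyField source="body" dest="body_tickers"/>
```

With a layout like this, queries would presumably need to search both fields, so that a ticker match in "body_tickers" and a normal match in "body" can each contribute.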



On Sun, Dec 2, 2012 at 11:51 PM, Joe Zhang <smartag...@gmail.com> wrote:

> Yes, that is the correct behavior. But how do I achieve my goal, i.e.,
> special treatment for a list of uppercase/special words and normal
> treatment for everything else?
>
>
> On Sun, Dec 2, 2012 at 11:46 PM, Xi Shen <davidshe...@gmail.com> wrote:
>
>> By the definition at
>> https://lucene.apache.org/solr/api-3_6_1/org/apache/solr/analysis/KeepWordFilter.html,
>> I am pretty sure that is the correct behavior of this filter :)
>>
>> I guess you are trying to use this filter to index some special words in
>> Chinese?
>>
>>
>> On Mon, Dec 3, 2012 at 1:54 PM, Joe Zhang <smartag...@gmail.com> wrote:
>>
>> > I defined the following field type in my Solr schema.xml:
>> >
>> > <fieldtype name="testkeep" class="solr.TextField">
>> >    <analyzer>
>> >      <filter class="solr.KeepWordFilterFactory" words="keepwords.txt"
>> >              ignoreCase="false"/>
>> >    </analyzer>
>> > </fieldtype>
>> >
>> > When I use the type "testkeep" to index a test field, my expectation
>> > was that Solr would index the uppercase form of the small list of words
>> > in the file, AND TREAT EVERY OTHER WORD AS USUAL. The goal of securing
>> > the closed list is achieved, but NO OTHER WORD outside the list is
>> > indexed!
>> >
>> > Can anybody help? Thanks in advance!
>> >
>> > Joe
>> >
>>
>>
>>
>> --
>> Regards，
>> David Shen
>>
>> http://about.me/davidshen
>> https://twitter.com/#!/davidshen84
>>
>
>
