Re: Match in the process of filter, not end, does it mean "not matching"?

Erick Erickson Tue, 31 May 2011 07:04:05 -0700

Take a closer look at the results of KeywordTokenizerFactory. It won't break
up the text into any tokens; the entire input is treated as a single token. Are
you sure this is what you intend?
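
To illustrate with the search string from the message below, here is a rough
sketch of the difference. The WhitespaceTokenizerFactory line is there only
for contrast; it assumes word-level matching was the goal, which may not be
the case. An analyzer takes one tokenizer, so it would be one or the other.

```xml
<!-- KeywordTokenizerFactory: the whole field value becomes ONE token.
     "Merry Christmas and Happy New Year"
       -> [Merry Christmas and Happy New Year] -->
<tokenizer class="solr.KeywordTokenizerFactory"/>

<!-- For contrast only (assumption: word-level matching was intended).
     WhitespaceTokenizerFactory splits the input on whitespace:
       -> [Merry] [Christmas] [and] [Happy] [New] [Year] -->
<tokenizer class="solr.WhitespaceTokenizerFactory"/>
```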


I'd start by removing most of your filters, understanding what's happening
at each step, and then adding them back in one at a time. For instance, it's
unusual (but possibly correct) to use both the MappingCharFilterFactory and
ISOLatin... factory. And I'm not even sure what all the *gram* filters are
doing in a KeywordTokenized field.
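
A minimal sketch of such a starting point, reusing the textContains type from
the schema quoted below with everything except the tokenizer and
LowerCaseFilterFactory removed. Which filters to keep first is an assumption
on my part, not a recommendation; reindex and recheck the Analysis page after
each filter goes back in.

```xml
<fieldType name="textContains" class="solr.TextField"
           positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Add back one at a time, checking Admin Analysis after each:
         TrimFilterFactory, CommonGramsFilterFactory,
         ShingleFilterFactory, NGramFilterFactory, ... -->
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

One thing that may be worth checking along the way: the lowercased query
token "merry christmas and happy new year" is 34 characters long, while
maxGramSize is 30. If the NGram filter emits only grams of 2 to 30
characters, no indexed term could equal that query token.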

Best
Erick

On Sun, May 29, 2011 at 8:39 PM, Ellery Leung <elleryle...@be-o.com> wrote:
> This is the schema:
>
> <fieldType name="textContains" class="solr.TextField"
>            positionIncrementGap="100">
>   <analyzer type="index">
>     <charFilter class="solr.MappingCharFilterFactory"
>                 mapping="../../filters/filter-mappings.txt"/>
>     <charFilter class="solr.HTMLStripCharFilterFactory" />
>     <tokenizer class="solr.KeywordTokenizerFactory"/>
>     <filter class="solr.ISOLatin1AccentFilterFactory"/>
>     <filter class="solr.TrimFilterFactory" />
>     <filter class="solr.LowerCaseFilterFactory" />
>     <filter class="solr.CommonGramsFilterFactory"
>             words="../../filters/stopwords.txt" ignoreCase="true"/>
>     <filter class="solr.ShingleFilterFactory"
>             minShingleSize="2" maxShingleSize="30"/>
>     <filter class="solr.NGramFilterFactory"
>             minGramSize="2" maxGramSize="30"/>
>     <filter class="solr.RemoveDuplicatesTokenFilterFactory" />
>   </analyzer>
>   <analyzer type="query">
>     <charFilter class="solr.MappingCharFilterFactory"
>                 mapping="../../filters/filter-mappings.txt"/>
>     <charFilter class="solr.HTMLStripCharFilterFactory" />
>     <tokenizer class="solr.KeywordTokenizerFactory"/>
>     <filter class="solr.ISOLatin1AccentFilterFactory"/>
>     <filter class="solr.TrimFilterFactory" />
>     <filter class="solr.LowerCaseFilterFactory" />
>     <filter class="solr.RemoveDuplicatesTokenFilterFactory" />
>   </analyzer>
> </fieldType>
>
>
>
> And there is a multiValued field:
>
> <field name="textContains_Something" type="textContains"
>        multiValued="true" indexed="true" stored="true" />
>
>
>
> Now I want to search for this string: Merry Christmas and Happy New Year
>
> In "Admin Analysis" in the Solr admin, the matching word is highlighted
> (in light blue) at the LowerCaseFilterFactory, CommonGramsFilterFactory and
> ShingleFilterFactory stages. However, nothing is highlighted at the
> NGramFilterFactory stage.
>
> Then I ran a search in full-interface mode in the Solr admin:
>
> textContains_Something:"Merry Christmas and Happy New Year"
>
> It returns NO RESULT.
>
> Does this mean that a match only counts after all tokenizers and filters
> have been applied?
>
> Thank you in advance for any help.
>
>
