Re: KeywordTokenizerFactory - trouble with "exact" matches

Jack Krupansky Wed, 29 Jan 2014 07:05:46 -0800

If you change the analyzer for a Solr field, such as adding, removing, or changing attributes of token filters, you must/should reindex all data (add it to the index again to re-analyze it). In your case, the data was indexed as lower case, so after your changes a query with upper case would not match.


-- Jack Krupansky
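
The reindexing point can be illustrated with a toy model in Python. This is only a sketch and not Solr code: the two `analyze_*` functions and the in-memory `index` set are hypothetical stand-ins for the old (lowercasing) and new (keyword-only) analyzers and for the stored terms.

```python
def analyze_lowercase(value):
    """Old analyzer (assumed): whole value as one token, lowercased."""
    return value.lower()


def analyze_keyword(value):
    """New analyzer (assumed): whole value as one token, unchanged."""
    return value


documents = ["FE 009", "EE 009"]

# Terms written to the index while the old analyzer was active.
index = {analyze_lowercase(doc) for doc in documents}

# After the config change, query-time analysis uses the new analyzer,
# but the stored terms are still the old lowercase ones.
query_term = analyze_keyword("FE 009")
match_before_reindex = query_term in index

# Reindexing runs the new analyzer over every document again.
index = {analyze_keyword(doc) for doc in documents}
match_after_reindex = query_term in index

print(match_before_reindex, match_after_reindex)
```

In this model the upper-case query only matches once the documents have been analyzed again with the new configuration.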

-----Original Message-----
From: Aleksander Akerø

Sent: Wednesday, January 29, 2014 9:55 AM
To: solr-user@lucene.apache.org
Subject: Re: KeywordTokenizerFactory - trouble with "exact" matches

update:

Guessing that this has nothing to do with the tokenizer. Tried to use the
string fieldtype as well, but still the same results. So this must have to
do with some other solr config.

What confuses me is that when I search "1005" which is another valid value
to search for, it works perfectly, but then again, this query contains no
whitespace.

Any ideas?

*Aleksander Akerø*
Systemkonsulent
Mobil: 944 89 054
E-post: aleksan...@gurusoft.no

*Gurusoft AS*
Telefon: 92 44 09 99
Østre Kullerød
www.gurusoft.no

2014-01-29 Aleksander Akerø <aleksan...@gurusoft.no>

Thanks for the quick answer, but it doesn't help if I remove the lowercase
analyzer like so:

        <fieldType name="keyword" class="solr.TextField" positionIncrementGap="100">
            <analyzer type="index">
                <tokenizer class="solr.KeywordTokenizerFactory"/>
            </analyzer>
            <analyzer type="query">
                <tokenizer class="solr.KeywordTokenizerFactory"/>
            </analyzer>
        </fieldType>

I still need to add quotes to the search query to get results. And the
weird thing is that if I use the analyzer and put in "FE 009" (again,
without quotes) for both index and query values, it highlights the result
as if to show a match, but when I search using the GUI it gives me no
results. The same happens when posting directly to the /select
requestHandler via GET.


This is what I post using GET:
http://mysite.com/solr/corename/select?q=number:FE%20009&qf=number    => this does not work
http://mysite.com/solr/corename/select?q=number:"FE%20009"&qf=number  => this works

Really starting to wonder if I am doing something terribly wrong somewhere.


This is my requestHandler btw, pretty basic:
<!-- #### Default handler #### -->
    <requestHandler name="/select" class="solr.SearchHandler">
        <lst name="defaults">
            <str name="echoParams">explicit</str>
            <str name="defType">edismax</str>
            <str name="q.alt">*:*</str>
            <str name="rows">10</str>
            <str name="fl">*,score</str>
            <str name="qf">number</str>
        </lst>
    </requestHandler>
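
One possible reading of the two GET requests above is that the query parser splits the unquoted input on whitespace before the field's analyzer ever sees it. The Python sketch below models that under stated assumptions: `split_query` is a hypothetical toy parser (not Solr's real grammar, and the `number:` field prefix is left out), and `keyword_tokenize` stands in for KeywordTokenizerFactory.

```python
def keyword_tokenize(text):
    """KeywordTokenizer-style analysis: the whole input is a single token."""
    return [text]


def split_query(raw):
    """Toy query parser: split on whitespace unless the text is quoted
    or the whitespace is backslash-escaped."""
    clauses, current, in_quotes, i = [], [], False, 0
    while i < len(raw):
        ch = raw[i]
        if ch == "\\" and i + 1 < len(raw):
            current.append(raw[i + 1])  # escaped character is kept literally
            i += 2
            continue
        if ch == '"':
            in_quotes = not in_quotes   # quotes group text into one clause
        elif ch.isspace() and not in_quotes:
            if current:                 # whitespace ends the current clause
                clauses.append("".join(current))
                current = []
        else:
            current.append(ch)
        i += 1
    if current:
        clauses.append("".join(current))
    return clauses


# Values from the thread, indexed as single keyword tokens.
indexed_terms = {"FE 009", "EE 009", "ED 009", "FE 009-1"}


def hits(raw):
    """Return the query terms that exist in the index as exact terms."""
    terms = [t for clause in split_query(raw) for t in keyword_tokenize(clause)]
    return [t for t in terms if t in indexed_terms]


print(hits('FE 009'))     # parsed as two clauses: FE and 009
print(hits('"FE 009"'))   # parsed as one clause: FE 009
print(hits('FE\\ 009'))   # escaped space also keeps one clause
```

In this model the unquoted query becomes the terms `FE` and `009`, neither of which equals the indexed term `FE 009`, while the quoted and backslash-escaped forms reach the analyzer as one string. Whether this is what happens in the setup described here would need to be checked, for example with debugQuery=true on the request.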


2014-01-29 Aruna Kumar Pamulapati <apamulap...@gmail.com>

Hi,

I think the misunderstanding you are having is about the lowercase factory:
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.LowerCaseTokenizerFactory

You are correct about KeywordTokenizerFactory, but the lowercase factory
creates tokens by lowercasing all letters and dropping non-letters.

The best place to play with and learn these pipelines is the Solr admin
panel => analysis page.


thanks,
Arun
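
Note that the linked wiki section describes LowerCaseTokenizerFactory, while the field type quoted in this thread uses KeywordTokenizerFactory followed by LowerCaseFilterFactory. The Python sketch below contrasts the two under simplifying assumptions: both functions are hypothetical models of the documented behaviour, not the Lucene implementations.

```python
import re


def lowercase_tokenizer(text):
    """Model of a lowercase *tokenizer*: each run of letters becomes a
    lowercased token; digits, spaces and punctuation are dropped."""
    return [run.lower() for run in re.findall(r"[^\W\d_]+", text)]


def keyword_then_lowercase_filter(text):
    """Model of keyword tokenizer + lowercase *filter*: the whole input
    stays one token and is only lowercased."""
    return [text.lower()]


for value in ["FE 009", "FE 009-1"]:
    print(value, lowercase_tokenizer(value), keyword_then_lowercase_filter(value))
```

In this model the tokenizer variant reduces both values to the single token `fe`, whereas the filter variant keeps `fe 009` and `fe 009-1` intact, which is why the distinction matters for exact matching.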


On Wed, Jan 29, 2014 at 9:05 AM, Aleksander Akerø <aleksan...@gurusoft.no> wrote:

> Hi, I'll try properly this time.
>
> According to the Solr documentation, solr.KeywordTokenizerFactory should not
> do any tokenizing at all. Thus, if I understand this correctly, it should
> only return exact matches, given that this is the only analyzer defined in
> the field type. Such as the following config:
>
> Fieldtypes:
>        <fieldType name="keyword" class="solr.TextField" positionIncrementGap="100">
>            <analyzer type="index">
>                <tokenizer class="solr.KeywordTokenizerFactory"/>
>                <filter class="solr.LowerCaseFilterFactory"/>
>            </analyzer>
>            <analyzer type="query">
>                <tokenizer class="solr.KeywordTokenizerFactory"/>
>                <filter class="solr.LowerCaseFilterFactory"/>
>            </analyzer>
>        </fieldType>
>
> Fields:
>        <field name="number" type="keyword" indexed="true" stored="true" required="false" />
>
> But it seems not to be this way for me. In the index I have values like
> "FE 009", "EE 009", "ED 009" and "FE 009-1" (without the quotes, of course).
> But when I search "FE 009" (without quotes), I get no results. It seems that
> I have to add quotes to the search query in order to retrieve any results,
> but that won't work for me, as I later on have to expand the index with
> other fields that need whitespace tokenization and such. Or would that work
> regardless of quotes? I have come to understand that wrapping the query in
> quotes forces it to be analyzed as one token, no matter what.
>
> If I get this to work I would also like to add the
> "solr.EdgeNGramFilterFactory" to the index-side analyzer, thus adding
> trailing wildcard matches. E.g. return "FE 009-1" and "FE 009-2" as well as
> "FE 009" when searching for "FE 009", but not "EE 009" or "ED 009". Would
> that be an ok way to do it?
>
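
The edge n-gram idea from the original question can be sketched in Python. This is a simplified model, not Solr code: `edge_ngrams`, the `min_gram`/`max_gram` defaults, and the `search` helper are assumptions for illustration, with n-grams applied only on the index side as the question proposes.

```python
def edge_ngrams(token, min_gram=1, max_gram=20):
    """Front-edge n-grams: every prefix of the token whose length is
    between min_gram and max_gram characters."""
    upper = min(len(token), max_gram)
    return [token[:n] for n in range(min_gram, upper + 1)]


def index_terms(value):
    """Index side (assumed): one keyword token, lowercased, then n-grammed."""
    return set(edge_ngrams(value.lower()))


def query_term(value):
    """Query side (assumed): one keyword token, lowercased, no n-grams."""
    return value.lower()


values = ["FE 009", "EE 009", "ED 009", "FE 009-1", "FE 009-2"]


def search(query):
    """Return the stored values whose index terms contain the query term."""
    return [v for v in values if query_term(query) in index_terms(v)]


print(search("FE 009"))
```

In this model a search for `FE 009` returns `FE 009`, `FE 009-1` and `FE 009-2` but not `EE 009` or `ED 009`, provided the whole query reaches the analyzer as a single string.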
