KeywordTokenizerFactory - trouble with "exact" matches

Aleksander Akerø Wed, 29 Jan 2014 06:08:51 -0800

Hi, I'll try properly this time.

According to the Solr documentation, solr.KeywordTokenizerFactory should not
do any tokenizing at all. So, if I understand this correctly, a field should
only return exact matches when this is the only tokenizer defined for its
field type, as in the following config:


Fieldtypes:
        <fieldType name="keyword" class="solr.TextField" positionIncrementGap="100">
            <analyzer type="index">
                <tokenizer class="solr.KeywordTokenizerFactory"/>
                <filter class="solr.LowerCaseFilterFactory"/>
            </analyzer>
            <analyzer type="query">
                <tokenizer class="solr.KeywordTokenizerFactory"/>
                <filter class="solr.LowerCaseFilterFactory"/>
            </analyzer>
        </fieldType>

Fields:
        <field name="number" type="keyword" indexed="true" stored="true" required="false" />

But it does not seem to work this way for me. In the index I have values
like "FE 009", "EE 009", "ED 009" and "FE 009-1" (without the quotes, of
course). But when I search for FE 009 (without quotes), I get no results. It
seems that I have to add quotes to the search query in order to retrieve any
results, but that won't work for me, as later on I have to expand the index
with other fields that need whitespace tokenization and such. Or would that
work regardless of the quotes? I have come to understand that wrapping the
query in quotes forces it to be analyzed as one token, no matter what.

If I get this to work, I would also like to add
"solr.EdgeNGramFilterFactory" to the index-side analyzer, thus adding
trailing wildcard matches. E.g. return "FE 009-1" and "FE 009-2" as well as
"FE 009" when searching for "FE 009", but not "EE 009" or "ED 009". Would
that be an OK way to do it?
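
Roughly, what I have in mind is the sketch below. The field type name
"keyword_prefix" and the gram sizes are placeholders I made up for
illustration, not values from my actual schema, and I have not tested it:

```xml
<!-- Sketch only: "keyword_prefix" and the gram sizes are placeholder values. -->
<fieldType name="keyword_prefix" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
        <tokenizer class="solr.KeywordTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <!-- Index every leading prefix of the whole value: "f", "fe", ..., "fe 009-1" -->
        <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="25"/>
    </analyzer>
    <analyzer type="query">
        <!-- No n-grams on the query side: the query stays one lowercased token -->
        <tokenizer class="solr.KeywordTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
</fieldType>
```

As far as I can tell, a query that reaches the analyzer as the single token
"fe 009" would then match the prefixes indexed for "FE 009" and "FE 009-1",
but nothing indexed for "EE 009" or "ED 009", as long as the query is no
longer than maxGramSize.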

*Aleksander Akerø*
Systemkonsulent
Mobil: 944 89 054
E-post: [email protected]

*Gurusoft AS*
Telefon: 92 44 09 99
Østre Kullerød
www.gurusoft.no
