Re: KeywordTokenizerFactory - trouble with "exact" matches

Alexandre Rafalovitch Wed, 29 Jan 2014 07:08:15 -0800

I think the whitespace might also be the issue. The query gets parsed
by the query parser, which splits it on whitespace before passing the
individual pieces into the field searches.


Try enabling autoGeneratePhraseQueries on the field (or field type)
and reindexing. See if that makes a difference.
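
A minimal sketch of that change, assuming the "keyword" field type from
the quoted message below. The attribute is shown on the field type; whether
it changes the outcome for this particular query is untested here, so
treat it as something to try rather than a confirmed fix:

```xml
<!-- Sketch only: the "keyword" field type from the quoted message, with
     autoGeneratePhraseQueries switched on. Untested against this index. -->
<fieldType name="keyword" class="solr.TextField"
           positionIncrementGap="100"
           autoGeneratePhraseQueries="true">
    <analyzer type="index">
        <tokenizer class="solr.KeywordTokenizerFactory"/>
    </analyzer>
    <analyzer type="query">
        <tokenizer class="solr.KeywordTokenizerFactory"/>
    </analyzer>
</fieldType>
```

If it makes no difference, the quoted form that already works
(q=number:"FE 009") remains the fallback.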

Regards,
  Alex.
Personal website: http://www.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all
at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
book)


On Wed, Jan 29, 2014 at 9:55 PM, Aleksander Akerø
<[email protected]> wrote:
> update:
>
> Guessing that this has nothing to do with the tokenizer. Tried to use the
> string fieldtype as well, but still the same results. So this must have to
> do with some other solr config.
>
> What confuses me is that when I search "1005" which is another valid value
> to search for, it works perfectly, but then again, this query contains no
> whitespace.
>
> Any ideas?
>
> *Aleksander Akerø*
> Systemkonsulent
> Mobil: 944 89 054
> E-post: [email protected]
>
> *Gurusoft AS*
> Telefon: 92 44 09 99
> Østre Kullerød
> www.gurusoft.no
>
>
> 2014-01-29 Aleksander Akerø <[email protected]>
>
>> Thanks for the quick answer, but it doesn't help if I remove the lowercase
>> filter, like so:
>>
>>         <fieldType name="keyword" class="solr.TextField" positionIncrementGap="100">
>>             <analyzer type="index">
>>                 <tokenizer class="solr.KeywordTokenizerFactory"/>
>>             </analyzer>
>>             <analyzer type="query">
>>                 <tokenizer class="solr.KeywordTokenizerFactory"/>
>>             </analyzer>
>>         </fieldType>
>>
>> I still need to add quotes to the search query to get results. And the
>> weird thing is that if I use the analysis page and enter "FE 009" (again,
>> without quotes) for both the index and query values, it highlights the
>> result to show a match, but when I search using the GUI I get no results.
>> The same happens when posting directly to the /select requestHandler via GET.
>>
>> This is what I send using GET:
>> http://mysite.com/solr/corename/select?q=number:FE%20009&qf=number    => this does not work
>> http://mysite.com/solr/corename/select?q=number:"FE%20009"&qf=number  => this works
>>
>> Really starting to wonder if I am doing something terribly wrong somewhere.
>>
>> This is my requestHandler btw, pretty basic:
>> <!-- #### Default handler #### -->
>>     <requestHandler name="/select" class="solr.SearchHandler">
>>         <lst name="defaults">
>>             <str name="echoParams">explicit</str>
>>             <str name="defType">edismax</str>
>>             <str name="q.alt">*:*</str>
>>             <str name="rows">10</str>
>>             <str name="fl">*,score</str>
>>             <str name="qf">number</str>
>>         </lst>
>>     </requestHandler>
>>
>>
>> 2014-01-29 Aruna Kumar Pamulapati <[email protected]>
>>
>>> Hi,
>>>
>>> I think the misunderstanding you are having is about the lowercase
>>> factory:
>>> http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.LowerCaseTokenizerFactory
>>>
>>> You are correct about KeywordTokenizerFactory, but the lowercase factory
>>> creates tokens by lowercasing all letters and dropping non-letters.
>>>
>>> The best place to experiment with and learn these pipelines is the Solr
>>> admin panel => analysis page.
>>>
>>>
>>> thanks,
>>> Arun
>>>
>>>
>>> On Wed, Jan 29, 2014 at 9:05 AM, Aleksander Akerø <[email protected]> wrote:
>>>
>>> > Hi, I'll try properly this time.
>>> >
>>> > According to the Solr documentation, solr.KeywordTokenizerFactory should
>>> > not do any tokenizing at all. Thus, if I understand this correctly, it
>>> > should only return exact matches, given that this is the only analyzer
>>> > defined in the field type. Such as the following config:
>>> >
>>> > Fieldtypes:
>>> >         <fieldType name="keyword" class="solr.TextField" positionIncrementGap="100">
>>> >             <analyzer type="index">
>>> >                 <tokenizer class="solr.KeywordTokenizerFactory"/>
>>> >                 <filter class="solr.LowerCaseFilterFactory"/>
>>> >             </analyzer>
>>> >             <analyzer type="query">
>>> >                 <tokenizer class="solr.KeywordTokenizerFactory"/>
>>> >                 <filter class="solr.LowerCaseFilterFactory"/>
>>> >             </analyzer>
>>> >         </fieldType>
>>> >
>>> > Fields:
>>> >         <field name="number" type="keyword" indexed="true" stored="true" required="false" />
>>> >
>>> > But it does not seem to work this way for me. In the index I have values
>>> > like "FE 009", "EE 009", "ED 009" and "FE 009-1" (without the quotes, of
>>> > course). But when I search for "FE 009" (without quotes), I get no
>>> > results. It seems that I have to add quotes to the search query in order
>>> > to retrieve any results, but that won't work for me, as I later have to
>>> > expand the index with other fields that need whitespace tokenization and
>>> > such. Or would that work regardless of quotes? I have come to understand
>>> > that wrapping the query in quotes forces it to be analyzed as one token,
>>> > no matter what.
>>> >
>>> > If I get this to work I would also like to add the
>>> > "solr.EdgeNGramFilterFactory" to the index-side analyzer, thus adding
>>> > trailing wildcard matches. E.g. return "FE 009-1" and "FE 009-2" as well
>>> > as "FE 009" when searching for "FE 009", but not "EE 009" or "ED 009".
>>> > Would that be an OK way to do it?
>>> >
>>> >
>>>
>>
>>
