Re: Wildcard ? issue?

Sethi, Parampreet Wed, 08 Feb 2012 08:03:59 -0800

Hi Dalius,

If you have not already tried it, check
http://localhost:8983/solr/admin/analysis.jsp for your queries (enable
verbose output for both the index and the query field values) and see
which tokenizers and filters are being applied.
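
To compare the parsed queries side by side, the four phrases from your table could also be sent with debug output turned on. The following is only a rough sketch: it assumes the default /select endpoint on the same host and port as the analysis page, and that the "dictionary" handler is reachable through the qt parameter. The helper name debug_url is made up for illustration.

```python
from urllib.parse import urlencode

# Assumption: default /select endpoint on the same host/port as the analysis page.
SOLR_SELECT = "http://localhost:8983/solr/select"

# The four search phrases from the results table in the original report.
PHRASES = ["cal.lígra?", "cal.ligra?", "cal.ligraf", "calligra?"]


def debug_url(phrase, handler="dictionary"):
    """Build a /select URL that sends `phrase` to `handler` with debug output on."""
    params = {
        "q": phrase,
        "qt": handler,       # request handler name taken from the posted config
        "debugQuery": "on",  # asks Solr to include the parsed query in the response
        "wt": "json",
    }
    # urlencode percent-encodes the UTF-8 bytes of non-ASCII characters and the "?".
    return SOLR_SELECT + "?" + urlencode(params)


if __name__ == "__main__":
    for phrase in PHRASES:
        print(debug_url(phrase))
```

Opening each URL and comparing the parsedquery entries in the debug section should show which terms were actually passed through ICUFoldingFilterFactory and which were not.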


Hope it helps!

-param

On 2/8/12 10:48 AM, "Dalius Sidlauskas" <dalius.sidlaus...@semantico.com>
wrote:

>If you cannot read this mail easily, check this ticket:
>https://issues.apache.org/jira/browse/SOLR-3106 (this mail is a copy of it).
>
>Regards!
>Dalius Sidlauskas
>
>
>On 08/02/12 15:44, Dalius Sidlauskas wrote:
>> Sorry for the inaccurate title.
>>
>> I have three fields (dc_title, dc_title_unicode, dc_title_unicode_full)
>> containing the same value:
>>
>> <title xmlns="http://www.tei-c.org/ns/1.0">cal.lígraf</title>
>>
>> and these fields are configured accordingly:
>>
>> <fieldType name="xml"  class="solr.TextField"
>> positionIncrementGap="100">
>> <analyzer type="index">
>> <charFilter class="solr.HTMLStripCharFilterFactory"/>
>> <tokenizer class="solr.StandardTokenizerFactory"/>
>> <filter class="solr.ICUFoldingFilterFactory"/>
>> </analyzer>
>> <analyzer type="query">
>> <tokenizer class="solr.StandardTokenizerFactory"/>
>> <filter class="solr.ICUFoldingFilterFactory"/>
>> </analyzer>
>> </fieldType>
>>
>> <fieldType name="xml_unicode"  class="solr.TextField"
>> positionIncrementGap="100">
>> <analyzer type="index">
>> <charFilter class="solr.HTMLStripCharFilterFactory"/>
>> <tokenizer class="solr.StandardTokenizerFactory"/>
>> </analyzer>
>> <analyzer type="query">
>> <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>> </analyzer>
>> </fieldType>
>>
>> <fieldType name="xml_unicode_full"  class="solr.TextField"
>> positionIncrementGap="100">
>> <analyzer type="index">
>> <charFilter class="solr.HTMLStripCharFilterFactory"/>
>> <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>> </analyzer>
>> <analyzer type="query">
>> <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>> </analyzer>
>> </fieldType>
>>
>> And finally my search configuration:
>>
>> <requestHandler name="dictionary"  class="solr.SearchHandler">
>> <lst name="defaults">
>> <str name="echoParams">all</str>
>> <str name="defType">edismax</str>
>> <str name="mm">2&lt;-25%</str>
>> <str name="qf">dc_title_unicode_full^2 dc_title_unicode^2 dc_title</str>
>> <int  name="rows">10</int>
>> <str name="spellcheck.onlyMorePopular">true</str>
>> <str name="spellcheck.extendedResults">false</str>
>> <str name="spellcheck.count">1</str>
>> </lst>
>> <arr name="last-components">
>> <str>spellcheck</str>
>> </arr>
>> </requestHandler>
>>
>> I am trying to match the field with various (valid) search phrases.
>> Here are the results:
>>
>>
>> #  Search phrase  Match?  Comment
>> 1  cal.lígra?     yes
>> 2  cal.ligra?     no      Changed í to i
>> 3  cal.ligraf     yes
>> 4  calligra?      no
>>
>>
>> The problem is attempt #2, which fails to match the data. Attempt #3,
>> which replaces ? with f, works.
>>
>> One more thing: if * is used instead of ?, other data such as
>> cal.lígrafia is matched, but not cal.lígraf...
>>
>> Also, I have spotted a logic mismatch in the debug parsedQuery field:
>>
>> cal·lígraf: +DisjunctionMaxQuery((dc_title:*calligraf*^2.0 |
>> dc_title_unicode:cal·lígraf^3.0 | dc_title_unicode_full:cal·lígraf^3.0))
>> cal·lígra?: +DisjunctionMaxQuery((dc_title:*cal·lígra?*^2.0 |
>> dc_title_unicode:cal·lígra?^3.0 | dc_title_unicode_full:cal·lígra?^3.0))
>>
>> Should the second be "*calligra?*" instead?
>>
>> Environment:
>> Tomcat 7.0.25 (request encoding UTF-8)
>> Solr 3.5.0
>> Oracle Java 7
>> Ubuntu 11.10
>>
