Hello,
solr.KeywordTokenizerFactory seems to be splitting on whitespace, although
according to the Solr documentation it shouldn't do that.
For example, I have the following configuration for the fields "proj_name" and
"proj_name_sort":
<field name="proj_name" type="sortable_text_general" indexed="true" stored="true"/>
<field name="proj_name_sort" type="string_sort" indexed="true" stored="false"/>
......
<copyField source="proj_name" dest="proj_name_sort" />
..................
<fieldType name="string_sort" class="solr.TextField" sortMissingLast="true" omitNorms="true">
  <analyzer>
    <!-- KeywordTokenizer does no actual tokenizing, so the entire
         input string is preserved as a single token
    -->
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <!-- The LowerCase TokenFilter does what you expect, which can be
         useful when you want your sorting to be case insensitive
    -->
    <filter class="solr.LowerCaseFilterFactory" />
    <!-- The TrimFilter removes any leading or trailing whitespace -->
    <filter class="solr.TrimFilterFactory" />
  </analyzer>
</fieldType>
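To make my expectation concrete, here is a rough Python sketch of what I understand this analyzer chain to do with a field value. It is only an illustration based on the comments in the schema, not Solr code, and the function name analyze_string_sort is made up for this example.

```python
from typing import List


def analyze_string_sort(value: str) -> List[str]:
    """Rough model of the string_sort analyzer chain (an assumption, not Solr code)."""
    tokens = [value]                       # KeywordTokenizer: whole input kept as one token
    tokens = [t.lower() for t in tokens]   # LowerCaseFilter: case-insensitive sorting
    tokens = [t.strip() for t in tokens]   # TrimFilter: drop leading/trailing whitespace
    return tokens


for name in ["Test1008", "CR610070 Test1", "CR610070 Another Test2"]:
    print(analyze_string_sort(name))
```

Under this model every value yields exactly one token, with the inner whitespace kept, which is why I expect no splitting on the indexed side.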
There are 3 indexed documents with the following field values:
proj_name:
Test1008
CR610070 Test1
CR610070 Another Test2
Searching on "proj_name_sort" gives me the following results:
Query:    proj_name_sort : CR610070 Test1
Expected: CR610070 Test1
Real:     CR610070 Test1
Comments: As expected; the exact un-tokenized value appears to be matched.

Query:    proj_name_sort : CR610070 Te
Expected: None
Real:     None
Comments: As expected; only the exact un-tokenized value appears to be matched.

Query:    proj_name_sort : CR610070 Te*
Expected: CR610070 Test1
Real:     CR610070 Test1, Test1008, CR610070 Another Test2
Comments: Appears to split into tokens on whitespace?

Query:    proj_name_sort : CR610070 An*
Expected: CR610070 Another Test2
Real:     CR610070 Another Test2
Comments: As expected; the wildcard appears to be applied to the un-tokenized value.

Query:    proj_name_sort : CR610070 Another Te*
Expected: CR610070 Another Test2
Real:     CR610070 Test1, Test1008, CR610070 Another Test2
Comments: Appears to split into tokens on whitespace?

Query:    proj_name_sort : CR610070 Another Test1*
Expected: None
Real:     CR610070 Test1, Test1008, CR610070 Another Test2
Comments: Appears to split into tokens on whitespace?
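For clarity, the "Expected" results above follow from a simple model: the whole query value is treated as one lowercased pattern and matched against each whole, lowercased field value. The Python sketch below only illustrates that model; it is not how Solr evaluates queries, and the helper name expected_matches is hypothetical.

```python
from fnmatch import fnmatchcase

PROJ_NAMES = ["Test1008", "CR610070 Test1", "CR610070 Another Test2"]


def expected_matches(pattern, values=PROJ_NAMES):
    """Match one un-split pattern against each whole value (illustrative model only)."""
    needle = pattern.strip().lower()
    # Each value is compared as a single lowercased, trimmed token; '*' is a wildcard.
    return [v for v in values if fnmatchcase(v.strip().lower(), needle)]


for query in [
    "CR610070 Test1",
    "CR610070 Te",
    "CR610070 Te*",
    "CR610070 An*",
    "CR610070 Another Te*",
    "CR610070 Another Test1*",
]:
    print(query, "->", expected_matches(query))
```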
Please advise on how to search un-tokenized fields using partial criteria
and wildcards.
Thanks
Vadim