This is a _very_ common thing we all had to learn; what you're seeing is the result of the _query parser_, not the analysis chain. Anything like proj_name_sort:term1 term2 gets split at the query parser level, before the field's analysis chain ever sees it. Attaching &debug=query to the URL should show, down in the "parsed query" section, something like:

proj_name_sort:term1 default_search_field:term2

To get things through the query parser, enclose them in double quotes, escape the space, and so on. That'll get the terms _as a single token_ to the analysis chain for that field, where the behavior will be what you expect.

Best,
Erick

On Wed, Mar 25, 2015 at 9:26 AM, Vadim Gorlovetsky <vadim...@amdocs.com> wrote:
> Hello,
>
> solr.KeywordTokenizerFactory seems splitting by whitespaces though according
> SOLR documentation shouldn't do that.
>
> For example I have the following configuration for the fields "proj_name" and
> "proj_name_sort":
>
> <field name="proj_name" type="sortable_text_general" indexed="true"
> stored="true"/>
> <field name="proj_name_sort" type="string_sort" indexed="true"
> stored="false"/>
> ......
>
> <copyField source="proj_name" dest="proj_name_sort" />
> ..................
>
> <fieldType name="string_sort" class="solr.TextField" sortMissingLast="true"
> omitNorms="true">
> <analyzer>
> <!-- KeywordTokenizer does no actual tokenizing, so the entire
> input string is preserved as a single token
> -->
> <tokenizer class="solr.KeywordTokenizerFactory"/>
> <!-- The LowerCase TokenFilter does what you expect, which can be
> when you want your sorting to be case insensitive
> -->
> <filter class="solr.LowerCaseFilterFactory" />
> <!-- The TrimFilter removes any leading or trailing whitespace -->
> <filter class="solr.TrimFilterFactory" />
> </analyzer>
> </fieldType>
>
> There are 3 indexed documents having the respective field values:
> proj_name:
> Test1008
> CR610070 Test1
> CR610070 Another Test2
>
> Searching on the "proj_name_sort" giving me the following results:
>
> Query:    proj_name_sort : CR610070 Test1
> Expected: CR610070 Test1
> Real:     CR610070 Test1
> Comments: Expectable as seems searching exact un-tokenized value
>
> Query:    proj_name_sort : CR610070 Te
> Expected: None
> Real:     None
> Comments: Expectable as seems searching exact un-tokenized value
>
> Query:    proj_name_sort : CR610070 Te*
> Expected: CR610070 Test1
> Real:     CR610070 Test1, Test1008, CR610070 Another Test2
> Comments: Seems splits on tokens by whitespace ?????
>
> Query:    proj_name_sort : CR610070 An*
> Expected: CR610070 Another Test2
> Real:     CR610070 Another Test2
> Comments: Expectable as seems applying wild card on un-tokenized value
>
> Query:    proj_name_sort : CR610070 Another Te*
> Expected: CR610070 Another Test2
> Real:     CR610070 Test1, Test1008, CR610070 Another Test2
> Comments: Seems splits on tokens by whitespace ?????
>
> Query:    proj_name_sort : CR610070 Another Test1*
> Expected: None
> Real:     CR610070 Test1, Test1008, CR610070 Another Test2
> Comments: Seems splits on tokens by whitespace ?????
>
>
> Please, advise the way to search on un-tokenized fields using partial
> criteria and wild cards.
>
> Thanks
> Vadim
>
>
> This message and the information contained herein is proprietary and
> confidential and subject to the Amdocs policy statement,
> you may review at http://www.amdocs.com/email_disclaimer.asp
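
To make the escaping advice from the reply at the top of this thread concrete, here is a minimal Python sketch of building a prefix query on proj_name_sort with the space backslash-escaped, so the query parser hands the whole value to the field as one term. It assumes the standard (lucene) query parser; the helper names escape_whitespace, field_query and select_params are made up for illustration and are not part of any Solr client. As far as I know, wildcards are not expanded inside double quotes with the standard parser, which is why the sketch escapes the space rather than quoting when a trailing * is wanted.

```python
from urllib.parse import urlencode


def escape_whitespace(value: str) -> str:
    """Backslash-escape backslashes and spaces so the query parser
    does not split the value into separate clauses."""
    return value.replace("\\", "\\\\").replace(" ", "\\ ")


def field_query(field: str, value: str, prefix: bool = False) -> str:
    """Build field:value with escaped spaces; append * for a prefix search."""
    term = escape_whitespace(value)
    return f"{field}:{term}*" if prefix else f"{field}:{term}"


def select_params(query: str) -> str:
    """URL-encode the query and add debug=query so the response
    includes the parsed query for inspection."""
    return urlencode({"q": query, "debug": "query"})


q = field_query("proj_name_sort", "CR610070 Te", prefix=True)
print(q)                  # proj_name_sort:CR610070\ Te*
print(select_params(q))   # query string to append to the select URL
```

With the space escaped, the "parsed query" section of the debug output should show a single clause on proj_name_sort instead of one clause on proj_name_sort plus one on the default search field.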