KeywordTokenizerFactory splits the string for the exclamation mark

nativecoder Tue, 13 May 2014 07:02:44 -0700

Hi all,

I have the following field settings in my Solr schema:


<field name=&quot;&lt;b>Exact_Word*" omitPositions="true"
termVectors="false" omitTermFreqAndPositions="true" compressed="true"
type="string_ci" multiValued="false" indexed="true" stored="true"
required="false" omitNorms="true"/>

<field name="Word" compressed="true" type="email_text_ptn"
multiValued="false" indexed="true" stored="true" required="false"
omitNorms="true"/>

<fieldtype name="string_ci" class="solr.TextField" sortMissingLast="true" omitNorms="true">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldtype>

<copyField source="Word" dest="Exact_Word"/>

As you can see, Exact_Email uses KeywordTokenizerFactory, which should
treat the whole string as a single token.
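As a rough illustration (a Python emulation, not Solr code), this is what the `string_ci` analyzer chain does on its own: KeywordTokenizer emits the entire input as one token, and LowerCaseFilter lowercases it. The function names and the sample address `D!Test@Example.com` are hypothetical, and Python's `str.lower` only approximates Lucene's lowercasing.

```python
from __future__ import annotations


def keyword_tokenize(text: str) -> list[str]:
    """Emulates KeywordTokenizer: the entire input becomes exactly one token."""
    return [text]


def lowercase_filter(tokens: list[str]) -> list[str]:
    """Emulates LowerCaseFilter: lowercases each token, never splits it."""
    return [t.lower() for t in tokens]


def analyze_string_ci(text: str) -> list[str]:
    """The string_ci chain: keyword tokenizer first, then the lowercase filter."""
    return lowercase_filter(keyword_tokenize(text))


print(analyze_string_ci("D!Test@Example.com"))  # ['d!test@example.com']
```

So, at the analyzer level, the "!" stays inside the single token.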

But when I enter an email address such as
"[email protected]", it splits the string in two at the "!". I was under
the impression that KeywordTokenizerFactory treats the string as it is.
The following is the query debug output, where you can see that the word has been split:
 "parsedquery":"+((DisjunctionMaxQuery((Exact_Email:d))
-DisjunctionMaxQuery((Exact_Email:[email protected])))~1)",

Can someone please tell me why it produces this query?

If I enter a string without the "!" sign, the parsed query is:

"parsedquery":"+DisjunctionMaxQuery((Exact_Email:[email protected]))",
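One plausible reading of the two parsed queries, offered as an assumption rather than a confirmed diagnosis: the query parser splits the input before the field's analyzer ever runs. In Lucene's classic query syntax, `!` is a NOT operator, so `d!...` becomes a term `d` plus a negated term, which would match the `Exact_Email:d` and `-Exact_Email:...` clauses in the debug output. The toy model below is hypothetical (`split_clauses` and `parse` are illustrative names, not Solr APIs) and only handles whitespace and `!`.

```python
from __future__ import annotations


def split_clauses(raw: str) -> list[tuple[str, bool]]:
    """Toy model of the query-syntax step: break raw text into (term, negated) clauses.

    Whitespace ends a term; '!' ends a term AND marks the next term as negated,
    mimicking '!' acting as a NOT operator in the classic query syntax.
    """
    clauses: list[tuple[str, bool]] = []
    current = ""
    negate_current = False  # does the term being built carry a NOT?
    negate_next = False     # was a '!' seen since the last term ended?
    for ch in raw:
        if ch.isspace() or ch == "!":
            if current:
                clauses.append((current, negate_current))
                current, negate_current = "", False
            if ch == "!":
                negate_next = True
            continue
        if not current:
            # First character of a new term: attach any pending NOT to it.
            negate_current, negate_next = negate_next, False
        current += ch
    if current:
        clauses.append((current, negate_current))
    return clauses


def parse(raw: str, field: str = "Exact_Email") -> list[str]:
    """Analyze each clause separately (keyword tokenizer + lowercase), then prefix '-' for NOT."""
    return [("-" if neg else "") + f"{field}:{term.lower()}" for term, neg in split_clauses(raw)]


print(parse("D!Test@Example.com"))  # ['Exact_Email:d', '-Exact_Email:test@example.com']
print(parse("DTest@Example.com"))   # ['Exact_Email:dtest@example.com']
```

In this model the KeywordTokenizer never sees the full address when it contains "!", because the split happens one stage earlier.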

I thought that if KeywordTokenizerFactory is applied, it should return
the exact string as it is.
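Assuming the "!" is being read as query syntax rather than as part of the address, a common workaround is to backslash-escape it (`d\!...`) before sending the query. The helper below is a hypothetical Python sketch; Java clients using SolrJ have `ClientUtils.escapeQueryChars` for the same purpose. The character set here follows the special characters listed for Lucene's classic query syntax and may not match SolrJ's list exactly.

```python
from __future__ import annotations

# Special characters of Lucene's classic query syntax ('&&' and '||' are covered
# by escaping '&' and '|' individually). Whitespace is escaped too, so the whole
# value stays a single term.
SPECIAL_CHARS = set('\\+-!():^[]"{}~*?|&/')


def escape_query_chars(value: str) -> str:
    """Hypothetical helper: backslash-escape query-syntax characters in a user value."""
    out = []
    for ch in value:
        if ch in SPECIAL_CHARS or ch.isspace():
            out.append("\\")
        out.append(ch)
    return "".join(out)


print(escape_query_chars("d!test@example.com"))  # d\!test@example.com
```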

Please help me understand what is going wrong here.




--
View this message in context: 
http://lucene.472066.n3.nabble.com/KeywordTokenizerFactory-splits-the-string-for-the-exclamation-mark-tp4135460.html
Sent from the Solr - User mailing list archive at Nabble.com.
