Re: KeywordTokenizerFactory - trouble with "exact" matches

Jack Krupansky Thu, 30 Jan 2014 04:12:29 -0800

The standard, keyword-oriented query parsers all treat unquoted, unescaped white space as a term delimiter and discard the white space itself. There is no way to bypass that behavior. So your regex will never even see the white space, unless you enclose the text and white space in quotes or use a backslash to escape each white space character.
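A minimal sketch of the two workarounds (quoting the text, or backslash-escaping each white space character) as Python helpers that build the query string a client would send. The helper names are hypothetical; the field name `text_nospaces` is taken from the schema in the original question. The `field:` prefix is standard (lucene) query parser syntax and assumes that parser rather than dismax.

```python
import re


def escape_whitespace(text: str) -> str:
    """Backslash-escape every whitespace character so the query parser
    keeps the text together instead of splitting it into separate terms."""
    return re.sub(r"(\s)", r"\\\1", text)


def quote_phrase(text: str) -> str:
    """Wrap text in double quotes, escaping embedded backslashes and quotes."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return '"' + escaped + '"'


# Field name from the schema in the original question.
field = "text_nospaces"
print(f"{field}:{escape_whitespace('east end ers')}")  # text_nospaces:east\ end\ ers
print(f"{field}:{quote_phrase('east end ers')}")       # text_nospaces:"east end ers"
```

Either form delivers "east end ers" to the field's query analyzer as one string instead of three separate terms.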

You can use the "field" and "term" query parsers to pass a query string as if it were fully enclosed in quotes, but that only handles a single term and does not allow for multiple terms or any query operators. For example:


{!field f=myfield}Foo Bar

See:
http://wiki.apache.org/solr/QueryParser
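A minimal Python sketch of sending the `{!field}` form over HTTP; the main thing to get right is URL-encoding the braces, `!` and `=` of the local-params prefix. The base URL (default port 8983, a core named `collection1`) and the helper name are assumptions for illustration, and the request is only built here, not sent.

```python
from urllib.parse import urlencode


def field_query_url(base_url: str, field: str, text: str) -> str:
    """Build a select URL whose q parameter uses the {!field} query parser.

    Everything after the closing brace is handed to the field's query
    analyzer as one string, so whitespace is not treated as a delimiter.
    """
    params = {"q": "{!field f=" + field + "}" + text}
    return base_url + "?" + urlencode(params)


# Hypothetical Solr endpoint; adjust host, port and core name.
BASE = "http://localhost:8983/solr/collection1/select"
print(field_query_url(BASE, "text_nospaces", "east end ers"))
# http://localhost:8983/solr/collection1/select?q=%7B%21field+f%3Dtext_nospaces%7Deast+end+ers
```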

You can also pre-configure the field query parser with the defType=field parameter.


-- Jack Krupansky

-----Original Message-----
From: Srinivasa7

Sent: Thursday, January 30, 2014 6:37 AM
To: [email protected]
Subject: Re: KeywordTokenizerFactory - trouble with "exact" matches

Hi,

I have a similar kind of problem: I want to search for words that contain spaces, and I want to search by stripping all the spaces.

I have used the following schema for that:

<fieldType name="nospaces" class="solr.TextField" autoGeneratePhraseQueries="true">
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.PatternReplaceFilterFactory" pattern="[^\w]+" replacement="" replace="all"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.PatternReplaceFilterFactory" pattern="[^\w]+" replacement="" replace="all"/>
  </analyzer>
</fieldType>
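To see what each analyzer above produces, here is a rough Python approximation of the chain (keyword tokenizer, then lower-casing, then deleting runs of non-word characters). It is a sketch under assumptions, not Lucene's code: `re.ASCII` is used because Java's `\w` matches only `[a-zA-Z_0-9]` by default, and Python's `str.lower()` stands in for `LowerCaseFilterFactory`.

```python
import re

# Mirrors pattern="[^\w]+" replacement="" replace="all"; ASCII-only like Java's default \w.
NON_WORD = re.compile(r"[^\w]+", re.ASCII)


def nospaces_analyze(text):
    """Approximate the "nospaces" analyzer; returns the list of emitted tokens."""
    tokens = [text]                                 # KeywordTokenizerFactory: whole input is one token
    tokens = [t.lower() for t in tokens]            # LowerCaseFilterFactory
    tokens = [NON_WORD.sub("", t) for t in tokens]  # PatternReplaceFilterFactory, replace="all"
    return tokens


print(nospaces_analyze("East Enders"))   # ['eastenders']
print(nospaces_analyze("East-Enders!"))  # ['eastenders']
```

Because the index and query analyzers are identical, the string "east end ers" would also reduce to `eastenders` if it reached the query analyzer in one piece.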

And

<field name="text_nospaces" type="nospaces" indexed="true" stored="true" omitNorms="true" />
<copyField source="text" dest="text_nospaces" />

But it is not matching the right terms. We strip the spaces and index lowercase values.

For example: East Enders

When I search for the text 'east end ers', it returns no results (no documents found).

I realised that Solr runs the query parser before passing the query string to the query analyzer defined in the schema.

The query parser tokenizes the query string, so it sends each token separately to the query analyzer defined in the schema.
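The effect described here can be reproduced with a small Python simulation: the query string is split on whitespace first, as the keyword-oriented parsers do, and only then is each piece analyzed. The `analyze` function is a hypothetical stand-in for the `nospaces` query analyzer, using an ASCII-only approximation of `[^\w]+` to match Java's default `\w`.

```python
import re


def analyze(chunk):
    """Stand-in for the "nospaces" query analyzer: lowercase, drop non-word chars."""
    return re.sub(r"[^\w]+", "", chunk.lower(), flags=re.ASCII)


indexed_term = analyze("East Enders")  # the single term stored in the index

query = "east end ers"
# Parser splits on whitespace first, then each chunk is analyzed on its own.
split_terms = [analyze(chunk) for chunk in query.split()]
# The whole string reaches the analyzer in one piece (quoted, escaped, or via {!field}).
whole_term = analyze(query)

print(indexed_term)                 # eastenders
print(split_terms)                  # ['east', 'end', 'ers']
print(indexed_term in split_terms)  # False: none of the per-chunk terms match
print(whole_term == indexed_term)   # True
```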

So is there any way I can bypass this query parser, or use a query parser that treats the entire string as a single phrase?

At the moment I am using the dismax query parser.

Any suggestion would be much appreciated.

Thanks
Srinivasa

--

View this message in context: http://lucene.472066.n3.nabble.com/KeywordTokenizerFactory-trouble-with-exact-matches-tp4114193p4114432.html
Sent from the Solr - User mailing list archive at Nabble.com.
