Re: Solr pattern tokenizer

Erick Erickson Tue, 10 Feb 2015 09:02:34 -0800

Please do not do this. By having such different tokenizers in your
index-time and query-time fieldType definitions, I pretty much
guarantee that you will have endless problems and spend forever
chasing your tail trying to solve them.
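
As a rough sketch of what a shared analysis chain might look like: the
field type name "text_shared" is made up for illustration, and the
tokenizer/filter choice is only an assumption, not a recommendation for
your data. The point is the single <analyzer> element with no "type"
attribute, which Solr applies at both index and query time.

```xml
<!-- schema.xml sketch; "text_shared" is a hypothetical name -->
<fieldType name="text_shared" class="solr.TextField" positionIncrementGap="100">
  <!-- No type attribute: this one analyzer is used for indexing and querying -->
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

With one chain, whatever tokens a document produces at index time are
the same kind of tokens a query produces, which makes mismatches much
easier to reason about.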

Please do yourself a favor and take the time to get to know the
admin/analysis page so you can see exactly what transformations are
happening on your data at both index and query time. Also look at the
plethora of pre-existing tokenizers/char filters/filters available to
see if they do what you want, see:
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters
for a _start_.

Look particularly at WordDelimiterFilterFactory for things like this.
Also, if you're using edismax, look at boosting the "pf" (phrase
fields) parameter very high, so that docs containing the query terms
as a phrase bubble up above docs that only share a term, such as the
HDFC MF ones in your requirement.
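
To make those two suggestions concrete, here is a minimal sketch, not
a tested configuration. The names "text_wdf" and "narration" are
hypothetical, the boost of 50 is arbitrary, and the filter attributes
shown are just one plausible starting point to check on the
admin/analysis page.

```xml
<!-- schema.xml sketch; "text_wdf" is a hypothetical name -->
<fieldType name="text_wdf" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- Splits tokens on delimiters such as "-" and "_" and, with
         splitOnNumerics, on letter/digit transitions -->
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1"
            splitOnNumerics="1" preserveOriginal="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<!-- solrconfig.xml sketch; "narration" is a hypothetical field of the type above -->
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">edismax</str>
    <str name="qf">narration</str>
    <!-- Phrase boost: docs where the query terms appear together score higher -->
    <str name="pf">narration^50</str>
  </lst>
</requestHandler>
```

The idea is that the filter handles the leading hyphen or digit cases
at analysis time, while "pf" rewards docs where HDFC and LTD appear
next to each other.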

Best,
Erick

On Tue, Feb 10, 2015 at 1:01 AM, Nivedita <nivedita.pa...@tcs.com> wrote:
> I tried solving the issue like this:
>
>
> <fieldType name="text_general2" class="solr.TextField"
>            positionIncrementGap="100">
>   <analyzer type="index">
>     <tokenizer class="solr.StandardTokenizerFactory"/>
>     <filter class="solr.ShingleFilterFactory" maxShingleSize="2"/>
>   </analyzer>
>   <analyzer type="query">
>     <tokenizer class="solr.PatternTokenizerFactory"
>                pattern="(.*)(HDFC\sLTD)(.*)" group="2"/>
>     <filter class="solr.TrimFilterFactory"/>
>   </analyzer>
> </fieldType>
>
>
>
> It works for a query like CHQ PAID-INWARD TRANHDFC LTD 000000036529
>
> But if HDFC LTD is preceded by an underscore(-) or any digit (0-9), it
> also matches HDFC MF.
>
> Please let me know why...
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Solr-pattern-tokenizer-tp4183421p4185270.html
> Sent from the Solr - User mailing list archive at Nabble.com.
