Re: How to extend the behavior of a common text field (such as text_general) to recognize regex

Alexandre Rafalovitch Tue, 24 Jun 2014 17:37:22 -0700

What about copyField'ing the content into a second field where you
apply the alternative processing? Then search both with eDismax. You
don't have to store the other field, just index it.
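
A minimal sketch of what that could look like in schema.xml. The field
names "content" and "content_num" are made up for illustration; "text_num"
is the field type from the quoted message below:

    <!-- original field, analyzed as regular text -->
    <field name="content" type="text_general" indexed="true" stored="true"/>
    <!-- hypothetical second field: indexed with the number pattern, not stored -->
    <field name="content_num" type="text_num" indexed="true" stored="false"/>
    <!-- copy the raw input so each field runs its own analysis chain -->
    <copyField source="content" dest="content_num"/>

An eDismax query over both fields would then be something like:

    q=hello 123-45&defType=edismax&qf=content content_num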


Regards,
   Alex.
Personal website: http://www.outerthoughts.com/
Current project: http://www.solr-start.com/ - Accelerating your Solr proficiency


On Wed, Jun 25, 2014 at 5:55 AM, Vinay B, <vybe3...@gmail.com> wrote:
> Sorry, previous post got sent prematurely.
>
> Here is the complete post:
>
> This is easy if I only need to define a custom field to identify the
> desired patterns (numbers, in my case).
>
> For example, I could define a field thus:
>     <!-- A text field that identifies numerical entities -->
>     <fieldType name="text_num" class="solr.TextField" >
>       <analyzer>
>         <tokenizer class="solr.PatternTokenizerFactory"
>                    pattern="\s*[0-9][0-9-]*[0-9]?\s*" group="0"/>
>       </analyzer>
>     </fieldType>
>
> Input:
> hello, world bye 123-45 abcd 5555 sdfssdf --- aaa
>
> Output:
> 123-45 , 5555
>
> However, I also want to retain the behavior of the default text_general
> field, that is, to recognize the usual text tokens (hello, world, bye,
> etc.). What is the best way to achieve this?
> I've looked at PatternCaptureGroupFilterFactory (
> http://lucene.apache.org/core/4_7_0/analyzers-common/org/apache/lucene/analysis/pattern/PatternCaptureGroupFilterFactory.html
> ) but I suspect that it too is subject to the behavior of the prior
> tokenizer (which for text_general is StandardTokenizerFactory ).
>
> Thanks
>
