You can also use any of the other tokenizers: WhitespaceTokenizer, for
instance, or one of the regex-based ones. See:
https://cwiki.apache.org/confluence/display/solr/Tokenizers

Each one has its considerations. WhitespaceTokenizer, for instance,
won't separate out punctuation, so you might then have to use a filter
to remove it. Regexes can be tricky to get right ;). Etc....
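
If it helps, here's a rough, untested sketch of that first idea (the
field type name and the pattern are just examples): whitespace
tokenization keeps 'i+d' together as one token, and a
PatternReplaceFilter strips punctuation only from the edges of the
other tokens:

<fieldType name="text_ws_plus" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- remove leading/trailing punctuation; the internal '+' in 'i+d' survives -->
    <filter class="solr.PatternReplaceFilterFactory"
            pattern="^\p{Punct}+|\p{Punct}+$" replacement="" replace="all"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>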

Best,
Erick

On Mon, May 22, 2017 at 5:26 AM, Muhammad Zahid Iqbal
<zahid.iq...@northbaysolutions.net> wrote:
> Hi,
>
>
> Before applying the tokenizer, you can replace your special symbols with
> some phrase to preserve them, and after tokenizing you can replace them
> back.
>
> For example:
> <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="(\+)"
> replacement="xxx" />
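>
> A fuller, untested sketch (the field type name is just an example)
> wiring that char filter into a complete analyzer, so the same
> replacement is applied at both index and query time:
>
> <fieldType name="text_plus" class="solr.TextField" positionIncrementGap="100">
>   <analyzer>
>     <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="(\+)"
>                 replacement="xxx"/>
>     <tokenizer class="solr.StandardTokenizerFactory"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>   </analyzer>
> </fieldType>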
>
>
> Thanks,
> Zahid iqbal
>
> On Mon, May 22, 2017 at 12:57 AM, Fundera Developer <
> funderadevelo...@outlook.com> wrote:
>
>> Hi all,
>>
>> I am a bit stuck at a problem that I feel must be easy to solve. In
>> Spanish it is common to find the term 'i+d'. We are working with Solr 5.5,
>> and StandardTokenizer splits it into 'i' and 'd'. Since the index contains
>> documents in both Spanish and Catalan, and 'i' is a frequent word in
>> Catalan, a user searching for 'i+d' gets Catalan documents as results.
>>
>> I have tried to use the SynonymFilter, with something like:
>>
>> i+d => investigacionYdesarrollo
>>
>> But it does not seem to change anything.
>>
>> Is there a way I could set an exception to the Tokenizer so that it does
>> not split this word?
>>
>> Thanks in advance!
>>
>>
