Fundera,
You need a regex that matches a '+' with non-blank characters before and after. It 
should not replace a '+' preceded by whitespace; that matters in Solr, where a 
leading '+' marks a required query term. This is not a perfect solution, but it 
might improve matters for you.
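As a sketch (untested; '_plus_' is just a placeholder marker I picked, use anything that cannot occur in real text):

```xml
<!-- Replace '+' only when it has non-blank characters on both sides,
     e.g. 'i+d' -> 'i_plus_d'. A '+' preceded by whitespace is left
     untouched, so a user's '+term' query operator still works. -->
<charFilter class="solr.PatternReplaceCharFilterFactory"
            pattern="(\S)\+(\S)"
            replacement="$1_plus_$2"/>
```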
Cheers -- Rick

On May 22, 2017 1:58:21 PM EDT, Fundera Developer 
<funderadevelo...@outlook.com> wrote:
>Thank you Zahid and Erik,
>
>I was going to try the CharFilter suggestion, but then I hesitated. I can
>see how it would work at index time, and how an occurrence of 'i+d' would
>be handled, but what happens at query time? If I use the same filter, I
>could end up removing '+' chars that the user added to mark required
>tokens in the query, couldn't I?  However, if I do not use the
>CharFilter I would not be able to match the 'i+d' search tokens...
>
>Thanks all!
>
>
>
>El 22/05/17 a las 16:39, Erick Erickson escribió:
>
>You can also use any of the other tokenizers. WhitespaceTokenizer for
>instance. There are a couple that use regular expressions. Etc. See:
>https://cwiki.apache.org/confluence/display/solr/Tokenizers
>
>Each one has its considerations. WhitespaceTokenizer won't, for
>instance, separate out punctuation, so you might then have to use a
>filter to remove it. Regexes can be tricky to get right ;). Etc....
>
>Best,
>Erick
>
>On Mon, May 22, 2017 at 5:26 AM, Muhammad Zahid Iqbal
><zahid.iq...@northbaysolutions.net>
>wrote:
>
>
>Hi,
>
>
>Before the tokenizer is applied, you can replace your special symbols
>with some placeholder string to preserve them, and after tokenization
>you can replace the placeholder back.
>
>For example:
><charFilter class="solr.PatternReplaceCharFilterFactory" pattern="(\+)"
>replacement="xxx" />
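>And to restore the original form after tokenization (a sketch only;
>'xxx' is a stand-in marker, choose something that cannot appear in
>real text):
>
>```xml
><analyzer>
>  <!-- Runs before tokenization: hide '+' from the tokenizer -->
>  <charFilter class="solr.PatternReplaceCharFilterFactory"
>              pattern="(\+)" replacement="xxx"/>
>  <tokenizer class="solr.StandardTokenizerFactory"/>
>  <!-- Runs on the resulting tokens: put the '+' back -->
>  <filter class="solr.PatternReplaceFilterFactory"
>          pattern="xxx" replacement="+" replace="all"/>
></analyzer>
>```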
>
>
>Thanks,
>Zahid iqbal
>
>On Mon, May 22, 2017 at 12:57 AM, Fundera Developer <
>funderadevelo...@outlook.com>
>wrote:
>
>
>
>Hi all,
>
>I am a bit stuck on a problem that I feel must be easy to solve. In
>Spanish it is common to find the term 'i+d'. We are working with Solr
>5.5, and StandardTokenizer splits 'i+d' into 'i' and 'd'. Since our
>index holds documents in both Spanish and Catalan, and 'i' is a
>frequent word in Catalan, when a user searches for 'i+d' they get
>Catalan documents as results.
>
>I have tried to use the SynonymFilter, with something like:
>
>i+d => investigacionYdesarrollo
>
>But it does not seem to change anything.
>
>Is there a way I could set an exception to the Tokenizer so that it
>does
>not split this word?
>
>Thanks in advance!

-- 
Sorry for being brief. Alternate email is rickleir at yahoo dot com