Fundera,

You need a regex that matches a '+' with non-blank characters both before and after it. It should not replace a '+' preceded by whitespace. That matters in Solr, because a leading '+' is how a user marks a required (compulsory) term. This is not a perfect solution, but it might improve matters for you.

Cheers -- Rick
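As a rough sketch of that idea, not tested against a live Solr instance, the pattern `(?<=\S)\+(?=\S)` uses a lookbehind and a lookahead so that only a '+' sitting between two non-blank characters is matched. PatternReplaceCharFilterFactory takes a Java regular expression, so the snippet below exercises the same pattern with java.util.regex and compares it with Zahid's blanket `(\+)` pattern. The class name `PlusPattern`, its helper methods, and the sample queries are hypothetical; the placeholder `xxx` is borrowed from Zahid's example. In a schema XML attribute the `<` would have to be escaped, e.g. `pattern="(?&lt;=\S)\+(?=\S)"`.

```java
import java.util.regex.Pattern;

// Hypothetical demo class (not part of Solr): compares a blanket '+'
// replacement with one that only touches a '+' between non-blank characters.
public class PlusPattern {

    // Zahid's pattern: replaces every '+', including a leading "required term" '+'.
    static final Pattern ANY_PLUS = Pattern.compile("(\\+)");

    // Rick's idea: a '+' with a non-blank char before it (lookbehind) and
    // after it (lookahead). A '+' at the start or after a space is left alone.
    static final Pattern INNER_PLUS = Pattern.compile("(?<=\\S)\\+(?=\\S)");

    static String replaceAny(String text) {
        return ANY_PLUS.matcher(text).replaceAll("xxx");
    }

    static String replaceInner(String text) {
        return INNER_PLUS.matcher(text).replaceAll("xxx");
    }

    public static void main(String[] args) {
        String[] queries = {"i+d", "+i+d", "proyectos +i+d", "catalan +i"};
        for (String q : queries) {
            System.out.println(q + "  ->  any: " + replaceAny(q)
                    + "  inner: " + replaceInner(q));
        }
    }
}
```

With the lookaround version, "proyectos +i+d" becomes "proyectos +ixxxd": the user's leading '+' survives while the '+' inside 'i+d' is protected. It is still not perfect; for instance, a '+' right after a parenthesis, as in "(+i)", has a non-blank character before it and would be replaced.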
On May 22, 2017 1:58:21 PM EDT, Fundera Developer <funderadevelo...@outlook.com> wrote:
> Thank you Zahid and Erick,
>
> I was going to try the CharFilter suggestion, but then I had doubts. I can see how the indexing process would handle an occurrence of 'i+d', but what happens at query time? If I use the same filter, couldn't I end up removing the '+' chars that users add to mark compulsory tokens in the search results? However, if I do not use the CharFilter, I would not be able to match the 'i+d' search tokens...
>
> Thanks all!
>
> On 22/05/17 at 16:39, Erick Erickson wrote:
>
> You can also use any of the other tokenizers, WhitespaceTokenizer for instance. There are also a couple that use regular expressions, etc. See:
> https://cwiki.apache.org/confluence/display/solr/Tokenizers
>
> Each one has its considerations. WhitespaceTokenizer won't, for instance, separate out punctuation, so you might then have to use a filter to remove it. Regexes can be tricky to get right ;). Etc....
>
> Best,
> Erick
>
> On Mon, May 22, 2017 at 5:26 AM, Muhammad Zahid Iqbal <zahid.iq...@northbaysolutions.net> wrote:
>
> Hi,
>
> Before the tokenizer is applied, you can replace your special symbols with some placeholder phrase to preserve them, and after tokenization you can replace them back.
>
> For example:
> <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="(\+)" replacement="xxx" />
>
> Thanks,
> Zahid Iqbal
>
> On Mon, May 22, 2017 at 12:57 AM, Fundera Developer <funderadevelo...@outlook.com> wrote:
>
> Hi all,
>
> I am a bit stuck on a problem that I feel must be easy to solve. In Spanish it is common to find the term 'i+d'.
> We are working with Solr 5.5, and StandardTokenizer splits it into 'i' and 'd'. Since our index holds documents in both Spanish and Catalan, and 'i' is a frequent word in Catalan, a user searching for 'i+d' sometimes gets Catalan documents as results.
>
> I have tried to use the SynonymFilter, with something like:
>
> i+d => investigacionYdesarrollo
>
> But it does not seem to change anything.
>
> Is there a way I could set an exception in the Tokenizer so that it does not split this word?
>
> Thanks in advance!

-- 
Sorry for being brief. Alternate email is rickleir at yahoo dot com