Issue in the analysis chain

Andrea Gazzarini Fri, 02 Dec 2016 03:01:13 -0800

Hi,

I found a strange behavior with the MappingCharFilterFactory in Solr *6.2.1*. I'm curious whether I'm missing something or whether someone else has run into this.


I have a (index and query) chain composed as follows:

<charFilter class="solr.MappingCharFilterFactory" mapping="mapping-FoldToASCII.txt"/>

<tokenizer class="solr.KeywordTokenizerFactory" />
...

The mapping-FoldToASCII.txt is the exact file that you can find in the Solr download bundle; I didn't add any mappings. I started having some search issues and, after checking, I saw that some characters with diacritics weren't replaced. I isolated one of those cases and tried to see what happens in the analysis page.

As expected, the characters weren't replaced, so I tried char by char. Nothing, it doesn't work.

An example

I pasted īà in the "Field Value (Index)" box. The *ī* char is the Unicode character *\u012b*, which is already mapped in the mapping-FoldToASCII.txt
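
To rule out the mapping file itself, the rules can be checked outside Solr. The snippet below is only a rough sketch: it parses lines in the `"source" => "target"` syntax used by the mapping file and applies them one character at a time. The two rules in `SAMPLE_MAPPING` are an assumed excerpt for illustration (in practice the real mapping-FoldToASCII.txt would be read instead), and `parse_mapping` and `fold` are hypothetical helper names, not part of Solr.

```python
import re

# Assumed excerpt, written in the mapping file syntax: "source" => "target".
# Replace this with the contents of the real mapping-FoldToASCII.txt.
SAMPLE_MAPPING = r'''
# illustrative rules only
"\u00E0" => "a"
"\u012B" => "i"
'''

# One rule per line: two double-quoted strings separated by "=>".
RULE = re.compile(r'^"((?:[^"\\]|\\.)*)"\s*=>\s*"((?:[^"\\]|\\.)*)"')


def unescape(text):
    """Turn \\uXXXX escapes into the characters they denote."""
    return re.sub(r'\\u([0-9A-Fa-f]{4})',
                  lambda m: chr(int(m.group(1), 16)), text)


def parse_mapping(text):
    """Return a dict of source -> target, skipping blanks and comments."""
    rules = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith('#'):
            continue
        match = RULE.match(line)
        if match:
            rules[unescape(match.group(1))] = unescape(match.group(2))
    return rules


def fold(value, rules):
    """Apply single-character rules only (a simplification: the real
    char filter also handles multi-character sources)."""
    return ''.join(rules.get(ch, ch) for ch in value)


rules = parse_mapping(SAMPLE_MAPPING)
print(fold("\u012b\u00e0", rules))
```

If both characters come out folded here, the mapping rules are fine and the problem lies somewhere in the analysis chain.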


Without the "Verbose Output" flag [1]

 * I see an empty space beside the MCF (where instead I'd expect to see
   the "i", "a" replaced characters)
 * the KeywordTokenizer reports exactly my input "īà" so it seems the
   MCF didn't make any change to the source input

However, if I turn the "Verbose Output" flag on [2]

 * You can see that the MCF is working (i.e. ī becomes i, and à becomes a)
 * But the KeywordTokenizer is still ignoring that and it produces īà
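
Since the two views of the analysis page disagree, it may help to query the field analysis handler directly and look at the raw response. The sketch below only builds the request URL. It assumes the `/analysis/field` handler is enabled on the core, and the base URL, the core name `mycore` and the field type name `text_folded` are placeholders, not values from my setup.

```python
from urllib.parse import urlencode


def field_analysis_url(base_url, core, field_type, value):
    """Build a field analysis request URL for the given field type and value."""
    params = {
        "analysis.fieldtype": field_type,   # hypothetical field type name
        "analysis.fieldvalue": value,       # text to run through the index chain
        "wt": "json",
    }
    return f"{base_url}/{core}/analysis/field?{urlencode(params)}"


# Placeholder host, core and field type; the value is the same "īà" input.
url = field_analysis_url("http://localhost:8983/solr", "mycore",
                         "text_folded", "\u012b\u00e0")
print(url)
```

Opening the printed URL on both the 6.2.1 and the 4.7.1 instance should show the output of each stage of the chain without going through the admin UI.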

I tried the same with a Solr 4.7.1 instance and, as you can see [3], it works as I would expect.


Any help would be warmly appreciated

Best,
Andrea

[1] https://drive.google.com/file/d/0B82QaJKoMzvWb3dLcW80ME5wdXc/view
[2] https://drive.google.com/file/d/0B82QaJKoMzvWN2lNSF9JQUhPZ3c/view
[3] https://drive.google.com/file/d/0B82QaJKoMzvWeHRzUnU3MGFtY2s/view
