Re: accent insensitive field-type

Søren Fri, 03 Jul 2015 01:28:27 -0700

Thanks Steve! Everything works now.
A little modification:


        "analyzer":{
            "charFilters": [ {"class":"solr.MappingCharFilterFactory","mapping":"mapping-ISOLatin1Accent.txt"} ],
            "tokenizer": {"class":"solr.StandardTokenizerFactory"},
            "filters": [{"class":"solr.LowerCaseFilterFactory"}]
        }

Thankfully, when the key is a plural word, the value is an array.
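
For anyone who wants to script this, here is a minimal sketch that builds the whole add-field-type command around the analyzer above and prints it as JSON. The helper names (build_add_field_type, post_schema_command) are made up for illustration, and the URL assumes the local "mycore" core used in this thread. The POST helper needs a running Solr, so it is only defined here, not called.

```python
import json
import urllib.request


def build_add_field_type(name="myTxtField"):
    """Return the add-field-type command as a plain dict (hypothetical helper)."""
    return {
        "add-field-type": {
            "name": name,
            "class": "solr.TextField",
            "positionIncrementGap": "100",
            "analyzer": {
                # Plural keys take arrays, the singular "tokenizer" takes one object.
                "charFilters": [
                    {"class": "solr.MappingCharFilterFactory",
                     "mapping": "mapping-ISOLatin1Accent.txt"}
                ],
                "tokenizer": {"class": "solr.StandardTokenizerFactory"},
                "filters": [{"class": "solr.LowerCaseFilterFactory"}],
            },
        }
    }


def post_schema_command(command, url="http://localhost:8983/solr/mycore/schema"):
    """POST a schema command as JSON (assumes a Solr instance is listening at url)."""
    request = urllib.request.Request(
        url,
        data=json.dumps(command).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


payload = build_add_field_type()
print(json.dumps(payload, indent=2))
```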

It was still teasing me when I tested with various queries, but specifying the field solved that for me too.

...q=brulee didn't find anything. It goes into the raw index, I guess.


...q=desert:brulee        did find "Crème brûlée"!
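
A likely explanation is that an unfielded query is sent to the request handler's default field, which does not necessarily use the new field type. As a rough sketch, the two ways of pointing a query at the "desert" field can be built like this. The /select path is an assumption about the handler in use, select_url is a made-up helper, and df is Solr's default-field request parameter.

```python
from urllib.parse import urlencode

# Assumed handler path for the local core used in this thread.
SELECT_URL = "http://localhost:8983/solr/mycore/select"


def select_url(query, default_field=None):
    """Build a select URL, optionally naming the default field via df."""
    params = {"q": query}
    if default_field is not None:
        params["df"] = default_field
    return SELECT_URL + "?" + urlencode(params)


# Field named inside the query itself.
fielded = select_url("desert:brulee")
# Unfielded query, with the default field set for this request only.
with_df = select_url("brulee", default_field="desert")

print(fielded)
print(with_df)
```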

cheers
Søren


On 02-07-2015 17:31, Steve Rowe wrote:

Hi Søren,

“charFilter” should be “charFilters”, and “filter” should be “filters”; and 
both their values should be arrays - try this:

{
   "add-field-type": {
     "name":"myTxtField",
     "class":"solr.TextField",
     "positionIncrementGap":"100",
     "analyzer": {
       "charFilters": [ {"class":"solr.MappingCharFilterFactory", "mapping":"mapping-ISOLatin1Accent.txt"} ],
       "tokenizer": [ {"class":"solr.StandardTokenizerFactory"} ],
       "filters": {"class":"solr.LowerCaseFilterFactory"}
     }
   }
}

There should be better error messages for misspellings here.  I’ll file a JIRA 
issue.

(I also moved “filters” after “tokenizer” since that’s the order in which 
they’re executed in an analysis pipeline, but Solr will interpret the 
out-of-order version correctly.)

FYI, if you want to *correct* a field type, rather than create a new one, you 
should use the “replace-field-type” command instead of the “add-field-type” 
command.  You’ll get an error if you attempt to add a field type that already 
exists in the schema.
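
As a small sketch of that choice, the same field type definition can be wrapped in either command. The schema_command helper and the already_exists flag are hypothetical; how you find out whether the type exists is left to the caller.

```python
def schema_command(field_type, already_exists):
    """Wrap a field type definition in the matching Schema API command."""
    key = "replace-field-type" if already_exists else "add-field-type"
    return {key: field_type}


# Trimmed-down definition, only for illustration.
field_type = {"name": "myTxtField", "class": "solr.TextField"}

create = schema_command(field_type, already_exists=False)
correct = schema_command(field_type, already_exists=True)
```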

Steve

On Jul 2, 2015, at 1:17 AM, Søren <s...@syntonetic.com> wrote:

Hi Solr users

I'm new to Solr and I need to be able to search structured data in a case- and accent-insensitive manner.
E.g. find "Crème brûlée" both when querying with "Crème brûlée" and with "creme brulee".
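
To make the goal concrete, here is a rough Python imitation of such an analysis chain: fold accents, split into tokens, lowercase. It is not Solr code. Unicode decomposition stands in for the mapping file and a simple regex stands in for the tokenizer, so edge cases will differ from what Solr does.

```python
import re
import unicodedata


def strip_accents(text):
    """Char-filter stage: decompose characters and drop the combining marks."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def analyze(text):
    """Run the three stages in pipeline order and return the tokens."""
    folded = strip_accents(text)            # char filter
    tokens = re.findall(r"\w+", folded)     # tokenizer (simplified)
    return [token.lower() for token in tokens]  # lowercase filter


print(analyze("Crème brûlée"))  # ['creme', 'brulee']
print(analyze("creme brulee") == analyze("Crème brûlée"))  # True
```

Because the indexed text and the query text go through the same steps, both spellings end up as the same tokens and can match.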

It seems that none of the built-in text types support this, or am I wrong?
So I tried to add my own, inspired by another post, although it was old.

I'm running solr-5.2.1.

Curl to http://localhost:8983/solr/mycore/schema
{
"add-field-type":{
     "name":"myTxtField",
     "class":"solr.TextField",
     "positionIncrementGap":"100",
     "analyzer":{
        "charFilter": {"class":"solr.MappingCharFilterFactory", "mapping":"mapping-ISOLatin1Accent.txt"},
        "filter": {"class":"solr.LowerCaseFilterFactory"},
        "tokenizer": {"class":"solr.StandardTokenizerFactory"}
        }
    }
}

But it doesn't work, and when I look in
'[... ]\solr-5.2.1\server\solr\mycore\conf\managed-schema'
the analyzer section is reduced to this:
  <fieldType name="myTxtField" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
      <tokenizer class="solr.StandardTokenizerFactory"/>
    </analyzer>
  </fieldType>

Am I almost there or am I on a completely wrong track?
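
A quick way to see which analyzer components actually made it into the schema is to parse the fieldType element and list the children of its analyzer. This is only a sketch using the XML quoted above as an inline string; the analyzer_components helper is made up, and reading the real managed-schema file is left out.

```python
import xml.etree.ElementTree as ET

# The fieldType element as it appeared in managed-schema after the failed attempt.
FIELD_TYPE_XML = """
<fieldType name="myTxtField" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
  </analyzer>
</fieldType>
"""


def analyzer_components(field_type_xml):
    """Return (tag, class) pairs for every element inside the analyzer(s)."""
    root = ET.fromstring(field_type_xml.strip())
    return [
        (child.tag, child.get("class"))
        for analyzer in root.iter("analyzer")
        for child in analyzer
    ]


components = analyzer_components(FIELD_TYPE_XML)
print(components)  # [('tokenizer', 'solr.StandardTokenizerFactory')]
```

Only the tokenizer shows up, so the char filter and the lowercase filter were dropped.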

Thanks in advance
Søren
