RE: search ignoring accents

Pedro Figueiredo Fri, 17 Apr 2015 04:50:58 -0700

And for this example, which filter should I use?

Filtering by "edr" should return "Pedro".
Does the NGram filter create tokens only from the beginning or the end of a word, or also from the
middle?
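To show the difference being asked about, here is a rough Python simulation (not Lucene's code) of the two filters applied to a single lowercased token. `edge_ngrams` keeps only prefixes, as EdgeNGramFilterFactory does. `ngrams` takes substrings starting at every position, which is roughly what solr.NGramFilterFactory produces. Both function names are hypothetical, and the 2/15 gram sizes are copied from the field type quoted further down.

```python
# Hypothetical simulation of two Lucene token filters on one lowercased token.
# Gram order is not guaranteed to match Lucene's output order.


def edge_ngrams(token, min_gram=2, max_gram=15):
    """Prefixes of length min_gram..max_gram (what EdgeNGramFilterFactory keeps)."""
    upper = min(max_gram, len(token))
    return [token[:n] for n in range(min_gram, upper + 1)]


def ngrams(token, min_gram=2, max_gram=15):
    """Substrings of length min_gram..max_gram starting at every position
    (roughly what NGramFilterFactory emits)."""
    grams = []
    for start in range(len(token)):
        for n in range(min_gram, max_gram + 1):
            if start + n > len(token):
                break  # no longer substrings fit from this start position
            grams.append(token[start:start + n])
    return grams


token = "pedro"
print(edge_ngrams(token))           # ['pe', 'ped', 'pedr', 'pedro']
print("edr" in edge_ngrams(token))  # False: "edr" is not a prefix
print("edr" in ngrams(token))       # True: "edr" starts at position 1
```

Only the full n-gram list contains "edr". Assuming the grams are produced at index time only, a query for "edr" stays a single token and can match the stored gram directly.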


Thanks!

Pedro Figueiredo
Senior Engineer

pjlfigueir...@criticalsoftware.com
M. 934058150
 

Rua Engº Frederico Ulrich, nº 2650 4470-605 Moreira da Maia, Portugal
T. +351 229 446 927 | F. +351 229 446 929
www.criticalsoftware.com

PORTUGAL | UK | GERMANY | USA | BRAZIL | MOZAMBIQUE | ANGOLA
A CMMI® LEVEL 5 RATED COMPANY CMMI® is registered in the USPTO by CMU"
 


-----Original Message-----
From: Pedro Figueiredo [mailto:pjlfigueir...@criticalsoftware.com] 
Sent: 17 April 2015 12:22
To: solr-user@lucene.apache.org; 'Ahmet Arslan'
Subject: RE: search ignoring accents

Hi Ahmet,

Yes, the EdgeNGram filter is what produces those results.
I need it to improve name searches for the application's users.

Thanks.

Pedro Figueiredo
 


-----Original Message-----
From: Ahmet Arslan [mailto:iori...@yahoo.com.INVALID]
Sent: 17 April 2015 12:01
To: solr-user@lucene.apache.org
Subject: Re: search ignoring accents

Hi Pedro,

solr.ASCIIFoldingFilterFactory is one way to remove diacritics.
The confusion comes from EdgeNGram; why do you need it?
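What diacritic removal does to a name can be seen with the Python approximation below, which decomposes each character (Unicode NFD) and drops the combining marks. This is an illustration only: Lucene's ASCIIFoldingFilter uses its own mapping table and also folds characters such as 'ß' or 'ø' that NFD leaves alone. The function name `fold_accents` is hypothetical.

```python
import unicodedata


def fold_accents(text):
    """Approximate ASCII folding: decompose characters (NFD) and drop the
    combining marks. Not Lucene's mapping table; some characters differ."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(fold_accents("Mourão"))     # Mourao
print(fold_accents("Conceição"))  # Conceicao
print(fold_accents("Monteiro"))   # Monteiro (no accents, unchanged)
```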

Ahmet



On Friday, April 17, 2015 1:38 PM, Pedro Figueiredo 
<pjlfigueir...@criticalsoftware.com> wrote:



Hello,
 
What is the best way to search in a field ignoring accents?
 
The field has the type:
    <fieldType name="text_general_edge_ngram" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.LowerCaseTokenizerFactory"/>
        <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="15"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.LowerCaseTokenizerFactory"/>
        <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="15"/>
      </analyzer>
    </fieldType>
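The tokens this field type puts in the index for a name can be approximated in Python as below. `lowercase_tokenize` is a rough stand-in for LowerCaseTokenizerFactory (split on non-letters, lowercase), `fold_accents` approximates ASCIIFoldingFilter via NFD, and the `fold` flag models placing that filter before EdgeNGram. All names are hypothetical; this is a simulation, not Solr's analysis code.

```python
import unicodedata


def lowercase_tokenize(text):
    # Rough stand-in for LowerCaseTokenizerFactory: split on non-letters, lowercase.
    text = unicodedata.normalize("NFC", text)  # compose so 'ã' counts as one letter
    tokens, current = [], []
    for ch in text:
        if ch.isalpha():
            current.append(ch.lower())
        elif current:
            tokens.append("".join(current))
            current = []
    if current:
        tokens.append("".join(current))
    return tokens


def fold_accents(token):
    # NFD-based approximation of ASCIIFoldingFilter (not Lucene's mapping table).
    decomposed = unicodedata.normalize("NFD", token)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def edge_ngrams(token, min_gram=2, max_gram=15):
    # Prefixes of length min_gram..max_gram, like EdgeNGramFilterFactory.
    return [token[:n] for n in range(min_gram, min(max_gram, len(token)) + 1)]


def analyze(text, fold=False):
    # Tokenizer, then optional folding, then edge n-grams, in that order.
    grams = []
    for token in lowercase_tokenize(text):
        if fold:
            token = fold_accents(token)
        grams.extend(edge_ngrams(token))
    return grams


print(analyze("Mourão"))             # ['mo', 'mou', 'mour', 'mourã', 'mourão']
print(analyze("Mourão", fold=True))  # ['mo', 'mou', 'mour', 'moura', 'mourao']
```

Without folding, the accented grams "mourã" and "mourão" can never equal anything produced from the unaccented query "Mourao", which is why the folding filter would need to be in both analyzers.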
 
I've tried adding the filter <filter class="solr.ASCIIFoldingFilterFactory"/>,
but it produced some strange results, such as:
 
Searching for "Mourao" returned:
Mourão -> OK
Monteiro -> NOT OK
Morais -> NOT OK
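The following Python sketch reproduces these results under a simplifying assumption: a name matches if any of its indexed grams equals any query token. Because the query analyzer also applies EdgeNGram, the query "mourao" contributes the gram "mo", which Monteiro and Morais share with it. If the grams are applied at index time only, only Mourão matches. This is a simulation, not Solr, and `analyze` and `matches` are hypothetical helpers.

```python
import unicodedata


def fold_accents(token):
    # NFD approximation of ASCII folding (not Lucene's mapping table).
    decomposed = unicodedata.normalize("NFD", token)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def edge_ngrams(token, min_gram=2, max_gram=15):
    # Prefixes of length min_gram..max_gram, like EdgeNGramFilterFactory.
    return [token[:n] for n in range(min_gram, min(max_gram, len(token)) + 1)]


def analyze(name, ngram=True):
    # Lowercase, fold accents, then optionally expand into edge n-grams.
    token = fold_accents(name.lower())
    return set(edge_ngrams(token)) if ngram else {token}


def matches(query, names, ngram_query):
    # Assumption: a name matches if it shares at least one token with the query.
    index = {name: analyze(name) for name in names}
    query_tokens = analyze(query, ngram=ngram_query)
    return [name for name, grams in index.items() if grams & query_tokens]


names = ["Mourão", "Monteiro", "Morais"]
print(matches("Mourao", names, ngram_query=True))   # ['Mourão', 'Monteiro', 'Morais']
print(matches("Mourao", names, ngram_query=False))  # ['Mourão']
```

In schema terms this would likely mean keeping EdgeNGram in the index analyzer and removing it from the query analyzer, which fits Ahmet's point that the EdgeNGram filter, not the folding, causes the unexpected matches.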
 
Thanks in advance,
 
Pedro Figueiredo
