Re: Search without Accent

Markus Jelsma Wed, 07 Sep 2022 06:22:02 -0700

Hi Carsten,

You forgot to add the ASCIIFoldingFilter to the index analyzer. Please try
again with:


    <fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.ASCIIFoldingFilterFactory" preserveOriginal="true"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.ASCIIFoldingFilterFactory" preserveOriginal="true"/>
      </analyzer>
    </fieldType>
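
To illustrate why the filter is needed on the index side as well, here is a small Python sketch. It is only a rough model, not Solr's implementation: `fold`, `analyze` and `matches` are hypothetical helpers, the tokenizer is a plain whitespace split, and folding is approximated with Unicode decomposition, whereas ASCIIFoldingFilter uses its own mapping table.

```python
import unicodedata


def fold(token):
    """Approximate ASCII folding: decompose, then drop combining marks."""
    decomposed = unicodedata.normalize("NFD", token)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def analyze(text, fold_accents, preserve_original=True):
    """Simplified analyzer: whitespace split, lowercase, optional folding."""
    tokens = set()
    for raw in text.split():
        token = raw.lower()
        if not fold_accents:
            tokens.add(token)
            continue
        tokens.add(fold(token))
        if preserve_original:
            # Mirrors preserveOriginal="true": keep the accented form too.
            tokens.add(token)
    return tokens


def matches(indexed_tokens, query_tokens):
    """A document matches if it shares at least one token with the query."""
    return bool(indexed_tokens & query_tokens)


# Folding only at query time: the index holds "thé", the query asks for "the".
indexed_without_folding = analyze("thé", fold_accents=False)
query_plain = analyze("the", fold_accents=True)
print(matches(indexed_without_folding, query_plain))  # False

# Folding at index time as well: the index holds both "thé" and "the".
indexed_with_folding = analyze("thé", fold_accents=True)
print(matches(indexed_with_folding, query_plain))  # True
print(matches(indexed_with_folding, analyze("thé", fold_accents=True)))  # True
```

In this model the accented and unaccented queries both match only once the indexed tokens contain the folded form, which is the effect of adding the filter to the index analyzer and reindexing.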

I removed the stopwords filter because it is not recommended for regular
text search.
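
After reloading the collection and reindexing, the analysis can also be re-checked outside the admin UI. The sketch below only builds a request URL. It assumes that the field analysis handler is reachable at `/analysis/field` on your installation and that it accepts the same `analysis.*` parameters as the admin page; `field_analysis_url` and the collection name `mycollection` are placeholders, not names from your setup.

```python
from urllib.parse import urlencode


def field_analysis_url(base_url, collection, field_name, field_value, query):
    """Build a field-analysis request URL (handler path is an assumption)."""
    params = {
        "analysis.fieldname": field_name,
        "analysis.fieldvalue": field_value,
        "analysis.query": query,
        "wt": "json",
    }
    return f"{base_url}/{collection}/analysis/field?{urlencode(params)}"


url = field_analysis_url(
    "http://localhost:8983/solr",
    "mycollection",   # placeholder collection name
    "schnellsuche",
    "th\u00e9",       # "thé", written as an escape to avoid encoding surprises
    "the",
)
print(url)
```

Opening the printed URL in a browser, or fetching it with any HTTP client, should show the index-time and query-time token chains for the field, if the handler is available.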

Regards,
Markus


On Wed, 7 Sep 2022 at 14:59, Carsten Klement <[email protected]> wrote:

>
> Hi Markus,
>
> thank you. Yes, I think I have another problem ;) I tried
> ASCIIFoldingFilterFactory, but the analysis shows dèkor instead of decor.
>
> I use a Solr cluster with 3 nodes (3 replicas). I dropped the collection and
> created a new one, but perhaps that is where the problem is?
>
> <dynamicField name="*_txt" type="text_general" indexed="true" stored="true"/>
>
>     <fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
>       <analyzer type="index">
>         <tokenizer class="solr.StandardTokenizerFactory"/>
>         <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
>         <filter class="solr.LowerCaseFilterFactory"/>
>       </analyzer>
>       <analyzer type="query">
>         <tokenizer class="solr.StandardTokenizerFactory"/>
>         <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
>         <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
>         <filter class="solr.LowerCaseFilterFactory"/>
>         <filter class="solr.ASCIIFoldingFilterFactory" preserveOriginal="true"/>
>       </analyzer>
>     </fieldType>
>
> Regards,
> Carsten
>
> -----Original Message-----
>
> From: Markus <[email protected]>
> To: users <[email protected]>
> Date: Wednesday, 7 September 2022 13:41 CEST
> Subject: Re: Search without Accent
>
> Hello Carsten,
>
> I added your config snippet to one of my collections, with
> ASCIIFoldingFilterFactory replacing MappingCharFilterFactory, and reloaded
> the collection. I then tested it on Solr's analysis page [1], and it works.
> Can you test the field on your collection? The analysis should be OK; if it
> is, perhaps something else is wrong.
>
> Regards,
> Markus
>
> [1] http://localhost:8983/solr/#/<COLLECTION>/analysis?analysis.fieldvalue=th%C3%A9&analysis.query=the&analysis.fieldname=schnellsuche&verbose_output=1
>
>
> On Wed, 7 Sep 2022 at 13:12, Carsten Klement <[email protected]> wrote:
>
> >
> > Hi Markus,
> >
> > thank you for your reply.
> >
> > I dropped the collection and created a new one for my tests, and now I
> > have also reloaded the collection, but it doesn't change anything.
> >
> > Searching for "thé" works fine, but searching for "the" doesn't return the
> > result. I also tested solr.ASCIIFoldingFilterFactory, but it doesn't change
> > anything. :(
> >
> > Regards
> > Carsten
> >
> >
> > -----Original Message-----
> >
> > From: Markus <[email protected]>
> > To: users <[email protected]>
> > Date: Wednesday, 7 September 2022 12:07 CEST
> > Subject: Re: Search without Accent
> >
> > Hello Carsten,
> >
> > The MappingCharFilterFactory should work just fine for German or French
> > accents with the default ISOLatin1Accent configuration file, although we
> > rarely use it. Instead, you can try the regular token filter <filter
> > class="solr.ASCIIFoldingFilterFactory"/>. It does a similar job.
> >
> > Do not forget to reload the Solr core/collection once you have uploaded
> > or placed the new configuration.
> >
> > Regards,
> > Markus
> >
> > On Wed, 7 Sep 2022 at 09:47, Carsten Klement <[email protected]> wrote:
> >
> > > Hi,
> > >
> > > We use Solr 6.6 for a quick search on our website. For this, we
> > > copy some data fields into a field called "schnellsuche". This has worked
> > > fine for a few years.
> > >
> > > Now we want to import French data with accented characters, for example
> > > "thé". Users should get the same results whether they search for "thé" or
> > > "the". This is a problem I can't solve.
> > >
> > > I use a charFilter, but this doesn't help:
> > >
> > > <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
> > >
> > >
> > > <field name="schnellsuche" type="text_schnellsuche" indexed="true" stored="false" multiValued="true"/>
> > >
> > > <copyField source="articlegroup_id" dest="schnellsuche"/>
> > > <copyField source="tree_id" dest="schnellsuche"/>
> > > <copyField source="tree_bezeichnung" dest="schnellsuche"/>
> > > <copyField source="tree_keywords" dest="schnellsuche"/>
> > >
> > > <copyField source="*_txt" dest="schnellsuche"/>
> > > <copyField source="*_int" dest="schnellsuche"/>
> > > <copyField source="*_dec" dest="schnellsuche"/>
> > >
> > > <fieldType name="text_schnellsuche" class="solr.TextField" positionIncrementGap="100">
> > >   <analyzer>
> > >     <tokenizer class="solr.ClassicTokenizerFactory"/>
> > >     <filter class="solr.ManagedSynonymFilterFactory" managed="german"/>
> > >     <filter class="solr.LowerCaseFilterFactory"/>
> > >     <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
> > >   </analyzer>
> > > </fieldType>
> > >
> > > Perhaps somebody can help?
> > >
> > > Thanks
> > > Carsten
> > >
> > >
> >
>
