Re: Configuring a custom Analyzer for the SynonymFilter

Raf Wed, 28 Sep 2016 00:27:57 -0700

On Wed, Sep 28, 2016 at 3:21 AM, Alexandre Rafalovitch <arafa...@gmail.com>
wrote:


> Before you go down this rabbit hole, are you actually sure this does
> what you think it does?
>
> As far as I can tell, that parameter is for analyzing/parsing the
> synonym entries in the synonym file. Not the incoming search queries
> or text actually being indexed.



Yes, this is exactly what I am looking for.

I have already customized the index and query analyzers for that field by
using a custom filter that performs lemmatization for the Italian language.
Hence, the tokens I have in my index (or in the parsed query) look something
like evento_n (event -> noun) or mangiare_v (eat -> verb).

Now I would like to define synonyms without having to know the "lemma" form.

For example, I would like to have in my synonyms file:
evento,festa,spettacolo
and have the *SynonymFilter* analyzer transform them into
*evento_n,festa_n,spettacolo_n*

This way, a query like *myField:spettacoli* (the plural form of
*spettacolo*) would be analyzed as *myField:(spettacolo_n evento_n festa_n)*.
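
To make the intended behaviour concrete, here is a minimal plain-Java sketch
that does not use Lucene or Solr at all. It only illustrates the idea of running
the synonym entries through the same analysis as the query terms. The LEMMAS
table is a hypothetical stand-in for the custom lemmatization filter, and the
class and method names are invented for this illustration:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

public class SynonymLemmaSketch {
    // Hypothetical stand-in for the custom lemmatization filter:
    // surface form -> lemma with a part-of-speech suffix.
    static final Map<String, String> LEMMAS = new LinkedHashMap<>();
    static {
        LEMMAS.put("evento", "evento_n");
        LEMMAS.put("eventi", "evento_n");
        LEMMAS.put("festa", "festa_n");
        LEMMAS.put("spettacolo", "spettacolo_n");
        LEMMAS.put("spettacoli", "spettacolo_n");
    }

    // Lowercase the term and map it to its lemma; unknown terms pass through.
    public static String analyze(String term) {
        String lower = term.trim().toLowerCase(Locale.ROOT);
        return LEMMAS.getOrDefault(lower, lower);
    }

    // Apply the same analysis to every entry of one comma-separated synonyms line.
    public static List<String> analyzeSynonymLine(String line) {
        List<String> out = new ArrayList<>();
        for (String term : line.split(",")) {
            out.add(analyze(term));
        }
        return out;
    }

    // Expand a query term: its own lemma first, then the rest of the
    // synonym group if the lemma belongs to that group.
    public static List<String> expand(String queryTerm, List<String> group) {
        String lemma = analyze(queryTerm);
        List<String> out = new ArrayList<>();
        out.add(lemma);
        if (group.contains(lemma)) {
            for (String synonym : group) {
                if (!synonym.equals(lemma)) {
                    out.add(synonym);
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> group = analyzeSynonymLine("evento,festa,spettacolo");
        System.out.println(group);
        System.out.println(expand("spettacoli", group));
    }
}
```

Running main prints [evento_n, festa_n, spettacolo_n] followed by
[spettacolo_n, evento_n, festa_n], which mirrors the expansion described above.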



> Did you get it to work with the simpler configuration?
>

Yes, I carried out an experiment using the standard Lucene ItalianAnalyzer
class (both at indexing and query time and for the SynonymFilter) and it
works the way I was expecting. Unfortunately I cannot use this analyzer
because I have to apply my custom lemmatization filter.

Therefore, I am confident I can achieve my desired result by defining a
custom Analyzer class, but I would have preferred to be able to alter the
filter chain just by modifying the *schema.xml* file.

Is there an alternative way to achieve the same result that I am not seeing?

Thank you very much for your help.


Bye,
*Raf*



>
> Just double checking.
>
> Regards,
>    Alex.
> ----
> Newsletter and resources for Solr beginners and intermediates:
> http://www.solr-start.com/
>
>
> On 27 September 2016 at 22:45, Raf <r.ventag...@gmail.com> wrote:
> > On Tue, Sep 27, 2016 at 4:22 PM, Alexandre Rafalovitch <arafa...@gmail.com>
> > wrote:
> >
> >> Looking at the code (on GitHub is easiest), it can take either
> >> analyzer or tokenizer but definitely not any chain definitions. This
> >> seems to be the same all the way to 6.2.1.
> >>
> >
> > Thanks for your answer Alex.
> >
> > Does anyone know if there is a viable alternative that makes it
> > configurable inside the schema.xml instead of defining a custom Java class?
> >
> > I was thinking about something like:
> >
> > * defining the *analyzer* outside of the *field* element, giving it a name:
> > <analyzer name="myAnalyzer">
> >    <tokenizer class="MyTokenizer" />
> >    <filter class="solr.LowerCaseFilterFactory"/>
> >    <filter class="MyFilter_1" />
> > </analyzer>
> >
> > * referring to it inside the *SynonymFilter* definition by its name:
> > <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
> > ignoreCase="true" expand="true" analyzer="myAnalyzer"/>
> >
> > Unfortunately I have not found anything like this inside the Solr
> > documentation.
> > Is it possible to achieve something like that, or is the only solution
> > to write a custom Java class for each combination of filters I need to use
> > for synonym analysis?
> >
> > Thanks.
> >
> > Bye,
> > *Raffaella*
> >
> >
> > ----
> >> Newsletter and resources for Solr beginners and intermediates:
> >> http://www.solr-start.com/
> >>
> >>
> >> On 27 September 2016 at 21:10, Raf <r.ventag...@gmail.com> wrote:
> >> > Hi,
> >> > is it possible to configure a custom analysis for synonyms the same way we
> >> > do for index/query field analysis?
> >> >
> >> > Reading the *SynonymFilter* documentation[0], I have found I can specify a
> >> > custom analyzer by writing its class name.
> >> >
> >> > Example:
> >> > <fieldType name="myField_it" class="solr.TextField" >
> >> >       <analyzer>
> >> >         <tokenizer class="MyTokenizer" />
> >> >         <filter class="solr.LowerCaseFilterFactory"/>
> >> >         <filter class="MyFilter_1" />
> >> >         <filter class="MyFilter_2" />
> >> >         <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
> >> >                 ignoreCase="true" expand="true"
> >> >                 analyzer="org.apache.lucene.analysis.it.ItalianAnalyzer"/>
> >> >       </analyzer>
> >> >     </fieldType>
> >> >
> >> >
> >> > What I would like to achieve, instead, is something like this:
> >> > <fieldType name="myField_it" class="solr.TextField">
> >> >       <analyzer>
> >> >         <tokenizer class="MyTokenizer" />
> >> >         <filter class="solr.LowerCaseFilterFactory"/>
> >> >         <filter class="MyFilter_1" />
> >> >         <filter class="MyFilter_2" />
> >> >         <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
> >> >                 ignoreCase="true" expand="true">
> >> >           <analyzer>
> >> >             <tokenizer class="MyTokenizer" />
> >> >             <filter class="solr.LowerCaseFilterFactory"/>
> >> >             <filter class="MyFilter_1" />
> >> >           </analyzer>
> >> >         </filter>
> >> >       </analyzer>
> >> >     </fieldType>
> >> >
> >> >
> >> > I have tried to configure it this way, but it does not work.
> >> > I do not get any configuration error, but the custom analyzer is not
> >> > applied to synonyms.
> >> >
> >> > Is it possible to achieve this result by configuration, or am I forced to
> >> > write a custom Analyzer class?
> >> >
> >> > I am currently using Solr 5.2.1.
> >> > At the moment I cannot upgrade to a newer version.
> >> >
> >> >
> >> > Thank you very much for any help you can provide.
> >> >
> >> > Regards,
> >> > *Raf*
> >> >
> >> >
> >> > [0] http://archive.apache.org/dist/lucene/solr/ref-guide/apache-solr-ref-guide-5.2.pdf
> >> >   p. 132
> >>
>
