Re: Multi words query time synonyms

Dominique Bejean Sun, 11 Feb 2018 06:36:37 -0800

Steve,

Following your comment, I ran this test:


1/ Put the SynonymGraphFilterFactory after the StopFilterFactory in the
query-time analysis chain

    <analyzer type="query">
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.ElisionFilterFactory" ignoreCase="true"
              articles="lang/contractions_fr.txt"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.ASCIIFoldingFilterFactory"/>
      <filter class="solr.StopFilterFactory" words="stopwords.txt"
              ignoreCase="true"/>
      <filter class="solr.SynonymGraphFilterFactory"
              synonyms="gosport_synonyms.txt"
              ignoreCase="true" expand="true"/>
      <filter class="solr.FrenchMinimalStemFilterFactory"/>
    </analyzer>

2/ Remove the stop word ("de") from the entry in the synonyms file

om, olympique marseille


The parsed query strings are:

for "om maillot"
"parsedquery_toString":"+(((((+name_text_gp:olympiqu +name_text_gp:marseil) name_text_gp:om)) (name_text_gp:maillot))~1)",

for "olympique de marseille maillot"
"parsedquery_toString":"+((((name_text_gp:om (+name_text_gp:olympiqu +name_text_gp:marseil))) (name_text_gp:maillot))~1)",

for "maillot om"
"parsedquery_toString":"+(((name_text_gp:maillot) (((+name_text_gp:olympiqu +name_text_gp:marseil) name_text_gp:om)))~1)",

for "maillot olympique de marseille"
"parsedquery_toString":"+(((name_text_gp:maillot) ((name_text_gp:om (+name_text_gp:olympiqu +name_text_gp:marseil))))~1)",
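
For anyone who wants to reproduce these parsed queries, here is a minimal Python sketch that only builds the request URLs. The q, sow and qq parameters are the ones from my first message; the host, port and core name are placeholders I made up, and I am assuming the standard debug=query parameter is used to get parsedquery_toString in the response.

```python
from urllib.parse import urlencode

# Assumption: host, port and core name are placeholders, not taken from this thread.
SOLR_SELECT_URL = "http://localhost:8983/solr/mycore/select"


def build_debug_url(user_query: str) -> str:
    """Return a select URL that asks Solr to include the parsed query in its response."""
    params = {
        "q": "{!edismax qf='name_text_gp' v=$qq}",  # as in the original message
        "sow": "false",                             # as in the original message
        "qq": user_query,
        "debug": "query",  # standard Solr parameter; adds the query debug section
        "rows": "0",       # assumption: only the debug section is of interest
    }
    return SOLR_SELECT_URL + "?" + urlencode(params)


if __name__ == "__main__":
    for qq in (
        "om maillot",
        "olympique de marseille maillot",
        "maillot om",
        "maillot olympique de marseille",
    ):
        print(build_debug_url(qq))
```

The URLs can then be opened in a browser or fetched with any HTTP client; the sketch itself does not contact Solr.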


The query results are the same for all four queries.

It looks like this could be an acceptable workaround.
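
My understanding of why step 2/ is needed, as a toy Python model rather than anything Lucene actually runs: once the stop filter comes first, "de" is already gone when the synonym filter sees the tokens, so a rule that still contains "de" can no longer match. The function names, the rule constants and the stop list (only "de", as in Steve's test) are assumptions for illustration, and the model ignores elision, ASCII folding, stemming and position increments.

```python
# Toy model only: this is NOT how Lucene implements these filters.
STOPWORDS = {"de"}  # assumption: same one-word stop list as in Steve's test

# Hypothetical in-memory form of the synonym rule, one tuple per variant.
RULE_WITH_STOPWORD = [("om",), ("olympique", "de", "marseille")]
RULE_WITHOUT_STOPWORD = [("om",), ("olympique", "marseille")]


def remove_stopwords(tokens):
    """Drop stop words, as a stop filter placed before the synonym filter would."""
    return [t for t in tokens if t not in STOPWORDS]


def expand_synonyms(tokens, rule):
    """Replace every occurrence of a rule variant by the set of all variants."""
    out = []
    i = 0
    while i < len(tokens):
        for variant in rule:
            if tuple(tokens[i:i + len(variant)]) == variant:
                out.append(frozenset(rule))  # all alternatives at this position
                i += len(variant)
                break
        else:
            # No variant starts here: keep the token unchanged.
            out.append(tokens[i])
            i += 1
    return out


def analyze(text, rule):
    """Lowercase and split on whitespace, then stop filter, then synonym filter."""
    return expand_synonyms(remove_stopwords(text.lower().split()), rule)


if __name__ == "__main__":
    for q in ("om maillot", "olympique de marseille maillot"):
        print(q, "->", analyze(q, RULE_WITHOUT_STOPWORD))
    # With "de" still in the rule, the long form is not recognised any more.
    print(analyze("olympique de marseille maillot", RULE_WITH_STOPWORD))
```

In this model the short and the long form give the same expansion with the rule from step 2/, which is consistent with the parsed queries above.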

Thank you

Dominique



On Sun, 11 Feb 2018 at 10:31, Dominique Bejean <dominique.bej...@eolya.fr>
wrote:

> Hi Steve,
>
> Thank you for your response.
> I have created the Jira issue: SOLR-11968
>
> I will let you add your comments.
>
> Regards.
>
> Dominique
>
>
> On Sat, 10 Feb 2018 at 20:30, Steve Rowe <sar...@gmail.com> wrote:
>
>> Hi Dominique,
>>
>> Looks like it’s a bug, not sure where exactly though.  Can you please
>> create a JIRA?
>>
>> I can see the same behavior on master too, not just on the
>> releases/lucene-solr/6.6.2 tag.
>>
>> One interesting thing I found is that if I remove the stop filter from
>> the query analyzer, I get the following for qq=“maillot om”:
>>
>> +((name_text_gp:maillot) (((+name_text_gp:olympiqu +name_text_gp:de
>> +name_text_gp:marseil) name_text_gp:om)))
>>
>> (btw my stop list only has “de” on it)
>>
>> Thanks,
>>
>> --
>> Steve
>> www.lucidworks.com
>>
>> > On Feb 10, 2018, at 2:12 AM, Dominique Bejean
>> > <dominique.bej...@eolya.fr> wrote:
>> >
>> > Hi,
>> >
>> > More info.
>> >
>> > When I test the analysis for the field type, the synonyms are correctly
>> > expanded for all of these expressions:
>> >
>> > om maillot
>> > maillot om
>> > olympique de marseille maillot
>> > maillot olympique de marseille
>> >
>> > The resulting output always includes the following terms (obviously not
>> > always in the same order):
>> >
>> > olympiqu om marseil maillot
>> >
>> >
>> > So I suspect an issue with the edismax query parser.
>> >
>> > Regards.
>> >
>> > Dominique
>> >
>> >
>> > On Fri, 9 Feb 2018 at 18:25, Dominique Bejean
>> > <dominique.bej...@eolya.fr> wrote:
>> >
>> >> Hi,
>> >>
>> >> I am trying multi-word query-time synonyms with Solr 6.6.2 and the
>> >> SynonymGraphFilterFactory filter, as explained in this article:
>> >>
>> >> https://lucidworks.com/2017/04/18/multi-word-synonyms-solr-adds-query-time-support/
>> >>
>> >> My field type is:
>> >>
>> >> <fieldType name="textSyn" class="solr.TextField"
>> >>            positionIncrementGap="100">
>> >>    <analyzer type="index">
>> >>      <tokenizer class="solr.StandardTokenizerFactory"/>
>> >>      <filter class="solr.ElisionFilterFactory" ignoreCase="true"
>> >>              articles="lang/contractions_fr.txt"/>
>> >>      <filter class="solr.LowerCaseFilterFactory"/>
>> >>      <filter class="solr.ASCIIFoldingFilterFactory"/>
>> >>      <filter class="solr.StopFilterFactory" words="stopwords.txt"
>> >>              ignoreCase="true"/>
>> >>      <filter class="solr.FrenchMinimalStemFilterFactory"/>
>> >>    </analyzer>
>> >>    <analyzer type="query">
>> >>      <tokenizer class="solr.StandardTokenizerFactory"/>
>> >>      <filter class="solr.ElisionFilterFactory" ignoreCase="true"
>> >>              articles="lang/contractions_fr.txt"/>
>> >>      <filter class="solr.LowerCaseFilterFactory"/>
>> >>      <filter class="solr.SynonymGraphFilterFactory"
>> >>              synonyms="synonyms.txt"
>> >>              ignoreCase="true" expand="true"/>
>> >>      <filter class="solr.ASCIIFoldingFilterFactory"/>
>> >>      <filter class="solr.StopFilterFactory" words="stopwords.txt"
>> >>              ignoreCase="true"/>
>> >>      <filter class="solr.FrenchMinimalStemFilterFactory"/>
>> >>    </analyzer>
>> >> </fieldType>
>> >>
>> >>
>> >> synonyms.txt contains the line
>> >>
>> >> om, olympique de marseille
>> >>
>> >>
>> >> The order of words in my query has an impact on the generated query in
>> >> edismax
>> >>
>> >> q={!edismax qf='name_text_gp' v=$qq}
>> >> &sow=false
>> >> &qq=...
>> >>
>> >> With "qq=om maillot" or "qq=olympique de marseille maillot", I can see
>> >> the synonym expansion. It is working as expected.
>> >>
>> >> "parsedquery_toString":"+(((+name_text_gp:olympiqu +name_text_gp:marseil +name_text_gp:maillot) name_text_gp:om))",
>> >> "parsedquery_toString":"+((name_text_gp:om (+name_text_gp:olympiqu +name_text_gp:marseil +name_text_gp:maillot)))",
>> >>
>> >>
>> >> With "qq=maillot om" or "qq=maillot olympique de marseille", I get the
>> >> same generated query in both cases:
>> >>
>> >> "parsedquery_toString":"+((name_text_gp:maillot) (name_text_gp:om))",
>> >> "parsedquery_toString":"+((name_text_gp:maillot) (name_text_gp:om))",
>> >>
>> >> I don't understand these generated queries. In the first one, it looks
>> >> like the synonym expansion is ignored, but the second one shows that it
>> >> is not ignored and that only the synonym term is used.
>> >>
>> >>
>> >> What is wrong with the way I am doing this?
>> >>
>> >> Regards
>> >>
>> >> Dominique
>> >>
>> >> --
>> >> Dominique Béjean
>> >> 06 08 46 12 43
>> >>
>> > --
>> > Dominique Béjean
>> > 06 08 46 12 43
>>
>> --
> Dominique Béjean
> 06 08 46 12 43
>
-- 
Dominique Béjean
06 08 46 12 43
