Hi Markus,

My suggestion: rewrite your synonyms to include the triggering word in the 
expanded synonyms list.  That way you won’t need KeywordRepeat/RemoveDuplicates 
filters, and mm=100% will work as you expect.

I don’t think this situation is a bug, since mm applies to the built query, not 
to the original query terms.
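
To make that concrete, here is a rough Python sketch of how a conditional mm 
spec resolves against the number of optional clauses in the built query. 
min_should_match is a hypothetical helper written for illustration from the 
documented mm semantics; it is not Solr's actual code, and edge-case rounding 
may differ.

```python
import re


def min_should_match(optional_clauses: int, spec: str) -> int:
    """Hypothetical helper: number of optional clauses that must match."""
    spec = spec.strip()
    result = optional_clauses
    if "<" in spec:
        # Conditional spec, e.g. "2<-1 5<-2 6<90%": each part is "bound<sub-spec".
        for part in re.sub(r"\s*<\s*", "<", spec).split():
            bound, sub_spec = part.split("<", 1)
            if optional_clauses <= int(bound):
                # At or below this bound: keep what the previous part decided
                # (all clauses required if this is the first part).
                return result
            result = min_should_match(optional_clauses, sub_spec)
        return result
    if spec.endswith("%"):
        # Percentage of the clause count, truncated toward zero.
        calc = optional_clauses * int(spec[:-1]) / 100
        result = optional_clauses + int(calc) if calc < 0 else int(calc)
    else:
        # Plain integer; a negative value means "all but this many".
        value = int(spec)
        result = optional_clauses + value if value < 0 else value
    # Clamp to the range 0..optional_clauses.
    return max(0, min(optional_clauses, result))


MM = "2<-1 5<-2 6<90%"  # the mm spec quoted in this thread
for clauses in (1, 2, 3, 6, 10):
    print(clauses, min_should_match(clauses, MM))
```

If the built query ends up with two optional clauses, as the parsed query 
quoted below suggests (title_nl:trajecten plus the Synonym clause), this spec 
requires both to match. With a single clause, only that one is required.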

--
Steve
www.lucidworks.com

> On Dec 20, 2017, at 5:02 PM, Markus Jelsma <markus.jel...@openindex.io> wrote:
> 
> Hello,
> 
> Yes, of course, index-time synonyms lessen the query-time complexity and will 
> solve the mm problem. They also skew IDF and remove the flexibility of adding 
> synonyms on demand. The first we do not want, the second is impossible for us 
> (very large main search index).
> 
> We are looking for a solution with mm that takes KeywordRepeat, stemming and 
> synonym expansion into consideration. To me, the current behaviour of mm in 
> this case is a bug: I input one term, so mm should treat it as one term, 
> regardless of the expanded query terms.
> 
> Any query-time ideas to share? I am not well versed in the actual code 
> dealing with this specific subject; the code doesn't like me. I am fine if 
> someone points me to the code that tells mm about the number of original 
> input terms, and what to do with it. If someone does, please also explain 
> whether the change I want to make is a bad one, and what to be aware of or 
> take into account.
> 
> Also, am I the only one who regards this behaviour as a bug, or, more subtly, 
> as weird, unexpected behaviour?
> 
> Many many thanks!
> Markus
> 
> -----Original message-----
>> From:Shawn Heisey <apa...@elyograg.org>
>> Sent: Wednesday 20th December 2017 22:39
>> To: solr-user@lucene.apache.org
>> Subject: Re: Trouble with mm and SynonymQuery and KeywordRepeatFilter
>> 
>> On 12/19/2017 4:38 AM, Markus Jelsma wrote:
>>> I have an interesting issue with mm, SynonymQuery and 
>>> KeywordRepeatFilter. We do query-time synonym expansion and use 
>>> KeywordRepeat so that we match more than just stemmed tokens. Our 
>>> synonyms are already preprocessed and contain only stemmed tokens. The 
>>> synonym file contains: traject,verbind
>>> 
>>> So, any non-root stem that ends up in a synonym is actually a search for 
>>> three terms: +DisjunctionMaxQuery(((title_nl:trajecten 
>>> Synonym(title_nl:traject title_nl:verbind))))
>>> 
>>> But, our default mm requires that two terms must match if the input query 
>>> consists of two terms: 2<-1 5<-2 6<90%
>>> 
>>> So, a simple query looking for a plural (trajecten) will not match a 
>>> document where the title contains only its singular form: q=trajecten will 
>>> not match document with title_nl:"een traject"
>> 
>> I would think that doing synonym expansion at index time would remove
>> any possible confusion about the number of terms at query time.  Queries
>> that involve synonyms will be slightly less complex, but the index would
>> be larger, so it's difficult to say whether those kinds of queries would
>> be any faster or not.
>> 
>> There is one clear disadvantage to index-time synonym expansion: If you
>> change your synonyms, you have to reindex.
>> 
>> Thanks,
>> Shawn
>> 
>> 
