RE: Trouble with mm and SynonymQuery and KeywordRepeatFilter

Markus Jelsma Thu, 21 Dec 2017 06:29:11 -0800

Hello Steve,

Well, that is an interesting approach indeed. But I do not think 
it is possible to obtain a list of all inflected forms for all words that also 
have roots in some synonym file, because the stemmers are not reversible.
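
To make the problem concrete, here is a minimal sketch of what such a rewrite would need. Because stemming is many-to-one, the only way to "reverse" it is to run it over a known word list and group the words by stem. Everything here is an assumption for illustration: `toy_stem` is a hypothetical stand-in (not the Dutch stemmer in our analysis chain), the `vocab` list is made up, and the output uses Solr's explicit `a => b, c` synonym mapping syntax. Where a complete word list would come from is exactly the open question.

```python
from collections import defaultdict


def toy_stem(word):
    """Hypothetical stand-in stemmer: only strips a trailing 'en'."""
    return word[:-2] if word.endswith("en") and len(word) > 4 else word


def explicit_mappings(synonym_groups, vocabulary, stem=toy_stem):
    """Emit 'surface => surface, synonyms...' lines for known surface forms."""
    # Invert the stemmer over a known vocabulary. Forms that are missing
    # from the vocabulary can never be recovered this way.
    forms_by_stem = defaultdict(set)
    for word in vocabulary:
        forms_by_stem[stem(word)].add(word)

    lines = []
    for group in synonym_groups:
        for root in group:
            for surface in sorted(forms_by_stem.get(root, ())):
                if surface in group:
                    continue  # already one of the stemmed synonyms
                expansion = [surface] + list(group)
                lines.append(f"{surface} => {', '.join(expansion)}")
    return lines


groups = [["traject", "verbind"]]  # the synonym line from this thread
vocab = ["trajecten", "traject", "verbinden", "fiets"]  # made-up word list

for line in explicit_mappings(groups, vocab):
    print(line)
```

With this toy input, each inflected form in the word list gets a mapping that includes itself plus the stemmed synonyms. Any inflected form absent from the word list gets no mapping at all.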


Any other ideas?

Thanks,
Markus
 
-----Original message-----
> From:Steve Rowe <sar...@gmail.com>
> Sent: Thursday 21st December 2017 0:10
> To: solr-user@lucene.apache.org
> Subject: Re: Trouble with mm and SynonymQuery and KeywordRepeatFilter
> 
> Hi Markus,
> 
> My suggestion: rewrite your synonyms to include the triggering word in the 
> expanded synonyms list.  That way you won’t need 
> KeywordRepeat/RemoveDuplicates filters, and mm=100% will work as you expect.
> 
> I don’t think this situation is a bug, since mm applies to the built query, 
> not to the original query terms.
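
A minimal sketch of that point, assuming the documented semantics of conditional mm specs. It is a simplified illustration, not Solr's actual implementation: the function name `min_should_match` is hypothetical, and it does not handle spaces around `<`. The spec is the one from the original message quoted below. The parsed query there has two optional clauses (the unstemmed term and the Synonym clause), although only one term was typed.

```python
MM_SPEC = "2<-1 5<-2 6<90%"  # the default mm from the original message


def min_should_match(clause_count, spec):
    """Return how many optional clauses must match under an mm spec."""
    spec = spec.strip()
    if "<" in spec:
        # Conditional spec: each 'N<expr' applies only when there are
        # more than N optional clauses; the last applicable one wins.
        result = clause_count
        for part in spec.split():
            bound, sub_spec = part.split("<", 1)
            if clause_count <= int(bound):
                return result
            result = min_should_match(clause_count, sub_spec)
        return result
    if spec.endswith("%"):
        # Percentage, truncated toward zero.
        calc = int(clause_count * int(spec[:-1]) / 100)
    else:
        calc = int(spec)
    # A negative value means "all but this many".
    result = clause_count + calc if calc < 0 else calc
    return max(result, 0)


typed_terms = 1    # q=trajecten
built_clauses = 2  # title_nl:trajecten and Synonym(traject verbind)

print(min_should_match(typed_terms, MM_SPEC))
print(min_should_match(built_clauses, MM_SPEC))
```

Counted on the typed term, one match would be enough. Counted on the built query, both clauses are required, so a document that only satisfies the Synonym clause is rejected.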
> 
> --
> Steve
> www.lucidworks.com
> 
> > On Dec 20, 2017, at 5:02 PM, Markus Jelsma <markus.jel...@openindex.io> 
> > wrote:
> > 
> > Hello,
> > 
> > Yes of course, index-time synonyms lessen the query-time complexity and 
> > will solve the mm problem. They also skew IDF and take away the flexibility 
> > of adding synonyms on demand. The first we do not want, the second is 
> > impossible for us (very large main search index).
> > 
> > We are looking for a solution with mm that takes KeywordRepeat, stemming 
> > and synonym expansion into consideration. To me the current behaviour of mm 
> > in this case is a bug: I input one term, so mm should treat it as one term, 
> > regardless of the expanded query terms.
> > 
> > Any query-time ideas to share? I am not well versed in the actual code 
> > dealing with this specific subject; the code doesn't like me. I am fine if 
> > someone points me to the code that tells mm about the number of original 
> > input terms, and what to do. If someone does, please also explain why the 
> > change I want to make might be a bad one, and what to beware of or take 
> > into account.
> > 
> > Also, am I the only one who regards this behaviour as a bug or, more 
> > subtly, as weird, unexpected behaviour?
> > 
> > Many many thanks!
> > Markus
> > 
> > -----Original message-----
> >> From:Shawn Heisey <apa...@elyograg.org>
> >> Sent: Wednesday 20th December 2017 22:39
> >> To: solr-user@lucene.apache.org
> >> Subject: Re: Trouble with mm and SynonymQuery and KeywordRepeatFilter
> >> 
> >> On 12/19/2017 4:38 AM, Markus Jelsma wrote:
> >>> I have an interesting issue with mm and SynonymQuery and 
> >>> KeywordRepeatFilter. We do query-time synonym expansion and use 
> >>> KeywordRepeat so that we match more than just stemmed tokens. Our 
> >>> synonyms are already preprocessed and contain only stemmed tokens. The 
> >>> synonym file contains: traject,verbind
> >>> 
> >>> So, any non-root form whose stem appears in the synonym file is actually 
> >>> a search for three terms: +DisjunctionMaxQuery(((title_nl:trajecten 
> >>> Synonym(title_nl:traject title_nl:verbind))))
> >>> 
> >>> But our default mm requires that both terms match if the input query 
> >>> consists of two terms: 2<-1 5<-2 6<90%
> >>> 
> >>> So, a simple query looking for a plural (trajecten) will not match a 
> >>> document where the title contains only its singular form: q=trajecten 
> >>> will not match document with title_nl:"een traject"
> >> 
> >> I would think that doing synonym expansion at index time would remove
> >> any possible confusion about the number of terms at query time.  Queries
> >> that involve synonyms will be slightly less complex, but the index would
> >> be larger, so it's difficult to say whether those kinds of queries would
> >> be any faster or not.
> >> 
> >> There is one clear disadvantage to index-time synonym expansion: If you
> >> change your synonyms, you have to reindex.
> >> 
> >> Thanks,
> >> Shawn
> >> 
> >> 
> 
>
