Hello Steve,

Well, that is an interesting approach to the topic indeed. But I do not think it is possible to obtain a list of all inflected forms for all words that also have roots in some synonym file, because the stemmers are not reversible.
Any other ideas?

Thanks,
Markus

-----Original message-----
> From:Steve Rowe <sar...@gmail.com>
> Sent: Thursday 21st December 2017 0:10
> To: solr-user@lucene.apache.org
> Subject: Re: Trouble with mm and SynonymQuery and KeywordRepeatFilter
>
> Hi Markus,
>
> My suggestion: rewrite your synonyms to include the triggering word in the
> expanded synonyms list. That way you won’t need
> KeywordRepeat/RemoveDuplicates filters, and mm=100% will work as you expect.
>
> I don’t think this situation is a bug, since mm applies to the built query,
> not to the original query terms.
>
> --
> Steve
> www.lucidworks.com
>
> > On Dec 20, 2017, at 5:02 PM, Markus Jelsma <markus.jel...@openindex.io>
> > wrote:
> >
> > Hello,
> >
> > Yes of course, index time synonyms lessens the query time complexity and
> > will solve the mm problem. It also screws IDF and the flexibility of adding
> > synonyms on demand. The first we do not want, the second is impossible for
> > us (very large main search index).
> >
> > We are looking for a solution with mm that takes KeywordRepeat, stemming
> > and synonym expansion into consideration. To me the current working of mm
> > in this case is a bug, i input one term so treat it as one term in mm,
> > regardless of expanded query terms.
> >
> > Any query time ideas to share? I am not well versed with the actual code
> > dealing with this specific subject, the code doesn't like me. I am fine if
> > someone points me to the code that tells mm about the number of original
> > input terms, and what to do. If someone does, please also explain why the
> > change i want to make is a bad one, what to be aware of or what to beware
> > of, or what to take into account.
> >
> > Also, am i the only one who regards this behaviour as a bug, or more
> > subtle, a weird unexpected behaviour?
> >
> > Many many thanks!
> > Markus
> >
> > -----Original message-----
> >> From:Shawn Heisey <apa...@elyograg.org>
> >> Sent: Wednesday 20th December 2017 22:39
> >> To: solr-user@lucene.apache.org
> >> Subject: Re: Trouble with mm and SynonymQuery and KeywordRepeatFilter
> >>
> >> On 12/19/2017 4:38 AM, Markus Jelsma wrote:
> >>> I have an interesting issue with mm and SynonymQuery and
> >>> KeywordRepeatFilter. We do query time synonym expansion and use
> >>> KeywordRepeat for not only finding stemmed tokens. Our synonyms are
> >>> already preprocessed and contain only stemmed tokens. Synonym file
> >>> contains: traject,verbind
> >>>
> >>> So, any non-root stem that ends up in a synonym is actually a search for
> >>> three terms: +DisjunctionMaxQuery(((title_nl:trajecten
> >>> Synonym(title_nl:traject title_nl:verbind))))
> >>>
> >>> But, our default mm requires that two terms must match if the input query
> >>> consists of two terms: 2<-1 5<-2 6<90%
> >>>
> >>> So, a simple query looking for a plural (trajecten) will not match a
> >>> document where the title contains only its singular form: q=trajecten
> >>> will not match document with title_nl:"een traject"
> >>
> >> I would think that doing synonym expansion at index time would remove
> >> any possible confusion about the number of terms at query time. Queries
> >> that involve synonyms will be slightly less complex, but the index would
> >> be larger, so it's difficult to say whether those kinds of queries would
> >> be any faster or not.
> >>
> >> There is one clear disadvantage to index-time synonym expansion: If you
> >> change your synonyms, you have to reindex.
> >>
> >> Thanks,
> >> Shawn
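
For readers following the thread, the mm arithmetic behind the original report can be sketched roughly as below. This is not Solr's code: min_should_match is a hypothetical helper that follows the documented reading of a conditional mm spec, and it uses integer arithmetic where Solr may round differently in edge cases. It also assumes that the built query for q=trajecten is counted as two optional clauses (the keyword token and the Synonym clause) and that the document is indexed as the terms "een" and "traject".

```python
def min_should_match(spec: str, clause_count: int) -> int:
    """Resolve an mm spec such as "2<-1 5<-2 6<90%" for a clause count.

    Hypothetical helper, an approximation of the documented mm semantics.
    """
    def simple(expr: str, n: int) -> int:
        if expr.endswith("%"):
            percent = int(expr[:-1])
            part = abs(n * percent) // 100  # percentage of n, rounded down
            value = n - part if percent < 0 else part
        else:
            number = int(expr)
            # A negative number means "all but this many clauses".
            value = n + number if number < 0 else number
        return max(value, 0)

    result = clause_count  # by default every clause is required
    for part in spec.split():
        if "<" not in part:
            return simple(part, clause_count)
        bound, expr = part.split("<", 1)
        if clause_count <= int(bound):
            return result
        result = simple(expr, clause_count)
    return result


MM = "2<-1 5<-2 6<90%"

# Assumed shape of the built query for q=trajecten:
#   title_nl:trajecten   Synonym(title_nl:traject title_nl:verbind)
clauses = [{"trajecten"}, {"traject", "verbind"}]
doc_terms = {"een", "traject"}  # assumed indexed terms for "een traject"

required = min_should_match(MM, len(clauses))
matched = sum(1 for alternatives in clauses if alternatives & doc_terms)
print(required, matched, matched >= required)  # prints: 2 1 False
```

Under these assumptions a single input word is held to the two-clause rule, so the document with only the singular form satisfies one clause out of the required two and is rejected, which is the behaviour described in the first message.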