Hello,

Yes, of course: index-time synonyms lessen the query-time complexity and would solve the mm problem. They also distort IDF and take away the flexibility of adding synonyms on demand. We do not want the former, and losing the latter is impossible for us (very large main search index).

We are looking for a solution with mm that takes KeywordRepeat, stemming and synonym expansion into consideration. To me, the way mm currently works in this case is a bug: I input one term, so mm should treat it as one term, regardless of how many terms the query expands to.

Any query-time ideas to share? I am not well versed in the code that deals with this specific subject; the code doesn't like me. I would be happy if someone could point me to the code that tells mm the number of original input terms, and tell me what to do. If someone does, please also explain why the change I want to make might be a bad one, and what to be aware of, beware of, or take into account.

Also, am I the only one who regards this behaviour as a bug or, more subtly, as weird and unexpected behaviour?

Many, many thanks!
Markus

-----Original message-----
> From:Shawn Heisey <apa...@elyograg.org>
> Sent: Wednesday 20th December 2017 22:39
> To: solr-user@lucene.apache.org
> Subject: Re: Trouble with mm and SynonymQuery and KeywordRepeatFilter
>
> On 12/19/2017 4:38 AM, Markus Jelsma wrote:
> > I have an interesting issue with mm and SynonymQuery and
> > KeywordRepeatFilter. We do query time synonym expansion and use
> > KeywordRepeat for not only finding stemmed tokens. Our synonyms are already
> > preprocessed and contain only stemmed tokens.
> > Synonym file contains:
> > traject,verbind
> >
> > So, any non-root stem that ends up in a synonym is actually a search for
> > three terms: +DisjunctionMaxQuery(((title_nl:trajecten
> > Synonym(title_nl:traject title_nl:verbind))))
> >
> > But, our default mm requires that two terms must match if the input query
> > consists of two terms: 2<-1 5<-2 6<90%
> >
> > So, a simple query looking for a plural (trajecten) will not match a
> > document where the title contains only its singular form: q=trajecten will
> > not match document with title_nl:"een traject"
>
> I would think that doing synonym expansion at index time would remove
> any possible confusion about the number of terms at query time. Queries
> that involve synonyms will be slightly less complex, but the index would
> be larger, so it's difficult to say whether those kinds of queries would
> be any faster or not.
>
> There is one clear disadvantage to index-time synonym expansion: If you
> change your synonyms, you have to reindex.
>
> Thanks,
> Shawn
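
To make the mm arithmetic in the quoted message concrete, here is a minimal Python sketch of how a spec such as 2<-1 5<-2 6<90% can be evaluated against a clause count. It is an illustration under assumptions, not Solr's actual code: the function name min_should_match is hypothetical, the conditional and percentage rules follow my reading of the documented mm syntax, the spec is assumed to have no spaces around "<", and the parsed query above is assumed to count as two optional clauses (title_nl:trajecten and the Synonym clause).

```python
def min_should_match(clause_count: int, spec: str) -> int:
    """Number of optional clauses that must match for a given mm spec."""
    spec = spec.strip()
    result = clause_count

    if "<" in spec:
        # Conditional form: space-separated "bound<expr" parts, ascending bounds.
        # The expr of the largest bound that clause_count exceeds is applied.
        for part in spec.split():
            bound, _, expr = part.partition("<")
            if clause_count <= int(bound):
                return result
            result = min_should_match(clause_count, expr)
        return result

    if spec.endswith("%"):
        percent = int(spec[:-1])
        # Truncate toward zero, so 90% of 7 clauses is 6, not 7.
        calc = abs(clause_count * percent) // 100
        calc = -calc if percent < 0 else calc
    else:
        calc = int(spec)

    # A negative value means "all but this many clauses".
    result = clause_count + calc if calc < 0 else calc
    return max(0, min(clause_count, result))


MM = "2<-1 5<-2 6<90%"
print(min_should_match(2, MM))  # expanded query counted as two clauses -> 2
print(min_should_match(1, MM))  # counted as the single input term -> 1
```

Under these assumptions, q=trajecten needs both clauses to match when the expansion is counted as two clauses, so a title containing only "traject" fails. Counted as the one term that was typed, a single matching clause would be enough.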