Hello,

Apologies for bothering you all again, but I really need some help with this matter. How can we resolve this issue? Are we dealing with a bug here (in which case I'll open a ticket), or am I doing something wrong?
Is there anyone who has had the same issue or who understands the problem?

Many thanks,
Markus

-----Original message-----
> From: Markus Jelsma <markus.jel...@openindex.io>
> Sent: Tuesday 13th November 2018 9:52
> To: solr-user <solr-user@lucene.apache.org>
> Subject: KeywordRepeat, stemming, (single term) synonyms and minimum should match (edismax)
>
> Hello, apologies for this long-winded e-mail.
>
> Our fields have KeywordRepeat and language-specific filters such as a
> stemmer; the final filter at query time is SynonymGraph. We do not use
> RemoveDuplicatesFilter, for those of you wondering why when you see the
> parsed queries below; this is due to [1].
>
> We use a custom QParser extending edismax and also extend
> ExtendedSolrQueryParser, so we are able to override newFieldQuery in case
> we have to. The problem also applies directly to Solr's vanilla edismax.
> The file synonyms.txt contains the stemmed versions of the original terms.
>
> Consider this example synonym set [bier,brouw], where bier means beer and
> brouw is the stemmed version of brouwsel (brewage, concoction), and
> consider these parameters on /select:
>
> qf=content_nl&defType=edismax&mm=2<-1 5<-2 6<90%25
>
> The queries q=bier and q=brouw both parse to the following query and give
> the desired results (notice the missing RemoveDuplicates here):
>
> +(((Synonym(content_nl:bier content_nl:brouw) Synonym(content_nl:bier content_nl:brouw))~2))
>
> However, for q=brouwsel something (partially) unexpected happens:
>
> +(((content_nl:brouwsel Synonym(content_nl:bier content_nl:brouw))~2))
>
> This results in a BooleanQuery where, because mm evaluates to 2, both
> clauses need to match, giving very few matches. Removing KeywordRepeat or
> setting mm=1 of course fixes the problem, but that is not what we want.
>
> What is also unexpected, and may be related to the problem, is that when
> checking the analyzer output via the GUI, we see the position incrementing
> when KeywordRepeat and SynonymGraph are combined. When these filters are
> not combined, the positions are always 1, as expected. When combined we
> get this for 'brouw':
>
> term: bier  brouw  bier  brouw
> pos:  1     1      2     2
>
> or for 'brouwsel':
>
> term: brouwsel  bier  brouw
> pos:  1         2     2
>
> ExtendedSolrQueryParser, and everything underneath it, is a complicated
> piece of code. In the end it extends Lucene's QueryBuilder, but it does not
> always rely on its results, it seems. Edismax, for example, 'resets'
> minShouldMatch in SolrPluginUtils.setMinShouldMatch(). So this is a
> complicated web of code, I am a bit too deep in this unfamiliar area, and
> I am in need of help here.
>
> So, my question is: how to solve this problem? Or how to approach it? What
> is the actual problem? How can I get the same stable results for both
> queries? Does the odd position increment have anything to do with it (it
> seems Lucene's QueryBuilder does something with it)? What do I need to do?
>
> Many thanks,
> Markus
>
> PS: this is on Solr 7.2.1 and 7.5.0.
>
> [1] http://lucene.472066.n3.nabble.com/Multiple-languages-boosting-and-stemming-and-KeywordRepeat-td4389086.html
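
For readers following along, here is a minimal sketch of how the mm spec from the message above (2<-1 5<-2 6<90%, where %25 in the URL is just the encoded percent sign) maps a number of optional clauses to a required minimum. It assumes the conditional mm semantics described in the Solr reference guide. The names min_should_match and MM_SPEC are hypothetical, and this is an illustration in Python, not Solr's actual implementation.

```python
import re

# The mm value from the message, URL-decoded (assumption: %25 decodes to %).
MM_SPEC = "2<-1 5<-2 6<90%"


def min_should_match(optional_clauses: int, spec: str) -> int:
    """Return how many optional clauses must match under the given mm spec."""
    result = optional_clauses
    # Tolerate whitespace around '<' so "2 < -1" reads the same as "2<-1".
    spec = re.sub(r"\s*<\s*", "<", spec.strip())

    if "<" in spec:
        # Conditional spec: "N<X" means X applies only with more than N clauses.
        for part in spec.split():
            bound, sub_spec = part.split("<", 1)
            if optional_clauses <= int(bound):
                return result
            result = min_should_match(optional_clauses, sub_spec)
        return result

    if spec.endswith("%"):
        # Percentage, truncated toward zero; negative means "all but this share".
        calc = int(optional_clauses * int(spec[:-1]) / 100)
    else:
        # Plain integer; negative means "all but this many".
        calc = int(spec)

    result = optional_clauses + calc if calc < 0 else calc
    return max(result, 0)


if __name__ == "__main__":
    for n in (1, 2, 3, 6, 10):
        print(n, "optional clauses ->", min_should_match(n, MM_SPEC), "required")
```

Under these assumptions, two optional clauses yield a minimum of 2, which matches the ~2 in the parsed queries above: for q=brouwsel both the unstemmed term and the synonym clause have to match.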