Re: highlighting the boolean query

2015-02-25 Thread Dmitry Kan
Erick, Erik and Mike, Thanks for your help and ideas. It sounds like we'd need to do a bit of revamping in the highlighter. Perhaps even PostingsHighlighter should be taken as the baseline, since it is faster. It uses the same extractTerms() method that Erik has shown. The user story here is tha

Re: highlighting the boolean query

2015-02-24 Thread Michael Sokolov
There is also PostingsHighlighter -- I recommend it, if only for the performance improvement, which is substantial, but I'm not completely sure how it handles this issue. The one drawback I *am* aware of is that it is insensitive to positions (so words from phrases get highlighted even in isol
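
To make the position-insensitivity drawback concrete, here is a toy model in plain Java. It is not PostingsHighlighter's code and uses no Lucene classes; the class and method names (PhraseHighlightSketch, termOnly, positionAware) are made up for illustration. It contrasts marking every occurrence of any phrase word with marking only words that occur as the full phrase.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model only: hypothetical names, no Lucene dependency.
final class PhraseHighlightSketch {

    // Position-insensitive: mark every token that equals any word of the phrase.
    static List<Integer> termOnly(List<String> doc, List<String> phrase) {
        List<Integer> hits = new ArrayList<>();
        for (int i = 0; i < doc.size(); i++) {
            if (phrase.contains(doc.get(i))) {
                hits.add(i);
            }
        }
        return hits;
    }

    // Position-aware: mark tokens only where the whole phrase occurs in order.
    static List<Integer> positionAware(List<String> doc, List<String> phrase) {
        List<Integer> hits = new ArrayList<>();
        if (phrase.isEmpty()) {
            return hits;
        }
        for (int start = 0; start + phrase.size() <= doc.size(); start++) {
            if (doc.subList(start, start + phrase.size()).equals(phrase)) {
                for (int i = start; i < start + phrase.size(); i++) {
                    if (!hits.contains(i)) {
                        hits.add(i);
                    }
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> doc = Arrays.asList("new", "york", "is", "not", "york", "new");
        List<String> phrase = Arrays.asList("new", "york");
        System.out.println(termOnly(doc, phrase));      // [0, 1, 4, 5]
        System.out.println(positionAware(doc, phrase)); // [0, 1]
    }
}
```

In this model the isolated "york" and "new" at positions 4 and 5 are marked by the position-insensitive variant even though they do not form the phrase.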

Re: highlighting the boolean query

2015-02-24 Thread Erik Hatcher
BooleanQuery’s extractTerms looks like this:

  public void extractTerms(Set<Term> terms) {
    for (BooleanClause clause : clauses) {
      if (clause.isProhibited() == false) {
        clause.getQuery().extractTerms(terms);
      }
    }
  }

that’s generally the method called by the Highlighter for what terms should
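
A minimal sketch of why this leads to the behaviour reported in the thread, written with hypothetical stand-in classes rather than the real Lucene API (terms are plain strings, and Occur, TermQuery and BooleanQuery here are simplified models). Term extraction walks every non-prohibited clause without checking whether that clause actually matched the document, so for the parsed query a (+b +c) d and a document containing only c and d, both c and d end up as highlight candidates.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-ins for Lucene's query classes; not the real API.
final class ExtractTermsSketch {

    interface Query {
        void extractTerms(Set<String> terms);
    }

    static final class TermQuery implements Query {
        private final String term;
        TermQuery(String term) { this.term = term; }
        @Override public void extractTerms(Set<String> terms) { terms.add(term); }
    }

    enum Occur { MUST, SHOULD, MUST_NOT }

    static final class BooleanQuery implements Query {
        private final List<Query> queries = new ArrayList<>();
        private final List<Occur> occurs = new ArrayList<>();

        BooleanQuery add(Query q, Occur occur) {
            queries.add(q);
            occurs.add(occur);
            return this;
        }

        // Same shape as the method quoted above: skip prohibited clauses only.
        @Override public void extractTerms(Set<String> terms) {
            for (int i = 0; i < queries.size(); i++) {
                if (occurs.get(i) != Occur.MUST_NOT) {
                    queries.get(i).extractTerms(terms);
                }
            }
        }
    }

    // Models the parsed query from the thread: a (+b +c) d
    static Query sampleQuery() {
        BooleanQuery inner = new BooleanQuery()
            .add(new TermQuery("b"), Occur.MUST)
            .add(new TermQuery("c"), Occur.MUST);
        return new BooleanQuery()
            .add(new TermQuery("a"), Occur.SHOULD)
            .add(inner, Occur.SHOULD)
            .add(new TermQuery("d"), Occur.SHOULD);
    }

    // Document tokens that a purely term-based highlighter would mark.
    static Set<String> highlightedTerms(Query query, List<String> docTokens) {
        Set<String> queryTerms = new LinkedHashSet<>();
        query.extractTerms(queryTerms);
        Set<String> hits = new LinkedHashSet<>();
        for (String token : docTokens) {
            if (queryTerms.contains(token)) {
                hits.add(token);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> doc = Arrays.asList("x", "c", "y", "d");
        System.out.println(highlightedTerms(sampleQuery(), doc)); // [c, d]
    }
}
```

The inner clause (+b +c) cannot match this document because b is absent, yet c is still extracted and highlighted, since extraction never consults the match.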

Re: highlighting the boolean query

2015-02-24 Thread Erick Erickson
Hmmm, not quite sure what to say. Offsets and positions help, particularly with FastVectorHighlighter, but the highlighting is usually re-analyzed anyway so it _shouldn't_ matter. But what I don't know about highlighting could fill volumes ;).. Sorry I can't be more help here. Erick On Tue, Feb 2

Re: highlighting the boolean query

2015-02-24 Thread Dmitry Kan
Erick,

Our default operator is AND. Both queries below parse the same:

  a OR (b c) OR d
  a OR (b AND c) OR d

The parsed query:

  Contents:a (+Contents:b +Contents:c) Contents:d

So this part is consistent with our expectation.

>> I'm a bit puzzled by your statement that "c" didn't contribute to

Re: highlighting the boolean query

2015-02-23 Thread Erick Erickson
Highlighting is such a pain... what does the parsed query look like? If the default operator is OR, then this seems correct as both 'd' and 'c' appear in the doc. So I'm a bit puzzled by your statement that "c" didn't contribute to the score. If the parsed query is, indeed a +b +c d then it does

Re: highlighting the boolean query

2015-02-23 Thread Dmitry Kan
Erick, nope, we are using the standard Lucene query parser with some customizations that do not affect the boolean query parsing logic. Should we try some other highlighter?

On Mon, Feb 23, 2015 at 6:57 PM, Erick Erickson wrote:
> Are you using edismax?
>
> On Mon, Feb 23, 2015 at 3:28 AM, Dmitry Kan wrot

Re: highlighting the boolean query

2015-02-23 Thread Erick Erickson
Are you using edismax?

On Mon, Feb 23, 2015 at 3:28 AM, Dmitry Kan wrote:
> Hello!
>
> In solr 4.3.1 there seem to be some inconsistency with the highlighting of
> the boolean query:
>
> a OR (b c) OR d
>
> This returns a proper hit, which shows that only d was included into the
> document score

highlighting the boolean query

2015-02-23 Thread Dmitry Kan
Hello!

In Solr 4.3.1 there seems to be an inconsistency in the highlighting of the boolean query:

  a OR (b c) OR d

This returns a proper hit, which shows that only d was included in the document score calculation. But the highlighter returns both d and c in tags. Is this a known issue of