Re: Unified highlighter- unable to get results - can get results with original and termvector highlighters

Warren, David [USA] Tue, 16 Jun 2020 09:50:16 -0700

David –

It’s fine to take this conversation back to the mailing list.  Thank you very 
much again for your suggestions.


I think you are correct.  It doesn’t appear necessary to set termOffsets, and 
it appears that the unified highlighter is using the TERM_VECTORS offset 
source if I don’t tell it to do otherwise.  When I run the query with 
hl.offsetSource=ANALYSIS, I get highlighted results returned.  When I run the 
query with hl.offsetSource=TERM_VECTORS or omit hl.offsetSource, I get the same 
result – no text returned in the highlighting section of the search result.
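
The three runs described above can be sketched as request parameter sets. This is a minimal Python sketch under stated assumptions, not the exact requests that were sent: the helper name highlight_params is hypothetical, and the q and hl.fl values are simplified placeholders based on the il_title field discussed in this thread.

```python
from urllib.parse import urlencode


def highlight_params(offset_source=None):
    """Build select parameters for a unified-highlighter request.

    offset_source is "ANALYSIS", "TERM_VECTORS", or None to omit
    hl.offsetSource and let Solr pick the offset source itself.
    """
    params = {
        "q": "il_title:zelda",   # assumption: simplified query
        "hl": "true",
        "hl.method": "unified",
        "hl.fl": "il_title",     # assumption: field to highlight
    }
    if offset_source is not None:
        params["hl.offsetSource"] = offset_source
    return params


# One query string per run being compared.
for source in ("ANALYSIS", "TERM_VECTORS", None):
    print(urlencode(highlight_params(source)))
```

Keeping everything but hl.offsetSource fixed makes it easier to see that the offset source is the only variable between the run that highlights and the runs that do not.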

Thanks as well for the suggestion about moving clauses to fq and using a 
simpler query in hl.q.  That helps.

-Dave Warren


From: David Smiley <david.w.smi...@gmail.com>
Date: Tuesday, June 16, 2020 at 12:21 AM
To: David Warren <warren_da...@bah.com>
Subject: Re: [External] Fwd: Unified highlighter- unable to get results - can 
get results with original and termvector highlighters

Hi Dave,

Thanks for providing more information.  Is it alright to take this conversation 
back to the list or is that query/debug info sensitive?

With default hl.weightMatches=true:
Try setting hl.q (instead of defaulting to q) and set it to be the 
highlightable portion of your query -- i.e. strip out all that boosting.  For 
example, (text:zelda OR il_title:zelda) AND collection:xml_products  Does that 
help?  I suggest this because I see QParser is Lucene.  If you used edismax 
with edismax's boosting params, then this QParser is able to tell the 
highlighter the primary part of the query without boosting.  For advanced cases 
like yours, that's perhaps not possible.  BTW the collection:xml_products part 
of the query looks to me like it would better belong as a filter query (fq 
param).
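
As a rough illustration of the two suggestions above, here is a minimal Python sketch that keeps the boosted query in q for scoring, passes a boost-free query in hl.q, and moves the collection clause into fq. The helper name params_with_hl_q and the hl.fl value are assumptions for illustration; the exact hl.q that works best for this index may differ.

```python
from urllib.parse import urlencode


def params_with_hl_q(scoring_q):
    """Return select parameters that separate scoring from highlighting.

    scoring_q is whatever (possibly boosted) query is used for ranking;
    it is passed through unchanged.
    """
    return {
        "q": scoring_q,
        # The collection restriction does not need to be scored or
        # highlighted, so it becomes a filter query.
        "fq": "collection:xml_products",
        "hl": "true",
        "hl.method": "unified",
        "hl.fl": "il_title",  # assumption: field to highlight
        # Only the highlightable terms, with all boosting stripped out.
        "hl.q": "text:zelda OR il_title:zelda",
    }


# Placeholder scoring query; a real request would pass the boosted one.
print(urlencode(params_with_hl_q("text:zelda OR il_title:zelda")))
```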

I don't believe it was _necessary_ to set termOffsets; that's merely a 
performance trade-off.  If you set hl.offsetSource=ANALYSIS (thus ignoring 
termOffsets), I believe you should get the same results.  Can you confirm it's 
the same or is it still different?  If different then I'll look closer; I have 
a theory on how it could be different.

p.s. I'm quite busy but will return to this at some point.

~ David


On Thu, Jun 11, 2020 at 4:01 PM Warren, David [USA] 
<warren_da...@bah.com> wrote:
David –

Thank you very much for the response to my Solr highlighting question.  Due to 
competing priorities, I wasn’t able to investigate further before today.  
But, now that I have…
Based on your advice, I got the unified highlighter to work by setting 
hl.weightMatches=false.  The field I was highlighting wasn’t configured to 
store termOffsets, so I had to set termOffsets=true and re-index to get this to 
work.  I still don’t get any results with the unified highlighter when 
hl.weightMatches=true.

You asked about running with debug=query.  The results are below.  Also, 
here’s the configuration for the il_title and text fields:
<field name="il_title" type="text_en" indexed="true" stored="true" 
multiValued="true" termVectors="true" termOffsets="true"/>
<field name="text" type="text_en" indexed="true" stored="false" 
multiValued="true"/>

debug
rawquerystring
"({!boost b=recip(ms(NOW/HOUR,il_pubdate),3.16e-11,1,1)}text:zelda AND 
collection: xml_products) OR {!boost b=2 v=\"il_title:zelda AND collection: 
xml_products\"}"
querystring
"({!boost b=recip(ms(NOW/HOUR,il_pubdate),3.16e-11,1,1)}text:zelda AND 
collection: xml_products) OR {!boost b=2 v=\"il_title:zelda AND collection: 
xml_products\"}"
parsedquery
"(+FunctionScoreQuery(FunctionScoreQuery(text:zelda, scored by 
boost(1.0/(3.16E-11*float(ms(const(1591902000000),date(il_pubdate)))+1.0)))) 
+collection:xml_products) FunctionScoreQuery(FunctionScoreQuery(+il_title:zelda 
+collection:xml_products, scored by boost(const(2))))"
parsedquery_toString
"(+FunctionScoreQuery(text:zelda, scored by 
boost(1.0/(3.16E-11*float(ms(const(1591902000000),date(il_pubdate)))+1.0))) 
+collection:xml_products) FunctionScoreQuery(+il_title:zelda 
+collection:xml_products, scored by boost(const(2)))"
QParser
"LuceneQParser"

-Dave Warren

From: David Smiley <david.w.smi...@gmail.com>
Date: Saturday, May 30, 2020 at 11:24 PM
To: David Warren <warren_da...@bah.com>
Subject: [External] Fwd: Unified highlighter- unable to get results - can get 
results with original and termvector highlighters


---------- Forwarded message ---------
From: David Smiley <david.w.smi...@gmail.com>
Date: Fri, May 22, 2020 at 11:43 AM
Subject: Re: Unified highlighter- unable to get results - can get results with 
original and termvector highlighters
To: solr-user <solr-user@lucene.apache.org>

Hello,

Did you get it to work eventually?

Try setting hl.weightMatches=false and see if that helps.  Whether this helps or 
not, I'd like to have a deeper understanding of the internal structure of the 
Query (not the original query string).  What query parser are you using?  If 
you pass debug=query to Solr then you'll get a parsed version of the query 
that would be helpful to me.
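
To make the debug=query request concrete, here is a minimal Python sketch that pulls the query parser name and the parsed query out of a JSON response that has already been decoded into a dict. The helper name summarize_debug and the sample response are hypothetical; the key names are the ones Solr prints in its debug section.

```python
def summarize_debug(response):
    """Extract the parts of the debug section that describe the query.

    response is the decoded JSON body of a select request that was
    sent with debug=query. Missing keys come back as None.
    """
    debug = response.get("debug", {})
    return {
        "QParser": debug.get("QParser"),
        "parsedquery": debug.get("parsedquery"),
    }


# Hypothetical sample response, trimmed to the debug section.
sample = {
    "debug": {
        "rawquerystring": "title_text:zelda",
        "querystring": "title_text:zelda",
        "parsedquery": "title_text:zelda",
        "QParser": "LuceneQParser",
    }
}

print(summarize_debug(sample))
```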

~ David


On Mon, May 11, 2020 at 10:46 AM Warren, David [USA] 
<warren_da...@bah.com> wrote:
I am running Solr 8.4 and am attempting to use its highlighting feature. It 
appears to work well when I use the original highlighter or the term vector 
highlighter, but when I try to use the unified highlighter, I get no results 
returned.  My Google searches so far have not revealed anybody having this same 
problem (perhaps user error on my part), which is why I’m asking the 
Solr mailing list.

I am running a query which searches the “title_text” field for a term and 
highlights it.
The configuration for title_text is this:
<field name="title_text" type="text_en" indexed="true" stored="true" 
multiValued="true" termVectors="true"/>

The query looks like this:
https://solr-server/index/c1/select?hl.fl=title_text&hl.method=unified&hl=true&q=title_text%3Azelda

If hl.method=original or hl.method=termvector, I get back results in the 
highlighting section with “Zelda” surrounded by <em> tags.
If hl.method=unified, all results in the highlighting section are blank.
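
One way to check for the blank results described above is to inspect the highlighting section of the decoded JSON response, which maps each document id to its highlighted fields. The following is a minimal Python sketch; the helper name docs_without_snippets and both sample responses are hypothetical and are not output captured from my server.

```python
def docs_without_snippets(response, field):
    """Return ids of documents whose highlighting entry has no snippets.

    Treats a missing field and an empty snippet list the same way,
    since both mean nothing was highlighted for that document.
    """
    highlighting = response.get("highlighting", {})
    return [
        doc_id
        for doc_id, fields in highlighting.items()
        if not fields.get(field)
    ]


# Hypothetical responses: one with a snippet, one that came back blank.
with_snippet = {"highlighting": {"doc1": {"title_text": ["<em>Zelda</em>"]}}}
blank = {"highlighting": {"doc1": {}, "doc2": {"title_text": []}}}

print(docs_without_snippets(with_snippet, "title_text"))
print(docs_without_snippets(blank, "title_text"))
```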

I’ve attached a remote debugger to my Solr server and verified that the unified 
highlighter class (org/apache/solr/highlight/UnifiedSolrHighlighter.java) is 
being invoked when I set hl.method=unified.  And I do not see any errors in the 
Solr logs.

Any idea what I’m doing wrong? In looking at the Solr highlighting 
documentation, I didn’t see any additional configuration which needs to be done 
to get the unified highlighter to work.

I realize I have not provided a bunch of information here, but obviously can 
provide more if needed.

Thank you,
David Warren
Booz | Allen | Hamilton
703-625-0311 mobile
