Hi list, I'm having difficulty getting the Solr highlighter to highlight only the terms that actually caused the match. Let me explain:
Given the query "john OR (peter AND mary)" and two documents:

  "john is awesome and so is peter"
  "peter is awesome and so is mary"

Solr will highlight "peter" and "mary" in the second document, which is expected. However, it will also highlight both "john" and "peter" in the first document, even though the "peter" clause can only match when "mary" is also present.

Is there any way to improve this? If I add debugQuery, the explain block readily shows that the first document matched because of john, giving it a score of 1, whereas the second matched because both peter and mary are present, giving it a score of 2. So the information is available somewhere, but the highlighter does not use it.

Below, I have included real-world Solr output to show what I mean.

Thanks,
Bjarke

-----------------------------------

{
  "responseHeader":{
    "status":0,
    "QTime":12,
    "params":{
      "hl.snippets":"2",
      "q":"plejehjem* OR (plejecentre* AND boliger*)",
      "defType":"lucene",
      "hl":"on",
      "fl":"doc_id,score",
      "fq":"doc_id:(0273-000545 OR 259531-2018)",
      "hl.method":"unified",
      "debugQuery":"on"}},
  "response":{"numFound":2,"start":0,"maxScore":3.0,"docs":[
      {
        "doc_id":"0273-000545",
        "score":3.0},
      {
        "doc_id":"259531-2018",
        "score":1.0}]
  },
  "highlighting":{
    "udbuddk-0273-000545":{
      "content_and_cpv_descriptions_da":["Beskrivelse\n-----------\n\nKonkurrenceudsættelsen omfatter drift af følgende 2 <em>plejecentre</em>: \n· Sandgårdsparken, Kjellerup, 40 <em>boliger</em> \n· Solgården, Sjørslev, 22 <em>boliger</em> \nBeslutningen om at udsætte driften af <em>plejecentre</em> for konkurrence er aftalt i den politiske budgetaftale for 2015, der blev indgået i august 2014 mellem alle byrådets partier undtagen Dansk Folkeparti og Enhedslisten. \n”Ældre- og Handicapudvalget igangsætter en proces for konkurrenceudsættelse af drift af ca. 72 <em>plejehjemspladser</em>. ",
        "85144100 Sygepleje på <em>plejehjem</em>"]},
    "TED-259531-2018":{
      "content_and_cpv_descriptions_da":["Morsø Kommune 41333014 Jernbanevej 7 Nykøbing M 7900 Birgitte Lund +45 99707017 birgitte.l...@morsoe.dk https://permalink.mercell.com/87422227.aspx http://www.morsoe.dk/ https://permalink.mercell.com/87422227.aspx Mercell Danmark A/S Østre Stationsvej 33, Vestfløjen Odense C 5000 support...@mercell.com https://permalink.mercell.com/87422227.aspx https://permalink.mercell.com/87422227.aspx Vikarydelser på ældreområdet 773-2018-5278 Udbuddet omfatter hjemmeplejen og <em>plejecentre</em> i Morsø Kommune. ",
        "85144100 Sygepleje på <em>plejehjem</em>"]}},
  "debug":{
    "rawquerystring":"plejehjem* OR (plejecentre* AND boliger*)",
    "querystring":"plejehjem* OR (plejecentre* AND boliger*)",
    "parsedquery":"content_and_cpv_descriptions_da:plejehjem* (+content_and_cpv_descriptions_da:plejecentre* +content_and_cpv_descriptions_da:boliger*)",
    "parsedquery_toString":"content_and_cpv_descriptions_da:plejehjem* (+content_and_cpv_descriptions_da:plejecentre* +content_and_cpv_descriptions_da:boliger*)",
    "explain":{
      "udbuddk-0273-000545":"\n3.0 = sum of:\n 1.0 = content_and_cpv_descriptions_da:plejehjem*\n 2.0 = sum of:\n 1.0 = content_and_cpv_descriptions_da:plejecentre*\n 1.0 = content_and_cpv_descriptions_da:boliger*\n",
      "TED-259531-2018":"\n1.0 = sum of:\n 1.0 = content_and_cpv_descriptions_da:plejehjem*\n"},
    "QParser":"LuceneQParser",
    "filter_queries":["doc_id:(0273-000545 OR 259531-2018)"],
    "parsed_filter_queries":["doc_id:0273-000545 doc_id:259531-2018"],
    "timing":{
      "time":12.0,
      "prepare":{
        "time":0.0,
        "query":{"time":0.0},
        "facet":{"time":0.0},
        "facet_module":{"time":0.0},
        "mlt":{"time":0.0},
        "highlight":{"time":0.0},
        "stats":{"time":0.0},
        "expand":{"time":0.0},
        "terms":{"time":0.0},
        "debug":{"time":0.0}},
      "process":{
        "time":11.0,
        "query":{"time":1.0},
        "facet":{"time":0.0},
        "facet_module":{"time":0.0},
        "mlt":{"time":0.0},
        "highlight":{"time":9.0},
        "stats":{"time":0.0},
        "expand":{"time":0.0},
        "terms":{"time":0.0},
        "debug":{"time":0.0}},
      "loadFieldValues":{"time":0.0}}}}
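
P.S. To pin down the behavior I am after, here is a minimal Python sketch using the john/peter/mary example. It is not Solr or Lucene code: the Term, And and Or classes and the contributing_terms function are hypothetical names I made up for illustration, and exact token matching stands in for real analysis and wildcard expansion. The idea is simply that a term should only be reported (and so highlighted) when every enclosing AND clause is satisfied for that document.

```python
from dataclasses import dataclass
from typing import List, Set, Union


@dataclass
class Term:
    text: str


@dataclass
class And:
    clauses: List["Query"]


@dataclass
class Or:
    clauses: List["Query"]


Query = Union[Term, And, Or]


def contributing_terms(query: Query, tokens: Set[str]) -> Set[str]:
    """Return the terms that actually caused the match.

    An empty set means the (sub)query did not match the document.
    """
    if isinstance(query, Term):
        return {query.text} if query.text in tokens else set()
    if isinstance(query, And):
        found: Set[str] = set()
        for clause in query.clauses:
            sub = contributing_terms(clause, tokens)
            if not sub:
                # One required clause failed, so nothing under this AND counts.
                return set()
            found |= sub
        return found
    if isinstance(query, Or):
        found = set()
        for clause in query.clauses:
            # Only clauses that matched on their own contribute terms.
            found |= contributing_terms(clause, tokens)
        return found
    raise TypeError(f"unsupported query node: {query!r}")


# The query "john OR (peter AND mary)" from the example above.
query = Or([Term("john"), And([Term("peter"), Term("mary")])])

doc1 = set("john is awesome and so is peter".split())
doc2 = set("peter is awesome and so is mary".split())

print(contributing_terms(query, doc1))  # only john should be highlighted
print(contributing_terms(query, doc2))  # peter and mary should be highlighted
```

This mirrors what the explain block reports per document: for the first document only the john clause contributes, so "peter" would be left unhighlighted there.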