(11/12/21 22:28), Tanguy Moal wrote:
Dear all,
I'm trying to get highlighting working, and I'm almost done, but it's not
perfect yet...
Basically my documents have a title and a description.
I have two kinds of text fields :
text :
<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1"
            catenateWords="1" catenateNumbers="1" catenateAll="0"
            splitOnCaseChange="1"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1"
            catenateWords="0" catenateNumbers="0" catenateAll="0"
            splitOnCaseChange="1"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>
and text_french_light :
<fieldType name="text_french_light" class="solr.TextField"
           positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1"
            catenateWords="1" catenateNumbers="1" catenateAll="0"
            splitOnCaseChange="1"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.FrenchLightStemFilterFactory"/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1"
            catenateWords="0" catenateNumbers="0" catenateAll="0"
            splitOnCaseChange="1"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.FrenchLightStemFilterFactory"/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>
I then define my fields the following way :
<field name="title" type="text" indexed="true" stored="true"
       termVectors="true" termPositions="true" termOffsets="true"/>
<field name="title_stemmed" type="text_french_light" indexed="true" stored="true"
       termVectors="true" termPositions="true" termOffsets="true"/>
<field name="title_stemmed_nonorms" type="text_french_light" indexed="true"
       stored="false" omitNorms="true" omitTermFreqAndPositions="true"/>
<field name="description" type="text" indexed="true" stored="true"
       termVectors="true" termPositions="true" termOffsets="true"/>
<field name="description_stemmed" type="text_french_light" indexed="true"
       stored="true"
       termVectors="true" termPositions="true" termOffsets="true"/>
<field name="description_stemmed_nonorms" type="text_french_light" indexed="true"
       stored="false" omitNorms="true" omitTermFreqAndPositions="true"/>
I have the following copyField directives :
<copyField source="title" dest="title_stemmed" />
<copyField source="title" dest="title_stemmed_nonorms" />
<copyField source="description" dest="description_stemmed" />
<copyField source="description" dest="description_stemmed_nonorms" />
I rely on the dismax query handler to achieve relevancy.
I have two different search use cases :
- a "structured search" mode where my query looks like
  q="Term1 term2"&qf=my_category_field^1.0&hl.q=Word1 word2&mm=100%
- a "free-text search" mode where my query looks like
  q=Term1 term2&qf=title_stemmed_nonorms^1.0 description_stemmed_nonorms^0.5&mm=-40%
Shared query parameters are as follows :
  defType=dismax&hl=on&hl.fl=title_stemmed description_stemmed&hl.useFastVectorHighlighter=true&hl.fragListBuilder=single
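
For anyone who wants to replay these requests, here is a minimal Python sketch
that assembles the two parameter sets above into URL-encoded query strings.
The endpoint http://localhost:8983/solr/select is an assumption (the stock Solr
example setup), and the function names are illustrative only; the parameter
names and values are the ones listed above.

```python
from urllib.parse import urlencode

# Assumed endpoint (stock Solr example); not given in the original message.
SOLR_SELECT_URL = "http://localhost:8983/solr/select"

# Parameters shared by both search modes, as listed in the message.
SHARED_PARAMS = {
    "defType": "dismax",
    "hl": "on",
    "hl.fl": "title_stemmed description_stemmed",
    "hl.useFastVectorHighlighter": "true",
    "hl.fragListBuilder": "single",
}


def structured_search_params(phrase, highlight_terms):
    """Phrase query on the category field; hl.q carries the terms to highlight."""
    params = dict(SHARED_PARAMS)
    params.update({
        "q": f'"{phrase}"',          # quoted, so dismax treats it as a phrase
        "qf": "my_category_field^1.0",
        "hl.q": highlight_terms,
        "mm": "100%",
    })
    return params


def free_text_search_params(terms):
    """Non-phrase query over the stemmed, norm-less title and description fields."""
    params = dict(SHARED_PARAMS)
    params.update({
        "q": terms,
        "qf": "title_stemmed_nonorms^1.0 description_stemmed_nonorms^0.5",
        "mm": "-40%",
    })
    return params


def build_url(params):
    """Return the full request URL with all parameters percent-encoded."""
    return SOLR_SELECT_URL + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_url(structured_search_params("Term1 term2", "Word1 word2")))
    print(build_url(free_text_search_params("Term1 term2")))
```

Encoding matters here because the raw values contain spaces, quotes, "^" and
"%", all of which have to be escaped before they go on the wire.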
For all use cases, my relevancy parameters are good and my results are
satisfactory.
The trouble concerns highlighting :
- in the "free-text search" mode, everything is fine : the query is not a
  phrase query, and highlighted terms may differ from the query terms (if
  stemming came into play)
- in the "structured search" mode, I've had less luck : the query is a phrase
  query. Therefore, I rely on the hl.q parameter to get the highlighting I
  need. However, the query given in hl.q isn't processed the way it should be
  when highlighting from the fields : query analysis doesn't seem to be
  applied.
I can prove it easily: I enter the query term that isn't highlighted in the
analysis.jsp page, obtain its stemmed version, and use that in the hl.q
parameter. I then see my highlighted terms as expected.
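
The manual workaround above can be sketched in Python as follows. This is only
an illustration: the lookup table stands in for analyzed forms copied by hand
from analysis.jsp, its values are placeholders rather than real stemmer
output, and the helper names are hypothetical.

```python
from urllib.parse import urlencode

# Hypothetical lookup: raw term -> analyzed form copied by hand from
# analysis.jsp. The values are placeholders, not real stemmer output.
MANUAL_STEMS = {"word1": "stem1", "word2": "stem2"}


def pre_analyzed_hl_q(raw_hl_q, stems=MANUAL_STEMS):
    """Swap each whitespace-separated term for its hand-analyzed form.

    Lookup is case-insensitive; terms without an entry are left unchanged.
    """
    return " ".join(stems.get(term.lower(), term) for term in raw_hl_q.split())


def highlight_params(raw_hl_q):
    """Highlighting parameters with hl.q already in its analyzed form."""
    return {
        "hl": "on",
        "hl.fl": "title_stemmed description_stemmed",
        "hl.q": pre_analyzed_hl_q(raw_hl_q),
    }


if __name__ == "__main__":
    print(urlencode(highlight_params("Word1 word2")))
```

Sending the analyzed forms this way only sidesteps the problem on the client
side; it does not explain why the analysis is skipped for hl.q.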
I suspect a bug around the handling of the hl.q query when the fields to
highlight have a custom analysis (especially when stemming, word delimiters,
and so on are involved).
I tried playing with hl.usePhraseHighlighter=true and
hl.highlightMultiTerm=true but that didn't help at all =D
I tried using both the legacy highlighter and the FVH, but the same issue
occurs.
The issue only triggers when relying on hl.q.
Thank you very much for any help,
--
Tanguy
Tanguy,
Thank you for reporting this!
> The issue only triggers when relying on hl.q.
That is not good. Can you reproduce the problem on the Solr example
environment? If we can share the same environment (solrconfig.xml and
schema.xml), the request params to reproduce it, and data, I'd like to look
into it.
koji
--
http://www.rondhuit.com/en/