Hi,

I'm afraid I've found another slightly odd thing with Highlighting, in this case in a multi-valued field I'm using for author names.

The author names are typically Surname, initials (e.g. May, A.D.), and these are the kind of results I'm getting:

authors:Buxton

<?xml version="1.0" encoding="UTF-8"?>
<response>
<responseHeader><status>0</status><QTime>2</QTime></responseHeader>
<result numFound="2" start="0">
 <doc>
  <arr name="authors"><str>Duncan, W.I.</str><str>Buxton, N.W.K.</str></arr>
 </doc>
 <doc>
  <arr name="authors"><str>Buxton, M.W.N.</str><str>Pedley, H.M.</str></arr>
 </doc>
</result>
<lst name="highlighting">
 <lst name="geol/jgs/1995/00000152/00000002/15220251">
  <arr name="authors">
    <str>.&lt;em>Buxton&lt;/em>, N.W.K</str>
  </arr>
 </lst>
 <lst name="geol/jgs/1989/00000146/00000005/14650746">
  <arr name="authors">
    <str>&lt;em>Buxton&lt;/em>, M.W.N</str>
  </arr>
 </lst>
</lst>
</response>

So in the first case, where the second author name was matched, the final period has disappeared, and there's a stray period at the start. In the second case where the first author name was matched, the final period is also missing, but there's no extra period at the start.

This pattern is the same for other author searches, which suggests that the highlighter is picking up the last character of the previous value and returning it at the start of the snippet, while losing the last character of the matched value.
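To make that guess concrete, here is a small Python sketch of the hypothesis only. It is not Solr's highlighter code: the function name `shifted_fragments` and the one-character `shift` are made up for illustration, and it assumes the values of a multi-valued field are simply joined end to end before fragments are cut out.

```python
def shifted_fragments(values, shift=1):
    """Cut each value out of the joined text, with both offsets moved
    `shift` characters to the left (hypothetical model, not Solr code)."""
    text = "".join(values)  # assumption: values are joined with no separator
    fragments = []
    start = 0
    for value in values:
        end = start + len(value)
        lo = max(start - shift, 0)  # cannot move left of the first character
        hi = max(end - shift, 0)
        fragments.append(text[lo:hi])
        start = end
    return fragments


doc1 = ["Duncan, W.I.", "Buxton, N.W.K."]
doc2 = ["Buxton, M.W.N.", "Pedley, H.M."]

print(shifted_fragments(doc1)[1])  # .Buxton, N.W.K
print(shifted_fragments(doc2)[0])  # Buxton, M.W.N
```

With a shift of one character, the second value of the first document comes out with a leading period and no trailing one, and the first value of the second document just loses its trailing period, which is the text of the two author snippets above (without the em tags).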

However, some searches on keywords (also multi-valued) seem to suggest that it's not that simple:

keywords:rock (with maxSnippets=100)

<?xml version="1.0" encoding="UTF-8"?>
<response>
<responseHeader><status>0</status><QTime>2</QTime></responseHeader>

<result numFound="18" start="0">
 <doc>
<arr name="keywords"><str>fracture (rock)</str><str>porosity (rock)</str><str>permeability (rock)</str>
        <str>nuclear magnetic resonance</str></arr>
 </doc>
 <doc>
<arr name="keywords"><str>United Kingdom</str><str>Carboniferous</str><str>clastie rocks</str>
        <str>coal seams</str><str>sedimentary rocks</str></arr>
 </doc>
</result>
<lst name="highlighting">
 <lst name="geol/pg/2002/00000008/00000003/art00001">
  <arr name="keywords">
        <str>fracture (&lt;em>rock&lt;/em></str>
        <str>)porosity (&lt;em>rock&lt;/em></str>
        <str>)permeability (&lt;em>rock&lt;/em></str>
  </arr>
 </lst>
 <lst name="geol/jgs/1995/00000152/00000005/15250819">
  <arr name="keywords">
        <str>clastie &lt;em>rocks&lt;/em></str>
        <str>sedimentary &lt;em>rocks&lt;/em></str>
  </arr>
 </lst>
</lst>
</response>

The first document shows the same behaviour as the author searches. In the second document, where the values contain no punctuation, no characters are missing or moved (as far as I can tell, this holds whether the highlight is at the start, at the end, or in the middle of the value).

Any thoughts? Let me know if I should open a JIRA issue.

Thanks,

Andrew
