I think I may have found a bug in the FastVectorHighlighter (FVH), so I
have two questions:

1) Does anyone know how to make FVH return a highlighted snippet when the
query matches all of one string in a multivalued field?
2) If not, does anyone know how to make the DataImportHandler (DIH)
concatenate all the values in a multivalued field into one single field?

Imagine a document which looks like this:

<doc>
  <str name="department_name">Obstetrics and Gynaecology</str>
  <arr name="node_names">
    <str>Refer to specialist</str>
    <str>Identify adverse psycho social factors</str>
  </arr>
</doc>

If I search the document and ask for matches to be highlighted with the
original highlighter, I get 'node_names' in the highlighting results:

q=node_names:("Refer to specialist")&hl=true&hl.fl=*

But if I repeat the search using the FVH, 'node_names' does not appear in
the highlighting results:

q=node_names:("Refer to specialist")&hl=true&hl.fl=*&hl.useFastVectorHighlighter=true

A search for something less than the full string (e.g. "Refer to") works in
both cases.
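
To make the comparison easy to reproduce, here is a minimal Python sketch
that builds the two query strings. It assumes the parameters are meant to be
separated by '&', and the helper name build_params is made up for
illustration; the Solr parameter names are the ones used above.

```python
from urllib.parse import urlencode


def build_params(phrase, use_fvh):
    """Return a URL-encoded query string for a phrase search on node_names."""
    params = [
        ("q", 'node_names:("%s")' % phrase),
        ("hl", "true"),
        ("hl.fl", "*"),
    ]
    if use_fvh:
        # The only difference between the two requests.
        params.append(("hl.useFastVectorHighlighter", "true"))
    return urlencode(params)


original = build_params("Refer to specialist", use_fvh=False)
fvh = build_params("Refer to specialist", use_fvh=True)
print(original)
print(fvh)
```

The two strings are identical apart from the trailing
hl.useFastVectorHighlighter=true, so any difference in the highlighting
section of the response should come from the choice of highlighter.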

I have tried every combination of hl.requireFieldMatch and
hl.usePhraseHighlighter, with no effect.

node_names is defined as either:

<field name="node_names" type="text_en_splitting" indexed="true"
       stored="true" multiValued="true" termVectors="true"
       termPositions="true" termOffsets="true"/>


OR:

<field name="node_names" type="text_en" indexed="true"
       stored="true" multiValued="true" termVectors="true"
       termPositions="true" termOffsets="true"/>

And I have tried setting preserveOriginal="1" on the
WordDelimiterFilterFactory.

FVH seems to work fine with single-valued fields: the query
q=department_name:("Obstetrics and Gynaecology") is highlighted as expected.
Given that, I have tried, unsuccessfully, to use either a JavaScript or a
native Java transformer to merge the contents of node_names into a single
node_names_flat field during data import. This fails because child entities
have no access to their parent entity.

<entity name="pathway">
  <entity name="pages">
    <entity name="nodes">
      <!-- produces multiple node_names and there seems to be no way to
           push them up into 'pages' or 'pathway' -->
    </entity>
  </entity>
</entity>
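
To be clear about the end result I am after, here is the flattening step
sketched in plain Python rather than as a DIH transformer. The function name
flatten_values and the " | " separator are made up for illustration;
node_names_flat is the target field mentioned above.

```python
def flatten_values(doc, source="node_names", target="node_names_flat",
                   sep=" | "):
    """Return a copy of doc with the multivalued source field joined
    into one single-valued target field."""
    values = doc.get(source, [])
    flat = dict(doc)  # shallow copy, so the input document is not modified
    flat[target] = sep.join(values)
    return flat


doc = {
    "department_name": "Obstetrics and Gynaecology",
    "node_names": [
        "Refer to specialist",
        "Identify adverse psycho social factors",
    ],
}
flat_doc = flatten_values(doc)
print(flat_doc["node_names_flat"])
```

The difficulty is doing the equivalent inside DIH, where the values come from
the innermost entity.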

Duncan.
