How do I let the FVH highlight individual terms instead of the complete phrase?

Burgmans, Tom Fri, 21 Dec 2012 09:31:46 -0800

Hi group,

I'm trying to highlight my complete(!) XML document, which is indexed for that
purpose in a special field called "wkxmlsource". I configured the "wkxmlsource"
field as follows:


<field indexed="true" multiValued="false" name="wkxmlsource" omitNorms="true" 
stored="true" termPositions="true" termOffsets="true" termVectors="true" 
type="text_xml"/>

The text_xml field type is almost identical to the text_en type, but with
<charFilter class="solr.HTMLStripCharFilterFactory" /> as the first element of
the index analyzer. That prevents highlighting inside XML tags.
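A minimal sketch of what such a text_xml type could look like. Only the leading HTMLStripCharFilterFactory comes from the description above; the tokenizer and filters after it are an assumption, an abbreviated chain loosely modelled on a typical text_en, so the real schema may differ.

```xml
<!-- Sketch only: everything after the charFilter is an assumed, abbreviated text_en-like chain -->
<fieldType name="text_xml" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <!-- Strip XML markup before tokenizing; corrected offsets still point into the stored XML -->
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPossessiveFilterFactory"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <!-- No charFilter here: queries are plain text, not XML -->
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPossessiveFilterFactory"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
</fieldType>
```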

First I tried the simple highlighter, and that almost worked: I get my document
back with my search terms and phrases highlighted, and each individual term gets
its own highlight tags. The problem is that the complete value of field
"wkxmlsource" is not returned; the bottom part is cut off, no matter how large I
set hl.fragsize.
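For reference, the simple-highlighter attempt corresponds roughly to request defaults like the ones below. hl.fragsize=0 asks for the whole field value as a single snippet. A likely but unconfirmed cause of the cut-off is that the standard highlighter only analyzes the first hl.maxAnalyzedChars characters of a field value (51200 by default), which no hl.fragsize setting changes. The handler name /select and the raised limit are illustrative assumptions.

```xml
<!-- Sketch: standard (simple) highlighter defaults; handler name and limit are illustrative -->
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="hl">true</str>
    <str name="hl.fl">wkxmlsource</str>
    <!-- 0 = do not fragment: return the whole field value as one snippet -->
    <str name="hl.fragsize">0</str>
    <!-- Assumed cause of the truncation: only this many characters are analyzed (default 51200) -->
    <str name="hl.maxAnalyzedChars">10000000</str>
  </lst>
</requestHandler>
```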

So my next try was to use the FVH (hl.useFastVectorHighlighter=true) instead.
That helped: it now returns the complete value of "wkxmlsource" with all my
search terms/phrases highlighted. But... in case of a phrase search, it no longer
highlights each individual term; it only puts highlight tags around the complete
phrase. That can lead to malformed XML when the phrase crosses an element
boundary. An example:

Searching for the phrase "across the country Santa Fe" highlights the document
like this:

<para align="left">...spread <em>across the country.</para><para 
align="left">Santa Fe</em> Pacific... </para>

How can I let the FVH highlight individual terms instead of the complete
phrase? Ideally I would like to get something like:

<para align="left">...spread <em>across</em>  <em>the</em>  
<em>country</em>.</para><para align="left"><em>Santa</em>  <em>Fe</em> 
Pacific... </para>

which is still valid XML.
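For completeness, the FVH variant described earlier amounts to one extra parameter on the same kind of request. The FVH depends on the term vectors, positions and offsets that the wkxmlsource field definition already stores. A sketch, with the handler name /select assumed:

```xml
<!-- Sketch: FastVectorHighlighter defaults; handler name is an assumption -->
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="hl">true</str>
    <str name="hl.fl">wkxmlsource</str>
    <!-- Requires termVectors, termPositions and termOffsets on the highlighted field -->
    <str name="hl.useFastVectorHighlighter">true</str>
  </lst>
</requestHandler>
```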

My boundary scanner is configured as follows:

<boundaryScanner name="breakIterator" class="solr.highlight.BreakIteratorBoundaryScanner">
  <lst name="defaults">
    <str name="hl.bs.type">WORD</str>
    <str name="hl.bs.language">en</str>
    <str name="hl.bs.country">US</str>
  </lst>
</boundaryScanner>
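For context, a boundary scanner like this one normally sits inside the highlight search component in solrconfig.xml and is picked per request with hl.boundaryScanner=breakIterator, unless it is declared with default="true". A sketch of that placement follows; the surrounding component is assumed to match the stock example config. Note that a boundary scanner only decides where fragment edges fall; it does not control how the tags are placed around matched terms.

```xml
<!-- Sketch: where the breakIterator scanner lives; component layout assumed from the stock example config -->
<searchComponent class="solr.HighlightComponent" name="highlight">
  <highlighting>
    <!-- Select with hl.boundaryScanner=breakIterator, or add default="true" here -->
    <boundaryScanner name="breakIterator" class="solr.highlight.BreakIteratorBoundaryScanner">
      <lst name="defaults">
        <!-- WORD: fragment edges are moved to word boundaries for the en_US locale -->
        <str name="hl.bs.type">WORD</str>
        <str name="hl.bs.language">en</str>
        <str name="hl.bs.country">US</str>
      </lst>
    </boundaryScanner>
  </highlighting>
</searchComponent>
```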


Thanks, Tom
--
Tom Burgmans


Search Specialist


Tel:      +31 (0)17 246 66 33
Mobile: +31 (0)6 306 821 78

Platform Technologies
Global Platform Organization

Zuidpoolsingel 2
2408 ZE, Alphen aan den Rijn The Netherlands

tom.burgm...@wolterskluwer.com


www.wolterskluwer.com






