Hi group,

I'm trying to highlight my complete(!) XML document, which is indexed for that purpose in a special field called "wkxmlsource". I configured the "wkxmlsource" field like this:

    <field indexed="true" multiValued="false" name="wkxmlsource" omitNorms="true" stored="true" termPositions="true" termOffsets="true" termVectors="true" type="text_xml"/>

The text_xml field type is almost identical to the text_en field type, but with

    <charFilter class="solr.HTMLStripCharFilterFactory" />

as the first element in the index analyzer. That prevents highlighting inside XML tags.

First I tried the simple highlighter, and that almost worked: I get my document back with my search terms and phrases highlighted, and each individual term gets its own highlight tags. The problem is that the complete value of "wkxmlsource" is not returned; the bottom part is cut off, no matter how large I set hl.fragsize.

So my next try was to use the FVH (hl.useFastVectorHighlighter=true) instead. That helped: it now returns the complete value of "wkxmlsource" with all my search terms/phrases highlighted. But in the case of a phrase search, it no longer highlights each individual term; it only puts highlight tags around the complete phrase. That could possibly lead to malformed XML.

An example: a search for the phrase "across the country Santa Fe" is highlighted like this in the document:

    <para align="left">...spread <em>across the country.</para><para align="left">Santa Fe</em> Pacific... </para>

How can I make the FVH highlight individual terms instead of the complete phrase? Ideally I would like to have something like:

    <para align="left">...spread <em>across</em> <em>the</em> <em>country</em>.</para><para align="left"><em>Santa</em> <em>Fe</em> Pacific... </para>

which is still valid XML.
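
The well-formedness concern can be reproduced outside Solr. Below is a minimal sketch in Python, assuming the highlighted field value is wrapped in a dummy root element before parsing; the helper name is_well_formed is made up for this illustration and is not part of Solr. It only checks the two example snippets from this mail and says nothing about how the FVH should be configured.

```python
import xml.etree.ElementTree as ET


def is_well_formed(fragment: str) -> bool:
    """Return True if the fragment parses as XML once wrapped in a dummy root.

    The <root> wrapper is an assumption of this sketch: the snippets contain
    several sibling <para> elements, which need a single enclosing element.
    """
    try:
        ET.fromstring("<root>" + fragment + "</root>")
        return True
    except ET.ParseError:
        return False


# Phrase-level highlighting as returned by the FVH: <em> opens in one
# <para> and closes in the next, so the tags are not properly nested.
phrase_level = (
    '<para align="left">...spread <em>across the country.</para>'
    '<para align="left">Santa Fe</em> Pacific... </para>'
)

# Desired term-level highlighting: every <em> opens and closes
# inside the same <para>.
term_level = (
    '<para align="left">...spread <em>across</em> <em>the</em> '
    '<em>country</em>.</para>'
    '<para align="left"><em>Santa</em> <em>Fe</em> Pacific... </para>'
)

print("phrase-level well-formed:", is_well_formed(phrase_level))
print("term-level well-formed:", is_well_formed(term_level))
```

The phrase-level snippet fails to parse because the closing </para> arrives while <em> is still open, whereas the term-level snippet parses cleanly.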

My boundary scanner is configured like this:

    <boundaryScanner name="breakIterator" class="solr.highlight.BreakIteratorBoundaryScanner">
      <lst name="defaults">
        <str name="hl.bs.type">WORD</str>
        <str name="hl.bs.language">en</str>
        <str name="hl.bs.country">US</str>
      </lst>
    </boundaryScanner>

Thanks,
Tom

-- 
Tom Burgmans
Search Specialist
www.wolterskluwer.com