Re: Special Character & Highlighting issues after 3.1.0 update

getagrip Sun, 17 Apr 2011 08:03:53 -0700

This works with NEITHER HtmlEncoder NOR DefaultEncoder.


1. Special characters like öäüß are simply returned as question marks.
   This goes for ALL document types.

2. Words appear to be concatenated at random when the index is built,
   so the text in the highlighting section does not mirror the
   original text. This goes for GERMAN PDFs.
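
For illustration only: symptom 1 looks like what happens when text is
pushed through a charset that cannot represent the characters. The
sketch below is an assumption about the kind of failure, not a
confirmed diagnosis; it uses plain Python and no Solr or Tika code, and
nothing in this thread shows where in the extraction/indexing chain
such a step would occur.

```python
# Hypothetical illustration; not taken from Solr, Tika, or PDFBox.
text = "öäüß"

# Encoding to a charset that cannot represent these characters, with
# replacement enabled, substitutes "?" for each one.
lossy = text.encode("ascii", errors="replace").decode("ascii")

# A round trip through UTF-8 keeps the text intact.
intact = text.encode("utf-8").decode("utf-8")

print(lossy)
print(intact)
```

If the stored field value already contains the question marks, the
loss happened before highlighting, and the choice of encoder would not
matter.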

On 04/14/2011 05:51 PM, Yonik Seeley wrote:

On Thu, Apr 14, 2011 at 11:27 AM, Koji Sekiguchi <k...@r.email.ne.jp> wrote:

I'm not sure, but could it be due to HtmlEncoder?

      <!-- Configure the standard encoder -->
      <encoder name="html"
               default="true"
               class="solr.highlight.HtmlEncoder" />

It is set as the default in the example config.


Thanks Koji,

So it looks like the problems here are either in Tika (and PDFBox), or
the Tika-Solr integration.

-Yonik
http://www.lucenerevolution.org -- Lucene/Solr User Conference, May
25-26, San Francisco
