This works with NEITHER HtmlEncoder NOR DefaultEncoder.

1. Special characters like öäüß are simply returned as question marks.
   This goes for ALL document types.

2. The index is built in a way that concatenates words at random, so
   the highlighting section does not mirror the original text.
   This goes for GERMAN PDFs.
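
A note on symptom 1: question marks of this kind are what Java produces
when text is encoded through a charset that cannot represent the
characters. Whether that is what actually happens in the Tika/Solr
pipeline here is only an assumption. The class below (EncodingDemo, a
hypothetical name) is a standalone sketch of the effect using only the
JDK; it is not Solr or Tika code.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical standalone demo; not part of Solr, Tika, or PDFBox.
public class EncodingDemo {

    // Encode the text to bytes with the given charset, then decode the
    // bytes again with the same charset.
    static String roundTrip(String text, Charset charset) {
        byte[] bytes = text.getBytes(charset);
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        // The characters öäüß, written as escapes so the source file's
        // own encoding cannot interfere with the demo.
        String umlauts = "\u00f6\u00e4\u00fc\u00df";

        // US-ASCII cannot represent these characters, so getBytes()
        // substitutes '?' for each of them.
        System.out.println(roundTrip(umlauts, StandardCharsets.US_ASCII).equals("????")); // true

        // UTF-8 can represent them, so the text survives unchanged.
        System.out.println(roundTrip(umlauts, StandardCharsets.UTF_8).equals(umlauts)); // true
    }
}
```

If the stored field value already contains literal question marks, the
loss happened before highlighting, which would be consistent with the
behaviour being the same for both encoders.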

On 04/14/2011 05:51 PM, Yonik Seeley wrote:
On Thu, Apr 14, 2011 at 11:27 AM, Koji Sekiguchi <k...@r.email.ne.jp> wrote:
I'm not sure, but is it due to HtmlEncoder?

      <!-- Configure the standard encoder -->
      <encoder name="html"
               default="true"
               class="solr.highlight.HtmlEncoder" />

It is set as the default in the example config.

Thanks Koji,

So it looks like the problems here are either in Tika (and PDFBox), or
the Tika-Solr integration.

-Yonik
http://www.lucenerevolution.org -- Lucene/Solr User Conference, May
25-26, San Francisco
