Highlighting works with NEITHER HtmlEncoder NOR DefaultEncoder.
1. Special characters such as öäüß are simply returned as question marks.
This applies to ALL document types.
2. The index appears to be built in a way that concatenates words at random,
and they show up in the highlighting section in a form that does not
mirror the original text. This applies to GERMAN PDFs.
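For reference, the symptom in point 1 resembles what a lossy charset conversion produces. The short Python sketch below only illustrates that general effect, assuming some step in the pipeline converts text to a charset that lacks these characters. It is not taken from Solr, Tika, or PDFBox, and it is not confirmed that this is the cause here.

```python
# Illustration only: how a lossy charset conversion turns umlauts into "?".
# This is a hypothetical reproduction of the symptom, not Solr/Tika code.
text = "öäüß"

# Encoding to a charset without these characters, with replacement enabled,
# substitutes "?" for every character that cannot be represented.
lossy = text.encode("ascii", errors="replace").decode("ascii")
print(lossy)

# A UTF-8 round trip keeps the characters intact.
intact = text.encode("utf-8").decode("utf-8")
print(intact == text)
```

If the stored field values already contain question marks before highlighting runs, the loss happened earlier than the encoder, which would fit the behaviour being the same with both encoders.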
On 04/14/2011 05:51 PM, Yonik Seeley wrote:
On Thu, Apr 14, 2011 at 11:27 AM, Koji Sekiguchi <k...@r.email.ne.jp> wrote:
I'm not sure, but could it be due to HtmlEncoder?
<!-- Configure the standard encoder -->
<encoder name="html"
default="true"
class="solr.highlight.HtmlEncoder" />
It is set as the default in the example config.
Thanks Koji,
So it looks like the problems here are either in Tika (and PDFBox), or
the Tika-Solr integration.
-Yonik