I also have a real-world document that doesn't work (from our Nutch crawls):
wget http://variogr.am/badfile.txt
./post.sh badfile.txt
A Solr rock star advised me to try SOLR-214, which fixes the
problem. Perhaps he'll enlighten us as to the reasons! But for now,
be careful with Resin.
I don't know about this "rock star" business!
Brian's setup worked running trunk from ~1 month ago... the major
character encoding change since then is to use the servlet
container's getReader() rather than constructing a Reader from the stream.
The javadocs are clear that the servlet container needs to handle
the conversion... but if that is causing problems in the newest
Resin and Tomcat, maybe Solr should take care of it.
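
For illustration only, here is a minimal sketch of what "taking care of it" could look like: building the Reader from the raw byte stream with an explicit charset instead of leaving the conversion to the container. This is not the SOLR-214 patch. The class and method names are hypothetical, the UTF-8 fallback is an assumption, and the declaredCharset argument stands in for whatever a servlet would obtain from request.getCharacterEncoding() (which may be null when the client sent no charset).

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ExplicitCharsetReader {

    // Hypothetical helper: wrap the raw request body in a Reader with an
    // explicitly chosen charset. Falling back to UTF-8 is an assumption here,
    // not something the servlet spec or Solr mandates.
    public static Reader openReader(InputStream body, String declaredCharset) {
        Charset cs = (declaredCharset == null || declaredCharset.isEmpty())
                ? StandardCharsets.UTF_8
                : Charset.forName(declaredCharset); // throws on unknown names
        return new InputStreamReader(body, cs);
    }

    // Drain a Reader into a String so the decoded text can be inspected.
    public static String readAll(Reader reader) {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[1024];
        try {
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // "café" encoded as UTF-8, with no charset declared by the client.
        byte[] utf8 = "caf\u00e9".getBytes(StandardCharsets.UTF_8);
        String decoded = readAll(openReader(new ByteArrayInputStream(utf8), null));
        System.out.println(decoded.equals("caf\u00e9")); // prints "true"
    }
}
```

In a servlet, the same helper would be called with request.getInputStream() and request.getCharacterEncoding() rather than request.getReader(), so the decoding no longer depends on how a particular container picks its default.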
From my experience with Resin, you definitely want to be as explicit
as you can about the character encoding.
-- Ken
--
Ken Krugler
Krugle, Inc.
+1 530-210-6378
"Find Code, Find Answers"