I also have a real-world document that doesn't work (from our Nutch
crawls):
wget http://variogr.am/badfile.txt
./post.sh badfile.txt
A Solr rock star advised me to try SOLR-214, which fixes the problem.
Perhaps he'll enlighten us as to the reasons! But for now, be careful
with Resin.
I don't know about this "rock star" business!
Brian's setup worked running trunk from ~1 month ago... the major
character encoding change since then is to use the servlet container's
getReader() rather than constructing a reader from the stream.
The javadocs are clear that the servlet container needs to handle the
conversion... but if that is causing problems in the newest Resin and
Tomcat, maybe Solr should take care of it.
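
To illustrate why the choice of reader matters, here is a minimal sketch in plain Java. It is not Solr's code and does not use the servlet API: the class name RequestBodyDecoding and the helper readBody are hypothetical, and the ISO-8859-1 case stands in for a container whose getReader() falls back to that charset when the request declares none (an assumption about the failing setups, not something confirmed in this thread).

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of Solr or any servlet container.
class RequestBodyDecoding {

    // Reads the whole body through a Reader built from the raw stream
    // with an explicitly chosen charset.
    static String readBody(byte[] body, Charset charset) {
        try (Reader reader = new InputStreamReader(new ByteArrayInputStream(body), charset)) {
            StringBuilder sb = new StringBuilder();
            char[] buf = new char[1024];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
            return sb.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String original = "caf\u00e9";
        // The client posts UTF-8 bytes.
        byte[] posted = original.getBytes(StandardCharsets.UTF_8);

        // Reader constructed from the stream with UTF-8 chosen explicitly.
        String explicit = readBody(posted, StandardCharsets.UTF_8);

        // Stand-in for a reader that assumes ISO-8859-1 (assumed fallback).
        String fallback = readBody(posted, StandardCharsets.ISO_8859_1);

        System.out.println("explicit UTF-8 round-trips: " + explicit.equals(original));
        System.out.println("fallback round-trips:       " + fallback.equals(original));
    }
}
```

With the explicit charset the text survives intact; with the fallback, the two UTF-8 bytes of the accented character are decoded as two separate characters, which is the kind of corruption a document with non-ASCII content would show.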