I finally managed to answer my own question. UTF-8 data in the body is OK,
but you need to specify charset=utf-8 in the Content-Type header of each
part, to tell the receiver (Solr) that it's not the default ISO-8859-1:
Content-Disposition: form-data; name=literal.bptitle
Content-Type: text/p
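For anyone building the request by hand, here is a rough sketch of how the per-part charset could be set. It is only an illustration, not the client code I actually used: the helper name build_multipart, the boundary handling, the sample value "Café" and the text/plain media type are all assumptions. Only the field name literal.bptitle and the charset=utf-8 parameter come from the post above.

```python
import uuid


def build_multipart(fields, boundary=None):
    """Encode text fields as multipart/form-data, declaring UTF-8 on every part.

    Hypothetical helper for illustration; returns (content_type_header, body_bytes).
    """
    boundary = boundary or uuid.uuid4().hex
    delimiter = b"--" + boundary.encode("ascii")
    lines = []
    for name, value in fields.items():
        lines.append(delimiter)
        lines.append(
            f'Content-Disposition: form-data; name="{name}"'.encode("utf-8")
        )
        # Assumed media type; the important bit is the explicit charset,
        # so the receiver does not fall back to ISO-8859-1.
        lines.append(b"Content-Type: text/plain; charset=utf-8")
        lines.append(b"")  # blank line separates part headers from part body
        lines.append(value.encode("utf-8"))
    lines.append(delimiter + b"--")  # closing delimiter
    lines.append(b"")
    body = b"\r\n".join(lines)
    content_type = f"multipart/form-data; boundary={boundary}"
    return content_type, body


if __name__ == "__main__":
    header, body = build_multipart({"literal.bptitle": "Café"})
    print(header)
    print(body.decode("utf-8"))
```

The returned header value goes into the Content-Type of the POST itself, and the bytes are sent as the request body. A file part for the PDF would be added the same way, with its own Content-Type and the raw bytes left unencoded.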
I'm trying to post a PDF along with a whole bunch of metadata fields to the
ExtractingRequestHandler as multipart/form-data. It works fine except for
the UTF-8 character handling. Here is what my post looks like (abridged):
POST /solr/update/extract HTTP/1.1
TE: deflate,gzip;q=0.3
Conn