I finally managed to answer my own question. UTF-8 data in the body is fine, but each part needs charset=utf-8 in its Content-Type header to tell the receiver (Solr) that the text is not in the default ISO-8859-1.
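To see why the header matters, here is a minimal sketch of what happens to the same bytes under each assumption. It is only an illustration using the core Encode module, not anything Solr-specific; the variable names are made up for the example.

```perl
use strict;
use warnings;
use utf8;                          # the literal below is a character string
use Encode qw(encode decode);

my $title = 'accented séance ghosts';

# The bytes as they travel in the request body.
my $bytes = encode('UTF-8', $title);

# What a receiver gets if it assumes the default ISO-8859-1:
# each byte of the two-byte "é" becomes its own character.
my $wrong = decode('ISO-8859-1', $bytes);

# What it gets when the part declares charset=utf-8.
my $right = decode('UTF-8', $bytes);

binmode STDOUT, ':encoding(UTF-8)';
print "as ISO-8859-1: $wrong\n";
print "as UTF-8:      $right\n";
```

Read as ISO-8859-1, the title comes back one character longer, with "é" turned into "Ã©"; read as UTF-8, it round-trips unchanged.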
For example, a part carrying a non-ASCII value looks like this:

    Content-Disposition: form-data; name=literal.bptitle
    Content-Type: text/plain; charset=utf-8

    accented séance ghosts
    --W76L1XO3T9bSMjapwVc9MgXQDNwQ4DBKgevNArdl

References:

The default charset is ISO-8859-1:
http://tools.ietf.org/html/rfc2616#section-3.7.1

How to set the charset for multipart/form-data:
http://www.w3.org/TR/html4/interact/forms.html#h-17.13.4.2

And if anybody's curious, here's how you specify that in Perl and send a PDF to the /update/extract solr-cell handler:

    use utf8;                     # this source contains a literal non-ASCII string
    use Encode qw(encode);
    use LWP::UserAgent;
    use HTTP::Request::Common;    # provides $DYNAMIC_FILE_UPLOAD

    # $path and $extract_uri are assumed to be set elsewhere
    my $ua = LWP::UserAgent->new;

    my %form_fields = (
        title  => 'accented séance ghosts',
        author => 'smith',
    );

    my @content;
    while (my ($field, $value) = each %form_fields) {
        if ($value =~ /^[[:ascii:]]+$/) {
            # plain ASCII needs no special headers
            push @content, "literal.$field" => $value;
        } else {
            # non-ASCII: send as its own part with an explicit charset
            push @content, "literal.$field" => [
                undef,
                "literal.$field",
                "Content-Type"        => 'text/plain; charset=utf-8',
                "Content-Disposition" => "form-data; name=literal.$field",
                "Content"             => encode('utf-8-strict', $value),
            ];
        }
    }

    push @content, (
        myfile => [
            $path,
            undef,
            'Content-Type'              => 'application/pdf',
            'Content-Transfer-Encoding' => 'binary',
        ],
    );

    # read the file while sending instead of loading it into memory first
    local $HTTP::Request::Common::DYNAMIC_FILE_UPLOAD = 1;

    my $response = $ua->post(
        $extract_uri,
        Content_Type => 'form-data',
        Content      => \@content,
    );

--
View this message in context: http://lucene.472066.n3.nabble.com/form-data-post-to-ExtractingRequestHandler-with-utf-8-characters-not-handled-tp3461731p3474450.html
Sent from the Solr - User mailing list archive at Nabble.com.