On Dec 17, 2007 11:04 AM, Jörg Kiegeland <[EMAIL PROTECTED]> wrote:
 > When you use POST, you can and should specify the charset.  If you are
 > doing this, it should work.
 >

 Where can I do this? Have you any example? I have a QueryRequest
 instance, a SolrQuery and a SolrServer instance
 and set the query by solrQuery.setQuery(query) where "query" is a String
 containing Japanese characters.

Ah, sorry, I hadn't realized you were using SolrJ.

It looks like SolrJ uses percent-encoded UTF-8 in the POST body for
parameters, just as it does in the URL.
Does anyone know if this double-encoding (percent encoding of UTF-8
bytes) is a standard for application/x-www-form-urlencoded?

I don't believe it is.

I had to code up some support for handling form data sent with a PUT request, and the logic that I copied from (I believe) the Resin web server was:

1. Make sure the contentType was either unspecified or application/x-www-form-urlencoded.

2. Read the body as a byte array, then use the request charset to convert to a String. If the request charset was unspecified, assume us-ascii. Typically the charset isn't specified, e.g. if you use the curl tool to POST data then no charset is sent with the Content-Type header.

3. Convert all key/value pairs using URLDecoder.decode(string, "UTF-8").
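
For illustration, the three steps above might look roughly like the following in Java. This is only a sketch under my own assumptions, not the Resin or Solr code: the class name FormBodyDecoder, the method name, and the choice to keep the last value when a key repeats are all hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.nio.charset.Charset;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper illustrating the three steps; not taken from Resin or Solr.
class FormBodyDecoder {

    static Map<String, String> decode(String contentType, byte[] body, String requestCharset) {
        // Step 1: accept only an unspecified or form-urlencoded content type.
        if (contentType != null && !contentType.toLowerCase(Locale.ROOT)
                .startsWith("application/x-www-form-urlencoded")) {
            throw new IllegalArgumentException("Unsupported content type: " + contentType);
        }

        // Step 2: bytes -> String using the request charset, defaulting to us-ascii.
        String charset = (requestCharset != null) ? requestCharset : "US-ASCII";
        String text = new String(body, Charset.forName(charset));

        // Step 3: split into key/value pairs and percent-decode each part as UTF-8.
        Map<String, String> params = new LinkedHashMap<>();
        for (String pair : text.split("&")) {
            if (pair.isEmpty()) {
                continue;
            }
            int eq = pair.indexOf('=');
            String key = (eq < 0) ? pair : pair.substring(0, eq);
            String value = (eq < 0) ? "" : pair.substring(eq + 1);
            // Assumption: a repeated key simply overwrites the earlier value.
            params.put(percentDecode(key), percentDecode(value));
        }
        return params;
    }

    private static String percentDecode(String s) {
        try {
            return URLDecoder.decode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required to be supported by every Java platform.
            throw new IllegalStateException(e);
        }
    }
}
```

Passing null for requestCharset takes the us-ascii default from step 2, which is enough for a body that is already percent-encoded.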

Is there any reason we shouldn't just use UTF-8 directly and declare
that in the Content-Type?

Since the key/value pairs should be URL-encoded, I believe it's standard to assume us-ascii as the charset for the Content-Type. But UTF-8 would work as well, as us-ascii can be viewed as a subset of UTF-8.

-- Ken


$ nc -l -p 8983
POST /solr/select HTTP/1.1
User-Agent: Solr[org.apache.solr.client.solrj.impl.CommonsHttpSolrServer] 1.0
Host: localhost:8983
Content-Length: 42
Content-Type: application/x-www-form-urlencoded

q=features%3Ah%C3%A9llo&wt=xml&version=2.2
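
A body like the one captured above can be reproduced with the standard java.net.URLEncoder. The sketch below is not SolrJ's own code, and the class and method names are hypothetical; it only shows that percent-encoding the UTF-8 bytes of each parameter yields a pure-ASCII body.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper; it only mimics the body shown in the capture.
class FormBodyEncoder {

    // Takes alternating key, value, key, value ... arguments.
    static String encode(String... keyValues) {
        if (keyValues.length % 2 != 0) {
            throw new IllegalArgumentException("Expected key/value pairs");
        }
        StringBuilder body = new StringBuilder();
        try {
            for (int i = 0; i < keyValues.length; i += 2) {
                if (body.length() > 0) {
                    body.append('&');
                }
                // Each part is converted to UTF-8 bytes, then percent-encoded.
                body.append(URLEncoder.encode(keyValues[i], "UTF-8"));
                body.append('=');
                body.append(URLEncoder.encode(keyValues[i + 1], "UTF-8"));
            }
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required to be supported by every Java platform.
            throw new IllegalStateException(e);
        }
        return body.toString();
    }

    public static void main(String[] args) {
        // \u00e9 is the accented e from the example query.
        String body = encode("q", "features:h\u00e9llo", "wt", "xml", "version", "2.2");
        System.out.println(body);          // q=features%3Ah%C3%A9llo&wt=xml&version=2.2
        System.out.println(body.length()); // 42, matching the Content-Length header
    }
}
```

Because the result contains only ASCII characters, its length in characters equals its length in bytes, which is why the Content-Length in the capture is 42.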


-Yonik


--
Ken Krugler
Krugle, Inc.
+1 530-210-6378
"If you can't find it, you can't fix it"
