When I use HttpClient and its PostMethod to post a query containing Chinese
characters, Solr either returns no records or returns everything.
            ... ...
            method = new PostMethod(solrReq);
            method.getParams().setContentCharset("UTF-8");
            method.setRequestHeader("Content-Type",
"application/x-www-form-urlencoded; charset=UTF-8");
            ... ...

I ran tcpdump and found that the query my application sends to Solr is a
URL-encoded query string (see the "q=xxx" part):

../....SPOST /solr/413/select HTTP/1.1
Content-Type: application/x-www-form-urlencoded; charset=UTF-8
Accept: */*
User-Agent: Jakarta Commons-HttpClient/3.1
Host: 172.20.73.142:8080
Content-Length: 192

q=type%3Amessage+AND+customer_id%3A413+AND+subject_zhs%3A%E8%83%BD%E5%8A%9B+&hl.fl=&qt=standard&wt=standard&rows=20
17:09:55.592527 IP xxx> yyy.webcache: tcp 0
... ...
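For reference, the "q=" value in the capture matches what standard
application/x-www-form-urlencoded encoding with UTF-8 produces. The sketch
below reproduces it using only the JDK's URLEncoder. The class and method
names (FormEncodeSketch, encodeValue) are hypothetical and are not part of
HttpClient or Solr; it only illustrates the encoding, not what HttpClient does
internally.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper for illustration; not part of HttpClient or Solr.
class FormEncodeSketch {

    // Encodes a single form value as application/x-www-form-urlencoded,
    // using UTF-8 for non-ASCII characters.
    static String encodeValue(String raw) {
        try {
            return URLEncoder.encode(raw, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required on every JVM, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // \u80FD\u529B are the two Chinese characters from the query above.
        String q = "type:message AND customer_id:413 AND subject_zhs:\u80FD\u529B ";
        // Prints the q parameter in the same form as the tcpdump capture.
        System.out.println("q=" + encodeValue(q));
    }
}
```

Here ':' becomes %3A, each space becomes '+', and each Chinese character
becomes three percent-encoded UTF-8 bytes.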

This URL encoding appears to be what causes the Solr query to fail. I
confirmed it by copying the URL-encoded query above into a file and posting it
with curl, which produced the same error. If I replace the query with the
decoded string, it works with Solr:

curl -v -H 'Content-type:application/x-www-form-urlencoded; charset=utf-8' \
  http://localhost:8080/solr/413/select --data @/tmp/chinese_query

When /tmp/chinese_query contains the following, it works with Solr:
q=type:message+AND+customer_id:413+AND+subject_zhs:能力+&hl.fl=&qt=standard&wt=standard&rows=20

But if I switch /tmp/chinese_query to the URL-encoded string, it fails again
with the same error:
q=type%3Amessage+AND+customer_id%3A413+AND+subject_zhs%3A%E8%83%BD%E5%8A%9B+&hl.fl=&qt=standard&wt=standard&rows=20
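One way to compare the two bodies outside of Solr is to run both through a
plain form decoder and see whether they yield the same parameters. The sketch
below does that with the JDK's URLDecoder. The names FormDecodeSketch, parse
and decode are hypothetical, and it assumes the raw body has already been read
as UTF-8 text. It says nothing about how Solr 3.5 or the servlet container
actually decodes the request; it only shows what a decoder that applies UTF-8
to both forms would see.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration; not part of HttpClient or Solr.
class FormDecodeSketch {

    // Splits a form body on '&' and '=' and decodes each name and value.
    static Map<String, String> parse(String body) {
        Map<String, String> params = new LinkedHashMap<String, String>();
        for (String pair : body.split("&")) {
            int eq = pair.indexOf('=');
            String name = eq < 0 ? pair : pair.substring(0, eq);
            String value = eq < 0 ? "" : pair.substring(eq + 1);
            params.put(decode(name), decode(value));
        }
        return params;
    }

    // Decodes '+' to a space and %XX sequences as UTF-8 bytes;
    // characters that are not encoded pass through unchanged.
    static String decode(String s) {
        try {
            return URLDecoder.decode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String tail = "&hl.fl=&qt=standard&wt=standard&rows=20";
        // The body that worked with curl (Chinese characters sent as-is).
        String rawBody =
            "q=type:message+AND+customer_id:413+AND+subject_zhs:\u80FD\u529B+" + tail;
        // The body that failed (percent-encoded).
        String encodedBody =
            "q=type%3Amessage+AND+customer_id%3A413+AND+subject_zhs%3A%E8%83%BD%E5%8A%9B+" + tail;

        // Prints "true": under UTF-8 form decoding both bodies are equivalent.
        System.out.println(parse(rawBody).equals(parse(encodedBody)));
    }
}
```

If the two bodies decode to the same parameters here but behave differently
against the server, the difference would have to come from how the receiving
side decodes the percent-encoded bytes.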

So, my conclusions:
1) Solr (I am using 3.5) only accepts a decoded query string; it fails with a
URL-encoded query.
2) HttpClient sends out a URL-encoded string no matter what (it seems to me
there is no way to make it send a POST request without URL-encoding the
body).
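As a cross-check that does not depend on Commons HttpClient, the curl
experiment can be repeated from Java with the JDK's HttpURLConnection, which
writes exactly the bytes it is given. This is only a sketch under assumptions:
the class name RawPostSketch and its methods are hypothetical, the URL and
body are supplied on the command line, and it is a diagnostic aid rather than
a replacement for the application code above.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

// Hypothetical diagnostic helper; not part of HttpClient or Solr.
class RawPostSketch {

    // Converts the body to UTF-8 bytes without any URL encoding.
    static byte[] utf8(String body) {
        return body.getBytes(StandardCharsets.UTF_8);
    }

    // Posts the body exactly as given and returns the HTTP status code.
    static int post(String url, String body) throws IOException {
        byte[] payload = utf8(body);
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type",
                "application/x-www-form-urlencoded; charset=UTF-8");
        // Sends a Content-Length header instead of chunked encoding.
        conn.setFixedLengthStreamingMode(payload.length);
        OutputStream out = conn.getOutputStream();
        try {
            out.write(payload);
        } finally {
            out.close();
        }
        try {
            return conn.getResponseCode();
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.err.println("usage: RawPostSketch <url> <body>");
            return;
        }
        System.out.println(post(args[0], args[1]));
    }
}
```

Running it once with the decoded body and once with the URL-encoded body
against the same select URL would show whether the behaviour seen with curl
also occurs from Java.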

Am I missing something, or do you have any suggestions about what I am doing
wrong?
Thanks,
Jie



--
View this message in context: 
http://lucene.472066.n3.nabble.com/POST-query-with-non-ASCII-to-solr-using-httpclient-wont-work-tp4032957.html
Sent from the Solr - User mailing list archive at Nabble.com.