Hi Yonik,
I was incorrect to describe it as _solr encoding_. Hoss suggested that
it might be a form error; I haven't checked this yet, but it sounds
plausible. What I called the _solr url encoding_ was the q= parameter
translated into <I'm not sure what> encoding in the URL. As I mentioned in
my P.S., this translated value is not the same as when I use IE to post the
same form values.
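The _solr url encoding_ string percent-decodes to HTML numeric character references: `&#1050;` is the code point U+041A (К), `&#1072;` is U+0430 (а), and so on. One plausible explanation is that the browser substitutes such references when the form page's charset cannot represent the characters. The minimal sketch below assumes that behaviour and reproduces the string from the word itself. The class and method names are hypothetical, and it needs Java 10+ for the `Charset` overload of `URLEncoder.encode`.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class NcrFormEncoding {

    // Hypothetical helper: replace each non-ASCII code point with an HTML
    // numeric character reference such as "&#1050;".
    static String toNumericCharRefs(String s) {
        StringBuilder sb = new StringBuilder();
        int i = 0;
        while (i < s.length()) {
            int cp = s.codePointAt(i);
            if (cp < 128) {
                sb.appendCodePoint(cp);
            } else {
                sb.append("&#").append(cp).append(';');
            }
            i += Character.charCount(cp);
        }
        return sb.toString();
    }

    // References first, then ordinary percent-encoding of '&', '#' and ';'.
    static String encodeViaCharRefs(String s) {
        return URLEncoder.encode(toNumericCharRefs(s), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String kanada = "\u041A\u0430\u043D\u0430\u0434\u0430"; // Канада
        System.out.println(toNumericCharRefs(kanada));
        System.out.println(encodeViaCharRefs(kanada));
    }
}
```

The second printed line matches the %26%231050%3B... value from the sample query character for character. This fits the form-error theory rather than anything Solr does.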
You mentioned in an earlier post that q=h%c3%e9 would find
matching hits. In my experience, the UTF-8 encoded query doesn't
generate any exceptions, but no results are matched. However,
q=h%e9llo does find matching results (the same results I get in Luke).
So even assuming that I can fix the form encoding errors so that the characters
are encoded as UTF-8, I believe I would still get incorrect
results. Will Cyrillic characters be treated any differently than the
diacritic in your example?
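The following is a minimal Java sketch (Java 10+) of a common charset mismatch that would produce these symptoms. It assumes the servlet container decodes query strings as ISO-8859-1 (Latin-1) rather than UTF-8. It shows the two percent-encodings of "héllo", what happens when UTF-8 bytes are decoded as Latin-1, and whether é and Cyrillic letters have a Latin-1 form at all. The class and helper names are hypothetical.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class QueryCharsets {

    // Percent-encode a query value using the given charset.
    static String encode(String s, Charset cs) {
        return URLEncoder.encode(s, cs);
    }

    // Percent-decode a query value the way a container configured with cs would.
    static String decode(String s, Charset cs) {
        return URLDecoder.decode(s, cs);
    }

    // True if every character of s has an ISO-8859-1 (Latin-1) encoding.
    static boolean fitsLatin1(String s) {
        return StandardCharsets.ISO_8859_1.newEncoder().canEncode(s);
    }

    public static void main(String[] args) {
        String hello = "h\u00E9llo";                             // héllo
        String kanada = "\u041A\u0430\u043D\u0430\u0434\u0430"; // Канада

        System.out.println(encode(hello, StandardCharsets.ISO_8859_1)); // h%E9llo
        System.out.println(encode(hello, StandardCharsets.UTF_8));      // h%C3%A9llo

        // UTF-8 bytes decoded as Latin-1: C3 A9 become two characters,
        // so the term no longer equals what was indexed.
        String misread = decode("h%C3%A9llo", StandardCharsets.ISO_8859_1);
        System.out.println(misread.equals(hello)); // false
        System.out.println(misread.length());      // 6

        // é has a Latin-1 form; Cyrillic letters do not.
        System.out.println(fitsLatin1(hello));  // true
        System.out.println(fitsLatin1(kanada)); // false
    }
}
```

If that assumption holds here, the URIEncoding attribute on Tomcat's HTTP Connector in server.xml may be worth checking. Tomcat decodes URI bytes as ISO-8859-1 unless URIEncoding is set (for example to UTF-8), which would explain why h%e9llo matches and the UTF-8 form does not. The last two lines suggest that the Latin-1 workaround cannot work for Cyrillic at all, so Cyrillic queries would only match once UTF-8 is decoded correctly.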
I'm running Solr in Tomcat 5.5.17.
Thanks for all your help,
Tricia
On Tue, 18 Jul 2006, Yonik Seeley wrote:
On 7/18/06, Tricia Williams <[EMAIL PROTECTED]> wrote:
My sample query is: Канада (the English word _Canada_
translated into Russian) or
%D0%9A%D0%B0%D0%BD%D0%B0%D0%B4%D0%B0 (UTF-8) or
%26%231050%3B%26%231072%3B%26%231085%3B%26%231072%3B%26%231076%3B%26%231072%3B
(solr url encoding)
Hi Tricia,
Could you clarify what you mean by "solr url encoding"? Where do you see
this?
The servlet container decodes URLs, and I'm not sure where in Solr
URLs would be encoded.
-Yonik
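The minimal sketch below illustrates the decoding step described above. It shows what a container decoding the query string as UTF-8 would pass on as q for the two encoded forms of the sample query. It assumes UTF-8 decoding and Java 10+. `ContainerDecoding` and `decodeParam` are hypothetical names, not Solr or Tomcat APIs.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

public class ContainerDecoding {

    // Hypothetical helper: the q value a container decoding as UTF-8 would see.
    static String decodeParam(String raw) {
        return URLDecoder.decode(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String utf8Form = "%D0%9A%D0%B0%D0%BD%D0%B0%D0%B4%D0%B0";
        String refsForm = "%26%231050%3B%26%231072%3B%26%231085%3B"
                + "%26%231072%3B%26%231076%3B%26%231072%3B";

        // The UTF-8 form decodes to the six Cyrillic letters themselves.
        String kanada = "\u041A\u0430\u043D\u0430\u0434\u0430";
        System.out.println(decodeParam(utf8Form).equals(kanada)); // true

        // The other form decodes to ASCII reference text; URL decoding alone
        // never turns it back into Cyrillic.
        System.out.println(decodeParam(refsForm)); // &#1050;&#1072;&#1085;&#1072;&#1076;&#1072;
    }
}
```

Under these assumptions, only the UTF-8 form reaches Solr as the actual word. The reference form arrives as literal `&#...;` text, so the difference appears to originate before the request reaches Solr.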