Hi Lance,

Lance Norskog wrote:
> What platform are you using? Windows does not use UTF-8 by default,
> and this can cause subtle problems. If you can do the same thing on
> other platforms (Linux, Mac) that would help narrow down the problem.

My Solr server runs in a Tomcat server on an Ubuntu Linux machine.

-Sascha

>
> On Wed, Nov 18, 2009 at 8:15 AM, Sascha Szott <sz...@zib.de> wrote:
>> Hi Erik,
>>
>> Erik Hatcher wrote:
>>>
>>> Can you give me a test document that causes an issue? (maybe send me a
>>> Solr XML document in private e-mail). I'll see what I can do once I can
>>> see the issue first hand.
>>
>> Thank you! Just try the utf8-example.xml file in the exampledocs directory.
>> After having indexed the document, the output of the script test_utf8.sh
>> suggests to me that everything works correctly:
>>
>>   Solr server is up.
>>   HTTP GET is accepting UTF-8
>>   HTTP POST is accepting UTF-8
>>   HTTP POST does not default to UTF-8
>>   HTTP GET is accepting UTF-8 beyond the basic multilingual plane
>>   HTTP POST is accepting UTF-8 beyond the basic multilingual plane
>>   HTTP POST + URL params is accepting UTF-8 beyond the basic multilingual
>>
>> If I'm using the standard QueryResponseWriter and the query q=umlauts, the
>> responding XML page contains properly printed non-ASCII characters. The same
>> query against the VelocityResponseWriter returns a lot of Unicode
>> replacement characters (U+FFFD) instead.
>>
>> -Sascha
>>
>>>
>>> On Nov 18, 2009, at 2:48 PM, Sascha Szott wrote:
>>>
>>>> Hi,
>>>>
>>>> I've played around with Solr's VelocityResponseWriter (which is indeed a
>>>> very useful feature for rapid prototyping). I've realized that Velocity uses
>>>> ISO-8859-1 as default character encoding. I've changed this setting to UTF-8
>>>> in my velocity.properties file (inside the conf directory), i.e.,
>>>>
>>>>   input.encoding=UTF-8
>>>>   output.encoding=UTF-8
>>>>
>>>> and checked that the settings were successfully loaded.
>>>>
>>>> Within the main Velocity template, browse.vm, the character encoding is
>>>> set to UTF-8 as well, i.e.,
>>>>
>>>>   <meta http-equiv="content-type" content="text/html; charset=UTF-8"/>
>>>>
>>>> After starting Solr (which is deployed in a Tomcat 6 server on an Ubuntu
>>>> machine), I ran into some character encoding problems.
>>>>
>>>> Due to the change of input.encoding to UTF-8, no problems occur when
>>>> non-ASCII characters are present in the query string, e.g. German umlauts.
>>>> But unfortunately, something is wrong with the encoding of characters in the
>>>> HTML page that is generated by VelocityResponseWriter. The non-ASCII
>>>> characters aren't displayed properly (for example, FF prints a black diamond
>>>> with a white question mark). If I manually set the encoding to ISO-8859-1,
>>>> the non-ASCII characters are displayed correctly. Does anybody have a clue?
>>>>
>>>> Thanks in advance,
>>>> Sascha
>>>>
>>
>
> --
> Lance Norskog
> goks...@gmail.com
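
A minimal Python sketch of the symptom described in this thread, not a diagnosis of what Solr or Velocity actually does: it assumes the response body is written as ISO-8859-1 bytes while the page declares charset=UTF-8. Under that assumption a UTF-8 decoder shows U+FFFD for each umlaut, while reading the same bytes as ISO-8859-1 displays them correctly, which matches the reported behaviour. The sample string and all variable names are hypothetical.

```python
# Hypothetical sample text containing German umlauts.
text = "Müller über Köln"

# Assumption: the writer emits ISO-8859-1 bytes (one byte per umlaut).
latin1_bytes = text.encode("iso-8859-1")

# A client that trusts "charset=UTF-8" decodes those bytes as UTF-8.
# The umlaut bytes are not valid UTF-8, so each becomes U+FFFD.
seen_as_utf8 = latin1_bytes.decode("utf-8", errors="replace")

# A client manually switched to ISO-8859-1 recovers the original text.
seen_as_latin1 = latin1_bytes.decode("iso-8859-1")

print(seen_as_utf8)
print(seen_as_latin1)
```

If the bytes on the wire were really UTF-8, the first decode would round-trip cleanly, so comparing the raw response bytes against both encodings is one way to tell which side is mislabelled.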