: I have a mirror of the entire dmoz content in a solr index. International
: characters seem to be loaded and returned in queries just fine but queries
: that _contain_ international character queries return no results for known
: matching patterns.
:
: Is there a filter class I need to be using for international character
: support? Any other gotchas in supporting these characters in solr?

There are a couple of things that might be going on...

1) At the moment, Solr only really plays nicely with UTF-8, so if you are
dealing with another charset, that may be the origin of the issue.

2) The HTTP requests you are sending may not be encoding the characters
properly in the request. What does your query URL look like? Using the
example schema, and searching for "LATIN SMALL LETTER E WITH ACUTE", my URL
looks like this...

http://localhost:8983/solr/select/?q=%C3%A9&version=2.2&start=0&rows=10&indent=on

...and it correctly finds the doc with id UTF8TEST.

3) You may be using an Analyzer/TokenFilter that is stripping/replacing your
characters during analysis. Try using /solr/admin/analysis.jsp to see what
is getting indexed in each field when you put in your international
characters, and what tokens your query time analyzer produces for your
input.

-Hoss
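
P.S. As a rough illustration of point 2, here is a minimal Python sketch of how the query URL above can be built so that the character is percent-encoded as UTF-8 bytes. The helper name build_select_url is made up for this example, and the host, port, and parameters are simply the ones from the URL above; adjust them for your own setup. The last line shows what a Latin-1 encoding of the same character would look like, which is one way a client can end up sending a query that matches nothing.

```python
from urllib.parse import quote, urlencode

# Hypothetical helper for illustration; the base URL and parameters are
# copied from the example URL in this message, not from any Solr client API.
def build_select_url(query, base="http://localhost:8983/solr/select/"):
    params = {
        "q": query,
        "version": "2.2",
        "start": 0,
        "rows": 10,
        "indent": "on",
    }
    # urlencode percent-encodes each value; encoding="utf-8" makes the
    # charset explicit rather than relying on the default.
    return base + "?" + urlencode(params, encoding="utf-8")

# LATIN SMALL LETTER E WITH ACUTE
url = build_select_url("\u00e9")
print(url)

# For comparison: the same character encoded as Latin-1 is a single byte,
# which is not the UTF-8 sequence shown in the URL above.
print(quote("\u00e9", encoding="latin-1"))
```

If the q= value in your own request looks like the second form (%E9) rather than %C3%A9, the request encoding is the first thing I would check.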