Merlin

Ü encodes to two bytes in UTF-8 (%C3%9C) and to one byte in ISO-8859-1
(%DC), so it looks like there is a charset mismatch somewhere.
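A quick way to see the difference in PHP. This is a sketch using only core functions; the "\u{...}" escape assumes PHP 7 or later.

```php
<?php
// "Ü" (U+00DC) in the two encodings. "\u{...}" escapes need PHP 7+.
$utf8   = "\u{00DC}"; // UTF-8: two bytes
$latin1 = "\xDC";     // ISO-8859-1: one byte

echo bin2hex($utf8), "\n";        // c39c
echo bin2hex($latin1), "\n";      // dc
echo rawurlencode($utf8), "\n";   // %C3%9C
echo rawurlencode($latin1), "\n"; // %DC

// Decoding the spider's request yields the lone Latin-1 byte 0xDC,
// which is not valid UTF-8, so preg_match() with /u returns false.
$term = urldecode('%DCbersetzung');
var_dump(preg_match('//u', $term)); // bool(false)
```

So urldecode() itself works; the resulting byte is simply not UTF-8, which is presumably what Solr expects.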


Cheers

François



On Aug 27, 2011, at 6:34 AM, Merlin Morgenstern wrote:

> Hello,
> 
> I am having problems with searches issued by spiders that contain
> the non-ASCII character "Ü".
> 
> For example: "Übersetzung"
> 
> The Solr log shows the following request: /suche/%DCbersetzung,
> which has been translated into the Solr query q=?ersetzung
> 
> If a user enters the search term directly into the search box, the
> request is /suche/Übersetzung, which returns the expected results.
> 
> I am decoding the URL in PHP: $term = trim(urldecode($q));
> 
> Somehow urldecode() translates the character Ü (%DC) into a ?, which is
> an illegal first character in a Solr query.
> 
> I tried it without urldecode(), with rawurldecode(), and with
> utf8_decode(), but none of those helped.
> 
> Thank you for any help or hints on how to solve this problem.
> 
> Regards, Merlin
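One possible workaround on the PHP side is to convert the term to UTF-8 before sending it to Solr whenever it is not already valid UTF-8. This is a sketch under two assumptions: that any non-UTF-8 request was sent as ISO-8859-1 (as the %DC suggests), and that the mbstring extension is installed. The function name normalizeQueryTerm is hypothetical.

```php
<?php
// Hypothetical helper: ensure the decoded search term is UTF-8.
// Assumes non-UTF-8 input is ISO-8859-1; requires the mbstring extension.
function normalizeQueryTerm(string $q): string
{
    $term = trim(urldecode($q));
    if (!mb_check_encoding($term, 'UTF-8')) {
        // Reinterpret the single-byte Latin-1 string as UTF-8.
        $term = mb_convert_encoding($term, 'UTF-8', 'ISO-8859-1');
    }
    return $term;
}

echo normalizeQueryTerm('%DCbersetzung'), "\n";    // Übersetzung (Latin-1 request)
echo normalizeQueryTerm('%C3%9Cbersetzung'), "\n"; // Übersetzung (UTF-8 request)
```

utf8_encode(), the counterpart of the utf8_decode() tried above, performs the same Latin-1 to UTF-8 conversion but is deprecated as of PHP 8.2. The check is a heuristic: a Latin-1 string that happens to be valid UTF-8 would be left unchanged.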
