Hi all!

> Before 7.4, to be handled by regex routines, UTF-8 are converted to
> ISO 10646. There was a limitation in regex routines in that they cannot
> handle multibyte characters > 2bytes. In other words, only 16bit UCS-2
> are supported. That's why ISO 10646 > 0x10000 is rejected.
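The limitation in the quote can be illustrated with a small decoder. This is a minimal sketch, not the actual code in wchar.c. The names `utf8_to_codepoints` and `fits_ucs2` are hypothetical. The sketch converts UTF-8 bytes to ISO 10646 code points and flags any code point above 0xFFFF, which is the ceiling of 16-bit UCS-2. Accented Latin letters such as ñ (U+00F1) and ó (U+00F3) are two-byte UTF-8 sequences well below that ceiling. A four-byte sequence such as U+1D11E (MUSICAL SYMBOL G CLEF) lands above it.

```python
# Hypothetical sketch, NOT PostgreSQL's wchar.c: decode UTF-8 by hand and
# flag code points that a 16-bit (UCS-2) regex engine could not represent.

def utf8_to_codepoints(data: bytes) -> list:
    """Decode UTF-8 bytes into ISO 10646 code points (overlongs not checked)."""
    cps = []
    i = 0
    while i < len(data):
        b = data[i]
        if b < 0x80:              # 1 byte:  0xxxxxxx
            cp, n = b, 1
        elif b >> 5 == 0b110:     # 2 bytes: 110xxxxx 10xxxxxx
            cp, n = b & 0x1F, 2
        elif b >> 4 == 0b1110:    # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
            cp, n = b & 0x0F, 3
        elif b >> 3 == 0b11110:   # 4 bytes: 11110xxx + three continuation bytes
            cp, n = b & 0x07, 4
        else:
            raise ValueError(f"invalid lead byte 0x{b:02x} at offset {i}")
        if i + n > len(data):
            raise ValueError("truncated UTF-8 sequence")
        # Each continuation byte (10xxxxxx) contributes 6 more bits.
        for cont in data[i + 1:i + n]:
            if cont >> 6 != 0b10:
                raise ValueError("invalid continuation byte")
            cp = (cp << 6) | (cont & 0x3F)
        cps.append(cp)
        i += n
    return cps


def fits_ucs2(cp: int) -> bool:
    """True if the code point fits in 16 bits (the UCS-2 range)."""
    return cp <= 0xFFFF


if __name__ == "__main__":
    for text in ("Ca\u00f1\u00f3n", "\U0001D11E"):
        cps = utf8_to_codepoints(text.encode("utf-8"))
        print([hex(c) for c in cps], all(fits_ucs2(c) for c in cps))
```

Every code point in "Cañón" passes `fits_ucs2`. The G clef does not, so only characters like that would trip a UCS-2-style check.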
Is this still an issue? The sanity check is still in wchar.c, but I can store and retrieve UTF-8 characters quite fine in 8.0, and Paco Avila's example ("Cañón") works fine:

  # insert into test values('Cañón');
  # select * from test where x ~ '^C.*';
     x
  -------
   Cañón
  # select * from test where x ~ 'ñó';
     x
  -------
   Cañón

Does anybody have an example that fails?

Tom, can this check in wchar.c finally be dropped?

Thanks and have a nice day,

Martin

-- 
Martin Pitt                 http://www.piware.de
Ubuntu Developer            http://www.ubuntu.com
Debian Developer            http://www.debian.org