Hi all!

> Before 7.4, to be handled by the regex routines, UTF-8 is converted to
> ISO 10646. There was a limitation in the regex routines in that they cannot
> handle multibyte characters > 2 bytes. In other words, only 16-bit UCS-2
> is supported. That's why ISO 10646 > 0x10000 is rejected.

Is this still an issue? The sanity check is still in wchar.c, but I
can store and retrieve UTF-8 characters quite fine in 8.0, and Paco
Avila's example ("Cañón") works fine:

  # insert into test values('Cañón');

  # select * from test where x ~ '^C.*';
     x
   -------
    Cañón
 
  # select * from test where x ~ 'ñó';
     x
   -------
    Cañón

Does anybody have an example that fails?
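
One possible reason the example above passes is that it may never reach the
limit in question. The following is a minimal sketch in Python (not PostgreSQL
code, and the helper names are made up for illustration) that lists the code
point and UTF-8 length of each character. It assumes "Cañón" is stored with
precomposed characters. U+10400 is only an assumed candidate for a character
beyond 16 bits; whether it actually trips the check in wchar.c has not been
verified here.

```python
# Sketch only: inspects strings, does not talk to PostgreSQL.

def utf8_profile(text):
    """Return (char, code point, UTF-8 byte length) for each character."""
    return [(ch, ord(ch), len(ch.encode("utf-8"))) for ch in text]


def beyond_16bit(text):
    """Return the characters whose code point does not fit in 16 bits."""
    return [ch for ch in text if ord(ch) > 0xFFFF]


if __name__ == "__main__":
    # "Ca\u00f1\u00f3n" is "Cañón" written with escapes (precomposed form assumed).
    # "\U00010400" is an assumed test character above U+FFFF.
    for sample in ("Ca\u00f1\u00f3n", "\U00010400"):
        for ch, cp, nbytes in utf8_profile(sample):
            print(f"U+{cp:04X}: {nbytes} byte(s) in UTF-8")
        print("characters beyond 16 bits:", len(beyond_16bit(sample)))
```

Under these assumptions every character in "Cañón" is below U+0100 and needs
at most two bytes, so a failing example would probably have to contain a
character like the second sample.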

Tom, can this check in wchar.c finally be dropped?

Thanks and have a nice day,

Martin

-- 
Martin Pitt        http://www.piware.de
Ubuntu Developer   http://www.ubuntu.com
Debian Developer   http://www.debian.org
