Re: regex match and special characters

2018-08-18 Thread Oleksii Kliukin

> On 16. Aug 2018, at 16:57, Tom Lane wrote:
> 
> Alex Kliukin writes:
>> Here is a simple SQL statement that gives different results on PostgreSQL 
>> 9.6 and PostgreSQL 10+. The space character at the end of the string is 
>> actually U+2006 SIX-PER-EM SPACE 
>> (http://www.fileformat.info/info/unicode/char/2006/index.htm)
> 
> I think the reason for the discrepancy is that in v10 we fixed the regex
> locale support so that it could properly classify code points above U+7FF,
> cf
> 
> https://git.postgresql.org/gitweb/?p=postgresql.git&a=commitdiff&h=c54159d44ceaba26ceda9fea1804f0de122a8f30
>  
> 

This nails down the cause, thanks a lot for the link! I apparently missed it 
in the PostgreSQL 10 release notes, where it is listed in the "Queries" 
section, although I think it deserved an entry in the "Migration to Version 10" 
section, since it could make a dump/restore from a previous version to version 
10 error out if there are table constraints that use regex character classes 
on Unicode text fields containing code points above U+7FF.
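
For anyone who wants to check an existing database before upgrading, here is 
a rough sketch of how one might list the candidate constraints. The filter 
pattern is only a heuristic I am assuming here (it looks for class escapes 
such as \s, \d, \w and for POSIX classes such as [:space:] in the constraint 
definition), so it can both over- and under-match:

```sql
-- Heuristic sketch, not an official upgrade check.
-- Lists CHECK constraints (contype = 'c') whose definition text appears to
-- contain a regex class escape (\s \S \d \D \w \W) or a POSIX class
-- such as [:space:]. Domain constraints show up with table_name '-'.
-- Assumes standard_conforming_strings = on for the pattern literal.
SELECT conrelid::regclass        AS table_name,
       conname                   AS constraint_name,
       pg_get_constraintdef(oid) AS definition
FROM pg_constraint
WHERE contype = 'c'
  AND pg_get_constraintdef(oid) ~ '\\[sSdDwW]|\[:[a-z]+:\]';
```

Any constraint it reports would still need to be checked by hand against the 
actual data, since only rows containing code points above U+7FF are affected.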

> 
> So 10 is giving the right answer (i.e. that \s matches U+2006).
> 9.x is not

Agreed.

Cheers,
Alex

Re: regex match and special characters

2018-08-18 Thread Oleksii Kliukin

Hi Adrian,

> On 16. Aug 2018, at 18:13, Adrian Klaver wrote:
> 
> test=# select 'abcd'||chr(8198) ~ 'abcd\s';
>  ?column?
> ----------
>  t
> (1 row)
> 
> 
> Wonder if the OP has standard_conforming_strings='off' and
> escape_string_warning='off'?
> 

Both are set to 'on' for me on all versions (I believe those are the default 
settings). I do indeed have 12devel on my test system alongside 9.6, but I 
also tried it on PostgreSQL 10 running on a different distro with different 
locale settings, and it produced the same result (the match being true).

I think Tom's answer solves it, although I wonder how you got true for the 
statement quoted above on PostgreSQL 9.6; perhaps that result is actually 
from PostgreSQL 10?
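
To rule that out, something along these lines would show the server version 
and the relevant settings next to the result. This is just a sketch: the 
column aliases are my own, and chr(8198) assumes a UTF8 database:

```sql
-- chr(8198) is U+2006 SIX-PER-EM SPACE.
-- Going by Tom's explanation, the match should be true on 10+ and false on 9.x.
SELECT version()                                      AS server_version,
       current_setting('standard_conforming_strings') AS scs,
       current_setting('escape_string_warning')       AS esw,
       ('abcd' || chr(8198)) ~ 'abcd\s'               AS matches_space_class;
```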

Cheers,
Oleksii