Re: [Patch] Fix wide char locale support(like CJK charset) in regex

2013-10-18 Thread Tim Shen
On Fri, Oct 18, 2013 at 5:39 AM, Paolo Carlini wrote:
>> Oops, that's only a perfect Chinese version of "hello, world" under the
>> UTF-8 charset. Now it's hex format.
>>
>> I also made the char handling simpler.
>
> Great, thanks!

-m32 and -m64 tested and committed. By the way, UTF-8 *encoding* is a more

Re: [Patch] Fix wide char locale support(like CJK charset) in regex

2013-10-18 Thread Paolo Carlini
Hi,

> Oops, that's only a perfect Chinese version of "hello, world" under the
> UTF-8 charset. Now it's hex format.
>
> I also made the char handling simpler.

Great, thanks!
Paolo

Re: [Patch] Fix wide char locale support(like CJK charset) in regex

2013-10-17 Thread Tim Shen
On Thu, Oct 17, 2013 at 8:46 PM, Paolo Carlini wrote:
>> +  const wchar_t * s = L"你好, 世+界";
>> +  wregex re(s);
>> +  VERIFY(regex_match_debug(L"你好, 世世世界", re));

Oops, that's only a perfect Chinese version of "hello, world" under the UTF-8 charset. Now it's hex format. I also make t

Re: [Patch] Fix wide char locale support(like CJK charset) in regex

2013-10-17 Thread Paolo Carlini
Hi,

On 10/18/2013 01:17 AM, Tim Shen wrote:
> +  setlocale(LC_ALL, "zh_CN.UTF8");
> +  const wchar_t * s = L"你好, 世+界";
> +  wregex re(s);
> +  VERIFY(regex_match_debug(L"你好, 世世世界", re));

These strings make me quite nervous: I should check the details of the various character sets to

[Patch] Fix wide char locale support(like CJK charset) in regex

2013-10-17 Thread Tim Shen
The bug is caused by naively calling `map::count(__c)` where `__c` can be a wchar_t, so an implicit conversion (a truncation?) happens. -m32 and -m64 tested.

Thanks!

-- 
Tim Shen

Attachment: a.patch