Re: [regex] New enum type syntax_option_type

2013-08-20 Thread Tim Shen
On Mon, Aug 19, 2013 at 10:26 PM, Tim Shen  wrote:
> Great, let's test it now, though I don't think that it breaks something.

Committed.


-- 
Tim Shen


Re: [Patch] Fix empty grouping problem in regex

2013-08-21 Thread Tim Shen
Change vector::at() call to vector::operator[]().

Booted and tested with -m32, -m64 and -m64 debug-mode under x86_64
GNU/Linux and committed(r201914).


-- 
Tim Shen


Re: [Patch] Replace spaces with tabs in regex files

2013-08-22 Thread Tim Shen
On Thu, Aug 22, 2013 at 4:56 PM, Paolo Carlini  wrote:
> Ok, thanks!

Committed.


-- 
Tim Shen


Re: [Patch] Fix empty grouping problem in regex

2013-08-22 Thread Tim Shen
On Thu, Aug 22, 2013 at 9:08 AM, Tim Shen  wrote:
> Booted and tested with -m32, -m64 and -m64 debug-mode under x86_64
> GNU/Linux and committed(r201914).

...and here's the patch committed.


-- 
Tim Shen


emptygroup.patch
Description: Binary data


[Patch 2/2] Localization problem in regex

2013-08-23 Thread Tim Shen
Inspired by this mail: http://gcc.gnu.org/ml/libstdc++/2013-08/msg00131.html

Thanks!


-- 
Tim Shen


regex-locale.patch
Description: Binary data


Re: [Patch 2/2] Localization problem in regex

2013-08-23 Thread Tim Shen
On Fri, Aug 23, 2013 at 6:29 PM, Paolo Carlini  wrote:
> Thanks a lot indeed. But: "Fix callers" of *what*?!? And from *where*? To be
> really honest this kind of ChangeLog entry doesn't make much sense. Please
> get used to list the specific functions you are changing. It's important.

I see.

Actually I find a mistake by this review /w\, thanks!


-- 
Tim Shen


regex-locale.patch
Description: Binary data


Re: [Patch 1/2] Rewrite regex scanner

2013-08-23 Thread Tim Shen
On Fri, Aug 23, 2013 at 5:40 PM, Jonathan Wakely  wrote:
> I would rearrange the ChangeLog, to put
>
> * include/bits/regex_executor.tcc: Don't use reference for submatch.
>
> after regex_scanner.tcc, so that the files the _Scanner is moved from
> come immediately before the files it is moved to, otherwise it reads
> like "move _Scanner from ... don't use reference ... here" which
> doesn't make much sense :-)

Hmmm...it turns out there's no specified file order? // I always use
default order before.

So here it is.

By the way, it seems to take surprisingly short time to review; after
all, it's a huge patch.

-- 
Tim Shen


changelog
Description: Binary data


Re: [Patch 2/2] Localization problem in regex

2013-08-23 Thread Tim Shen
On Fri, Aug 23, 2013 at 10:26 PM, Paolo Carlini
 wrote:
> Great. But please but the names of the functions you are changing between
> round brackets. You have tons of examples everywhere.

...like this?


-- 
Tim Shen


changelog
Description: Binary data


Re: [Patch 2/2] Localization problem in regex

2013-08-23 Thread Tim Shen
On Fri, Aug 23, 2013 at 11:03 PM, Paolo Carlini
 wrote:
> The assign change is fine. But I would also put between round brackets
> (class _Compiler<>) for the _M_traits change and use something like
> (_DFSExecutor<>::_DFSExecutor): Adjust. for the last change (remember that
> constructors are special member *functions* thus should be normally listed).

I see.

On Fri, Aug 23, 2013 at 10:57 PM, Jonathan Wakely  wrote:
> Yes, that makes it clear what part of the file is affected. That's
> important when searching ChangeLogs to find what change last touched
> e.g. basic_regex::assign.

Why don't we `svn blame`?

By the way, do we seariously need to keep ChangeLog in files? Is it
more like a tradition than a solution? After all, we have SVN or Git
now. I remember a mail metioned this but cannot find it >.<


-- 
Tim Shen


changelog
Description: Binary data


Re: [Patch 2/2] Localization problem in regex

2013-08-23 Thread Tim Shen
On Sat, Aug 24, 2013 at 1:51 AM, Stefan Schweter  wrote:
> Am Fri, 23 Aug 2013 17:17:41 +0800
> schrieb Tim Shen :
>
>> Inspired by this mail:
>> http://gcc.gnu.org/ml/libstdc++/2013-08/msg00131.html
>>
>> Thanks!
>>
>>
>
> Hi Tim,
>
> I applied the patch + compiled everything - it's working now! Thanks!
>
> But... I found a problem with .imbue():
>
> int main()
> {
> std::setlocale(LC_ALL, "");
>
> std::wstring str2 = L"öäü";
> std::wregex re2;
> re2.imbue(std::locale(""));
> re2.assign(L"([[:lower:]]+)");
> std::wsmatch m2;
>
> std::wsregex_token_iterator end {};
>
> for (std::wsregex_token_iterator p{str2.begin(), str2.end(),
> re2, {1}}; p != end; ++p) { std::wcout << *p << std::endl;
> }
>
> return 0;
> }
>
> Works fine! But when there's no imbue() call, a Segmentation
> fault occurs.
>
> int main()
> {
> std::setlocale(LC_ALL, "");
>
> std::wstring str2 = L"öäü";
> std::wregex re2;
> //re2.imbue(std::locale(""));
> re2.assign(L"([[:lower:]]+)");
> std::wsmatch m2;
>
> std::wsregex_token_iterator end {};
>
> for (std::wsregex_token_iterator p{str2.begin(), str2.end(),
> re2, {1}}; p != end; ++p) { std::wcout << *p << std::endl;
> }
>
> return 0;
> }
>
> May be it's better to throw an exception here? Thanks in advance,
>
> Stefan

Fixed.

It's not fully tested. I just run the 28_regex testcases. I'll do a
full test before committing.

Thanks again!

-- 
Tim Shen


regex-locale.patch
Description: Binary data


Re: [Patch 2/2] Localization problem in regex

2013-08-29 Thread Tim Shen
> Fixed.
>
> It's not fully tested. I just run the 28_regex testcases. I'll do a
> full test before committing.

Committed.


-- 
Tim Shen


changelog
Description: Binary data


regex-locale.patch
Description: Binary data


[Patch] Rewrite regex matchers

2013-08-30 Thread Tim Shen
Rewrite matchers using structs instead of closures. It's more clear
and easy to maintain now.

Tested under -m32, -m64 and debug mode.

Thanks!


-- 
Tim Shen


matchers.patch
Description: Binary data


Re: [Patch] Rewrite regex matchers

2013-08-30 Thread Tim Shen
On Fri, Aug 30, 2013 at 9:26 AM, Paolo Carlini  wrote:
> Hi,
>
> On 08/30/2013 02:05 PM, Tim Shen wrote:
>>
>> +  const _TraitsT&_M_traits;
>> +  _FlagT _M_flags;
>> +  bool   _M_is_non_matching;
>> +  std::set<_CharT>   _M_char_set;
>> +  std::set> _M_range_set;
>> +  _CharClassT_M_class_set;
>
> another, very general comment: now that nothing is decided ABI-wise for
> these features, let's pay attention to the layouts, let's make sure that the
> data members are ordered in the best way to minimize the size. For example,
> when I see a bool sandwiched between big objects something seems at least
> weird... Let's make measurements and optimize for 64-bit but let's double
> check on 32-bit too.

I didn't use any tool to check that, but adjust it by hand. It
shouldn't break anything, but I'll test it again before committing.


On Fri, Aug 30, 2013 at 8:34 AM, Stephen M. Webb
 wrote:
> Other than the test case (breaks on mingw32 target) this is looking very 
> sweet.

Could you please show some details?


-- 
Tim Shen


matchers.patch
Description: Binary data


Re: [Patch] Rewrite regex matchers

2013-09-02 Thread Tim Shen
On Fri, Aug 30, 2013 at 10:04 PM, Tim Shen  wrote:
> I didn't use any tool to check that, but adjust it by hand. It
> shouldn't break anything, but I'll test it again before committing.

Tested under -m32, -m64 and debug mode and committed.


-- 
Tim Shen


[Patch] Rewrite _StateSeq in regex

2013-09-03 Thread Tim Shen
This patch fix the optional operator '?' and interval repeating like
"a{3,5}" through rewrite _StateSeq.

After this patch, the assertion supporting shall be the last
significant feature to be implemented.

It's booted and tested under x86_64. It'll be fully tested before committing.

Thanks!


-- 
Tim Shen


stateseq.patch
Description: Binary data


Re: [Patch] Rewrite _StateSeq in regex

2013-09-03 Thread Tim Shen
According to this
email(http://gcc.gnu.org/ml/libstdc++/2013-09/msg5.html), I add a
testcase.



-- 
Tim Shen


stateseq.patch
Description: Binary data


Re: [Patch] Rewrite _StateSeq in regex

2013-09-05 Thread Tim Shen
On Thu, Sep 5, 2013 at 10:07 AM, Paolo Carlini  wrote:
> On 09/04/2013 05:37 AM, Tim Shen wrote:
> Thanks. Apparently there aren't further comments, thus please install it.

Committed. Complete unmatched parentheses in last commit :)


-- 
Tim Shen


Re: [Patch] Support assertions and greedy/ungreedy matching in regex

2013-09-12 Thread Tim Shen
On Thu, Sep 12, 2013 at 11:37 AM, Paolo Carlini
 wrote:
> .. please fix the overlong lines, I spotted quite a few.

Quite a few? I see only one line by casting:
egrep '^\+.{80}' a.patch

Even it's not 81.

That'll be fixed before committing.

Seriously, I think we need an AST-based formatter? Clang seems to
write a good one.


-- 
Tim Shen


Re: [Patch] Support assertions and greedy/ungreedy matching in regex

2013-09-12 Thread Tim Shen
On Thu, Sep 12, 2013 at 9:57 AM, Paolo Carlini  wrote:
> Great. A quick-quick comment: if these are the last two features, why we
> can't un-xfail the testcase which we added latety? Also, a grep revealed a
> couple more xfails. Can you clarify?

I say `feature` when I think that, what these xfails reveal are too
small to be features, say, regex_search/regex_match flags. Now turns
out "feature" is not a good word for them. I do mean all C++ library
independent part, or pure regex function part, are done. Next days I
will add flags implementation.

> Also, much more generally, I would be curious about the remaining work: I
> think essentially it boils down to the vagaries for the other non-default
> regex dialects? Is it a lot of work?

This patch(http://gcc.gnu.org/ml/libstdc++/2013-08/msg00142.html)
already supports all dialects(ECMAScript, basic, extended, grep,
egrep, awk) specified by standard. Most of the differences between
them are eliminated by _Scanner, aka tokenizer, so it's actually not a
lot of work to even add one more syntax. But again, I think more
testcases are needed, especially from those experienced regex users.
I'm actually not a big fan of regex(but of NFA ;), or can I borrow
some boost/libc++ testcases without making any licence trouble?

Thanks!


-- 
Tim Shen


Re: [Patch] Implement regex_match and regex_search

2013-09-18 Thread Tim Shen
On Tue, Sep 17, 2013 at 4:35 AM, Paolo Carlini  wrote:
> Great. Tim, please complete the testing on -m32 etc, if everything goes
> well, just wait a day or so and commit.

Tested under -m32, -m64 and check-debug, and committed, with changing
the date of the new file to today.

Thank you!


-- 
Tim Shen


[Patch] match_results::format and regex_replace

2013-09-19 Thread Tim Shen
This patch complete the last two parts of the whole regex module, but
two problems left:

1) regex_traits<>::transform_primary [28.7.7]. I don't know how to
implement it correctly. Can anyone give some advice?

2) Digraph support. Is it need to be done, since the standard doesn't
specify it?

Tested under -m64. I'll do a full test before committing.

Thanks!


-- 
Tim Shen


a.patch
Description: Binary data


Re: [Patch] match_results::format and regex_replace

2013-09-23 Thread Tim Shen
On Sun, Sep 22, 2013 at 4:20 PM, Paolo Carlini  wrote:
> If testing goes well patch is Ok to commit.

Tested under -m32 and -m64 and committed :)

I'll learn how locale in glibc works.

Thank you all!


-- 
Tim Shen


[Patch] Regex Fixes on constants and no-backref in extended

2013-09-26 Thread Tim Shen
Just follow the standard :)

Booted and tested under -m32, -m64.

Thank you!


-- 
Tim Shen


a.patch
Description: Binary data


Re: [Patch] Regex Fixes on constants and no-backref in extended

2013-09-26 Thread Tim Shen
On Thu, Sep 26, 2013 at 5:37 PM, Jonathan Wakely  wrote:
> The ChangeLog entry says "stanard", with that fixed this is OK to
> commit, thanks!

Committed.


-- 
Tim Shen


[Patch] Let ordinary escaping in POSIX regex be valid

2013-09-26 Thread Tim Shen
POSIX ERE says that escaping an ordinary char, say R"\n" is not
permitted, because 'n' is not a special char. However, they also say
that : "Implementations are permitted to extend the language to allow
these. Conforming applications cannot use such constructs."

So let's support it not to make users surprised.

Booted and tested under -m32 and -m64

Thanks!


-- 
Tim Shen


a.patch
Description: Binary data


Re: [Patch] Let ordinary escaping in POSIX regex be valid

2013-09-27 Thread Tim Shen
On Fri, Sep 27, 2013 at 9:37 AM, Jonathan Wakely  wrote:
> Ah I see.  I definitely agree it's good to accept that instead of
> being unnecessarily strict, but other people will want the option of
> strict conformance, so I think we can please everyone with something
> like:
>
> else
>   {
> #ifdef __STRICT_ANSI__
> __throw_regex_error(regex_constants::error_escape);
> #else
>_M_token = _S_token_ord_char;
>_M_value.assign(1, __c);
> #endif
>   }

Sorry for late >.<

Do I need to bootstrap again?


-- 
Tim Shen


a.patch
Description: Binary data


Re: [Patch] Let ordinary escaping in POSIX regex be valid

2013-09-28 Thread Tim Shen
On Fri, Sep 27, 2013 at 4:30 PM, Paolo Carlini  wrote:
> Nah, only double check that the testcase you are un-xfail-ing uses 
> -std=gnu++11, otherwise will not pass ;)

Committed :)

Thanks!


-- 
Tim Shen


[Patch] Fix interval quantifier in lookahead subexpr in regex

2013-09-30 Thread Tim Shen
Forget to let _M_clone and _M_eliminate_dummy follow
_S_opcode_subexpr_lookahead, which also have _M_alt.

Is it OK not to bootstrap it ? Tested under -m64 and -m32.

Thanks!


-- 
Tim Shen


a.patch
Description: Binary data


Re: [Patch] Fix interval quantifier in lookahead subexpr in regex

2013-10-01 Thread Tim Shen
On Tue, Oct 1, 2013 at 1:32 AM, Paolo Carlini  wrote:
> Ok, thanks.

Committed.

Thanks!


-- 
Tim Shen


Re: Announcing ?

2013-10-01 Thread Tim Shen
On Mon, Sep 30, 2013 at 2:25 PM, Paolo Carlini  wrote:
> 3- Tweak the implementation status page
> libstdc++/doc/xml/manual/status_cxx2011.xml (possibly clarify bits still
> missing too)

Here is the patch for this, along with some interface fixes.

Thank you!


-- 
Tim Shen


a.patch
Description: Binary data


Re: Announcing ?

2013-10-01 Thread Tim Shen
On Tue, Oct 1, 2013 at 6:08 PM, Paolo Carlini  wrote:
> What do you mean by "not well defined"? Non confoming? Why?

I don't know if the word 'define' is appropriate, maybe I should just
use 'implement'.

It's indeed the problem that needs extra support from ,
described by [28.7.7]. More detailed,
http://www.open-std.org/Jtc1/sc22/wg21/docs/papers/2003/n1429.htm
offered what the function means.


-- 
Tim Shen


Re: Announcing ?

2013-10-01 Thread Tim Shen
On Tue, Oct 1, 2013 at 6:41 PM, Paolo Carlini  wrote:
> Ok, thus just say that transform_primary is unimplented. Please also add a 
> comment inline in the code explaining the issue, but don't word it in term of 
> , because  is just what it is per C++11, and doesn't directly 
> provide what we need in the public interface (if I understand the issue): 
> just say that implementing the function appears to require libc support + a 
> link to the message you sent a couple of weeks ago to the libstdc++-v3 
> mailing list.
>
> Ok with the above changes.

Committed :)

Thanks!


-- 
Tim Shen


a.patch
Description: Binary data


Re: Announcing ?

2013-10-01 Thread Tim Shen
On Tue, Oct 1, 2013 at 7:49 PM, Paolo Carlini  wrote:
> Thanks. But then the function actually *is* implemented, saying 'not
> implemented' is misleading. Please change the xml to something like 'isn't
> correctly implemented', or more informally, 'needs work', up to you, but
> please adjust it.

Committed.

I learn that an implementation could be incorrect ;)


-- 
Tim Shen


a.patch
Description: Binary data


[wwwdocs] Mention libstdc++-v3 in 4.9 changes.html

2013-10-01 Thread Tim Shen
Hi, libstdc++-v3  is ready for releasing.

Is it Ok to apply? By the way, do we need a News entry for this improvement?

Thanks!

Index: htdocs/gcc-4.9/changes.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-4.9/changes.html,v
retrieving revision 1.27
diff -r1.27 changes.html
136a137,140
>  href="http://gcc.gnu.org/onlinedocs/libstdc++/manual/status.html#status.iso.2011";>
>Improved experimental support for the new ISO C++ standard, C++11,
>including support for <regex>.
> 


-- 
Tim Shen


[Patch] Fix incorrect behavior of "[[=a=]]" in regex

2013-10-02 Thread Tim Shen
_BracketMatcher<>::_M_add_equivalence_class is misimplemented so I try
`git blame regex_compiler.h`...that's me!

Booted and tested under -m32, -m64.

Thanks ;)


-- 
Tim Shen


a.patch
Description: Binary data


Re: [wwwdocs] Mention libstdc++-v3 in 4.9 changes.html

2013-10-02 Thread Tim Shen
I feel little bit uncomfortable with "new ISO C++ standard, C++11",
since C++14 is already there, so I removed it.

Please check the words, since English is not my first language >.<

Thanks!


Index: htdocs/index.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/index.html,v
retrieving revision 1.891
diff -r1.891 index.html
55a56,59
> C++11 <regex> support
> [2013-10-02]
> Regular expression support in libstdc++-v3 is 
> now available.
>

Index: htdocs/gcc-4.9/changes.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-4.9/changes.html,v
retrieving revision 1.27
diff -r1.27 changes.html
136a137,139
>  href="http://gcc.gnu.org/onlinedocs/libstdc++/manual/status.html#status.iso.2011";>
>Improved support for C++11, including support for <regex>.
> 


-- 
Tim Shen


Re: [Patch] Fix incorrect behavior of "[[=a=]]" in regex

2013-10-02 Thread Tim Shen
Committed.

Thanks!

On Wed, Oct 2, 2013 at 11:10 AM, Jonathan Wakely  wrote:
> On 2 October 2013 15:26, Tim Shen wrote:
>> _BracketMatcher<>::_M_add_equivalence_class is misimplemented so I try
>> `git blame regex_compiler.h`...that's me!
>>
>> Booted and tested under -m32, -m64.
>
> This is OK to commit, thanks



-- 
Tim Shen


Re: [wwwdocs] Mention libstdc++-v3 in 4.9 changes.html

2013-10-03 Thread Tim Shen
Committed.

Thanks!

On Wed, Oct 2, 2013 at 11:12 AM, Jonathan Wakely  wrote:
> On 2 October 2013 15:52, Tim Shen wrote:
>> I feel little bit uncomfortable with "new ISO C++ standard, C++11",
>> since C++14 is already there, so I removed it.
>
> Good idea.
>
>> Please check the words, since English is not my first language >.<
>
> The english is fine, please wait a few hours in case anyone else has
> comments, but if you don't hear anything else you can go ahead and
> commit it.



-- 
Tim Shen


[Patch] Fix wrong backup variable initialization in regex

2013-10-03 Thread Tim Shen
Tested under -m32, -m64, however didn't do a  bootstrap. Is that OK?

Thanks!


-- 
Tim Shen


a.patch
Description: Binary data


Re: [Patch] Fix wrong backup variable initialization in regex

2013-10-03 Thread Tim Shen
On Thu, Oct 3, 2013 at 4:39 PM, Paolo Carlini  wrote:
> Seems trivial enough, Ok.

Committed.

> PS: I suppose isn't easy to prepare a testcase which would exercise those
> lines?

Telling the truth, this bug *is* revealed by rerunning testcases with
the non-default DFS matcher. Since the other matcher(BFS) is the
default, it can be fully tested every time we run `make check`; this
time I force(by editting  code) the dispatcher to use DFS, and find
this mistake.

To test both of them, rewriting every testcase seems not a good idea.
So I edit dispatching code by hand and test the non-defuault matcher
every 2-3 commits. Do we have a better solution?


Thanks!


-- 
Tim Shen


Re: [Patch] Fix wrong backup variable initialization in regex

2013-10-03 Thread Tim Shen
On Thu, Oct 3, 2013 at 5:40 PM, Paolo Carlini  wrote:
> Before figuring out a fully general solution, I would anyway add a testcase
> which manually switches the executor and tests for the problem. More elegant
> would be a testcase which due to its nature automatically leads to an
> executor switch. Not sure if it's possible in this specific case.

I may find a possible workaround: when set _GLIBCXX_DEBUG, run both
matchers and check if they agree. Then `make check-debug` will test
them all. I really want a macro _GLIBCXX_TESTSUITE which is set in
`make check`.

> But in principle yes, since we deliver two, and the users have a way to choose
> which one they want (X)

Users can "choose" the matcher only by adding/removing brs to/from the
regex input string. We can easily convert every no-br regex to a br
one, by adding an empty capture and make a br to it(which won't change
the regex's meaning) to "choose" the non-default matcher. There's no
explicit switch exported to users, because it's designed to be
completely transparent to users.

> Well, unless we want to keep secret - that is, undocumented - the switching 
> facility,
> but I'm thinking that we should not.

Yes I think we should keep secret, because the standard doesn't
specify it. They only way to publish the switch to user is making a
library extension(is that true?), but there's no obvious benefit to do
that(is that true? I shall be humble).


-- 
Tim Shen


[Patch] Fix COMPILE error in regex and remove default parameter in function definition

2013-10-05 Thread Tim Shen
Stupid errors hidden in some large commit.

I don't think they will break boostrap?

Thanks!


-- 
Tim Shen


a.patch
Description: Binary data


Re: [Patch] Internal functions for testsuite in regex

2013-10-06 Thread Tim Shen
On Sat, Oct 5, 2013 at 7:45 AM, Paolo Carlini  wrote:
> Patch looks great to me. If you don't get more comments over the next day or 
> so, please go ahead.

Committed :)


-- 
Tim Shen


Re: [Patch] Fix COMPILE error in regex and remove default parameter in function definition

2013-10-06 Thread Tim Shen
On Sun, Oct 6, 2013 at 5:38 AM, Paolo Carlini  wrote:
> Ok, thanks.

Committed :)


-- 
Tim Shen


[Patch] Optimize _BFSExecutor in regex

2013-10-06 Thread Tim Shen
Here's a simple piece of code
https://gist.github.com/innocentim/6849759 the reveals _BFSExecutor's
inefficiency. Some optimizations are here to reduce the unecessary
time complexity from O(n^2) to O(n).

I'll do a bootstrap and full testing before committing.

Thanks!


-- 
Tim Shen


a.patch
Description: Binary data


Re: [Patch] Optimize _BFSExecutor in regex

2013-10-07 Thread Tim Shen
Sorry I forgot to reply the mailing list. Here's the discussion:

--- Tim


On Oct 7, 2013 7:12 AM, "Jonathan Wakely"  wrote:
> Does _TodoList really need to derive from std::vector<_StateIdT> or
> could it just have a second member? I think there's no benefit to
> using inheritance here.

I use private inheritance here to convey "it's just a wrapper class of
vector". I don't see great difference between it and using a second
member?

> _TodoList::__exists should be named _M_exists to follow the library
> coding standards.

Sorry for that >.<

> Shouldn't _TodoList::_M_empty() also set __exists[n] = false for all n?

We can add an assertion here, but _M_empty shouldn't set anything.
This function has no side effect.

> I think it's guaranteed that __exists[__u] is always in range, right?

Of course. Otherwise vector's range checker will help, right?

------- Jonathan


On 7 October 2013 14:31, Tim Shen wrote:
> On Oct 7, 2013 7:12 AM, "Jonathan Wakely"  wrote:
>> Does _TodoList really need to derive from std::vector<_StateIdT> or
>> could it just have a second member? I think there's no benefit to
>> using inheritance here.
>
> I use private inheritance here to convey "it's just a wrapper class of
> vector". I don't see great difference between it and using a second member?

Aggregation is simpler,

If I see subtyping I wonder if some users of the type need to rely on
conversion to the base class, i.e. whether it is important that
_TodoList IS-A vector, but as it's private and has no friends, that
can't be true.

In general, prefer composition to inheritance:
http://www.artima.com/cppsource/codestandards3.html

>> Shouldn't _TodoList::_M_empty() also set __exists[n] = false for all n?
>
> We can add an assertion here, but _M_empty shouldn't set anything. This
> function has no side effect.

Sorry, I meant _M_clear() not _M_empty().

Currently you loop over the vector popping one element at a time until
it's empty, why not just use _BaseT::clear()?  Does anything even use
that member?  If it is needed, why is it correct to not reset the
flags in __exists to false?

>> I think it's guaranteed that __exists[__u] is always in range, right?
>
> Of course. Otherwise vector's range checker will help, right?

Only in Debug Mode, there is no range checking in operator[] in the
normal case.  But as long as the vector has the right size (from
_M_nfa.size()) and can only be accessed with valid indices it's fine.

--- Tim


On Mon, Oct 7, 2013 at 10:56 AM, Jonathan Wakely  wrote:
> Aggregation is simpler,

OK I'll change it to use a private member, since indeed I don't need
to use protected part of vector.

> Currently you loop over the vector popping one element at a time until
> it's empty, why not just use _BaseT::clear()?  Does anything even use
> that member?  If it is needed, why is it correct to not reset the
> flags in __exists to false?

I use a loop to clear rather than a _BaseT::clear() then reset the
__exists(_M_exists), simply because it's too
expensive(O(__exists.size())). With my implementation, I can make it
O(this->size()), which is usually far smaller than size of __exists. A
better way to do that is to traverse *this and only reset all trues,
then _BaseT::clear().

Ah we can remove this member function, because it's used only in early
versions of this patch.

--- Jonathan


On 7 October 2013 16:34, Tim Shen wrote:
> On Mon, Oct 7, 2013 at 10:56 AM, Jonathan Wakely  
> wrote:
>> Aggregation is simpler,
>
> OK I'll change it to use a private member, since indeed I don't need
> to use protected part of vector.
>
>> Currently you loop over the vector popping one element at a time until
>> it's empty, why not just use _BaseT::clear()?  Does anything even use
>> that member?  If it is needed, why is it correct to not reset the
>> flags in __exists to false?
>
> I use a loop to clear rather than a _BaseT::clear() then reset the
> __exists(_M_exists),

Where do you reset __exists? That was my point, I don't see that being
reset, but it should have been.

> simply because it's too
> expensive(O(__exists.size())). With my implementation, I can make it
> O(this->size()), which is usually far smaller than size of __exists. A
> better way to do that is to traverse *this and only reset all trues,
> then _BaseT::clear().

Traversing a vector is not

Re: [Patch] Optimize _BFSExecutor in regex

2013-10-07 Thread Tim Shen
On Mon, Oct 7, 2013 at 12:22 PM, Jonathan Wakely  wrote:
> because that turns into the equivalent of a std::memset() on integers.

Here I catch your idea. But think about the following example:
_M_exists.size() == 1000, but only 3 of the elements are true. Now
what I intend to do is assigning these 3 elements to false, rather
than reseting the whole vector. Is this called "pay for what you get"?

The ideal solution could be:

void _M_clear() {
if (__exists.size() < _BaseT::size() * CPU_WORD_LENGTH * some_factor)
clear_one_by_one();
else
reset_them_all();
}


-- 
Tim Shen


Re: [Patch] Optimize _BFSExecutor in regex

2013-10-07 Thread Tim Shen
On Mon, Oct 7, 2013 at 7:12 AM, Jonathan Wakely  wrote:
> On 6 October 2013 23:43, Tim Shen wrote:
>> Here's a simple piece of code
>> https://gist.github.com/innocentim/6849759 the reveals _BFSExecutor's
>> inefficiency. Some optimizations are here to reduce the unecessary
>> time complexity from O(n^2) to O(n).
>>
>> I'll do a bootstrap and full testing before committing.
>
> In the ChangeLog, please fix the spelling of "unnecessary".
>
> The constructor of _TodoList could be "explicit"
>
> Does _TodoList really need to derive from std::vector<_StateIdT> or
> could it just have a second member? I think there's no benefit to
> using inheritance here.
>
> _TodoList::__exists should be named _M_exists to follow the library
> coding standards.
>
> Shouldn't _TodoList::_M_empty() also set __exists[n] = false for all n?
>
> I think it's guaranteed that __exists[__u] is always in range, right?

Anyway, here's the patch. I intended to write something like a
Quine[1] as the testcase but gave up ;)

Thank you!

[1] http://en.wikipedia.org/wiki/Quine_(computing)

-- 
Tim Shen


a.patch
Description: Binary data


Re: [Patch] Optimize _BFSExecutor in regex

2013-10-07 Thread Tim Shen
On Mon, Oct 7, 2013 at 1:28 PM, Jonathan Wakely  wrote:
> std::memset() on about 125 bytes is not a big deal, I doubt it's
> significantly more expensive than the calculations to find the right
> bits and do the necessary masking for three elements.
> std::vector is a strange beast, remember.

OK so I may change to use something else(bitset?) next time I encounter this.

Is copying 125 bytes more expensive comparing to 3 assignment and
substrction? Because it *is* cheap to find all elements that's true,
they are just in _M_base(in the latest patch). So:

while (!this->empty())
this->pop();

...will be executed only 3 times. I think it's cheaper than copying a
long-enough and sparse-enough byte chunk? Or at least we should theck
the density(_M_base.size() / _M_exists.size()) of _M_exists ?


-- 
Tim Shen


Re: [Patch] Optimize _BFSExecutor in regex

2013-10-07 Thread Tim Shen
On Mon, Oct 7, 2013 at 1:40 PM, Tim Shen  wrote:
> while (!this->empty())
> this->pop();

Sorry it's this->_M_empty() and this->_M_pop();


-- 
Tim Shen


Re: [Patch] Optimize _BFSExecutor in regex

2013-10-07 Thread Tim Shen
On Mon, Oct 7, 2013 at 1:28 PM, Jonathan Wakely  wrote:
> std::memset() on about 125 bytes is not a big deal, I doubt it's
> significantly more expensive than the calculations to find the right
> bits and do the necessary masking for three elements.
> std::vector is a strange beast, remember.

Probably there could be bitwise operation?

I've test the 3 trues out of 1000 elements here :
https://gist.github.com/innocentim/6874889
In my machine, choosing "v.assign(v.size(), false);" cost 1.768s and
choosing the other cost 1.088s. When we push 7 elements, they cost
almost same time. When push more than that, v.assign() shall win.

I do this because I need to use _M_clear() in this patch :)

Anyway let's use v.assign().

Thanks!


-- 
Tim Shen


a.patch
Description: Binary data


Re: [Patch] Optimize _BFSExecutor in regex

2013-10-07 Thread Tim Shen
On Mon, Oct 7, 2013 at 7:11 PM, Jonathan Wakely  wrote:
> The version in your gist (which is *not* what your first patch did!)
> is faster for any size of _M_exists and any ratio of _M_states.size()
> / _M_exists.size():

Sorry for my original patch, I made a mistake! It's:
  while (!_BaseT::empty())
_BaseT::pop_back();
but I truely meant:
  while (!this->_M_empty())
this->_M_pop();
To make all trues reset.

In my machine the while loop is slower than assign() when the
_M_states.size() / _M_exists.size() is more than even 7/1000. so let's
keep the assign(), because the ratio in regex could probably be more
than 25%.


-- 
Tim Shen


a.patch
Description: Binary data


Re: [Patch] Optimize _BFSExecutor in regex

2013-10-07 Thread Tim Shen
On Mon, Oct 7, 2013 at 8:43 PM, Jonathan Wakely  wrote:
> OK, this latest patch looks good so please go ahead and commit it - thanks!

Committed :)


-- 
Tim Shen


[Patch] Fix PR58737

2013-10-15 Thread Tim Shen
This memory leak is because forgetting virtual destructor of the base
class _Executor.

Thanks!


-- 
Tim Shen


a.patch
Description: Binary data


Re: [Patch] Fix PR58737

2013-10-15 Thread Tim Shen
On Tue, Oct 15, 2013 at 11:01 AM, Jonathan Wakely  wrote:
> Great, if it passes the testsuite please commit it.

It surely passed -m32 and -m64, and committed :)


-- 
Tim Shen


[Patch] Fix undefined behaviors in regex

2013-10-16 Thread Tim Shen
It's detected by `clang++ -g -std=c++11 -fsanitize=undefined` but not
detected by g++ with flags `g++ -g -std=c++11 -fsanitize=undefined
-lpthread`. Am I on the right way?

-m32 and -m64 tested :)

Thanks!


-- 
Tim Shen


a.patch
Description: Binary data


Re: [Patch] Fix undefined behaviors in regex

2013-10-16 Thread Tim Shen
On Wed, Oct 16, 2013 at 6:13 PM, Paolo Carlini  wrote:
> Thank you!

Committed.

Thanks!


-- 
Tim Shen


Re: [Patch] Fix undefined behaviors in regex

2013-10-16 Thread Tim Shen
On Wed, Oct 16, 2013 at 6:33 PM, Paolo Carlini  wrote:
> By the way, please feel free to prepare a reduced testcase for the
> -fsanitize people (Marek, Jakub?)

Here it is.

And we should get undefined behaviors before this commit(r203732).


-- 
Tim Shen


split.cc
Description: Binary data


Re: [Patch] Fix undefined behaviors in regex

2013-10-16 Thread Tim Shen
On Wed, Oct 16, 2013 at 6:42 PM, Paolo Carlini  wrote:
> On 10/17/2013 12:39 AM, Tim Shen wrote:
>>
>> On Wed, Oct 16, 2013 at 6:33 PM, Paolo Carlini 
>> wrote:
>>>
>>> By the way, please feel free to prepare a reduced testcase for the
>>> -fsanitize people (Marek, Jakub?)
>>
>> Here it is.
>>
>> And we should get undefined behaviors before this commit(r203732).
>
> To be honest, I was thinking something much smaller than the whole 
> ;) But let's add Marek in CC.

int work() {
}

int main() {
int a = work();
return a;
}

/* This is a smaller case to test the sanitizer. It seems that the
undefined sanitizer is not merged? I use `g++ (GCC) 4.9.0 20131003`,
is that too old? */


-- 
Tim Shen


[Patch] Fix warnings in regex

2013-10-17 Thread Tim Shen
...to prevent potential bugs.

By the way, I execute `g++ -E file.cc > prep.cc && sed -i 's|^#.*$||'
prep.cc && g++ -Wall prep.cc 2> log` to get all warnings. Do we have a
better way?

Bootstraps and tested under -m64 and -m32.

Thanks!


-- 
Tim Shen


a.patch
Description: Binary data


Re: [Patch] Fix undefined behaviors in regex

2013-10-17 Thread Tim Shen
On Thu, Oct 17, 2013 at 5:03 AM, Paolo Carlini  wrote:
>> Though, in the above case, the question is why people ignore warnings from
>> the compiler and need to have special runtime instrumentation to remind them
>> instead. I'm not objecting to that sanitization, only find it weird.

That's my fault, I forgot that compiling code included from libstdc++
doesn't generate warnings. Then I try to get all warnings, and indeed
get the missing-return-statement one.

Thanks!


-- 
Tim Shen


Re: [Patch] Fix warnings in regex

2013-10-17 Thread Tim Shen
On Thu, Oct 17, 2013 at 3:58 PM, Paolo Carlini  wrote:
> Please explain in the ChangeLog *which* warnings.

Committed with attached patch.

Thanks!


-- 
Tim Shen


a.patch
Description: Binary data


[Patch] Fix wide char locale support(like CJK charset) in regex

2013-10-17 Thread Tim Shen
The bug is because naively calling `map::count(__c)`
where __c could be a wchar_t, and an implicit cast(a truncate?)
happened.

-m32 and -m64 tested.

Thanks!


-- 
Tim Shen


a.patch
Description: Binary data


Re: [Patch] Fix wide char locale support(like CJK charset) in regex

2013-10-17 Thread Tim Shen
On Thu, Oct 17, 2013 at 8:46 PM, Paolo Carlini  wrote:
>> +  const wchar_t * s = L"ä½ å¥½, 世+界";
>> +  wregex re(s);
>> +  VERIFY(regex_match_debug(L"ä½ å¥½, 世世世界", re));

Oops that's only perfect Chinese version of "hello, world" under UTF-8
charset. Now it's hex format.

I also make the char handling simpler.


-- 
Tim Shen


a.patch
Description: Binary data


Re: [Patch] Fix wide char locale support(like CJK charset) in regex

2013-10-18 Thread Tim Shen
On Fri, Oct 18, 2013 at 5:39 AM, Paolo Carlini  wrote:
>>Oops that's only perfect Chinese version of "hello, world" under UTF-8
>>charset. Now it's hex format.
>>
>>I also make the char handling simpler.
>
> Great, thanks!

-m32 and -m64 tested and committed.

By the way, UTF-8 *encoding* is a more accurate word, instead of
`charset`(which is Unicode).


-- 
Tim Shen


[Patch] Change default executor to DFS in regex

2013-10-18 Thread Tim Shen
As the comment in this patch said, DFS approach is faster in simple
regex, but exponentially slower in complex(many quantifier) cases.

DFA optimization could be added, but I'm afraid it needs rewriting
regex_automaton.*.

Tested under -m32 and -m64.

Thanks!


-- 
Tim Shen


a.patch
Description: Binary data


Re: [Patch] Change default executor to DFS in regex

2013-10-18 Thread Tim Shen
On Fri, Oct 18, 2013 at 9:13 PM, Tim Shen  wrote:
> As the comment in this patch said, DFS approach is faster in simple
> regex, but exponentially slower in complex(many quantifier) cases.

Actually I suggest to use DFS where number of quantifiers < 2, to make
this 'optimization' more conservative. Now `split regex`, say "\s+"
can be optimized.


-- 
Tim Shen


a.patch
Description: Binary data


Re: [Patch] Change default executor to DFS in regex

2013-10-19 Thread Tim Shen
On Sat, Oct 19, 2013 at 4:03 AM, Paolo Carlini  wrote:
> About the < 2, in general hardcoding a parameter value in the code isn't a
> nice idea. Why don't we take it out to a macro, say
> _GLIBCXX_REGEX_NFA_QUANTIFIERS_LIMIT? In stl_deque.h we have something
> similar and in the present case it would be even safe from the ABI point of
> view, if I'm not mistaken.

I here use a const static local variable to hide it from other parts.

> Finally, you mentioned the DFA optimization: I would like to see some
> details about it. Would it be a big win in all cases? Before we freeze the
> next ABI we should keep open all the possibilities...

It'll be great help to implement partial DFA compilation for all
cases, though DFA cannot handle back-references neither. I need more
time to investigate.

Thanks!


-- 
Tim Shen


a.patch
Description: Binary data


Re: [Patch] Change default executor to DFS in regex

2013-10-19 Thread Tim Shen
On Sat, Oct 19, 2013 at 3:12 PM, Paolo Carlini  wrote:
> Yes, but giving it a name doesn't buy us much wrt the issue I pointed out.
> For comparison, in similar cases, the compiler driver has --params which the
> user can fine tune on the command line. The best approximation we have got
> in the library - for the time being at least, in principle the driver could
> also forward parameters to the library - is a macro, which is defined and
> documented in a comment a few lines earlier and then is possibly used to
> initialize a const static. Again, see the stl_deque.h example. I can't
> imagine any other simple (and conforming! eg, no additional template parms)
> solution.

I see. Here's the macro version.

Thanks!


-- 
Tim Shen


a.patch
Description: Binary data


Re: [Patch] Change default executor to DFS in regex

2013-10-20 Thread Tim Shen
On Sun, Oct 20, 2013 at 4:42 AM, Paolo Carlini  wrote:
> Ok with those changes.

Committed.

Thanks!


-- 
Tim Shen


a.patch
Description: Binary data


[Patch] Regex comments

2013-10-27 Thread Tim Shen
Add comments for last regex commit.

Thanks!


-- 
Regards,
Tim Shen
commit b4aa16a08992866ca6fd60f40e422f551d0d6b2a
Author: tim 
Date:   Sun Oct 27 15:50:44 2013 -0400

2013-

diff --git a/libstdc++-v3/include/bits/regex_executor.tcc b/libstdc++-v3/include/bits/regex_executor.tcc
index d3b9a04..2f677de 100644
--- a/libstdc++-v3/include/bits/regex_executor.tcc
+++ b/libstdc++-v3/include/bits/regex_executor.tcc
@@ -53,6 +53,52 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   return false;
 }
 
+  // This function serves in different modes, DFS mode or BFS mode, indicated
+  // by template variable __dfs_mode. See _M_main for details.
+  //
+  // 
+  //
+  // DFS mode:
+  //
+  // It applys a Depth-First-Search (aka backtracking) on given NFA and input
+  // string.
+  // At the very beginning the executor stands in the start state, then it try
+  // every possible state transition in current state recursively. Some state
+  // transitions consume input string, say, a single-char-matcher or a
+  // back-reference matcher; some don't, like assertion or other anchor nodes.
+  // When the input is exhausted and/or the current state is an accepting state,
+  // the whole executor returns true.
+  //
+  // TODO: This approach is exponentially slow for certain input.
+  //   Try to compile the NFA to a DFA.
+  //
+  // Time complexity: o(match_length), O(2^(_M_nfa.size()))
+  // Space complexity: \theta(match_results.size() + match_length)
+  //
+  // 
+  //
+  // BFS mode:
+  //
+  // Russ Cox's article (http://swtch.com/~rsc/regexp/regexp1.html)
+  // explained this algorithm clearly.
+  //
+  // It first computes epsilon clousure for every state that's still matching,
+  // using the same DFS algorithm, but doesn't reenter states (set true in
+  // _M_visited), nor follow _S_opcode_match.
+  //
+  // Then apply DFS to every _S_opcode_match one by one (in _M_match_queue).
+  //
+  // The order of which states needs to be recursively applied DFS matters,
+  // depend on which greedy mode we use. See _S_opcode_alternative branch in
+  // _M_dfs.
+  //
+  // It significantly reduces potential duplicate states, so have a better
+  // upper bound; but it deserves more overhead.
+  //
+  // Time complexity: o(match_length * match_results.size())
+  //  O(match_length * _M_nfa.size() * match_results.size())
+  // Space complexity: o(_M_nfa.size() + match_results.size())
+  //   O(_M_nfa.size() * match_results.size())
   template
   template
@@ -68,18 +114,6 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 	}
   else
 	{
-	  // Like the DFS approach, it try every possible state transition;
-	  // Unlike DFS, it uses a queue instead of a stack to store matching
-	  // states. It's a BFS approach.
-	  //
-	  // Russ Cox's article(http://swtch.com/~rsc/regexp/regexp1.html)
-	  // explained this algorithm clearly.
-	  //
-	  // Time complexity: o(match_length * match_results.size())
-	  //  O(match_length * _M_nfa.size()
-	  //* match_results.size())
-	  // Space complexity: o(_M_nfa.size() + match_results.size())
-	  //   O(_M_nfa.size() * match_results.size())
 	  _M_match_queue->push(make_pair(_M_start_state, _M_results));
 	  bool __ret = false;
 	  while (1)
@@ -132,20 +166,6 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   return false;
 }
 
-  // A _DFSExecutor perform a DFS on given NFA and input string. At the very
-  // beginning the executor stands in the start state, then it try every
-  // possible state transition in current state recursively. Some state
-  // transitions consume input string, say, a single-char-matcher or a
-  // back-reference matcher; some not, like assertion or other anchor nodes.
-  // When the input is exhausted and the current state is an accepting state,
-  // the whole executor return true.
-  //
-  // TODO: This approach is exponentially slow for certain input.
-  //   Try to compile the NFA to a DFA.
-  //
-  // Time complexity: o(match_length), O(2^(_M_nfa.size()))
-  // Space complexity: \theta(match_results.size() + match_length)
-  //
   template
   template
@@ -164,25 +184,39 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 	{
 	case _S_opcode_alternative:
 	  // Greedy or not, this is a question ;)
+	  //
+	  // _M_alt branch is "match once more", while _M_next is "get me out
+	  // of this quantifier".
+
+	  // Greedy.
 	  if (!__state._M_neg)
 	{
+	  // Once more.
 	  _M_dfs<__match_mode>(__state._M_alt);
+	  // If it's DFS executor and already accepted, we're done.
 	  if (!__dfs_mode || !_M_has_sol)
 		_M_dfs<__match_mode>(__state._M_next);
 	}
-	  else
+	  else // Ungreedy mode
 	{
 	  if (__dfs_mode)
 		{
+		  // vice-versa.
 		  _M_dfs<__match_mode>(_

Re: [Patch] Regex comments

2013-10-27 Thread Tim Shen
On Sun, Oct 27, 2013 at 4:20 PM, Paolo Carlini  wrote:
> Should we move this comment too before the 'case', and aligned with it?

Actually this comment is for the if statement below, not the whole
case branch. I've made this clearer by moving "Every change..." part
to the switch statement, where it should stay.


-- 
Regards,
Tim Shen
commit 1f13761ba34b13a52b4f41d2ebdeba1114205bac
Author: tim 
Date:   Sun Oct 27 15:50:44 2013 -0400

2013-10-27  Tim Shen  

	* regex_executor.tcc: Add comments.

diff --git a/libstdc++-v3/include/bits/regex_executor.tcc b/libstdc++-v3/include/bits/regex_executor.tcc
index d3b9a04..c4ce362 100644
--- a/libstdc++-v3/include/bits/regex_executor.tcc
+++ b/libstdc++-v3/include/bits/regex_executor.tcc
@@ -53,6 +53,52 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   return false;
 }
 
+  // This function serves in different modes, DFS mode or BFS mode, indicated
+  // by template variable __dfs_mode. See _M_main for details.
+  //
+  // 
+  //
+  // DFS mode:
+  //
+  // It applys a Depth-First-Search (aka backtracking) on given NFA and input
+  // string.
+  // At the very beginning the executor stands in the start state, then it try
+  // every possible state transition in current state recursively. Some state
+  // transitions consume input string, say, a single-char-matcher or a
+  // back-reference matcher; some don't, like assertion or other anchor nodes.
+  // When the input is exhausted and/or the current state is an accepting state,
+  // the whole executor returns true.
+  //
+  // TODO: This approach is exponentially slow for certain input.
+  //   Try to compile the NFA to a DFA.
+  //
+  // Time complexity: o(match_length), O(2^(_M_nfa.size()))
+  // Space complexity: \theta(match_results.size() + match_length)
+  //
+  // 
+  //
+  // BFS mode:
+  //
+  // Russ Cox's article (http://swtch.com/~rsc/regexp/regexp1.html)
+  // explained this algorithm clearly.
+  //
+  // It first computes epsilon clousure for every state that's still matching,
+  // using the same DFS algorithm, but doesn't reenter states (set true in
+  // _M_visited), nor follow _S_opcode_match.
+  //
+  // Then apply DFS to every _S_opcode_match one by one (in _M_match_queue).
+  //
+  // The order of which states needs to be recursively applied DFS matters,
+  // depend on which greedy mode we use. See _S_opcode_alternative branch in
+  // _M_dfs.
+  //
+  // It significantly reduces potential duplicate states, so have a better
+  // upper bound; but it deserves more overhead.
+  //
+  // Time complexity: o(match_length * match_results.size())
+  //  O(match_length * _M_nfa.size() * match_results.size())
+  // Space complexity: o(_M_nfa.size() + match_results.size())
+  //   O(_M_nfa.size() * match_results.size())
   template
   template
@@ -68,18 +114,6 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 	}
   else
 	{
-	  // Like the DFS approach, it try every possible state transition;
-	  // Unlike DFS, it uses a queue instead of a stack to store matching
-	  // states. It's a BFS approach.
-	  //
-	  // Russ Cox's article(http://swtch.com/~rsc/regexp/regexp1.html)
-	  // explained this algorithm clearly.
-	  //
-	  // Time complexity: o(match_length * match_results.size())
-	  //  O(match_length * _M_nfa.size()
-	  //* match_results.size())
-	  // Space complexity: o(_M_nfa.size() + match_results.size())
-	  //   O(_M_nfa.size() * match_results.size())
 	  _M_match_queue->push(make_pair(_M_start_state, _M_results));
 	  bool __ret = false;
 	  while (1)
@@ -132,20 +166,6 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   return false;
 }
 
-  // A _DFSExecutor perform a DFS on given NFA and input string. At the very
-  // beginning the executor stands in the start state, then it try every
-  // possible state transition in current state recursively. Some state
-  // transitions consume input string, say, a single-char-matcher or a
-  // back-reference matcher; some not, like assertion or other anchor nodes.
-  // When the input is exhausted and the current state is an accepting state,
-  // the whole executor return true.
-  //
-  // TODO: This approach is exponentially slow for certain input.
-  //   Try to compile the NFA to a DFA.
-  //
-  // Time complexity: o(match_length), O(2^(_M_nfa.size()))
-  // Space complexity: \theta(match_results.size() + match_length)
-  //
   template
   template
@@ -160,29 +180,45 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 	}
 
   const auto& __state = _M_nfa[__i];
+  // Every change on _M_cur_results and _M_current will be rolled back after
+  // finishing the recursion step.
   switch (__state._M_opcode)
 	{
 	case _S_opcode_alternative:
 	  // Greedy or not, this is a question ;

Re: [Patch] Regex comments

2013-10-27 Thread Tim Shen
Thank you for figuring out so many syntax errors, I'll be careful next time.

On Sun, Oct 27, 2013 at 5:38 PM, Jonathan Wakely  wrote:
> +  // The order of which states needs to be recursively applied DFS matters,
> +  // depend on which greedy mode we use.
>
> I don't understand this sentence at all, sorry.  Can you explain it in
> other terms, and I'll try to suggest better phrasing?

+// _M_alt branch is "match once more", while _M_next is "get me out
+// of this quantifier". Executing _M_next first or _M_alt first don't
+// mean the same thing, and we need to choose the correct order under
+// given greedy mode.


-- 
Regards,
Tim Shen
commit b195603ee8d79d2afd5303bcae363c47e4c89fab
Author: tim 
Date:   Sun Oct 27 15:50:44 2013 -0400

2013-10-28  Tim Shen  

	* regex_executor.tcc: Add comments.

diff --git a/libstdc++-v3/include/bits/regex_executor.tcc b/libstdc++-v3/include/bits/regex_executor.tcc
index d3b9a04..0c42189 100644
--- a/libstdc++-v3/include/bits/regex_executor.tcc
+++ b/libstdc++-v3/include/bits/regex_executor.tcc
@@ -53,6 +53,49 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   return false;
 }
 
+  // This function operates in different modes, DFS mode or BFS mode, indicated
+  // by template parameter __dfs_mode. See _M_main for details.
+  //
+  // 
+  //
+  // DFS mode:
+  //
+  // It applies a Depth-First-Search (aka backtracking) on given NFA and input
+  // string.
+  // At the very beginning the executor stands in the start state, then it tries
+  // every possible state transition in current state recursively. Some state
+  // transitions consume input string, say, a single-char-matcher or a
+  // back-reference matcher; some don't, like assertion or other anchor nodes.
+  // When the input is exhausted and/or the current state is an accepting state,
+  // the whole executor returns true.
+  //
+  // TODO: This approach is exponentially slow for certain input.
+  //   Try to compile the NFA to a DFA.
+  //
+  // Time complexity: o(match_length), O(2^(_M_nfa.size()))
+  // Space complexity: \theta(match_results.size() + match_length)
+  //
+  // 
+  //
+  // BFS mode:
+  //
+  // Russ Cox's article (http://swtch.com/~rsc/regexp/regexp1.html)
+  // explained this algorithm clearly.
+  //
+  // It first computes epsilon closure for every state that's still matching,
+  // using the same DFS algorithm, but doesn't reenter states (set true in
+  // _M_visited), nor follows _S_opcode_match.
+  //
+  // Then apply DFS using every _S_opcode_match (in _M_match_queue) as the start
+  // state.
+  //
+  // It significantly reduces potential duplicate states, so has a better
+  // upper bound; but it requires more overhead.
+  //
+  // Time complexity: o(match_length * match_results.size())
+  //  O(match_length * _M_nfa.size() * match_results.size())
+  // Space complexity: o(_M_nfa.size() + match_results.size())
+  //   O(_M_nfa.size() * match_results.size())
   template
   template
@@ -68,18 +111,6 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 	}
   else
 	{
-	  // Like the DFS approach, it try every possible state transition;
-	  // Unlike DFS, it uses a queue instead of a stack to store matching
-	  // states. It's a BFS approach.
-	  //
-	  // Russ Cox's article(http://swtch.com/~rsc/regexp/regexp1.html)
-	  // explained this algorithm clearly.
-	  //
-	  // Time complexity: o(match_length * match_results.size())
-	  //  O(match_length * _M_nfa.size()
-	  //* match_results.size())
-	  // Space complexity: o(_M_nfa.size() + match_results.size())
-	  //   O(_M_nfa.size() * match_results.size())
 	  _M_match_queue->push(make_pair(_M_start_state, _M_results));
 	  bool __ret = false;
 	  while (1)
@@ -132,20 +163,6 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   return false;
 }
 
-  // A _DFSExecutor perform a DFS on given NFA and input string. At the very
-  // beginning the executor stands in the start state, then it try every
-  // possible state transition in current state recursively. Some state
-  // transitions consume input string, say, a single-char-matcher or a
-  // back-reference matcher; some not, like assertion or other anchor nodes.
-  // When the input is exhausted and the current state is an accepting state,
-  // the whole executor return true.
-  //
-  // TODO: This approach is exponentially slow for certain input.
-  //   Try to compile the NFA to a DFA.
-  //
-  // Time complexity: o(match_length), O(2^(_M_nfa.size()))
-  // Space complexity: \theta(match_results.size() + match_length)
-  //
   template
   template
@@ -160,29 +177,44 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 	}
 
   const auto& __state = _M_nfa[__i];
+  // Every change on _M

Re: [Patch] Regex comments

2013-10-27 Thread Tim Shen
On Sun, Oct 27, 2013 at 9:26 PM, Jonathan Wakely  wrote:
> I'm happy for that latest patch to be committed, thanks for taking the
> time to improve the comments.

Committed.

Thanks!


-- 
Regards,
Tim Shen


[Patch, regex, libstdc++/69794] Unify special character parsing

2016-02-13 Thread Tim Shen
I did it wrong in r227289 - I ignored the "\n" special case in grep.
Turns out using code to handle special cases is error prone, so I
turned to use data (_M_grep_spec_char and _M_egrep_spec_char).

Bootstrapped and tested on x86_64-pc-linux-gnu.

Thanks!

-- 
Regards,
Tim Shen
commit 03e651ef56e516f1bc7b0d041d93ef657af54791
Author: Tim Shen 
Date:   Sat Feb 13 10:55:38 2016 -0800

PR libstdc++/69794
* include/bits/regex_scanner.h: Add different special character
sets for grep and egrep regex.
* include/bits/regex_scanner.tcc: Use _M_spec_char more unifiedly.
* testsuite/28_regex/regression.cc: Add new testcase.

diff --git a/libstdc++-v3/include/bits/regex_scanner.h 
b/libstdc++-v3/include/bits/regex_scanner.h
index bff7366..16071da 100644
--- a/libstdc++-v3/include/bits/regex_scanner.h
+++ b/libstdc++-v3/include/bits/regex_scanner.h
@@ -95,11 +95,19 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
  : _M_awk_escape_tbl),
 _M_spec_char(_M_is_ecma()
 ? _M_ecma_spec_char
-: _M_is_basic()
+: _M_flags & regex_constants::basic
 ? _M_basic_spec_char
-: _M_extended_spec_char),
+: _M_flags & regex_constants::extended
+? _M_extended_spec_char
+: _M_flags & regex_constants::grep
+? _M_grep_spec_char
+: _M_flags & regex_constants::egrep
+? _M_egrep_spec_char
+: _M_flags & regex_constants::awk
+? _M_extended_spec_char
+: nullptr),
 _M_at_bracket_start(false)
-{ }
+{ __glibcxx_assert(_M_spec_char); }
 
   protected:
 const char*
@@ -177,6 +185,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 const char* _M_ecma_spec_char = "^$\\.*+?()[]{}|";
 const char* _M_basic_spec_char = ".[\\*^$";
 const char* _M_extended_spec_char = ".[\\()*+?{|^$";
+const char* _M_grep_spec_char = ".[\\*^$\n";
+const char* _M_egrep_spec_char = ".[\\()*+?{|^$\n";
 
 _StateT   _M_state;
 _FlagT_M_flags;
diff --git a/libstdc++-v3/include/bits/regex_scanner.tcc 
b/libstdc++-v3/include/bits/regex_scanner.tcc
index 920cb14..fedba09 100644
--- a/libstdc++-v3/include/bits/regex_scanner.tcc
+++ b/libstdc++-v3/include/bits/regex_scanner.tcc
@@ -97,9 +97,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 _M_scan_normal()
 {
   auto __c = *_M_current++;
-  const char* __pos;
 
-  if (std::strchr(_M_spec_char, _M_ctype.narrow(__c, '\0')) == nullptr)
+  if (std::strchr(_M_spec_char, _M_ctype.narrow(__c, ' ')) == nullptr)
{
  _M_token = _S_token_ord_char;
  _M_value.assign(1, __c);
@@ -177,12 +176,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
  _M_state = _S_state_in_brace;
  _M_token = _S_token_interval_begin;
}
-  else if (((__pos = std::strchr(_M_spec_char, _M_ctype.narrow(__c, '\0')))
- != nullptr
-   && *__pos != '\0'
-   && __c != ']'
-   && __c != '}')
-  || (_M_is_grep() && __c == '\n'))
+  else if (__c != ']' && __c != '}')
{
  auto __it = _M_token_tbl;
  auto __narrowc = _M_ctype.narrow(__c, '\0');
diff --git a/libstdc++-v3/testsuite/28_regex/regression.cc 
b/libstdc++-v3/testsuite/28_regex/regression.cc
index f95bef9..c9a3402 100644
--- a/libstdc++-v3/testsuite/28_regex/regression.cc
+++ b/libstdc++-v3/testsuite/28_regex/regression.cc
@@ -33,10 +33,26 @@ test01()
   regex re("((.)", regex_constants::basic);
 }
 
+void
+test02()
+{
+  bool test __attribute__((unused)) = true;
+
+  std::string re_str
+{
+  "/abcd" "\n"
+  "/aecf" "\n"
+  "/ghci"
+};
+  auto rx = std::regex(re_str, std::regex_constants::grep | 
std::regex_constants::icase);
+  VERIFY(std::regex_search("/abcd", rx));
+}
+
 int
 main()
 {
   test01();
+  test02();
   return 0;
 }
 


Re: [Patch, regex, libstdc++/69794] Unify special character parsing

2016-02-15 Thread Tim Shen
On Mon, Feb 15, 2016 at 4:26 AM, Jonathan Wakely  wrote:
> Those new members change the size of the type, so are an ABI change.
>
> Couldn't they be static members?

Ahh right. Since they are just used for once, use them in the line.


-- 
Regards,
Tim Shen
commit 4db39da3091a33e4125ffd8b55da37859277d0d2
Author: Tim Shen 
Date:   Sat Feb 13 10:55:38 2016 -0800

PR libstdc++/69794
* include/bits/regex_scanner.h: Add different special character
sets for grep and egrep regex.
* include/bits/regex_scanner.tcc: Use _M_spec_char more unifiedly.
* testsuite/28_regex/regression.cc: Add new testcase.

diff --git a/libstdc++-v3/include/bits/regex_scanner.h 
b/libstdc++-v3/include/bits/regex_scanner.h
index bff7366..37dea84 100644
--- a/libstdc++-v3/include/bits/regex_scanner.h
+++ b/libstdc++-v3/include/bits/regex_scanner.h
@@ -95,11 +95,19 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
  : _M_awk_escape_tbl),
 _M_spec_char(_M_is_ecma()
 ? _M_ecma_spec_char
-: _M_is_basic()
+: _M_flags & regex_constants::basic
 ? _M_basic_spec_char
-: _M_extended_spec_char),
+: _M_flags & regex_constants::extended
+? _M_extended_spec_char
+: _M_flags & regex_constants::grep
+?  ".[\\*^$\n"
+: _M_flags & regex_constants::egrep
+? ".[\\()*+?{|^$\n"
+: _M_flags & regex_constants::awk
+? _M_extended_spec_char
+: nullptr),
 _M_at_bracket_start(false)
-{ }
+{ __glibcxx_assert(_M_spec_char); }
 
   protected:
 const char*
@@ -137,6 +145,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 { return _M_flags & regex_constants::awk; }
 
   protected:
+// TODO: Make them static in the next abi change.
 const std::pair _M_token_tbl[9] =
   {
{'^', _S_token_line_begin},
diff --git a/libstdc++-v3/include/bits/regex_scanner.tcc 
b/libstdc++-v3/include/bits/regex_scanner.tcc
index 920cb14..fedba09 100644
--- a/libstdc++-v3/include/bits/regex_scanner.tcc
+++ b/libstdc++-v3/include/bits/regex_scanner.tcc
@@ -97,9 +97,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 _M_scan_normal()
 {
   auto __c = *_M_current++;
-  const char* __pos;
 
-  if (std::strchr(_M_spec_char, _M_ctype.narrow(__c, '\0')) == nullptr)
+  if (std::strchr(_M_spec_char, _M_ctype.narrow(__c, ' ')) == nullptr)
{
  _M_token = _S_token_ord_char;
  _M_value.assign(1, __c);
@@ -177,12 +176,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
  _M_state = _S_state_in_brace;
  _M_token = _S_token_interval_begin;
}
-  else if (((__pos = std::strchr(_M_spec_char, _M_ctype.narrow(__c, '\0')))
- != nullptr
-   && *__pos != '\0'
-   && __c != ']'
-   && __c != '}')
-  || (_M_is_grep() && __c == '\n'))
+  else if (__c != ']' && __c != '}')
{
  auto __it = _M_token_tbl;
  auto __narrowc = _M_ctype.narrow(__c, '\0');
diff --git a/libstdc++-v3/testsuite/28_regex/regression.cc 
b/libstdc++-v3/testsuite/28_regex/regression.cc
index f95bef9..c9a3402 100644
--- a/libstdc++-v3/testsuite/28_regex/regression.cc
+++ b/libstdc++-v3/testsuite/28_regex/regression.cc
@@ -33,10 +33,26 @@ test01()
   regex re("((.)", regex_constants::basic);
 }
 
+void
+test02()
+{
+  bool test __attribute__((unused)) = true;
+
+  std::string re_str
+{
+  "/abcd" "\n"
+  "/aecf" "\n"
+  "/ghci"
+};
+  auto rx = std::regex(re_str, std::regex_constants::grep | 
std::regex_constants::icase);
+  VERIFY(std::regex_search("/abcd", rx));
+}
+
 int
 main()
 {
   test01();
+  test02();
   return 0;
 }
 


[Patch] Add comments for future regex work

2013-12-02 Thread Tim Shen
...for optimization purpose. Should be done in one month.

Thanks!


-- 
Regards,
Tim Shen
commit cc7d58128e68455498d0257c4796cb70a9e24990
Author: tim 
Date:   Mon Dec 2 15:49:15 2013 -0500

2013-12-02  Tim Shen  

	* regex_compiler.h: Add todo comment.
	* regex_executor.tcc: Likewise.

diff --git a/libstdc++-v3/include/bits/regex_compiler.h b/libstdc++-v3/include/bits/regex_compiler.h
index b9f8127..5759d48 100644
--- a/libstdc++-v3/include/bits/regex_compiler.h
+++ b/libstdc++-v3/include/bits/regex_compiler.h
@@ -237,6 +237,10 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 };
 
   /// Matches a character range (bracket expression)
+  // TODO: Convert used _M_flags fields to template parameters, including
+  // collate and icase. Avoid using std::set, could use flat_set
+  // (sorted vector and binary search) instead; use an fixed sized (256)
+  // vector for char specialization if necessary.
   template
 struct _BracketMatcher
 {
diff --git a/libstdc++-v3/include/bits/regex_executor.tcc b/libstdc++-v3/include/bits/regex_executor.tcc
index 22fd67c..150adb4 100644
--- a/libstdc++-v3/include/bits/regex_executor.tcc
+++ b/libstdc++-v3/include/bits/regex_executor.tcc
@@ -162,6 +162,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   return false;
 }
 
+  // TODO: Use a function vector to dispatch, instead of using switch-case.
   template
   template


Re: [Patch] Add comments for future regex work

2013-12-03 Thread Tim Shen
On Tue, Dec 3, 2013 at 4:47 AM, Paolo Carlini  wrote:
> Great. Consider that we are now in Stage 3, until mid of February.
> Considering that  is a new feature, I think we have some leeway in
> terms of timing, but definitely we can't commit invasive changes not fixing
> regressions while in Stage 4.

...committed.


-- 
Regards,
Tim Shen


[Patch] Regex bracket matcher cache optimization

2014-01-03 Thread Tim Shen
The data structure _BracketMatcher (storing regex like [123a-z]) could
be quite slow mainly because of regex_traits. So a result of cache for
small range of inputs (char) sounds reasonable. It iterates all 256
inputs and calculate them at regex compile time.

Booted and tested under -m64 and -m32.

Thanks!


-- 
Regards,
Tim Shen
commit de1e43989be9c9dffebf488e118549cfe2987b9e
Author: tim 
Date:   Fri Jan 3 11:43:12 2014 -0500

2014-01-03  Tim Shen  

	* include/bits/regex_compiler.h (_AnyMatcher<>::_AnyMatcher(),
	_AnyMatcher<>::operator(), _AnyMatcher<>::_M_apply(),
	_CharMatcher<>::_CharMatcher(), _CharMatcher<>::_M_translate(),
	_BracketMatcher<>::_BracketMatcher(), _BracketMatcher<>::operator(),
	_BracketMatcher<>::_M_add_char(),
	_BracketMatcher<>::_M_add_collating_element(),
	_BracketMatcher<>::_M_add_equivalence_class(),
	_BracketMatcher<>::_M_add_character_class(),
	_BracketMatcher<>::_M_make_range(), _BracketMatcher<>::_M_ready(),
	_BracketMatcher<>::_M_apply(), _BracketMatcher<>::_M_make_cache()):
	Fix _AnyMatcher behavior of POSIX style and move _M_flags
	to template parameter; Add cache for _BracketMatcher. Adjust
	declarations from here...
	* include/bits/regex.h (basic_regex<>::imbue()): ...to here. Also,
	imbuing a regex will trigger a recompilation to rebuild the cache.
	* include/bits/regex_compiler.tcc (_Compiler<>::_M_atom(),
	_Compiler<>::_M_bracket_expression()): Adjust matchers' caller for
	different template bool parameters.
	* include/bits/regex_executor.h: Remove unnecessary declarations.
	* include/std/regex: Adjust including orders.
	* testsuite/28_regex/traits/char/user_defined.cc: New.
	* testsuite/28_regex/traits/wchar_t/user_defined.cc: New.

diff --git a/libstdc++-v3/include/bits/regex.h b/libstdc++-v3/include/bits/regex.h
index 4e091e0..ae8e1f5 100644
--- a/libstdc++-v3/include/bits/regex.h
+++ b/libstdc++-v3/include/bits/regex.h
@@ -30,6 +30,15 @@
 
 namespace std _GLIBCXX_VISIBILITY(default)
 {
+_GLIBCXX_BEGIN_NAMESPACE_VERSION
+  template
+class basic_regex;
+
+  template
+class match_results;
+
+_GLIBCXX_END_NAMESPACE_VERSION
+
 namespace __detail
 {
 _GLIBCXX_BEGIN_NAMESPACE_VERSION
@@ -48,6 +57,56 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 		  const basic_regex<_CharT, _TraitsT>& __re,
 		  regex_constants::match_flag_type __flags);
 
+  template
+class _Executor;
+
+  template
+struct __has_contiguous_iter : std::false_type { };
+
+  template
+struct __has_contiguous_iter>
+: std::true_type  // string storage is contiguous
+{ };
+
+  template
+struct __has_contiguous_iter>
+: std::true_type  // vector storage is contiguous
+{ };
+
+  template
+struct __has_contiguous_iter>
+: std::false_type // vector storage is not contiguous
+{ };
+
+  template
+struct __is_contiguous_normal_iter : std::false_type { };
+
+  template
+struct
+__is_contiguous_normal_iter<__gnu_cxx::__normal_iterator<_Tp, _Cont>>
+: __has_contiguous_iter<_Cont>::type
+{ };
+
+  template
+using __enable_if_contiguous_normal_iter
+  = typename enable_if< __is_contiguous_normal_iter<_Iter>::value,
+			std::shared_ptr<_NFA<_TraitsT>> >::type;
+
+  template
+using __disable_if_contiguous_normal_iter
+  = typename enable_if< !__is_contiguous_normal_iter<_Iter>::value,
+			std::shared_ptr<_NFA<_TraitsT>> >::type;
+
+  template
+__disable_if_contiguous_normal_iter<_FwdIter, _TraitsT>
+__compile_nfa(_FwdIter __first, _FwdIter __last, const _TraitsT& __traits,
+		  regex_constants::syntax_option_type __flags);
+
+  template
+__enable_if_contiguous_normal_iter<_Iter, _TraitsT>
+__compile_nfa(_Iter __first, _Iter __last, const _TraitsT& __traits,
+		  regex_constants::syntax_option_type __flags);
+
 _GLIBCXX_END_NAMESPACE_VERSION
 }
 
@@ -501,6 +560,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 	basic_regex(_FwdIter __first, _FwdIter __last,
 		flag_type __f = ECMAScript)
 	: _M_flags(__f),
+	  _M_original_str(__first, __last),
 	  _M_automaton(__detail::__compile_nfa(__first, __last, _M_traits,
 	   _M_flags))
 	{ }
@@ -637,6 +697,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 	   flag_type __flags = ECMAScript)
 	{
 	  _M_flags = __flags;
+	  _M_original_str.assign(__s.begin(), __s.end());
 	  _M_automaton = __detail::__compile_nfa(__s.begin(), __s.end(),
 		 _M_traits, _M_flags);
 	  return *this;
@@ -701,7 +762,11 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
*/
   locale_type
   imbue(locale_type __loc)
-  { return _M_traits.imbue(__loc); }
+  {
+	auto __ret = _M_traits.imbue(__loc);
+	this->assign(_M_original_str, _M_flags);
+	return __ret;
+  }
 
   

Re: [Patch] Regex bracket matcher cache optimization

2014-01-04 Thread Tim Shen
On Sat, Jan 4, 2014 at 4:25 AM, Paolo Carlini  wrote:
> Good. Could we actually measure something in the performance testsuite?

Before:
split.cc   108r  107u
  0s0mem0pf
split_bfs.cc   192r  191u
  0s0mem0pf

After:
split.cc14r   14u
  0s0mem0pf
split_bfs.cc78r   77u
  0s0mem0pf

The cache obviously works.


-- 
Regards,
Tim Shen


Re: [Patch] Regex bracket matcher cache optimization

2014-01-04 Thread Tim Shen
On Fri, Jan 3, 2014 at 9:35 PM, Tim Shen  wrote:
> The data structure _BracketMatcher (storing regex like [123a-z]) could
> be quite slow mainly because of regex_traits. So a result of cache for
> small range of inputs (char) sounds reasonable. It iterates all 256
> inputs and calculate them at regex compile time.

In addition, boost failed the test case
(testsuite/28_regex/traits/char/user_defined.cc), which can be seen
here: https://gist.github.com/innocentim/8257944


-- 
Regards,
Tim Shen


Re: [Patch] Regex bracket matcher cache optimization

2014-01-06 Thread Tim Shen
I'd prefer to propose another patch that should be commited with this
one, which fix bugs (say _M_flags & regex_constants::icase, not "&&"),
and do more  "matcher optimization".

It is now more DRY (_RegexTranslator<>) and efficient, but 
takes longer to compile, mainly because now we have 4 times more
_Compiler<> versions :)

Booted and tested with -m32 and -m64 respectively.

Thank you!


-- 
Regards,
Tim Shen
commit 9dcc87dd782c7a09b5e075e746ceb3d0c6f1f4db
Author: tim 
Date:   Mon Jan 6 00:03:41 2014 -0500

2014-01-07  Tim Shen  

* bits/regex_automaton.tcc: Indentation fix.
* bits/regex_compiler.h (__compile_nfa<>(), _Compiler<>,
_RegexTranslator<> _AnyMatcher<>, _CharMatcher<>,
_BracketMatcher<>): Add bool option template parameter and
specialization to make matching more efficient and space saving.
* bits/regex_compiler.tcc: Likewise.

diff --git a/libstdc++-v3/include/bits/regex_automaton.tcc 
b/libstdc++-v3/include/bits/regex_automaton.tcc
index 7edc67f..e222803 100644
--- a/libstdc++-v3/include/bits/regex_automaton.tcc
+++ b/libstdc++-v3/include/bits/regex_automaton.tcc
@@ -134,9 +134,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 _NFA<_TraitsT>::_M_dot(std::ostream& __ostr) const
 {
   __ostr << "digraph _Nfa {\n"
-   "  rankdir=LR;\n";
+   "  rankdir=LR;\n";
   for (size_t __i = 0; __i < this->size(); ++__i)
-(*this)[__i]._M_dot(__ostr, __i);
+   (*this)[__i]._M_dot(__ostr, __i);
   __ostr << "}\n";
   return __ostr;
 }
diff --git a/libstdc++-v3/include/bits/regex_compiler.h 
b/libstdc++-v3/include/bits/regex_compiler.h
index 4ac67df..4873166 100644
--- a/libstdc++-v3/include/bits/regex_compiler.h
+++ b/libstdc++-v3/include/bits/regex_compiler.h
@@ -39,11 +39,11 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
* @{
*/
 
-  template
+  template
 struct _BracketMatcher;
 
   /// Builds an NFA from an input iterator interval.
-  template
+  template
 class _Compiler
 {
 public:
@@ -63,7 +63,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   typedef typename _ScannerT::_TokenT _TokenT;
   typedef _StateSeq<_TraitsT>_StateSeqT;
   typedef std::stack<_StateSeqT, std::vector<_StateSeqT>> _StackT;
-  typedef _BracketMatcher<_TraitsT>  
_BMatcherT;
+  typedef _BracketMatcher<_TraitsT, __icase, __collate>   _BMatcherT;
   typedef std::ctype_CtypeT;
 
   // accepts a specific token or returns false.
@@ -95,12 +95,6 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   _M_expression_term(_BMatcherT& __matcher);
 
   bool
-  _M_range_expression(_BMatcherT& __matcher);
-
-  bool
-  _M_collating_symbol(_BMatcherT& __matcher);
-
-  bool
   _M_equivalence_class(_BMatcherT& __matcher);
 
   bool
@@ -134,8 +128,20 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 __compile_nfa(_FwdIter __first, _FwdIter __last, const _TraitsT& __traits,
  regex_constants::syntax_option_type __flags)
 {
-  using _Cmplr = _Compiler<_FwdIter, _TraitsT>;
-  return _Cmplr(__first, __last, __traits, __flags)._M_get_nfa();
+  if (!(__flags & regex_constants::icase))
+   if (!(__flags & regex_constants::collate))
+ return _Compiler<_FwdIter, _TraitsT, false, false>
+   (__first, __last, __traits, __flags)._M_get_nfa();
+   else
+ return _Compiler<_FwdIter, _TraitsT, false, true>
+   (__first, __last, __traits, __flags)._M_get_nfa();
+  else
+   if (!(__flags & regex_constants::collate))
+ return _Compiler<_FwdIter, _TraitsT, true, false>
+   (__first, __last, __traits, __flags)._M_get_nfa();
+   else
+ return _Compiler<_FwdIter, _TraitsT, true, true>
+   (__first, __last, __traits, __flags)._M_get_nfa();
 }
 
   template
@@ -148,16 +154,110 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   return __compile_nfa(__cfirst, __cfirst + __len, __traits, __flags);
 }
 
-  template
-struct _AnyMatcher
+  // [28.13.14]
+  template
+class _RegexTranslator
 {
-  typedef typename _TraitsT::char_type   _CharT;
+public:
+  typedef typename _TraitsT::char_type   _CharT;
+  typedef typename _TraitsT::string_type _StringT;
+  typedef typename std::conditional<__collate,
+   _StringT,
+   _CharT>::type _StrTransT;
 
   explicit
-  _AnyMatcher(const _TraitsT& __traits)
+  _RegexTranslator(const _TraitsT& __traits)
   : _M_traits(__traits)
   { }
 
+  _CharT
+  _M_translate(_CharT __ch) con

Re: [Patch] Regex bracket matcher cache optimization

2014-01-07 Thread Tim Shen
On Tue, Jan 7, 2014 at 4:02 AM, Paolo Carlini  wrote:
> Ideally, I would suggest committing first the improvements in your previous
> patch (by the way, thanks for the numbers!) + the pure bug fixes and
> separate the further performance improvements which have compile-time
> performance implications (how big?), see if, eg, Jon has something to
> recommend. Can we do that?

First patch committed. I later found that the second patch "b.diff" is
based on the committed version (the attach, which fixed the "&&"
problem);

Here's an example to test compile time:

#include 

using namespace std;

int main() {
  regex a("a");
  regex b("a", regex_constants::ECMAScript | regex_constants::icase);
  regex c("a", regex_constants::ECMAScript | regex_constants::collate);
  regex d("a", regex_constants::ECMAScript | regex_constants::icase |
regex_constants::collate);
  return 0;
}

Before the second patch:
g++ -g -Wall -std=c++11 -O3 regextest.cc  3.30s user 0.13s system 99%
cpu 3.435 total

After it:
g++ -g -Wall -std=c++11 -O3 regextest.cc  4.35s user 0.10s system 99%
cpu 4.454 total

I didn't noticed that's so time consuming. I think reducing the
compile time is possible (by templating several member functions
instead of whole _Compiler<> class).


-- 
Regards,
Tim Shen
Index: include/std/regex
===
--- include/std/regex   (revision 206399)
+++ include/std/regex   (revision 206400)
@@ -56,11 +56,11 @@
 
 #include 
 #include 
+#include 
+#include 
 #include 
-#include 
 #include 
 #include 
-#include 
 
 #endif // C++11
 
Index: include/bits/regex_compiler.h
===
--- include/bits/regex_compiler.h   (revision 206399)
+++ include/bits/regex_compiler.h   (revision 206400)
@@ -129,43 +129,6 @@
   _StackT _M_stack;
 };
 
-  template
-struct __has_contiguous_iter : std::false_type { };
-
-  template
-struct __has_contiguous_iter>
-: std::true_type  // string storage is contiguous
-{ };
-
-  template
-struct __has_contiguous_iter>
-: std::true_type  // vector storage is contiguous
-{ };
-
-  template
-struct __has_contiguous_iter>
-: std::false_type // vector storage is not contiguous
-{ };
-
-  template
-struct __is_contiguous_normal_iter : std::false_type { };
-
-  template
-struct
-__is_contiguous_normal_iter<__gnu_cxx::__normal_iterator<_Tp, _Cont>>
-: __has_contiguous_iter<_Cont>::type
-{ };
-
-  template
-using __enable_if_contiguous_normal_iter
-  = typename enable_if< __is_contiguous_normal_iter<_Iter>::value,
-   std::shared_ptr<_NFA<_TraitsT>> >::type;
-
-  template
-using __disable_if_contiguous_normal_iter
-  = typename enable_if< !__is_contiguous_normal_iter<_Iter>::value,
-   std::shared_ptr<_NFA<_TraitsT>> >::type;
-
   template
 inline __disable_if_contiguous_normal_iter<_FwdIter, _TraitsT>
 __compile_nfa(_FwdIter __first, _FwdIter __last, const _TraitsT& __traits,
@@ -185,7 +148,7 @@
   return __compile_nfa(__cfirst, __cfirst + __len, __traits, __flags);
 }
 
-  template
+  template
 struct _AnyMatcher
 {
   typedef typename _TraitsT::char_type   _CharT;
@@ -197,25 +160,55 @@
 
   bool
   operator()(_CharT __ch) const
+  { return _M_apply(__ch, typename is_same<_CharT, char>::type()); }
+
+  bool
+  _M_apply(_CharT __ch, true_type) const
   {
-   return _M_traits.translate(__ch) != '\n'
- && _M_traits.translate(__ch) != '\r'
- && _M_traits.translate(__ch) != u'\u2028'
- && _M_traits.translate(__ch) != u'\u2029';
+   auto __c = _M_traits.translate(__ch);
+   if (__is_ecma)
+ {
+   static auto __n = _M_traits.translate('\n');
+   static auto __r = _M_traits.translate('\r');
+   return __c != __n && __c != __r;
+ }
+   else
+ {
+   static auto __nul = _M_traits.translate('\0');
+   return __c != __nul;
+ }
   }
 
+  bool
+  _M_apply(_CharT __ch, false_type) const
+  {
+   auto __c = _M_traits.translate(__ch);
+   if (__is_ecma)
+ {
+   static auto __n = _M_traits.translate('\n');
+   static auto __r = _M_traits.translate('\r');
+   static auto __u2028 = _M_traits.translate(u'\u2028');
+   static auto __u2029 = _M_traits.translate(u'\u2029');
+   return __c != __n && __c != __r && __c != __u2028
+ && __c != __u2029;
+ }
+   else

Re: [Patch] Regex bracket matcher cache optimization

2014-01-08 Thread Tim Shen
On Wed, Jan 8, 2014 at 5:20 AM, Paolo Carlini  wrote:
> On 01/08/2014 10:24 AM, Jonathan Wakely wrote:
>> Ouch! Yes, that's quite a bit slower, and this code is already very
>> slow to compile.

With this patch (who is based on a-fixed.diff, committed earlerly),
who use templated member functions instead of templating the whole
_Compiler, time consumption is:
g++ -g -Wall -std=c++11 -g -Wall -std=c++11 -O3 regextest.cc  3.79s
user 0.14s system 98% cpu 3.981 total

Comparing to 4.5s it's better and probably fine.

Booted and tested with -m32 and -m64 respectively.

> I only want to add that, besides keeping compile-time under control for
> 4.9.0 - please investigate a bit more along the mentioned lines - we should
> also start experimenting with exporting the instantiations. I don't know
> what the other implementations are doing, but in general it definitely makes
> sense, for compile-time performance too. I think we already said that some
> time ago, but the issue seems more important now. Maybe it's really
> unavoidable if we need template complexity for first class run-time
> performance.

After this patch I plan to instantiate _Compiler and _Executor.


-- 
Regards,
Tim Shen


Re: [Patch] Regex bracket matcher cache optimization

2014-01-08 Thread Tim Shen
On Wed, Jan 8, 2014 at 5:38 PM, Paolo Carlini  wrote:
> I agree, it's probably fine for now, but please actually attach the patch ;)

Oops sorry >.<


So my plan is to instantiate _Compiler and _Executor instead of user
interfaces like basic_regex or regex_match, because the implementation
may change (say add a new executor) later. Is that Ok?


-- 
Regards,
Tim Shen
commit d9f47e783680a1cab86bd704e67236025cbdff18
Author: tim 
Date:   Mon Jan 6 00:03:41 2014 -0500

    2014-01-08  Tim Shen  

* bits/regex_automaton.tcc: Indentation fix.
* bits/regex_compiler.h (__compile_nfa<>(), _Compiler<>,
_RegexTranslator<> _AnyMatcher<>, _CharMatcher<>,
_BracketMatcher<>): Add bool option template parameters and
specializations to make matching more efficient and space saving.
* bits/regex_compiler.tcc: Likewise.

diff --git a/libstdc++-v3/include/bits/regex_automaton.tcc 
b/libstdc++-v3/include/bits/regex_automaton.tcc
index 7edc67f..e222803 100644
--- a/libstdc++-v3/include/bits/regex_automaton.tcc
+++ b/libstdc++-v3/include/bits/regex_automaton.tcc
@@ -134,9 +134,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 _NFA<_TraitsT>::_M_dot(std::ostream& __ostr) const
 {
   __ostr << "digraph _Nfa {\n"
-   "  rankdir=LR;\n";
+   "  rankdir=LR;\n";
   for (size_t __i = 0; __i < this->size(); ++__i)
-(*this)[__i]._M_dot(__ostr, __i);
+   (*this)[__i]._M_dot(__ostr, __i);
   __ostr << "}\n";
   return __ostr;
 }
diff --git a/libstdc++-v3/include/bits/regex_compiler.h 
b/libstdc++-v3/include/bits/regex_compiler.h
index 4ac67df..b73fe30 100644
--- a/libstdc++-v3/include/bits/regex_compiler.h
+++ b/libstdc++-v3/include/bits/regex_compiler.h
@@ -39,7 +39,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
* @{
*/
 
-  template
+  template
 struct _BracketMatcher;
 
   /// Builds an NFA from an input iterator interval.
@@ -63,7 +63,6 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   typedef typename _ScannerT::_TokenT _TokenT;
   typedef _StateSeq<_TraitsT>_StateSeqT;
   typedef std::stack<_StateSeqT, std::vector<_StateSeqT>> _StackT;
-  typedef _BracketMatcher<_TraitsT>  
_BMatcherT;
   typedef std::ctype_CtypeT;
 
   // accepts a specific token or returns false.
@@ -91,20 +90,30 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   bool
   _M_bracket_expression();
 
-  void
-  _M_expression_term(_BMatcherT& __matcher);
+  template
+   void
+   _M_insert_any_matcher_ecma();
 
-  bool
-  _M_range_expression(_BMatcherT& __matcher);
+  template
+   void
+   _M_insert_any_matcher_posix();
 
-  bool
-  _M_collating_symbol(_BMatcherT& __matcher);
+  template
+   void
+   _M_insert_char_matcher();
 
-  bool
-  _M_equivalence_class(_BMatcherT& __matcher);
+  template
+   void
+   _M_insert_character_class_matcher();
 
-  bool
-  _M_character_class(_BMatcherT& __matcher);
+  template
+   void
+   _M_insert_bracket_matcher(bool __neg);
+
+  template
+   void
+   _M_expression_term(_BracketMatcher<_TraitsT, __icase, __collate>&
+  __matcher);
 
   int
   _M_cur_int_value(int __radix);
@@ -148,16 +157,110 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   return __compile_nfa(__cfirst, __cfirst + __len, __traits, __flags);
 }
 
-  template
-struct _AnyMatcher
+  // [28.13.14]
+  template
+class _RegexTranslator
 {
-  typedef typename _TraitsT::char_type   _CharT;
+public:
+  typedef typename _TraitsT::char_type   _CharT;
+  typedef typename _TraitsT::string_type _StringT;
+  typedef typename std::conditional<__collate,
+   _StringT,
+   _CharT>::type _StrTransT;
 
   explicit
-  _AnyMatcher(const _TraitsT& __traits)
+  _RegexTranslator(const _TraitsT& __traits)
   : _M_traits(__traits)
   { }
 
+  _CharT
+  _M_translate(_CharT __ch) const
+  {
+   if (__icase)
+ return _M_traits.translate_nocase(__ch);
+   else if (__collate)
+ return _M_traits.translate(__ch);
+   else
+ return __ch;
+  }
+
+  _StrTransT
+  _M_transform(_CharT __ch) const
+  {
+   return _M_transform_impl(__ch, typename integral_constant::type());
+  }
+
+private:
+  _StrTransT
+  _M_transform_impl(_CharT __ch, false_type) const
+  { return __ch; }
+
+  _StrTransT
+  _M_transform_impl(_CharT __ch, true_type) const
+  {
+   _StrTransT __str = _StrTransT(1, _M_translate(__ch));
+   return _M_traits.transform(__str

Re: [GSoC] Patches for shared_ptr array and polymorphic_allocator

2015-07-18 Thread Tim Shen
On Fri, Jul 17, 2015 at 7:16 PM, Fan You  wrote:
> Hi,
>
> According to 
> <http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2014/n4335.html#memory.smartptr>
>
> Here is my implementation of
>
> [8.2] Extend shared_ptr to support arrays

Please don't resend the shared_ptr patch, since it's already tracked
in another thread.

> [8.3] Type-Erased allocator

Please send a working patch with tests and (probably with Makefile.am changes).

Format: please replace leading consecutive spaces with tabs.

L70:
static std::atomic s_default_resource;

naming: _S_default_resource.

L43:
virtual ~memory_resource() { }

Please break the line after virtual/return type. This also applies for
other places in the patch.

L81:
  template 
class __constructor_helper_imp

This doesn't work correctly for at least following case:
std::__constructor_helper_imp(std::allocator_arg,
std::allocator(),
std::true_type(),
std::true_type(),
std::true_type());

Based on [allocator.uses.construction], this falls into the second
rule, because there exists priorities in those rules; but overloading
resolution thinks it's ambiguous.

To enforce the order, you can do this:

template
struct __uses_allocator_construction_helper;

...
{
constexpr bool __uses_alloc = uses_allocator<...>::value;
constexpr bool __normally_constructible = is_constructible<_Tp,
_Args...>::value;
constexpr bool __constructible_alloc_before =
is_constructible<_Tp, allocator_tag_t, _Alloc, _Args...>::value;
constexpr bool __constructible_alloc_after =
is_constructible<_Args..., _Alloc>::value;

constexpr bool __uses_rule1 = !__uses_alloc && __normally_constructible;
constexpr bool __uses_rule2 = __uses_alloc && __constructible_alloc_before;
constexpr bool __uses_rule3 = __uses_alloc && __constructible_alloc_after;
__uses_allocator_construction_helper<__uses_rule1 ? 1 :
(__uses_rule2 ? 2 : (__uses_rule3 ? 3 : 0))>::_S_apply(...);
}

Consider use a more readable helper name, like
__uses_allocator_construction_helper and document it.

L73:
  bool operator==(const memory_resource& __a,
  const memory_resource& __b) noexcept
  { return &__a == &__b || __a.is_equal(__b); }

Make all non-template functions inlined.

L178:
  { } // used here

What does this mean?

L180:
  polymorphic_allocator(memory_resource* __r)
  : _M_resource(__r ? __r : get_default_resource())
  { }

[8.6.2.3] describes __r != nullptr as a precondition, which is
guaranteed by the caller, so we don't have to check here.
Alternatively you can use _GLIBCXX_ASSERT.

L262:
  memory_resource* _M_resource;

private member variables should be defined after private member functions.

L286:
  template 
bool operator!=(const polymorphic_allocator<_Tp1>& __a,
const polymorphic_allocator<_Tp2>& __b) noexcept
{ return ! (__a == __b); }

Remove extra space after "!".

L340:
auto __p = dynamic_cast(&__other);

What if the user turns off RTTI?

L345:
  // Calculate Aligned Size
  size_t _Aligned_size(size_t __size, size_t __alignment)
  { return ((__size - 1)|(__alignment - 1)) + 1; }

  bool _M_supported (size_t __x)
  { return ((__x != 0) && (__x != 0) && !(__x & (__x - 1))); }

Document these two functions' behaviors, e.g.:
Returns a size that is larger than or equal to __size and divided by
__alignment, where __alignment is required to be the power of 2.

L355:
  // Global memory resources
  atomic memory_resource::s_default_resource;
L386:
  // The default memory resource
  memory_resource* get_default_resource() noexcept
  {
memory_resource *__ret
  = memory_resource::s_default_resource.load();

if (__ret == nullptr) { __ret = new_delete_resource(); }
return __ret;
  }

According to [8.8.5], memory resource pointer should be intialized
with new_delete_resource(), not nullptr; and get_default_resource
should only return the pointer.

L396:
  memory_resource* set_default_resource(memory_resource* __r) noexcept
  {
if ( __r == nullptr)
{ __r = new_delete_resource(); }

memory_resource* __prev = get_default_resource();
memory_resource::s_default_resource.store(__r);
return __prev;
  }

We shouldn't care if it's nullptr or not.
Your get-then-set may cause a data race. I think
std::atomic<>::exchange will work, but we should confirm with Jon.


-- 
Regards,
Tim Shen


Re: [Patch] Small refactor on _State<>

2015-07-25 Thread Tim Shen
On Sat, Jul 25, 2015 at 12:11 AM, Tim Shen  wrote:
> It's not a very necessary refactoring, but simply can't resist. :)
>
> I'm not sure of the ::memcpy calls. It looks not very idiomatic, but
> std::copy on char* looks even more weird? :/
>
> Bootstrapped and tested.
>
> Thanks!
>
>
> --
> Regards,
> Tim Shen



-- 
Regards,
Tim Shen


[libstdc++/67015, patch] Fix regex POSIX bracket parsing

2015-07-26 Thread Tim Shen
Kinda important, since "[a-z0-9-]" may be a common case.

Bootstrapped and tested.

Guess it can also be backported to 5, or even 4.9?

Thanks!


-- 
Regards,
Tim Shen


Re: [libstdc++/67015, patch] Fix regex POSIX bracket parsing

2015-07-26 Thread Tim Shen
On Sun, Jul 26, 2015 at 5:19 AM, Tim Shen  wrote:
> Kinda important, since "[a-z0-9-]" may be a common case.
>
> Bootstrapped and tested.

Actual patch...

-- 
Regards,
Tim Shen
commit e0e6c2e3b722e1453d29ad3a56d0de80046453b0
Author: Tim Shen 
Date:   Sun Jul 26 04:37:45 2015 -0700

PR libstdc++/67015

* include/bits/regex_compiler.h (_Compiler<>::_M_expression_term,
_BracketMatcher<>::_M_add_collating_element): Change signature
to make checking the and of bracket expression easier.
* include/bits/regex_compiler.tcc (_Compiler<>::_M_expression_term):
Treat '-' as a valid literal if it's at the end of bracket expression.
* testsuite/28_regex/algorithms/regex_match/cstring_bracket_01.cc:
New testcases.

diff --git a/libstdc++-v3/include/bits/regex_compiler.h 
b/libstdc++-v3/include/bits/regex_compiler.h
index 4472116..6d9799e 100644
--- a/libstdc++-v3/include/bits/regex_compiler.h
+++ b/libstdc++-v3/include/bits/regex_compiler.h
@@ -116,8 +116,10 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
void
_M_insert_bracket_matcher(bool __neg);
 
+  // Returns true if successfully matched one term and should continue.
+  // Returns false if the compiler should move on.
   template
-   void
+   bool
_M_expression_term(pair& __last_char,
   _BracketMatcher<_TraitsT, __icase, __collate>&
   __matcher);
@@ -389,7 +391,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 #endif
   }
 
-  void
+  _StringT
   _M_add_collating_element(const _StringT& __s)
   {
auto __st = _M_traits.lookup_collatename(__s.data(),
@@ -400,6 +402,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 #ifdef _GLIBCXX_DEBUG
_M_is_ready = false;
 #endif
+   return __st;
   }
 
   void
diff --git a/libstdc++-v3/include/bits/regex_compiler.tcc 
b/libstdc++-v3/include/bits/regex_compiler.tcc
index 33d7118..f48e3c1 100644
--- a/libstdc++-v3/include/bits/regex_compiler.tcc
+++ b/libstdc++-v3/include/bits/regex_compiler.tcc
@@ -424,8 +424,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
__last_char.first = true;
__last_char.second = _M_value[0];
  }
-  while (!_M_match_token(_ScannerT::_S_token_bracket_end))
-   _M_expression_term(__last_char, __matcher);
+  while (_M_expression_term(__last_char, __matcher));
   __matcher._M_ready();
   _M_stack.push(_StateSeqT(
  *_M_nfa,
@@ -434,21 +433,31 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
   template
   template
-void
+bool
 _Compiler<_TraitsT>::
 _M_expression_term(pair& __last_char,
   _BracketMatcher<_TraitsT, __icase, __collate>& __matcher)
 {
+  if (_M_match_token(_ScannerT::_S_token_bracket_end))
+   return false;
+
   if (_M_match_token(_ScannerT::_S_token_collsymbol))
-   __matcher._M_add_collating_element(_M_value);
+   {
+ auto __symbol = __matcher._M_add_collating_element(_M_value);
+ if (__symbol.size() == 1)
+   {
+ __last_char.first = true;
+ __last_char.second = __symbol[0];
+   }
+   }
   else if (_M_match_token(_ScannerT::_S_token_equiv_class_name))
__matcher._M_add_equivalence_class(_M_value);
   else if (_M_match_token(_ScannerT::_S_token_char_class_name))
__matcher._M_add_character_class(_M_value, false);
-  // POSIX doesn't permit '-' as a start-range char (say [a-z--0]),
-  // except when the '-' is the first character in the bracket expression
-  // ([--0]). ECMAScript treats all '-' after a range as a normal 
character.
-  // Also see above, where _M_expression_term gets called.
+  // POSIX doesn't allow '-' as a start-range char (say [a-z--0]),
+  // except when the '-' is the first or last character in the bracket
+  // expression ([--0]). ECMAScript treats all '-' after a range as a
+  // normal character. Also see above, where _M_expression_term gets 
called.
   //
   // As a result, POSIX rejects [-], but ECMAScript doesn't.
   // Boost (1.57.0) always uses POSIX style even in its ECMAScript syntax.
@@ -459,10 +468,14 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
{
  if (!__last_char.first)
{
+ __matcher._M_add_char(_M_value[0]);
  if (_M_value[0] == '-'
  && !(_M_flags & regex_constants::ECMAScript))
-   __throw_regex_error(regex_constants::error_range);
- __matcher._M_add_char(_M_value[0]);
+   {
+ if (_M_match_token(_ScannerT::_S_token_bracket_end))
+   return false;
+ __throw_regex_error(regex_constants:

Re: [libstdc++/67015, patch] Fix regex POSIX bracket parsing

2015-07-27 Thread Tim Shen
On Mon, Jul 27, 2015 at 4:45 AM, Jonathan Wakely  wrote:
> On 26/07/15 05:20 -0700, Tim Shen wrote:
>>
>> @@ -389,7 +391,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
>> #endif
>>   }
>>
>> -  void
>> +  _StringT
>>   _M_add_collating_element(const _StringT& __s)
>>   {
>> auto __st = _M_traits.lookup_collatename(__s.data(),
>
>
> Changing this return type is an ABI change. When compiled with older
> versions of GCC this function will not return anything, so code
> compiled with newer versions that links to that older code will read
> garbage off the stack if linked to the old version.
>
> (This isn't a problem for _M_expression_term because the return type
> is part of the mangled name for function templates).
>
> One solution would be to rename the function, so that code compiled
> with the new GCC will use the new function, not link to the old
> version that doesn't return anything.

Done by s/_M_add_collating_element/_M_add_collate_element/.


-- 
Regards,
Tim Shen
commit 4a3b4ab285b0ecabb706a776551dabf5a37059aa
Author: Tim Shen 
Date:   Sun Jul 26 04:37:45 2015 -0700

PR libstdc++/67015

* include/bits/regex_compiler.h (_Compiler<>::_M_expression_term,
_BracketMatcher<>::_M_add_collating_element): Change signature
to make checking the and of bracket expression easier.
* include/bits/regex_compiler.tcc (_Compiler<>::_M_expression_term):
Treat '-' as a valid literal if it's at the end of bracket expression.
* testsuite/28_regex/algorithms/regex_match/cstring_bracket_01.cc:
New testcases.

diff --git a/libstdc++-v3/include/bits/regex_compiler.h 
b/libstdc++-v3/include/bits/regex_compiler.h
index 4472116..0cb0c04 100644
--- a/libstdc++-v3/include/bits/regex_compiler.h
+++ b/libstdc++-v3/include/bits/regex_compiler.h
@@ -116,8 +116,10 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
void
_M_insert_bracket_matcher(bool __neg);
 
+  // Returns true if successfully matched one term and should continue.
+  // Returns false if the compiler should move on.
   template
-   void
+   bool
_M_expression_term(pair& __last_char,
   _BracketMatcher<_TraitsT, __icase, __collate>&
   __matcher);
@@ -389,8 +391,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 #endif
   }
 
-  void
-  _M_add_collating_element(const _StringT& __s)
+  _StringT
+  _M_add_collate_element(const _StringT& __s)
   {
auto __st = _M_traits.lookup_collatename(__s.data(),
 __s.data() + __s.size());
@@ -400,6 +402,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 #ifdef _GLIBCXX_DEBUG
_M_is_ready = false;
 #endif
+   return __st;
   }
 
   void
diff --git a/libstdc++-v3/include/bits/regex_compiler.tcc 
b/libstdc++-v3/include/bits/regex_compiler.tcc
index 33d7118..9a62311 100644
--- a/libstdc++-v3/include/bits/regex_compiler.tcc
+++ b/libstdc++-v3/include/bits/regex_compiler.tcc
@@ -424,8 +424,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
__last_char.first = true;
__last_char.second = _M_value[0];
  }
-  while (!_M_match_token(_ScannerT::_S_token_bracket_end))
-   _M_expression_term(__last_char, __matcher);
+  while (_M_expression_term(__last_char, __matcher));
   __matcher._M_ready();
   _M_stack.push(_StateSeqT(
  *_M_nfa,
@@ -434,21 +433,31 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
   template
   template
-void
+bool
 _Compiler<_TraitsT>::
 _M_expression_term(pair& __last_char,
   _BracketMatcher<_TraitsT, __icase, __collate>& __matcher)
 {
+  if (_M_match_token(_ScannerT::_S_token_bracket_end))
+   return false;
+
   if (_M_match_token(_ScannerT::_S_token_collsymbol))
-   __matcher._M_add_collating_element(_M_value);
+   {
+ auto __symbol = __matcher._M_add_collate_element(_M_value);
+ if (__symbol.size() == 1)
+   {
+ __last_char.first = true;
+ __last_char.second = __symbol[0];
+   }
+   }
   else if (_M_match_token(_ScannerT::_S_token_equiv_class_name))
__matcher._M_add_equivalence_class(_M_value);
   else if (_M_match_token(_ScannerT::_S_token_char_class_name))
__matcher._M_add_character_class(_M_value, false);
-  // POSIX doesn't permit '-' as a start-range char (say [a-z--0]),
-  // except when the '-' is the first character in the bracket expression
-  // ([--0]). ECMAScript treats all '-' after a range as a normal 
character.
-  // Also see above, where _M_expression_term gets called.
+  // POSIX doesn't allow '-' as a start-range char

Re: [libstdc++/67015, patch] Fix regex POSIX bracket parsing

2015-07-28 Thread Tim Shen
On Tue, Jul 28, 2015 at 3:27 AM, Jonathan Wakely  wrote:
> On 27/07/15 19:40 -0700, Tim Shen wrote:
>>
>> Done by s/_M_add_collating_element/_M_add_collate_element/.
>
>
> Great, thanks. OK for trunk and gcc-5-branch.

Committed. Is there no need for gcc-4_9-branch? What's the general
policy for backporting fixes?

Thanks!


-- 
Regards,
Tim Shen


Re: [Patch] Small refactor on _State<>

2015-07-29 Thread Tim Shen
On Wed, Jul 29, 2015 at 1:32 AM, Jonathan Wakely  wrote:
> Apologies, you have a user-declared move constructor, so assignment is
> already deleted. It wouldn't hurt to make that explicit though:

I'm just bad at memorizing when they are implicitly
declared/defined/deleted and fully unware of the deleted copy
assignment :)

It's better to make it explicit, since I don't want to be clever on
pretending knowing all ctor/assigment rules :)

>  _State& operator=(const _State&) = delete;
>
> So it's just the alignment issue that I'm concerned about now.
>

Will alignof(std::function) be all the same across
instantiations, including user-defined C, now and in the future?

I created template parameter _Matcher_size (which should be
__matcher_size) because I don't want to make this assumption on
sizeof(). If they (both alignment and size) are expected to be the
same, we can remove that template parameter (and all indentation
changes!); otherwise, is alignment more unlikely to change than size
the reason we always stick on alignof(std::function)?


-- 
Regards,
Tim Shen


Re: [libstdc++/67015, patch] Fix regex POSIX bracket parsing

2015-07-29 Thread Tim Shen
On Wed, Jul 29, 2015 at 1:35 AM, Jonathan Wakely  wrote:
> The 4.9 branch is the oldest one we still support, so should be very
> stable now, although C++11 support in 4.9 is still labelled
> experimental, and this only changes C++11 code. How important is it to
> fix?

Um... I'd say medium importance, for [a-z0-9-] not working in POSIX
syntax (while ECMAScript seems to be more popular, which works
correctly).

If 4.9 is `stable` and expected to be experimetal, which by my
definition should only include important changes, we may leave it as
is?


-- 
Regards,
Tim Shen


Re: [Patch] Small refactor on _State<>

2015-07-29 Thread Tim Shen
On Wed, Jul 29, 2015 at 2:15 AM, Jonathan Wakely  wrote:
> Yes, that makes sense. See the code in  for how
> to set the alignment of the buffer appropriately. You can use the size
> and alignment of std::function even though it will
> sometimes be a different std::function specialization.

Done.

Also change _Executor::_M_lookahead(_State<>) to
_Executor::_M_lookahead(_StateIdT __next_state).


-- 
Regards,
Tim Shen
commit 52c70e70bdbef15c787f81e83722bfc119543ff0
Author: Tim Shen 
Date:   Wed Jul 29 21:08:43 2015 -0700

* include/bits/regex_automaton.h (_State_base, _State<>):
Remove _TraitsT dependency from _State<>; Make matcher member
into the union to reduce struct size.
* include/bits/regex_automaton.tcc (_State_base<>::_M_print,
_State_base<>::_M_dot, _StateSeq<>::_M_clone):
Adjust to fit the interface. Factor out common parts in
_M_clone as _State<>::_M_has_alt.
* include/bits/regex_executor.h (_Executer<>::_M_lookahead):
Only pass state id instead of the whole state.
* include/bits/regex_executor.tcc (_Executer<>::_M_dfs,
_Executer<>::_M_lookahead): Adjust to fit the interface.
* include/std/regex: Include 

diff --git a/libstdc++-v3/include/bits/regex_automaton.h 
b/libstdc++-v3/include/bits/regex_automaton.h
index fc0eb41..0ef3896 100644
--- a/libstdc++-v3/include/bits/regex_automaton.h
+++ b/libstdc++-v3/include/bits/regex_automaton.h
@@ -72,7 +72,10 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
   struct _State_base
   {
+  protected:
 _Opcode  _M_opcode;   // type of outgoing transition
+
+  public:
 _StateIdT_M_next; // outgoing transition
 union // Since they are mutually exclusive.
 {
@@ -87,16 +90,24 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
// quantifiers (ungreedy if set true)
bool   _M_neg;
   };
+  // For _S_opcode_match
+  __gnu_cxx::__aligned_membuf<_Matcher> _M_matcher_storage;
 };
 
+  protected:
 explicit _State_base(_Opcode __opcode)
 : _M_opcode(__opcode), _M_next(_S_invalid_state_id)
 { }
 
-  protected:
-~_State_base() = default;
-
   public:
+bool
+_M_has_alt()
+{
+  return _M_opcode == _S_opcode_alternative
+   || _M_opcode == _S_opcode_repeat
+   || _M_opcode == _S_opcode_subexpr_lookahead;
+}
+
 #ifdef _GLIBCXX_DEBUG
 std::ostream&
 _M_print(std::ostream& ostr) const;
@@ -107,14 +118,64 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 #endif
   };
 
-  template
+  template
 struct _State : _State_base
 {
-  typedef _Matcher _MatcherT;
+  typedef _Matcher<_Char_type> _MatcherT;
+  static_assert(sizeof(_MatcherT) == sizeof(_Matcher),
+   "The aussmption std::function has "
+   "the same size as std::function is violated");
+  static_assert(alignof(_MatcherT) == alignof(_Matcher),
+   "The aussmption std::function has "
+   "the same alignment as std::function is violated");
+
+  explicit
+  _State(_Opcode __opcode) : _State_base(__opcode)
+  {
+   if (_M_opcode() == _S_opcode_match)
+ new (this->_M_matcher_storage._M_addr()) _MatcherT();
+  }
+
+  _State(const _State& __rhs) : _State_base(__rhs)
+  {
+   if (__rhs._M_opcode() == _S_opcode_match)
+ new (this->_M_matcher_storage._M_addr())
+   _MatcherT(__rhs._M_get_matcher());
+  }
+
+  _State(_State&& __rhs) : _State_base(__rhs)
+  {
+   if (__rhs._M_opcode() == _S_opcode_match)
+ new (this->_M_matcher_storage._M_addr())
+   _MatcherT(std::move(__rhs._M_get_matcher()));
+  }
+
+  _State&
+  operator=(const _State&) = delete;
+
+  ~_State()
+  {
+   if (_M_opcode() == _S_opcode_match)
+ _M_get_matcher().~_MatcherT();
+  }
+
+  // Since correct ctor and dtor rely on _M_opcode, it's better not to
+  // change it over time.
+  _Opcode
+  _M_opcode() const
+  { return _State_base::_M_opcode; }
+
+  bool
+  _M_matches(_Char_type __char) const
+  { return _M_get_matcher()(__char); }
 
-  _MatcherT  _M_matches;// for _S_opcode_match
+  _MatcherT&
+  _M_get_matcher()
+  { return 
*reinterpret_cast<_MatcherT*>(this->_M_matcher_storage._M_addr()); }
 
-  explicit _State(_Opcode __opcode) : _State_base(__opcode) { }
+  const _MatcherT&
+  _M_get_matcher() const
+  { return *reinterpret_cast(this->_M_matcher_storage._M_addr()); }
 };
 
   struct _NFA_base
@@ -155,10 +216,11 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
   template
 struct _NFA
-: _NFA_base, std::vector<_State<_TraitsT>>
+: _NFA_base, std::vector<_State>
 {
-  typede

Re: [Patch] Small refactor on _State<>

2015-07-29 Thread Tim Shen
On Wed, Jul 29, 2015 at 9:21 PM, Tim Shen  wrote:
> On Wed, Jul 29, 2015 at 2:15 AM, Jonathan Wakely  wrote:
>> Yes, that makes sense. See the code in  for how
>> to set the alignment of the buffer appropriately. You can use the size
>> and alignment of std::function even though it will
>> sometimes be a different std::function specialization.
>
> Done.

Oops, fixed one typo: s/_Matcher/_Matcher/.


-- 
Regards,
Tim Shen
commit d9fb4e3ec5eb9fcaf08f757c2a9ddcf57289684f
Author: Tim Shen 
Date:   Wed Jul 29 21:08:43 2015 -0700

* include/bits/regex_automaton.h (_State_base, _State<>):
Remove _TraitsT dependency from _State<>; Make matcher member
into the union to reduce struct size.
* include/bits/regex_automaton.tcc (_State_base<>::_M_print,
_State_base<>::_M_dot, _StateSeq<>::_M_clone):
Adjust to fit the interface. Factor out common parts in
_M_clone as _State<>::_M_has_alt.
* include/bits/regex_executor.h (_Executer<>::_M_lookahead):
Only pass state id instead of the whole state.
* include/bits/regex_executor.tcc (_Executer<>::_M_dfs,
_Executer<>::_M_lookahead): Adjust to fit the interface.
* include/std/regex: Include 

diff --git a/libstdc++-v3/include/bits/regex_automaton.h 
b/libstdc++-v3/include/bits/regex_automaton.h
index fc0eb41..e153d42 100644
--- a/libstdc++-v3/include/bits/regex_automaton.h
+++ b/libstdc++-v3/include/bits/regex_automaton.h
@@ -72,7 +72,10 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
   struct _State_base
   {
+  protected:
 _Opcode  _M_opcode;   // type of outgoing transition
+
+  public:
 _StateIdT_M_next; // outgoing transition
 union // Since they are mutually exclusive.
 {
@@ -87,16 +90,24 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
// quantifiers (ungreedy if set true)
bool   _M_neg;
   };
+  // For _S_opcode_match
+  __gnu_cxx::__aligned_membuf<_Matcher> _M_matcher_storage;
 };
 
+  protected:
 explicit _State_base(_Opcode __opcode)
 : _M_opcode(__opcode), _M_next(_S_invalid_state_id)
 { }
 
-  protected:
-~_State_base() = default;
-
   public:
+bool
+_M_has_alt()
+{
+  return _M_opcode == _S_opcode_alternative
+   || _M_opcode == _S_opcode_repeat
+   || _M_opcode == _S_opcode_subexpr_lookahead;
+}
+
 #ifdef _GLIBCXX_DEBUG
 std::ostream&
 _M_print(std::ostream& ostr) const;
@@ -107,14 +118,64 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 #endif
   };
 
-  template
+  template
 struct _State : _State_base
 {
-  typedef _Matcher _MatcherT;
+  typedef _Matcher<_Char_type> _MatcherT;
+  static_assert(sizeof(_MatcherT) == sizeof(_Matcher),
+   "The aussmption std::function has "
+   "the same size as std::function is violated");
+  static_assert(alignof(_MatcherT) == alignof(_Matcher),
+   "The aussmption std::function has "
+   "the same alignment as std::function is violated");
+
+  explicit
+  _State(_Opcode __opcode) : _State_base(__opcode)
+  {
+   if (_M_opcode() == _S_opcode_match)
+ new (this->_M_matcher_storage._M_addr()) _MatcherT();
+  }
+
+  _State(const _State& __rhs) : _State_base(__rhs)
+  {
+   if (__rhs._M_opcode() == _S_opcode_match)
+ new (this->_M_matcher_storage._M_addr())
+   _MatcherT(__rhs._M_get_matcher());
+  }
+
+  _State(_State&& __rhs) : _State_base(__rhs)
+  {
+   if (__rhs._M_opcode() == _S_opcode_match)
+ new (this->_M_matcher_storage._M_addr())
+   _MatcherT(std::move(__rhs._M_get_matcher()));
+  }
+
+  _State&
+  operator=(const _State&) = delete;
+
+  ~_State()
+  {
+   if (_M_opcode() == _S_opcode_match)
+ _M_get_matcher().~_MatcherT();
+  }
+
+  // Since correct ctor and dtor rely on _M_opcode, it's better not to
+  // change it over time.
+  _Opcode
+  _M_opcode() const
+  { return _State_base::_M_opcode; }
+
+  bool
+  _M_matches(_Char_type __char) const
+  { return _M_get_matcher()(__char); }
 
-  _MatcherT  _M_matches;// for _S_opcode_match
+  _MatcherT&
+  _M_get_matcher()
+  { return 
*reinterpret_cast<_MatcherT*>(this->_M_matcher_storage._M_addr()); }
 
-  explicit _State(_Opcode __opcode) : _State_base(__opcode) { }
+  const _MatcherT&
+  _M_get_matcher() const
+  { return *reinterpret_cast(this->_M_matcher_storage._M_addr()); }
 };
 
   struct _NFA_base
@@ -155,10 +216,11 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
   template
 struct _NFA
-: _NFA_base, std::vector<_State<_TraitsT>>
+: _NFA_base, std::vector<_State>
 

[v3 patch] refactoring - pull out common data members as _Context

2015-08-06 Thread Tim Shen
In next few weeks I'm gonna sending patches for refactoring .

The goal is to make the gigantic _Executor class into several smaller
ones and hopefully more readable. After that, there are several
correctness/performance issues to be fixed. We may also need more
documentation, but self-documenting code is even more important.

I've almost finished my drafting (with a lot of off-history back and
forth, but stablized finally):
https://github.com/innocentim/gcc/commits/executer, but I don't want
to commit this branch, since it's more like a mind experiment (and
trust me, you won't follow the commit history of this branch ;).

This patch doesn't change any behavior. I've tried my best not to
sacrifies performance for this _Context abstraction (as all C++
programmers do :), so far so good.

I let _Executor privately inherits from _Context; but alternatively we
can just make it a member.

Bootstrapped and tested.

Take your time. This isn't urgent. My plan is to finish them by the
next big release.

Thanks!


-- 
Regards,
Tim Shen
commit 007f04c627b0e0ea83b4e2e818354254f4ce649a
Author: Tim Shen 
Date:   Thu Aug 6 00:37:21 2015 -0700

* include/bits/regex.tcc: Adjust caller side for interface change.
* include/bits/regex_executor.h: Created __regex namespace.
Created _Context for common data members for all executers.
Do not pass around __match_mode, but make it a _Context member
(_M_search_mode). Deleted unnecessary member _State_info::_M_start.
* include/bits/regex_executor.tcc: Adjust for interface/member change.
* include/std/regex: Reorder included headers.

diff --git a/libstdc++-v3/include/bits/regex.tcc 
b/libstdc++-v3/include/bits/regex.tcc
index fedc2b9..361ad32 100644
--- a/libstdc++-v3/include/bits/regex.tcc
+++ b/libstdc++-v3/include/bits/regex.tcc
@@ -61,25 +61,21 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
__it.matched = false;
 
   bool __ret;
+  constexpr auto __mode = __match_mode
+   ? __regex::_Search_mode::_Match : __regex::_Search_mode::_Search;
   if ((__re.flags() & regex_constants::__polynomial)
  || (__policy == _RegexExecutorPolicy::_S_alternate
  && !__re._M_automaton->_M_has_backref))
{
- _Executor<_BiIter, _Alloc, _TraitsT, false>
-   __executor(__s, __e, __m, __re, __flags);
- if (__match_mode)
-   __ret = __executor._M_match();
- else
-   __ret = __executor._M_search();
+ __regex::_Executor<_BiIter, _Alloc, _TraitsT, false> __executor(
+ __s, __e, *__re._M_automaton, __flags, __mode, __res);
+ __ret = __executor.template _M_match<__mode>();
}
   else
{
- _Executor<_BiIter, _Alloc, _TraitsT, true>
-   __executor(__s, __e, __m, __re, __flags);
- if (__match_mode)
-   __ret = __executor._M_match();
- else
-   __ret = __executor._M_search();
+ __regex::_Executor<_BiIter, _Alloc, _TraitsT, true> __executor(
+ __s, __e, *__re._M_automaton, __flags, __mode, __res);
+ __ret = __executor.template _M_match<__mode>();
}
   if (__ret)
{
diff --git a/libstdc++-v3/include/bits/regex_executor.h 
b/libstdc++-v3/include/bits/regex_executor.h
index f3f8876..5310bb8 100644
--- a/libstdc++-v3/include/bits/regex_executor.h
+++ b/libstdc++-v3/include/bits/regex_executor.h
@@ -32,15 +32,71 @@
 
 namespace std _GLIBCXX_VISIBILITY(default)
 {
+_GLIBCXX_BEGIN_NAMESPACE_VERSION
+_GLIBCXX_BEGIN_NAMESPACE_CXX11
+
+  template
+class sub_match;
+
+_GLIBCXX_END_NAMESPACE_CXX11
+_GLIBCXX_END_NAMESPACE_VERSION
 namespace __detail
 {
+namespace __regex
+{
 _GLIBCXX_BEGIN_NAMESPACE_VERSION
-
   /**
* @addtogroup regex-detail
* @{
*/
 
+  enum class _Search_mode : unsigned char { _Match, _Search };
+
+  template
+struct _Context
+{
+  using _Char_type = typename iterator_traits<_Bi_iter>::value_type;
+
+  _Context(_Bi_iter __begin, _Bi_iter __end, const _NFA<_Traits>& __nfa,
+  regex_constants::match_flag_type __flags,
+  _Search_mode __search_mode)
+  : _M_begin(__begin), _M_end(__end), _M_match_flags(__flags),
+  _M_nfa(__nfa), _M_search_mode(__search_mode) { }
+
+  bool
+  _M_is_word(_Char_type __ch) const
+  {
+   static const _Char_type __s[2] = { 'w' };
+   return _M_nfa._M_traits.isctype
+ (__ch, _M_nfa._M_traits.lookup_classname(__s, __s+1));
+  }
+
+  bool
+  _M_at_begin() const
+  {
+   return _M_current == _M_begin
+ && !(_M_match_flags & (regex_constants::match_not_bol
+| regex_constants::match_prev_avail));
+  }
+
+  bool
+  _M_at_end() const
+  {
+   return _M_current == _M_end
+

Re: [Patch, libstdc++] Add specific error message into exceptions

2015-09-19 Thread Tim Shen
On Wed, Sep 16, 2015 at 10:38 AM, Jonathan Wakely  wrote:
> On 12/09/15 01:57 +0000, Tim Shen wrote:
>>
>> Ok then, let's not appending dynamic location string, but only throw a
>> string literal pointer.
>
>
> This looks great, and a *huge* improvement on the current errors even
> without more precise location info.

I'm glad to hear this :).

> OK for trunk, thanks very much for doing this.

Tested & Committed as r227936.


-- 
Regards,
Tim Shen


Re: [v3 patch] refactoring - pull out common data members as _Context

2015-09-21 Thread Tim Shen
Hi,

As the changes grow
(https://github.com/innocentim/gcc/commits/master), it's getting
harder to rebase them onto svn trunk. Can we start slowly reviewing
and checking these in? Should I post them one by one to the lis?

These patches typically break one giant piece of code (mainly
_Executor::_M_dfs) into different smaller ones and fix mostly known
issues and standard conformance issues.


-- 
Regards,
Tim Shen


Re: [v3 patch] Fix spelling in regex headers

2013-11-06 Thread Tim Shen
On Wed, Nov 6, 2013 at 4:47 AM, Jonathan Wakely  wrote:
> This simply fixes s/boundry/boundary/, which I wanted to do first
> before some more regex refactoring I'm planning.

Thanks a lot Jonathan!


-- 
Regards,
Tim Shen


[Patch, libstdc++/63497] Avoid dereferencing invalid iterator in regex_executor

2014-10-20 Thread Tim Shen
Bootstrapped and tested.

Thanks!


-- 
Regards,
Tim Shen
commit 95c73ab6280c1f8182d018ee29a44230965dd4ef
Author: timshen 
Date:   Sun Oct 19 15:14:55 2014 -0700

PR libstdc++/63497
include/bits/regex_executor.h (_Executor::_M_word_boundary): Remove
const qualifier.
include/bits/regex_executor.tcc (_Executor::_M_dfs,
_Executor::_M_word_boundary): Avoid dereferecing _M_current at _M_end
or other invalid position.

diff --git a/libstdc++-v3/include/bits/regex_executor.h 
b/libstdc++-v3/include/bits/regex_executor.h
index cd9e55d..b867951 100644
--- a/libstdc++-v3/include/bits/regex_executor.h
+++ b/libstdc++-v3/include/bits/regex_executor.h
@@ -145,7 +145,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   }
 
   bool
-  _M_word_boundary(_State<_TraitsT> __state) const;
+  _M_word_boundary(_State<_TraitsT> __state);
 
   bool
   _M_lookahead(_State<_TraitsT> __state);
diff --git a/libstdc++-v3/include/bits/regex_executor.tcc 
b/libstdc++-v3/include/bits/regex_executor.tcc
index 5eab852..9655c7a 100644
--- a/libstdc++-v3/include/bits/regex_executor.tcc
+++ b/libstdc++-v3/include/bits/regex_executor.tcc
@@ -284,9 +284,11 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
_M_dfs(__match_mode, __state._M_next);
  break;
case _S_opcode_match:
+ if (_M_current == _M_end)
+   break;
  if (__dfs_mode)
{
- if (_M_current != _M_end && __state._M_matches(*_M_current))
+ if (__state._M_matches(*_M_current))
{
  ++_M_current;
  _M_dfs(__match_mode, __state._M_next);
@@ -407,25 +409,28 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   template
 bool _Executor<_BiIter, _Alloc, _TraitsT, __dfs_mode>::
-_M_word_boundary(_State<_TraitsT>) const
+_M_word_boundary(_State<_TraitsT>)
 {
-  // By definition.
-  bool __ans = false;
-  auto __pre = _M_current;
-  --__pre;
-  if (!(_M_at_begin() && _M_at_end()))
+  bool __left_is_word = false;
+  if (_M_current != _M_begin
+ || (_M_flags & regex_constants::match_prev_avail))
{
- if (_M_at_begin())
-   __ans = _M_is_word(*_M_current)
- && !(_M_flags & regex_constants::match_not_bow);
- else if (_M_at_end())
-   __ans = _M_is_word(*__pre)
- && !(_M_flags & regex_constants::match_not_eow);
- else
-   __ans = _M_is_word(*_M_current)
- != _M_is_word(*__pre);
+ --_M_current;
+ if (_M_is_word(*_M_current))
+   __left_is_word = true;
+ ++_M_current;
}
-  return __ans;
+  bool __right_is_word = false;
+  if (_M_current != _M_end && _M_is_word(*_M_current))
+   __right_is_word = true;
+
+  if (__left_is_word == __right_is_word)
+   return false;
+  if (__left_is_word && !(_M_flags & regex_constants::match_not_eow))
+   return true;
+  if (__right_is_word && !(_M_flags & regex_constants::match_not_bow))
+   return true;
+  return false;
 }
 
 _GLIBCXX_END_NAMESPACE_VERSION


  1   2   3   4   >