On Sun, Jan 19, 2014 at 4:59 PM, Tim Shen <[email protected]> wrote:
> Tested and committed.
It's quite interesting that after this change in the patch:
- this->_M_quantifier();
+ while (this->_M_quantifier());
...even when the regex input has nothing to do with quantifiers at all
(say regex re(" ")), g++ -O3 generates slower code than -O2:
~ # g++ -O2 perf.cc && time ./a.out
./a.out 0.46s user 0.00s system 99% cpu 0.461 total
~ # g++ -O3 perf.cc && time ./a.out
./a.out 0.56s user 0.00s system 99% cpu 0.569 total
perf.cc is almost the same as testsuite/performance/28_regex/split.cc.
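For readers without the testsuite at hand, a benchmark of this kind might look roughly like the sketch below. This is not the actual perf.cc or split.cc; the names split_count and run_benchmark and the input string are assumptions made up for illustration. It only shows the shape of the workload: splitting a string on a quantifier-free regex many times.

```cpp
#include <cstddef>
#include <regex>
#include <string>

// Hypothetical helper (not taken from perf.cc): split `s` on `re`
// and count the resulting pieces.
std::size_t split_count(const std::string& s, const std::regex& re)
{
    std::size_t n = 0;
    // Submatch index -1 selects the text between matches, i.e. a split.
    for (std::sregex_token_iterator it(s.begin(), s.end(), re, -1), end;
         it != end; ++it)
        ++n;
    return n;
}

// Hypothetical driver: repeat the split so the run time is measurable.
std::size_t run_benchmark(std::size_t iterations)
{
    const std::regex re(" ");  // no quantifiers in the pattern
    const std::string input = "a bb ccc dddd eeeee";  // assumed input
    std::size_t total = 0;
    for (std::size_t i = 0; i < iterations; ++i)
        total += split_count(input, re);
    return total;
}
```

Calling run_benchmark from main with a large iteration count, printing the result so the loop is not optimized away, and building once with -O2 and once with -O3 would give a comparison of the same kind as the timings above.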
Following the man page, I found that g++ claims that the differences
between -O3 and -O2 are:
~ # /usr/bin/g++ -c -Q -O3 --help=optimizers > /tmp/O3-opts
~ # /usr/bin/g++ -c -Q -O2 --help=optimizers > /tmp/O2-opts
~ # diff /tmp/O2-opts /tmp/O3-opts | grep enabled
> -fgcse-after-reload [enabled]
> -finline-functions [enabled]
> -fipa-cp-clone [enabled]
> -fpredictive-commoning [enabled]
> -ftree-loop-distribute-patterns [enabled]
> -ftree-loop-vectorize [enabled]
> -ftree-partial-pre [enabled]
> -ftree-slp-vectorize [enabled]
> -funswitch-loops [enabled]
However, -O2 with all of those flags added is still as fast as plain -O2:
~ # g++ -O2 perf.cc -fgcse-after-reload -finline-functions \
    -fipa-cp-clone -fpredictive-commoning -ftree-loop-distribute-patterns \
    -ftree-loop-vectorize -ftree-partial-pre -ftree-slp-vectorize \
    -funswitch-loops && time ./a.out
./a.out 0.45s user 0.01s system 99% cpu 0.460 total
By the way, my "g++" is aliased to "g++ -g -Wall -std=c++11", and I'm
sure -g doesn't matter.
I don't know much about this. Are there any explanations as interesting
as the question? ;)
Thank you!
--
Regards,
Tim Shen