[PATCH] D67368: [NFCI]Create CommonAttributeInfo Type as base type of *Attr and ParsedAttr.
sebpop added a comment. I still see a link error on aarch64-linux on master: /usr/bin/ld: tools/clang/lib/AST/CMakeFiles/obj.clangAST.dir/AttrImpl.cpp.o: in function `clang::AttributeCommonInfo::getAttributeSpellingListIndex() const': /home/ubuntu/llvm-project/llvm/tools/clang/include/clang/Basic/AttributeCommonInfo.h:166: undefined reference to `clang::AttributeCommonInfo::calculateAttributeSpellingListIndex() const' collect2: error: ld returned 1 exit status I reverted locally this patch and it finishes building. Repository: rL LLVM CHANGES SINCE LAST ACTION https://reviews.llvm.org/D67368/new/ https://reviews.llvm.org/D67368 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[PATCH] D91157: [AArch64] Out-of-line atomics (-moutline-atomics) implementation.
sebpop added a comment. Herald added a subscriber: MaskRay. Herald added a project: All. Hi Pavel, We need to handle one more case for __sync_* builtins, please see testcase and patches applied to GCC: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105162 Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D91157/new/ https://reviews.llvm.org/D91157 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[PATCH] D91157: [AArch64] Out-of-line atomics (-moutline-atomics) implementation.
sebpop added a comment. I tested this change on Graviton2 aarch64-linux by building https://github.com/xianyi/OpenBLAS with `clang -O3 -moutline-atomics` and `make test`: all tests pass with and without outline-atomics. Clang was configured to use libgcc. I also tested https://github.com/boostorg/boost.git with and without -moutline-atomics, and there are no new fails. Here is how I built and ran the tests for boost: git clone --recursive https://github.com/boostorg/boost.git $HOME/boost cd $HOME/boost mkdir usr ./bootstrap.sh --prefix=$HOME/boost/usr # in project-config.jam line 12 # replace `using gcc ;` with `using clang : : $HOME/llvm-project/usr/bin/clang++ ;` ./b2 --build-type=complete --layout=versioned -a cd status ../b2 # runs all regression tests I also looked at the performance of some atomic operations using google-benchmark on Ubuntu 20.04 c6g instance with Graviton2 (Neoverse-N1). Performance is better when using LSE instructions compared to generic armv8-a code. The overhead of -moutline-atomics is negligible compared to armv8-a+lse. clang trunk as of today produces slightly slower code than gcc-9 with and without -moutline-atomics. $ cat a.cc #include #include std::atomic i; static void BM_atomic_increment(benchmark::State& state) { for (auto _ : state) benchmark::DoNotOptimize(i++); } BENCHMARK(BM_atomic_increment); int j; static void BM_atomic_fetch_add(benchmark::State& state) { for (auto _ : state) benchmark::DoNotOptimize(__atomic_fetch_add(&j, 1, __ATOMIC_SEQ_CST)); } BENCHMARK(BM_atomic_fetch_add); int k; static void BM_atomic_compare_exchange(benchmark::State& state) { for (auto _ : state) benchmark::DoNotOptimize(__atomic_compare_exchange (&j, &k, &k, 1, __ATOMIC_ACQUIRE, __ATOMIC_ACQUIRE)); } BENCHMARK(BM_atomic_compare_exchange); template struct node { T data; node* next; node(const T& data) : data(data), next(nullptr) {} }; static void BM_std_atomic_compare_exchange(benchmark::State& state) { node* new_node = new node(42); std::atomic*> head; for (auto _ : state) benchmark::DoNotOptimize(std::atomic_compare_exchange_weak_explicit (&head, &new_node->next, new_node, std::memory_order_release, std::memory_order_relaxed)); } BENCHMARK(BM_std_atomic_compare_exchange); BENCHMARK_MAIN(); --- $ ./go.sh + g++ -o generic-v8 a.cc -std=c++11 -O2 -isystem benchmark/include -Lbenchmark/build/src -lbenchmark -lpthread + ./generic-v8 2020-12-06 01:06:26 Running ./generic-v8 Run on (64 X 243.75 MHz CPU s) CPU Caches: L1 Data 64 KiB (x64) L1 Instruction 64 KiB (x64) L2 Unified 1024 KiB (x64) L3 Unified 32768 KiB (x1) Load Average: 64.36, 59.36, 36.41 ***WARNING*** Library was built as DEBUG. Timings may be affected. - Benchmark Time CPU Iterations - BM_atomic_increment 7.21 ns 7.20 ns 97116662 BM_atomic_fetch_add 7.20 ns 7.20 ns 97152394 BM_atomic_compare_exchange 7.71 ns 7.71 ns 90780423 BM_std_atomic_compare_exchange 7.61 ns 7.61 ns 92037159 + /home/ubuntu/llvm-project/nin/bin/clang++ -o clang-generic-v8 a.cc -std=c++11 -O2 -isystem benchmark/include -Lbenchmark/build/src -lbenchmark -lpthread + ./clang-generic-v8 2020-12-06 01:06:30 Running ./clang-generic-v8 Run on (64 X 243.75 MHz CPU s) CPU Caches: L1 Data 64 KiB (x64) L1 Instruction 64 KiB (x64) L2 Unified 1024 KiB (x64) L3 Unified 32768 KiB (x1) Load Average: 64.57, 59.49, 36.57 ***WARNING*** Library was built as DEBUG. Timings may be affected. - Benchmark Time CPU Iterations - BM_atomic_increment 9.21 ns 9.21 ns 75989223 BM_atomic_fetch_add 9.21 ns 9.21 ns 76031211 BM_atomic_compare_exchange 7.61 ns 7.61 ns 92012620 BM_std_atomic_compare_exchange 12.4 ns 12.4 ns 56421424 + g++ -o lse -march=armv8-a+lse a.cc -std=c++11 -O2 -isystem benchmark/include -Lbenchmark/build/src -lbenchmark -lpthread + ./lse 2020-12-06 01:06:34 Running ./lse Run on (64 X 243.75 MHz CPU s) CPU Caches: L1 Data 64 KiB (x64) L1 Instruction 64 KiB (x64) L2 Unified 1024 KiB (x64) L3 Unified 32768 KiB (x1) Load Average: 64.85, 59.63, 36.74 ***WARNING*** Library was built as DEBUG. Timings may b
[PATCH] D27068: Improve string::find
This revision was automatically updated to reflect the committed changes. Closed by commit rL290761: improve performance of string::find (authored by spop). Changed prior to commit: https://reviews.llvm.org/D27068?vs=80134&id=82731#toc Repository: rL LLVM https://reviews.llvm.org/D27068 Files: libcxx/trunk/benchmarks/string.bench.cpp libcxx/trunk/include/__string Index: libcxx/trunk/include/__string === --- libcxx/trunk/include/__string +++ libcxx/trunk/include/__string @@ -538,19 +538,59 @@ return static_cast<_SizeT>(__r - __p); } +template +inline _LIBCPP_CONSTEXPR_AFTER_CXX11 const _CharT * +__search_substring(const _CharT *__first1, const _CharT *__last1, + const _CharT *__first2, const _CharT *__last2) { + // Take advantage of knowing source and pattern lengths. + // Stop short when source is smaller than pattern. + const ptrdiff_t __len2 = __last2 - __first2; + if (__len2 == 0) +return __first1; + + ptrdiff_t __len1 = __last1 - __first1; + if (__len1 < __len2) +return __last1; + + // First element of __first2 is loop invariant. + _CharT __f2 = *__first2; + while (true) { +__len1 = __last1 - __first1; +// Check whether __first1 still has at least __len2 bytes. +if (__len1 < __len2) + return __last1; + +// Find __f2 the first byte matching in __first1. +__first1 = _Traits::find(__first1, __len1 - __len2 + 1, __f2); +if (__first1 == 0) + return __last1; + +// It is faster to compare from the first byte of __first1 even if we +// already know that it matches the first byte of __first2: this is because +// __first2 is most likely aligned, as it is user's "pattern" string, and +// __first1 + 1 is most likely not aligned, as the match is in the middle of +// the string. +if (_Traits::compare(__first1, __first2, __len2) == 0) + return __first1; + +++__first1; + } +} + template inline _SizeT _LIBCPP_CONSTEXPR_AFTER_CXX11 _LIBCPP_INLINE_VISIBILITY __str_find(const _CharT *__p, _SizeT __sz, const _CharT* __s, _SizeT __pos, _SizeT __n) _NOEXCEPT { -if (__pos > __sz || __sz - __pos < __n) +if (__pos > __sz) return __npos; -if (__n == 0) + +if (__n == 0) // There is nothing to search, just return __pos. return __pos; -const _CharT* __r = -_VSTD::__search(__p + __pos, __p + __sz, -__s, __s + __n, _Traits::eq, -random_access_iterator_tag(), random_access_iterator_tag()).first; + +const _CharT *__r = __search_substring<_CharT, _Traits>( +__p + __pos, __p + __sz, __s, __s + __n); + if (__r == __p + __sz) return __npos; return static_cast<_SizeT>(__r - __p); Index: libcxx/trunk/benchmarks/string.bench.cpp === --- libcxx/trunk/benchmarks/string.bench.cpp +++ libcxx/trunk/benchmarks/string.bench.cpp @@ -0,0 +1,49 @@ +#include +#include +#include + +#include "benchmark/benchmark_api.h" +#include "GenerateInput.hpp" + +constexpr std::size_t MAX_STRING_LEN = 8 << 14; + +// Benchmark when there is no match. +static void BM_StringFindNoMatch(benchmark::State &state) { + std::string s1(state.range(0), '-'); + std::string s2(8, '*'); + while (state.KeepRunning()) +benchmark::DoNotOptimize(s1.find(s2)); +} +BENCHMARK(BM_StringFindNoMatch)->Range(10, MAX_STRING_LEN); + +// Benchmark when the string matches first time. +static void BM_StringFindAllMatch(benchmark::State &state) { + std::string s1(MAX_STRING_LEN, '-'); + std::string s2(state.range(0), '-'); + while (state.KeepRunning()) +benchmark::DoNotOptimize(s1.find(s2)); +} +BENCHMARK(BM_StringFindAllMatch)->Range(1, MAX_STRING_LEN); + +// Benchmark when the string matches somewhere in the end. +static void BM_StringFindMatch1(benchmark::State &state) { + std::string s1(MAX_STRING_LEN / 2, '*'); + s1 += std::string(state.range(0), '-'); + std::string s2(state.range(0), '-'); + while (state.KeepRunning()) +benchmark::DoNotOptimize(s1.find(s2)); +} +BENCHMARK(BM_StringFindMatch1)->Range(1, MAX_STRING_LEN / 4); + +// Benchmark when the string matches somewhere from middle to the end. +static void BM_StringFindMatch2(benchmark::State &state) { + std::string s1(MAX_STRING_LEN / 2, '*'); + s1 += std::string(state.range(0), '-'); + s1 += std::string(state.range(0), '*'); + std::string s2(state.range(0), '-'); + while (state.KeepRunning()) +benchmark::DoNotOptimize(s1.find(s2)); +} +BENCHMARK(BM_StringFindMatch2)->Range(1, MAX_STRING_LEN / 4); + +BENCHMARK_MAIN() ___ cfe-commits mailing list cfe-commits@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[PATCH] D27068: Improve string::find
sebpop added a comment. In https://reviews.llvm.org/D27068#632685, @EricWF wrote: > Holy crap those improvements are impressive. You're welcome. > This LGTM. I'm assuming @mclow.lists has nothing left to say about this. Thanks for your review and LGTM. I addressed your last comment, added some more comments, tested on x86_64-linux, and committed. Repository: rL LLVM https://reviews.llvm.org/D27068 ___ cfe-commits mailing list cfe-commits@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[PATCH] D30035: Add const to function parameters
sebpop added a comment. Looks good to me. https://reviews.llvm.org/D30035 ___ cfe-commits mailing list cfe-commits@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[PATCH] D26991: Hoist redundant load
sebpop added a comment. Let's also add a testcase and show the performance improvement. https://reviews.llvm.org/D26991 ___ cfe-commits mailing list cfe-commits@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[PATCH] D27068: Improve string::find
sebpop added a comment. Please also post the performance numbers from the added benchmarks with and without the change to the lib. Comment at: libcxx/benchmarks/string.bench.cpp:12 +// until the end of s1. +static void BM_StringFindPhase1(benchmark::State& state) { + // Benchmark following the length of s1. Let's call this benchmark "BM_StringFindNoMatch" Comment at: libcxx/benchmarks/string.bench.cpp:21 + +// Benchmark the __phase2 part of string search: we want the strings s1 and s2 +// to match from the first char in s1. Please reword to remove the reference to __phase2. Comment at: libcxx/benchmarks/string.bench.cpp:23 +// to match from the first char in s1. +static void BM_StringFindPhase2(benchmark::State& state) { + std::string s1(MAX_STRING_LEN, '-'); Let's call this benchmark "BM_StringFindAllMatch" https://reviews.llvm.org/D27068 ___ cfe-commits mailing list cfe-commits@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[PATCH] D26991: Hoist redundant load
sebpop added a comment. In https://reviews.llvm.org/D26991#606764, @mclow.lists wrote: > /me wonders what the perf difference when `_LIBCPP_UNROLL_LOOPS` is defined > or not. > > I think this (`_LIBCPP_UNROLL_LOOPS`) falls squarely into Chandler's request > that we complain to him when the compiler generates sub-optimal code, instead > of doing things like manually unrolling loops. I think we should remove all the code in the ifdef: it looks to me like this code was left in from a previous "experiment", as it never gets executed unless one compiles the code with -D_LIBCPP_UNROLL_LOOPS, which seems wrong. Let's push this cleanup in a separate commit. https://reviews.llvm.org/D26991 ___ cfe-commits mailing list cfe-commits@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[PATCH] D27068: Improve string::find
sebpop added a comment. In https://reviews.llvm.org/D27068#606757, @mclow.lists wrote: > Definitely want to see numbers. master: Benchmark Time CPU Iterations --- BM_StringFindPhase1/104 ns 4 ns 161568945 BM_StringFindPhase1/64 28 ns 28 ns 24937005 BM_StringFindPhase1/512 132 ns132 ns5321432 BM_StringFindPhase1/4k 956 ns956 ns 732038 BM_StringFindPhase1/32k7622 ns 7621 ns 92121 BM_StringFindPhase1/128k 30485 ns 30483 ns 22938 BM_StringFindPhase2/1 4 ns 4 ns 191884173 BM_StringFindPhase2/8 7 ns 7 ns 105129099 BM_StringFindPhase2/64 34 ns 34 ns 20582485 BM_StringFindPhase2/512 187 ns187 ns3736654 BM_StringFindPhase2/4k 1415 ns 1414 ns 495342 BM_StringFindPhase2/32k 11221 ns 11220 ns 62393 BM_StringFindPhase2/128k 44952 ns 44949 ns 15595 master + patch: Benchmark Time CPU Iterations --- BM_StringFindPhase1/103 ns 3 ns 204725158 BM_StringFindPhase1/645 ns 5 ns 146268957 BM_StringFindPhase1/512 12 ns 12 ns 60176557 BM_StringFindPhase1/4k 67 ns 67 ns 10508533 BM_StringFindPhase1/32k 503 ns503 ns100 BM_StringFindPhase1/128k 2396 ns 2396 ns 292701 BM_StringFindPhase2/1 6 ns 6 ns 127378641 BM_StringFindPhase2/8 6 ns 6 ns 117407388 BM_StringFindPhase2/647 ns 7 ns 95998016 BM_StringFindPhase2/512 18 ns 18 ns 38135928 BM_StringFindPhase2/4k 83 ns 83 ns8452337 BM_StringFindPhase2/32k 762 ns762 ns 925744 BM_StringFindPhase2/128k 3255 ns 3255 ns 215141 https://reviews.llvm.org/D27068 ___ cfe-commits mailing list cfe-commits@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[PATCH] D27068: Improve string::find
sebpop added a comment. The numbers are from an x86_64-linux machine: Intel(R) Core(TM) i7-4790K CPU @ 4.00GHz Overall the patch is a win on the synthetic benchmark. We have also seen this in a proprietary benchmark where it accounted for about 10% speedup. https://reviews.llvm.org/D27068 ___ cfe-commits mailing list cfe-commits@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[PATCH] D24991: Inline hot functions in libcxx shared_ptr implementation.
sebpop added a comment. @mclow.lists could you please have a last look at this patch: the change is for a performance improvement (20% uplift on a proprietary benchmark), and all the issues mentioned in the review have been addressed. The existing synthetic benchmark shows an overall improvement: master: Benchmark Time CPU Iterations BM_SharedPtrCreateDestroy 54 ns 54 ns 12388755 BM_SharedPtrIncDecRef 37 ns 37 ns 19021739 BM_WeakPtrIncDecRef 38 ns 38 ns 18421053 master + patch: Benchmark Time CPU Iterations BM_SharedPtrCreateDestroy 44 ns 44 ns 14730639 BM_SharedPtrIncDecRef 18 ns 18 ns 3889 BM_WeakPtrIncDecRef 30 ns 30 ns 23648649 https://reviews.llvm.org/D24991 ___ cfe-commits mailing list cfe-commits@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits