[PATCH] D67368: [NFCI]Create CommonAttributeInfo Type as base type of *Attr and ParsedAttr.

2019-09-14 Thread Sebastian Pop via Phabricator via cfe-commits
sebpop added a comment.

I still see a link error on aarch64-linux on master:

  /usr/bin/ld: tools/clang/lib/AST/CMakeFiles/obj.clangAST.dir/AttrImpl.cpp.o: 
in function `clang::AttributeCommonInfo::getAttributeSpellingListIndex() const':
  
/home/ubuntu/llvm-project/llvm/tools/clang/include/clang/Basic/AttributeCommonInfo.h:166:
 undefined reference to 
`clang::AttributeCommonInfo::calculateAttributeSpellingListIndex() const'
  collect2: error: ld returned 1 exit status

I reverted locally this patch and it finishes building.


Repository:
  rL LLVM

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D67368/new/

https://reviews.llvm.org/D67368



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D91157: [AArch64] Out-of-line atomics (-moutline-atomics) implementation.

2022-05-16 Thread Sebastian Pop via Phabricator via cfe-commits
sebpop added a comment.
Herald added a subscriber: MaskRay.
Herald added a project: All.

Hi Pavel,

We need to handle one more case for __sync_* builtins, please see testcase and 
patches applied to GCC:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105162


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D91157/new/

https://reviews.llvm.org/D91157

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D91157: [AArch64] Out-of-line atomics (-moutline-atomics) implementation.

2020-12-05 Thread Sebastian Pop via Phabricator via cfe-commits
sebpop added a comment.

I tested this change on Graviton2 aarch64-linux by building 
https://github.com/xianyi/OpenBLAS with `clang -O3 -moutline-atomics` and `make 
test`: all tests pass with and without outline-atomics.
Clang was configured to use libgcc.

I also tested https://github.com/boostorg/boost.git with and without 
-moutline-atomics, and there are no new fails.
Here is how I built and ran the tests for boost:

  git clone --recursive https://github.com/boostorg/boost.git $HOME/boost
  cd $HOME/boost
  mkdir usr
  ./bootstrap.sh --prefix=$HOME/boost/usr
  # in project-config.jam line 12
  # replace `using gcc ;` with `using clang :   : 
$HOME/llvm-project/usr/bin/clang++ ;`
  ./b2 --build-type=complete --layout=versioned -a
  cd status
  ../b2  # runs all regression tests

I also looked at the performance of some atomic operations using 
google-benchmark on Ubuntu 20.04 c6g instance with Graviton2 (Neoverse-N1).
Performance is better when using LSE instructions compared to generic armv8-a 
code. 
The overhead of -moutline-atomics is negligible compared to armv8-a+lse.
clang trunk as of today produces slightly slower code than gcc-9 with and 
without -moutline-atomics.

  $ cat a.cc
  #include 
  #include 
  
  std::atomic i;
  static void BM_atomic_increment(benchmark::State& state) {
for (auto _ : state)
  benchmark::DoNotOptimize(i++);
  }
  BENCHMARK(BM_atomic_increment);
  
  int j;
  static void BM_atomic_fetch_add(benchmark::State& state) {
for (auto _ : state)
  benchmark::DoNotOptimize(__atomic_fetch_add(&j, 1, __ATOMIC_SEQ_CST));
  }
  BENCHMARK(BM_atomic_fetch_add);
  
  int k;
  static void BM_atomic_compare_exchange(benchmark::State& state) {
for (auto _ : state)
  benchmark::DoNotOptimize(__atomic_compare_exchange
   (&j, &k, &k, 1, __ATOMIC_ACQUIRE, 
__ATOMIC_ACQUIRE));
  }
  BENCHMARK(BM_atomic_compare_exchange);
  
  
  template
  struct node {
T data;
node* next;
node(const T& data) : data(data), next(nullptr) {}
  };
  
  static void BM_std_atomic_compare_exchange(benchmark::State& state) {
node* new_node = new node(42);
std::atomic*> head;
for (auto _ : state)
  benchmark::DoNotOptimize(std::atomic_compare_exchange_weak_explicit
   (&head, &new_node->next, new_node,
std::memory_order_release,
std::memory_order_relaxed));
  }
  BENCHMARK(BM_std_atomic_compare_exchange);
  
  BENCHMARK_MAIN();
  
  ---
  $ ./go.sh
  + g++ -o generic-v8 a.cc -std=c++11 -O2 -isystem benchmark/include 
-Lbenchmark/build/src -lbenchmark -lpthread
  + ./generic-v8
  2020-12-06 01:06:26
  Running ./generic-v8
  Run on (64 X 243.75 MHz CPU s)
  CPU Caches:
L1 Data 64 KiB (x64)
L1 Instruction 64 KiB (x64)
L2 Unified 1024 KiB (x64)
L3 Unified 32768 KiB (x1)
  Load Average: 64.36, 59.36, 36.41
  ***WARNING*** Library was built as DEBUG. Timings may be affected.
  -
  Benchmark   Time CPU   Iterations
  -
  BM_atomic_increment  7.21 ns 7.20 ns 97116662
  BM_atomic_fetch_add  7.20 ns 7.20 ns 97152394
  BM_atomic_compare_exchange   7.71 ns 7.71 ns 90780423
  BM_std_atomic_compare_exchange   7.61 ns 7.61 ns 92037159
  + /home/ubuntu/llvm-project/nin/bin/clang++ -o clang-generic-v8 a.cc 
-std=c++11 -O2 -isystem benchmark/include -Lbenchmark/build/src -lbenchmark 
-lpthread
  + ./clang-generic-v8
  2020-12-06 01:06:30
  Running ./clang-generic-v8
  Run on (64 X 243.75 MHz CPU s)
  CPU Caches:
L1 Data 64 KiB (x64)
L1 Instruction 64 KiB (x64)
L2 Unified 1024 KiB (x64)
L3 Unified 32768 KiB (x1)
  Load Average: 64.57, 59.49, 36.57
  ***WARNING*** Library was built as DEBUG. Timings may be affected.
  -
  Benchmark   Time CPU   Iterations
  -
  BM_atomic_increment  9.21 ns 9.21 ns 75989223
  BM_atomic_fetch_add  9.21 ns 9.21 ns 76031211
  BM_atomic_compare_exchange   7.61 ns 7.61 ns 92012620
  BM_std_atomic_compare_exchange   12.4 ns 12.4 ns 56421424
  + g++ -o lse -march=armv8-a+lse a.cc -std=c++11 -O2 -isystem 
benchmark/include -Lbenchmark/build/src -lbenchmark -lpthread
  + ./lse
  2020-12-06 01:06:34
  Running ./lse
  Run on (64 X 243.75 MHz CPU s)
  CPU Caches:
L1 Data 64 KiB (x64)
L1 Instruction 64 KiB (x64)
L2 Unified 1024 KiB (x64)
L3 Unified 32768 KiB (x1)
  Load Average: 64.85, 59.63, 36.74
  ***WARNING*** Library was built as DEBUG. Timings may b

[PATCH] D27068: Improve string::find

2016-12-30 Thread Sebastian Pop via Phabricator via cfe-commits
This revision was automatically updated to reflect the committed changes.
Closed by commit rL290761: improve performance of string::find (authored by 
spop).

Changed prior to commit:
  https://reviews.llvm.org/D27068?vs=80134&id=82731#toc

Repository:
  rL LLVM

https://reviews.llvm.org/D27068

Files:
  libcxx/trunk/benchmarks/string.bench.cpp
  libcxx/trunk/include/__string

Index: libcxx/trunk/include/__string
===
--- libcxx/trunk/include/__string
+++ libcxx/trunk/include/__string
@@ -538,19 +538,59 @@
 return static_cast<_SizeT>(__r - __p);
 }
 
+template 
+inline _LIBCPP_CONSTEXPR_AFTER_CXX11 const _CharT *
+__search_substring(const _CharT *__first1, const _CharT *__last1,
+   const _CharT *__first2, const _CharT *__last2) {
+  // Take advantage of knowing source and pattern lengths.
+  // Stop short when source is smaller than pattern.
+  const ptrdiff_t __len2 = __last2 - __first2;
+  if (__len2 == 0)
+return __first1;
+
+  ptrdiff_t __len1 = __last1 - __first1;
+  if (__len1 < __len2)
+return __last1;
+
+  // First element of __first2 is loop invariant.
+  _CharT __f2 = *__first2;
+  while (true) {
+__len1 = __last1 - __first1;
+// Check whether __first1 still has at least __len2 bytes.
+if (__len1 < __len2)
+  return __last1;
+
+// Find __f2 the first byte matching in __first1.
+__first1 = _Traits::find(__first1, __len1 - __len2 + 1, __f2);
+if (__first1 == 0)
+  return __last1;
+
+// It is faster to compare from the first byte of __first1 even if we
+// already know that it matches the first byte of __first2: this is because
+// __first2 is most likely aligned, as it is user's "pattern" string, and
+// __first1 + 1 is most likely not aligned, as the match is in the middle of
+// the string.
+if (_Traits::compare(__first1, __first2, __len2) == 0)
+  return __first1;
+
+++__first1;
+  }
+}
+
 template
 inline _SizeT _LIBCPP_CONSTEXPR_AFTER_CXX11 _LIBCPP_INLINE_VISIBILITY
 __str_find(const _CharT *__p, _SizeT __sz, 
const _CharT* __s, _SizeT __pos, _SizeT __n) _NOEXCEPT
 {
-if (__pos > __sz || __sz - __pos < __n)
+if (__pos > __sz)
 return __npos;
-if (__n == 0)
+
+if (__n == 0) // There is nothing to search, just return __pos.
 return __pos;
-const _CharT* __r = 
-_VSTD::__search(__p + __pos, __p + __sz,
-__s, __s + __n, _Traits::eq,
-random_access_iterator_tag(), random_access_iterator_tag()).first;
+
+const _CharT *__r = __search_substring<_CharT, _Traits>(
+__p + __pos, __p + __sz, __s, __s + __n);
+
 if (__r == __p + __sz)
 return __npos;
 return static_cast<_SizeT>(__r - __p);
Index: libcxx/trunk/benchmarks/string.bench.cpp
===
--- libcxx/trunk/benchmarks/string.bench.cpp
+++ libcxx/trunk/benchmarks/string.bench.cpp
@@ -0,0 +1,49 @@
+#include 
+#include 
+#include 
+
+#include "benchmark/benchmark_api.h"
+#include "GenerateInput.hpp"
+
+constexpr std::size_t MAX_STRING_LEN = 8 << 14;
+
+// Benchmark when there is no match.
+static void BM_StringFindNoMatch(benchmark::State &state) {
+  std::string s1(state.range(0), '-');
+  std::string s2(8, '*');
+  while (state.KeepRunning())
+benchmark::DoNotOptimize(s1.find(s2));
+}
+BENCHMARK(BM_StringFindNoMatch)->Range(10, MAX_STRING_LEN);
+
+// Benchmark when the string matches first time.
+static void BM_StringFindAllMatch(benchmark::State &state) {
+  std::string s1(MAX_STRING_LEN, '-');
+  std::string s2(state.range(0), '-');
+  while (state.KeepRunning())
+benchmark::DoNotOptimize(s1.find(s2));
+}
+BENCHMARK(BM_StringFindAllMatch)->Range(1, MAX_STRING_LEN);
+
+// Benchmark when the string matches somewhere in the end.
+static void BM_StringFindMatch1(benchmark::State &state) {
+  std::string s1(MAX_STRING_LEN / 2, '*');
+  s1 += std::string(state.range(0), '-');
+  std::string s2(state.range(0), '-');
+  while (state.KeepRunning())
+benchmark::DoNotOptimize(s1.find(s2));
+}
+BENCHMARK(BM_StringFindMatch1)->Range(1, MAX_STRING_LEN / 4);
+
+// Benchmark when the string matches somewhere from middle to the end.
+static void BM_StringFindMatch2(benchmark::State &state) {
+  std::string s1(MAX_STRING_LEN / 2, '*');
+  s1 += std::string(state.range(0), '-');
+  s1 += std::string(state.range(0), '*');
+  std::string s2(state.range(0), '-');
+  while (state.KeepRunning())
+benchmark::DoNotOptimize(s1.find(s2));
+}
+BENCHMARK(BM_StringFindMatch2)->Range(1, MAX_STRING_LEN / 4);
+
+BENCHMARK_MAIN()
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D27068: Improve string::find

2016-12-30 Thread Sebastian Pop via Phabricator via cfe-commits
sebpop added a comment.

In https://reviews.llvm.org/D27068#632685, @EricWF wrote:

> Holy crap those improvements are impressive.


You're welcome.

> This LGTM. I'm assuming @mclow.lists has nothing left to say about this.

Thanks for your review and LGTM.
I addressed your last comment, added some more comments, tested on 
x86_64-linux, and committed.


Repository:
  rL LLVM

https://reviews.llvm.org/D27068



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D30035: Add const to function parameters

2017-02-16 Thread Sebastian Pop via Phabricator via cfe-commits
sebpop added a comment.

Looks good to me.


https://reviews.llvm.org/D30035



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D26991: Hoist redundant load

2016-11-28 Thread Sebastian Pop via Phabricator via cfe-commits
sebpop added a comment.

Let's also add a testcase and show the performance improvement.


https://reviews.llvm.org/D26991



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D27068: Improve string::find

2016-11-28 Thread Sebastian Pop via Phabricator via cfe-commits
sebpop added a comment.

Please also post the performance numbers from the added benchmarks with and 
without the change to the lib.




Comment at: libcxx/benchmarks/string.bench.cpp:12
+// until the end of s1.
+static void BM_StringFindPhase1(benchmark::State& state) {
+  // Benchmark following the length of s1.

Let's call this benchmark "BM_StringFindNoMatch"



Comment at: libcxx/benchmarks/string.bench.cpp:21
+
+// Benchmark the __phase2 part of string search: we want the strings s1 and s2
+// to match from the first char in s1.

Please reword to remove the reference to __phase2.



Comment at: libcxx/benchmarks/string.bench.cpp:23
+// to match from the first char in s1.
+static void BM_StringFindPhase2(benchmark::State& state) {
+  std::string s1(MAX_STRING_LEN, '-');

Let's call this benchmark "BM_StringFindAllMatch"


https://reviews.llvm.org/D27068



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D26991: Hoist redundant load

2016-11-28 Thread Sebastian Pop via Phabricator via cfe-commits
sebpop added a comment.

In https://reviews.llvm.org/D26991#606764, @mclow.lists wrote:

> /me wonders what the perf difference when `_LIBCPP_UNROLL_LOOPS` is defined 
> or  not.
>
> I think this (`_LIBCPP_UNROLL_LOOPS`) falls squarely into Chandler's request 
> that we complain to him when the compiler generates sub-optimal code, instead 
> of doing things like manually unrolling loops.


I think we should remove all the code in the ifdef: it looks to me like this 
code was left in from a previous "experiment", as it never gets executed unless 
one compiles the code with -D_LIBCPP_UNROLL_LOOPS, which seems wrong.  Let's 
push this cleanup in a separate commit.


https://reviews.llvm.org/D26991



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D27068: Improve string::find

2016-11-28 Thread Sebastian Pop via Phabricator via cfe-commits
sebpop added a comment.

In https://reviews.llvm.org/D27068#606757, @mclow.lists wrote:

> Definitely want to see numbers.




  master:
  Benchmark Time   CPU Iterations
  ---
  BM_StringFindPhase1/104 ns  4 ns  161568945
  BM_StringFindPhase1/64   28 ns 28 ns   24937005
  BM_StringFindPhase1/512 132 ns132 ns5321432
  BM_StringFindPhase1/4k  956 ns956 ns 732038
  BM_StringFindPhase1/32k7622 ns   7621 ns  92121
  BM_StringFindPhase1/128k  30485 ns  30483 ns  22938
  BM_StringFindPhase2/1 4 ns  4 ns  191884173
  BM_StringFindPhase2/8 7 ns  7 ns  105129099
  BM_StringFindPhase2/64   34 ns 34 ns   20582485
  BM_StringFindPhase2/512 187 ns187 ns3736654
  BM_StringFindPhase2/4k 1415 ns   1414 ns 495342
  BM_StringFindPhase2/32k   11221 ns  11220 ns  62393
  BM_StringFindPhase2/128k  44952 ns  44949 ns  15595
  
  master + patch:
  Benchmark Time   CPU Iterations
  ---
  BM_StringFindPhase1/103 ns  3 ns  204725158
  BM_StringFindPhase1/645 ns  5 ns  146268957
  BM_StringFindPhase1/512  12 ns 12 ns   60176557
  BM_StringFindPhase1/4k   67 ns 67 ns   10508533
  BM_StringFindPhase1/32k 503 ns503 ns100
  BM_StringFindPhase1/128k   2396 ns   2396 ns 292701
  BM_StringFindPhase2/1 6 ns  6 ns  127378641
  BM_StringFindPhase2/8 6 ns  6 ns  117407388
  BM_StringFindPhase2/647 ns  7 ns   95998016
  BM_StringFindPhase2/512  18 ns 18 ns   38135928
  BM_StringFindPhase2/4k   83 ns 83 ns8452337
  BM_StringFindPhase2/32k 762 ns762 ns 925744
  BM_StringFindPhase2/128k   3255 ns   3255 ns 215141


https://reviews.llvm.org/D27068



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D27068: Improve string::find

2016-11-28 Thread Sebastian Pop via Phabricator via cfe-commits
sebpop added a comment.

The numbers are from an x86_64-linux machine: Intel(R) Core(TM) i7-4790K CPU @ 
4.00GHz
Overall the patch is a win on the synthetic benchmark.
We have also seen this in a proprietary benchmark where it accounted for about 
10% speedup.


https://reviews.llvm.org/D27068



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D24991: Inline hot functions in libcxx shared_ptr implementation.

2016-11-28 Thread Sebastian Pop via Phabricator via cfe-commits
sebpop added a comment.

@mclow.lists could you please have a last look at this patch: the change is for 
a performance improvement (20% uplift on a proprietary benchmark), and all the 
issues mentioned in the review have been addressed.
The existing synthetic benchmark shows an overall improvement:

  master:
  Benchmark  Time   CPU Iterations
  
  BM_SharedPtrCreateDestroy 54 ns 54 ns   12388755
  BM_SharedPtrIncDecRef 37 ns 37 ns   19021739
  BM_WeakPtrIncDecRef   38 ns 38 ns   18421053
   
  master + patch:
  Benchmark  Time   CPU Iterations
  
  BM_SharedPtrCreateDestroy 44 ns 44 ns   14730639
  BM_SharedPtrIncDecRef 18 ns 18 ns   3889
  BM_WeakPtrIncDecRef   30 ns 30 ns   23648649


https://reviews.llvm.org/D24991



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits