Re: [PATCH][RFC] vector creation from two parts of two vectors produces TBL rather than ins (PR93720)

2020-07-17 Thread Dmitrij Pochepko
Thank you! On 17.07.2020 12:25, Richard Sandiford wrote: Dmitrij Pochepko writes: Hi, please take a look at updated patch with all comments addressed (attached). Thanks, pushed to master with a slightly tweaked changelog. Richard

Re: [PATCH][RFC] __builtin_shuffle sometimes should produce zip1 rather than TBL (PR82199)

2020-07-17 Thread Dmitrij Pochepko
Thank you! On 17.07.2020 12:21, Richard Sandiford wrote: Dmitrij Pochepko writes: Hi, please take a look at the new version (attached). Thanks, pushed to master with a slightly tweaked changelog. Richard

[PATCH] non-power-of-2 group size can be vectorized for 2-element vectors case (PR96208)

2020-07-15 Thread Dmitrij Pochepko
-gnu. Thanks, Dmitrij From acf12c34f4bebbb5c6000a87bf9aaa58e48418bb Mon Sep 17 00:00:00 2001 From: Dmitrij Pochepko Date: Wed, 15 Jul 2020 18:07:26 +0300 Subject: [PATCH] non-power-of-2 group size can be vectorized for 2-element vectors case (PR96208) Support for non-power-of-2 group size

Re: [PATCH][RFC] vector creation from two parts of two vectors produces TBL rather than ins (PR93720)

2020-07-14 Thread Dmitrij Pochepko
ST_WIDE_INT) i) > continue; > done > Very minor, but the coding conventions don't put a space before “++”. > So: > > > + for (unsigned HOST_WIDE_INT i = 0; i < nelt; i ++) > > …this should be “i++” too. done From 9acc14f4cdd10091daa5311f495daac

Re: [PATCH][RFC] __builtin_shuffle sometimes should produce zip1 rather than TBL (PR82199)

2020-07-13 Thread Dmitrij Pochepko
time. I only realised when testing the > patch locally. > > Thanks, > Richard I restricted the tests to aarch64_little_endian because I don't have a big-endian setup to check them on. From 197a9bc05f96c3f100b3f4748c9dd12a60de86d1 Mon Sep 17 00:00:00 2001 From: Dmitrij Pochepko Dat

Re: [PATCH][RFC] vector creation from two parts of two vectors produces TBL rather than ins (PR93720)

2020-07-10 Thread Dmitrij Pochepko
reate an instance of > code_for_aarch64_simd_vec_copy_lane (mode). done ... > > > +/* { dg-final { scan-assembler-times "\[ \t\]*ins\[ \t\]+v\[0-9\]+\.s" 4 } > > } */ > > Same comment as the other patch about using {…} regexp quoting. > done From 8e7cfa2da40

Re: [PATCH][RFC] __builtin_shuffle sometimes should produce zip1 rather than TBL (PR82199)

2020-07-10 Thread Dmitrij Pochepko
agree > “[i+1]” looks better.) done From 34b6b0803111609ec5a0a615a8f03b78921e8412 Mon Sep 17 00:00:00 2001 From: Dmitrij Pochepko Date: Fri, 10 Jul 2020 15:42:40 +0300 Subject: [PATCH] __builtin_shuffle sometimes should produce zip1 rather than TBL (PR82199) The following patch enables vector permutat

Re: [PATCH][RFC] __builtin_shuffle sometimes should produce zip1 rather than TBL (PR82199)

2020-07-07 Thread Dmitrij Pochepko
Thanks, > Richard From 71a3f4b05edc462bcceba35ff738c6f1b5ca3f0a Mon Sep 17 00:00:00 2001 From: Dmitrij Pochepko Date: Tue, 7 Jul 2020 18:45:06 +0300 Subject: [PATCH] __builtin_shuffle sometimes should produce zip1 rather than TBL (PR82199) The following patch enables vector permutations optimizat
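For readers unfamiliar with the permutation named in the subject line, here is a rough, portable C sketch of what a two-operand shuffle computes and what a zip1-style selector looks like. It is not the code from the patch (the snippet is truncated and the patch itself is not shown here). The names shuffle2, is_zip1_mask and NELT are hypothetical, and the index rule (indices taken modulo twice the element count, low indices from the first operand, high indices from the second) follows the documented behaviour of GCC's two-operand __builtin_shuffle.

```c
#include <stddef.h>
#include <stdint.h>

#define NELT 4  /* hypothetical element count, chosen for illustration */

/* Emulate a two-operand shuffle: index values 0..NELT-1 select from a,
   NELT..2*NELT-1 select from b; indices are reduced modulo 2*NELT. */
static void shuffle2(const int32_t a[NELT], const int32_t b[NELT],
                     const unsigned mask[NELT], int32_t out[NELT])
{
    for (size_t i = 0; i < NELT; i++) {
        unsigned idx = mask[i] % (2 * NELT);
        out[i] = idx < NELT ? a[idx] : b[idx - NELT];
    }
}

/* Return 1 if mask interleaves the low halves of the two inputs,
   i.e. {0, NELT, 1, NELT+1, ...}, which is what zip1 computes. */
static int is_zip1_mask(const unsigned mask[NELT])
{
    for (size_t i = 0; i < NELT; i++) {
        unsigned expected = (unsigned)(i / 2) + ((i % 2) ? NELT : 0);
        if (mask[i] != expected)
            return 0;
    }
    return 1;
}
```

With four elements, the selector {0, 4, 1, 5} yields {a[0], b[0], a[1], b[1]}. A fixed pattern like this can be expressed as a single interleave instruction, whereas TBL is a general table lookup that also needs the index vector as an operand.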

[PATCH][RFC] vector creation from two parts of two vectors produces TBL rather than ins (PR93720)

2020-06-17 Thread Dmitrij Pochepko
introduced by me with Andrew Pinski being involved later. Please note that the test in this patch depends on another commit (PR82199), which I sent not long ago. (I have no write access to the repo.) Thanks, Dmitrij From d4ccbcdf67648a095706213a0fe0ac856bb077bb Mon Sep 17 00:00:00 2001 From: Dmit

[PATCH][RFC] __builtin_shuffle sometimes should produce zip1 rather than TBL (PR82199)

2020-06-11 Thread Dmitrij Pochepko
* gcc.target/aarch64/vzip_4.c: New test Co-Authored-By: Dmitrij Pochepko Thanks, Dmitrij From 3c9f3fe834811386223755fc58e2ab4a612eefcf Mon Sep 17 00:00:00 2001 From: Dmitrij Pochepko Date: Thu, 11 Jun 2020 14:13:35 +0300 Subject: [PATCH] __builtin_shuffle sometimes should produce zip1 rat

Re: [PATCH] PR tree-optimization/90836 Missing popcount pattern matching

2019-10-01 Thread Dmitrij Pochepko
PM Dmitrij Pochepko > wrote: > > > > Hi, > > > > can anybody take a look at v2? > > +(if (tree_to_uhwi (@4) == 1 > + && tree_to_uhwi (@10) == 2 && tree_to_uhwi (@5) == 4 > > those will still ICE for large __int128_t constants.

Re: [PATCH] PR tree-optimization/90836 Missing popcount pattern matching

2019-09-24 Thread Dmitrij Pochepko
Hi, can anybody take a look at v2? Thanks, Dmitrij On Mon, Sep 09, 2019 at 10:03:40PM +0300, Dmitrij Pochepko wrote: > Hi all. > > Please take a look at v2 (attached). > I changed patch according to review comments. The same testing was performed > again. > > Thanks,

Re: [PATCH] PR tree-optimization/90836 Missing popcount pattern matching

2019-09-09 Thread Dmitrij Pochepko
Hi all. Please take a look at v2 (attached). I changed patch according to review comments. The same testing was performed again. Thanks, Dmitrij On Thu, Sep 05, 2019 at 06:34:49PM +0300, Dmitrij Pochepko wrote: > This patch adds matching for Hamming weight (popcount) implementation.

Re: [PATCH] PR tree-optimization/90836 Missing popcount pattern matching

2019-09-09 Thread Dmitrij Pochepko
Hi, thank you for looking into it. On Fri, Sep 06, 2019 at 12:13:34PM +, Wilco Dijkstra wrote: > Hi, > > +(simplify > + (convert > +(rshift > + (mult > > > is the outer convert really necessary? That is, if we change > > the simplification result to > > Indeed that should be "co

Re: [PATCH] PR tree-optimization/90836 Missing popcount pattern matching

2019-09-09 Thread Dmitrij Pochepko
Hi, thank you for looking into it. On Fri, Sep 06, 2019 at 12:23:40PM +0200, Richard Biener wrote: > On Thu, Sep 5, 2019 at 5:35 PM Dmitrij Pochepko > wrote: > > > > This patch adds matching for Hamming weight (popcount) implementation. The > > following source

[PATCH] PR tree-optimization/90836 Missing popcount pattern matching

2019-09-05 Thread Dmitrij Pochepko
This patch adds matching for Hamming weight (popcount) implementation. The following sources: int foo64 (unsigned long long a) { unsigned long long b = a; b -= ((b>>1) & 0xULL); b = ((b>>2) & 0xULL) + (b & 0xULL); b = ((b>>4) + b) &
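The hexadecimal constants in the snippet above were lost in the archive, and the snippet is cut off before the function ends. As a point of reference only, the widely known bit-parallel (SWAR) Hamming weight routine has the shape below. The masks and the final multiply-and-shift are the usual textbook ones and are an assumption here, not a reconstruction of the patch's test source; popcount64_swar is a hypothetical name.

```c
#include <stdint.h>

/* Textbook SWAR popcount for a 64-bit value. The constants are the
   standard ones (assumed; the archived snippet does not preserve them). */
static int popcount64_swar(uint64_t b)
{
    /* Each 2-bit field now holds the count of set bits in that field. */
    b -= (b >> 1) & 0x5555555555555555ULL;
    /* Sum adjacent 2-bit counts into 4-bit fields. */
    b = ((b >> 2) & 0x3333333333333333ULL) + (b & 0x3333333333333333ULL);
    /* Sum adjacent 4-bit counts into bytes (each byte holds at most 8). */
    b = ((b >> 4) + b) & 0x0F0F0F0F0F0F0F0FULL;
    /* The multiply accumulates all byte counts into the top byte. */
    b *= 0x0101010101010101ULL;
    return (int)(b >> 56);
}
```

A compiler that recognises this idiom can replace the whole sequence with a single population-count operation where the target provides one, which is the kind of matching the patch description refers to.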