k-byte memset/memcpy/strlen builtins

2017-01-11 Thread Robin Dapp
Hi, When examining the performance of some test cases on s390 I realized that we could do better for constructs like 2-byte memcpys or 2-byte/4-byte memsets. Due to some s390-specific architectural properties, we could be faster by e.g. avoiding excessive unrolling and using dedicated memory instr

Re: k-byte memset/memcpy/strlen builtins

2017-01-12 Thread Robin Dapp
> Yes, for memset with larger element we could add an optab plus internal function combination and use that when the target wants. Or always use such IFN and fall back to loopy expansion.

So, adding additional patterns in tree-loop-distribution.c (and mapping them to dedicated optabs) is fine?

Vectorization regression on s390x GCC6 vs GCC5

2017-01-26 Thread Robin Dapp
Hi, while analyzing a test case with a lot of nested loops (>7) and double floating point operations I noticed a performance regression of GCC 6/7 vs GCC 5 on s390x. It seems due to GCC 6 vectorizing something GCC 5 couldn't. Basically, each loop iterates over three dimensions, we fully unroll so

Re: Question about wrapv-vect-reduc-dot-s8b.c

2023-08-30 Thread Robin Dapp via Gcc
>> To fix it, is it necessary to support 'vec_unpack'?
>
> both same units would be sext, not vec_unpacks_{lo,hi} - the vectorizer ties its hands by choosing vector types early and based on the number of incoming/outgoing vectors it chooses one or the other method.
>
> More precise dumping w

Re: Question about wrapv-vect-reduc-dot-s8b.c

2023-08-30 Thread Robin Dapp via Gcc
> it's target dependent what we choose first so it's going to be a bit difficult to adjust testcases like this (and it looks like a testsuite issue). I think for this specific testcase changing scan-tree-dump-times to scan-tree-dump is reasonable. Note we really want to check that for the

Re: Question about wrapv-vect-reduc-dot-s8b.c

2023-08-30 Thread Robin Dapp via Gcc
>> I am wondering whether we do have some situations that vec_pack/vec_unpack/vec_widen_xxx/dot_prod pattern can be beneficial for RVV? I have ever met some situation that vec_unpack can be beneficial when working on SELECT_VL but I don't know which case
>
> With fixed size vectors y

Re: Question about wrapv-vect-reduc-dot-s8b.c

2023-08-30 Thread Robin Dapp via Gcc
> the dump-scans. Can we do sth like "vect_recog_dot_prod_pattern: detected\n(!FAILED)*SUCCEEDED", thus after the dot-prod pattern dumping allow arbitrary stuff but _not_ a "failed" and then require a "succeeded"?

It took some fighting with tcl syntax until I arrived at the regex pattern be
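The idea in the quoted suggestion (match the detection line, allow anything that does not contain "failed", then require "succeeded") can be sketched with a negative lookahead. This is only an illustration in Python, not the Tcl pattern from the thread, which is cut off above. The function name and the sample dump lines are invented for the example, and the real vectorizer dump wording may differ.

```python
import re

# Illustrative pattern only: the actual testsuite directive uses Tcl
# regex syntax and the real dump text, neither of which is shown here.
DOT_PROD_OK = re.compile(
    r"vect_recog_dot_prod_pattern: detected\n"
    r"(?:(?!failed).)*"  # consume any character unless "failed" starts here
    r"succeeded",
    re.DOTALL,           # let "." cross line boundaries
)


def dot_prod_vectorized(dump: str) -> bool:
    """Return True if a detection is followed by "succeeded"
    with no "failed" in between (hypothetical helper)."""
    return DOT_PROD_OK.search(dump) is not None


# Made-up dump excerpts, for demonstration only.
good_dump = (
    "vect_recog_dot_prod_pattern: detected\n"
    "some other note\n"
    "analysis succeeded\n"
)
bad_dump = (
    "vect_recog_dot_prod_pattern: detected\n"
    "analysis failed\n"
    "retrying with another mode\n"
    "analysis succeeded\n"
)
```

With these inputs, `dot_prod_vectorized(good_dump)` is true and `dot_prod_vectorized(bad_dump)` is false, because the lookahead stops the match as soon as "failed" appears between the two anchors.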

Re: gcc relies on RISC-V vcompress instruction undefined behaviour

2024-10-31 Thread Robin Dapp via Gcc
> Hi,
>
> I think gcc is relying on undefined behaviour with the vcompress instruction. Unfortunately my test case isn't reproducing on mainline, but gcc looks to use the fields between the last mask selected field and vl while setting tail agnostic.
>
> This thread explains how vcompress is
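For readers unfamiliar with the instruction, here is a rough scalar model of the concern raised in the quote. It assumes, following my reading of the RISC-V vector specification, that vcompress packs the mask-selected elements to the front of the destination and treats everything after them as tail elements. The function name is hypothetical, and `None` is used as a stand-in for "unspecified under tail agnostic"; real hardware may keep the old value or write all ones there.

```python
def vcompress_model(src, mask, vl, old_dest):
    """Scalar sketch of vcompress with a tail-agnostic policy.

    Elements of src[0:vl] whose mask bit is set are packed to the
    front of the destination. Every later destination element is
    modelled as None, meaning its value must not be relied on.
    """
    packed = [src[i] for i in range(vl) if mask[i]]
    dest = list(old_dest)
    dest[:len(packed)] = packed
    # Everything past the packed elements is tail, including the
    # positions between the last selected element and vl.
    for i in range(len(packed), len(dest)):
        dest[i] = None
    return dest


result = vcompress_model([10, 20, 30, 40], [1, 0, 1, 0], 4, [0, 0, 0, 0])
```

In this model `result[2]` and `result[3]` are unspecified even though both indices are below `vl`, which is the range the quoted message says gcc appears to read.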

Re: [RISC-V] vector segment load/store width as a riscv_tune_param

2025-03-25 Thread Robin Dapp via Gcc
I am revisiting an effort to make the number of lanes for vector segment load/store a tunable parameter. A year ago, Robin added minimal and not-yet-tunable common_vector_cost::segment_permute_[2-8] But it is tunable, just not a param? :) We have our own cost structure in our downstream repo,

Re: [RISC-V] vector segment load/store width as a riscv_tune_param

2025-03-26 Thread Robin Dapp via Gcc
You won't see failures in the testsuite. The failures only show up when I attempt to impose huge costs on NF above threshold. A quick & dirty way to expose the bug is to apply the appended patch, then observe that you get output from this only for mask_struct_store-*.c and not for mask_struct_load-*.