> This would mean that StructFlags and ClassFlags will also both have a
> wrong value as well.
Yes, can confirm that m_flags = 0 (instead of 1) for a struct containing
a pointer.
> If there's a compiler/library discrepancy, the compiler should be
> adjusted to write out the value at the correct s
Hi,
the unicode tables in std.internal.unicode_tables are apparently
auto-generated and loaded at (libphobos) compile time. They are also in
little-endian format. Is the tool to generate them available somewhere?
I wanted to start converting them to little endian before loading but
this will pr
Hi,
> Are the values inside the tables the problem? Or just some of the
> helper functions/templates that interact with them to generate the
> static data?
>
> If the latter, then a rebuild of the files may not be necessary.
I managed to get this to work without rebuilding the files. After
chec
Hi,
this patch adds the pipeline description and the cpu model number for
arch13.
Bootstrapped and regtested on s390x.
Regards
Robin
--
gcc/ChangeLog:
2019-04-10 Robin Dapp
* config/s390/8561.md: New file.
* config/s390/driver-native.c (s390_host_detect_local_cpu): Add
Hi Rainer,
> This will occur on any 32-bit target. The following patch (using
> ssize_t instead) allowed the code to compile:
thanks, included your fix and attempted a more generic version of the
186 test.
I also continued debugging some fails further:
- Most of the MurmurHash fails are simply
Hi,
> + Establish an ANTI dependency between r11 and r15 restores from FPRs
> + to prevent the instructions scheduler from reordering them since
> + this would break CFI. No further handling in the sched_reorder
> + hook is required since the r11 and r15 restore will never appear in
> +
Hi Rainer,
> I noticed you missed one piece of Iain's typeinfo.cc patch, btw.:
>
> diff --git a/gcc/d/typeinfo.cc b/gcc/d/typeinfo.cc
> --- a/gcc/d/typeinfo.cc
> +++ b/gcc/d/typeinfo.cc
> @@ -886,7 +886,7 @@ public:
> if (cd->isCOMinterface ())
> flags |= ClassFlags::isCOMclass;
>
ll: all-am
+PWD_COMMAND = $${PWDCMD-pwd}
.SUFFIXES:
$(srcdir)/Makefile.in: @MAINTAINER_MODE_TRUE@ $(srcdir)/Makefile.am
$(am__configure_deps)
Regards
Robin
--
gcc/d/ChangeLog:
2019-04-24 Robin Dapp
* typeinfo.cc (create_typeinfo): Set fields with proper length.
gcc/testsuite/Change
> Robin, have you been testing with --disable-multilib or something
> similar?
yes, I believe so... stupid mistake :(
Thanks for fixing it so quickly.
Hi,
while trying to improve s390 code generation for rotate and shift I
noticed superfluous subregs for shift count operands. In our backend we
already have quite cumbersome patterns that would need to be duplicated
(or complicated further by more subst patterns) in order to get rid of
the subregs
>> Bit tests on x86 also truncate [1], if the bit base operand specifies
>> a register, and we don't use BT with a memory location as a bit base.
>> I don't know what is referred with "(real or pretended) bit field
>> operations" in the documentation for SHIFT_COUNT_TRUNCATED:
>>
>> However, o
> It would really help if you could provide testcases which show the
> suboptimal code and any analysis you've done.
I tried introducing a define_subst pattern that substitutes something
that one of two other subst patterns already changed.
The first subst pattern helps remove a superfluous and on the
Hi,
this patch adds -march=z900 to a test case that expects larl for loading
a value via the GOT. On z10 and later, lgrl is used which is tested in
a new test case.
Regards
Robin
--
gcc/testsuite/ChangeLog:
2019-05-15 Robin Dapp
* gcc.target/s390/global-array-element-pic.c: Add
Hi,
this patch changes three gen-vect testcases so they do not expect
vectorization of an unaligned access. Vectorization happens regardless;
we just ignore misalignment.
Regards
Robin
--
gcc/testsuite/ChangeLog:
2019-05-15 Robin Dapp
* gcc.dg/tree-ssa/gen-vect-26.c: Do not
Hi,
this patch implements vector copysign using vector select on S/390.
Regtested and bootstrapped on s390x.
Regards
Robin
--
gcc/ChangeLog:
2019-02-07 Robin Dapp
* config/s390/vector.md: Implement vector copysign.
gcc/testsuite/ChangeLog:
2019-02-07 Robin Dapp
extra function for now because I find
extract_range_from_binary_expr_1 somewhat lengthy and hard to follow
already :) Wouldn't it be better to "separate concerns"/split it up in
the long run and merge the functionality needed here at some time?
Bootstrapped and reg-tested on s390
gah, this
+ return true;
+ if (TREE_CODE (t1) != SSA_NAME)
should of course be like this
+ if (TREE_CODE (t1) != SSA_NAME)
+ return true;
in the last patch.
This causes a performance regression in the xalancbmk SPECint2006
benchmark on s390x. At first sight, the produced asm output doesn't look
too different but I'll have a closer look. Is the fwprop order supposed
to have major performance implications?
Regards
Robin
> This changes it from PRE on t
Ping.
diff --git a/gcc/gimple-match-head.c b/gcc/gimple-match-head.c
index 2beadbc..d66fcb1 100644
--- a/gcc/gimple-match-head.c
+++ b/gcc/gimple-match-head.c
@@ -39,6 +39,7 @@ along with GCC; see the file COPYING3. If not see
#include "internal-fn.h"
#include "case-cfn-macros.h"
#include "gimp
n
--
gcc/ChangeLog:
2017-10-17 Robin Dapp
* config/s390/s390.c (s390_bb_fallthru_entry_likely): New
function.
(s390_sched_init): Do not reset s390_sched_state if we entered
the current basic block via a fallthru edge and all others are
very unlikely.
di
> Preserving the sched state across basic blocks for your case works only if
> the BBs are traversed
> with the fall through edges coming first. Is that the case? We probably
> should have a description
> for s390_last_sched_state stating this.
Committed as attached with an additional comment an
> While the initialization value doesn't matter (wi::add will overwrite it)
> better initialize both to false ;) Ah, you mean because we want to
> transform only if get_range_info returned VR_RANGE. Indeed somewhat
> unintuitive (but still the best variant for now).
> so I'm still missing a comm
[3/3] Tests
--
gcc/testsuite/ChangeLog:
2017-07-05 Robin Dapp
* gcc.dg/wrapped-binop-simplify-signed-1.c: New test.
* gcc.dg/wrapped-binop-simplify-signed-2.c: New test.
* gcc.dg/wrapped-binop-simplify-unsigned-1.c: New test.
* gcc.dg/wrapped-binop-simplify
d the body_cost_vec parameter
which is not used elsewhere.
Regards
Robin
--
gcc/ChangeLog:
2017-07-12 Robin Dapp
* (vect_enhance_data_refs_alignment):
Remove body_cost_vec from _vect_peel_extended_info.
tree-vect-data-refs.c (vect_peeling_hash_get_lowest_cost):
D
Hi,
recently I wondered why a snippet like the following is not being
if-converted at all on s390:
int foo (int *a, unsigned int n)
{
int min = 99;
int bla = 0;
for (int i = 0; i < n; i++)
{
if (a[i] < min)
{
min = a[i];
bla = 1;
}
}
> Do you have an example where wrong code is generated through the
> noce_convert_multiple_sets_p path (with or without bodged costs)?
>
> Both AArch64 and x86-64 reject your testcase along this codepath because
> of the constant set of 1. If we work around that by setting bla = n rather
> than bl
ChangeLog:
2017-07-31 Robin Dapp
* MAINTAINERS (write after approval): Add myself.
Index: MAINTAINERS
===
--- MAINTAINERS (revision 250740)
+++ MAINTAINERS (working copy)
@@ -356,6 +356,7 @@
Lawrence Crowl
Ian Dall
> So the new part is the last point? There's a lot of refactoring in
3/3 that
> makes it hard to see what is actually changed ... you need to resist
> in doing this, it makes review very hard.
The new part is actually spread across the last three "-"s. Attached is
a new version of [3/3] split u
gcc/ChangeLog:
2017-05-08 Robin Dapp
* tree-vect-data-refs.c (vect_peeling_hash_choose_best_peeling):
Return peel info.
(vect_enhance_data_refs_alignment):
Compute full costs when peeling for unknown alignment, compare
to costs for peeling for known
gcc/ChangeLog:
2017-05-08 Robin Dapp
* tree-vect-data-refs.c (vect_peeling_hash_get_lowest_cost):
Remove unused variable.
(vect_enhance_data_refs_alignment):
Compare best peelings costs to doing no peeling and choose no
peeling if equal.
diff --git a
ping.
Included the requested changes in the patches (to follow). I have now
removed the alignment count check altogether.
> I'm not sure why you test for unlimited_cost_model here as I said
> elsewhere I'm not sure
> what not cost modeling means for static decisions. The purpose of
> unlimited_cost_model
>
gcc/ChangeLog:
2017-05-11 Robin Dapp
* tree-vectorizer.h (dr_misalignment): Introduce
DR_MISALIGNMENT_UNKNOWN.
* tree-vect-data-refs.c (vect_compute_data_ref_alignment): Refactoring.
(vect_update_misalignment_for_peel): Use DR_MISALIGNMENT_UNKNOWN
gcc/ChangeLog:
2017-05-11 Robin Dapp
* tree-vect-data-refs.c (vect_update_misalignment_for_peel): Change
comment and rename variable.
(vect_get_peeling_costs_all_drs): New function.
(vect_peeling_hash_get_lowest_cost): Use.
(vect_peeling_supportable
gcc/ChangeLog:
2017-05-11 Robin Dapp
* tree-vect-data-refs.c (vect_peeling_hash_choose_best_peeling):
Return peeling info and set costs to zero for unlimited cost
model.
(vect_enhance_data_refs_alignment): Also inspect all datarefs
with unknown
gcc/ChangeLog:
2017-05-11 Robin Dapp
* tree-vect-data-refs.c (vect_enhance_data_refs_alignment):
Remove check for supportable_dr_alignment, compute costs for
doing no peeling at all, compare to the best peeling costs so
far and do no peeling if cheaper.
diff
gcc/testsuite/ChangeLog:
2017-05-11 Robin Dapp
* gcc.target/s390/vector/vec-nopeel-2.c: New test.
diff --git a/gcc/testsuite/gcc.target/s390/vector/vec-nopeel-2.c b/gcc/testsuite/gcc.target/s390/vector/vec-nopeel-2.c
new file mode 100644
index 000..9b67793
--- /dev/null
+++ b/gcc
Included the workaround for SLP now. With it, the testsuite is clean on
x86 as well.
gcc/ChangeLog:
2017-05-11 Robin Dapp
* tree-vect-data-refs.c (vect_get_data_access_cost):
Workaround for SLP handling.
(vect_enhance_data_refs_alignment):
Remove check for
> Hmm, won't (uint32_t + uint32_t-CST) doesn't overflow be sufficient
> condition for such transformation?
Yes, in principle this should suffice. What we're actually looking for
is something like a "proper" (or no) overflow, i.e. an overflow in both
min and max of the value range. In
(a + cst1
This tries to fold unconditionally and fixes some test cases.
gcc/ChangeLog:
2017-05-18 Robin Dapp
* tree-ssa-propagate.c
(substitute_and_fold_dom_walker::before_dom_children):
Always try to fold.
gcc/testsuite/ChangeLog:
2017-05-18 Robin Dapp
* g++.dg/tree
match.pd part of the patch.
gcc/ChangeLog:
2017-05-18 Robin Dapp
* match.pd: Simplify wrapped binary operations.
* tree-vrp.c (extract_range_from_binary_expr_1): Add overflow
parameter.
(extract_range_from_binary_expr): Likewise.
* tree-vrp.h: Export
New testcases.
gcc/testsuite/ChangeLog:
2017-05-18 Robin Dapp
* gcc.dg/wrapped-binop-simplify-signed-1.c: New test.
* gcc.dg/wrapped-binop-simplify-unsigned-1.c: New test.
* gcc.dg/wrapped-binop-simplify-unsigned-2.c: New test.
diff --git a/gcc/testsuite/gcc.dg
> Any reason to expose tree-vrp.c internal interface here? The function
> looks quite expensive. Overflow check can be done by get_range_info
> and simple wi::cmp calls. Existing code like in
> tree-ssa-loop-niters.c already does that. Also could you avoid using
> comma expressions in condition
> I can guess what is happening here. It's a 40 bits unsigned long long
> field, (s.b-8) will be like:
> _1 = s.b
> _2 = _1 + 0xf8
> Also get_range_info returns value range [0, 0xFF] for _1.
> You'd need to check if _1(with range [0, 0xFF]) + 0xf8
> overflows agains
The last version of the patch series caused some regressions for ppc64.
This was largely due to incorrect handling of unsupportable alignment
and should be fixed with the new version.
p2 and p5 have not changed but I'm posting the whole series again for
reference. p1 only changed comment wording,
gcc/ChangeLog:
2017-05-23 Robin Dapp
* tree-vect-data-refs.c (vect_compute_data_ref_alignment):
Create DR_HAS_NEGATIVE_STEP.
(vect_update_misalignment_for_peel): Define DR_MISALIGNMENT.
(vect_enhance_data_refs_alignment): Use
gcc/ChangeLog:
2017-05-23 Robin Dapp
* tree-vect-data-refs.c (vect_update_misalignment_for_peel):
Rename.
(vect_get_peeling_costs_all_drs): Create function.
(vect_peeling_hash_get_lowest_cost):
Use vect_get_peeling_costs_all_drs
gcc/ChangeLog:
2017-05-23 Robin Dapp
* tree-vect-data-refs.c (vect_peeling_hash_choose_best_peeling):
Return peeling info and set costs to zero for unlimited cost
model.
(vect_enhance_data_refs_alignment): Also inspect all datarefs
with unknown
gcc/ChangeLog:
2017-05-23 Robin Dapp
* tree-vect-data-refs.c (vect_get_data_access_cost):
Workaround for SLP handling.
(vect_enhance_data_refs_alignment):
Compute costs for doing no peeling at all, compare to the best
peeling costs so far and avoid
gcc/testsuite/ChangeLog:
2017-05-23 Robin Dapp
* gcc.target/s390/vector/vec-nopeel-2.c: New test.
diff --git a/gcc/testsuite/gcc.target/s390/vector/vec-nopeel-2.c b/gcc/testsuite/gcc.target/s390/vector/vec-nopeel-2.c
new file mode 100644
index 000..9b67793
--- /dev/null
+++ b
> Not sure I've understood the series TBH, but is the npeel == vf / 2
> there specifically for the "unknown number of peels" case? How do
> we distinguish that from the case in which the number of peels is
> known to be vf / 2 at compile time? Or have I missed the point
> completely? (probably ye
but the old series itself (-p3)
doesn't apply to trunk anymore (because of the change in
vect_enhance_data_refs_alignment).
Regards
Robin
--
gcc/ChangeLog:
2017-05-24 Robin Dapp
* tree-vect-data-refs.c (vect_get_peeling_costs_all_drs):
Introduce unknown_misalignment
> Since this commit (r248678), I've noticed regressions on some arm targets.
> Executed from: gcc.dg/tree-ssa/tree-ssa.exp
> gcc.dg/tree-ssa/gen-vect-26.c scan-tree-dump-times vect "Alignment
> of access forced using peeling" 1
> gcc.dg/tree-ssa/gen-vect-26.c scan-tree-dump-times vect
> "
> Patch 6 breaks no-vfa-vect-57.c on powerpc.
Which CPU model (power6/7/8?) and which compile options (-maltivec/
-mpower8-vector?) have been used for running and compiling the test? As
discussed in
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80925
this has an influence on the cost function and
Perhaps I'm still missing how some cases are handled or not handled,
sorry for the noise.
> I'm not sure there is anything to "interpret" -- the operation is unsigned
> and overflow is when the operation may wrap around zero. There might
> be clever ways of re-writing the expression to
> (uint64_
Ping.
In short, I'm not sure how to differentiate between:
example range of a: [3,3]
(ulong)(a + UINT_MAX) + 1 --> (ulong)(a) + (ulong)(-1 + 1), sign extend
example range of a: [0,0]
(ulong)(a + UINT_MAX) + 1 --> (ulong)(a) + (ulong)(UINT_MAX + 1), no
sign extend
In this case, there is
Hi,
this patch fixes the vcond shift testcase that failed since setting
PARAM_MIN_VECT_LOOP_BOUND in the s390 backend.
Regards
Robin
--
gcc/testsuite/ChangeLog:
2017-03-27 Robin Dapp
* gcc.target/s390/vector/vcond-shift.c (void foo): Increase
iteration count and assume
Hi,
when looking at various vectorization examples on s390x I noticed that
we still peel vf/2 iterations for alignment even though vectorization
costs of unaligned loads and stores are the same as normal loads/stores.
A simple example is
void foo(int *restrict a, int *restrict b, unsigned int n)
Hi Bin,
> Seems Richi added code like below comparing costs between aligned and
> unsigned loads, and only peeling if it's beneficial:
>
> /* In case there are only loads with different unknown misalignments,
> use
> peeling only if it may help to align other accesses in the loop
> Note I was very conservative here to allow store bandwidth starved
> CPUs to benefit from aligning a store.
>
> I think it would be reasonable to apply the same heuristic to the
> store case that we only peel for same cost if peeling would at least
> align two refs.
Do you mean checking if peel
Hi,
> This one only works for known misalignment, otherwise it's overkill.
>
> OTOH if with some refactoring we can end up using a single cost model
> that would be great. That is for the SAME_ALIGN_REFS we want to
> choose the unknown misalignment with the maximum number of
> SAME_ALIGN_REFS. A
Some refactoring and definitions to use for (unknown) DR_MISALIGNMENT,
gcc/ChangeLog:
2017-04-26 Robin Dapp
* tree-data-ref.h (struct data_reference): Create DR_HAS_NEGATIVE_STEP.
* tree-vectorizer.h (dr_misalignment): Define DR_MISALIGNMENT.
* tree-vect-data-refs.c
Wrap some frequently used snippets in separate functions.
gcc/ChangeLog:
2017-04-26 Robin Dapp
* tree-vect-data-refs.c (vect_update_misalignment_for_peel): Rename.
(vect_get_peeling_costs_all_drs): Create function.
(vect_peeling_hash_get_lowest_cost):
Use
gcc/ChangeLog:
2017-04-26 Robin Dapp
* tree-vect-data-refs.c (vect_peeling_hash_get_lowest_cost):
Change cost model.
(vect_peeling_hash_choose_best_peeling): Return extended peel info.
(vect_peeling_supportable): Return peeling status.
diff --git a/gcc/tree
This patch introduces balancing of long-running instructions that may clog the
pipeline.
gcc/ChangeLog:
2017-10-11 Robin Dapp
* config/s390/s390.c (NUM_SIDES): New constant.
(LONGRUNNING_THRESHOLD): New constant.
(LATENCY_FACTOR): New constant
This patch fixes cases where we start a new group although the previous one has
not ended.
Regression tested on s390x.
gcc/ChangeLog:
2017-10-11 Robin Dapp
* config/s390/s390.c (s390_has_ok_fallthru): New function.
(s390_sched_score): Temporarily change s390_sched_state
I skimmed through the code to see where transformations like
(a - 1) -> (a + UINT_MAX) are performed. It seems there are only two
places, match.pd (/* A - B -> A + (-B) if B is easily negatable. */)
and fold-const.c.
In order to be able to reliably know whether to zero-extend or to
sign-extend the
Hi,
the following patch changes "nopr %r7" to "nopr %r0" which is
advantageous from a hardware perspective. It will only be emitted for
hotpatching and should not impact normal code.
Bootstrapped and regression tested on s390 and s390x.
Regards
Robin
gcc/ChangeLog:
20
ening.
Regards
Robin
[1] https://gcc.gnu.org/ml/gcc/2017-01/msg00234.html
[2] https://gcc.gnu.org/ml/gcc-patches/2016-05/msg01562.html
--
gcc/ChangeLog:
2017-03-02 Robin Dapp
* config/s390/s390.c (s390_option_override_internal): Set
PARAM_MIN_VECT_LOOP_BOUND
diff --git a/gc
s390x but did not yet perform bootstrapping
and more testing due to the premature nature of the patch.
Thanks
Robin
gcc/ChangeLog:
2016-03-17 Robin Dapp
* cfgloop.h (struct GTY): Add second number of iterations
* loop-doloop.c (doloop_condition_get): Fix whitespace
regressions on s390x and amd64.
Regards
Robin
--
gcc/ChangeLog:
2016-04-13 Robin Dapp
* tree-vectorizer.h
(dr_misalignment): Introduce named DR_MISALIGNMENT constants.
(aligned_access_p): Use constants.
(known_alignment_for_access_p): Likewise
t is usable despite the overflow. Do you think it
should be handled differently?
Revised version attached.
Regards
Robin
--
gcc/ChangeLog:
2016-09-20 Robin Dapp
PR middle-end/69526
This enables combining of wrapped binary operations and fixes
the tree level par
i_p().
ok to commit?
Regards
Robin
--
gcc/ChangeLog:
2016-09-26 Robin Dapp
* tree-vect-loop-manip.c (create_intersect_range_checks_index):
Add tree_fits_uhwi_p check.
diff --git a/gcc/tree-vect-loop-manip.c b/gcc/tree-vect-loop-manip.c
index 8203040..8be0c17 100644
--- a/gcc/t
(I didn't manage to run it independently in this
directory via RUNTESTFLAGS=vect.exp=... or otherwise)
Bootstrapped on x86 and s390.
--
gcc/ChangeLog:
2016-09-26 Robin Dapp
* tree-vect-loop-manip.c (create_intersect_range_checks_index):
Add tree_fits_shwi_p check.
g
> Also the '=' in the split line goes to the next line according to
> coding conventions.
fixed, I had only looked at an instance one function above which had it
wrong as well. Also changed comment grammar slightly.
Regards
Robin
--
gcc/ChangeLog:
2016-09-27 Robin Dapp
This introduces an ICE ("bogus comparison result type") on s390 for the
following test case:
#include
void foo(int dim)
{
int ba, sign;
ba = abs (dim);
sign = dim / ba;
}
Doing
diff --git a/gcc/match.pd b/gcc/match.pd
index ba7e013..2455592 100644
--- a/gcc/match.pd
+++ b/gcc/match.
Ping.
Ping :)
As described in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69526, we
currently fail to simplify cases like
(unsigned long)(a - 1) + 1
to
(unsigned long)a
when VRP knows that (a - 1) does not overflow.
This patch introduces a match.pd pattern as well as a helper function
that checks for overf
Ping.
>> + /* Sign-extend @1 to TYPE. */
>> + w1 = w1.from (w1, TYPE_PRECISION (type), SIGNED);
>>
>> not sure why you do always sign-extend. If the inner op is unsigned
>> and we widen then that's certainly bogus considering your UINT_MAX
>> example above. Does
>>
>>
Ping. Any idea how to tackle this?
> So we have (uint64_t)(uint32 + -1U) + 1 and using TYPE_SIGN (inner_type)
> produces (uint64_t)uint32 + -1U + 1. This simply means that we cannot ignore
> overflow of the inner operation and for some reason your change
> to extract_range_from_binary_expr didn't catch this. That is _8 + 429496729
Hi,
recently, I came across a problem that keeps a load instruction in a
loop although it is loop-invariant.
A simple example is:
#include
#define SZ 256
int a[SZ], b[SZ], c[SZ];
int main() {
int i;
for (i = 0; i < SZ; i++) {
a[i] = b[i] + c[i];
}
printf("%d\n", a[0]);
}
The re
Found some time to look into this again.
> Index: tree-ssa-propagate.c
> ===
> --- tree-ssa-propagate.c(revision 240133)
> +++ tree-ssa-propagate.c(working copy)
> @@ -1105,10 +1105,10 @@ substitute_and_fold_dom_walker
ree-level.
Bootstrapped and regression-tested on s390.
Regards
Robin
gcc/ChangeLog:
2015-12-15 Robin Dapp
* config/s390/s390.c (s390_expand_vcond): Convert vector
conditional into shift.
* config/s390/vector.md: Change operand predicate.
gcc/testsuite/ChangeLog:
2015-12-
Hi,
the attached patch renames the constm1_operand predicate to
all_ones_operand and introduces a check for int mode.
It should be applied on top of the last patch ([Patch] S/390: Simplify
vector conditionals).
Regtested on s390.
Regards
Robin
gcc/ChangeLog:
2015-12-15 Robin Dapp
Hi,
in compute_nregs_for_mode we expect that the current variable's mode is
at most as large as the biggest mode to be used for vectorization.
This might not be true for constants as they don't actually have a mode.
In that case, just use the biggest mode so max_number_of_live_regs
returns 1.
Th
> Quick question. We did something like this to aid internal
> testing/bringup. Our variant adjusted a ton of the mode iterators in
> vector-iterators.md and the TUPLE_ENTRY stuff in riscv-vector-switch.def.
>
> Robin, do you remember why you had to adjust all the iterators? Was it
> that LTO
Hi,
this is probably more of an RFC than a patch as I'm not sure whether
reassoc is the right place to fix it. On top, the heuristic might
be a bit "ad-hoc". Maybe we can also work around it in the vectorizer?
The following function is vectorized in a very inefficient way because we
construct ve
This patch adds an else operand to vectorized masked load calls.
The current implementation adds else-value arguments to the respective
target-querying functions that is used to supply the vectorizer with the
proper else value.
Right now, the only spot where a zero else value is actually enforced
This patch adds a zero else operand to masked loads, in particular the
masked gather load builtins that are used for gather vectorization.
gcc/ChangeLog:
* config/i386/i386-expand.cc (ix86_expand_special_args_builtin):
Add else-operand handling.
(ix86_expand_builtin): Ditt
This patch amends the documentation for masked loads (maskload,
vec_mask_load_lanes, and mask_gather_load as well as their len
counterparts) with an else operand.
gcc/ChangeLog:
* doc/md.texi: Document masked load else operand.
---
gcc/doc/md.texi | 63 ---
When predicating a load we implicitly assume that the else value is
zero. This matters in case the loaded value is padded (e.g.
a Bool) and we must ensure that the padding bytes are zero on targets
that don't implicitly zero inactive elements.
In order to formalize this, this patch queries th
This adds zero else operands to masked loads and their intrinsics.
I needed to adjust more than initially thought because we rely on
combine for several instructions and a change in a "base" pattern
needs to propagate to all those.
For lack of a better idea I used a function call property to s
This patch adds else operands to masked loads. Currently the default
else operand predicate accepts "undefined" (i.e. SCRATCH) as well as
all-ones values.
Note that this series introduces a large number of new RVV FAILs for
riscv. All of them are due to us not being able to elide redundant
vec_c
wer10, x86 and aarch64.
Regtested on rv64gcv.
Testing on GCN would be much appreciated.
Robin Dapp (8):
docs: Document maskload else operand and behavior.
ifn: Add else-operand handling.
tree-ifcvt: Enforce zero else value after maskload.
vect: Add maskload else value support.
aarch64
This patch adds else-operand handling to the internal functions.
gcc/ChangeLog:
* internal-fn.cc (add_mask_and_len_args): Rename...
(add_mask_else_and_len_args): ...to this and add else handling.
(expand_partial_load_optab_fn): Use adjusted function.
(expand_partia
This patch adds an undefined else operand to the masked loads.
gcc/ChangeLog:
* config/gcn/predicates.md (maskload_else_operand): New
predicate.
* config/gcn/gcn-valu.md: Use new predicate.
---
gcc/config/gcn/gcn-valu.md | 12
gcc/config/gcn/predicates.md |
> Interesting - this is bleh | bswap (..), right, so having
> bla1 | (bleh | bla2) fails to recognize bla1 | bla2 as bswap.
Yes, exactly.
> I'd expect this kind of pattern to fail bswap detection easily
> if you mangle it a bit. So possibly bswap detection should learn
> to better pick the "piec