Sorry for the late reply!
-----Original Message-----
From: Richard Biener [mailto:[email protected]]
Sent: Wednesday, July 22, 2020 3:02 PM
> First of all I think giving users more control over vectorization is
> good. Now as for "#pragma GCC no_reduc_chain" I'd like to avoid
> negatives and terms internal to GCC. I also would like to see
> vectorization pragmas to be grouped somehow, also to avoid bit
> explosion in struct loop. There's already annot_expr_no_vector_kind
> and annot_expr_vector_kind, both only used by the Fortran FE at the
> moment. Note ANNOTATE_EXPR already allows an extra argument, thus only
> annot_expr_vector_kind should prevail, with its argument specifying a
> bitmask of vectorizer hints. We'd have an extra enum for those like
>
> enum annot_vector_subkind {
> annot_vector_never = 0,
> annot_vector_auto = 1, // this is the default
> annot_vector_always = 3,
> /* your new flag */
> };
>
> and the user would specify it via
>
> #pragma GCC vect [(never|always|auto)] [your new flag]
I'll add code to implement the modifications above.
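For reference, here is a minimal sketch of how the pragma words could map onto the proposed hint bitmask. It is plain C rather than GCC front-end code. The enum mirrors the one proposed above, while ANNOT_VECTOR_MODE_MASK, parse_vect_pragma, the two query helpers, the "new_flag" keyword and annot_vector_new_flag are hypothetical placeholders, since the new flag's name is still open.

```c
#include <assert.h>
#include <string.h>

/* Mirror of the proposed enum.  The low two bits hold the never/auto/always
   mode; higher bits are independent hint flags.  annot_vector_new_flag is a
   hypothetical placeholder for the flag that still needs a name.  */
enum annot_vector_subkind {
  annot_vector_never = 0,
  annot_vector_auto = 1,      /* the default */
  annot_vector_always = 3,    /* includes the auto bit */
  annot_vector_new_flag = 4   /* hypothetical placeholder */
};

#define ANNOT_VECTOR_MODE_MASK 3   /* hypothetical: bits holding the mode */

/* Translate the words following "#pragma GCC vect" into a hint bitmask.
   Mode words replace the current mode (the last one wins); flag words are
   OR-ed in.  Returns -1 for an unrecognized word.  */
static int
parse_vect_pragma (const char *const *words, int nwords)
{
  int mask = annot_vector_auto;
  assert (words != NULL || nwords == 0);
  for (int i = 0; i < nwords; i++)
    {
      if (strcmp (words[i], "never") == 0)
        mask = (mask & ~ANNOT_VECTOR_MODE_MASK) | annot_vector_never;
      else if (strcmp (words[i], "auto") == 0)
        mask = (mask & ~ANNOT_VECTOR_MODE_MASK) | annot_vector_auto;
      else if (strcmp (words[i], "always") == 0)
        mask = (mask & ~ANNOT_VECTOR_MODE_MASK) | annot_vector_always;
      else if (strcmp (words[i], "new_flag") == 0)
        mask |= annot_vector_new_flag;
      else
        return -1;
    }
  return mask;
}

/* Query helpers a consumer of the annotation could use.  */
static int
vect_hint_mode (int mask)
{
  return mask & ANNOT_VECTOR_MODE_MASK;
}

static int
vect_hint_has_flag (int mask, int flag)
{
  return (mask & flag) != 0;
}
```

Because always (3) includes the auto bit, never/auto/always behave as a two-bit mode, while extra hints live in the higher bits; the resulting integer is what I'd expect to pass as the extra ANNOTATE_EXPR argument.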
> Now, I honestly have difficulty suggesting a better name than
> no_reduc_chain. Quoting the testcase:
>
> +double f(double *a, double *b)
> +{
> + double res1 = 0;
> + double res0 = 0;
> +#pragma GCC no_reduc_chain
> + for (int i = 0 ; i < 1000; i+=4) {
> + res0 += a[i] * b[i];
> + res1 += a[i+1] * b[i+1];
> + res0 += a[i+2] * b[i+2];
> + res1 += a[i+3] * b[i+3];
> + }
> + return res0 + res1;
> +}
>
> For your case, with V2DF vectors IIRC, using reduction chains will
> result in a vectorization factor of two, while with an SLP reduction the
> vectorization factor is one.
I rechecked this. Using reduction chains also results in a
vectorization factor of one, the same as using SLP reductions.
The relevant dump excerpts follow:
1. Using reduction chains:
pr96053.c:6:3: note: Final SLP tree for instance:
pr96053.c:6:3: note: node 0x1a4f4ba0 (max_nunits=2, refcnt=2)
pr96053.c:6:3: note: stmt 0 res1_37 = _14 + res1_48;
pr96053.c:6:3: note: stmt 1 res1_39 = _28 + res1_37;
...
pr96053.c:6:3: note: === vect_make_slp_decision ===
pr96053.c:6:3: note: Decided to SLP 2 instances. Unrolling factor 1
pr96053.c:6:3: note: === vect_detect_hybrid_slp ===
pr96053.c:6:3: note: === vect_update_vf_for_slp ===
pr96053.c:6:3: note: Loop contains only SLP stmts
pr96053.c:6:3: note: Updating vectorization factor to 1.
pr96053.c:6:3: note: vectorization_factor = 1, niters = 250
2. Using SLP reductions:
pr96053.c:6:3: note: Final SLP tree for instance:
pr96053.c:6:3: note: node 0x3a7f2cb0 (max_nunits=2, refcnt=2)
pr96053.c:6:3: note: stmt 0 res0_38 = _21 + res0_36;
pr96053.c:6:3: note: stmt 1 res1_39 = _28 + res1_37;
...
pr96053.c:6:3: note: === vect_make_slp_decision ===
pr96053.c:6:3: note: Decided to SLP 1 instances. Unrolling factor 1
pr96053.c:6:3: note: === vect_detect_hybrid_slp ===
pr96053.c:6:3: note: === vect_update_vf_for_slp ===
pr96053.c:6:3: note: Loop contains only SLP stmts
pr96053.c:6:3: note: Updating vectorization factor to 1.
pr96053.c:6:3: note: vectorization_factor = 1, niters = 250
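To make the two dumps easier to compare, below is a simplified C model of both vectorized forms with V2DF (two doubles per vector) at VF 1. It is only an illustration, not GCC's generated code: the v2df struct stands in for a vector register, the function names are made up, and the second statement is assumed to read b[i+1] like its neighbours. In both forms one vector iteration covers one scalar iteration (i += 4), which matches niters = 250 in both dumps; they differ in how the partial sums are associated.

```c
#include <stddef.h>

/* Two-lane stand-in for a V2DF vector register (illustration only).  */
typedef struct { double lane[2]; } v2df;

/* Scalar reference: the testcase loop.  n must be a multiple of 4.  */
static double
dot_scalar (const double *a, const double *b, size_t n)
{
  double res0 = 0, res1 = 0;
  for (size_t i = 0; i < n; i += 4)
    {
      res0 += a[i] * b[i];
      res1 += a[i + 1] * b[i + 1];
      res0 += a[i + 2] * b[i + 2];
      res1 += a[i + 3] * b[i + 3];
    }
  return res0 + res1;
}

/* SLP reduction, VF 1: a single accumulator whose lane 0 is res0 and
   lane 1 is res1.  Each scalar iteration becomes two contiguous 2-lane
   multiply-adds, and each lane sees its terms in scalar order.  */
static double
dot_slp_reduction (const double *a, const double *b, size_t n)
{
  v2df acc = { { 0, 0 } };
  for (size_t i = 0; i < n; i += 4)
    for (size_t k = 0; k < 4; k += 2)
      for (int l = 0; l < 2; l++)
        acc.lane[l] += a[i + k + l] * b[i + k + l];
  return acc.lane[0] + acc.lane[1];
}

/* Reduction chains, VF 1: one accumulator per chain (two SLP instances).
   r0 keeps the a[i]*b[i] and a[i+2]*b[i+2] terms of res0 in separate
   lanes (likewise r1 for res1), so each sum is only formed at the end.  */
static double
dot_reduction_chains (const double *a, const double *b, size_t n)
{
  v2df r0 = { { 0, 0 } }, r1 = { { 0, 0 } };
  for (size_t i = 0; i < n; i += 4)
    for (int l = 0; l < 2; l++)
      {
        r0.lane[l] += a[i + 2 * l] * b[i + 2 * l];
        r1.lane[l] += a[i + 1 + 2 * l] * b[i + 1 + 2 * l];
      }
  return (r0.lane[0] + r0.lane[1]) + (r1.lane[0] + r1.lane[1]);
}
```

In this model the SLP reduction reproduces the scalar summation order per accumulator, whereas the reduction chains reassociate each accumulator, so with general floating-point inputs the chain result can differ slightly from the scalar one; for small integer-valued inputs all three agree exactly.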
> So maybe it is better to give the user control over the vectorization factor?
> That's desirable in other cases where the user wants to force a larger
> VF to get extra unrolling for example. For the testcase above you'd
> use
>
> #pragma GCC vect vf(1)
>
> or so (syntax to be discussed). The side-effect would be that with a
> reduction chain the VF request cannot be fulfilled but with an SLP
> reduction it can. Of course no_reduc_chain is much easier to actually
> implement in a strict way while specifying VF will likely need to be
> documented as a hint (possibly with a diagnostic if it isn't
> fulfilled).
>
> Richard/Jakub, any thoughts?
>
> Thanks,
> Richard.
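On the vf(N) idea: if it is documented as a hint, the decision could look roughly like the hypothetical sketch below. choose_vf and its parameters are made up for illustration; in particular it assumes the analysis can report a minimum supported VF (min_vf) and that any multiple of it is acceptable, which is a simplification.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical: resolve a "vf(N)" request as a hint.  requested_vf == 0
   means no vf(N) clause was given.  A request that is a multiple of
   min_vf is honoured; otherwise the vectorizer's default is kept and a
   note is written to DIAG (if non-NULL).  */
static unsigned
choose_vf (unsigned requested_vf, unsigned min_vf, unsigned default_vf,
           FILE *diag)
{
  assert (min_vf > 0);
  if (requested_vf == 0)
    return default_vf;
  if (requested_vf % min_vf == 0)
    return requested_vf;
  if (diag)
    fprintf (diag, "note: requested vectorization factor %u not possible, "
             "using %u\n", requested_vf, default_vf);
  return default_vf;
}
```

Under Richard's description, a reduction chain would correspond to min_vf = 2 and an SLP reduction to min_vf = 1, so vf(1) would only be honoured for the latter; however, the dumps above show VF 1 for both in this testcase.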
Thanks,
Kaipeng Zhou