Hi,
> This one only works for known misalignment, otherwise it's overkill.
>
> OTOH if with some refactoring we can end up using a single cost model
> that would be great. That is for the SAME_ALIGN_REFS we want to
> choose the unknown misalignment with the maximum number of
> SAME_ALIGN_REFS. And if we know the misalignment of a single
> ref then we still may want to align an unknown misalign ref if that has
> more SAME_ALIGN_REFS (I think we always choose the known-misalign
> one currently).
[0/3]
Attempt to unify the peeling cost model as follows:
- Keep the treatment of known misalignments.
- Save the load and store with the most frequent misalignment.
- Compare their costs to determine which one the hardware prefers.
- Choose between the best peeling with known misalignment and the
best peeling with unknown misalignment according to the number of
aligned data refs.
- Calculate costs for leaving everything misaligned and compare with
the best peeling so far.
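
To make the last two steps concrete, here is a minimal standalone sketch.
It is not the code from [3/3]: struct peel_candidate, choose_peeling() and
the cost fields are hypothetical names, and both the tie-break (prefer the
known misalignment on equal counts) and the plain sum of inside and
outside cost are assumptions made for illustration only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical summary of one peeling candidate; not a GCC type.  */
struct peel_candidate
{
  bool valid;             /* Was a candidate of this kind found at all?  */
  unsigned aligned_refs;  /* Data refs that become aligned by peeling.  */
  unsigned inside_cost;   /* Assumed cost of the vector loop body.  */
  unsigned outside_cost;  /* Assumed prologue plus epilogue cost.  */
};

enum peel_choice { PEEL_NONE, PEEL_KNOWN, PEEL_UNKNOWN };

static unsigned
total_cost (const struct peel_candidate *c)
{
  return c->inside_cost + c->outside_cost;
}

/* Choose between the best known-misalignment candidate, the best
   unknown-misalignment candidate, and leaving everything misaligned.  */
enum peel_choice
choose_peeling (const struct peel_candidate *known,
                const struct peel_candidate *unknown,
                unsigned no_peel_cost)
{
  assert (known != NULL && unknown != NULL);

  const struct peel_candidate *best = NULL;
  enum peel_choice choice = PEEL_NONE;

  if (known->valid)
    {
      best = known;
      choice = PEEL_KNOWN;
    }

  /* Assumption: the unknown-misalignment candidate only wins if it
     aligns strictly more data refs than the known one.  */
  if (unknown->valid
      && (best == NULL || unknown->aligned_refs > best->aligned_refs))
    {
      best = unknown;
      choice = PEEL_UNKNOWN;
    }

  /* Finally compare against not peeling at all.  */
  if (best == NULL || no_peel_cost <= total_cost (best))
    return PEEL_NONE;

  return choice;
}
```

With a known candidate aligning two refs and an unknown one aligning
three, the sketch picks the unknown one unless not peeling is at least as
cheap.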
I also did some refactoring ([1/3] and [2/3]) that seemed necessary
while writing the patch. It is no longer strictly required, but imho it
makes the code easier to understand. The bulk of the changes is in [3/3].
The testsuite is clean on i386 and s390x. Some additional test cases
won't hurt and I will add them later. However, I did not succeed in
defining a test case with two datarefs that have the same but unknown
misalignment. How can this be done?
A thing I did not understand when going over the existing code: In
vect_get_known_peeling_cost() we have
      /* If peeled iterations are known but number of scalar loop
         iterations are unknown, count a taken branch per peeled loop. */
      retval = record_stmt_cost (prologue_cost_vec, 1, cond_branch_taken,
                                 NULL, 0, vect_prologue);
      retval = record_stmt_cost (prologue_cost_vec, 1, cond_branch_taken,
                                 NULL, 0, vect_epilogue);
In all uses of the function, prologue_cost_vec is discarded afterwards
and only the return value is used. Should the second statement read
retval +=? This code is only executed when the number of loop iterations
is unknown. Currently we indeed count only one taken branch, but then
why call record_stmt_cost twice and discard the first retval?
Regards
Robin