Hi Alex,

> It might be nice to at least experiment with supporting DRs with
> different steps as follow-on work.  I agree that we should leave it out
> for the first version to keep things simple.

> FWIW, in case it's of interest, I wrote a script to calculate the
> possible combinations of alignments that we could mutually peel for DRs
> with different steps.  The script also calculates the probability of
> having a viable combination of alignments at runtime, assuming the
> alignments are chosen independently and uniformly at random.  These are
> the results for N = 2 DRs (assuming target vector alignment is 16):

> DR sizes: [2, 1]
> -> DR alignments: [0, 8], peel_niters = 8
> -> DR alignments: [2, 9], peel_niters = 7
> -> DR alignments: [4, 10], peel_niters = 6
> -> DR alignments: [6, 11], peel_niters = 5
> -> DR alignments: [8, 12], peel_niters = 4
> -> DR alignments: [10, 13], peel_niters = 3
> -> DR alignments: [12, 14], peel_niters = 2
> -> DR alignments: [14, 15], peel_niters = 1
> Pr(success) = 8/127 = 6.30%

> DR sizes: [4, 1]
> -> DR alignments: [0, 12], peel_niters = 4
> -> DR alignments: [4, 13], peel_niters = 3
> -> DR alignments: [8, 14], peel_niters = 2
> -> DR alignments: [12, 15], peel_niters = 1
> Pr(success) = 4/63 = 6.35%

> DR sizes: [8, 1]
> -> DR alignments: [0, 14], peel_niters = 2
> -> DR alignments: [8, 15], peel_niters = 1
> Pr(success) = 2/31 = 6.45%

> DR sizes: [4, 2]
> -> DR alignments: [0, 8], peel_niters = 4
> -> DR alignments: [4, 10], peel_niters = 3
> -> DR alignments: [8, 12], peel_niters = 2
> -> DR alignments: [12, 14], peel_niters = 1
> Pr(success) = 4/31 = 12.90%

> DR sizes: [8, 2]
> -> DR alignments: [0, 12], peel_niters = 2
> -> DR alignments: [8, 14], peel_niters = 1
> Pr(success) = 2/15 = 13.33%

> DR sizes: [8, 4]
> -> DR alignments: [0, 8], peel_niters = 2
> -> DR alignments: [8, 12], peel_niters = 1
> Pr(success) = 2/7 = 28.57%

> in general as the number of DRs increases, the probability of success
> of this strategy tends to decrease.  The lowest probability combination
> with N = 3 is:

> DR sizes: [2, 1, 1]
> -> DR alignments: [0, 8, 8], peel_niters = 8
> -> DR alignments: [2, 9, 9], peel_niters = 7
> -> DR alignments: [4, 10, 10], peel_niters = 6
> -> DR alignments: [6, 11, 11], peel_niters = 5
> -> DR alignments: [8, 12, 12], peel_niters = 4
> -> DR alignments: [10, 13, 13], peel_niters = 3
> -> DR alignments: [12, 14, 14], peel_niters = 2
> -> DR alignments: [14, 15, 15], peel_niters = 1
> Pr(success) = 8/2047 = 0.39%

> but there are still some N = 3 combinations with reasonable chances of
> success:

> DR sizes: [8, 8, 2]
> -> DR alignments: [0, 0, 12], peel_niters = 2
> -> DR alignments: [8, 8, 14], peel_niters = 1
> Pr(success) = 2/31 = 6.45%

> DR sizes: [8, 4, 4]
> -> DR alignments: [0, 8, 8], peel_niters = 2
> -> DR alignments: [8, 12, 12], peel_niters = 1
> Pr(success) = 2/31 = 6.45%

> DR sizes: [8, 8, 4]
> -> DR alignments: [0, 0, 8], peel_niters = 2
> -> DR alignments: [8, 8, 12], peel_niters = 1
> Pr(success) = 2/15 = 13.33%

> So we could in theory attempt to calculate at compile time how likely
> the mutual peeling strategy is to succeed, or otherwise use some
> heuristic to decide whether to attempt it.  It seems like it might be
> worth doing at least for N = 2, especially if it seems to help in
> practice.

> I think the runtime alignment check boils down to calculating how many
> niters would be needed to peel the first DR, and then checking that you
> get the same result for the other DRs.  I.e. checking:

> distance(a1)/step(a1) = ... = distance(aN)/step(aN)

> where distance(dr) gives the distance required to peel, i.e.
> distance(a) = V - (a % V) where V is the target vector alignment.  Given
> the steps and V are powers of two known at compile time, it should be
> possible to optimize this check to be relatively efficient (although
> likely not quite as fast as the xor check for the same-step case).

Thank you for your suggestion and providing the probability data above. I agree
that supporting two DRs with different steps can serve as a future enhancement.
Currently, even when two DRs have different steps, the vectorized code path
still applies - specifically in common cases like one of the two unsupported DRs
is known aligned. In this case, do_peeling is turned off and versioning is
applied alone.

> Very minor, but I think this could technically overflow the tmp_name
> buffer.  The worst-case length of this string (including NT) is 9 + 2k
> where k is length(str(i)).  When k = 6, i.e. if we have > 100,000 DRs,
> then this will be a buffer overflow.  Presumably something else will
> fall over / fail vectorization well before we get to having quite so
> many DRs, but perhaps we should be more defensive here anyway, e.g. use
> snprintf, and/or a bigger buffer?

I'm currently investigating a spurious warning issue which seems related to
this patch. If a fix for that issue is necessary in this patch, I will address
this minor overflow issue in the meanwhile.

--
Thanks,
Pengfei

Reply via email to