Hi Alex, > It might be nice to at least experiment with supporting DRs with > different steps as follow-on work. I agree that we should leave it out > for the first version to keep things simple.
> FWIW, in case it's of interest, I wrote a script to calculate the > possible combinations of alignments that we could mutually peel for DRs > with different steps. The script also calculates the probability of > having a viable combination of alignments at runtime, assuming the > alignments are chosen independently and uniformly at random. These are > the results for N = 2 DRs (assuming target vector alignment is 16): > DR sizes: [2, 1] > -> DR alignments: [0, 8], peel_niters = 8 > -> DR alignments: [2, 9], peel_niters = 7 > -> DR alignments: [4, 10], peel_niters = 6 > -> DR alignments: [6, 11], peel_niters = 5 > -> DR alignments: [8, 12], peel_niters = 4 > -> DR alignments: [10, 13], peel_niters = 3 > -> DR alignments: [12, 14], peel_niters = 2 > -> DR alignments: [14, 15], peel_niters = 1 > Pr(success) = 8/127 = 6.30% > DR sizes: [4, 1] > -> DR alignments: [0, 12], peel_niters = 4 > -> DR alignments: [4, 13], peel_niters = 3 > -> DR alignments: [8, 14], peel_niters = 2 > -> DR alignments: [12, 15], peel_niters = 1 > Pr(success) = 4/63 = 6.35% > DR sizes: [8, 1] > -> DR alignments: [0, 14], peel_niters = 2 > -> DR alignments: [8, 15], peel_niters = 1 > Pr(success) = 2/31 = 6.45% > DR sizes: [4, 2] > -> DR alignments: [0, 8], peel_niters = 4 > -> DR alignments: [4, 10], peel_niters = 3 > -> DR alignments: [8, 12], peel_niters = 2 > -> DR alignments: [12, 14], peel_niters = 1 > Pr(success) = 4/31 = 12.90% > DR sizes: [8, 2] > -> DR alignments: [0, 12], peel_niters = 2 > -> DR alignments: [8, 14], peel_niters = 1 > Pr(success) = 2/15 = 13.33% > DR sizes: [8, 4] > -> DR alignments: [0, 8], peel_niters = 2 > -> DR alignments: [8, 12], peel_niters = 1 > Pr(success) = 2/7 = 28.57% > in general as the number of DRs increases, the probability of success > of this strategy tends to decrease. The lowest probability combination > with N = 3 is: > DR sizes: [2, 1, 1] > -> DR alignments: [0, 8, 8], peel_niters = 8 > -> DR alignments: [2, 9, 9], peel_niters = 7 > -> DR alignments: [4, 10, 10], peel_niters = 6 > -> DR alignments: [6, 11, 11], peel_niters = 5 > -> DR alignments: [8, 12, 12], peel_niters = 4 > -> DR alignments: [10, 13, 13], peel_niters = 3 > -> DR alignments: [12, 14, 14], peel_niters = 2 > -> DR alignments: [14, 15, 15], peel_niters = 1 > Pr(success) = 8/2047 = 0.39% > but there are still some N = 3 combinations with reasonable chances of > success: > DR sizes: [8, 8, 2] > -> DR alignments: [0, 0, 12], peel_niters = 2 > -> DR alignments: [8, 8, 14], peel_niters = 1 > Pr(success) = 2/31 = 6.45% > DR sizes: [8, 4, 4] > -> DR alignments: [0, 8, 8], peel_niters = 2 > -> DR alignments: [8, 12, 12], peel_niters = 1 > Pr(success) = 2/31 = 6.45% > DR sizes: [8, 8, 4] > -> DR alignments: [0, 0, 8], peel_niters = 2 > -> DR alignments: [8, 8, 12], peel_niters = 1 > Pr(success) = 2/15 = 13.33% > So we could in theory attempt to calculate at compile time how likely > the mutual peeling strategy is to succeed, or otherwise use some > heuristic to decide whether to attempt it. It seems like it might be > worth doing at least for N = 2, especially if it seems to help in > practice. > I think the runtime alignment check boils down to calculating how many > niters would be needed to peel the first DR, and then checking that you > get the same result for the other DRs. I.e. checking: > distance(a1)/step(a1) = ... = distance(aN)/step(aN) > where distance(dr) gives the distance required to peel, i.e. > distance(a) = V - (a % V) where V is the target vector alignment. Given > the steps and V are powers of two known at compile time, it should be > possible to optimize this check to be relatively efficient (although > likely not quite as fast as the xor check for the same-step case). Thank you for your suggestion and providing the probability data above. I agree that supporting two DRs with different steps can serve as a future enhancement. Currently, even when two DRs have different steps, the vectorized code path still applies - specifically in common cases like one of the two unsupported DRs is known aligned. In this case, do_peeling is turned off and versioning is applied alone. > Very minor, but I think this could technically overflow the tmp_name > buffer. The worst-case length of this string (including NT) is 9 + 2k > where k is length(str(i)). When k = 6, i.e. if we have > 100,000 DRs, > then this will be a buffer overflow. Presumably something else will > fall over / fail vectorization well before we get to having quite so > many DRs, but perhaps we should be more defensive here anyway, e.g. use > snprintf, and/or a bigger buffer? I'm currently investigating a spurious warning issue which seems related to this patch. If a fix for that issue is necessary in this patch, I will address this minor overflow issue in the meanwhile. -- Thanks, Pengfei