On Sunday 4 September 2022, 9.39.36 EEST, Lynne wrote:
> In particular, doing the tail, which consists of 2 equal length transforms.
> On AVX we interleave the coefficients from 2x4pt transforms during
> lookups since we can do them simultaneously and save on
> shuffles. Doing them individually wouldn't be as efficient.
I'm not going to boldly state that one size fits all, because I am pretty sure that it would come back to bite me in soft and sensitive tissue. But unlike fixed-width SIMD extensions, RISC-V V and ARM SVE favour the use of offsets and masks to deal with misaligned edges, so I'm not sure how useful the insights from AVX are.

> > And besides, how do you want to get the value if not with assembler? This
> > is currently not found in ELF HWCAP and probably never will be.
> Sucks, knowing how wide the units are is as important as
> knowing how much L1 cache you have for me.

I understand that for some multidimensional calculations, you need to make special cases. The obvious case would be if the vector is too short to fit a column or row of elements whilst performing a transposition. But even then, and even if we end up later on with, say, an arch_prctl() call to find the vector size, I don't think exposing it in CPU flags would be a good idea. VSETVL and VSETIVL also account for the element size and the vector group multiplier, so it seems better to use either of them than to reimplement the same logic in C based on the raw vector bit length.

-- 
レミ・デニ-クールモン
http://www.remlab.net/
_______________________________________________
ffmpeg-devel mailing list
[email protected]
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
[email protected] with subject "unsubscribe".
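For illustration only, here is a host-side C model (it executes no RISC-V instructions) of the arithmetic that vsetvl/vsetvli perform under the RISC-V V 1.0 specification: VLMAX = LMUL * VLEN / SEW, and vl is then derived from the requested application vector length (AVL). The function names and the numerator/denominator encoding of fractional LMUL are hypothetical. The specification leaves implementations some leeway when VLMAX < AVL < 2 * VLMAX; this model simply picks min(AVL, VLMAX), and that leeway is one more reason not to duplicate the computation in C. The strip-mined loop shows how a short final pass covers the tail without a separate scalar epilogue.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of VLMAX = LMUL * VLEN / SEW.
 * LMUL may be fractional, so it is passed as lmul_num / lmul_den
 * (e.g. LMUL = 1/2 is lmul_num = 1, lmul_den = 2). */
static size_t model_vlmax(unsigned vlen_bits, unsigned sew_bits,
                          unsigned lmul_num, unsigned lmul_den)
{
    assert(sew_bits == 8 || sew_bits == 16 || sew_bits == 32 || sew_bits == 64);
    assert(lmul_num == 1 || lmul_den == 1); /* either integer or fractional */
    return (size_t)vlen_bits * lmul_num / ((size_t)sew_bits * lmul_den);
}

/* The spec fixes vl = AVL when AVL <= VLMAX and vl = VLMAX when
 * AVL >= 2 * VLMAX; in between, hardware may return any value from
 * ceil(AVL / 2) to VLMAX. This model always returns min(AVL, VLMAX). */
static size_t model_vsetvl(size_t avl, size_t vlmax)
{
    return avl < vlmax ? avl : vlmax;
}

/* Strip-mined loop: each pass processes vl elements, so the last, shorter
 * pass handles the tail. The inner loop stands in for a single vector
 * instruction operating on vl lanes. */
static void add_arrays(int32_t *dst, const int32_t *a, const int32_t *b,
                       size_t n, size_t vlmax)
{
    assert(vlmax > 0); /* VLMAX of 0 would mean an unsupported configuration */
    while (n > 0) {
        size_t vl = model_vsetvl(n, vlmax);
        for (size_t i = 0; i < vl; i++)
            dst[i] = a[i] + b[i];
        dst += vl;
        a += vl;
        b += vl;
        n -= vl;
    }
}
```

On real hardware the equivalent step is a single instruction such as `vsetvli t0, a0, e32, m1, ta, ma`, which takes the AVL from a0 and writes the granted vl to t0, so the code never needs to know the raw VLEN.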
