On Fri, Jan 3, 2014 at 12:59 PM, Jakub Jelinek <ja...@redhat.com> wrote: > On Fri, Jan 03, 2014 at 12:25:00PM +0100, Uros Bizjak wrote: >> I am testing a patch that removes "max_align" part from ix86_data_alignment. > > That looks like unnecessary pessimization. Note the hunk in question is > guarded with opt, which means it is an optimization rather than ABI issue, > it can increase alignment, but the compiler can only assume the increased > alignment if the symbol is not public or if it is public, but can't be > preempted by another TU's definition. Even in that case it can be > worthwhile to increase the alignment, say if doing versioning for alignment, > or say just doing some AVX/AVX2/AVX512F version of memcpy/memset that can > handle it faster if it is sufficiently aligned by testing it at runtime. > > So, IMHO we shouldn't drop it, just improve it. > Perhaps: > > int max_align = optimize_size ? BITS_PER_WORD > : MIN (512, MAX_OFILE_ALIGNMENT); > > if (opt > && AGGREGATE_TYPE_P (type) > && TYPE_SIZE (type) > && TREE_CODE (TYPE_SIZE (type)) == INTEGER_CST > && align < max_align) > { > int this_align = 256; > for (this_align = 256; this_align <= max_align; this_align *= 2) > if (TREE_INT_CST_LOW (TYPE_SIZE (type)) < (unsigned) this_align > && !TREE_INT_CST_HIGH (TYPE_SIZE (type))) > break; > else if (align < this_align) > align = this_align; > } > > which will handle both the 256 bit alignment for >= 256 bit objects, > 512 bit alignment for >= 512 bit objects and will be prepared for the > future. > > 128 bit I think doesn't need to be handled, DATA_ALIGNMENT has been > using 256-bit test already since it's introduction in 1998: > http://gcc.gnu.org/ml/gcc-bugs/1998-03/msg00011.html
Thanks for the pointer, there is indeed the recommendation in optimization manual [1], section 3.6.4, where it is said: --quote-- Misaligned data access can incur significant performance penalties. This is particularly true for cache line splits. The size of a cache line is 64 bytes in the Pentium 4 and other recent Intel processors, including processors based on Intel Core microarchitecture. An access to data unaligned on 64-byte boundary leads to two memory accesses and requires several μops to be executed (instead of one). Accesses that span 64-byte boundaries are likely to incur a large performance penalty, the cost of each stall generally are greater on machines with longer pipelines. ... A 64-byte or greater data structure or array should be aligned so that its base address is a multiple of 64. Sorting data in decreasing size order is one heuristic for assisting with natural alignment. As long as 16- byte boundaries (and cache lines) are never crossed, natural alignment is not strictly necessary (though it is an easy way to enforce this). --/quote-- So, this part has nothing to do with AVX512, but with cache line width. And we do have a --param "l1-cache-line-size=64", detected with -march=native that could come handy here. This part should be rewritten (and commented) with the information above in mind. [1] http://www.intel.com/content/www/us/en/architecture-and-technology/64-ia-32-architectures-optimization-manual.html Uros.