> To my surprise, clang 6.0 is willing to generate vld1.8 when no
> particular CPU model is specified:
> https://godbolt.org/g/i5PqcQ
The vld1.8 in this sample is valid because the access is aligned to the
element size. Also, although this generator uses the hardfp ABI by
default, with gcc 7 (-march=armv7-a -mfpu=neon -mfloat-abi=hard -O3
-mno-unaligned-access -mthumb) the generated code is as follows:

    00000000 <aligned>:
       0:  f920 0adf   vld1.64 {d0-d1}, [r0 :64]
       4:  4770        bx      lr
       6:  bf00        nop

    00000008 <unaligned>:
       8:  b500        push    {lr}
       a:  b085        sub     sp, #20
       c:  4601        mov     r1, r0
       e:  2210        movs    r2, #16
      10:  4668        mov     r0, sp
      12:  f7ff fffe   bl      0 <memcpy>
      16:  f92d 0adf   vld1.64 {d0-d1}, [sp :64]
      1a:  b005        add     sp, #20
      1c:  f85d fb04   ldr.w   pc, [sp], #4

gcc does not optimize away the memcpy with -mno-unaligned-access, but
with -munaligned-access it uses ldr, stmia and vld1.64, not vld1.8.
Because ARM also supports big endian, *.16/*.32/*.64 cannot be replaced
with *.8 in all cases if both endiannesses are to be supported.

> Is unaligned NEON allowed on any ARMv7 CPU without trapping after all
> even if unaligned ALU loads/stores might not be?

An unaligned access made with an alignment identifier will trap. It
also depends on the element size.

-- Makoto Kato

On Thu, Mar 29, 2018 at 8:38 PM, Henri Sivonen <hsivo...@hsivonen.fi> wrote:
> On Thu, Mar 29, 2018 at 4:09 AM, Makoto Kato <mk...@mozilla.com> wrote:
> > Since SCTLR isn't allowed on userland, there is no way to detect unalignment
> > access support without trap. Generally, unalignement access causes SIGBUS,
> > so we might get a data from crash reporter. Android armv7-a ABI doesn't
> > define that hardware configuration has to set alignment bit of SCTLR, so we
> > should consider both unfortunately.
>
> To my surprise, clang 6.0 is willing to generate vld1.8 when no
> particular CPU model is specified:
> https://godbolt.org/g/i5PqcQ
>
> Is unaligned NEON allowed on any ARMv7 CPU without trapping after all
> even if unaligned ALU loads/stores might not be?
>
> > ARM document of Cortex-A8 says [*1], alignment identifier is 64
> > (VLD2.16@64), it requires 2 cycles, but alignment identifier is 128
> > (VLD2.16@128), it is 1 cycle. And on Cortex-A9, unalignment access requires
> > additional cycles [*2].
> ...
> > [*1]
> > http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0344h/ch16s06s07.html
> > [*2]
> > http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0344h/ch16s06s07.html
>
> Thank you. Was [*2] meant to be a different URL?
>
> On Wed, Mar 28, 2018 at 6:36 PM, Gregory Szorc <g...@mozilla.com> wrote:
> > Is
> > http://fastcompression.blogspot.fr/2015/08/accessing-unaligned-memory.html
> > and/or the comments for MEM_FORCE_MEMORY_ACCESS at
> > https://github.com/facebook/zstd/blob/dev/lib/common/mem.h useful?
>
> Thanks, but unfortunately these don't address my issue. These are
> about getting GCC to perform an unaligned load efficiently when the
> programmer has already decided to want an unaligned load.
>
> I'm trying to figure out whether it's worthwhile to spend cycles to
> move pointers to alignment if possible or whether it makes sense to
> just use unaligned operations unconditionally. (Also, GCC doesn't
> matter in my case, since I'm planning Rust code.)
>
> In non-ARMv7 cases my findings are that moving to alignment doesn't
> look empirically worthwhile on aarch64 (tested RPi3 and ThunderX,
> which both have in-order cores; should test an out-of-order core, but
> documentation supports the empirical results) or on Haswell
> (documentation indicates that the key is Nehalem or newer). On Core 2
> Duo, moving to alignment is worthwhile.
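The two options weighed in the quoted text above (first move to an
aligned address, or just use unaligned loads unconditionally) can be
sketched in Rust roughly as follows. This is only an illustration under
assumptions: the function names and the high-bit ("is this all ASCII?")
check are hypothetical stand-ins for a real workload, not code from this
thread, and the sketch makes no claim about which variant is faster on a
given ARMv7 core.

```rust
use std::ptr;

// Mask selecting the top bit of each byte in a 64-bit word.
const HIGH_BITS: u64 = 0x8080_8080_8080_8080;

/// Option 1 (hypothetical name): read 8-byte words with unaligned loads,
/// regardless of where the buffer starts.
fn is_ascii_unaligned(buf: &[u8]) -> bool {
    let mut chunks = buf.chunks_exact(8);
    for chunk in &mut chunks {
        // SAFETY: `chunk` is exactly 8 readable bytes, and `read_unaligned`
        // places no alignment requirement on the pointer.
        let word = unsafe { ptr::read_unaligned(chunk.as_ptr() as *const u64) };
        if word & HIGH_BITS != 0 {
            return false;
        }
    }
    // Fewer than 8 bytes are left over; check them one by one.
    chunks.remainder().iter().all(|&b| b < 0x80)
}

/// Option 2 (hypothetical name): handle the bytes before the first
/// 8-byte-aligned address separately, then use aligned word loads.
fn is_ascii_aligned(buf: &[u8]) -> bool {
    // SAFETY: every bit pattern is a valid `u64`, so viewing the aligned
    // middle part of a byte slice as `u64` words is sound.
    let (prefix, words, suffix) = unsafe { buf.align_to::<u64>() };
    prefix.iter().all(|&b| b < 0x80)
        && words.iter().all(|&w| w & HIGH_BITS == 0)
        && suffix.iter().all(|&b| b < 0x80)
}

fn main() {
    let mut data = vec![b'a'; 37];
    // Skip one byte so the slice most likely starts misaligned for u64.
    assert!(is_ascii_unaligned(&data[1..]));
    assert!(is_ascii_aligned(&data[1..]));

    data[20] = 0xC3; // a non-ASCII byte
    assert!(!is_ascii_unaligned(&data[1..]));
    assert!(!is_ascii_aligned(&data[1..]));
    println!("both strategies agree");
}
```

What `read_unaligned` actually compiles to (a single load, or a
byte-wise sequence) depends on the target and its enabled features, so
the generated assembly is worth checking per target.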
>
> --
> Henri Sivonen
> hsivo...@hsivonen.fi
> https://hsivonen.fi/
> _______________________________________________
> dev-platform mailing list
> dev-platform@lists.mozilla.org
> https://lists.mozilla.org/listinfo/dev-platform