Re: RFC: -mimplicit-it and GCC upstream
On Wed, Nov 17, 2010 at 11:22 AM, Dave Martin wrote:
> On Wed, Nov 17, 2010 at 2:53 AM, Michael Hope wrote:
>> In general the product should move forward and drop work-arounds like
>> -mimplicit-it. We (the greater ARM community) should fix these
>> package problems as they are found. Here's a bunch of quick-fire

[...]

Having discussed this further with Richard, it sounds like there are
enough issues blocking -mimplicit-it upstream that we should not expect
it to be supported by default upstream in the foreseeable future:

* -mimplicit-it disables some important sanity-checking on the compiler
  output (by not checking compiler-generated ITs, or the absence
  thereof). We could in principle make the assembler only inject ITs in
  between #APP and #NO_APP, but the assembler doesn't support this yet;
  nor does any arm gcc I've seen systematically generate these
  directives for inline asms; so implementing this would probably result
  in a flag day when everyone has to move to up-to-date gcc and gas.
  Upstreams are unlikely to go for that.

* With -mimplicit-it, the compiler must be pretty conservative about
  inline asm block size (assuming 6 bytes per statement). That's
  feasible, but very suboptimal, and is likely to result in the need for
  yet another compiler option to turn it on; again, this is unlikely to
  become the default upstream.

* Ad hoc workarounds can be used, such as wrapping GCC to compile in
  multiple passes so that the correct inline asm size for each block can
  be determined. But such approaches are likely to be too cumbersome to
  get merged in any project.

So I've now come round to the view that we _should_ probably bite the
bullet and fix the inline asm directly. So:

* We need to verify which binutils permit (and ignore) the IT
  instructions in non-unified (ARM) syntax. I've observed that 2.19.1
  definitely supports this; I don't know about earlier versions -- this
  is probably something the toolchain group should investigate.

* We should be proactive about making these changes upstream. Writing
  some standard wording to explain the reason for the change and the
  likely impact would probably be a good idea.

Cheers
---Dave

___
linaro-toolchain mailing list
linaro-toolchain@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-toolchain
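As an illustration of what "fix the inline asm directly" could look like in a package, here is a minimal sketch in C. The function name max_explicit_it is hypothetical and not taken from any project discussed in the thread. The idea is that the conditional instruction is preceded by an explicit IT, so the code assembles for Thumb-2 without -mimplicit-it; for ARM mode it relies on the assembler accepting the IT, which the message above reports for binutils 2.19.1 in non-unified syntax. A plain C fallback is included so the sketch also builds on non-ARM hosts.

```c
/* Hypothetical helper: returns the larger of a and b. */
static int max_explicit_it(int a, int b)
{
#if defined(__arm__) && (!defined(__thumb__) || defined(__thumb2__))
    int result = b;                 /* start with b, overwrite if a > b */
    __asm__ (
        "cmp   %1, %2\n\t"          /* compare a with b */
        "it    gt\n\t"              /* explicit IT covering the next insn */
        "movgt %0, %1"              /* result = a, only if a > b */
        : "+r" (result)
        : "r" (a), "r" (b)
        : "cc");                    /* the flags are clobbered by cmp */
    return result;
#else
    /* Assumption: on other targets plain C stands in for the asm. */
    return a > b ? a : b;
#endif
}
```

Calling max_explicit_it(3, 5) is expected to yield 5 on either path. Whether a particular binutils older than 2.19.1 accepts the IT line in ARM mode is exactly the open question raised above, so this sketch does not settle it.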
[ACTIVITY] 15th - 19th November
LP:663939 - Thumb2 constants
* Continued testing, found a few bugs. Tidied a few bits up.
* Wrote some new testcases to go with the patch.

LP:618684 - ICE
* Began looking at this one. So far I can't reproduce it. I have a
  debuggable native toolchain building, but it has been delayed by
  hardware issues.

In the course of testing I discovered that the ARM FSF config wasn't
testing the right thing, so I began work on a new, more appropriate FSF
build/test config for Linaro work.

Also found that the SD card rootfs in my IGEPv2 board was corrupted.
I've restored it from backup, and now it's working once more.
Of instruction timings
Hi Richard,

As per the discussion at this morning's call, I've reread the TRM and I
agree with you about the LSLS being the same speed as the TST (1 cycle).
However, as we agreed, the uxtb does look like 2 cycles v the AND 1
cycle.

On the space v perf theme, one thing that would be interesting to know
is whether there are any icache/issue stage limitations; i.e. if I have
a stream of 32-bit Thumb-2 instructions that are all listed as 1 cycle
and are all in i-cache, can they be fetched and issued fast enough, or
is there a performance advantage to short instructions?

Dave
__sync barriers
For the record, the thing I half-remembered on the call was:

    http://gcc.gnu.org/ml/gcc-patches/2009-08/msg00697.html

and:

    http://gcc.gnu.org/ml/gcc-patches/2009-09/msg02112.html

The problem is that all __sync operations besides
__sync_lock_test_and_set and __sync_lock_release are defined to be full
barriers. Using something like __sync_val_compare_and_swap for
__arch_compare_and_exchange_val_*_acq and
__arch_compare_and_exchange_val_*_rel may on some architectures be too
heavyweight, since those macros only need acquire/after and
release/before barriers. See in particular:

    http://gcc.gnu.org/ml/gcc-patches/2009-08/msg00928.html

from the first thread, where the feeling was that the future wasn't
these __sync builtins, but the new C and C++ atomic memory support.

Probably already known, sorry. I just wasn't sure that trying to
convert everyone (not just ARM) to __sync_* was necessarily going
to go down well.

Richard
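To make the full-barrier versus acquire/release distinction concrete, here is a small sketch in C. The wrapper names cas_full_barrier, cas_acquire and cas_release are hypothetical and are not the glibc macros named above. The first wrapper uses the __sync builtin, which is a full barrier. The other two use the C11 <stdatomic.h> interface, which postdates this message and is assumed here to be the kind of "new C and C++ atomic memory support" being referred to.

```c
#include <stdatomic.h>

/* Legacy builtin: documented as a full barrier. Returns the old value. */
static int cas_full_barrier(int *p, int expected, int desired)
{
    return __sync_val_compare_and_swap(p, expected, desired);
}

/* C11 compare-and-swap with acquire ordering on success only.
 * On failure 'expected' is overwritten with the current value, and on
 * success it already equals the old value, so returning it gives the
 * same "old value" result as the __sync builtin. */
static int cas_acquire(atomic_int *p, int expected, int desired)
{
    atomic_compare_exchange_strong_explicit(
        p, &expected, desired,
        memory_order_acquire,   /* ordering if the swap happens */
        memory_order_relaxed);  /* ordering if it does not */
    return expected;
}

/* Same operation with release ordering on success only. */
static int cas_release(atomic_int *p, int expected, int desired)
{
    atomic_compare_exchange_strong_explicit(
        p, &expected, desired,
        memory_order_release,
        memory_order_relaxed);
    return expected;
}
```

On targets where barriers are expensive, the acquire and release variants give the compiler room to emit something lighter than the full-barrier form. How much lighter depends on the architecture and compiler version, and nothing here measures it.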
Re: [PATCH, WIP] NEON quadword vectors in big-endian mode (#10061, #7306)
On 17 November 2010 13:21, Julian Brown wrote:
>> > We'd need to figure out what the RTL for such loads/stores should
>> > look like, and whether it can represent alignment constraints, or
>> > strides, or loads/stores of multiple vector registers simultaneously.

Alignment info is kept in struct ptr_info_def. Is it necessary to
represent stride? Multiple loads/stores seem the most complicated part
to me. In neon.md vld is implemented with output_asm_insn. Is it going
to change? Does this assure consecutive (or stride two) registers?

>> > Getting it right might be a bit awkward, especially if we want to
>> > consider a scope wider than just NEON, i.e. other vector
>> > architectures also.
>>
>> I think we need to somehow enhance MEM_REF, or maybe generate a
>> MEM_REF for the first vector and a builtin after it.
>
> Yeah, keeping these things looking like memory references to most of
> the compiler seems like a good plan.

Is it possible to have a list of MEM_REFs and a builtin after them:

    v0 = MEM_REF (addr)
    v1 = MEM_REF (addr + 8B)
    v2 = MEM_REF (addr + 16B)
    builtin (v0, v1, v2, stride=3, reg_stride=1,...)

to be expanded into:

    (addr)
    NOTE (...)

and then combined into vld3?

Ira
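For readers unfamiliar with the access pattern behind the stride=3 proposal, the following C sketch shows a scalar loop of the kind a vld3-style load serves: three interleaved byte streams are split into three separate arrays. The function name deinterleave3 and the r/g/b naming are illustrative assumptions only. No claim is made that any particular GCC version vectorizes this loop into vld3.

```c
#include <stddef.h>
#include <stdint.h>

/* Split n interleaved triples (r0 g0 b0 r1 g1 b1 ...) into three
 * separate arrays. Each output reads the source with a stride of 3,
 * which is the pattern a de-interleaving vector load covers. */
static void deinterleave3(const uint8_t *src,
                          uint8_t *r, uint8_t *g, uint8_t *b,
                          size_t n)
{
    for (size_t i = 0; i < n; i++) {
        r[i] = src[3 * i + 0];
        g[i] = src[3 * i + 1];
        b[i] = src[3 * i + 2];
    }
}
```

In the proposal above, the three MEM_REFs would correspond to the three consecutive vector-sized chunks of src, and the builtin would carry the information that elements are to be distributed across registers with stride 3.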
Re: RFC: -mimplicit-it and GCC upstream
On Mon, 22 Nov 2010, Dave Martin wrote:

> So I've now come round to the view that we _should_ probably bite the
> bullet and fix the inline asm directly. So:
>
>  * We need to verify which binutils permit (and ignore) the IT
> instructions in non-unified (ARM) syntax. I've observed that 2.19.1
> definitely supports this; I don't know about earlier versions -- this
> is probably something the toolchain group should investigate.
>  * We should be proactive about making these changes upstream.
> Writing some standard wording to explain the reason for the change and
> the likely impact would probably be a good idea.

I hope there is at least a validation of the IT instructions by the
assembler with regards to the condition codes on the following
instructions (and vice versa) to make sure they are all coherent, and
even so for ARM mode compilation.

Nicolas
[ACTIVITY] November 15-21st
== Linaro GCC ==

* Continued looking at big-endian/quad-vector patch: attempted to
  figure out the proper semantics for vec_extract in big endian mode
  (about 1 day). Put on hold temporarily to work on lp675347.

* lp675347, Qt failing to build due to constraint failure in inline asm
  statements used for atomic operations: found the patch which
  introduced the failure, and suggested a workaround to the OP. Came up
  with a plausible-looking patch, and started testing it, after spending
  some time trying to figure out why ARM Linux mainline doesn't build at
  present. Patch sent upstream.
Status Report 11-22-2010
== Last Week ==

* Reached the point with understanding libunwind where I can begin
  writing patches for parsing unwind information out of .ARM.exidx and
  .ARM.extab ELF sections.

== This Week ==

* Begin writing support for ARM-specific unwind information to
  libunwind.

--
Zach Welch
CodeSourcery
zwe...@codesourcery.com
(650) 331-3385 x743
Re: RFC: -mimplicit-it and GCC upstream
Hi,

On Mon, Nov 22, 2010 at 2:39 PM, Nicolas Pitre wrote:

[...]

> I hope there is at least a validation of the IT instructions by the
> assembler with regards to the condition codes on the following
> instructions (and vice versa) to make sure they are all coherent, and
> even so for ARM mode compilation.

In unified syntax, yes; in traditional syntax, no. When driving gas
via GCC, this means that the checking is done only when building for
Thumb-2 (since traditional syntax is always used for ARM code generated
by GCC, and this probably isn't going to change for now).

In traditional syntax, at least some binutils versions totally ignore
IT instructions.

However, for code generated by GCC itself (i.e., not inline asm), GCC
is supposed to generate the correct IT instructions when generating
Thumb-2 code. -mimplicit-it may therefore cause some code generation
errors to go unnoticed, particularly where the compiler misses out an
IT instruction which it was supposed to insert, but the assembler
silently inserts the missing instruction.

Cheers
---Dave
Re: __sync barriers
On Tue, Nov 23, 2010 at 12:34 AM, Richard Sandiford wrote:
> For the record, the thing I half-remembered on the call was:
>
>    http://gcc.gnu.org/ml/gcc-patches/2009-08/msg00697.html
> and:
>    http://gcc.gnu.org/ml/gcc-patches/2009-09/msg02112.html
>
> The problem is that all __sync operations besides __sync_lock_test_and_set
> and __sync_lock_release are defined to be full barriers. Using something
> like __sync_val_compare_and_swap for __arch_compare_and_exchange_val_*_acq
> and __arch_compare_and_exchange_val_*_rel may on some architectures be too
> heavyweight, since those macros only need acquire/after and release/before
> barriers. See in particular:
>
>    http://gcc.gnu.org/ml/gcc-patches/2009-08/msg00928.html
>
> from the first thread, where the feeling was that the future wasn't
> these __sync builtins, but the new C and C++ atomic memory support.
>
> Probably already known, sorry. I just wasn't sure that trying to
> convert everyone (not just ARM) to __sync_* was necessarily going
> to go down well.

Good point. Using __sync in ARM only is fine, but please do bring the
topic up with upstream.

I'd forgotten about LLVM when we were talking yesterday. Both GCC and
LLVM supply sync primitives and I hope RVDS will soon as well.

-- Michael