Volatile Memory accesses in Branch Delay Slots
Hi all,

we are currently porting GCC to our own RISC architecture, which is similar to MIPS. This architecture contains one unconditional branch delay slot. The effect I noticed also occurs on MIPS, so I will be focusing on that architecture in the following.

I noticed that GCC never puts accesses to volatile variables into the branch delay slot. For example, compiling this code on MIPS:

    extern volatile int a;
    void writeA() {
        a = 42;
    }

leads to this assembly code:

    writeA:
        .frame  $sp,0,$31       # vars= 0, regs= 0/0, args= 0, gp= 0
        .mask   0x,0
        .fmask  0x,0
        .set    noreorder
        .set    nomacro
        lui     $28,%hi(__gnu_local_gp)
        addiu   $28,$28,%lo(__gnu_local_gp)
        lw      $2,%got(a)($28)
        li      $3,42           # 0x2a
        sw      $3,0($2)
        jr      $31
        nop

jr's delay slot is not filled. However, if the declaration of a is changed to `extern int a`, the delay slot is filled with the sw.

The function responsible for this behavior seems to be resource_conflicts_p in reorg.c. Sadly, I could not find any comments explaining why volatile accesses cannot be put into delay slots.

What is the reason for this behavior? I am unable to think of any situation where allowing volatile memory accesses in branch delay slots leads to problems. Am I missing a case? Or are negative effects limited to other architectures?

Regards,
Jakob

--
M.Sc. Jakob Wenzel
Fachgebiet Rechnersysteme
Fachbereich 18, Elektrotechnik und Informationstechnik
Technische Universität Darmstadt
Merckstraße 25
D-64283 Darmstadt
Tel: 06151-1621154
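For readers unfamiliar with delay slots, here is a minimal sketch of the behaviour under discussion. It is a toy interpreter, not real MIPS semantics: the opcodes, `struct machine`, `run`, `run_unfilled` and `run_filled` are all hypothetical names made up for illustration. It only tries to show that the instruction following the jump still executes, so moving the store into the slot gives the same final memory contents with one instruction fewer.

```c
#include <assert.h>
#include <stddef.h>

/* Toy opcodes; hypothetical, far smaller than any real ISA. */
enum op { OP_LI, OP_SW, OP_JR, OP_NOP };

struct insn {
    enum op op;
    int reg;   /* register number used by OP_LI and OP_SW */
    int imm;   /* immediate value used by OP_LI */
};

struct machine {
    int regs[4];
    int mem;        /* stands in for the variable `a` */
    int executed;   /* number of instructions executed */
};

/* Execute until a jump has taken effect. The instruction right
   after OP_JR (the delay slot) always runs before control leaves. */
void run(struct machine *m, const struct insn *prog, size_t n)
{
    int leaving = 0;   /* becomes 1 once OP_JR has been executed */

    for (size_t pc = 0; pc < n; pc++) {
        const struct insn *i = &prog[pc];
        int in_delay_slot = leaving;

        assert(i->reg >= 0 && i->reg < 4);
        switch (i->op) {
        case OP_LI:  m->regs[i->reg] = i->imm; break;
        case OP_SW:  m->mem = m->regs[i->reg]; break;
        case OP_JR:  leaving = 1; break;
        case OP_NOP: break;
        }
        m->executed++;

        if (in_delay_slot)
            return;    /* the delay slot has run; the jump completes */
    }
}

/* li; sw; jr; nop  -- the slot is wasted on a nop. */
struct machine run_unfilled(void)
{
    const struct insn prog[] = {
        { OP_LI, 3, 42 }, { OP_SW, 3, 0 }, { OP_JR, 0, 0 }, { OP_NOP, 0, 0 },
    };
    struct machine m = { {0, 0, 0, 0}, 0, 0 };
    run(&m, prog, sizeof prog / sizeof prog[0]);
    return m;
}

/* li; jr; sw  -- the store has been moved into the slot. */
struct machine run_filled(void)
{
    const struct insn prog[] = {
        { OP_LI, 3, 42 }, { OP_JR, 0, 0 }, { OP_SW, 3, 0 },
    };
    struct machine m = { {0, 0, 0, 0}, 0, 0 };
    run(&m, prog, sizeof prog / sizeof prog[0]);
    return m;
}
```

Both variants leave 42 in `mem`; the filled one executes three instructions instead of four. The sketch says nothing about whether the move is legitimate for a volatile store, which is the actual question of this thread.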
Re: Volatile Memory accesses in Branch Delay Slots
> The function responsible for this behavior seems to be
> resource_conflicts_p in reorg.c. Sadly, I could not find any comments
> explaining why volatile accesses cannot be put into delay slots.
>
> What is the reason for this behavior? I am unable to think of any
> situation where allowing volatile memory accesses in branch delay slots
> leads to problems. Am I missing a case? Or are negative effects limited
> to other architectures?

Delay slot filling is a code movement optimization and such optimizations are not valid for volatile memory accesses in the general case.

--
Eric Botcazou
GCC 7.2 Status report (2017-07-25)
Status
======

It's time to do a GCC 7.2 release and thus please check if you have backports for regression or wrong-code bugs pending. The plan is to do GCC 7.2 RC1 mid next week and a release roughly a week after that.

Quality Data
============

Priority          #   Change from last report
--------        ---   -----------------------
P1                0
P2              141   +  45
P3                3
P4              156   +   9
P5               28   -   1
--------        ---   -----------------------
Total P1-P3     144   +  45
Total           328   +  53

Previous Report
===============

https://gcc.gnu.org/ml/gcc/2017-04/msg00080.html
Re: Volatile Memory accesses in Branch Delay Slots
On Tue, 2017-07-25 at 10:47 +0200, Jakob Wenzel wrote:
>
> jr's delay slot is not filled. However, if the declaration of a is
> changed to `extern int a`, the delay slot is filled with the sw.
>
> The function responsible for this behavior seems to be
> resource_conflicts_p in reorg.c. Sadly, I could not find any
> comments explaining why volatile accesses cannot be put into delay
> slots.
>
> What is the reason for this behavior? I am unable to think of any
> situation where allowing volatile memory accesses in branch delay
> slots leads to problems. Am I missing a case? Or are negative
> effects limited to other architectures?

Maybe because the code that does the delay slot stuffing does not do sophisticated checks whether such instruction reordering would not violate anything? So it's playing safe and bails out if it sees "volatile mem". Same thing happens also with insns that have multiple sets. Ideally it should do some more fine grained checks and give the backend an option to opt-in or opt-out.

Cheers,
Oleg
64-bit PowerPC and small data area?
Hello,

in the PowerPC ELFv2 specification

https://members.openpowerfoundation.org/document/dl/576

we have

"3.4.2 Use of the Small Data Area

For a data item in the .sdata or .sbss sections, a compiler may generate short-form one-instruction references. In an executable file or shared library, such a reference is relative to the address of the TOC base symbol (which can be obtained from r2 if a TOC pointer is initialized). A compiler that generates code using the small data area should provide an option to select the maximum size of objects placed in the small data area, and a means of disabling any use of the small data area.

When generating code for ELF shared libraries, the small data area should not be used for default-visibility global objects. This is to satisfy ELF shared-library symbol interposition rules. That is, an ordinary global symbol in a shared library may be overridden by a symbol of the same name defined in the executable or another shared library. Supporting interposition when using TOC-pointer relative addressing would require text relocations."

I tried to generate code using the small data area on a 64-bit PowerPC GCC, but I was not successful. We have in the GCC sources (gcc/config/rs6000/rs6000.c):

    /* Return 1 for an operand in small memory on V.4/eabi.  */

    int
    small_data_operand (rtx op ATTRIBUTE_UNUSED,
                        machine_mode mode ATTRIBUTE_UNUSED)
    {
    #if TARGET_ELF
      rtx sym_ref;

      if (rs6000_sdata == SDATA_NONE || rs6000_sdata == SDATA_DATA)
        return 0;

      if (DEFAULT_ABI != ABI_V4)
        return 0;

So, it looks like the small data area is not supported for ABI_ELFv2? Are there major issues with using the small data area under ELFv2, or is this simply not implemented due to a lack of interest?

--
Sebastian Huber, embedded brains GmbH

Address : Dornierstr. 4, D-82178 Puchheim, Germany
Phone   : +49 89 189 47 41-16
Fax     : +49 89 189 47 41-09
E-Mail  : sebastian.hu...@embedded-brains.de
PGP     : Public key available on request.
This message is not a business communication within the meaning of the EHUG.
Re: Volatile Memory accesses in Branch Delay Slots
On 07/25/2017 06:32 AM, Oleg Endo wrote:
> On Tue, 2017-07-25 at 10:47 +0200, Jakob Wenzel wrote:
>>
>> jr's delay slot is not filled. However, if the declaration of a is
>> changed to `extern int a`, the delay slot is filled with the sw.
>>
>> The function responsible for this behavior seems to be
>> resource_conflicts_p in reorg.c. Sadly, I could not find any
>> comments explaining why volatile accesses cannot be put into delay
>> slots.
>>
>> What is the reason for this behavior? I am unable to think of any
>> situation where allowing volatile memory accesses in branch delay
>> slots leads to problems. Am I missing a case? Or are negative
>> effects limited to other architectures?
>
> Maybe because the code that does the delay slot stuffing does not do
> sophisticated checks whether such instruction reordering would not
> violate anything? So it's playing safe and bails out if it sees
> "volatile mem". Same thing happens also with insns that have multiple
> sets. Ideally it should do some more fine grained checks and give the
> backend an option to opt-in or opt-out.

Essentially, the mantra has always been "be very conservative with volatile objects" -- in the context of reorg that means little/no effort is expended in trying to use a volatile memory access to fill a delay slot.

A volatile memory reference in a nullified delay slot may not do the expected thing, depending on when/how nullification occurs within the processor. More generally, all of the speculative delay slot fillers would be a concern if volatile accesses were allowed in delay slots.

I could speculate that fill_simple_delay_slots could probably safely be improved to utilize instructions with volatile memory operands to fill slots. But it hardly seems worth the effort given the directions in processor design/implementation over the last 20+ years.

Jeff
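The policy described in the two replies above can be written down as a small predicate. This is only a sketch of the reasoning in this thread, not GCC's code: `struct candidate`, `enum fill_kind`, `enum policy` and `may_fill_slot` are hypothetical names, and the "relaxed" policy is the speculated improvement, which has not been implemented as far as this thread says.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* How the slot would be filled: from code that executes anyway
   (simple), or from a branch target / fall-through path (speculative). */
enum fill_kind { FILL_SIMPLE, FILL_SPECULATIVE };

/* POLICY_CONSERVATIVE models the behaviour reported in this thread.
   POLICY_RELAXED_SIMPLE models the speculated improvement only. */
enum policy { POLICY_CONSERVATIVE, POLICY_RELAXED_SIMPLE };

/* Hypothetical summary of an instruction considered for a slot. */
struct candidate {
    bool has_volatile_mem;   /* touches a volatile memory operand */
    int  num_sets;           /* number of values the insn sets */
};

/* Return true if the candidate may be placed into a delay slot. */
bool may_fill_slot(const struct candidate *c, enum fill_kind kind,
                   bool slot_may_be_annulled, enum policy p)
{
    assert(c != NULL && c->num_sets >= 0);

    /* Insns with multiple sets are rejected under either policy. */
    if (c->num_sets > 1)
        return false;

    /* Ordinary memory accesses and register ops are fine. */
    if (!c->has_volatile_mem)
        return true;

    /* "Be very conservative with volatile objects". */
    if (p == POLICY_CONSERVATIVE)
        return false;

    /* Relaxed: volatile only in a simple, never-annulled slot, so the
       access still happens exactly once on every path. */
    return kind == FILL_SIMPLE && !slot_may_be_annulled;
}
```

Under the conservative policy the `sw` to a volatile `a` is rejected, which matches the `nop` in the original report. Under the relaxed policy it would be accepted only for a simple fill into a slot that is never nullified.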
Summer 2017 GNU Toolchain Update
Hi Guys,

It has been a long time since my last post on the developments in the toolchain, so there is lots to report:

---
Binutils:

Version 2.29 has been released. In addition to previous changes already detailed in this blog, this release also contains:

  * Support for placing sections into special memory areas on systems that use virtual memory managers. This is like the MEMORY command in linker scripts except that that only works on systems without a memory management unit. With the new system sections can be marked as requiring a particular kind of special memory. The linker collects together all of the sections with the same requirements and places them into a specially marked segment. The loader can then detect this segment's requirements and ensure that the right kind of memory is used.

  * Support for the WebAssembly file format and conversion to the wasm32 ELF format.

  * The PowerPC assembler now checks that the correct register class is used in instructions.

  * The ARM assemblers now support the ARMv8-R architecture and Cortex-R52 processors.

  * The linker now supports ELF GNU program properties. These are run-time notes intended for the loader that tell it more about the binary that it is initializing.

  * The linker contains support for Intel's Indirect Branch Tracking (IBT) enhancement. This is a technology intended to help fight malicious code that abuses the stack to force unwanted behaviour from a program. For more information see:

    https://software.intel.com/sites/default/files/managed/4d/2a/control-flow-enforcement-technology-preview.pdf

  * Section groups can now be resolved (the group deleted and the group members placed like normal sections) at partial link time either using the new linker option --force-group-allocation or by placing FORCE_GROUP_ALLOCATION into the linker script.

  * The MIPS port now supports:
      + MicroMIPS eXtended Physical Addressing (XPA) instructions.
      + Release 5 of the ISA.
      + Imagination interAptiv MR2 processor.
      + MIPS16e2 ASE for assembly and disassembly.

  * The SPARC port now supports the SPARC M8 processor, which implements the Oracle SPARC Architecture 2017.

  * Objdump's --line-numbers option can now be augmented via the new --inlines option so that inlined functions will display their nesting information.

  * Objcopy now has an option '--merge-notes' to reduce the size of notes in a binary file by merging and deleting redundant entries.

  * The AVR assembler has support for the __gcc_isr pseudo-instruction. This instruction is generated by GCC when it wants to create the prologue or epilogue of an interrupt handler. The assembler then ensures that the most optimal code possible is generated.

Meanwhile in the mainline binutils sources:

  * The assembler now has support for location views in DWARF debug line information. This is part of a project to help improve the source code location information that the compiler provides to the debugger:

    https://developers.redhat.com/blog/2017/07/11/statement-frontier-notes-and-location-views/#more-437095

GDB

Version 8.0 has been released. This release contains:

  * Support for C++ rvalue references.

  * Python scripting enhancements:
      + New functions to start, stop and access a running btrace recording.
      + Rvalue reference support in gdb.Type.

  * GDB commands interpreter:
      + User commands now accept an unlimited number of arguments.
      + The "eval" command now expands user-defined arguments.

  * DWARF version 5 support

  * GDB/MI enhancements:
      + New -file-list-shared-libraries command to list the shared libraries in the program.
      + New -target-flash-erase command, to erase flash memory.

  * Support for native FreeBSD/mips (mips*-*-freebsd)

  * Support for the Synopsys ARC and FreeBSD/mips targets.

For a complete list and more details on each item, please see the gdb/NEWS file in the release sources.

Meanwhile in the development sources the following new features have been added:

  * On Unix systems, GDBserver now does globbing expansion and variable substitution in inferior command line arguments.

  * New commands
      + set debug separate-debug-file
      + show debug separate-debug-file
    These control the display of debug output about separate debug file search.

--
GCC

Version 7.1 has been released. Most of the enhancements and new features in this release have already been reported in earlier versions of th
gcc-5-20170725 is now available
Snapshot gcc-5-20170725 is now available on
  ftp://gcc.gnu.org/pub/gcc/snapshots/5-20170725/
and on various mirrors, see http://gcc.gnu.org/mirrors.html for details.

This snapshot has been generated from the GCC 5 SVN branch
with the following options: svn://gcc.gnu.org/svn/gcc/branches/gcc-5-branch revision 250547

You'll find:

  gcc-5-20170725.tar.xz    Complete GCC

  SHA256=0598d42f7f296375fb471386aec4e600a26a6864f8d99ca6676403a195f12c3b
  SHA1=917cb555e88ed50e14dd5647beca173526cef72c

Diffs from 5-20170718 are available in the diffs/ subdirectory.

When a particular snapshot is ready for public consumption the LATEST-5 link is updated and a message is sent to the gcc list. Please do not use a snapshot before it has been announced that way.
Re: Remove broken GCC 7.1 GCJ manual links
Hi Krisztian,

On Thu, 29 Jun 2017, Paczári Krisztián wrote:
> GCJ has been removed from GCC 7.1, so these broken links should also be
> removed from the documentation page (https://gcc.gnu.org/onlinedocs/)
> and probably from the scripts generating them: "GCC 7.1 GCJ Manual (also
> in PDF or PostScript or an HTML tarball)"

thanks for the heads-up. Just to confirm, Jakub addressed this two weeks ago.

If you see anything left over, or anything else, please advise.

Gerald