Volatile Memory accesses in Branch Delay Slots

2017-07-25 Thread Jakob Wenzel

Hi all,

we are currently porting GCC to our own RISC architecture, which is 
similar to MIPS. This architecture contains one unconditional branch 
delay slot. The effect I noticed also occurs on MIPS, so I will be 
focusing on that architecture in the following.


I noticed that GCC never puts accesses to volatile variables into the 
branch delay slot. For example, compiling this code on MIPS:


extern volatile int a;

void writeA() {
    a = 42;
}

Leads to this assembly code:

writeA:
    .frame  $sp,0,$31   # vars= 0, regs= 0/0, args= 0, gp= 0
    .mask   0x00000000,0
    .fmask  0x00000000,0
    .set    noreorder
    .set    nomacro
    lui     $28,%hi(__gnu_local_gp)
    addiu   $28,$28,%lo(__gnu_local_gp)
    lw      $2,%got(a)($28)
    li      $3,42   # 0x2a
    sw      $3,0($2)
    jr      $31
    nop

jr's delay slot is not filled. However, if the declaration of a is 
changed to `extern int a`, the delay slot is filled with the sw.


The function responsible for this behavior seems to be 
resource_conflicts_p in reorg.c. Sadly, I could not find any comments 
explaining why volatile accesses cannot be put into delay slots.


What is the reason for this behavior? I am unable to think of any 
situation where allowing volatile memory accesses in branch delay slots 
leads to problems. Am I missing a case? Or are negative effects limited 
to other architectures?


Regards,
Jakob

--
M.Sc. Jakob Wenzel
Fachgebiet Rechnersysteme
Fachbereich 18, Elektrotechnik und Informationstechnik
Technische Universität Darmstadt
Merckstraße 25
D-64283 Darmstadt
Tel: 06151-1621154


Re: Volatile Memory accesses in Branch Delay Slots

2017-07-25 Thread Eric Botcazou
> The function responsible for this behavior seems to be
> resource_conflicts_p in reorg.c. Sadly, I could not find any comments
> explaining why volatile accesses cannot be put into delay slots.
> 
> What is the reason for this behavior? I am unable to think of any
> situation where allowing volatile memory accesses in branch delay slots
> leads to problems. Am I missing a case? Or are negative effects limited
> to other architectures?

Delay slot filling is a code movement optimization and such optimizations are 
not valid for volatile memory accesses in the general case.

-- 
Eric Botcazou


GCC 7.2 Status report (2017-07-25)

2017-07-25 Thread Richard Biener

Status
======

It's time to do a GCC 7.2 release and thus please check if you have
backports for regression or wrong-code bugs pending.  The plan is to
do GCC 7.2 RC1 mid next week and a release roughly a week after that.


Quality Data
============

Priority          #   Change from last report
--------        ---   -----------------------
P1                0
P2              141   +  45
P3                3
P4              156   +   9
P5               28   -   1
--------        ---   -----------------------
Total P1-P3     144   +  45
Total           328   +  53


Previous Report
===============

https://gcc.gnu.org/ml/gcc/2017-04/msg00080.html


Re: Volatile Memory accesses in Branch Delay Slots

2017-07-25 Thread Oleg Endo
On Tue, 2017-07-25 at 10:47 +0200, Jakob Wenzel wrote:
> 
> jr's delay slot is not filled. However, if the declaration of a is 
> changed to `extern int a`, the delay slot is filled with the sw.
> 
> The function responsible for this behavior seems to be 
> resource_conflicts_p in reorg.c. Sadly, I could not find any
> comments 
> explaining why volatile accesses cannot be put into delay slots.
> 
> What is the reason for this behavior? I am unable to think of any 
> situation where allowing volatile memory accesses in branch delay
> slots  leads to problems. Am I missing a case? Or are negative
> effects limited  to other architectures?

Maybe because the code that does the delay slot stuffing does not do
sophisticated checks whether such instruction reordering would not
violate anything?  So it's playing safe and bails out if it sees
"volatile mem".  Same thing happens also with insns that have multiple
sets.  Ideally it should do some more fine grained checks and give the
backend an option to opt-in or opt-out.
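
A minimal sketch of what such an opt-in could look like, written as a toy
model rather than as GCC code. Every name here (toy_insn, toy_allow_hook,
toy_can_fill_delay_slot) is hypothetical and does not correspond to
anything in reorg.c; the sketch only illustrates "conservative by default,
backend may opt in".

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy model of an instruction; this is NOT GCC's rtx representation. */
struct toy_insn {
    bool has_volatile_mem;  /* insn touches a volatile memory operand */
    int  num_sets;          /* number of values the insn sets */
};

/* Hypothetical backend policy hook: return true to allow the insn
   in a delay slot even though the generic code considers it risky. */
typedef bool (*toy_allow_hook)(const struct toy_insn *insn);

/* Conservative default: refuse volatile accesses and multi-set insns
   unless a backend hook explicitly opts in. */
bool toy_can_fill_delay_slot(const struct toy_insn *insn, toy_allow_hook hook)
{
    bool risky = insn->has_volatile_mem || insn->num_sets > 1;

    if (!risky)
        return true;
    return hook != NULL && hook(insn);
}

/* Example opt-in for an imaginary target that accepts volatile
   accesses as long as the insn has a single set. */
bool toy_allow_volatile_single_set(const struct toy_insn *insn)
{
    return insn->has_volatile_mem && insn->num_sets == 1;
}
```

With no hook installed the model behaves like the current "bail out"
policy; passing toy_allow_volatile_single_set admits single-set volatile
accesses while still rejecting multi-set insns.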

Cheers,
Oleg


64-bit PowerPC and small data area?

2017-07-25 Thread Sebastian Huber

Hello,

in the PowerPC ELFv2 specification

https://members.openpowerfoundation.org/document/dl/576

we have

"3.4.2 Use of the Small Data Area

For a data item in the .sdata or .sbss sections, a compiler may generate
short-form one-instruction references. In an executable file or shared
library, such a reference is relative to the address of the TOC base
symbol (which can be obtained from r2 if a TOC pointer is initialized).
A compiler that generates code using the small data area should provide
an option to select the maximum size of objects placed in the small data
area, and a means of disabling any use of the small data area. When
generating code for ELF shared libraries, the small data area should not
be used for default-visibility global objects. This is to satisfy ELF
shared-library symbol interposition rules. That is, an ordinary global
symbol in a shared library may be overridden by a symbol of the same
name defined in the executable or another shared library. Supporting
interposition when using TOC-pointer relative addressing would require
text relocations."


I tried to generate code using the small data area on a 64-bit PowerPC 
GCC, but I was not successful. We have in the GCC sources 
(gcc/config/rs6000/rs6000.c):


/* Return 1 for an operand in small memory on V.4/eabi.  */

int
small_data_operand (rtx op ATTRIBUTE_UNUSED,
                    machine_mode mode ATTRIBUTE_UNUSED)
{
#if TARGET_ELF
  rtx sym_ref;

  if (rs6000_sdata == SDATA_NONE || rs6000_sdata == SDATA_DATA)
    return 0;

  if (DEFAULT_ABI != ABI_V4)
    return 0;

So, it looks like the small data area is not supported for ABI_ELFv2? Are 
there major issues with using the small data area under ELFv2, or is this 
simply not implemented due to a lack of interest?



--
Sebastian Huber, embedded brains GmbH

Address : Dornierstr. 4, D-82178 Puchheim, Germany
Phone   : +49 89 189 47 41-16
Fax : +49 89 189 47 41-09
E-Mail  : sebastian.hu...@embedded-brains.de
PGP : Public key available on request.

This message is not a business communication within the meaning of the EHUG.



Re: Volatile Memory accesses in Branch Delay Slots

2017-07-25 Thread Jeff Law
On 07/25/2017 06:32 AM, Oleg Endo wrote:
> On Tue, 2017-07-25 at 10:47 +0200, Jakob Wenzel wrote:
>>  
>> jr's delay slot is not filled. However, if the declaration of a is 
>> changed to `extern int a`, the delay slot is filled with the sw.
>>
>> The function responsible for this behavior seems to be 
>> resource_conflicts_p in reorg.c. Sadly, I could not find any
>> comments 
>> explaining why volatile accesses cannot be put into delay slots.
>>
>> What is the reason for this behavior? I am unable to think of any 
>> situation where allowing volatile memory accesses in branch delay
>> slots  leads to problems. Am I missing a case? Or are negative
>> effects limited  to other architectures?
> 
> Maybe because the code that does the delay slot stuffing does not do
> sophisticated checks whether such instruction reordering would not
> violate anything?  So it's playing safe and bails out if it sees
> "volatile mem".  Same thing happens also with insns that have multiple
> sets.  Ideally it should do some more fine grained checks and give the
> backend an option to opt-in or opt-out.
Essentially, the mantra has always been "be very conservative with
volatile objects" -- in the context of reorg that means little/no effort
is expended in trying to use a volatile memory access to fill a delay slot.

A volatile memory reference in a nullified delay slot may not do the
expected thing, depending on when/how nullification occurs within the
processor.   More generally, all of the speculative delay slot fillers
would be a concern if volatile accesses were allowed in delay slots.

I could speculate that fill_simple_delay_slots could probably safely be
improved to utilize instructions with volatile memory operands to fill
slots.  But it hardly seems worth the effort given the directions in
processor design/implementation over the last 20+ years.
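
To make the nullification concern concrete, here is a minimal C sketch.
The names status_reg and read_if_ready are hypothetical, and status_reg
is an ordinary variable standing in for a device register whose reads
could have side effects.

```c
/* Hypothetical stand-in for a memory-mapped status register.  On real
   hardware this would be a fixed device address, and each read might
   acknowledge or clear a pending event. */
volatile int status_reg = 7;

/* The volatile read belongs to the 'ready' path only.  The abstract
   machine performs exactly one access when ready is nonzero and none
   otherwise. */
int read_if_ready(int ready)
{
    if (ready)
        return status_reg;  /* the access in question */
    return -1;              /* no access may happen on this path */
}
```

If a speculative filler took the load from the branch target to fill the
slot of the conditional branch, correctness on the not-taken path would
depend on the processor annulling that slot before the access becomes
visible to the device. A slot filled with an insn from before the branch
does not have this problem, which matches the remark about
fill_simple_delay_slots.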


Jeff


Summer 2017 GNU Toolchain Update

2017-07-25 Thread Nick Clifton
Hi Guys,

  It has been a long time since my last post on the developments in
  the toolchain, so there is lots to report:

---
Binutils:

  Version 2.29 has been released.

  In addition to previous changes already detailed in this blog, this
  release also contains:

* Support for placing sections into special memory areas on
  systems that use virtual memory managers.  This is like the
  MEMORY command in linker scripts, except that MEMORY only works
  on systems without a memory management unit.

  With the new system, sections can be marked as requiring a
  particular kind of special memory.  The linker collects together
  all of the sections with the same requirements and places them
  into a specially marked segment.  The loader can then detect
  this segment's requirements and ensure that the right kind of
  memory is used.

* Support for the WebAssembly file format and conversion to the
  wasm32 ELF format.

* The PowerPC assembler now checks that the correct register class
  is used in instructions.

* The ARM assemblers now support the ARMv8-R architecture and
  Cortex-R52 processors.

* The linker now supports ELF GNU program properties.  These are
  run-time notes intended for the loader that tell it more about
  the binary that it is initializing.

* The linker contains support for Intel's Indirect Branch Tracking
  (IBT) enhancement.  This is a technology intended to help fight
  malicious code that abuses the stack to force unwanted behaviour
  from a program.  For more information see:
  
https://software.intel.com/sites/default/files/managed/4d/2a/control-flow-enforcement-technology-preview.pdf

* Section groups can now be resolved (the group deleted and the
  group members placed like normal sections) at partial link time
  either using the new linker option --force-group-allocation or
  by placing FORCE_GROUP_ALLOCATION into the linker script.
  
* The MIPS port now supports:

+ MicroMIPS eXtended Physical Addressing (XPA) instructions.
+ Release 5 of the ISA.
+ Imagination interAptiv MR2 processor.
+ MIPS16e2 ASE for assembly and disassembly.

* The SPARC port now supports the SPARC M8 processor, which
  implements the Oracle SPARC Architecture 2017.

* Objdump's --line-numbers option can now be augmented via the new
  --inlines option so that inlined functions will display their
  nesting information. 

* Objcopy now has an option '--merge-notes' to reduce the size of
  notes in a binary file by merging and deleting redundant entries. 

* The AVR assembler has support for the __gcc_isr
  pseudo-instruction.  This instruction is generated by GCC when
  it wants to create the prologue or epilogue of an interrupt
  handler.  The assembler then ensures that the most optimal code
  possible is generated.

  Meanwhile in the mainline binutils sources:

  * The assembler now has support for location views in DWARF debug
    line information.  This is part of a project to help improve the
    source code location information that the compiler provides to the
    debugger:

https://developers.redhat.com/blog/2017/07/11/statement-frontier-notes-and-location-views/#more-437095


GDB

  Version 8.0 has been released.  This release contains:

* Support for C++ rvalue references.

* Python scripting enhancements:
  + New functions to start, stop and access a running btrace
    recording.
  + Rvalue reference support in gdb.Type.

* GDB commands interpreter:
  + User commands now accept an unlimited number of arguments.
  + The "eval" command now expands user-defined arguments.

* DWARF version 5 support

* GDB/MI enhancements:
  + New -file-list-shared-libraries command to list the shared
    libraries in the program.
  + New -target-flash-erase command, to erase flash memory.

* Support for native FreeBSD/mips (mips*-*-freebsd)

* Support for the Synopsys ARC and FreeBSD/mips targets.

  For a complete list and more details on each item, please see the
  gdb/NEWS file in the release sources.

  Meanwhile in the development sources the following new features have
  been added:

* On Unix systems, GDBserver now does globbing expansion and
  variable substitution in inferior command line arguments.

* New commands  
  + set debug separate-debug-file
  + show debug separate-debug-file
  These control the display of debug output about separate debug
  file search.

--
GCC

  Version 7.1 has been released.  Most of the enhancements and new
  features in this release have already been reported in earlier
  versions of th

gcc-5-20170725 is now available

2017-07-25 Thread gccadmin
Snapshot gcc-5-20170725 is now available on
  ftp://gcc.gnu.org/pub/gcc/snapshots/5-20170725/
and on various mirrors, see http://gcc.gnu.org/mirrors.html for details.

This snapshot has been generated from the GCC 5 SVN branch
with the following options: svn://gcc.gnu.org/svn/gcc/branches/gcc-5-branch 
revision 250547

You'll find:

 gcc-5-20170725.tar.xz    Complete GCC

  SHA256=0598d42f7f296375fb471386aec4e600a26a6864f8d99ca6676403a195f12c3b
  SHA1=917cb555e88ed50e14dd5647beca173526cef72c

Diffs from 5-20170718 are available in the diffs/ subdirectory.

When a particular snapshot is ready for public consumption the LATEST-5
link is updated and a message is sent to the gcc list.  Please do not use
a snapshot before it has been announced that way.


Re: Remove broken GCC 7.1 GCJ manual links

2017-07-25 Thread Gerald Pfeifer
Hi Krisztian,

On Thu, 29 Jun 2017, Paczári Krisztián wrote:
> GCJ has been removed from GCC 7.1, so these broken links should also be 
> removed from the documentation page (https://gcc.gnu.org/onlinedocs/) 
> and probably from the scripts generating them: "GCC 7.1 GCJ Manual (also 
> in PDF or PostScript or an HTML tarball)"

Thanks for the heads-up.  I can confirm that Jakub addressed this two
weeks ago.  If you see anything left over, or anything else, please
advise.

Gerald