Re: git conversion in progress

2020-01-14 Thread Georg-Johann Lay

Am 11.01.20 um 02:18 schrieb Joseph Myers:

I encourage people to continue to work on improving the documentation for
using git with GCC
(two earlier messages list some of
the things that it seems it might be useful to document).  Once the
conversion is done we'll need to update various places referring to SVN on
the website and in the manuals.


Hi, some random notes:

In order to set user.name and user.email, the doc uses --global.  At
least for me, that is not what I want because I use git in contexts
other than contributing to GCC (and use a different e-mail address
there).
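
Without --global the settings only apply to the clone at hand, for example
(name and address below are just placeholders):

$ cd gcc                 # the GCC working tree
$ git config user.name "Your Name"
$ git config user.email "you@example.org"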


In my git setup there is the following, which I found quite useful when
changing .md files, or more specifically when diff'ing them and having them
reviewed.


Diff files can indicate what has been changed: for example, when you change
some function, the function name is displayed in the header of the
respective hunk.  We can achieve the same for define_insn and similar
constructs:


In .gitconfig:

[diff "md"]
  # .git/info/attributes maps *.md to diff=md
  # You can also use .gitattributes for this
  xfuncname = "^[ \t]*\\([ \t]*define_.*$"

This makes the hunk header match define_insn, define_expand, etc.  The
mapping can be performed in the files mentioned above and would read:


# Allow for custom diff hunks in your (global) .gitconfig
# e.g. for machine descriptions the respective entry could read
#
# [diff "md"]
#  # GCC/.gitattributes maps *.md to diff=md
#  xfuncname = "^[ \t]*\\([ \t]*define_.*$"

# GCC machine description
*.md  diff=md

I am not planning to propose respective changes, so if you find it 
helpful, maybe some maintainer can add it.


To see the difference, try

$ git show ec9b602c167fb7dfba717

for example.  In case it's regarded as helpful, maybe it can be added
where it applies to all branches, not only to one specific branch.
(Dunno what changing .git/info, .git/config actually does; presumably it
only applies if the repo is cloned).



Sometimes one wants a different editor for commit and merge messages
than the default one (dunno which one that is; presumably it is determined
by some environment variable).


The respective entry in .gitconfig is:
[core]
editor = my-editor

or the same can be achieved with

git config [--global] core.editor "...".  In my case, it's a script that
derives diffs from ChangeLog deltas, composes them into a commit message,
and then calls my preferred editor on that.


And a question wrt e-mail addresses: I have 2 addresses associated with 
my GCC contributions: one @gcc.gnu.org from my GCC account and one 
non-gnu as mentioned in MAINTAINERS.  Which one should I use for commits?
As far as I remember, the gnu address can cause some problems, but I 
don't remember where...



Johann


Re: git conversion in progress

2020-01-14 Thread Georg-Johann Lay

Am 11.01.20 um 02:18 schrieb Joseph Myers:

I encourage people to continue to work on improving the documentation for
using git with GCC
(two earlier messages list some of
the things that it seems it might be useful to document).


Nitpick:

https://gcc.gnu.org/git.html#tags "Branches and Tags"

writes:

A branch called branchname can be checked out with the following command:

git clone -b branchname ...

Referring to this as "checking out" is confusing IMO, because it may be 
confused with

git checkout -b branchname

Whereas the latter checks out a branch (or rather creates one and switches
to it) within the same local copy, the former is *cloning* into a separate,
fresh local copy which has only the mentioned branch.
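
For illustration (URL and branch names are just placeholders):

# fresh clone; with --single-branch it really only contains that one branch
$ git clone -b branchname --single-branch <repo-url> gcc-branchname

# switch to an existing branch inside a clone you already have
$ git checkout branchname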


Ditto for tags.

Cloning with only one branch is usually done because it downloads faster
and consumes less local disk space.  When one works with more than one
branch (usually when backporting stuff etc.), the mentioned


git clone --reference original-gcc ...

might be preferable.

Johann


Re: git conversion in progress

2020-01-14 Thread Georg-Johann Lay

Am 14.01.20 um 12:34 schrieb Andreas Schwab:

On Jan 14 2020, Georg-Johann Lay wrote:


git clone --reference original-gcc ...


Don't use --reference.  It is too easy to lose work if you don't know
what you are doing.

Andreas.


Well, then it should not be proposed in git.html, should it?

There may be git noobs like me that follow the recommendations from there.

Johann





Re: git conversion in progress

2020-01-16 Thread Georg-Johann Lay

Am 11.01.20 um 02:18 schrieb Joseph Myers:

I encourage people to continue to work on improving the documentation for
using git with GCC


Hi, the front page reads "Our sources are readily and freely available 
via SVN...", and there is a similar recommendation for SVN in


https://gcc.gnu.org/snapshots.html

Johann


Re: subversion status on gcc.gnu.org

2020-04-02 Thread Georg-Johann Lay

Am 20.03.20 um 18:37 schrieb Frank Ch. Eigler via Gcc:

Hi -

Both svn: and ssh+svn: now work for your archeological needs.
Further, URLs such as

https://gcc.gnu.org/viewcvs?rev=279160&root=gcc&view=rev
https://gcc.gnu.org/r123456

are mapped to gitweb searches that try to locate the matching
From-SVN: rABCDEF commit.  This way, historical URLs from bugzilla
should work.


This does not work as expected.  For example,

https://gcc.gnu.org/r2000

maps to a search and you get thousands of hits for SVN: r2000**

Johann



Re: Support for named address spaces in C++

2020-06-26 Thread Georg-Johann Lay

Andrew Pinski via Gcc schrieb:

On Wed, Jun 3, 2020 at 2:32 PM Max Ruttenberg via Gcc  wrote:

Hi all,

I’ve added a named address space to our backend and I noticed that it is only 
supported in C.
Has anyone had experience porting this feature to C++? Is there any technical 
reason why it’s not supported?


The main issue is how it interacts with templates and then
mangling.  There were a few other issues that have been posted about
before, every time this is raised.

Thanks,
Andrew



AFAIK llvm / clang supports named address spaces in C++, so it is 
obviously possible and feasible.


Johann



Re: GTY / gengtype question - adding a new header file

2015-09-01 Thread Georg-Johann Lay

Steve Ellcey schrieb:

I have a question about gengtype and GTY.  I was looking at adding some
code to mips.c and it occurred to me that that file was getting very
large (19873 lines).  So I wanted to add a new .c file instead but that
file needed some types that were defined in mips.c and not in a header file.
Specifically it needed the MIPS specific machine_function structure that
is defined in mips.c with:

struct GTY(())  machine_function {

I think I could just move this to mips.h and things would be fine but
I didn't want to do that because mips.h is included in tm.h and is visible
to the generic GCC code.  Currently machine_function is not visible to the
generic GCC code and so I wanted to put machine_function in a header file
that could only be seen/used by mips specific code.  So I created
mips-private.h and added it to extra_headers in config.gcc.

The problem is that if I include mips-private.h in mips.c instead of
having the actual definition of machine_function in mips.c then my
build fails and I think it is due to how and where gengtype scans for GTY
uses.


Shouldn't it be enough to add the file to config.gcc:target_gtfiles?
...and of course to include the respective gt-*.h at the bottom of the 
C++ source.
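
A sketch of what I mean, reusing Steve's hypothetical mips-private.h (the
exact config.gcc syntax is untested here):

# config.gcc, in the mips target stanza:
target_gtfiles="$target_gtfiles \$(srcdir)/config/mips/mips-private.h"

and mips.c would keep its existing #include "gt-mips.h" at the bottom,
as before.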



I couldn't find an example of a platform that has a machine specific header
file that was not visible to the generic GCC code and that has GTY types
in it so I am not sure what I need to do to get gengtype to scan
mips-private.h or if this is even possible (or wise).


I'd have a look at what BEs are using non-default target_gtfiles.

Johann




Re: gcc-4.9.1 generating different code between two successive builds

2015-12-30 Thread Georg-Johann Lay

Cole schrieb:

Hi,

I am busy trying to generate a package for gcc that is consistent
between two successive builds, and I am now down to the final few
files.

I am stuck with the file: cilk-abi-cilk-for.o, which is obviously
built with -O2, but between two successive builds, the assembly code
generated is different. Please refer to [1] for output and details.

Would anyone have any suggestions that I might be able to try to get
the file to generate the same assembly code?


IIRC the mentioned problem with qsort is in the register allocator as it 
sorts register classes according to their cost.  If two different 
classes have the same cost then the result depends on the qsort 
implementation.  This can lead to different code across different 
platforms, but the code built on one specific platform should be the same.
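
A tiny stand-alone illustration of the qsort effect (not GCC code, just a
sketch):

#include <stdio.h>
#include <stdlib.h>

struct rclass { const char *name; int cost; };

/* Order by cost only.  For elements with equal cost the relative order is
   unspecified, so it may differ between qsort implementations.  */
static int cmp_cost (const void *a, const void *b)
{
  const struct rclass *x = a, *y = b;
  return x->cost - y->cost;
}

int main (void)
{
  struct rclass c[] = { { "A_REGS", 4 }, { "B_REGS", 4 }, { "C_REGS", 2 } };
  qsort (c, 3, sizeof c[0], cmp_cost);
  for (int i = 0; i < 3; i++)
    printf ("%s\n", c[i].name);  /* A_REGS and B_REGS may come out swapped.  */
  return 0;
}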


Some parts of the compiler use the address of objects to compute hashes, 
but I don't remember which part(s) actually do this.  That technique can 
lead to different code for different runs of the compiler even on the 
same system.  This is hard to reproduce as it depends on how the OS is 
supplying memory, and it might depend on the "history" of the machine 
and the actual OS.


Johann



Re: avr-gcc doesn't know address space wraps?

2016-01-03 Thread Georg-Johann Lay

Ralph Doncaster schrieb:

avr-gcc 4.9.2 doesn't seem to know that the address space wraps, so
that an rjmp in the last 2KB of the address space can reach code in
the first 2KB.  The following code works fine with a jmp, but if I
change the jmp ResetVector to rjmp, I get:
(.bootloader+0x4): relocation truncated to fit: R_AVR_13_PCREL against
`no symbol'

I'm compiling with:
avr-gcc -mmcu=atmega328p -nostartfiles
-Wl,-section-start=.bootloader=0x7E00   picobootSTK500.S   -o
picobootSTK500


I don't quite get what your question has to do with the compiler proper. 
  As you are programming in assembly, the only thing that avr-gcc does 
is calling the assembler and the linker for you.



.text
; this will be address 0x which is the reset vector
; user application will over-write blink code
; dimly lights LED (using internal pullup) with a 1.5 s cycle time
ResetVector:
Blink:
sbi PINB, LEDPIN
ldi ZH, 60
DelayLoop:
; 11.8M cycles =~ .74s @ 16Mhz
rcall Delay3Cycle   ; 256 * 3 cycles
sbiw ZL, 1
brne DelayLoop
rjmp Blink

; delay 3 cycles * r24 + 4 cycles (ret instruction)
Delay3Cycle:
dec r24
brne Delay3Cycle
ret


.section .bootloader,"ax",@progbits
; use -WL,--section-start=.bootloader=0xXf00

Boot:
in Temp, MCUSR
sbrs Temp, EXTRF
JmpReset:
jmp ResetVector; jump to application code





Re: avr-gcc doesn't know address space wraps?

2016-01-03 Thread Georg-Johann Lay

Ralph Doncaster schrieb:

On Sun, Jan 3, 2016 at 1:22 PM, Georg-Johann Lay  wrote:

Ralph Doncaster schrieb:

avr-gcc 4.9.2 doesn't seem to know that the address space wraps, so
that an rjmp in the last 2KB of the address space can reach code in
the first 2KB.  The following code works fine with a jmp, but if I
change the jmp ResetVector to rjmp, I get:
(.bootloader+0x4): relocation truncated to fit: R_AVR_13_PCREL against
`no symbol'

I'm compiling with:
avr-gcc -mmcu=atmega328p -nostartfiles
-Wl,-section-start=.bootloader=0x7E00   picobootSTK500.S   -o
picobootSTK500


I don't quite get what your question has to do with the compiler proper.
As you are programming in assembly, the only thing that avr-gcc does is
calling the assembler and the linker for you.


I don't quite get your point.  Are you suggesting I should report this
on the binutils list instead of here?


Well, you are using RJMP with an address that's out of scope, hence the 
linker/locator complains.  The compiler would not generate code like



rjmp ResetVector


The compiler would just emit "JMP ", hence you can use the same 
code sequence which always works.


Or are you having trouble with the compiler driver?  This is actually 
the only part of the compiler that's involved here...


If you prefer RJMP over JMP provided the address is in the range of RJMP, 
you would usually link with --relax or call the driver with -mrelax.


If it's vital for your code to use RJMP instead of JMP when the address is 
not in the range of RJMP but address+0x8000 happens to be, you can make 
explicit use of the wrap-around capability of RJMP by adding an explicit 
+0x8000 offset to the RJMP target.


A third approach is to link with --pmem-wrap-around= so that no explicit 
offset and no explicit RJMP is needed in the code.
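
For the atmega328p from the original command line, the two linker-based
approaches could be invoked roughly like this (a sketch; 32k matches the
m328p flash size):

$ avr-gcc -mmcu=atmega328p -mrelax ...                # same as -Wl,--relax
$ avr-gcc -mmcu=atmega328p -Wl,--relax -Wl,--pmem-wrap-around=32k ...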





Re: avr-gcc doesn't know address space wraps?

2016-01-03 Thread Georg-Johann Lay

Ralph Doncaster schrieb:

On Sun, Jan 3, 2016 at 2:36 PM, Georg-Johann Lay  wrote:

Ralph Doncaster schrieb:


On Sun, Jan 3, 2016 at 1:22 PM, Georg-Johann Lay  wrote:

Ralph Doncaster schrieb:

avr-gcc 4.9.2 doesn't seem to know that the address space wraps, so
that an rjmp in the last 2KB of the address space can reach code in
the first 2KB.

[...]

Well, you are using RJMP with an address that's out of scope, hence the
linker/locator complains.  The compiler would not generate code like


rjmp ResetVector

[...]

Third approach is to link with --pmem-wrap-around= so that no explicit
offset and no explicit RJMP is needed in the code.


Thanks for the reference to the --pmem-wrap-around= option.  I
understand what's going on now; even though the compiler knows the MCU
(-mmcu=), that does not get passed to the linker.  Since the linker
doesn't know the exact flash size of the part it is linking for, that
has to be specified with the --pmem-wrap-around option.  In this case,
with a m328p, adding "-Wl,--relax -Wl,--pmem-wrap-around=32k" does the
trick.


In the old days, the driver tried to deduce the flash size from -mmcu= 
by means of a specs trick.  Up to 4.9, avr.h reads:


#define LINK_SPEC "\
%{mrelax:--relax\
 %{mpmem-wrap-around:%{mmcu=at90usb8*:--pmem-wrap-around=8k}\
 %{mmcu=atmega16*:--pmem-wrap-around=16k}\
 %{mmcu=atmega32*|\
   mmcu=at90can32*:--pmem-wrap-around=32k}\
 %{mmcu=atmega64*|\
   mmcu=at90can64*|\
  mmcu=at90usb64*:--pmem-wrap-around=64k}}}\
 ..."

I ported this to the GCC 5 spec file approach of device support so that 
it should work in v5.  However, the device names are much too diverse, and 
such an approach is not very sensible today (-mpmem-wrap-around isn't 
even documented).  A proper approach would add flash sizes to 
avr-mcus.def, and thereby eventually to device-specs.  The device-specs file 
for your device, specs-atmega328p, reads


*link_pmem_wrap:
%{mpmem-wrap-around: --pmem-wrap-around=32k}

*link_relax:
%{mrelax:--relax %(link_pmem_wrap)}

so that it's a bit more convenient to let addresses wrap (link_relax is 
part of LINK_SPEC).  -mpmem-wrap-around is supposed to work with 4.9, 
but it does not for some reason.


Need help with PR71976 combine.c::get_last_value returns a wrong result

2016-07-25 Thread Georg-Johann Lay
Tracking this wrong-code bug, I ended up in combine.c::get_last_value() which 
returns a wrong result.  gcc generates wrong code at least with: 4.9 head, 5.2, 
6.1 and trunk from today.


Before combine we have:

(insn 43 31 44 2 (set (reg:QI 18 r18)
(const_int 0 [0])) bug-combin.c:29 56 {movqi_insn}
 (nil))
(insn 44 43 45 2 (set (reg:QI 19 r19 [+1 ])
(const_int 0 [0])) bug-combin.c:29 56 {movqi_insn}
 (nil))
(insn 45 44 46 2 (set (reg:QI 20 r20 [+2 ])
(const_int 0 [0])) bug-combin.c:29 56 {movqi_insn}
 (nil))
(insn 46 45 47 2 (set (reg:QI 21 r21 [+3 ])
(const_int 0 [0])) bug-combin.c:29 56 {movqi_insn}
 (nil))
(insn 47 46 48 2 (set (reg:QI 22 r22 [+4 ])
(const_int 0 [0])) bug-combin.c:29 56 {movqi_insn}
 (nil))
(insn 48 47 49 2 (set (reg:QI 23 r23 [+5 ])
(reg:QI 65 [ _3+5 ])) bug-combin.c:29 56 {movqi_insn}
 (nil))
(insn 49 48 50 2 (set (reg:QI 24 r24 [+6 ])
(reg:QI 66 [ _3+6 ])) bug-combin.c:29 56 {movqi_insn}
 (nil))
(insn 50 49 51 2 (set (reg:QI 25 r25 [+7 ])
(reg:QI 67 [ _3+7 ])) bug-combin.c:29 56 {movqi_insn}
 (nil))
(insn 51 50 52 2 (set (reg:QI 16 r16)
(const_int 40 [0x28])) bug-combin.c:29 56 {movqi_insn}
 (nil))
(insn 52 51 61 2 (set (reg:DI 18 r18)
(ashiftrt:DI (reg:DI 18 r18)
(reg:QI 16 r16))) bug-combin.c:29 1417 {ashrdi3_insn}
 (expr_list:REG_DEAD (reg:QI 16 r16)
(nil)))

insn-combine combines 51 -> 52 and then tries to simplify the resulting
 (ashiftrt:DI (reg:DI 18 r18) (const_int 40)) in

simplify_shift_const_1 (code=ASHIFTRT, result_mode=DImode, 
varop=0x7734ee70, orig_count=40) at 
../../../gcc.gnu.org/trunk/gcc/combine.c:10181


note that reg:DI 18 is a hard register.  In that function, the following piece 
of code triggers which turns the shift offset from 40 to 0:


(gdb) p varop
$95 = (rtx) 0x7734ee70
(gdb) pr
(reg:DI 18 r18)

  /* An arithmetic right shift of a quantity known to be -1 or 0
 is a no-op.  */
  if (code == ASHIFTRT
  && (num_sign_bit_copies (varop, shift_mode)
  == GET_MODE_PRECISION (shift_mode)))
{
  count = 0;
  break;
}

num_sign_bit_copies() eventually calls 
combine.c::reg_num_sign_bit_copies_for_combine()


which returns const0_rtx because reg 18 is set in insn 43 to const0_rtx.  Total 
outcome is that the right shift of reg:DI 18 is transformed to a no-op move and 
cancelled out in the remainder.


IIUC get_last_value should return 0 in this case?

  /* If we don't have a value, or if it isn't for this basic block and
 it's either a hard register, set more than once, or it's a live
 at the beginning of the function, return 0.

 Because if it's not live at the beginning of the function then the reg
 is always set before being used (is never used without being set).
 And, if it's set only once, and it's always set before use, then all
 uses must have the same last value, even if it's not from this basic
 block.  */

  if (value == 0
  || (rsp->last_set_label < label_tick_ebb_start
  && (regno < FIRST_PSEUDO_REGISTER
  || regno >= reg_n_sets_max
  || REG_N_SETS (regno) != 1
  || REGNO_REG_SET_P
 (DF_LR_IN (ENTRY_BLOCK_PTR_FOR_FN (cfun)->next_bb), regno
return 0;

value is const_int 0
rsp->last_set_label is 2
label_tick_ebb_start is 1

so that the test "regno < FIRST_PSEUDO_REGISTER" never applies...

It follows

get_last_value_validate (loc=0x7fff8ee0, insn=0x7730cc00, tick=2, 
replace=0) at ../../../gcc.gnu.org/trunk/gcc/combine.c:13062


(gdb) p insn
$5 = (rtx_insn *) 0x7730cc00
(gdb) pr
(insn 43 31 44 2 (set (reg:QI 18 r18)
(const_int 0 [0])) bug-combin.c:29 56 {movqi_insn}
 (nil))

(gdb) p *loc
$6 = (rtx) 0x773094f0
(gdb) pr
(const_int 0 [0])

The comment of that function reads:

/* Verify that all the registers and memory references mentioned in *LOC are
   still valid.  *LOC was part of a value set in INSN when label_tick was
   equal to TICK.  Return 0 if some are not.  If REPLACE is nonzero, replace
   the invalid references with (clobber (const_int 0)) and return 1.  This
   replacement is useful because we often can get useful information about
   the form of a value (e.g., if it was produced by a shift that always
   produces -1 or 0) even though we don't know exactly what registers it
   was produced from.  */

static int
get_last_value_validate (rtx *loc, rtx_insn *insn, int tick, int replace)
{

which does not make sense if *loc is a const_int because it has no "register" 
or "memory references".


Which place of get_last_value should handle such sets of hard regs, and how?

Should get_last_value_validate be called for const0_rtx ?

Johann



Re: Need help with PR71976 combine.c::get_last_value returns a wrong result

2016-07-26 Thread Georg-Johann Lay

On 25.07.2016 23:05, Segher Boessenkool wrote:

On Mon, Jul 25, 2016 at 02:28:28PM +0200, Georg-Johann Lay wrote:

(insn 43 31 44 2 (set (reg:QI 18 r18)
(const_int 0 [0])) bug-combin.c:29 56 {movqi_insn}
 (nil))



(insn 51 50 52 2 (set (reg:QI 16 r16)
(const_int 40 [0x28])) bug-combin.c:29 56 {movqi_insn}
 (nil))
(insn 52 51 61 2 (set (reg:DI 18 r18)
(ashiftrt:DI (reg:DI 18 r18)
(reg:QI 16 r16))) bug-combin.c:29 1417 {ashrdi3_insn}
 (expr_list:REG_DEAD (reg:QI 16 r16)
(nil)))



  /* An arithmetic right shift of a quantity known to be -1 or 0
 is a no-op.  */
  if (code == ASHIFTRT
  && (num_sign_bit_copies (varop, shift_mode)
  == GET_MODE_PRECISION (shift_mode)))
{
  count = 0;
  break;
}

num_sign_bit_copies() eventually calls
combine.c::reg_num_sign_bit_copies_for_combine()

which returns const0_rtx because reg 18 is set in insn 43 to const0_rtx.
Total outcome is that the right shift of reg:DI 18 is transformed to a
no-op move and cancelled out in the remainder.


Why does num_sign_bit_copies return something bigger than 8?


Because it thinks the last value was const0_rtx (which is incorrect) and hence 
thinks a shift of a reg that's always 0 is the same as a no-op move of the reg 
to itself.


The problem might be that get_last_value() is called for reg:DI 18 and 
combine's internal bookkeeping is incorrect


struct reg_stat_type {
  /* Record last point of death of (hard or pseudo) register n.  */
  rtx_insn  *last_death;

  /* Record last point of modification of (hard or pseudo) register n.  */
  rtx_insn  *last_set;

.last_set is the set for reg:QI 18 but later insns also change hard reg:DI 18, 
for example the set of reg:QI 19.  This means that the information in 
reg_stat_type does not tell the whole story for hard regs.


One fix could be to get the mode precision of the SET from the last_set insn 
and only use the information if that mode is at least as wide as the mode of 
the register for which get_last_value is called.  reg_stat_type has 
.last_set_mode for this purpose (QImode in the test case).



IIUC get_last_value should return 0 in this case?

  /* If we don't have a value, or if it isn't for this basic block and
 it's either a hard register, set more than once, or it's a live
 at the beginning of the function, return 0.


We do have a value, and it is for this bb.


But not for the complete hard register; it's only for reg:QI 18 which is one 
byte of reg:DI 18, thus the code should never reach this point in the first 
place and return earlier.  For example:


Index: combine.c
===
--- combine.c   (revision 238701)
+++ combine.c   (working copy)
@@ -13206,6 +13206,13 @@ get_last_value (const_rtx x)
   && DF_INSN_LUID (rsp->last_set) >= subst_low_luid)
 return 0;

+  /* If the lookup is for a hard register make sure that value contains at
+     least as many bits as x does.  */
+
+  if (regno < FIRST_PSEUDO_REGISTER
+      && GET_MODE_PRECISION (rsp->last_set_mode)
+         < GET_MODE_PRECISION (GET_MODE (x)))
+    return 0;
+
   /* If the value has all its registers valid, return it.  */
   if (get_last_value_validate (&value, rsp->last_set, rsp->last_set_label, 0))
 return value;


Johann




Segher





Re: Need help with PR71976 combine.c::get_last_value returns a wrong result

2016-07-26 Thread Georg-Johann Lay

On 26.07.2016 14:51, Segher Boessenkool wrote:

On Tue, Jul 26, 2016 at 02:14:49PM +0200, Georg-Johann Lay wrote:

which returns const0_rtx because reg 18 is set in insn 43 to const0_rtx.
Total outcome is that the right shift of reg:DI 18 is transformed to a
no-op move and cancelled out in the remainder.


Why does num_sign_bit_copies return something bigger than 8?


Because it thinks the last value was const0_rtx (which is incorrect) and


Why is that incorrect?  See insn 43.  Is there another insn I missed?


Yes, the insn stream that matters is (prior to combine):

(insn 43 31 44 2 (set (reg:QI 18 r18)
(const_int 0 [0])) bug-combin.c:29 56 {movqi_insn}
 (nil))
(insn 44 43 45 2 (set (reg:QI 19 r19 [+1 ])
(const_int 0 [0])) bug-combin.c:29 56 {movqi_insn}
 (nil))
(insn 45 44 46 2 (set (reg:QI 20 r20 [+2 ])
(const_int 0 [0])) bug-combin.c:29 56 {movqi_insn}
 (nil))
(insn 46 45 47 2 (set (reg:QI 21 r21 [+3 ])
(const_int 0 [0])) bug-combin.c:29 56 {movqi_insn}
 (nil))
(insn 47 46 48 2 (set (reg:QI 22 r22 [+4 ])
(const_int 0 [0])) bug-combin.c:29 56 {movqi_insn}
 (nil))
(insn 48 47 49 2 (set (reg:QI 23 r23 [+5 ])
(reg:QI 65 [ _3+5 ])) bug-combin.c:29 56 {movqi_insn}
 (nil))
(insn 49 48 50 2 (set (reg:QI 24 r24 [+6 ])
(reg:QI 66 [ _3+6 ])) bug-combin.c:29 56 {movqi_insn}
 (nil))
(insn 50 49 51 2 (set (reg:QI 25 r25 [+7 ])
(reg:QI 67 [ _3+7 ])) bug-combin.c:29 56 {movqi_insn}
 (nil))
(insn 51 50 52 2 (set (reg:QI 16 r16)
(const_int 40 [0x28])) bug-combin.c:29 56 {movqi_insn}
 (nil))
(insn 52 51 61 2 (set (reg:DI 18 r18)
(ashiftrt:DI (reg:DI 18 r18)
(reg:QI 16 r16))) bug-combin.c:29 1417 {ashrdi3_insn}
 (expr_list:REG_DEAD (reg:QI 16 r16)
(nil)))

The SETs of insns 43...50 all overlap reg:DI 18.  This is what expand does as 
it breaks up a DImode move into 8 QImode moves.  The first 8 instructions 
comprise an &= 0xff00.


It's correct that QI:18 is set to 0, but this does not apply to DI:18 of course.


The problem might be that get_last_value() is called for reg:DI 18 and
combine's internal bookkeeping is incorrect

struct reg_stat_type {
  /* Record last point of death of (hard or pseudo) register n.  */
  rtx_insn  *last_death;

  /* Record last point of modification of (hard or pseudo) register n.  */
  rtx_insn  *last_set;

.last_set is the set for reg:QI 18 but later insns also change hard reg:DI
18, for example the set of reg:QI 19.  This means that the information in
reg_stat_type does not tell the whole story for hard regs.

One fix could be to get the mode precision of the SET from the last_set
insn and only use the information if that mode is at least as wide as the
mode of the register for which get_last_value is called.  reg_stat_type has
.last_set_mode for this purpose (QImode in the test case).


Yes, last_set_mode, and there is last_set_sign_bit_copies too.  Are those
correct?


(gdb) p *rsp
$9 = {last_death = 0x0, last_set = 0x7730cc00, last_set_value = 
0x773094f0, last_set_table_tick = 2, last_set_label = 2, 
last_set_nonzero_bits = 0, last_set_sign_bit_copies = 8 '\b', last_set_mode = 
QImode, last_set_invalid = 1 '\001', sign_bit_copies = 0 '\000', nonzero_bits = 
0, truncation_label = 2, truncated_to_mode = DImode}


(gdb) p rsp->last_set
$10 = (rtx_insn *) 0x7730cc00
(gdb) pr
(insn 43 31 44 2 (set (reg:QI 18 r18)
(const_int 0 [0])) bug-combin.c:29 56 {movqi_insn}
 (nil))

(gdb) p rsp->last_set_value
$11 = (rtx) 0x773094f0
(gdb) pr
(const_int 0 [0])

The .last_set_sign_bit_copies is correct if it is applied to QImode, but the 
sign bits of reg:DI 18 are contained in reg:QI 25 set by insn 50.


As already said, rsp[regno] is not used to get the number of sign bit copies 
but to conclude that DI:18 is entirely set to 0, and the number of sign bit 
copies is then computed for that 0.



We do have a value, and it is for this bb.


But not for the complete hard register; it's only for reg:QI 18 which is
one byte of reg:DI 18,


The rest of the hard register bits are set to unspecified values.


No, insns 44...50 set the rest of reg:DI 18 to specified values.  This is what 
expand does as it lowers DImode to word_mode (QImode for avr).


FYI, FIRST_PSEUDO_REGISTER is 34.


thus the code should never reach this point in the
first place and return earlier.  For example:

Index: combine.c
===
--- combine.c   (revision 238701)
+++ combine.c   (working copy)
@@ -13206,6 +13206,13 @@ get_last_value (const_rtx x)
   && DF_INSN_LUID (rsp->last_set) >= subst_low_luid)
 return 0;

+  /* If the lookup is for a hard register make sure that value contains at
least
+ as many bits as x does.  */
+
+  if (regno < FIRST_PSEUDO_REGISTER
+  &

Re: Need help with PR71976 combine.c::get_last_value returns a wrong result

2016-07-27 Thread Georg-Johann Lay

Segher Boessenkool schrieb:

On Tue, Jul 26, 2016 at 03:38:18PM +0200, Georg-Johann Lay wrote:

@@ -13206,6 +13206,13 @@ get_last_value (const_rtx x)
  && DF_INSN_LUID (rsp->last_set) >= subst_low_luid)
return 0;

+  /* If the lookup is for a hard register make sure that value contains at
+     least as many bits as x does.  */
+
+  if (regno < FIRST_PSEUDO_REGISTER
+      && GET_MODE_PRECISION (rsp->last_set_mode)
+         < GET_MODE_PRECISION (GET_MODE (x)))
+    return 0;
+
   /* If the value has all its registers valid, return it.  */
   if (get_last_value_validate (&value, rsp->last_set, rsp->last_set_label, 0))
     return value;

That might be a bit harsh.

In what respect?


It disables all kinds of optimisations that work now.


First things first: is the information recorded correct?
I think yes, but care must be taken about what may be concluded from that 
information.  From a set of 8 bits you cannot draw conclusions about all 64 
bits; this should be obvious enough :-)


:-)

In the above case rsp[regno] holds only information for 1 sub-byte.  In 
order to get the complete DImode story we would have to get the info for 
all sub-parts and then put them together...


Yes, and the rsp stuff does not do that.

I am testing the following patch.  Thanks,


Segher


diff --git a/gcc/combine.c b/gcc/combine.c
index 77e0d2b..dec6226 100644
--- a/gcc/combine.c
+++ b/gcc/combine.c
@@ -9977,6 +9977,9 @@ reg_num_sign_bit_copies_for_combine (const_rtx x, 
machine_mode mode,
   return NULL;
 }
 
+  if (GET_MODE_PRECISION (rsp->last_set_mode) != GET_MODE_PRECISION (mode))
+    return NULL;
+
   tem = get_last_value (x);
   if (tem != 0)
 return tem;


But the problem is inside get_last_value.  You'd have to add such a 
check at /all/ call sites of that function.  Using a value for a hard 
reg that's been set in a smaller mode than the mode of get_last_value is 
just wrong.


For example combine tracks bits which are known to be zero, and if 
get_last_value is used in a similar situation, then the conclusion about 
zero-bits is also wrong.


Would be interesting to patch get_last_value so that it issues some 
diagnostic if


if (regno < FIRST_PSEUDO_REGISTER
    && (GET_MODE_PRECISION (rsp->last_set_mode)
        < GET_MODE_PRECISION (GET_MODE (x))))

and then run the testsuite against that compiler.  I would expect that 
the condition doesn't even trigger on x86.


Johann



Re: Need help with PR71976 combine.c::get_last_value returns a wrong result

2016-07-29 Thread Georg-Johann Lay

On 29.07.2016 09:47, Segher Boessenkool wrote:

Hi Johann,

I tested a variant of your patch, building Linux for 32 different
(sub-)architectures; surprisingly (to me) there are no regressions
at all.


I am not so surprised because most backends don't make such intense use of 
hard regs as the avr BE does.




Do you want to send it to gcc-patches?


https://gcc.gnu.org/ml/gcc-patches/2016-07/msg01921.html

There might still be problems lingering if hard regs are used:

Suppose we set the reg in DImode and then get_last_value is called for the same 
reg in SImode.  Using the DI value might be wrong, e.g. if it is used to 
compute the sign (on little endian) or the number of sign bit copies.


Anyway, for now I am happy with the current solution.  Many thanks for your 
patient help.


Johann



Segher


diff --git a/gcc/combine.c b/gcc/combine.c
index 77e0d2b..750bf83 100644
--- a/gcc/combine.c
+++ b/gcc/combine.c
@@ -13240,6 +13240,12 @@ get_last_value (const_rtx x)
   && DF_INSN_LUID (rsp->last_set) >= subst_low_luid)
 return 0;

+  /* If fewer bits were set than what we are asked for now, we cannot use
+ the value.  */
+  if (GET_MODE_PRECISION (rsp->last_set_mode)
+  < GET_MODE_PRECISION (GET_MODE (x)))
+return 0;
+
   /* If the value has all its registers valid, return it.  */
   if (get_last_value_validate (&value, rsp->last_set, rsp->last_set_label, 0))
 return value;





Re: [avr] fno-caller-saves and regression tests

2016-08-12 Thread Georg-Johann Lay

On 09.08.2016 07:33, Senthil Kumar Selvaraj wrote:

Hi Johann,

  Turning off -fcaller-saves for AVR makes the testcase I had for PR 71873
  pass unless I explicitly add -fcaller-saves to force the compiler to
  generate the triggering insn patterns.

  Wonder if we should modify the existing test cases in gcc.target/avr to
  be tested both ways (with and without caller saves)? At least the
  register allocation related ones probably won't catch regressions.

Regards
Senthil


Testing both ways only makes sense, IMO, if we actually "support" -fcaller-saves 
and if someone is going to fix reg alloc bugs.


The current life cycle of such ICEs is:

1) Someone reports register alloc ICE to bugzilla

2) It's not fixed because there are no resources

3) Bug is closed because life cycle of gcc branch ends

4) Problem pops up on newer branch and with different test case

5) goto 1)

I am already very happy if RA ICEs for default options get fixed.
Many thanks to you for digging into that difficult matter.

Re your original question, I am fine with whatever approach the one who gets 
involved with these RA bugs will prefer.


From my experience, caller-saves degrades code quality and increases the 
probability of ICEs from reload.  If -fcaller-saves makes it easier for you to 
hunt or monitor respective bugs, feel free to adjust the test cases.  Or you 
might consider adding -fcaller-saves variant(s) to your torture options.


Johann




Re: How to avoid constant propagation into functions?

2016-12-07 Thread Georg-Johann Lay

On 07.12.2016 14:47, Jakub Jelinek wrote:

On Wed, Dec 07, 2016 at 04:12:48PM +0300, Alexander Monakov wrote:

[adding gcc@ for the compiler-testsuite-related discussion, please drop either
gcc@ or gcc-help@ from Cc: as appropriate in replies]

On Wed, 7 Dec 2016, Segher Boessenkool wrote:

For example, this might have impact on writing test for GCC:

When I am writing a test with noinline + noclone then my
expectation is that no such propagation happens, because
otherwise a test might turn trivial...


The usual ways to prevent that are to add some volatile, or an
asm("" : "+g"(some_var));   etc.


No, that doesn't sound right.  As far as I can tell from looking at the GCC
testsuite, the prevailing way is actually the noinline+noclone combo, not the
per-argument asms or volatiles.

This behavior is new in gcc-7 due to new IPA-VRP functionality. So -fno-ipa-vrp
gets the old behavior.  I think from the testsuite perspective the situation got
a bit worse due to this, as now in existing testcases stuff can get propagated
where the testcase used noinline+noclone to suppress propagation. This means
that some testcases may get weaker and no longer test what they were supposed
to.  And writing new testcases gets less convenient too.

However, this actually demonstrates how the noinline+noclone was not
future-proof, and in a way backfired now.  Should there be, ideally, a single
'noipa' attribute encompassing noinline, noclone, -fno-ipa-vrp, -fno-ipa-ra and
all future transforms using inter-procedural knowledge?


Or just disable IPA-VRP into functions with noclone attribute, consider
IPA-VRP as kind of virtual cloning.

Jakub



Are you saying that this is worth being reported as a PR?

Johann



Re: Change the calling conventions only for the intrinsic functions.

2014-05-08 Thread Georg-Johann Lay

Am 05/07/2014 04:20 PM, schrieb Umesh Kalappa:

Hi All ,

We are porting GCC 4.8.1 to customized hardware, where the current calling
convention is that arguments are passed on the stack and the return value in
a register.

But we do have some intrinsic functions (supplied by the hardware folks)
whose calling convention is that both arguments and the return value are
passed on the stack.

So the question is: the backend currently uses the usual convention
(arguments passed on the stack and the return value in a register), but how
can we model that only the intrinsics use the caller's stack space for the
return value instead of a register?


Introduce a new function attribute and attach this attribute to the built-in's 
decl, e.g. in TARGET_INIT_BUILTINS / TARGET_BUILTIN_DECL.  Then use the attribute 
to adjust the calling convention in TARGET_FUNCTION_ARG and the like.
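
A rough sketch of that idea (attribute name, built-in name and enum value are
made up here, and the exact hook signatures differ between GCC versions):

/* In TARGET_INIT_BUILTINS: create the built-in with the attribute attached.  */
tree attrs = tree_cons (get_identifier ("retval_on_stack"), NULL_TREE, NULL_TREE);
tree ftype = build_function_type_list (unsigned_type_node,
                                       unsigned_type_node, NULL_TREE);
add_builtin_function ("__builtin_my_intrinsic", ftype, MY_BUILTIN_INTRINSIC,
                      BUILT_IN_MD, NULL, attrs);

/* Wherever the calling convention is decided, e.g. when setting up the
   CUMULATIVE_ARGS for a call or in TARGET_FUNCTION_VALUE, test the callee's
   decl for the attribute and switch to the stack-based convention:  */
if (fndecl && lookup_attribute ("retval_on_stack", DECL_ATTRIBUTES (fndecl)))
  {
    /* ... return the value through the caller's stack slot ... */
  }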



Appreciate  any hints or shed some lights  on the same.


Johann





Re: RFC: Doc update for attribute

2014-05-20 Thread Georg-Johann Lay

Am 05/16/2014 07:16 PM, schrieb Carlos O'Donell:

On 05/12/2014 11:13 PM, David Wohlferd wrote:

After updating gcc's docs about inline asm, I'm trying to improve
some of the related sections. One that I feel has problems with
clarity is __attribute__ naked.

I have attached my proposed update. Comments/corrections are
welcome.

In a related question:

To better understand how this attribute is used, I looked at the
Linux kernel. While the existing docs say "only ... asm statements
that do not have operands" can safely be used, Linux routinely uses
asm WITH operands.


That's a bug. Period. You must not use naked with an asm that has
operands. Any kind of operand might inadvertently cause the compiler
to generate code and that would violate the requirements of the
attribute and potentially generate an ICE.


There is the target hook TARGET_ALLOCATE_STACK_SLOTS_FOR_ARGS that is intended 
to cater for that case.  For example, the documentation indicates it only works 
with optimization turned off.  But I don't know how reliable it is in general.  
For the avr target it works as expected.


https://gcc.gnu.org/onlinedocs/gccint/Misc.html#index-TARGET_005fALLOCATE_005fSTACK_005fSLOTS_005fFOR_005fARGS-4969
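
For reference, a minimal sketch of a backend providing that hook (the function
name is made up; avr's real implementation may differ):

/* Don't assign incoming arguments to stack slots in functions marked
   "naked", so even at -O0 no argument spill code is emitted.  */
static bool
my_allocate_stack_slots_for_args (void)
{
  return lookup_attribute ("naked",
                           DECL_ATTRIBUTES (current_function_decl)) == NULL_TREE;
}

#undef  TARGET_ALLOCATE_STACK_SLOTS_FOR_ARGS
#define TARGET_ALLOCATE_STACK_SLOTS_FOR_ARGS my_allocate_stack_slots_for_args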

Johann




Re: Roadmap for 4.9.1, 4.10.0 and onwards?

2014-05-21 Thread Georg-Johann Lay

Am 05/20/2014 06:04 PM, schrieb Paulo Matos:

From: gcc-ow...@gcc.gnu.org [mailto:gcc-ow...@gcc.gnu.org] On Behalf
Of Basile Starynkevitch
Sent: 20 May 2014 16:29
To: Bruce Adams
Cc: gcc@gcc.gnu.org
Subject: Re: Roadmap for 4.9.1, 4.10.0 and onwards?

On Tue, 2014-05-20 at 11:09 +0100, Bruce Adams wrote:

Hi,
 I've been tracking the latest releases of gcc since 4.7 or so

(variously interested in C++1y support, cilk and openmp).

One thing I've found hard to locate is information about planned

inclusions for future releases.

As much relies on unpredictable community contributions I don't

expect there to be a concrete or reliable plan.


However, equally I'm sure the steering committee have some ideas

over

what ought to be upcoming releases.


As a whole, the steering committee does not have any idea, because GCC
development is based upon volunteer contributions.



I understand the argument but I am not sure it's the way to go. Even if the 
project is based on volunteer contributions it would be interesting to have a 
tentative roadmap. This, I would think, would also help possible beginner 
volunteers know where to start if they wanted to contribute to the project. So 
the roadmap could be a list of features (big or small) of bug fixes that we 
would like fixed for a particular version. Even if we don't want to name it 
roadmap it would still be interesting to have a list of things that are being 
worked on or on the process of being merged into mainline and therefore will 
make it to the next major version.

That being said I know it's hard to set sometime apart to write this kind of 
thing given most of us prefer to be hacking on GCC. From a newcomer point of 
view, however, not having things like a roadmap makes it look like the project 
is heading nowhere.


Isn't https://gcc.gnu.org/projects/ supposed to be such a page? (linked from 
the front page).  And in particular its "projects for beginner GCC hackers" 
link?  Or is it complete doc-rot?


Johann




Re: RFC: Doc update for attribute

2014-05-21 Thread Georg-Johann Lay

Am 05/20/2014 03:31 PM, schrieb Carlos O'Donell:

On 05/20/2014 03:59 AM, Georg-Johann Lay wrote:

Am 05/16/2014 07:16 PM, schrieb Carlos O'Donell:

On 05/12/2014 11:13 PM, David Wohlferd wrote:

After updating gcc's docs about inline asm, I'm trying to
improve some of the related sections. One that I feel has
problems with clarity is __attribute__ naked.

I have attached my proposed update. Comments/corrections are
welcome.

In a related question:

To better understand how this attribute is used, I looked at the
Linux kernel. While the existing docs say "only ... asm
statements that do not have operands" can safely be used, Linux
routinely uses asm WITH operands.


That's a bug. Period. You must not use naked with an asm that has
operands. Any kind of operand might inadvertently cause the
compiler to generate code and that would violate the requirements
of the attribute and potentially generate an ICE.


There is target hook TARGET_ALLOCATE_STACK_SLOTS_FOR_ARGS that is
intended to cater that case.  For example, the documentation
indicates it only works with optimization turned off.  But I don't
know how reliable it is in general.  For avr target it works as
expected.

https://gcc.gnu.org/onlinedocs/gccint/Misc.html#index-TARGET_005fALLOCATE_005fSTACK_005fSLOTS_005fFOR_005fARGS-4969


It's still a bug for now. That hook is there because we've allowed
bad code to exist for so long that at this point we must for legacy
reasons allow some type of input arguments in the asm. However, that
doesn't mean we should actively promote this feature or let users
use it (until we fix it).

Ideally you do want to use the named input arguments as "r" types to
avoid needing to know the exact registers used in the call sequence.


It won't work to let the compiler pick the used registers.


Referencing the variables by name and letting gcc emit the right
register is useful, but only if it works consistently and today it
doesn't.


In order to use the passed arguments, you must know the ABI and how / where the 
arguments are passed.  Then the asm template can use this information 
explicitly, but the asm itself still has no operands.
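
To give a concrete (hypothetical) avr example: per the avr ABI the 16-bit
argument arrives in R25:R24 and the result is returned there as well, so the
asm can name the registers explicitly and needs no operands:

__attribute__((naked))
unsigned int inc16 (unsigned int x)
{
  /* x lives in R25:R24 on entry; the return value is expected there, too.  */
  __asm__ ("adiw r24, 1" "\n\t"
           "ret");
}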


The hook mentioned above is there to avoid extra code besides the (non-existent) 
prologue/epilogue and the asm itself, so that a naked function with arguments 
and an asm without operands works at any optimization level.



Features that fail to work depending on the optimization level should
not be promoted in the documentation. We should document what works
and file bugs or fix what doesn't work.

Cheers,
Carlos.





PR63633: May middle-end come up width hard regs for insn expanders?

2014-10-24 Thread Georg-Johann Lay
Investigating PR63633, it turned out that the middle-end calls insn expanders 
with hard registers, namely mulsi3 from the avr back-end:


(define_expand "mulsi3"
  [(parallel [(set (match_operand:SI 0 "register_operand" "")
   (mult:SI (match_operand:SI 1 "register_operand" "")
(match_operand:SI 2 "nonmemory_operand" "")))
  (clobber (reg:HI 26))
  (clobber (reg:DI 18))])]
...

is being called with operands[0] = (reg:SI 22).  This overlaps hard (reg:DI 18) 
which extends from R18...R25.


mulsi3 assumes all operands are pseudo registers, and consequently the 
generated insn raises an ICE in the remainder; and there are other cases for 
other expanders where wrong code gets generated because the clobbers clobber a 
hard-reg input operand.


Is it in order to have hard registers as arguments to expanders (and maybe also 
to insns) like that at expand time (pass .expand) ?


It is easy enough to kick these hard regs into new pseudos in the respective 
expanders; however, the question arises where the culprit is:

back-end or middle-end?

Until PR63633 I've never seen the middle-end coming up with hard regs that way, 
thus a bunch of avr insns have been written under that -- maybe wrong -- 
assumption...



Johann


Re: PR63633: May middle-end come up width hard regs for insn expanders?

2014-10-24 Thread Georg-Johann Lay

Jeff Law schrieb:

On 10/24/14 10:29, Jakub Jelinek wrote:


But I'd say, if you can't handle hard regs in the operands (either general,
or some specific ones), you should
force the hard regs into pseudos (all hard regs, or just the problematic
ones) in the expander.
So in this case, check if they overlap with those 2 regs, and force the
input operands into pseudos if they do; the output will be harder, guess
you'd need to emit the pattern into a pseudo and emit_move_insn it
afterwards to the hard reg.
I'm pretty sure that's what Georg is going to do, my comment was that 
the existence of the hard register operand _may_ point to something else 
that we would want to fix.


jeff


Yes, that's the straightforward approach, which works so far.  Bit 
tedious, but well...
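
A sketch of that approach for the mulsi3 expander quoted in the first mail
(illustrative only, not the actual patch):

(define_expand "mulsi3"
  [(parallel [(set (match_operand:SI 0 "register_operand" "")
                   (mult:SI (match_operand:SI 1 "register_operand" "")
                            (match_operand:SI 2 "nonmemory_operand" "")))
              (clobber (reg:HI 26))
              (clobber (reg:DI 18))])]
  "AVR_HAVE_MUL"
  {
    rtx r26 = gen_rtx_REG (HImode, 26);
    rtx r18 = gen_rtx_REG (DImode, 18);

    /* Inputs overlapping the clobbers: load them into fresh pseudos.  */
    if (reg_overlap_mentioned_p (r18, operands[1])
        || reg_overlap_mentioned_p (r26, operands[1]))
      operands[1] = copy_to_mode_reg (SImode, operands[1]);

    if (REG_P (operands[2])
        && (reg_overlap_mentioned_p (r18, operands[2])
            || reg_overlap_mentioned_p (r26, operands[2])))
      operands[2] = copy_to_mode_reg (SImode, operands[2]);

    /* An output overlapping the clobbers: multiply into a pseudo and
       copy the result back afterwards.  */
    if (reg_overlap_mentioned_p (r18, operands[0])
        || reg_overlap_mentioned_p (r26, operands[0]))
      {
        rtx dest = operands[0];
        operands[0] = gen_reg_rtx (SImode);
        emit_insn (gen_mulsi3 (operands[0], operands[1], operands[2]));
        emit_move_insn (dest, operands[0]);
        DONE;
      }
  })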


In one case expmed generated different code, though: divmodhi instead of 
mulhi_highpart for HI division by const_int.  Cheating with costs did 
not help.  However for now I am mostly after correct, ICE-less code.


What I am concerned about is:

1) May it happen that a value lives in a hard-reg across the expander? 
The expander has no means to detect that situation and interfere, e.g.


hard-reg = source_value // middle-end
expand-code // back-end
sink_value = hard-reg   // middle-end

where "expand-code" is one defind_expand / define_insn that clobbers / 
changes (parts of) hard-reg but does *not* get hard-reg as operand. 
This is wrong code obviously.



2) May later, non-strict-rtl passes also come up with such situations, 
e.g. as they are using the same middle-end functions to emit their code?


One example is expmed.c:expand_divmod() which is called from RTL loop 
optimizers (and stumbles upon PR59559 btw.)


What if no new pseudos are available at that time?


The point is that the middle-end could know that it is generating bad 
code...



Johann



Re: PR63633: May middle-end come up width hard regs for insn expanders?

2014-10-27 Thread Georg-Johann Lay

Am 10/24/2014 08:29 PM, schrieb Jakub Jelinek:

On Fri, Oct 24, 2014 at 08:19:57PM +0200, Georg-Johann Lay wrote:

Yes, that's the straight forward approach which works so far.  Bit tedious,
but well...

In one case expmed generated different code, though: divmodhi instead of
mulhi_highpart for HI division by const_int.  Cheating with costs did not
help.  However for now I am mostly after correct, ICE-less code.

What I am concerned about is:

1) May it happen that a value lives in a hard-reg across the expander? The
expander has no means to detect that situation and interfere, e.g.

hard-reg = source_value // middle-end
expand-code // back-end
sink_value = hard-reg   // middle-end

where "expand-code" is one defind_expand / define_insn that clobbers /
changes (parts of) hard-reg but does *not* get hard-reg as operand. This is
wrong code obviously.


It can happen, but if it happens, that would mean user code bug, like using
register asm with an register that is unsuitable for use as global or local
register var on the target, or it could be backend bug (expansion of some
pattern clobbering register that has other fixed uses).
You shouldn't ICE on it, but what happens is undefined.


Now I have a test case where exactly that happens:

- An expander needs to clobber a hard register.
- That hard reg is live at that time and *not* used as operand to the expander
- That hard reg is neither fixed nor a global register.
- That hard reg is neither used for argument passing nor for frame pointer
  or stack pointer and not for any exotic use like static chain etc.

For the cases where the hard register is an operand to the expander I have a 
patch; is it possible to get it into 4.9.2?  It fixes both ICEs and also some 
(but not all) wrong-code situations.


AFAICT the remaining wrong-code situations are as explained above.  The hard 
register is assigned a const_int too early so that it gets clobbered by the 
mentioned expander.


I still think that's a bug in the middle-end...

Johann



Before RA, the use of hard regs should be limited (pretty much just fixed
regs where really necessary, global and local register variables (user needs
to use with care), function arguments and return values (short lived around
the call patterns).

Jakub





Re: PR63633: May middle-end come up width hard regs for insn expanders?

2014-10-28 Thread Georg-Johann Lay

Am 10/27/2014 08:43 PM, schrieb Jeff Law:

On 10/27/14 13:33, Georg-Johann Lay wrote:

Am 10/24/2014 08:29 PM, schrieb Jakub Jelinek:

On Fri, Oct 24, 2014 at 08:19:57PM +0200, Georg-Johann Lay wrote:

Yes, that's the straight forward approach which works so far.  Bit tedious,
but well...

In one case expmed generated different code, though: divmodhi instead of
mulhi_highpart for HI division by const_int.  Cheating with costs did not
help.  However for now I am mostly after correct, ICE-less code.

What I am concerned about is:

1) May it happen that a value lives in a hard-reg across the expander? The
expander has no means to detect that situation and interfere, e.g.

hard-reg = source_value // middle-end
expand-code // back-end
sink_value = hard-reg   // middle-end

where "expand-code" is one defind_expand / define_insn that clobbers /
changes (parts of) hard-reg but does *not* get hard-reg as operand.  This is
wrong code obviously.


It can happen, but if it happens, that would mean user code bug, like using
register asm with an register that is unsuitable for use as global or local
register var on the target, or it could be backend bug (expansion of some
pattern clobbering register that has other fixed uses).
You shouldn't ICE on it, but what happens is undefined.


Now I have a test case where exact that happens:

- An expander needs to clobber a hard register.
- That hard reg is live at that time and *not* used as operand to the
expander
- That hard reg is neither fixed nor a global register.
- That hard reg is neither used for argument passing nor for frame pointer
   or stack pointer and not for any exotic use like static chain etc.

So can you give more details about the live hard reg?  How did a value get into
that hard reg, how did it get used later? That's where I'd focus my efforts
since that probably shouldn't be happening except in very special circumstances.


The C test case is


__attribute__((noinline,noclone))
unsigned bug_mulhi_highpart (unsigned x)
{
register unsigned reg26 asm ("26") = 10;
a = x / 10;
__asm volatile ("; %0 " : "+r" (reg26));
return reg26;
}

int main (void)
{
if (10 != bug_mulhi_highpart (0))
__builtin_abort();
return 0;
}


The specification guarantees reg26 to be allocated to R26/27 only at the time 
it is used as operand to the inline asm.


I am fine if the code above is not a legal use of local register variables. 
However, the question arises how a user writing such code can know that the 
generated code will be wrong.


As you said, the compiler has no means to detect the situation, as reg26 is not 
used as an operand of divmod.  Sometimes the middle-end generates mul_highpart 
instead, so that a user cannot tell which registers are to be avoided even if he 
is familiar with GCC internals...


Johann



How to update reg_dead notes

2015-02-24 Thread Georg-Johann Lay
Hi, in order to fix PR64331 I tried to implement new target-specific passes 
whose sole purpose is to recompute REG_DEAD notes.


The avr BE relies on correct dead notes, which are used in 
avr.c:reg_unused_after, which in turn uses dead_or_set_p.  The avr BE needs 
correct dead notes in ADJUST_INSN_LENGTH, get_attr_length, get_attr_min_length, 
etc.


After trying for more than one day I am really frustrated; each approach ended 
up in seg_faults somewhere in df.


Following the source comments in df-core.c, recomputing dead notes should be as 
easy as



  df_note_add_problem ();
  df_analyze ();

in the execute() method of the new pass.


As this (and many, many other tries using df_scan_alloc, df_scan_blocks, 
df_finish_pass, df_insn_rescan_all, etc.) always crashes the compiler, I must 
be doing something completely wrong...


Could you give me some advice on the correct usage of df or, even better, 
point me to comprehensible documentation of df which is more complete than what 
is in df-core.c?


The internals documentation doesn't cover df, and the source comments are not 
really helpful, e.g. the complete documentation of df_analyze is /* Analyze 
dataflow info.  */.  Not a single word about prerequisites (except that it must 
run after df_init and before df_finish and needs a correct cfg).


One example of a crash is that df->insns[uid] is being accessed and 
dereferenced where uid is a valid uid but df->insns[uid] is NULL.


df-core.c mentions "given instances of df".  How do I get one? The only 
instance I can find is the global struct df_f *df.  Does this mean one has to 
mess around with that global variable?





Re: How to update reg_dead notes

2015-02-24 Thread Georg-Johann Lay

Am 02/24/2015 um 02:11 PM schrieb Kenneth Zadeck:

On 02/24/2015 06:41 AM, Georg-Johann Lay wrote:

Hi, in order to fix PR64331 I tried to implement new target-specific passes
whose sole purpose is to recompute REG_DEAD notes.

The avr BE relies on correct dead notes which are used in
avr.c:reg_unused_after which uses dead_or_set_p.  avr BE needs correct dead
notes in ADJUST_INSN_LENGTH, get_attr_length, get_attr_min_length, etc.

After trying for more than one day I am really frustrated; each approach
ended up in seg_faults somewhere in df.

Following the source comments in df-core.c, recomputing dead notes should be
as easy as


  df_note_add_problem ();
  df_analyze ();

in the execute() method of the new pass.


As this (and many many other tries using df_scan_alloc, df_scan_blocks
df_finish_pass, df_insn_rescan_all, etc.) always crashes the compiler, I must
do something completely wrong...

Could you give me some advice on correct usage of df or even more preferred
point me to a comprehensible documentation of df which is more complete than
in df-core.c?


Hi, thanks for answering.


It is generally as easy as just adding the problem and calling df_analyze.
You tend to get into trouble if the rtl is not good, i.e. there is improper 
sharing or other violations of the canonical rtl rules.  DF does not like 
improperly shared rtl and it has not been uncommon for port specific passes to 
play loose with these rules.


Currently there are no port-specific passes other than the new ones which I am 
trying to add in order to keep REG_DEAD notes up to date.



I would suggest that first you try a build where you turn on all of the rtl 
checking and see if that comes out clean.

Kenny


Okay, here we go.  I configured avr-gcc from recent 4.9-branch with 
--enable-checking=all on x86-linux-gnu:


$ ../../gcc.gnu.org/gcc-4_9-branch/configure --target=avr 
--prefix=/local/gnu/install/gcc-4.9 --disable-nls --with-dwarf2 
--enable-target-optspace=yes --with-gnu-ld --with-gnu-as 
--enable-languages=c,c++ --enable-checking=all


The passes are added by means of the attached patch and installed right before 
passes which might use avr.c:reg_unused_after, i.e. passes that query for 
(minimum) insn lengths: .bbro, .compgotos, .shorten and .final.


The compiler crashes when it builds libgcc.  The test case is attached as 
libgcc2-addvsi3.c and compiled as


$ path-to-xgcc/xgcc -B path-to-xgcc -S libgcc2-addvsi3.c


The crash message (same with optimization) is for the pass which has been added 
just before .shorten:



libgcc2-addvsi3.c: In function '__addvhi3':
libgcc2-addvsi3.c:1514:1: internal compiler error: in df_refs_verify, at 
df-scan.c:4323

 }
 ^
0x82dfe8a df_refs_verify
../../../gcc.gnu.org/gcc-4_9-branch/gcc/df-scan.c:4323
0x82e7b34 df_insn_refs_verify
../../../gcc.gnu.org/gcc-4_9-branch/gcc/df-scan.c:4414
0x82e9906 df_bb_verify
../../../gcc.gnu.org/gcc-4_9-branch/gcc/df-scan.c:4441
0x82e9d8e df_scan_verify()
../../../gcc.gnu.org/gcc-4_9-branch/gcc/df-scan.c:4573
0x82d3787 df_verify()
../../../gcc.gnu.org/gcc-4_9-branch/gcc/df-core.c:1862
0x82d3809 df_analyze_1
../../../gcc.gnu.org/gcc-4_9-branch/gcc/df-core.c:1250
0x896188b avr_pass_recompute_notes::execute()
../../../gcc.gnu.org/gcc-4_9-branch/gcc/config/avr/avr.c:317
Please submit a full bug report,


Johann

Index: config/avr/avr.c
===
--- config/avr/avr.c	(revision 220738)
+++ config/avr/avr.c	(working copy)
@@ -51,6 +51,8 @@
 #include "target-def.h"
 #include "params.h"
 #include "df.h"
+#include "context.h"
+#include "tree-pass.h"	/* for current_pass */
 
 /* Maximal allowed offset for an address in the LD command */
 #define MAX_LD_OFFSET(MODE) (64 - (signed)GET_MODE_SIZE (MODE))
@@ -284,6 +286,48 @@ avr_to_int_mode (rtx x)
 : simplify_gen_subreg (int_mode_for_mode (mode), x, mode, 0);
 }
 
+
+static const pass_data avr_pass_data_recompute_notes =
+{
+  RTL_PASS, /* type */
+  "avr-notes1", /* name */
+  OPTGROUP_NONE, /* optinfo_flags */
+  false, /* has_gate */
+  true, /* has_execute */
+  TV_DF_SCAN, /* tv_id */
+  0, /* properties_required */
+  0, /* properties_provided */
+  0, /* properties_destroyed */
+  0, /* todo_flags_start */
+  0, /* todo_flags_finish */
+};
+
+
+class avr_pass_recompute_notes : public rtl_opt_pass
+{
+public:
+  avr_pass_recompute_notes (gcc::context *ctxt)
+: rtl_opt_pass (avr_pass_data_recompute_notes, ctxt)
+  {}
+
+  /* opt_pass methods */
+  unsigned int execute ()
+  {
+df_note_add_problem ();
+df_analyze ();
+
+return 0;
+  }
+}; // avr_pass_recompute_notes
+
+
+static rtl_opt_pass*
+avr_make_pass_recompute_notes (gcc::context *ctxt, const char *name)
+{
+  rtl_opt_pass *pass = new avr_pass_recompute_notes (ctxt);
+  pass->name = name;
+  return pass;
+}
 
 /* Implement

Re: How to update reg_dead notes

2015-02-24 Thread Georg-Johann Lay

Am 02/24/2015 um 06:06 PM schrieb Eric Botcazou:

Could you give me some advice on correct usage of df or even more preferred
point me to a comprehensible documentation of df which is more complete than
in df-core.c?


Take a look at the c6x and mep ports.



Thanks for the hint.  I changed the execute method to:

unsigned int execute ()
  {
compute_bb_for_insn ();
//df_clear_flags (DF_LR_RUN_DCE);
df_note_add_problem ();
df_analyze ();
df_finish_pass (false);

return 0;
  }

but it keeps aborting.  Actually I am just copy-pasting code from some other 
passes / BEs, but these places don't explain why they must use, may use, or 
must not use specific functions.


Compiling the mentioned test case from libgcc with -Os (so that fewer insns are 
left over) and patching df-scan.c:df_refs_verify to print the refs just before 
it does gcc_assert (0):


new_ref = { u-1(18){ }}
*old_rec = { u-1(18){ }u-1(19){ }u-1(24){ }u-1(25){ }}
libgcc2-addvsi3.c: In function '__addvhi3':
libgcc2-addvsi3.c:1514:1: internal compiler error: in df_refs_verify, at 
df-scan.c:4338


In df_insn_refs_verify which is in the call chain of df_refs_verify, insn reads:

(insn 21 19 22 5 (parallel [
(set (cc0)
(compare (reg/v:HI 18 r18 [orig:48 a ] [48])
(reg/v:HI 24 r24 [orig:46 w ] [46])))
(clobber (scratch:QI))
]) libgcc2-addvsi3.c:1511 408 {*cmphi}
 (expr_list:REG_DEAD (reg/v:HI 18 r18 [orig:48 a ] [48])
(nil)))

r18 and r24 are ordinary general-purpose hard registers.


The last pass which runs before the crash is .split5, i.e. 
recog.c::pass_split_for_shorten_branches, which executes split_all_insns_noflow, 
which in turn reads:


/* Same as split_all_insns, but do not expect CFG to be available.
   Used by machine dependent reorg passes.  */

unsigned int
split_all_insns_noflow (void)
{ ...

Does this mean the CFG is (or might be) messed up after this pass, and that this 
is why df crashes, because df needs a correct CFG?


Johann






Re: How to update reg_dead notes

2015-02-25 Thread Georg-Johann Lay

Am 02/24/2015 um 07:33 PM schrieb Kenneth Zadeck:

When I suggested that you do a build with all of the checking turned on, I 
wanted you to do this without your new code in it.  There is a good possibility 
that the problem is that your port is generating bad rtl.


Ah, ok.  This actually revealed an ICE with checking enabled; but that was not 
related to the df problems...


I think the problem was that I tried to run passes needing df after free_cfg.

My current solution works so far: it fixes the test case (correct code + insn 
lengths) and has not ICEd so far.


Things are running a bit slowly with all checks on, so building and running the 
tests will take a while...


FYI, you find my current patch attached.

Thanks for your help and time,

Johann



Index: config/avr/avr.c
===
--- config/avr/avr.c(revision 220963)
+++ config/avr/avr.c(working copy)
@@ -51,6 +51,8 @@
 #include "target-def.h"
 #include "params.h"
 #include "df.h"
+#include "context.h"
+#include "tree-pass.h"

 /* Maximal allowed offset for an address in the LD command */
 #define MAX_LD_OFFSET(MODE) (64 - (signed)GET_MODE_SIZE (MODE))
@@ -285,6 +287,58 @@ avr_to_int_mode (rtx x)
 }


+static const pass_data avr_pass_data_recompute_notes =
+{
+  RTL_PASS, /* type */
+  "avr-xx", /* name (will be patched) */
+  OPTGROUP_NONE, /* optinfo_flags */
+  false, /* has_gate */
+  true, /* has_execute */
+  TV_DF_SCAN, /* tv_id */
+  0, /* properties_required */
+  0, /* properties_provided */
+  0, /* properties_destroyed */
+  0, /* todo_flags_start */
+  (TODO_df_finish | TODO_verify_rtl_sharing
+   | TODO_verify_flow ), /* todo_flags_finish */
+};
+
+
+class avr_pass_recompute_notes : public rtl_opt_pass
+{
+public:
+  avr_pass_recompute_notes (gcc::context *ctxt, const char *pass_name)
+: rtl_opt_pass (avr_pass_data_recompute_notes, ctxt)
+  {
+name = pass_name;
+  }
+
+  unsigned int execute ()
+  {
+df_note_add_problem ();
+df_analyze ();
+
+return 0;
+  }
+}; // avr_pass_recompute_notes
+
+
+static void
+avr_register_passes (void)
+{
+  /* This avr-specific pass (re)computes insn notes, in particular REG_DEAD
+ notes which are used by `avr.c::reg_unused_after' and branch offset
+ computations.  These notes must be correct, i.e. there must be no
+ dangling REG_DEAD notes; otherwise wrong code might result, cf. PR64331.
+
+ DF needs (correct) CFG, hence right before free_cfg is the last
+ opportunity to rectify notes.  */
+
+  register_pass (new avr_pass_recompute_notes (g, "avr-notes-free-cfg"),
+ PASS_POS_INSERT_BEFORE, "*free_cfg", 1);
+}
+
+
 /* Implement `TARGET_OPTION_OVERRIDE'.  */

 static void
@@ -346,6 +400,11 @@ avr_option_override (void)
   init_machine_status = avr_init_machine_status;

   avr_log_set_avr_log();
+
+  /* Register some avr-specific pass(es).  There is no canonical place for
+ pass registration.  This function is convenient.  */
+
+  avr_register_passes ();
 }

 /* Function to set up the backend function structure.  */



Re: GCC 5 Status Report (2015-03-20)

2015-03-22 Thread Georg-Johann Lay

Joel Sherrill a écrit:

I thought I would pass along a couple of data points from
the *-rtems targets.

Fourteen *-rtems target build OK on the head. The following
do not even complete building gcc+newlib.

v850 - PR65501. New and must be relatively recent. I built
  a C/C++ toolset on January 15.
m32c - PR63683. Reported against 4.9 branch. DJ may have a
  patch for this.

The avr is still broken. I don't know that it is worth the trouble
to even dig up the PRs anymore.


Like some other tier-3 platforms, avr is plagued by spill failures, 
summarized in the meta bug PR56183.


Apart from that, the device selection scheme changed a while ago: 
setting -mmcu= triggers a target-specific spec file to be applied. 
The initial version didn't factor out the libc and focused solely on 
avr-libc.  I tried to rectify that in PR65296, i.e. the spec file generator 
is now sensitive to the avr-*-rtems configuration.  What does not yet work 
is spaces in the installation path.


Some of the avr core architectures are not well suited for "big" systems 
like rtems + newlib.  There are even devices without RAM, so for some of the 
core architectures it makes hardly any sense to build newlib variants for 
them, in particular "avr1", and maybe even more so "avrtiny".


If parts of the device spec files are not generated as needed by rtems, 
it might be the case that the avr backend needs more adjustments, in 
particular specs.h, rtems.h, gen-avr-mmcu-specs.c, maybe also t-multilib 
 and its generator genmultilib.awk.


Johann



Re: PR63633: May middle-end come up width hard regs for insn expanders?

2015-04-17 Thread Georg-Johann Lay
I took the liberty of CCing Vladimir; maybe he can propose how the backend can 
describe an efficient, constraint-based solution.  The problem is about expanders 
producing insns with non-fixed hard regs as in/out operands or clobbers.  This 
includes move insns from non-generic address spaces, which require dedicated hard 
regs.  The issue is about correctness and efficiency of the generated code.



Am 10/24/2014 um 08:29 PM schrieb Jakub Jelinek:

On Fri, Oct 24, 2014 at 08:19:57PM +0200, Georg-Johann Lay wrote:

Yes, that's the straight forward approach which works so far.  Bit tedious,
but well...

In one case expmed generated different code, though: divmodhi instead of
mulhi_highpart for HI division by const_int.  Cheating with costs did not
help.  However for now I am mostly after correct, ICE-less code.

What I am concerned about is:

1) May it happen that a value lives in a hard-reg across the expander? The
expander has no means to detect that situation and interfere, e.g.

hard-reg = source_value // middle-end
expand-code // back-end
sink_value = hard-reg   // middle-end

where "expand-code" is one defind_expand / define_insn that clobbers /
changes (parts of) hard-reg but does *not* get hard-reg as operand. This is
wrong code obviously.


It can happen, but if it happens, that would mean a user code bug, like using
register asm with a register that is unsuitable for use as a global or local
register var on the target, or it could be a backend bug (expansion of some
pattern clobbering a register that has other fixed uses).
You shouldn't ICE on it, but what happens is undefined.

Before RA, the use of hard regs should be limited (pretty much just fixed
regs where really necessary, global and local register variables (user needs
to use with care), function arguments and return values (short lived around
the call patterns).

Jakub


FYI, the problem with using hard regs returned, now as PR65657 with move insns 
which use hard regs.


There is one simple and obvious solution: don't use hard regs, but instead 
introduce respective constraints and let the register allocator do the job of 
allocating the hard regs.


I tried that and it works in principle (for non-moves), but the code generated 
by the register allocator is *bloated* beyond all recognition.


The use of hard regs in the avr BE is motivated by avoiding standard ABI calls 
for support functions.  Many of these libgcc functions are written in assembly 
and have a much smaller footprint than ordinary ABI functions.  The move insns 
mentioned above perform loading from a non-standard address space which is too 
complicated to be expanded inline -- and even if expanded inline, the code will 
need specific hard registers in specific operands because the instruction set 
dictates that.


Bottom line is:  Using that simple and obvious hard-regs-by-constraint approach 
would lead to code that is not acceptable w.r.t its performance.


The only solution that might work without bloating the code like mad might be to 
perform the register selection by hand in a dedicated, pre-reload, 
target-specific pass as outlined in


https://gcc.gnu.org/ml/gcc-patches/2015-04/msg00823.html


Johann


Index: gcc-4_9-branch/gcc/config/avr/avr.md
===
--- gcc-4_9-branch/gcc/config/avr/avr.md	(revision 221321)
+++ gcc-4_9-branch/gcc/config/avr/avr.md	(working copy)
@@ -165,6 +165,13 @@ (define_attr "isa"
standard"
   (const_string "standard"))
 
+
+(define_attr "fixregs"
+  "xload_A, xloadQI_A,
+   no"
+  (const_string "no"))
+
+
 (define_attr "enabled" ""
   (cond [(eq_attr "isa" "standard")
  (const_int 1)
@@ -494,9 +501,9 @@ (define_insn "load__libgcc"
 ;; "xload8qi_A"
 ;; "xload8qq_A" "xload8uqq_A"
 (define_insn_and_split "xload8_A"
-  [(set (match_operand:ALL1 0 "register_operand" "=r")
-(match_operand:ALL1 1 "memory_operand""m"))
-   (clobber (reg:HI REG_Z))]
+  [(set (match_operand:ALL1 0 "register_operand"  "=r")
+(match_operand:ALL1 1 "memory_operand" "m"))
+   (clobber (match_operand:HI 2 "scratch_operand" "=z"))] ;; HI 30
   "can_create_pseudo_p()
&& !avr_xload_libgcc_p (mode)
&& avr_mem_memx_p (operands[1])
@@ -505,6 +512,8 @@ (define_insn_and_split "xload8_A"
   "&& 1"
   [(clobber (const_int 0))]
   {
+gcc_assert (SCRATCH != GET_CODE (operands[2]));
+
 /* ; Split away the high part of the address.  GCC's register allocator
; in not able to allocate segment registers and reload the resulting
; expressions.  Notice that no address register can hold a PSImode.  */
@@ -520,7 +529,9 @@ (define_insn_and_

How to get anchors to onlinedocs that can be used in external documents?

2015-04-20 Thread Georg-Johann Lay

How to add an anchor to one of the onlinedocs texi documents?

Suppose I'd like to add an HTML anchor to one of the onlinedocs, for example to 
link it from the gcc release notes.


Currently the only anchors are either auto-generated and contain some hashes 
(hence not usable from external documents) or are trivial like:



x86-Windows-Options.html#x86-Windows-Options
Submodel-Options.html#Submodel-Options
x86-transactional-memory-intrinsics.html#x86-transactional-memory-intrinsics
...

and are always pointing to the top of the respective HTML page.

How can I get an anchor to, say, a subsection without fragmenting the 
respective HTML document into dozens of individual documents and thereby 
changing the document layout (which is not what I want)?


Examples as listed in the GCC onlindocs "Table of Contents":

- 3.17.1.1 -march and -mcpu Feature Modifiers
- 3.17.5.1 EIND and Devices with More Than 128 Ki Bytes of Flash
- 6.16.4 SPU Named Address Spaces
- 6.43.2.8 x86 Floating-Point asm Operands

None of these entries in the TOC link to the indicated place, only to the
respective document's header, e.g. "x86 Floating-Point asm Operands" does not 
forward to that section but to "Extended-Asm.html#Extended-Asm", which contains 
9 subsections.
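
For what it's worth, texinfo's @anchor might give a stable target without 
splitting the page.  This is only a sketch -- the anchor name below is made up, 
and it assumes makeinfo keeps the name as the id in its HTML output:

@subsubsection x86 Floating-Point asm Operands
@anchor{x86-fp-asm-operands}

which should then be reachable from external documents as
Extended-Asm.html#x86-fp-asm-operands.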



Johann


Re: PR63633: May middle-end come up width hard regs for insn expanders?

2015-04-22 Thread Georg-Johann Lay

Am 04/20/2015 um 10:11 PM schrieb Vladimir Makarov:

On 17/04/15 05:58 AM, Georg-Johann Lay wrote:

I allowed me to CC Vladimir; maybe he can propose how the backend can
describe an efficient, constraint-based solution.  The problem is about
expanders producing insns with non-fixed hard-regs as in/out operands or
clobbers.  This includes move insn from non-generic address spaces which
require dedicated hard regs. Issue is about correctness and efficiency of
generated code.


I might be wrong, but I think you have bloated code because you use
scratches.  I have already said several times that usage of scratches is always a
bad idea.  It was a bad idea for the old RA and is still a bad idea for IRA.  The
usage of scratches should be prohibited; probably we should write that down
somewhere.  It is better to use just a regular pseudo instead.

Why is it a bad idea?  Because IRA (or the old global RA) does not take them
into account *at all*.  It means that IRA might think that there are enough
registers for pseudos but in reality that is wrong because of scratches in the
live ranges of the pseudos.


Ok, thanks for that information!

The avr backend actually uses such clobbers only where one is made available by 
peep2.  But there are insns that always need specific registers.
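
(By that I mean the usual match_scratch idiom in a define_peephole2, roughly 
like the sketch below; the matched insn and the "d" class are only for 
illustration, not a particular avr pattern:)

(define_peephole2
  [(match_scratch:QI 2 "d")
   (set (match_operand:QI 0 "register_operand" "")
        (match_operand:QI 1 "const_int_operand" ""))]
  ""
  [(parallel [(set (match_dup 0)
                   (match_dup 1))
              (clobber (match_dup 2))])])

So the clobber of a scratch register only appears when peep2 actually finds a 
free register.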



If that is not the case, I should investigate why you have bloated code, and a
small test would help here.

Thanks,  I hope my comments will be useful.


Attached is a C test program which produces fine results with

$ avr-gcc -S -O2 -mmcu=atmega8

Also attached is a respective patch against the trunk avr backend that 
indicates the transition from clobbers to hard-regs-by-constraint.


I don't actually remember when I first tried this; sometime around when 4.8 
was in stage 1 or so.


If my recollection is right, the problem was not that small test programs with 
mulsi3 produced large code, but that "ordinary" code could get much worse.  I 
had the impression it was because of the bunch of new, rarely used / rarely 
useful register classes, and that IRA's cost computation got confused, i.e. much 
less accurate than with the usual register classes (only 10 classes of GENERAL_REGS).


The attached patch adds 27 new register classes, and to transform all insns 
even more classes might be needed: 8-bit, 16-bit and 24-bit multiplications 
including sign/zero extension of operands, fixed-point functions from 8...32 
bit, divmod, builtins implementations, support functions for address spaces, ...


The insns which are using this all have the following properties in common:

- Only 1 constraint alternative

- Register allocation is uniquely determined, i.e. the register allocator has no 
choice which register to pick for which operand (except for commutative 
constraints with '%', which give exactly 2 solutions).


The patch avoids clobbers and scratches altogether.  The only insns where a 
register other than the output is affected are transformed from single_set to 
parallels in split1.  The 2nd set describes setting (reg:HI 26) to a useless 
value.  The insns are not expanded as parallels, because insn combine won't use 
parallels for combinations.


Is there a chance that register allocation gets worse just because so many 
register classes are added?


Johann

typedef __UINT8_TYPE__ u8;
typedef __INT8_TYPE__ s8;

typedef __UINT16_TYPE__ u16;
typedef __INT16_TYPE__ s16;

typedef __INT32_TYPE__ s32;

s32 muls16_const (s16 a)
{
return (s32) a * 12345;
}

s32 mulu16_const (u16 a)
{
return (s32) a * 12345;
}

s32 mul_s16_s16 (s16 a, s16 b)
{
return (s32) a * b;
}

s32 mul_s16_u16 (s16 a, u16 b)
{
return (s32) a * b;
}

s32 mul32 (s32 a, s32 b, s32 c)
{
return a * b * c * 257;
}

s32 mul32_uconst (s32 a)
{
return a * 12345;
}

s32 mul32_oconst (s32 a)
{
return a * -6;
}

s32 mul32_s8 (s32 a, s8 b)
{
return a * b;
}

s32 mul32_u8 (s32 a, u8 b)
{
return a * b;
}

s32 mul32_u8_const (u8 b)
{
return -1024L * b;
}

s32 mul32_s8_const (s8 b)
{
return 12345L * b;
}


u16 udiv10 (u16 a)
{
return a / 10;
}
Index: avr-protos.h
===
--- avr-protos.h	(revision 222143)
+++ avr-protos.h	(working copy)
@@ -19,7 +19,7 @@
along with GCC; see the file COPYING3.  If not see
<http://www.gnu.org/licenses/>.  */
 
-
+extern bool avr_split1_completed (void);
 extern int avr_function_arg_regno_p (int r);
 extern void avr_cpu_cpp_builtins (struct cpp_reader * pfile);
 extern enum reg_class avr_regno_reg_class (int r);
Index: constraints.md
===
--- constraints.md	(revision 222143)
+++ constraints.md	(working copy)
@@ -19,6 +19,40 @@
 
 ;; Register constraints
 
+(define_register_constraint "R16_1" "REGS_R16" "")
+(define_register_constraint "R16_2" "REGS_R16_R17" "")
+(define_register_constraint "R16_3" "REGS_R

Re: avr-gcc generating really dumb code

2015-04-30 Thread Georg-Johann Lay

Ralph Doncaster schrieb:

I wrote a small function to convert u8 to hex:
// converts 4-bit nibble to ascii hex
uint8_t nibbletohex(uint8_t value)
{
if ( value > 9 ) value += 'A' - '0';
return value + '0';
}

// returns value as 2 ascii characters in a 16-bit int
uint16_t u8tohex(uint8_t value)
{
uint16_t hexdigits;

uint8_t hidigit = (value >> 4);
hexdigits = (nibbletohex(hidigit) << 8);

uint8_t lodigit = (value & 0x0F);
hexdigits |= nibbletohex(lodigit);

return hexdigits;
}

I compiled it with avr-gcc -Os using 4.8 and 5.1 and got the same code:
007a :
  7a:   28 2f   mov  r18, r24
  7c:   22 95   swap r18
  7e:   2f 70   andi r18, 0x0F   ; 15
  80:   2a 30   cpi  r18, 0x0A   ; 10
  82:   08 f0   brcs .+2 ; 0x86 
  84:   2f 5e   subi r18, 0xEF   ; 239
  86:   20 5d   subi r18, 0xD0   ; 208
  88:   30 e0   ldi  r19, 0x00   ; 0
  8a:   32 2f   mov  r19, r18
  8c:   22 27   eor  r18, r18
  8e:   8f 70   andi r24, 0x0F   ; 15
  90:   8a 30   cpi  r24, 0x0A   ; 10
  92:   08 f0   brcs .+2 ; 0x96 
  94:   8f 5e   subi r24, 0xEF   ; 239
  96:   80 5d   subi r24, 0xD0   ; 208
  98:   a9 01   movw r20, r18
  9a:   48 2b   or   r20, r24
  9c:   ca 01   movw r24, r20
  9e:   08 95   ret

There's some completely pointless code there, like loading 0 to r19
and immediately overwriting it with the contents of r18 (88, 8a).
Other register use is convoluted.  The compiler should at least be
able to generate the following code (5 fewer instructions):
28 2f   mov  r18, r24
22 95   swap r18
2f 70   andi r18, 0x0F   ; 15
2a 30   cpi  r18, 0x0A   ; 10
08 f0   brcs .+2 ; 0x86 
2f 5e   subi r18, 0xEF   ; 239
20 5d   subi r18, 0xD0   ; 208
32 2f   mov  r25, r18
8f 70   andi r24, 0x0F   ; 15
8a 30   cpi  r24, 0x0A   ; 10
08 f0   brcs .+2 ; 0x96 
8f 5e   subi r24, 0xEF   ; 239
80 5d   subi r24, 0xD0   ; 208
08 95   ret


Looks like PR41076


Missing barrier in outof_cfglayout

2015-05-11 Thread Georg-Johann Lay
When pass outof_cfglayout is adding barriers, it appears that it misses some 
situations and then runs into "ICE: missing barrier" in the remainder (or, with 
checking disabled, into some other assertion).



cfgrtl.c:fixup_reorder_chain() reads:


  /* Now add jumps and labels as needed to match the blocks new
 outgoing edges.  */

  for (bb = ENTRY_BLOCK_PTR_FOR_FN (cfun)->next_bb; bb ; bb = (basic_block)
   bb->aux)
{
  ...
  bb_end_insn = BB_END (bb);

  if (JUMP_P (bb_end_insn))
{
  ret_label = JUMP_LABEL (bb_end_insn);

  if (any_condjump_p (bb_end_insn))
{ ... }
  else if (extract_asm_operands (PATTERN (bb_end_insn)) != NULL)
{ ... }
  else
{
  /* Otherwise we have some return, switch or computed
 jump.  In the 99% case, there should not have been a
 fallthru edge.  */
  gcc_assert (returnjump_p (bb_end_insn) || !e_fall);
  continue;
}


The last else is entered for an UNconditional jump with e_fall = 0 so that 
after the unconditional jump_insn no barrier is emitted.


Now I am unsure about the right condition when to add the missing barrier; the 
following change works, but presumably the condition is only 99% correct ;-)



@@ -3822,6 +3842,11 @@ fixup_reorder_chain (void)
 jump.  In the 99% case, there should not have been a
 fallthru edge.  */
  gcc_assert (returnjump_p (bb_end_insn) || !e_fall);
+ if (!returnjump_p (bb_end_insn)
+ && e_taken)
+   {
+ emit_barrier_after (bb_end_insn);
+   }
  continue;
}
}

The respective BB has just that unconditional jump_insn; it is generated by 
CSE1 as it optimized a switch statement and was originally some conditional 
jump (-fno-jump-tables) or computed jump (with -fjump-tables):



(note 172 109 180 19 [bb 19] NOTE_INSN_BASIC_BLOCK)
(jump_insn 180 172 117 19 (set (pc)
(label_ref 133)) bug.c:33 -1
 (nil)
 -> 133)

basic block 19, loop depth 0, count 0, freq 4623, maybe hot
Invalid sum of incoming frequencies 5156, should be 4623
 prev block 18, next block 20, flags: (REACHABLE, RTL, MODIFIED)
 pred:   18 [100.0%]  (FALLTHRU)
 succ:   23 [100.0%]


As I am not familiar with the assertions of outof_cfglayout, some help would be 
great.  Is that the right place to fix the problem at all?



Johann


Re: Missing barrier in outof_cfglayout

2015-05-11 Thread Georg-Johann Lay

Am 05/11/2015 um 06:58 PM schrieb Jeff Law:

Wow, that was fast!


On 05/11/2015 10:19 AM, Georg-Johann Lay wrote:

When pass outof_cfglayout is adding barriers, it appears that it misses
some situations and then runs into "ICE: missing barrier" in the
remainder (or, with checking disabled, into some other assertion).


[ ... ]


The last else is entered for an UNconditional jump with e_fall = 0 so
that after the unconditional jump_insn no barrier is emitted.

Now I am unsure about the right condition when to add the missing
barrier; the following change works, but presumably the condition is
only 99% correct ;-)

But why is there a JUMP_INSN here to start with?   Prior to cfglayout I believe
simple unconditional jumps shouldn't exist in the INSN chain. Instead they are
inserted as we leave cfglayout based on the state of the CFG.


The respective BB has just that unconditional jump_insn; it is generated
by CSE1 as it optimized a switch statement and was originally some
conditional jump (-fno-jump-tables) or computed jump (with -fjump-tables):

This is where I'd focus my efforts.  Instead of modifying the JUMP_INSN
directly, we're supposed to be going through the various CFG routines.

Most likely it's this code:
  /* If this SET is now setting PC to a label, we know it used to
  be a conditional or computed branch.  */
   else if (dest == pc_rtx && GET_CODE (src) == LABEL_REF
&& !LABEL_REF_NONLOCAL_P (src))
...

  new_rtx = emit_jump_insn_before (gen_jump (XEXP (src, 0)), insn);
   JUMP_LABEL (new_rtx) = XEXP (src, 0);
   LABEL_NUSES (XEXP (src, 0))++;


Yet, it actually is!

Before cse1 (dfinit):

;; basic block 19, loop depth 0, count 0, freq 4623, maybe hot
;;  prev block 18, next block 20, flags: (RTL, MODIFIED)
;;  pred:   18 [89.7%]  (FALLTHRU)
;; bb 19 artificial_defs: { }
;; bb 19 artificial_uses: { u-1(26){ }u-1(30){ }u-1(32){ }}
(note 172 110 111 19 [bb 19] NOTE_INSN_BASIC_BLOCK)
(jump_insn 111 172 173 19 (set (pc)
(if_then_else (ltu (reg:SI 51 [ D.1451 ])
(const_int 1 [0x1]))
(label_ref 133)
(pc))) bug.c:33 140 {*bltu}
 (int_list:REG_BR_PROB 1394 (nil))
 -> 133)
;;  succ:   25 [13.9%]
;;  20 [86.1%]  (FALLTHRU)

And in cse1, following jumps:

deferring deletion of insn with uid = 111.
Purged edges from bb 19

...

;; basic block 19, loop depth 0, count 0, freq 4623, maybe hot
;; Invalid sum of incoming frequencies 5156, should be 4623
;;  prev block 18, next block 20, flags: (REACHABLE, RTL, MODIFIED)
;;  pred:   18 [100.0%]  (FALLTHRU)
(note 172 109 180 19 [bb 19] NOTE_INSN_BASIC_BLOCK)
(jump_insn 180 172 117 19 (set (pc)
(label_ref 133)) bug.c:33 -1
 (nil)
 -> 133)
;;  succ:   23 [100.0%]



And it almost certainly should be calling into the cfgrtl.c routines instead.


You mean something like try_redirect_by_replacing_jump?


BTW, what's the policy about unconditional jumps at that time?  There are 
plenty of unconditional jumps around and all are legitimate; just this one 
generated by cse1 is wrong?



Johann




Re: Missing barrier in outof_cfglayout

2015-05-12 Thread Georg-Johann Lay

Am 05/11/2015 um 10:43 PM schrieb Steven Bosscher:

On Mon, May 11, 2015 at 7:37 PM, Georg-Johann Lay wrote:

BTW, what's the policy about unconditional jumps at that time?  There are
plenty of unconditional jumps around and all are legitimate; just this one
generated by cse1 is wrong?


If you're in cfglayout mode, then there should be no unconditional
jumps to labels. The only JUMP_INSNs should be conditional,
computed, or return jumps.

If you do have unconditional jumps at this stage, something is seriously broken.


Ah, yes.  The ICE is actually in verify_flow_info: "wrong number of branch 
edges after unconditional jump in bb 4".


It starts with an almost trivial jump table:


(jump_insn 82 81 83 19 (parallel [
(set (pc)
(reg/f:SI 60))
(use (label_ref 83))
])
 (nil)
 -> 83)
;;  succ:   20 [50.0%]
;;  31 [50.0%]

;; Insn is not within a basic block
(code_label 83 82 84 15 "" [3 uses])
;; Insn is not within a basic block
(jump_table_data 84 83 85 (addr_vec:SI [
(label_ref:SI 130)
(label_ref:SI 86)
(label_ref:SI 130)
(label_ref:SI 130)
]))

At .cse1, cse.c:cse_insn executes

  /* We don't normally have an insn matching (set (pc) (pc)), so
 check for this separately here.  We will delete such an
 insn below.

 For other cases such as a table jump or conditional jump
 where we know the ultimate target, go ahead and replace the
 operand.  While that may not make a valid insn, we will
 reemit the jump below (and also insert any necessary
 barriers).  */
  if (n_sets == 1 && dest == pc_rtx
  && (trial == pc_rtx
  || (GET_CODE (trial) == LABEL_REF
          && ! condjump_p (insn))))
{
  /* Don't substitute non-local labels, this confuses CFG.  */
  if (GET_CODE (trial) == LABEL_REF
  && LABEL_REF_NONLOCAL_P (trial))
continue;

  SET_SRC (sets[i].rtl) = trial;
  cse_jumps_altered = true;
  break;
}

on the jump insn and with trial = (label_ref 83).

The code pointed out by Jeff then replaces the original jump insn by

(jump_insn 139 81 82 16 (set (pc)
(label_ref 83)) bug.c:33 -1
 (nil))

That's obviously nonsense as it jumps to the jump_table_data!


Johann



Re: Missing barrier in outof_cfglayout

2015-05-13 Thread Georg-Johann Lay

Am 05/12/2015 um 05:13 PM schrieb Jeff Law:

On 05/12/2015 08:58 AM, Georg-Johann Lay wrote:

Ah, yes.  The ICE is actually in verify_flow_info: "wrong number of
branch edges after unconditional jump in bb 4".

It starts with an almost trivial jump table:


(jump_insn 82 81 83 19 (parallel [
 (set (pc)
 (reg/f:SI 60))
 (use (label_ref 83))
 ])
  (nil)
  -> 83)
;;  succ:   20 [50.0%]
;;  31 [50.0%]

;; Insn is not within a basic block
(code_label 83 82 84 15 "" [3 uses])
;; Insn is not within a basic block
(jump_table_data 84 83 85 (addr_vec:SI [
 (label_ref:SI 130)
 (label_ref:SI 86)
 (label_ref:SI 130)
 (label_ref:SI 130)
 ]))

At .cse1, cse.c:cse_insn executes

   /* We don't normally have an insn matching (set (pc) (pc)), so
  check for this separately here.  We will delete such an
  insn below.

  For other cases such as a table jump or conditional jump
  where we know the ultimate target, go ahead and replace the
  operand.  While that may not make a valid insn, we will
  reemit the jump below (and also insert any necessary
  barriers).  */
   if (n_sets == 1 && dest == pc_rtx
   && (trial == pc_rtx
   || (GET_CODE (trial) == LABEL_REF
           && ! condjump_p (insn))))
 {
   /* Don't substitute non-local labels, this confuses CFG.  */
   if (GET_CODE (trial) == LABEL_REF
   && LABEL_REF_NONLOCAL_P (trial))
 continue;

   SET_SRC (sets[i].rtl) = trial;
   cse_jumps_altered = true;
   break;
 }

on the jump insn and with trial = (label_ref 83).

The code pointed out by Jeff then replaces the original jump insn by

(jump_insn 139 81 82 16 (set (pc)
 (label_ref 83)) bug.c:33 -1
  (nil))

That's obviously nonsense as it jumps to the jump_table_data!

It's likely nonsensical.  You'd have to figure out why TRIAL has that seemingly



hmmm, it  would actually lead to valid machine code:  The jump table is 
actually a dispatch table, i.e. it contains a list of direct jumps to the case 
labels.  casesi loads the value of the label associated with the jump table 
into a register, adds an offset, and then jumps to that address (which in turn 
dispatches to the very cases via a direct jump).


In this specific case cse finds out that the offset (reg 59 below) is always 0, 
i.e. the offset computation will always yield the value of the label associated 
with the jump table (code_label 84), so the insn then jumps directly to that label.


When expanded, casesi will yield something like

(note 138 77 78 19 [bb 19] NOTE_INSN_BASIC_BLOCK)

(insn 78 138 79 19 (set (reg:SI 61)
        (high:SI (label_ref:SI 84))))

(insn 79 78 80 19 (set (reg:SI 60)
        (lo_sum:SI (reg:SI 61)
                   (label_ref:SI 84))))

(insn 80 79 81 19 (set (reg:SI 59)
        (mult:SI (reg:SI 59)
                 (const_int 4))))

(insn 81 80 82 19 (set (reg:SI 60)
        (plus:SI (reg:SI 60)
                 (reg:SI 59))))

(jump_insn 82 81 83 19 (parallel [(set (pc)
   (reg:SI 60))
  (use (label_ref 84))])
 -> 84)
;;  succ:   20 [50.0%]
;;  31 [50.0%]

(barrier 83 82 84)

;; Insn is not within a basic block
(code_label 84 83 85 15 "" [3 uses])

;; Insn is not within a basic block
(jump_table_data 85 84 86 (addr_vec:SI [
(label_ref:SI 132)
(label_ref:SI 87)
(label_ref:SI 132)
(label_ref:SI 132)
]))
(barrier 86 85 87)



bogus value.  I would have expected it to have one of the entries from the
ADDR_VEC.


Other backends also use dispatch tables and jump into the table, and the 
internal docs for casesi read: "Instruction to jump through a dispatch 
table"...


Wrapping the tablejump's register (reg 60 above) into an UNSPEC fixes the problem, 
but only because there is no insn to load such an unspec into a register, and 
hence cse does not perform the respective optimization.



But even with the right value in TRIAL, we shouldn't be slamming on the SET_SRC
of the jump, but instead should be using the routines from cfgrtl.c to
manipulate the CFG and RTL datastructures properly.

jeff


I am seeing the problem with 4.9, but the current trunk still contains the same 
sequence...


Given that the routines from cfgrtl.c were used:  would it then be legal to 
jump to a label which is not contained in any basic block?


Johann



Code bloat due to silly IRA cost model?

2019-10-25 Thread Georg-Johann Lay

Hi,

I am trying to track down a code bloat issue and am stuck because I do 
not understand IRA's cost model.


The test case is as simple as it gets:

float func (float);

float call (float f)
{
return func (f);
}

IRA dump shows the following insns:


(insn 14 4 2 2 (set (reg:SF 44)
(reg:SF 22 r22 [ f ])) "bloat.c":4:1 85 {*movsf}
 (expr_list:REG_DEAD (reg:SF 22 r22 [ f ])
(nil)))
(insn 2 14 3 2 (set (reg/v:SF 43 [ f ])
(reg:SF 44)) "bloat.c":4:1 85 {*movsf}
 (expr_list:REG_DEAD (reg:SF 44)
(nil)))
(note 3 2 6 2 NOTE_INSN_FUNCTION_BEG)
(insn 6 3 7 2 (set (reg:SF 22 r22)
(reg/v:SF 43 [ f ])) "bloat.c":5:12 85 {*movsf}
 (expr_list:REG_DEAD (reg/v:SF 43 [ f ])
(nil)))
(call_insn/j 7 6 8 2 (parallel [

#14 sets pseudo 44 from arg register R22.
#2 moves it to pseudo 43
#6 moves it to R22 as it prepares for call_insn #7.

There are 2 allocnos and cost:

Pass 0 for finding pseudo/allocno costs

a1 (r44,l0) best NO_REGS, allocno NO_REGS
a0 (r43,l0) best NO_REGS, allocno NO_REGS

  a0(r43,l0) costs: ADDW_REGS:32000 SIMPLE_LD_REGS:32000 LD_REGS:32000 
NO_LD_REGS:32000 GENERAL_REGS:32000 MEM:9000
  a1(r44,l0) costs: ADDW_REGS:32000 SIMPLE_LD_REGS:32000 LD_REGS:32000 
NO_LD_REGS:32000 GENERAL_REGS:32000 MEM:9000


which is quite odd because MEM comes out way cheaper here than any REG.

Okay, so let's boost the MEM cost (TARGET_MEMORY_MOVE_COST) by a factor 
of 100:


a1 (r44,l0) best NO_REGS, allocno NO_REGS
a0 (r43,l0) best NO_REGS, allocno NO_REGS

  a0(r43,l0) costs: ADDW_REGS:320 SIMPLE_LD_REGS:320 
LD_REGS:320 NO_LD_REGS:320 GENERAL_REGS:320 MEM:801000
  a1(r44,l0) costs: ADDW_REGS:320 SIMPLE_LD_REGS:320 
LD_REGS:320 NO_LD_REGS:320 GENERAL_REGS:320 MEM:801000


What??? The REG costs are 100 times higher, and still higher than the 
MEM costs.  What the heck is going on?


Setting TARGET_REGISTER_MOVE_COST and also TARGET_MEMORY_MOVE_COST to 0 
yields:


  a0(r43,l0) costs: ADDW_REGS:0 SIMPLE_LD_REGS:0 LD_REGS:0 NO_LD_REGS:0 
GENERAL_REGS:0 MEM:0
  a1(r44,l0) costs: ADDW_REGS:0 SIMPLE_LD_REGS:0 LD_REGS:0 NO_LD_REGS:0 
GENERAL_REGS:0 MEM:0


as expected, i.e. there is no other hidden source of costs considered by 
IRA.  And even TARGET_REGISTER_MOVE_COST = 0  and 
TARGET_MEMORY_MOVE_COST = original gives:


  a0(r43,l0) costs: ADDW_REGS:32000 SIMPLE_LD_REGS:32000 LD_REGS:32000 
NO_LD_REGS:32000 GENERAL_REGS:32000 MEM:9000
  a1(r44,l0) costs: ADDW_REGS:32000 SIMPLE_LD_REGS:32000 LD_REGS:32000 
NO_LD_REGS:32000 GENERAL_REGS:32000 MEM:9000


How the heck do I tell ira-costs that registers are way cheaper than MEM?
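
(For reference, the experiments above were wired up along these lines.  This is 
only a sketch with placeholder cost values, not the actual avr hook 
implementations:)

/* Sketch only: placeholder cost values for the experiments above.  */

static int
experiment_register_move_cost (machine_mode mode ATTRIBUTE_UNUSED,
                               reg_class_t from ATTRIBUTE_UNUSED,
                               reg_class_t to ATTRIBUTE_UNUSED)
{
  return 2;
}

static int
experiment_memory_move_cost (machine_mode mode ATTRIBUTE_UNUSED,
                             reg_class_t rclass ATTRIBUTE_UNUSED,
                             bool in ATTRIBUTE_UNUSED)
{
  return 100 * 5;  /* the "factor 100" experiment */
}

#undef  TARGET_REGISTER_MOVE_COST
#define TARGET_REGISTER_MOVE_COST experiment_register_move_cost
#undef  TARGET_MEMORY_MOVE_COST
#define TARGET_MEMORY_MOVE_COST   experiment_memory_move_cost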

Johann


p.s.

test case compiled with

$ avr-gcc bloat.c -S -Os -dp -da -fsplit-wide-types-early -v

Target: avr
Configured with: ../../gcc.gnu.org/trunk/configure --target=avr 
--prefix=/local/gnu/install/gcc-10 --disable-shared --disable-nls 
--with-dwarf2 --enable-target-optspace=yes --with-gnu-as --with-gnu-ld 
--enable-checking=release --enable-languages=c,c++ --disable-gcov

Thread model: single
Supported LTO compression algorithms: zlib
gcc version 10.0.0 20191021 (experimental) (GCC)








Re: BountySource campaign for gcc PR/91851

2019-10-30 Thread Georg-Johann Lay

John Paul Adrian Glaubitz schrieb:

Hello!

For anyone who isn't aware of it yet, there is an ongoing BountySource campaign
for gcc PR/91851 [1] which seeks to convert the m68k backend to MODE_CC so that
it can be kept in gcc versions beyond version 11.


Hi, have the cc0 backends been deprecated?

I didn't follow the lists for some time...  At least neither the v9 nor v10 
release notes caveats mention such a deprecation, nor are there 
respective PRs for the cc0 targets.

Johann


The campaign is currently at $3,290 and can be found at [2].

Thanks,
Adrian


[1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91851
[2] 
https://www.bountysource.com/issues/80706251-m68k-convert-the-backend-to-mode_cc-so-it-can-be-kept-in-future-releases






Re: BountySource campaign for gcc PR/91851

2019-10-31 Thread Georg-Johann Lay

Peter Bergner schrieb:

On 10/30/19 2:31 PM, Georg-Johann Lay wrote:

Hi, have the cc0 backends been deprecated?

I didn't follow the lists for some time...  At least neither v9 or v10
release notes caveats mention such deprecation, neither is there
respective PRs for the cc0 targets.


https://gcc.gnu.org/ml/gcc-patches/2019-09/msg01256.html

Peter


Ah, thanks.

So finally, the time has come to move to clang/llvm ?

Johann


Re: BountySource campaign for gcc PR/91851

2019-10-31 Thread Georg-Johann Lay

John Paul Adrian Glaubitz schrieb:

On 10/31/19 10:00 PM, Georg-Johann Lay wrote:

I didn't follow the lists for some time...  At least neither v9 or v10
release notes caveats mention such deprecation, neither is there
respective PRs for the cc0 targets.

https://gcc.gnu.org/ml/gcc-patches/2019-09/msg01256.html

Peter

Ah, thanks.

So finally, the time has come to move to clang/llvm ?


Not quite. Motorola 68k is currently not officially supported by LLVM/Clang.


For AVR -- another port affected by the cc0 removal -- there is an 
LLVM/Clang port.  It's not as mature as GCC's avr port, but what counts 
in the end is support / responsiveness from the community and an 
openness for the requirements of deeply embedded targets.  I had gcc 
patches rejected by global maintainers (just a no-op hook for other 
targets) because it appeared they didn't even understand what the patch 
was about (and kept proposing alternative "solutions" that totally missed 
the point).


And code quality is deteriorating from version to version.  Whatever you 
do in the backend to mitigate it, there are always global changes that 
shred any improvements...


Btw, does GCC support clobbering registers in branches (or 
cbranch4 for that matter)?  This requirement would come up when 
transitioning avr to cc_mode because cbranches would live post reload.


Johann



There is a port that is half-finished though [1] and there is also one guy from
Czech Republic who wrote a Bachelor thesis on an m68k backend for LLVM [2]. I
have not been able to contact him though as his university address bounces.
I would love to see the code that was written there.

I have also played around with Rust on m68k based on the LLVM code on [1], but
it doesn't really work yet. I think finishing LLVM for m68k would be another
idea for a Bountysource campaign.

Adrian


[1] https://github.com/M680x0/M680x0-llvm/
[2] https://www.vutbr.cz/en/students/final-thesis?zp_id=34902
[3] https://github.com/glaubitz/rust/tree/m68k-linux





cc0 -> CCmode questions

2019-11-02 Thread Georg-Johann Lay

Segher Boessenkool schrieb:
Btw, does GCC support clobbering registers in branches (or 
cbranch4 for that matter)?  This requirement would come up when 
transitioning avr to cc_mode because cbranches would live post reload.


Of course.  You cannot have *reloads* on branches, that is all.

Segher


Does this also apply to input reloads?

Suppose cbranch with constraints like

"d,r"
"n,r"

for the operands to be compared, where d is a register class that
can be compared against an immediate, but registers in r can't be
compared against n in general.

For a case #2 target (only ccmode clobbers before reload), reload might
generate an input reload for the constant in the cbranch.

So this is forbidden?
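
The kind of pattern I have in mind is roughly this.  Just a sketch: the mode, 
the predicates, and the CC register name are placeholders, not actual avr code:

(define_insn "cbranch_hi_insn"
  [(set (pc)
        (if_then_else
          (match_operator 0 "comparison_operator"
            [(match_operand:HI 1 "register_operand"  "d,r")
             (match_operand:HI 2 "nonmemory_operand" "n,r")])
          (label_ref (match_operand 3 "" ""))
          (pc)))
   (clobber (reg:CC REG_CC))]
  ""
  {
    /* Output templates here; the point is the "n" alternative, which
       reload might satisfy by an input reload of the constant.  */
    return "";
  })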

Johann



Re: GCC wwwdocs move to git done

2019-11-06 Thread Georg-Johann Lay

Am 09.10.19 um 02:27 schrieb Joseph Myers:

I've done the move of GCC wwwdocs to git (using the previously posted and
discussed scripts), including setting up the post-receive hook to do the
same things previously covered by the old CVS hooks, and minimal updates
to the web pages dealing with the CVS setup for wwwdocs.


Hi,

May it be the case that some parts are missing?  In particular, I cannot
find the source of

https://gcc.gnu.org/install/configure.html

Johann


Re: GCC wwwdocs move to git done

2019-11-06 Thread Georg-Johann Lay

Am 06.11.19 um 15:03 schrieb Georg-Johann Lay:

Am 09.10.19 um 02:27 schrieb Joseph Myers:

I've done the move of GCC wwwdocs to git (using the previously posted and
discussed scripts), including setting up the post-receive hook to do the
same things previously covered by the old CVS hooks, and minimal updates
to the web pages dealing with the CVS setup for wwwdocs.


Hi,

May it be the case that some parts are missing?  In particular, I cannot
find the source of

https://gcc.gnu.org/install/configure.html

Johann



Ok, found it in install/README.  I knew it had something special about it...

Johann






Re: Code bloat due to silly IRA cost model?

2019-12-10 Thread Georg-Johann Lay
Hi, doesn't actually anybody know how to make memory more expensive 
than registers when it comes to allocating registers?


Whatever I am trying for TARGET_MEMORY_MOVE_COST and 
TARGET_REGISTER_MOVE_COST, ira-costs.c always makes registers more 
expensive than mem and therefore allocates values to stack slots instead 
of keeping them in registers.


Test case (for avr) is as simple as it gets:

float func (float);

float call (float f)
{
return func (f);
}

What am I missing?

Johann


Georg-Johann Lay schrieb:

Hi,

I am trying to track down a code bloat issue and am stuck because I do 
not understand IRA's cost model.


The test case is as simple as it gets:

float func (float);

float call (float f)
{
return func (f);
}

IRA dump shows the following insns:


(insn 14 4 2 2 (set (reg:SF 44)
(reg:SF 22 r22 [ f ])) "bloat.c":4:1 85 {*movsf}
 (expr_list:REG_DEAD (reg:SF 22 r22 [ f ])
(nil)))
(insn 2 14 3 2 (set (reg/v:SF 43 [ f ])
(reg:SF 44)) "bloat.c":4:1 85 {*movsf}
 (expr_list:REG_DEAD (reg:SF 44)
(nil)))
(note 3 2 6 2 NOTE_INSN_FUNCTION_BEG)
(insn 6 3 7 2 (set (reg:SF 22 r22)
(reg/v:SF 43 [ f ])) "bloat.c":5:12 85 {*movsf}
 (expr_list:REG_DEAD (reg/v:SF 43 [ f ])
(nil)))
(call_insn/j 7 6 8 2 (parallel [

#14 sets pseudo 44 from arg register R22.
#2 moves it to pseudo 43
#6 moves it to R22 as it prepares for call_insn #7.

There are 2 allocnos and cost:

Pass 0 for finding pseudo/allocno costs

a1 (r44,l0) best NO_REGS, allocno NO_REGS
a0 (r43,l0) best NO_REGS, allocno NO_REGS

  a0(r43,l0) costs: ADDW_REGS:32000 SIMPLE_LD_REGS:32000 LD_REGS:32000 
NO_LD_REGS:32000 GENERAL_REGS:32000 MEM:9000
  a1(r44,l0) costs: ADDW_REGS:32000 SIMPLE_LD_REGS:32000 LD_REGS:32000 
NO_LD_REGS:32000 GENERAL_REGS:32000 MEM:9000


which is quite odd because MEM is way more expensive here than any REG.

Okay, so let's boost the MEM cost (TARGET_MEMORY_MOVE_COST) by a factor 
of 100:


a1 (r44,l0) best NO_REGS, allocno NO_REGS
a0 (r43,l0) best NO_REGS, allocno NO_REGS

  a0(r43,l0) costs: ADDW_REGS:320 SIMPLE_LD_REGS:320 
LD_REGS:320 NO_LD_REGS:320 GENERAL_REGS:320 MEM:801000
  a1(r44,l0) costs: ADDW_REGS:320 SIMPLE_LD_REGS:320 
LD_REGS:320 NO_LD_REGS:320 GENERAL_REGS:320 MEM:801000


What??? The REG costs are 100 times higher, and stille higher that the 
MEM costs.  What the heck is going on?


Setting TARGET_REGISTER_MOVE_COST and also TARGET_MEMORY_MOVE_COST to 0 
yiels:


  a0(r43,l0) costs: ADDW_REGS:0 SIMPLE_LD_REGS:0 LD_REGS:0 NO_LD_REGS:0 
GENERAL_REGS:0 MEM:0
  a1(r44,l0) costs: ADDW_REGS:0 SIMPLE_LD_REGS:0 LD_REGS:0 NO_LD_REGS:0 
GENERAL_REGS:0 MEM:0


as expected, i.e. there is no other hidden source of costs considered by 
IRA.  And even TARGET_REGISTER_MOVE_COST = 0  and 
TARGET_MEMORY_MOVE_COST = original gives:


  a0(r43,l0) costs: ADDW_REGS:32000 SIMPLE_LD_REGS:32000 LD_REGS:32000 
NO_LD_REGS:32000 GENERAL_REGS:32000 MEM:9000
  a1(r44,l0) costs: ADDW_REGS:32000 SIMPLE_LD_REGS:32000 LD_REGS:32000 
NO_LD_REGS:32000 GENERAL_REGS:32000 MEM:9000


How the heck do I tell ira-costs that registers are way cheaper than MEM?

Johann


p.s.

test case compiled with

$ avr-gcc bloat.c -S -Os -dp -da -fsplit-wide-types-early -v

Target: avr
Configured with: ../../gcc.gnu.org/trunk/configure --target=avr 
--prefix=/local/gnu/install/gcc-10 --disable-shared --disable-nls 
--with-dwarf2 --enable-target-optspace=yes --with-gnu-as --with-gnu-ld 
--enable-checking=release --enable-languages=c,c++ --disable-gcov

Thread model: single
Supported LTO compression algorithms: zlib
gcc version 10.0.0 20191021 (experimental) (GCC)





Re: Code bloat due to silly IRA cost model?

2019-12-13 Thread Georg-Johann Lay

Am 11.12.19 um 18:55 schrieb Richard Sandiford:

Georg-Johann Lay  writes:

Hi, doesn't actually anybody know know to make memory more expensive
than registers when it comes to allocating registers?

Whatever I am trying for TARGET_MEMORY_MOVE_COST and
TARGET_REGISTER_MOVE_COST, ira-costs.c always makes registers more
expensive than mem and therefore allocates values to stack slots instead
of keeping them in registers.

Test case (for avr) is as simple as it gets:

float func (float);

float call (float f)
{
  return func (f);
}

What am I missing?

Johann


Georg-Johann Lay schrieb:

Hi,

I am trying to track down a code bloat issue and am stuck because I do
not understand IRA's cost model.

The test case is as simple as it gets:

float func (float);

float call (float f)
{
 return func (f);
}

IRA dump shows the following insns:


(insn 14 4 2 2 (set (reg:SF 44)
 (reg:SF 22 r22 [ f ])) "bloat.c":4:1 85 {*movsf}
  (expr_list:REG_DEAD (reg:SF 22 r22 [ f ])
 (nil)))
(insn 2 14 3 2 (set (reg/v:SF 43 [ f ])
 (reg:SF 44)) "bloat.c":4:1 85 {*movsf}
  (expr_list:REG_DEAD (reg:SF 44)
 (nil)))
(note 3 2 6 2 NOTE_INSN_FUNCTION_BEG)
(insn 6 3 7 2 (set (reg:SF 22 r22)
 (reg/v:SF 43 [ f ])) "bloat.c":5:12 85 {*movsf}
  (expr_list:REG_DEAD (reg/v:SF 43 [ f ])
 (nil)))
(call_insn/j 7 6 8 2 (parallel [

#14 sets pseudo 44 from arg register R22.
#2 moves it to pseudo 43
#6 moves it to R22 as it prepares for call_insn #7.

There are 2 allocnos and cost:

Pass 0 for finding pseudo/allocno costs

 a1 (r44,l0) best NO_REGS, allocno NO_REGS
 a0 (r43,l0) best NO_REGS, allocno NO_REGS

   a0(r43,l0) costs: ADDW_REGS:32000 SIMPLE_LD_REGS:32000 LD_REGS:32000
NO_LD_REGS:32000 GENERAL_REGS:32000 MEM:9000
   a1(r44,l0) costs: ADDW_REGS:32000 SIMPLE_LD_REGS:32000 LD_REGS:32000
NO_LD_REGS:32000 GENERAL_REGS:32000 MEM:9000

which is quite odd because MEM is way more expensive here than any REG.

Okay, so let's boost the MEM cost (TARGET_MEMORY_MOVE_COST) by a factor
of 100:

 a1 (r44,l0) best NO_REGS, allocno NO_REGS
 a0 (r43,l0) best NO_REGS, allocno NO_REGS

   a0(r43,l0) costs: ADDW_REGS:320 SIMPLE_LD_REGS:320
LD_REGS:320 NO_LD_REGS:320 GENERAL_REGS:320 MEM:801000
   a1(r44,l0) costs: ADDW_REGS:320 SIMPLE_LD_REGS:320
LD_REGS:320 NO_LD_REGS:320 GENERAL_REGS:320 MEM:801000

What??? The REG costs are 100 times higher, and stille higher that the
MEM costs.  What the heck is going on?

Setting TARGET_REGISTER_MOVE_COST and also TARGET_MEMORY_MOVE_COST to 0
yiels:

   a0(r43,l0) costs: ADDW_REGS:0 SIMPLE_LD_REGS:0 LD_REGS:0 NO_LD_REGS:0
GENERAL_REGS:0 MEM:0
   a1(r44,l0) costs: ADDW_REGS:0 SIMPLE_LD_REGS:0 LD_REGS:0 NO_LD_REGS:0
GENERAL_REGS:0 MEM:0

as expected, i.e. there is no other hidden source of costs considered by
IRA.  And even TARGET_REGISTER_MOVE_COST = 0  and
TARGET_MEMORY_MOVE_COST = original gives:

   a0(r43,l0) costs: ADDW_REGS:32000 SIMPLE_LD_REGS:32000 LD_REGS:32000
NO_LD_REGS:32000 GENERAL_REGS:32000 MEM:9000
   a1(r44,l0) costs: ADDW_REGS:32000 SIMPLE_LD_REGS:32000 LD_REGS:32000
NO_LD_REGS:32000 GENERAL_REGS:32000 MEM:9000

How the heck do I tell ira-costs that registers are way cheaper than MEM?


I think this is coming from:

   /* FIXME: Ideally, the following test is not needed.
 However, it turned out that it can reduce the number
 of spill fails.  AVR and it's poor endowment with
 address registers is extreme stress test for reload.  */

   if (GET_MODE_SIZE (mode) >= 4
   && regno >= REG_X)
 return false;


This was introduced to "fix" an "unable to find a register to spill" ICE.

What I do not understand is that the code with long (which is SImode on 
avr) is fine:


long lunc (long);

long callL (long f)
{
return lunc (f);
}

callL:
rjmp lunc;  7   [c=24 l=1]  call_value_insn/3



in avr_hard_regno_mode_ok.  This forbids SFmode in r26+ and means that
moves between pointer registers and general registers have the highest
possible cost (65535) to prevent them for being used for SFmode.  So:

ira_register_move_cost[SFmode][POINTER_REGS][GENERAL_REGS] = 65535;

The costs for union classes are the maximum (worst-case) cost of
for each subclass, so this means that:

ira_register_move_cost[SFmode][GENERAL_REGS][GENERAL_REGS] = 65535;

as well.


This means that, when there is an expensive class (because it only 
contains one register, for example), it will blow up the cost of 
GENERAL_REGS to crazy values no matter what?


What's also strange is that the register allocator would not need to 
allocate a register at all:  The incoming parameter comes in SI:22 and 
is just passed through to the callee, which also receives the value 
in SI:22.  Why would one move that value to memory?  Even if memory were 
cheaper, moving the value to mem just to load it again

Re: Code bloat due to silly IRA cost model?

2019-12-16 Thread Georg-Johann Lay

Am 11.12.19 um 18:55 schrieb Richard Sandiford:

Georg-Johann Lay  writes:

Hi, doesn't actually anybody know know to make memory more expensive
than registers when it comes to allocating registers?

Whatever I am trying for TARGET_MEMORY_MOVE_COST and
TARGET_REGISTER_MOVE_COST, ira-costs.c always makes registers more
expensive than mem and therefore allocates values to stack slots instead
of keeping them in registers.

Test case (for avr) is as simple as it gets:

float func (float);

float call (float f)
{
  return func (f);
}

What am I missing?

Johann


Georg-Johann Lay schrieb:

Hi,

I am trying to track down a code bloat issue and am stuck because I do
not understand IRA's cost model.

The test case is as simple as it gets:

float func (float);

float call (float f)
{
 return func (f);
}

IRA dump shows the following insns:


(insn 14 4 2 2 (set (reg:SF 44)
 (reg:SF 22 r22 [ f ])) "bloat.c":4:1 85 {*movsf}
  (expr_list:REG_DEAD (reg:SF 22 r22 [ f ])
 (nil)))
(insn 2 14 3 2 (set (reg/v:SF 43 [ f ])
 (reg:SF 44)) "bloat.c":4:1 85 {*movsf}
  (expr_list:REG_DEAD (reg:SF 44)
 (nil)))
(note 3 2 6 2 NOTE_INSN_FUNCTION_BEG)
(insn 6 3 7 2 (set (reg:SF 22 r22)
 (reg/v:SF 43 [ f ])) "bloat.c":5:12 85 {*movsf}
  (expr_list:REG_DEAD (reg/v:SF 43 [ f ])
 (nil)))
(call_insn/j 7 6 8 2 (parallel [

#14 sets pseudo 44 from arg register R22.
#2 moves it to pseudo 43
#6 moves it to R22 as it prepares for call_insn #7.

There are 2 allocnos and cost:

Pass 0 for finding pseudo/allocno costs

 a1 (r44,l0) best NO_REGS, allocno NO_REGS
 a0 (r43,l0) best NO_REGS, allocno NO_REGS

   a0(r43,l0) costs: ADDW_REGS:32000 SIMPLE_LD_REGS:32000 LD_REGS:32000
NO_LD_REGS:32000 GENERAL_REGS:32000 MEM:9000
   a1(r44,l0) costs: ADDW_REGS:32000 SIMPLE_LD_REGS:32000 LD_REGS:32000
NO_LD_REGS:32000 GENERAL_REGS:32000 MEM:9000

which is quite odd because MEM is way more expensive here than any REG.

Okay, so let's boost the MEM cost (TARGET_MEMORY_MOVE_COST) by a factor
of 100:

 a1 (r44,l0) best NO_REGS, allocno NO_REGS
 a0 (r43,l0) best NO_REGS, allocno NO_REGS

   a0(r43,l0) costs: ADDW_REGS:320 SIMPLE_LD_REGS:320
LD_REGS:320 NO_LD_REGS:320 GENERAL_REGS:320 MEM:801000
   a1(r44,l0) costs: ADDW_REGS:320 SIMPLE_LD_REGS:320
LD_REGS:320 NO_LD_REGS:320 GENERAL_REGS:320 MEM:801000

What??? The REG costs are 100 times higher, and stille higher that the
MEM costs.  What the heck is going on?

Setting TARGET_REGISTER_MOVE_COST and also TARGET_MEMORY_MOVE_COST to 0
yiels:

   a0(r43,l0) costs: ADDW_REGS:0 SIMPLE_LD_REGS:0 LD_REGS:0 NO_LD_REGS:0
GENERAL_REGS:0 MEM:0
   a1(r44,l0) costs: ADDW_REGS:0 SIMPLE_LD_REGS:0 LD_REGS:0 NO_LD_REGS:0
GENERAL_REGS:0 MEM:0

as expected, i.e. there is no other hidden source of costs considered by
IRA.  And even TARGET_REGISTER_MOVE_COST = 0  and
TARGET_MEMORY_MOVE_COST = original gives:

   a0(r43,l0) costs: ADDW_REGS:32000 SIMPLE_LD_REGS:32000 LD_REGS:32000
NO_LD_REGS:32000 GENERAL_REGS:32000 MEM:9000
   a1(r44,l0) costs: ADDW_REGS:32000 SIMPLE_LD_REGS:32000 LD_REGS:32000
NO_LD_REGS:32000 GENERAL_REGS:32000 MEM:9000

How the heck do I tell ira-costs that registers are way cheaper than MEM?


I think this is coming from:

   /* FIXME: Ideally, the following test is not needed.
 However, it turned out that it can reduce the number
 of spill fails.  AVR and it's poor endowment with
 address registers is extreme stress test for reload.  */

   if (GET_MODE_SIZE (mode) >= 4
   && regno >= REG_X)
 return false;

in avr_hard_regno_mode_ok.  This forbids SFmode in r26+ and means that
moves between pointer registers and general registers have the highest
possible cost (65535) to prevent them for being used for SFmode.  So:

ira_register_move_cost[SFmode][POINTER_REGS][GENERAL_REGS] = 65535;

The costs for union classes are the maximum (worst-case) cost of
for each subclass, so this means that:

ira_register_move_cost[SFmode][GENERAL_REGS][GENERAL_REGS] = 65535;

as well.

Removing the code above fixes it.  If you don't want to do that, an
alternative might be to add a class for r0-r25 (but I've not tested that).


I am still having a headache understanding that...

For example, currently R26 is forbidden for SFmode, but the same applies 
to R25 or any odd register (modes spanning >= 2 regs have to start in even 
registers).


Then this would imply that, even after the condition regno >= 26 was removed, 
the costs would still be astronomically high, because HI:21 is refused 
and SI:23 is refused etc., and due to that the cost of that class will be 
0x1 for modes >= 2 regs?


How can the register allocator tell whether a register is rejected 
due to its mode or due to the register number?  AFAIK there is no other 
way than rejecting odd registers in that hook, because register classes 
must not have holes.  Or did that change in the meantime?


Johann



Thanks,
Richard


Re: Code bloat due to silly IRA cost model?

2020-01-09 Thread Georg-Johann Lay

Am 13.12.19 um 13:45 schrieb Richard Sandiford:

Georg-Johann Lay  writes:

Am 11.12.19 um 18:55 schrieb Richard Sandiford:

Georg-Johann Lay  writes:

Hi, doesn't actually anybody know know to make memory more expensive
than registers when it comes to allocating registers?

Whatever I am trying for TARGET_MEMORY_MOVE_COST and
TARGET_REGISTER_MOVE_COST, ira-costs.c always makes registers more
expensive than mem and therefore allocates values to stack slots instead
of keeping them in registers.

Test case (for avr) is as simple as it gets:

float func (float);

float call (float f)
{
   return func (f);
}

What am I missing?

Johann


Georg-Johann Lay schrieb:

Hi,

I am trying to track down a code bloat issue and am stuck because I do
not understand IRA's cost model.

The test case is as simple as it gets:

float func (float);

float call (float f)
{
  return func (f);
}

IRA dump shows the following insns:


(insn 14 4 2 2 (set (reg:SF 44)
  (reg:SF 22 r22 [ f ])) "bloat.c":4:1 85 {*movsf}
   (expr_list:REG_DEAD (reg:SF 22 r22 [ f ])
  (nil)))
(insn 2 14 3 2 (set (reg/v:SF 43 [ f ])
  (reg:SF 44)) "bloat.c":4:1 85 {*movsf}
   (expr_list:REG_DEAD (reg:SF 44)
  (nil)))
(note 3 2 6 2 NOTE_INSN_FUNCTION_BEG)
(insn 6 3 7 2 (set (reg:SF 22 r22)
  (reg/v:SF 43 [ f ])) "bloat.c":5:12 85 {*movsf}
   (expr_list:REG_DEAD (reg/v:SF 43 [ f ])
  (nil)))
(call_insn/j 7 6 8 2 (parallel [

#14 sets pseudo 44 from arg register R22.
#2 moves it to pseudo 43
#6 moves it to R22 as it prepares for call_insn #7.

There are 2 allocnos and cost:

Pass 0 for finding pseudo/allocno costs

  a1 (r44,l0) best NO_REGS, allocno NO_REGS
  a0 (r43,l0) best NO_REGS, allocno NO_REGS

a0(r43,l0) costs: ADDW_REGS:32000 SIMPLE_LD_REGS:32000 LD_REGS:32000
NO_LD_REGS:32000 GENERAL_REGS:32000 MEM:9000
a1(r44,l0) costs: ADDW_REGS:32000 SIMPLE_LD_REGS:32000 LD_REGS:32000
NO_LD_REGS:32000 GENERAL_REGS:32000 MEM:9000

which is quite odd because MEM is way more expensive here than any REG.

Okay, so let's boost the MEM cost (TARGET_MEMORY_MOVE_COST) by a factor
of 100:

  a1 (r44,l0) best NO_REGS, allocno NO_REGS
  a0 (r43,l0) best NO_REGS, allocno NO_REGS

a0(r43,l0) costs: ADDW_REGS:320 SIMPLE_LD_REGS:320
LD_REGS:320 NO_LD_REGS:320 GENERAL_REGS:320 MEM:801000
a1(r44,l0) costs: ADDW_REGS:320 SIMPLE_LD_REGS:320
LD_REGS:320 NO_LD_REGS:320 GENERAL_REGS:320 MEM:801000

What??? The REG costs are 100 times higher, and stille higher that the
MEM costs.  What the heck is going on?

Setting TARGET_REGISTER_MOVE_COST and also TARGET_MEMORY_MOVE_COST to 0
yiels:

a0(r43,l0) costs: ADDW_REGS:0 SIMPLE_LD_REGS:0 LD_REGS:0 NO_LD_REGS:0
GENERAL_REGS:0 MEM:0
a1(r44,l0) costs: ADDW_REGS:0 SIMPLE_LD_REGS:0 LD_REGS:0 NO_LD_REGS:0
GENERAL_REGS:0 MEM:0

as expected, i.e. there is no other hidden source of costs considered by
IRA.  And even TARGET_REGISTER_MOVE_COST = 0  and
TARGET_MEMORY_MOVE_COST = original gives:

a0(r43,l0) costs: ADDW_REGS:32000 SIMPLE_LD_REGS:32000 LD_REGS:32000
NO_LD_REGS:32000 GENERAL_REGS:32000 MEM:9000
a1(r44,l0) costs: ADDW_REGS:32000 SIMPLE_LD_REGS:32000 LD_REGS:32000
NO_LD_REGS:32000 GENERAL_REGS:32000 MEM:9000

How the heck do I tell ira-costs that registers are way cheaper than MEM?


I think this is coming from:

/* FIXME: Ideally, the following test is not needed.
  However, it turned out that it can reduce the number
  of spill fails.  AVR and it's poor endowment with
  address registers is extreme stress test for reload.  */

if (GET_MODE_SIZE (mode) >= 4
&& regno >= REG_X)
  return false;


This was introduced to "fix" unable to find a register to spill ICE.

What I do not understand is that the code with long (which is SImode on
avr) is fine:

long lunc (long);

long callL (long f)
{
  return lunc (f);
}

callL:
rjmp lunc;  7   [c=24 l=1]  call_value_insn/3


This is due to differences in the way that lower-subreg.c lowers
SF moves vs. SI moves.  For SI it generates pure QI moves and so
gets rid of the SI entirely.  For SF it still builds the QI values
back into an SF:

   || (!SCALAR_INT_MODE_P (dest_mode)
  && !resolve_reg_p (dest)
  && !resolve_subreg_p (dest)))

I imagine this is because non-int modes are held in FPRs rather than
GPRs on most targets, but TBH I'm not sure.  I couldn't see a comment
that explains the above decision.

With -fno-split-wide-types I see the same RA behaviour for both SI and SF
(i.e. both spill to memory).


in avr_hard_regno_mode_ok.  This forbids SFmode in r26+ and means that
moves between pointer registers and general registers have the highest
possible cost (65535) to prevent them from being used for SFmode.  So:

 ira_register_move_cost[SFmode][POINTER_REG

Re: Help math RTL patterns...

2017-01-18 Thread Georg-Johann Lay

On 17.01.2017 21:41, Steve Silva via gcc wrote:

Hi Nathan,

Thanks for your advice.  I retooled the addhi3 sequence to look like this:

(define_expand "addhi3"
[(set (match_operand:HI 0 "snap_mem_or_reg""+a,m")
(plus:HI (match_operand:HI 1 "snap_mem_or_reg" "%0,0")
(match_operand:HI 2 "general_operand" "aim,aim")))]


Expanders don't take constraints; you should get a warning here.

""
""
)


You can omit trailing elements if they are empty:

(define_expand "addhi3"
  [(set (match_operand:HI 0 "snap_mem_or_reg")
(plus:HI (match_operand:HI 1 "snap_mem_or_reg")
 (match_operand:HI 2 "general_operand")))])

If you are allowing MEM here the generated insn might come
with a memory operand.  Disallowing it in the insn predicate
might lead to "unrecognizable insn" ICE, e.g. if op 1 is
a MEM and op 0 is a REG.
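
One way around that, just a sketch of the standard idiom and not tested
against your port, is to legalize the offending combination in the
expander's preparation statement before any insn is matched:

  /* Hypothetical preparation statement for the addhi3 expander:
     if no insn accepts a REG destination with a MEM source, load the
     MEM into a pseudo here so that recog never sees that shape.  */
  if (REG_P (operands[0]) && MEM_P (operands[1]))
    operands[1] = force_reg (HImode, operands[1]);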


(define_insn "addhi3_regtarget"
[(set (match_operand:HI 0 "register_operand" "+a")


operands[0] is not an input here, it's only an output.
Hence "=a" is the right constraint, not "+a".


(plus:HI (match_operand:HI 1 "register_operand" "%0")
(match_operand:HI 2 "general_operand" "aim")))]
""
{
output_asm_insn("//Start of addhi3_regtarget %0 = %1 + %2", operands);
snap_do_basic_math_op_hi(operands, MATH_OP_PLUS);
output_asm_insn("//End of addhi3_regtarget", operands);
return("");
}
)


(define_insn "addhi3_memtarget"
[(set (match_operand:HI 0 "memory_operand""+m")
(plus:HI (match_operand:HI 1 "memory_operand" "%0")
(match_operand:HI 2 "general_operand" "aim")))]


You might consider one insn with several alternatives like:

(define_insn "*addhi3_insn"
  [(set (match_operand:HI 0 "nonimmediate_operand"  "=a ,m")
(plus:HI (match_operand:HI 1 "register_operand" "%0 ,0")
 (match_operand:HI 2 "general_operand"  "aim,aim")))]
 ""
 {
   // Output depending on which_alternative.
   return "";
 })
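
For completeness, the output statement could then dispatch on
which_alternative; the following is only a sketch and assumes that
snap_do_basic_math_op_hi (your helper) copes with both kinds of
destination:

 {
   /* which_alternative tells which constraint alternative matched,
      so both of your insns can be folded into this single one.  */
   if (which_alternative == 0)
     output_asm_insn ("//addhi3, register destination", operands);
   else
     output_asm_insn ("//addhi3, memory destination", operands);
   snap_do_basic_math_op_hi (operands, MATH_OP_PLUS);
   return "";
 }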

Johann


""
{
output_asm_insn("//Start of addhi3_memtarget %0 = %1 + %2", operands);
snap_do_basic_math_op_hi(operands, MATH_OP_PLUS);
output_asm_insn("//End of addhi3_memtarget", operands);
return("");
}
)

I compile a simple program with this:

void addit()
{
int a, b, c;

a = -10;
b = 2;
c = a + b;
}


And the compiler fails out with the following message:

addit.c: In function 'addit':
addit.c:12:1: internal compiler error: in find_reloads, at reload.c:4085
}
^
0x8f5953 find_reloads(rtx_insn*, int, int, int, short*)
../../gcc-6.2.0/gcc/reload.c:4085
0x90327b calculate_needs_all_insns
../../gcc-6.2.0/gcc/reload1.c:1484
0x90327b reload(rtx_insn*, int)
../../gcc-6.2.0/gcc/reload1.c:995
0x7e8f11 do_reload
../../gcc-6.2.0/gcc/ira.c:5437
0x7e8f11 execute
../../gcc-6.2.0/gcc/ira.c:5609

It would seem that the constraints are somehow not right, but I am not familiar 
with the particular way the compiler does this step.  Any insights or pointers?


Thanks,

Steve S


[snipped TOFU]



Getting spurious FAILS in testsuite?

2017-06-01 Thread Georg-Johann Lay

Hi, when I am running the gcc testsuite in $builddir/gcc then

$ make check-gcc RUNTESTFLAGS='ubsan.exp'

comes up with spurious fails.

Running target unix
Using /usr/share/dejagnu/baseboards/unix.exp as board description file 
for target.
Using /usr/share/dejagnu/config/unix.exp as generic interface file for 
target.
Using /home/georg/gnu/gcc.gnu.org/trunk/gcc/testsuite/config/default.exp 
as tool-and-target-specific interface file.
Running 
/home/georg/gnu/gcc.gnu.org/trunk/gcc/testsuite/gcc.dg/ubsan/ubsan.exp ...

FAIL: c-c++-common/ubsan/float-cast-overflow-8.c   -O2  output pattern test
FAIL: c-c++-common/ubsan/overflow-mul-4.c   -O0  output pattern test

...

when I am running the 1st test alone, then it works:

$ make check-gcc RUNTESTFLAGS='ubsan.exp=float-cast-overflow-8.c'

In an older log file I found for a different test from the same folder:

/home/georg/gnu/gcc.gnu.org/trunk/gcc/testsuite/c-c++-common/ubsan/float-cast-overflow-7.h:149:1: 
runtime error: value  is outside the range of representable 
values of type 'unsigned int'
/home/georg/gnu/gcc.gnu.org/trunk/gcc/testsuite/c-c++-common/ubsan/float-cast-overflow-7.h:149:1: 
runtime error: value  is outside the range of representable 
values of type 'long int'
/home/georg/gnu/gcc.gnu.org/trunk/gcc/testsuite/c-c++-common/ubsan/float-cast-overflow-7.h:149:1: 
runtime error: PASS: c-c++-common/ubsan


BANG: "PASS" output from previous test run shreds this one?

/float-cast-overflow-10.c   -O2 -flto -fuse-linker-plugin 
-fno-fat-lto-objects  execution test
FAIL: c-c++-common/ubsan/float-cast-overflow-10.c   -O2 -flto 
-fuse-linker-plugin -fno-fat-lto-objects  output pattern test

Output was:
c-c++-common/ubsan/float-cast-overflow-7.h:147:1: runtime error: value 
 is outside the range of representable values of type 'signed char'

...
c-c++-common/ubsan/float-cast-overflow-7.h:149:1: runtime error:
Should match:

The last output line stops after "runtime error: ", i.e. at the place
where the "PASS" appears.

Any ideas?

Johann


Re: Getting spurious FAILS in testsuite?

2017-06-01 Thread Georg-Johann Lay

On 01.06.2017 14:59, Georg-Johann Lay wrote:

Hi, when I am running the gcc testsuite in $builddir/gcc then



FYI, I found the following thread which reports a similar problem,
but without any solution :-(

CC'ing Diego, maybe he remembers the solution from back then...

https://gcc.gnu.org/ml/libstdc++/2011-03/msg00083.html



$ make check-gcc RUNTESTFLAGS='ubsan.exp'

comes up with spurious fails.

Running target unix
Using /usr/share/dejagnu/baseboards/unix.exp as board description file
for target.
Using /usr/share/dejagnu/config/unix.exp as generic interface file for
target.
Using /home/georg/gnu/gcc.gnu.org/trunk/gcc/testsuite/config/default.exp
as tool-and-target-specific interface file.
Running
/home/georg/gnu/gcc.gnu.org/trunk/gcc/testsuite/gcc.dg/ubsan/ubsan.exp ...
FAIL: c-c++-common/ubsan/float-cast-overflow-8.c   -O2  output pattern test
FAIL: c-c++-common/ubsan/overflow-mul-4.c   -O0  output pattern test

...

when I am running the 1st test alone, then it works:

$ make check-gcc RUNTESTFLAGS='ubsan.exp=float-cast-overflow-8.c'

In an older log file I found for a different test from the same folder:

/home/georg/gnu/gcc.gnu.org/trunk/gcc/testsuite/c-c++-common/ubsan/float-cast-overflow-7.h:149:1:
runtime error: value  is outside the range of representable
values of type 'unsigned int'
/home/georg/gnu/gcc.gnu.org/trunk/gcc/testsuite/c-c++-common/ubsan/float-cast-overflow-7.h:149:1:
runtime error: value  is outside the range of representable
values of type 'long int'
/home/georg/gnu/gcc.gnu.org/trunk/gcc/testsuite/c-c++-common/ubsan/float-cast-overflow-7.h:149:1:
runtime error: PASS: c-c++-common/ubsan

BANG: "PASS" output from previous test run shreds this one?

/float-cast-overflow-10.c   -O2 -flto -fuse-linker-plugin
-fno-fat-lto-objects  execution test
FAIL: c-c++-common/ubsan/float-cast-overflow-10.c   -O2 -flto
-fuse-linker-plugin -fno-fat-lto-objects  output pattern test
Output was:
c-c++-common/ubsan/float-cast-overflow-7.h:147:1: runtime error: value
 is outside the range of representable values of type 'signed
char'
...
c-c++-common/ubsan/float-cast-overflow-7.h:149:1: runtime error:
Should match:

The last output line stops after "runtime error: ", i.e. at the place
where the "PASS" appears.

Any ideas?

Johann





Re: Getting spurious FAILS in testsuite?

2017-06-01 Thread Georg-Johann Lay

On 01.06.2017 16:16, Marek Polacek wrote:

On Thu, Jun 01, 2017 at 02:59:37PM +0200, Georg-Johann Lay wrote:

Hi, when I am running the gcc testsuite in $builddir/gcc then

$ make check-gcc RUNTESTFLAGS='ubsan.exp'

comes up with spurious fails.

Running target unix
Using /usr/share/dejagnu/baseboards/unix.exp as board description file for
target.
Using /usr/share/dejagnu/config/unix.exp as generic interface file for
target.
Using /home/georg/gnu/gcc.gnu.org/trunk/gcc/testsuite/config/default.exp as
tool-and-target-specific interface file.
Running
/home/georg/gnu/gcc.gnu.org/trunk/gcc/testsuite/gcc.dg/ubsan/ubsan.exp ...
FAIL: c-c++-common/ubsan/float-cast-overflow-8.c   -O2  output pattern test
FAIL: c-c++-common/ubsan/overflow-mul-4.c   -O0  output pattern test

...

when I am running the 1st test alone, then it works:

$ make check-gcc RUNTESTFLAGS='ubsan.exp=float-cast-overflow-8.c'

In an older log file I found for a different test from the same folder:

/home/georg/gnu/gcc.gnu.org/trunk/gcc/testsuite/c-c++-common/ubsan/float-cast-overflow-7.h:149:1:
runtime error: value  is outside the range of representable values
of type 'unsigned int'
/home/georg/gnu/gcc.gnu.org/trunk/gcc/testsuite/c-c++-common/ubsan/float-cast-overflow-7.h:149:1:
runtime error: value  is outside the range of representable values
of type 'long int'
/home/georg/gnu/gcc.gnu.org/trunk/gcc/testsuite/c-c++-common/ubsan/float-cast-overflow-7.h:149:1:
runtime error: PASS: c-c++-common/ubsan

BANG: "PASS" output from previous test run shreds this one?

/float-cast-overflow-10.c   -O2 -flto -fuse-linker-plugin
-fno-fat-lto-objects  execution test
FAIL: c-c++-common/ubsan/float-cast-overflow-10.c   -O2 -flto
-fuse-linker-plugin -fno-fat-lto-objects  output pattern test
Output was:
c-c++-common/ubsan/float-cast-overflow-7.h:147:1: runtime error: value
 is outside the range of representable values of type 'signed char'
...
c-c++-common/ubsan/float-cast-overflow-7.h:149:1: runtime error:
Should match:

The last output line stops after "runtime error: ", i.e. at the place
where the "PASS" appears.

Any ideas?


Does this help?

diff --git gcc/testsuite/c-c++-common/ubsan/float-cast-overflow-8.c 
gcc/testsuite/c-c++-common/ubsan/float-cast-overflow-8.c
index 4adb22a..746fe20 100644
--- gcc/testsuite/c-c++-common/ubsan/float-cast-overflow-8.c
+++ gcc/testsuite/c-c++-common/ubsan/float-cast-overflow-8.c
@@ -140,4 +140,4 @@ main ()
 /* { dg-output "\[^\n\r]*value \[0-9.e+-]* is outside the range of representable 
values of type 'long long int'\[^\n\r]*(\n|\r\n|\r)" } */
 /* { dg-output "\[^\n\r]*value -1 is outside the range of representable values of 
type 'long long unsigned int'\[^\n\r]*(\n|\r\n|\r)" } */
 /* { dg-output "\[^\n\r]*value \[0-9.e+-]* is outside the range of representable 
values of type '__int128'\[^\n\r]*(\n|\r\n|\r)" { target { int128 } } } */
-/* { dg-output "\[^\n\r]*value -1 is outside the range of representable values of 
type '__int128 unsigned'\[^\n\r]*(\n|\r\n|\r)" { target { int128 } } } */
+/* { dg-output "\[^\n\r]*value -1 is outside the range of representable values of 
type '__int128 unsigned'" { target { int128 } } } */

Marek


No, still switching back and forth (for a different set of options this 
time):


$ make check-gcc RUNTESTFLAGS='-all ubsan.exp=float-cast-overflow-8.c'

...
PASS: c-c++-common/ubsan/float-cast-overflow-8.c   -O2 -flto 
-fuse-linker-plugin -fno-fat-lto-objects  output pattern test


$ make check-gcc RUNTESTFLAGS='-all ubsan.exp=float-cast-overflow-8.c'

...
FAIL: c-c++-common/ubsan/float-cast-overflow-8.c   -O2 -flto 
-fuse-linker-plugin -fno-fat-lto-objects  output pattern test


In the last case, gcc.log reads (notice the "testsuiPASS" and the 
"trunk/gcc/testsui" at the end):




/home/georg/gnu/gcc.gnu.org/trunk/gcc/testsuite/c-c++-common/ubsan/float-cast-overflow-7.h:95:1: 
runtime error: 
val/home/georg/gnu/gcc.gnu.org/trunk/gcc/testsuite/c-c++-common/ubsan/float-cast-overflow-7.h:95:1: 
runtime error: value -1 is outside the range of representable values of 
type 'short unsigned int'
/home/georg/gnu/gcc.gnu.org/trunk/gcc/testsuiPASS: 
c-c++-common/ubsan/float-cast-overflow-8.c   -O2 -flto 
-fuse-linker-plugin -fno-fat-lto-objects  execution test
FAIL: c-c++-common/ubsan/float-cast-overflow-8.c   -O2 -flto 
-fuse-linker-plugin -fno-fat-lto-objects  output pattern test

Output was:
c-c++-common/ubsan/float-cast-overflow-7.h:93:1: runtime error: value 
-129 is outside the range of representable values of type 'signed char'


...

c-c++-common/ubsan/float-cast-overflow-7.h:95:1: runtime error: value 
-32769 is outside the range of representable values of type 'short int'
c-c++-common/ubsan/float-cast-overflow-7.h:95:1: runtime error: value -1 
is outside the range of representable values of type 'short unsigned int'

trunk/gcc/testsui
Should match:




Re: Getting spurious FAILS in testsuite?

2017-06-08 Thread Georg-Johann Lay

On 05.06.2017 18:25, Jim Wilson wrote:

On 06/01/2017 05:59 AM, Georg-Johann Lay wrote:

Hi, when I am running the gcc testsuite in $builddir/gcc then
$ make check-gcc RUNTESTFLAGS='ubsan.exp'
comes up with spurious fails.


This was discussed before, and the suspicion was that it was a linux
kernel bug.  There were multiple kernel fixes pointed at, it wasn't
clear which one was required to fix it.

I have Ubuntu 16.04 LTS on my laptop, and I see the problem.  I can't
run the ubsan testsuites with -j factor greater than one and get
reproducible results.  There may also be other ways to trigger the problem.

See for instance the thread
https://gcc.gnu.org/ml/gcc/2016-07/msg00117.html
The first message in the thread from Andrew Pinski mentions that the log
output is corrupted from apparent buffer overflow.

Jim



I have "Ubuntu 16.04.2 LTS".

Asking this at DejaGNU's, I got the following pointer:

https://lists.gnu.org/archive/html/dejagnu/2016-03/msg00034.html

AFAIU there is a problem separating stdout and stderr?

Johann





Re: Getting spurious FAILS in testsuite?

2017-07-06 Thread Georg-Johann Lay

On 08.06.2017 23:33, Andrew Pinski wrote:

On Thu, Jun 8, 2017 at 2:25 PM, Jeff Law  wrote:

On 06/08/2017 04:24 AM, Christophe Lyon wrote:

On 8 June 2017 at 11:57, Georg-Johann Lay  wrote:

On 05.06.2017 18:25, Jim Wilson wrote:


On 06/01/2017 05:59 AM, Georg-Johann Lay wrote:


Hi, when I am running the gcc testsuite in $builddir/gcc then
$ make check-gcc RUNTESTFLAGS='ubsan.exp'
comes up with spurious fails.


This was discussed before, and the suspicion was that it was a linux
kernel bug.  There were multiple kernel fixes pointed at, it wasn't
clear which one was required to fix it.

I have Ubuntu 16.04 LTS on my laptop, and I see the problem.  I can't
run the ubsan testsuites with -j factor greater than one and get
reproducible results.  There may also be other ways to trigger the
problem.

See for instance the thread
 https://gcc.gnu.org/ml/gcc/2016-07/msg00117.html
The first message in the thread from Andrew Pinski mentions that the log
output is corrupted from apparent buffer overflow.

Jim


I have "Ubuntu 16.04.2 LTS".

Asking this at DejaGNU's, I got the following pointer:

https://lists.gnu.org/archive/html/dejagnu/2016-03/msg00034.html

AFAIU there is a problem separating stdout and stderr?



Be careful, I'm not a dejagnu maintainer/developer :-)
I just meant to say I had "similar" problems, but according to this
thread, I'm not the only one :(

There was most definitely a linux kernel bug that affected the behavior
of "expect" used by dejagnu in the past.

THe gcc.gnu.org reference to a message from Pinski is the right one --
it identifies the problematical change in the kernel that mucked up
expect's behavior.

In the thread you'll find a reference to:

https://bugzilla.kernel.org/show_bug.cgi?id=96311

This has long since been fixed.  But I have no idea what version of the
kernel is in Ubuntu and whether or not it is subject to this problem.


I think 4.10 or 4.11 has the full fix.  There was still some bugs even
in 4.8 (which was the one included with Ubuntu 1604).  I only say this
is because I have a 4.8 kernel which sees the problem but a 4.11
kernel does not.  The behavior I see with a non-fixed kernel is that
the guality tests will not run at all.  With the fixed kernel, they will
run.

Thanks,
Andrew


FYI, I am on 4.11.2 now and the spurious FAILs persist.

$ uname -a
Linux 4.11.2-041102-generic #201705201036 SMP ... x86_64 GNU/Linux

$ lsb_release -dicr
Distributor ID: Ubuntu
Description:Ubuntu 16.04.2 LTS
Release:16.04
Codename:   xenial

Johann


combiner: how to compute cost for bit insertion?

2017-07-10 Thread Georg-Johann Lay

Hi, I'd need some help with the following optimization issue:

avr backend supports insns for bit insertion, and insn combiner tries to 
use them:


unsigned char bset (unsigned char a, unsigned char n)
{
  return (a & ~0x40) | (n & 0x40);
}


Trying 7 -> 14:
Successfully matched this instruction:
(set (zero_extract:QI (reg/i:QI 24 r24)
(const_int 1 [0x1])
(const_int 6 [0x6]))
(lshiftrt:QI (reg:QI 52)
(const_int 6 [0x6])))
rejecting combination of insns 7 and 14
original costs 4 + 4 = 8
replacement cost 24


Hence the existing insn is rejected because of too high costs.

The problem is that the backend only sees

avr_rtx_costs[bset:combine(266)]=true (size) total=24, outer=set:
(lshiftrt:QI (reg:QI 52)
(const_int 6 [0x6]))

Hence this looks like a QI shift as the ZERO_EXTRACT is killed, only the 
outer SET is available which is not very helpful.


A shift is actually more expensive than a bit insertion.

How can I fix that?


What I'd like to avoid is writing a hell of a lot of complicated
patterns, like for:


Trying 8, 7 -> 9:
Failed to match this instruction:
(set (reg:QI 50)
(ior:QI (and:QI (reg/v:QI 49 [ n ])
(const_int 64 [0x40]))
(and:QI (reg:QI 24 r24 [ a ])
(const_int -65 [0xffbf]

This would be a different representation of bit insertion, but it would 
also need many patterns:


* Ones for same bit number (like in the example)
* Ones where the src bit is smaller than the dest bit (needs ASHIFT).
* Ones where the src bit is greater than the dest bit (needs LSHIFTRT).
* Ones where the MSB has to be inserted (will use other canonical form)
* Ones where the LSB has to be inserted (will use other canonical form)
* ... you name it.

Any ideas for a sane approach?


Thanks,

Johann


Re: combiner: how to compute cost for bit insertion?

2017-07-11 Thread Georg-Johann Lay

On 10.07.2017 23:40, Segher Boessenkool wrote:

On Mon, Jul 10, 2017 at 05:10:03PM +0200, Georg-Johann Lay wrote:

Any ideas for a sane approach?


You could change insn_rtx_cost to actually calculate the cost of the
insn, not just set_src_cost of a single set.  This will need checking
on a great many targets, not in the least because most target's cost
functions are, uh, not so good.  Big project, but hopefully well worth
the effort.


I already thought of such an approach by means of a new target hook;
well aware that maintainers are not very fond of new hooks.

Something like so:

Add a new hook that, per default, returns a magic value which indicates
that it is not implemented or that it's too tedious to compute for a
backend.  In that case, SET_DEST is dropped and the usual rtx_cost hook
is run.

If a different value is returned, that value is used for the cost.

The changes would be mostly in rtlanal.c to use cost from that hook
or gracefully degrade to classic rtx_cost.

That way, targets could migrate to that hook if they need more exact
cost or SET_DEST or want to dig into PARALLELs. Cf. PR80929.

Just passing a pattern or SET with outer_code = INSN might raise so many
performance degradations across all targets when they would return
some "unknown cost" or make unknown stuff expensive.  Hence passing
outer_code = INSN is not a good idea, except we find someone who is
proficient in all backends' costs and willing to go through that...

After all, what's important is to get best compilation results (at least
if we don't want to hear even more of the "just switch to xxx compiler:
better code, better diagnostics, better support").

As my 3-liner for PR80929 is already pending review for > 1 month now,
the prospects are not very promising for getting anything like that
to be reviewed...



Or you change your cost function to on a QImode shift assume it is an
insert instruction.  Not correct of course (but neither is the currently
calculated cost), but perhaps it gives good results in practice?


Unlikely.  That machine has no barrel shifter, and when MUL or DIV is
expanded using shifts, exact costs for shifts are needed.

Johann



Segher



onlinedocs: How to link to external HTML documents?

2017-07-11 Thread Georg-Johann Lay

Hi, in a recent change I added a pointer to some
Binutils documentation in doc/extend.texi:

see the GNU Binutils
@w{@uref{https://sourceware.org/binutils/docs/as/AVR_002dDependent.html,AVR 
assembler manual}}.


This worked well in my local build : 
gcc/HTML/gcc-8.0.0/gcc/AVR-Function-Attributes.html reads:


see the GNU Binutils
href="https://sourceware.org/binutils/docs/as/AVR_002dDependent.html";>AVR assembler manual.


However, the now online documentation has a broken link as it is overly 
smart with the _002d:


In

https://gcc.gnu.org/onlinedocs/gcc/AVR-Function-Attributes.html

there is:

see the GNU Binutils
href="https://sourceware.org/binutils/docs/as/AVR-Dependent.html";>AVR assembler manual.


which is a broken link.

How can I fix this?

Johann


Re: onlinedocs: How to link to external HTML documents?

2017-07-11 Thread Georg-Johann Lay

Joseph Myers schrieb:

On Tue, 11 Jul 2017, Georg-Johann Lay wrote:


Hi, in a recent change I added a pointer to some
Binutils documentation in doc/extend.texi:

see the GNU Binutils
@w{@uref{https://sourceware.org/binutils/docs/as/AVR_002dDependent.html,AVR
assembler manual}}.


For other manuals in the Texinfo universe, it's best to use Texinfo @xref 
in its five-argument form.  Then, for such a link to work on gcc.gnu.org, 
you need a .htaccess entry to redirect from /onlinedocs/as to a location 
where that manual is actually available online.


Sounds complicated and requires access rights I don't have (and
shouldn't be required to have, or even want to have).


What's a bit annoying is that I can't even test this locally, because
locally all was fine and the external link worked as expected.

For now I just reverted...

http://gcc.gnu.org/ml/gcc-patches/2017-07/msg00507.html

Maybe what helps is escaping the "_" itself, like _005f002d or %5f002d. 
 At least the first form cannot be tested locally as my local texinfo, 
for whatever reason, deviates from what runs on the gcc server.  I'll 
give the 2nd version a try.


IIRC some links don't even work internally within onlinedocs, for 
example I found no way to link to

http://gcc.gnu.org/onlinedocs/gcc/AVR-Options.html#AVR-Built-in-Macros
even though there's an anchor "AVR-Built-in-Macros".  texinfo will 
always add a redundant top-level anchor like 
AVR-Options.html#AVR-Options for some obscure reason.


Johann


[patch] RFC: Hook for insn costs?

2017-07-12 Thread Georg-Johann Lay

Hi,

the current cost computations in rtlanal.c and maybe other places
suffer from the fact that they are hiding parts of the expressions
from the back-end, like SET_DESTs of single_set or the anatomy of
PARALELLs.

Would it be in order to have a hook like the one attached?

I am aware of that, in an ideal world, there wouldn't be more
than one hook to get rtx costs.  But well...

Whilst rtx_costs does the job in the majority of cases, there are cases
where hiding information leads to performance degradation,
for example when insn combine cooks up a set zero_extract.
combine.c does actually the right thing as it uses insn_rtx_costs,
but insn_rtx_cost is already a lie because it only uses SET_SRC
and digs into PARALLEL without exposing the whole story.

The patch just uses a new targetm.insn_cost hook.  If the
back-end doesn't come up with something useful, the classic
functions with rtx_costs for sub-rtxes are called.

The purpose is to allow a friendly transition and not to
raise any performance degradations, which would be very likely
if we just called rtx_costs with outer_code = INSN.

If a back-end finds it useful to implement this hook and need
the whole story, they can do so.  Otherwise, or if it is too
lazy to analyse a specific rtx, they can switch to the old
infrastructure.

Returning a magic value for "unknown" is just an implementation
detail; it could just as well be some bool* that would be set
to true or false depending on whether or not the computation
returned something useful or not.
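
At the call site, the bool* variant would look roughly like this
(sketch only; "fallback_pattern_cost" stands for whatever the current
rtx_costs-based code does and is not a real function):

  bool known;
  int cost = targetm.insn_costs (NULL, pat, speed, &known);
  if (!known)
    cost = fallback_pattern_cost (pat, speed);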

The patch only touches seq_cost and insn_rtx_cost of rtlanal.c.

Would something like this be in order, or is a new hook just
a complete no-go?

Index: coretypes.h
===
--- coretypes.h	(revision 250090)
+++ coretypes.h	(working copy)
@@ -253,6 +253,10 @@ construct that is easy to avoid even whe
   WARN_STRICT_OVERFLOW_MAGNITUDE = 5
 };
 
+/* A magic value to be returned by TARGET_INSN_COSTS to indicate that
+   the costs are not known or too complicated to work out there.  */
+#define INSN_COSTS_UNKNOWN (-1234567)
+
 /* The type of an alias set.  Code currently assumes that variables of
this type can take the values 0 (the alias set which aliases
everything) and -1 (sometimes indicating that the alias set is
Index: doc/tm.texi
===
--- doc/tm.texi	(revision 250123)
+++ doc/tm.texi	(working copy)
@@ -6564,6 +6564,39 @@ optimizers should use optab @code{rintdf
 The default hook returns true for all inputs.
 @end deftypefn
 
+@deftypefn {Target Hook} int TARGET_INSN_COSTS (const rtx_insn *@var{insn}, rtx @var{pattern}, bool @var{speed})
+This target hook describes the relative costs of an insn.
+
+@var{insn} might be NULL or an @code{INSN_P}.
+@var{pattern} is the pattern of @var{insn} or an rtx that is supposed
+to be used as the pattern of an insn for the remainder of the compilation.
+
+In implementing this hook, you can use the construct
+@code{COSTS_N_INSNS (@var{n})} to specify a cost equal to @var{n}
+instructions.
+When optimizing for execution speed, i.e.@: when @var{speed} is
+true, this target hook should be used to estimate the relative
+execution cost of the pattern, and the size cost of the pattern if
+@var{speed} is false.
+
+Use this hook if a @code{SET_DEST} is needed to calculate the correct
+costs or when you want to see the whole of a @code{PARALLEL} and not only
+parts of it. Notice that for a @code{single_set} you might see a
+@code{PARALLEL} @var{pattern} that contains a @code{SET} together with
+@code{CLOBBER}s.
+
+If the pattern is too complicated to be evaluated in this hook, return
+@code{INSN_COSTS_UNKNOWN} which directs the middle-end to use other
+approaches to get the respective costs like calling
+@code{TARGET_RTX_COSTS} for sub-rtxes of @var{pattern}.
+
+Don't implement this hook if it would always return
+@code{INSN_COSTS_UNKNOWN}.
+
+Don't use @code{INSN_COSTS_UNKNOWN} except as a return value for this hook.
+Do @emph{not use} that value as a return value for @code{TARGET_RTX_COSTS}!
+@end deftypefn
+
 @deftypefn {Target Hook} bool TARGET_RTX_COSTS (rtx @var{x}, machine_mode @var{mode}, int @var{outer_code}, int @var{opno}, int *@var{total}, bool @var{speed})
 This target hook describes the relative costs of RTL expressions.
 
Index: doc/tm.texi.in
===
--- doc/tm.texi.in	(revision 250123)
+++ doc/tm.texi.in	(working copy)
@@ -4790,6 +4790,8 @@ @samp{fold_range_test ()} is optimal.  T
 
 @hook TARGET_OPTAB_SUPPORTED_P
 
+@hook TARGET_INSN_COSTS
+
 @hook TARGET_RTX_COSTS
 
 @hook TARGET_ADDRESS_COST
Index: rtlanal.c
===
--- rtlanal.c	(revision 250090)
+++ rtlanal.c	(working copy)
@@ -5263,6 +5263,10 @@ insn_rtx_cost (rtx pat, bool speed)
   int i, cost;
   rtx set;
 
+  int icost = targetm.insn_costs (NULL, pat, speed);
+  if (icost != 

Re: [patch] RFC: Hook for insn costs (v2)?

2017-07-13 Thread Georg-Johann Lay

Hi, this is a bit different proposal.  Instead of using some
magic value to indicate that the hook returns nothing useful,
the hook has not a 4th argument of bool* to ship that information.

The goal and usage is the same as with the first proposal:

* Allow the back-end to compute correct costs and to see the
  full pattern.  Reason:  middle-end currently hides parts of
  patterns that might be relevant and implements strange logic
  in some places, e.g. in insn_rtx_costs.

* Don't introduce any performance degradation by changing
  current usage / assumptions of rtx_costs.


Would such a change be in order in principle?

Ideas to improve it?

I would then round it up and propose it as a patch.

Johann



On 12.07.2017 15:15, Georg-Johann Lay wrote:

Hi,

the current cost computations in rtlanal.c and maybe other places
suffer from the fact that they are hiding parts of the expressions
from the back-end, like SET_DESTs of single_set or the anatomy of
PARALLELs.

Would it be in order to have a hook like the one attached?

I am aware of that, in an ideal world, there wouldn't be more
than one hook to get rtx costs.  But well...

Whilst rtx_costs does the job in the majority of cases, there are cases
where hiding information leads to performance degradation,
for example when insn combine cooks up a set zero_extract.
combine.c does actually the right thing as it uses insn_rtx_costs,
but insn_rtx_cost is already a lie because it only uses SET_SRC
and digs into PARALLEL without exposing the whole story.

The patch just uses a new targetm.insn_cost hook.  If the
back-end doesn't come up with something useful, the classic
functions with rtx_costs for sub-rtxes are called.

The purpose is to allow a friendly transition and not to
raise any performance degradations, which would be very likely
if we just called rtx_costs with outer_code = INSN.

If a back-end finds it useful to implement this hook and need
the whole story, they can do so.  Otherwise, or if it is too
lazy to analyse a specific rtx, they can switch to the old
infrastructure.

Returning a magic value for "unknown" is just an implementation
detail; it could just as well be some bool* that would be set
to true or false depending on whether or not the computation
returned something useful or not.

The patch only touches seq_cost and insn_rtx_cost of rtlanal.c.

Would something like this be in order, or is a new hook just
a complete no-go?



Index: target.def
===
--- target.def	(revision 250090)
+++ target.def	(working copy)
@@ -3558,6 +3558,41 @@ DEFHOOKPOD
  appropriately.",
  unsigned int, INVALID_REGNUM)
 
+/* Compute the cost of an insn resp. something that might become an insn. */
+DEFHOOK
+(insn_costs,
+"This target hook describes the relative costs of an insn.\n\
+\n\
+@var{insn} might be NULL or an @code{INSN_P}.\n\
+@var{pattern} is the pattern of @var{insn} or an rtx that is supposed\n\
+to be used as the pattern of an insn for the remainder of the compilation.\n\
+\n\
+In implementing this hook, you can use the construct\n\
+@code{COSTS_N_INSNS (@var{n})} to specify a cost equal to @var{n}\n\
+instructions.\n\
+When optimizing for execution speed, i.e.@: when @var{speed} is\n\
+true, this target hook should be used to estimate the relative\n\
+execution cost of the pattern, and the size cost of the pattern if\n\
+@var{speed} is false.\n\
+\n\
+Use this hook if a @code{SET_DEST} is needed to calculate the correct\n\
+costs or when you want to see the whole of a @code{PARALLEL} and not only\n\
+parts of it. Notice that for a @code{single_set} you might see a\n\
+@code{PARALLEL} @var{pattern} that contains a @code{SET} together with\n\
+@code{CLOBBER}s.\n\
+\n\
+If the hook computed the cost, set @var{*cost_computed} to true.\n\
+If the hook implementation chooses not to compute the cost for some reason,\n\
+set @var{*cost_computed} to false.  This directs the middle-end to use\n\
+other approaches to get the respective costs like calling\n\
+@code{TARGET_RTX_COSTS} for sub-rtxes of @var{pattern}.\n\
+\n\
+Don't implement this hook if it would always set @var{*cost_computed} to\n\
+false.  Even if this hook handles all cases, you still need to implement\n\
+@code{TARGET_RTX_COSTS}.",
+ int, (const rtx_insn *insn, rtx pattern, bool speed, bool *cost_computed),
+ default_insn_costs)
+
 /* Compute a (partial) cost for rtx X.  Return true if the complete
cost has been computed, and false if subexpressions should be
scanned.  In either case, *TOTAL contains the cost result.  */
Index: targhooks.c
===
--- targhooks.c	(revision 250090)
+++ targhooks.c	(working copy)
@@ -2108,4 +2108,11 @@ default_excess_precision (enum excess_pr
   return FLT_EVAL_METHOD_PROMOTE_TO_FLOAT;
 }
 
+int
+default_insn_costs (const rtx_insn*, rtx, bool, bool *cost_computed)
+{
+  * cost_computed = false;
+  re

Re: Getting spurious FAILS in testsuite?

2017-07-13 Thread Georg-Johann Lay

On 12.07.2017 15:40, Bernd Edlinger wrote:

On 07/11/17 22:28, Bernd Edlinger wrote:

On 07/11/17 21:42, Andrew Pinski wrote:

On Tue, Jul 11, 2017 at 12:31 PM, Andrew Pinski 
wrote:

On Tue, Jul 11, 2017 at 12:27 PM, Andrew Pinski 
wrote:

On Tue, Jul 11, 2017 at 12:15 PM, Bernd Edlinger
 wrote:

Hi,

I see this now as well on Ubuntu 16.04, but I doubt that the Kernel is
to blame.


I don't see these failures when I use a 4.11 kernel.  Only with a
4.4 kernel.
Also the guality testsuite does not run at all with a 4.4 kernel, it
does run when using a 4.11 kernel; I suspect this is the same symptom
of the bug.



4.11 kernel results (with CentOS 7.4 glibc):
https://gcc.gnu.org/ml/gcc-testresults/2017-07/msg00998.html

4.4 kernel results (with ubuntu 1604 glibc):
https://gcc.gnu.org/ml/gcc-testresults/2017-07/msg01009.html

Note the glibc does not matter here as I get a similar testsuite
results as the CentOS 7.4 glibc using the Ubuntu 1604 glibc but with
the 4.11 kernel.

Also note I get a similar results with a plain 4.11 kernel compared to
the above kernel.



Maybe commit 6a7e6f78c235975cc14d4e141fa088afffe7062c  or
0f40fbbcc34e093255a2b2d70b6b0fb48c3f39aa to the kernel fixes both
issues.



As Georg-Johann points out here:
https://gcc.gnu.org/ml/gcc/2017-07/msg00028.html
updating the kernel alone does not fix the issue.

These patches do all circle around pty and tty devices,
but in the strace file I see a pipe(2) command
creating the fileno 10 and the sequence of events is
as follows:

pipe([7, 10]) = 0   (not in my previous e-mail)
clone() = 16141
clone() = 16142
write(4, "spawn: returns {0}\r\n", 20)  = 20
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=16141,
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=16142,
read(10,...,4096) = 4096
read(10,...,4096) = 2144
read(10, "", 4096) = 0
close(10) = 0
wait4(16141, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 16141
wait4(16142, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 16142

All data arrive at the expect process, but apparently too quickly.


Thanks
Bernd.




Oh, I think the true reason is this patch:
Author: Upstream
Description: Report and fix both by Nils Carlson.
   Replaced a cc==0 check with proper Tcl_Eof() check.
Last-Modified: Fri, 23 Oct 2015 11:09:07 +0300
Bug: http://sourceforge.net/p/expect/patches/18/
Bug-Debian: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=799301

--- a/expect.c
+++ b/expect.c
@@ -1860,19 +1860,11 @@
   if (cc == EXP_DATA_NEW) {
/* try to read it */
cc = expIRead(interp,esPtr,timeout,tcl_set_flags);
-   
-   /* the meaning of 0 from i_read means eof.  Muck with it a */
-   /* little, so that from now on it means "no new data arrived */
-   /* but it should be looked at again anyway". */
-   if (cc == 0) {
+
+   if (Tcl_Eof(esPtr->channel)) {
cc = EXP_EOF;
-   } else if (cc > 0) {
-   /* successfully read data */
-   } else {
-   /* failed to read data - some sort of error was encountered such as
-* an interrupt with that forced an error return
-*/
}
+
   } else if (cc == EXP_DATA_OLD) {
cc = 0;
   } else if (cc == EXP_RECONFIGURE) {


The correct way to fix this would be something like this:

  if (cc == 0 && Tcl_Eof(esPtr->channel)) {
  cc = EXP_EOF;
  }

What happens for me is cc > 0 AND Tcl_Eof at the same time.
That makes dejagnu ignore the last few lines, because it does
not expect EOF and data at the same time.

Apparently tcl starts to read ahead because the default match
buffer size is unreasonably small like 2000 bytes.

I can work around it by increasing the buffer size like this:

Index: gcc/testsuite/lib/gcc-dg.exp
===
--- gcc/testsuite/lib/gcc-dg.exp(revision 250150)
+++ gcc/testsuite/lib/gcc-dg.exp(working copy)
@@ -43,6 +43,9 @@
 setenv LANG C.ASCII
   }

+# Avoid sporadic data-losses with expect
+match_max -d 1
+
   # Ensure GCC_COLORS is unset, for the rare testcases that verify
   # how output is colorized.
   if [info exists ::env(GCC_COLORS) ] {


What do you think?
Can debian/ubuntu people reproduce and fix this?
Should we increase the match buffer size on the gcc side?

Thanks
Bernd.


Sigh, I am still getting these spurious fails.

$ sudo apt-get install tcl-expect
Reading package lists... Done
Building dependency tree
Reading state information... Done
tcl-expect is already the newest version (5.45-7).
...

So I have "5.45-7"

$ apt-get changelog tcl-expect
expect (5.45-7) unstable; urgency=medium

  * Applied a few fixes by upstream which were included in Expect 5.45.3
never released as a tarball (closes: #799301).
  * Bumped standards version to 3.9.6.

 -- Sergei Golovan   Fri, 23 Oct 2015 11:27:59 +0300
...

$ expect -v
expect version 5.45

Not showing the bugfix level as of configure:PACKAGE_VERSION='5.45'.

I also tried

Re: Getting spurious FAILS in testsuite?

2017-07-14 Thread Georg-Johann Lay

On 13.07.2017 18:47, Bernd Edlinger wrote:

On 07/13/17 16:31, Georg-Johann Lay wrote:

On 12.07.2017 15:40, Bernd Edlinger wrote:

On 07/11/17 22:28, Bernd Edlinger wrote:

On 07/11/17 21:42, Andrew Pinski wrote:

On Tue, Jul 11, 2017 at 12:31 PM, Andrew Pinski 
wrote:

On Tue, Jul 11, 2017 at 12:27 PM, Andrew Pinski 
wrote:

On Tue, Jul 11, 2017 at 12:15 PM, Bernd Edlinger
 wrote:

Hi,

I see this now as well on Ubuntu 16.04, but I doubt that the
Kernel is
to blame.


I don't see these failures when I use a 4.11 kernel.  Only with a
4.4 kernel.
Also the guality testsuite does not run at all with a 4.4 kernel, it
does run when using a 4.11 kernel; I suspect this is the same symptom
of the bug.



4.11 kernel results (with CentOS 7.4 glibc):
https://gcc.gnu.org/ml/gcc-testresults/2017-07/msg00998.html

4.4 kernel results (with ubuntu 1604 glibc):
https://gcc.gnu.org/ml/gcc-testresults/2017-07/msg01009.html

Note the glibc does not matter here as I get a similar testsuite
results as the CentOS 7.4 glibc using the Ubuntu 1604 glibc but with
the 4.11 kernel.

Also note I get a similar results with a plain 4.11 kernel compared to
the above kernel.



Maybe commit 6a7e6f78c235975cc14d4e141fa088afffe7062c  or
0f40fbbcc34e093255a2b2d70b6b0fb48c3f39aa to the kernel fixes both
issues.



As Georg-Johann points out here:
https://gcc.gnu.org/ml/gcc/2017-07/msg00028.html
updating the kernel alone does not fix the issue.

These patches do all circle around pty and tty devices,
but in the strace file I see a pipe(2) command
creating the fileno 10 and the sequence of events is
as follows:

pipe([7, 10]) = 0   (not in my previous e-mail)
clone() = 16141
clone() = 16142
write(4, "spawn: returns {0}\r\n", 20)  = 20
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=16141,
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=16142,
read(10,...,4096) = 4096
read(10,...,4096) = 2144
read(10, "", 4096) = 0
close(10) = 0
wait4(16141, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 16141
wait4(16142, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 16142

All data arrive at the expect process, but apparently too quickly.


Thanks
Bernd.




Oh, I think the true reason is this patch:
Author: Upstream
Description: Report and fix both by Nils Carlson.
Replaced a cc==0 check with proper Tcl_Eof() check.
Last-Modified: Fri, 23 Oct 2015 11:09:07 +0300
Bug: http://sourceforge.net/p/expect/patches/18/
Bug-Debian: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=799301

--- a/expect.c
+++ b/expect.c
@@ -1860,19 +1860,11 @@
if (cc == EXP_DATA_NEW) {
/* try to read it */
cc = expIRead(interp,esPtr,timeout,tcl_set_flags);
-
-/* the meaning of 0 from i_read means eof.  Muck with it a */
-/* little, so that from now on it means "no new data arrived */
-/* but it should be looked at again anyway". */
-if (cc == 0) {
+
+if (Tcl_Eof(esPtr->channel)) {
cc = EXP_EOF;
-} else if (cc > 0) {
-/* successfully read data */
-} else {
-/* failed to read data - some sort of error was encountered
such as
- * an interrupt with that forced an error return
- */
}
+
} else if (cc == EXP_DATA_OLD) {
cc = 0;
} else if (cc == EXP_RECONFIGURE) {


The correct way to fix this would be something like this:

   if (cc == 0 && Tcl_Eof(esPtr->channel)) {
   cc = EXP_EOF;
   }

What happens for me is cc > 0 AND Tcl_Eof at the same time.
That makes dejagnu ignore the last few lines, because it does
not expect EOF and data at the same time.

Apparently tcl starts to read ahead because the default match
buffer size is unreasonably small like 2000 bytes.

I can work around it by increasing the buffer size like this:

Index: gcc/testsuite/lib/gcc-dg.exp
===
--- gcc/testsuite/lib/gcc-dg.exp(revision 250150)
+++ gcc/testsuite/lib/gcc-dg.exp(working copy)
@@ -43,6 +43,9 @@
  setenv LANG C.ASCII
}

+# Avoid sporadic data-losses with expect
+match_max -d 1
+
# Ensure GCC_COLORS is unset, for the rare testcases that verify
# how output is colorized.
if [info exists ::env(GCC_COLORS) ] {


What do you think?
Can debian/ubuntu people reproduce and fix this?
Should we increase the match buffer size on the gcc side?

Thanks
Bernd.


Sigh, I am still getting these spurious fails.

$ sudo apt-get install tcl-expect
Reading package lists... Done
Building dependency tree
Reading state information... Done
tcl-expect is already the newest version (5.45-7).
...

So I have "5.45-7"

$ apt-get changelog tcl-expect
expect (5.45-7) unstable; urgency=medium

* Applied a few fixes by upstream which were included in Expect 5.45.3
  never released as a tarball (closes: #799301).
* Bumped standards version to 3.9.6.

   -- Sergei Golovan   Fri,

expmed costs and i386.c cost for widening mul (PR81444)

2017-07-17 Thread Georg-Johann Lay

Hi, while testing a patch to fix PR81444, I came across a new FAIL due
to the patch in i386.c/pr71321.c

PR81444 is about wrong modes used by expmed.c as it computes costs for
widening operations like widening mul.  It uses  GET_MODE_WIDER_MODE
for the wider mode where is should use GET_MODE_2XWIDER_MODE (the
internals specify a mode 2 times as wide for the wider mode).

Applying the patch from below then leads to the following change in
code generated for PR71321.c:

Without patch:



cvt_to_2digit_ascii:
        movzbl  %dil, %eax              # 7  *zero_extendqihi2/1    [length = 4]
        leal    (%rax,%rax,4), %edx     # 46 *leasi                 [length = 3]
        leal    (%rax,%rdx,8), %edx     # 47 *leasi                 [length = 3]
        leal    (%rdx,%rdx,4), %edx     # 48 *leasi                 [length = 3]
        shrw    $11, %dx                # 16 *lshrhi3_1             [length = 4]
        leal    (%rdx,%rdx,4), %eax     # 49 *leasi                 [length = 3]
        movzbl  %dl, %edx               # 35 *zero_extendqisi2/1    [length = 3]
        addl    %eax, %eax              # 50 *ashlsi3_1/1           [length = 2]
        subl    %eax, %edi              # 51 *subsi_1/1             [length = 2]
        movzbl  %dil, %eax              # 23 *zero_extendqisi2/1    [length = 4]
        sall    $8, %eax                # 24 *ashlsi3_1/1           [length = 3]
        orl     %edx, %eax              # 36 *iorsi_1/1             [length = 2]
        addl    $667696, %eax           # 37 *addsi_1/1             [length = 5]
        ret                             # 55 simple_return_internal [length = 1]


With patch applied:

cvt_to_2digit_ascii:
        movl    $-51, %edx              # 7  *movqi_internal/2      [length = 5]
        movl    %edx, %eax              # 35 *movqi_internal/1      [length = 2]
        mulb    %dil                    # 8  *umulqihi3_1           [length = 3]
        movl    %eax, %edx              # 36 *movhi_internal/1      [length = 2]

(Following code is the same)

        shrw    $11, %dx                # 10 *lshrhi3_1             [length = 4]
        leal    (%rdx,%rdx,4), %eax     # 37 *leasi                 [length = 3]
        movzbl  %dl, %edx               # 23 *zero_extendqisi2/1    [length = 3]
        addl    %eax, %eax              # 38 *ashlsi3_1/1           [length = 2]
        subl    %eax, %edi              # 39 *subsi_1/1             [length = 2]
        movzbl  %dil, %eax              # 17 *zero_extendqisi2/1    [length = 4]
        sall    $8, %eax                # 18 *ashlsi3_1/1           [length = 3]
        orl     %edx, %eax              # 24 *iorsi_1/1             [length = 2]
        addl    $667696, %eax           # 25 *addsi_1/1             [length = 5]
        ret                             # 43 simple_return_internal [length = 1]

Is this actually a performance regression, or did the code actually
improve? (compiled with -O2 on x86_64).


What would be the proposed fix:  Adjust the test case or is this some
problem in the i386.c cost computation?


Johann



Index: expmed.c
===
--- expmed.c(revision 250090)
+++ expmed.c(working copy)
@@ -119,6 +119,7 @@ init_expmed_one_conv (struct init_expmed
 {
   int to_size, from_size;
   rtx which;
+  machine_mode orig_mode = GET_MODE (all->reg);

   to_size = GET_MODE_PRECISION (to_mode);
   from_size = GET_MODE_PRECISION (from_mode);
@@ -140,6 +141,8 @@ init_expmed_one_conv (struct init_expmed
   PUT_MODE (all->reg, from_mode);
   set_convert_cost (to_mode, from_mode, speed,
set_src_cost (which, to_mode, speed));
+  // Unclobber the mode, callers still use it.
+  PUT_MODE (all->reg, orig_mode);
 }

 static void
@@ -210,7 +213,7 @@ init_expmed_one_mode (struct init_expmed
 }
   if (GET_MODE_CLASS (mode) == MODE_INT)
 {
-  machine_mode  wider_mode = GET_MODE_WIDER_MODE (mode);
+  machine_mode wider_mode = GET_MODE_2XWIDER_MODE (mode);
   if (wider_mode != VOIDmode)
{
  PUT_MODE (all->zext, wider_mode);


Re: expmed costs and i386.c cost for widening mul (PR81444)

2017-07-17 Thread Georg-Johann Lay

On 17.07.2017 10:53, Georg-Johann Lay wrote:

Hi, while testing a patch to fix PR81444, I came across a new FAIL due
to the patch in i386.c/pr71321.c

PR81444 is about wrong modes used by expmed.c as it computes costs for
widening operations like widening mul.  It uses  GET_MODE_WIDER_MODE
for the wider mode where it should use GET_MODE_2XWIDER_MODE (the
internals specify a mode 2 times as wide for the wider mode).

Applying the patch from below then leads to the following change in
code generated for PR71321.c:

Without patch:



cvt_to_2digit_ascii:
        movzbl  %dil, %eax              # 7  *zero_extendqihi2/1    [length = 4]
        leal    (%rax,%rax,4), %edx     # 46 *leasi                 [length = 3]
        leal    (%rax,%rdx,8), %edx     # 47 *leasi                 [length = 3]
        leal    (%rdx,%rdx,4), %edx     # 48 *leasi                 [length = 3]
        shrw    $11, %dx                # 16 *lshrhi3_1             [length = 4]
        leal    (%rdx,%rdx,4), %eax     # 49 *leasi                 [length = 3]
        movzbl  %dl, %edx               # 35 *zero_extendqisi2/1    [length = 3]
        addl    %eax, %eax              # 50 *ashlsi3_1/1           [length = 2]
        subl    %eax, %edi              # 51 *subsi_1/1             [length = 2]
        movzbl  %dil, %eax              # 23 *zero_extendqisi2/1    [length = 4]
        sall    $8, %eax                # 24 *ashlsi3_1/1           [length = 3]
        orl     %edx, %eax              # 36 *iorsi_1/1             [length = 2]
        addl    $667696, %eax           # 37 *addsi_1/1             [length = 5]
        ret                             # 55 simple_return_internal [length = 1]


With patch applied:

cvt_to_2digit_ascii:
        movl    $-51, %edx              # 7  *movqi_internal/2      [length = 5]
        movl    %edx, %eax              # 35 *movqi_internal/1      [length = 2]
        mulb    %dil                    # 8  *umulqihi3_1           [length = 3]
        movl    %eax, %edx              # 36 *movhi_internal/1      [length = 2]

(Following code is the same)

        shrw    $11, %dx                # 10 *lshrhi3_1             [length = 4]
        leal    (%rdx,%rdx,4), %eax     # 37 *leasi                 [length = 3]
        movzbl  %dl, %edx               # 23 *zero_extendqisi2/1    [length = 3]
        addl    %eax, %eax              # 38 *ashlsi3_1/1           [length = 2]
        subl    %eax, %edi              # 39 *subsi_1/1             [length = 2]
        movzbl  %dil, %eax              # 17 *zero_extendqisi2/1    [length = 4]
        sall    $8, %eax                # 18 *ashlsi3_1/1           [length = 3]
        orl     %edx, %eax              # 24 *iorsi_1/1             [length = 2]
        addl    $667696, %eax           # 25 *addsi_1/1             [length = 5]
        ret                             # 43 simple_return_internal [length = 1]

Is this actually a performance regression, or did the code actually
improve? (compiled with -O2 on x86_64).


What would be the proposed fix:  Adjust the test case or is this some
problem in the i386.c cost computation?



...some more information. Looking into what ix86_rtx_costs is coming
up with. The unchanged version sees the strange modes and computes,
for example:

cost 4 (speed=1, outer=set) =
(zero_extend:HI (reg:XI 87))

cost 4 (speed=1, outer=set) =
(zero_extend:HI (reg:XI 87))

cost 24 (speed=1, outer=set) =
(mult:HI (zero_extend:HI (reg:XI 87))
(zero_extend:HI (reg:XI 87)))

cost 4 (speed=1, outer=truncate) =
(lshiftrt:HI (mult:HI (zero_extend:HI (reg:XI 87))
(zero_extend:HI (reg:XI 87)))
(const_int 8 [0x8]))

cost 0 (speed=1, outer=lshiftrt) =
(const_int 8 [0x8])

cost 4 (speed=1, outer=lshiftrt) =
(zero_extend:HI (reg:XI 87))

cost 4 (speed=1, outer=lshiftrt) =
(zero_extend:HI (reg:XI 87))

cost 24 (speed=1, outer=lshiftrt) =
(mult:HI (zero_extend:HI (reg:XI 87))
(zero_extend:HI (reg:XI 87)))


The patched version doesn't go into the sub-expressions and comes up
with smaller costs of 12 / 12 instead of the 24 / 24 from above:

cost 12 (speed=1, outer=set) =
(mult:HI (zero_extend:HI (reg:QI 87))
(zero_extend:HI (reg:QI 87)))

cost 4 (speed=1, outer=truncate) =
(lshiftrt:HI (mult:HI (zero_extend:HI (reg:QI 87))
(zero_extend:HI (reg:QI 87)))
(const_int 8 [0x8]))

cost 0 (speed=1, outer=lshiftrt) =
(const_int 8 [0x8])

cost 12 (speed=1, outer=lshiftrt) =
(mult:HI (zero_extend:HI (reg:QI 87))
(zero_extend:HI (reg:QI 87)))

Hence, with the proper expressions, costs for widening mul are cheaper
than with the XImode.  XImode is an artefact of clobbering the mode of
all->reg in expmed.c::init_expmed_one_conv().



Johann



Index: expmed.c
===
--- expmed.c(revision 250090)
+++ expmed.c(working copy)
@@ -119,6 +119,7 @@ init_expmed_one_conv (struct init_expmed
  {
int to_size, from_size;
rtx which;
+  machine_mode orig_mode = GET_MODE (all->reg);

to_size = GET_MODE_PRECISION (to_mode);
from_size = GET_MODE_PRECISION (from_mode);
@@ -140,6 +141,8 @@ init_expmed_one_conv (struct init_expmed
PUT_MODE (all->reg, from_mode);
set_convert_cost (to_mode, from_mode, speed,
  set_src_cost (which, to_mode, speed));
+  // Unclobber the mode, callers still use it.
+  PUT_MODE (all->reg, orig_mode);
  }

  static void
@@ -210,7 +213,7 @@ ini

Re: [patch] RFC: Hook for insn costs?

2017-07-17 Thread Georg-Johann Lay

On 16.07.2017 00:51, Segher Boessenkool wrote:

Hi!

On Wed, Jul 12, 2017 at 03:15:09PM +0200, Georg-Johann Lay wrote:

the current cost computations in rtlanal.c and maybe other places
suffer from the fact that they are hiding parts of the expressions
from the back-end, like SET_DESTs of single_set or the anatomy of
PARALLELs.

Would it be in order to have a hook like the one attached?

I am aware of that, in an ideal world, there wouldn't be more
than one hook to get rtx costs.  But well...


The number of hooks is not a problem.  The overall complexity of this
interface (between the backends and optimisers; a group of related
hooks) _is_ a problem.  Also, the interface should be good for all
targets, easy to use, do one thing and do it well.


The interface is easy and straight forward, but the new hook might not
be as convenient as rtx_costs.  For example, testing for single_set is
a bit tedious when you don't have an insn.

There are situations where rtx_costs is called with outer_code = INSN;
cf. set_rtx_cost.

Using set_src_cost instead of a new hook that gracefully degrades
might make all hell break loose when rtx_costs just returns some penalty
costs for unknown stuff instead of "0".

Therefore, a new hook is easier to use and understand, but it might
be confusing in what situations rtx_costs with outer_code = INSN is
used and in what situations pattern_costs would be used.

Also note that set_src_cost is used incorrectly for set(zero_extract):
rtl.h states that it does "Return the cost of moving X into a register".
In rtlanal::insn_rtx_cost() however, this function is called for
*any* SETs on the SET_SRC, even if SET_DEST is not a register.

One might fix this by filtering out destinations that are zero_extract.
Subregs might be mem or regs, and the current implementation looks
good.

Moreover, insn_rtx_costs could do something like

rtx_cost (pat, VOIDmode, INSN, 4, speed_p)

when it sees a PARALLEL with more than 1 SET.
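
Roughly along these lines inside insn_rtx_cost (untested sketch):

  if (GET_CODE (pat) == PARALLEL)
    {
      int nsets = 0;
      for (int i = 0; i < XVECLEN (pat, 0); i++)
        if (GET_CODE (XVECEXP (pat, 0, i)) == SET)
          nsets++;
      /* More than one SET: let the backend cost the whole pattern.  */
      if (nsets > 1)
        return rtx_cost (pat, VOIDmode, INSN, 4, speed);
    }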


Currently we have rtx_costs, which computes an estimated cost for any
expression.  In most cases what we want to compute is the cost of an
instruction though!  It can be a single set (in which case the
expression cost is a reasonable approximation), but it also can be
something else (like, a parallel).  Also, on many targets at least,
it is *much* easier to compute the cost knowing that this is a valid
instruction.

So I argue that we want to have a hook for insn cost.

Now what should it take as input?  An rtx_insn, or just the pattern
(as insn_rtx_cost does)?  And if an rtx_insn, should that than be an
rtx_insn that is already linked into the insn chain (so we can see what
other insns generate its inputs, and which of its outputs are unused,
etc.)?


IMO, if we have an insn then we could also pass it, it doesn't cost
anything.  However I don't know if the insn might have some strange
properties like an INSN_CODE that doesn't reflect the actual pattern.
Ans if reg_notes might be helpful and correct resp. up to date.

It should be in order to run the hook from within a sequence, hence
"insn chain" might not be what the user expects, i.e. prev_insn might
be NULL even though the current function already has some insns.



One comment about your patch:


+/* A magic value to be returned by TARGET_INSN_COSTS to indicate that
+   the costs are not known or too complicated to work out there.  */
+#define INSN_COSTS_UNKNOWN (-1234567)


Why not just -1?  And is 0 really so terrible; in the extremely rare
case we really want cost 0, it won't hurt much saying "unknown", as we
do currently.  For "hook unimplemented", just set the hook to NULL.

Segher


Using "0" is really bad design.  It's not clear when you are reading the
sources that it is some magic value.  Some identifier to refer to such
a value is much better.  And 0 is also bad because there might be
situations where 0 is a legal cost.  There might even be situations
where you want negative cost in order to prefer a specific pattern
re. some alternative (cost functions are not always 100% correct and
are expected costs as they run before reg alloc etc.)

My second proposal avoided magic value altogether and uses a bool*
instead.  My proposal below is similar, but is passes the cost
in a int*.  As the call site will have to check whether the hook
computed the costs, it might be more convenient to return whether
or not the hook came up with the costs or not.

Johann

Index: target.def
===
--- target.def	(revision 250090)
+++ target.def	(working copy)
@@ -3558,6 +3558,39 @@ DEFHOOKPOD
  appropriately.",
  unsigned int, INVALID_REGNUM)
 
+/* Compute the costs of an insn pattern. */
+DEFHOOK
+(pattern_costs,
+"This target hook describes the relative costs of an insn pattern.\n\
+\n\
+@var{pattern} is the pattern of an insn or an rtx that is supposed\n\
+to be used as 

Bug in lto-plugin.c ?

2017-07-18 Thread Georg-Johann Lay

Hi, I tried to build a canadian cross configured with
--build=x86_64-linux-gnu
--host=i686-w64-mingw32
--target=avr

While the result appears to work under wine, I am getting the
following error from ld in a non-LTO compile + link:

e:/winavr/8.0_2017-07-18/bin/../lib/gcc/avr/8.0.0/../../../../avr/bin/ld.exe: 
error: asprintf failed


After playing around I found that -fno-use-linker-plugin avoids that
message, and I speculated that the error is emitted by lto-plugin.c

In claim_file_handler() we have:


  /* We pass the offset of the actual file, not the archive header.
 Can't use PRIx64, because that's C99, so we have to print the
 64-bit hex int as two 32-bit ones. */
  int lo, hi, t;
  lo = file->offset & 0xffffffff;
  hi = ((int64_t)file->offset >> 32) & 0xffffffff;
  t = hi ? asprintf (&objname, "%s@0x%x%08x", file->name, lo, hi)
 : asprintf (&objname, "%s@0x%x", file->name, lo);
  check (t >= 0, LDPL_FATAL, "asprintf failed");


If hi != 0, then why is hi printed at the low end? Shouldn't hi and lo
be swapped like so

  t = hi ? asprintf (&objname, "%s@0x%x%08x", file->name, hi, lo)

if this is supposed to yield a 64-bit print?
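
A worked example, assuming file->offset = 0x100000010, i.e. lo = 0x10
and hi = 0x1:

  /* current order (lo, hi):  "%s@0x%x%08x"  ->  "name@0x1000000001"  wrong
     swapped order (hi, lo):  "%s@0x%x%08x"  ->  "name@0x100000010"   right */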

What else could lead to an "asprintf failed" ?  Unfortunately I have
no idea how to debug that on the host...

Johann



Re: Bug in lto-plugin.c ?

2017-07-19 Thread Georg-Johann Lay

On 19.07.2017 12:46, Richard Biener wrote:

On Tue, Jul 18, 2017 at 6:36 PM, Georg-Johann Lay  wrote:

Hi, I tried to build a canadian cross configured with
--build=x86_64-linux-gnu
--host=i686-w64-mingw32
--target=avr

While the result appears to work under wine, I am getting the
following error from ld in a non-LTO compile + link:

e:/winavr/8.0_2017-07-18/bin/../lib/gcc/avr/8.0.0/../../../../avr/bin/ld.exe:
error: asprintf failed

After playing around I found that -fno-use-linker-plugin avoids that
message, and I speculated that the error is emitted by lto-plugin.c

In claim_file_handler() we have:


   /* We pass the offset of the actual file, not the archive header.
  Can't use PRIx64, because that's C99, so we have to print the
  64-bit hex int as two 32-bit ones. */
   int lo, hi, t;
   lo = file->offset & 0xffffffff;
   hi = ((int64_t)file->offset >> 32) & 0xffffffff;
   t = hi ? asprintf (&objname, "%s@0x%x%08x", file->name, lo, hi)
  : asprintf (&objname, "%s@0x%x", file->name, lo);
   check (t >= 0, LDPL_FATAL, "asprintf failed");


If hi != 0, then why is hi printed at the low end? Shouldn't hi and lo
be swapped like so

   t = hi ? asprintf (&objname, "%s@0x%x%08x", file->name, hi, lo)

if this is supposed to yield a 64-bit print?


Looks odd indeed...  I suppose such large archives are the exception ;)



What else could lead to an "asprintf failed" ?  Unfortunately I have
no idea how to debug that on the host...


memory allocation failure I guess.

Richard.


Hm, as I still have no idea how to debug this, I played around by changing
lto-plugin.c.  I managed to get the host onto VirtualBox so that it's
less of a pain to try things out...

My guess is that asprintf on that host is broken.  config.log says
that asprintf prototypes are available and host stdio.h shows them.

When I am replacing asprintf with allocation by hand like so:

#if 0
  t = hi ? asprintf (&objname, "%s@0x%x%08x", file->name, lo, hi)
 : asprintf (&objname, "%s@0x%x", file->name, lo);
#else
  objname = xmalloc (30 + strlen (file->name));
  if (objname)
{
  if (hi)
sprintf (objname, "%s@0x%x%08x", file->name, hi, lo);
  else
sprintf (objname, "%s@0x%x", file->name, lo);
  t = 1;
}
  else
t = -2;
#endif

then it works fine.  I also hooked into xmalloc to find other
questionable allocations, but all calls of xmalloc look sane.

For the case in question, 84 bytes are requested and with
xmalloc + sprintf it works fine, so the host-asprintf is broken
somehow.

I also tried to remove the asprintf from the host's stdio.h and hoped that
after re-configuring, libiberty would jump in... but no.  Also, I'd have
expected that auto-host.h should be included before libiberty.h?

Would a change like the above be in order? (without the #if 0, of course)
It's sad that a host function is broken, but in this case the length
is easy to compute.

And why is all this stuff executed anyway? It's a non-LTO compilation
with -O0, hence it's a bit surprising that ld needs to look into
libgcc.a.

Johann


Re: Register allocation trouble

2017-07-21 Thread Georg-Johann Lay

Andrew Stubbs schrieb:

Hi all,

I have an architecture that has two register files. Let's call them 
class A and class B. There are some differences between their 
capabilities, but for the purposes of this problem, they can be 
considered to be identical, both holding SImode values, and both able to 
receive values without a secondary reload.


In order to load data into A, the address register must also be in A. 
Similarly, in order to load data into B, the address register must also 
be in B.


So I set up my (simplified) pattern:

(set (match_operand:SI "register_operand" "=a,b")
 (match_operand:SI "memory_operand"   "Ra,Rb"))

where
  "a" is a register in A
  "b" is a register in B
  "Ra" is a mem with base address in A
  "Rb" is a mem with base address in B

(Obviously, there are stores and moves and whatnot too, but you get the 
idea.)


1) You must not have more than 1 mov insn per mode.

2) You must handle any (reg, mem) x (reg, mem, constant) operand
   combination (constants in pools are handled separately).

The problem is that the register allocator cannot see inside Ra and Rb 
to know that it must allocate the base register correctly. It only knows 
that, some of the time, the instruction "does not satisfy its constraints".


Did you try secondary reload?

I believe the register allocator relies on base_reg_class to choose a 
class for the base register, but that just says "AB" (the union of class 
A and class B), because it has no context to say otherwise. Only the 
constraints define what hard registers are acceptable, and all they see 
is pseudoregs during reload.


Similarly for the other hooks I've investigated: they don't have enough 
context to know what's going on, and the rtx given always has pseudo 
registers anyway.


This might even be the case with secondary reloads. Sometimes you
just get "regclass is GENERAL_REGS" ... great.

The only solutions I can think of are to either preallocate the loads to 
register classes via some means MODE_CODE_BASE_REG_CLASS can see 
(address spaces, perhaps), or to have an alternative that catches the 
mismatching cases and splits to a load and move, or set the base class 
to A and always load via secondary reload for B.


I don't see how address spaces could work.  Who would set the address
spaces of all types and decls? You don't even have enough information
to set the address space (no reg classes, just types and trees).

A dedicated machine mode is also no solution.  You'd just waste time
duplicating all SI patterns to, say, PSI of the same precision,
but you still have to handle the complete cross product in the mov
insns.  Maybe one could deny specific modes to go into specific
reg classes, but even if that works the resulting code will be
suboptimal and you then need secondary reloads for reg-reg moves.
And the problem of who shall decide which mode to use is still there.

Johann


Any suggestions would be greatly appreciated.

Andrew





Re: GCC Runtime Library Exception in gcc/config/* files?

2017-07-21 Thread Georg-Johann Lay

Sebastian Huber schrieb:

Hello,

there are some files in gcc/config/* that contain the GCC Runtime 
Library Exception


grep -r --include='*.[ch]' 'GCC Runtime Library Exception' -l gcc/config 
| wc

186 1865361

and some files that don't include it

grep -r --include='*.[ch]' 'GCC Runtime Library Exception' -l gcc/config 
-v | wc

753 753   20927

Does it matter? What should be used for new files?


Some machine headers are needed by libgcc.  Not all information is
available by means of built-in macros, so that compile-time decisions
in libgcc might need the target headers.

Likely, this dates back to the time when machine-specific libgcc bits
were in gcc/config/$target.

Some users of (lib)gcc which compile their (proprietary) software
with gcc are paranoid about being infected by GPL due to using
libgcc which uses headers without runtime library exception.

https://gcc.gnu.org/ml/gcc-help/2012-08/msg00235.html

See also

https://gcc.gnu.org/PR61152


libgcc sources are not yet separated completely from the compiler, i.e.
they include tm.h which in turn will include files from the compiler
backend like arm.h.

As such files go into libgcc, they should contain the GCC Runtime
Library Exception (RLE) in their license headers.

I know that these files actually don't add executable code to libgcc and
that the FSF is fine without RLE in these files, yet there are potential
users of GCC that are very concerned about linking their code against
libgcc and about the non-RLE files that are part of the libgcc sources:
as soon as a single file that goes into libgcc (e.g. by include) is GPL
but does not contain the RLE, they won't use GCC as a whole.

Thus, adding RLE to these files can greatly increase acceptance of
GCC/libgcc.



Johann



Re: Register allocation trouble

2017-07-24 Thread Georg-Johann Lay

On 24.07.2017 13:38, Andrew Stubbs wrote:

Thanks to all those who replied. :-)

Here's what I've done to fix the problem:

1. Set the base rclass to A only.

2. Configured secondary reloads to B via A.

3. Disabled the Rb constraint. [*]

That's enough to create correct code, but it's pretty horrible, so I 
also added new patterns of the form Nathan suggested so that the base 
register can be allocated directly, as an optimization. These occur 
before the main mov insn in the search order, and catch only valid MEMs 
that won't get meddled with, so I believe that the "one mov per mode" 
rule can be safely ignored. The main mov pattern can still do the 
loads/stores, via the secondary reloads, so I believe that to be safe too.


Dunno if that works in all situations.  For example, when the register
allocator is facing high register pressure and decides to spill the
target register, it uses the constraints of the matched insn.

Johann


Thanks again

Andrew

[*] I've not removed it because actually it's still active for some 
address spaces, but that's a detail I glossed over previously.






Re: Register allocation trouble

2017-07-24 Thread Georg-Johann Lay

Andrew Stubbs schrieb:

On 24/07/17 14:58, Georg-Johann Lay wrote:

Dunno if that works in all situation.  For example, when the register
allocator is facing high register pressure and decides to spill the
target register, it uses the constraints of the matched insn.


That would be a memory to memory move, and therefore not valid in any 
mov insn on many architectures. Are you sure that's a real thing?


Hard to tell, it depends on many gory details...

If the additional insn is a mov insn that only catches specific cases
by its predicates, then you definitely have a problem (more than 1 mov
insn per mode).

If it has an explicit mem: then the problems are completely different.
For example, it will also match volatile MEMs even for cases when
volatile_ok is false.

In general, it's hard to find test cases which show that something
breaks.

Who generates the additional insn?

Johann


Confused now.

Andrew





Overwhelmed by GCC frustration

2017-07-31 Thread Georg-Johann Lay

Around 2010, someone who used a code snippet that I published in
a wiki reported that the code didn't work and hung in an
endless loop.  Soon I found out that it was due to some GCC
problem, and I got interested in fixing the compiler so that
it worked with my code.

1 1/2 years later, in 2011, I could provide a patch to fix the
problem, and it was committed as one of my first additions to GCC.
One and a half years -- that was the time to get a Copyright
Assignment from the FSF, the time to get the paperwork done, just
privately for me, no company's legal issues hampering the process.

Attracted by GCC's Free Software philosophy, and also as someone
who used the compiler for small µ-Controller hobby projects, I had
the crazy idea of improving the compiler, fixing bugs, teaching
it better code generation like "hey, that's one instruction too
many", or "this can be done using fewer registers", turning it into
a better piece of software.  Improving it, until the list of open
problems for the avr target would fit onto one screen -- and the
screen I used was not a big one.

From time to time I am providing snapshot builds for Windows,
because most users use that OS (or because it is way harder to
build GCC for that host), and to get feedback about new additions.
One such feedback, from just 2 weeks ago, was "I would really
prefer some 4.x version that just fixes bugs, because each new
version increases code size by about 2%".

This weekend I un-mothballed an old project, just an innocent game on a
cathode-ray-tube, driven by some AVR µC.  After preparing the software
so that it compiled with good old v3.4, the results overwhelmed me
with complete frustration:  Any version from v4.7 up to v8 that I
fed the code into (six major versions, to be precise) produced code
> 25% larger than when compiled with v3.4.6, and the code is not only
bigger, it's all needless bloat that will also slow down the result;
optimizing for speed might bloat and slow it even more.

All the -fno-tree-loop-optimize (because -ftree-loop-optimize
decreases performance), the -fno-reorder-blocks (because BB
reordering decreases performance), the -fno-move-loop-invariants
(you guess why), all those asm("":"+r"(var)) hacks to push modern
GCC in the right direction -- it's light years away from the
compiler GCC used to be.

When reading the code generated by v3.4.6, it takes some time until
you can be sure that it's generated by a compiler and not by a human
assembler programmer.  Compared to that, the results from a
"modern" GCC gives the impression of an old, toothless tiger that
chews again and again on the same code, chewing more than 300 times,
trying to crunch a simple, mostly linear code, fruitless and futile
attempts to come up with something smart, transforming and analysing
the stuff again and again, one pass not knowing what the other passes
will do, trampling their results, duplicating the code, using broken
cost models, using implicit handling that may help code generation for
bolides, deeply interwoven into the compiler's algorithms and working,
until that old, poor tiger spits it out again, as half-stomached pulp.

It just left me stunned, overwhelmed with frustration, feeling
ashamed of all the effort and time and dedication that I put into
that sink.  All the features that GCC has learned since then, new combiner
patterns for smarter insns, peepholes for loops, better costs, working
on libgcc asm implementations to squeeze out the last tick, all these
SSA passes, all that LTO magic, all the C++ transition -- all this
produces code like that of a drunken, typing monkey.

What an antique compiler accomplished in the blink of an eye
now takes 10 times or more of the host resources, just to come up with
bloat in almost any place.

I didn't even try to find out what is broken. For now it's more than I
can bear. I am just human, so I left it for a wizard as PR81625, for
someone that can repair something that's beyond repair -- beyond repair
not because it cannot be repaired in principle, not because it's
prohibited by some laws of nature, but as the natural consequence
of a multi-hundred pass, re-targetable compiler design.

Already daring to add a simple hook that won't hurt any other target,
that can easily be explained to someone who's not into gcc internals,
will be rejected as too specific, something that doesn't add a single
instruction to any other target, something that doesn't take longer to
execute on the host than a beam of light would need to pass alongside
my body from head to toe.

A bag of more than 20 backend fleas, impossible to contain.
Maybe 4 or 5 can be managed at the same time, being trained to well
behaviour because they are similar enough to fit the same model,
but you'll never get hold of all of them fleas, some targets will
just produce garbage, be happy that they do anything at all and
don't ICE or shred your cathode ray tube - electrons won't wait.

Let me thank you for the insights into this piece of real-world software,
f

Re: Overwhelmed by GCC frustration

2017-08-16 Thread Georg-Johann Lay

On 31.07.2017 19:54, Jeff Law wrote:

On 07/31/2017 11:23 AM, Segher Boessenkool wrote:

On Tue, Aug 01, 2017 at 01:12:41AM +0900, Oleg Endo wrote:

I could probably write a similar rant.  This is the life of a "minority
target programmer".  Most development efforts are being done with
primary targets in mind.  And as a result, most changes are being
tested only on such targets.


Also, many changes require retuning of all target backends.  This never


Got the message.

This means it's actually a waste of time to work on these backends.  The
code will finally end up in the dustbin, as cc0 backends are considered
undesired ballast that has to be "jettisoned".

"Deprecate all cc0" is just a nice formulation of "deprecate
most of the cc0 backends".

Just the fact that the backends that get most attention and attract
most developers don't use cc0 doesn't mean cc0 is a useless device.

First of all, LRA cannot cope with cc0 (yes, I know deprecating
cc0 is just a way to deprecate all non-LRA BEs).  LRA asserts that
accessing the frame doesn't change the condition code.  LRA doesn't
provide a replacement for LEGITIMIZE_RELOAD_ADDRESS.  Hence LRA
focusses on just the comfortable, orthogonal targets.

As far as cc0 is concerned, transforming the avr BE is not trivial.
It would need rewriting almost all of its md files entirely.
It would need rewriting a great deal of the avr.c code that handles
insn output and provides input to NOTICE_UPDATE_CC.

But my feeling is that opposing the deprecation of cc0 is futile;
the voices that support cc0 deprecation are in the majority, and the
usefulness of cc0 is not recognized.

Sooner or later these backends will end up in /dev/null.

Johann


[patch,avr,v5,backported] Apply avr back-ports to v5.

2017-08-22 Thread Georg-Johann Lay
Backported the following PRs to v5: PR81910, PR80462, PR81407, PR67353, 
PR79883, PR81305, PR75963, PR81487.



gcc/
Backport from 2017-08-22 trunk r251256.
PR target/81910
* config/avr/avr.c (avr_handle_addr_attribute): Early return if
not VAR_P. Filter attribute warnings with OPT_Wattributes.
(avr_attribute_table) : Initialize
.decl_required with true.


gcc/
Backport from 2017-07-05 trunk r249995.
PR target/81305
* config/avr/avr.c (avr_out_movhi_mr_r_xmega) [CONSTANT_ADDRESS_P]:
Don't depend on "optimize > 0".
(out_movhi_r_mr, out_movqi_mr_r): Same.
(out_movhi_mr_r, out_movqi_r_mr): Same.
(avr_address_cost) [CONSTANT_ADDRESS_P]: Don't depend cost for
io_address_operand on "optimize > 0".
gcc/testsuite/
Backport from 2017-07-05 trunk r249995, r249996.
PR target/81305
* gcc.target/avr/isr-test.h: New file.
* gcc.target/avr/torture/isr-01-simple.c: New test.
* gcc.target/avr/torture/isr-02-call.c: New test.
* gcc.target/avr/torture/isr-03-fixed.c: New test.


gcc/
Backport from 2017-04-19 trunk r246997.
PR target/80462
* config/avr/avr.c (tree.h): Include it.
(hash-table.h): Include it.
(hash-set.h): Include it.
(symtab.h): Include it.
(inchash.h): Include it.
(function.h): Include it.
(hash-map.h): Include it.
(plugin-api.h): Include it.
(ipa-ref.h): Include it.
(cgraph.h): Include it.
(avr_encode_section_info): Don't warn for uninitialized progmem
variable if it's just an alias.

Backport from 2017-07-12 trunk r250151.
PR target/81407
* config/avr/avr.c (avr_encode_section_info)
[progmem && !TREE_READONLY]: Error if progmem object needs
constructing.


gcc/
Backport from 2016-06-15 trunk r237486.
Backport from 2017-07-12 trunk r250156.
PR target/79883
PR target/67353
* config/avr/avr.c (avr_set_current_function): Warn misspelled ISR
only if -Wmisspelled-isr is on.  In diagnostic messages: Quote
keywords and (parts of) identifiers.
[WITH_AVRLIBC]: Warn functions named "ISR", "SIGNAL" or "INTERRUPT".
* doc/invoke.texi (AVR Options) <-Wmisspelled-isr>: Document.


gcc/
Backport from 2017-05-06 trunk r247719.
PR rtl-optimization/75964
* simplify-rtx.c (simplify_const_relational_operation): Remove
invalid handling of comparisons of integer ABS.
gcc/testsuite/
Backport from 2017-05-06 trunk r247719.
PR rtl-optimization/75964
* gcc.dg/torture/pr75964.c: New test.

lto-plugin/
Backport from 2017-07-26 gcc-7-branch r250562.
PR lto/81487
* lto-plugin.c (claim_file_handler): Use xasprintf instead of
asprintf.
[hi!=0]: Swap hi and lo arguments supplied to xasprintf.
gcc/
Backport from 2017-07-26 gcc-7-branch r250562.
PR 81487
* tree-ssa-structalias.c (alias_get_name): Use xasprintf instead
of asprintf.
Index: config/avr/avr.c
===================================================================
--- config/avr/avr.c	(revision 251260)
+++ config/avr/avr.c	(revision 251261)
@@ -9059,10 +9059,12 @@ avr_handle_addr_attribute (tree *node, t
   bool io_p = (strncmp (IDENTIFIER_POINTER (name), "io", 2) == 0);
   location_t loc = DECL_SOURCE_LOCATION (*node);
 
-  if (TREE_CODE (*node) != VAR_DECL)
+  if (!VAR_P (*node))
 {
-  warning_at (loc, 0, "%qE attribute only applies to variables", name);
+  warning_at (loc, OPT_Wattributes, "%qE attribute only applies to "
+		  "variables", name);
   *no_add = true;
+  return NULL_TREE;
 }
 
   if (args != NULL_TREE)
@@ -9072,8 +9074,8 @@ avr_handle_addr_attribute (tree *node, t
   tree arg = TREE_VALUE (args);
   if (TREE_CODE (arg) != INTEGER_CST)
 	{
-	  warning (0, "%qE attribute allows only an integer constant argument",
-		   name);
+	  warning_at (loc, OPT_Wattributes, "%qE attribute allows only an "
+		  "integer constant argument", name);
 	  *no_add = true;
 	}
   else if (io_p
@@ -9082,19 +9084,20 @@ avr_handle_addr_attribute (tree *node, t
 			? low_io_address_operand : io_address_operand)
 			 (GEN_INT (TREE_INT_CST_LOW (arg)), QImode)))
 	{
-	  warning_at (loc, 0, "%qE attribute address out of range", name);
+	  warning_at (loc, OPT_Wattributes, "%qE attribute address "
+		  "out of range", name);
 	  *no_add = true;
 	}
   else
 	{
 	  tree attribs = DECL_ATTRIBUTES (*node);
-	  const char *names[] = { "io", "io_low", "address", NULL } ;
+	  const char *names[] = { "io", "io_low", "address", NULL };
 	  for (const char **p = names; *p; p++)
 	{
 	  tree other = lookup_attribute (*p, attribs);
 	  if (other && TREE_VALUE (other))
 		{
-		  warning_at (loc, 0,
+		  warning_at (loc, OPT_Wattributes,
 			  "b

Re: gcc torture test pr52286.c

2017-08-31 Thread Georg-Johann Lay

Paul S schrieb:
I've ported gcc to a 16 bit CPU and have all torture tests passing bar 
one, pr52286.c


The offending lines of code are

  long a, b = 0;
  asm ("" : "=r" (a) : "0" (0));

which should cause zero to be assigned to the "a" SI sized variable.
Inspecting the generated code revealed that zero was only being assigned 
to the lower 16 bit half of "a".


ldr2,0

I changed the inline asm statement to

  asm ("" : "=r" (a) : "0" (0L));

(note the change 0 to 0L) which caused the correct code to be generated ...

ldr2,0
ldr3,0

Curious, I performed an RTL dump and found that without the trailing 'L' 
following the '0'  the output of the expand pass looks like this ...


(insn 6 5 7 2 (set (reg:SI 29 [ a ])
(asm_operands:SI ("") ("=r") 0 [
(reg:HI 30)
]
 [
(asm_input:HI ("0") 
../gcc/testsuite/gcc.c-torture/execute/pr52286.c:14)

]

compared to

(insn 6 5 7 2 (set (reg:SI 29 [ a ])
(asm_operands:SI ("") ("=r") 0 [
(reg:SI 30)
]
 [
(asm_input:SI ("0") 
../gcc/testsuite/gcc.c-torture/execute/pr52286.c:14)

]

when 0L is used instead of just 0.

so it seems that the "0" constraint on the input operand is affecting 
the inferred mode of the output register operand ?


The 0 will be word_mode, presumably HImode on your machine, whereas
0L is long (obviously SImode on your machine).

The "0" constraint only tells the register allocator to use the same
reg, hence if you use (int) 0 and are on little endian, only the
lower 2 bytes will be set and the upper 2 bytes are left undefined.
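
For reference, a minimal standalone illustration (assuming a 16-bit-int
target like yours, where word_mode is HImode and long is SImode):

long a, b;

void
demo (void)
{
  /* The input constant has type int (HImode here): only the low word
     of the SImode output is tied, so only the low word of 'a' ends up
     defined.  */
  asm ("" : "=r" (a) : "0" (0));

  /* The input constant has type long (SImode): the whole of 'b' is
     defined.  */
  asm ("" : "=r" (b) : "0" (0L));
}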


Am I reading this correctly or have I missed something ?

Thanks, Paul


No, there are many test cases that are just dumped into the test
suite without proper review, so you can expect fallout when your
target has 16-bit int.

Johann



Re: Status of m32c target?

2018-01-19 Thread Georg-Johann Lay

On 13.01.2018 00:07, Joseph Myers wrote:

On Fri, 12 Jan 2018, Jeff Law wrote:


I was going to suggest deprecation for gcc-8 given how badly it was
broken in gcc-7 and the lack of maintenance on the target.


While we're considering deprecations, what happened to the idea of setting
a timescale by which cc0 targets need to be converted away from cc0 or be
removed?


I still don't quite get why cc0 is considered to be so bad.  Just the 
fact that the big targets don't use it doesn't make it useless.


Yes, I know that CCmode can represent the condition code.  But just the fact 
that it can represent it doesn't make it superior, or cc0 inferior or 
bad.  Having different representations for the same thing also has its 
obvious upsides (think of different representations in maths or 
physics), and in the present case one has the choice between an explicit 
RTL representation and an implicit (w.r.t. RTL) one.


The target I am familiar with is avr, and for that particular target, I 
cannot see a single benefit:


- cc0 does a good job and always did a good job in the past.  In the 
years I contributed to avr, there hasn't been a single cc0 flaw (all the 
few, minor cc0-related issues were avr BE issues).  From my experience, 
if some middle-end bits are flawed for a what-the-dickens-is-xxx target, then 
that target is left behind and has to hack the backend to surmount the 
middle-end flaw (like for reload or all the other FIXMEs you'll find). 
I wouldn't expect that anyone would fix CCmode shortcomings if some of 
these targets had trouble with it.  And IIUC one of the posts from 
above, m32c is considered to be kicked out because of reload bugs.


- No improvements of generated code are expected.  avr won't benefit 
from separating comparisons from branches (no scheduling).


- Likewise for insn splitting:  If the avr port could benefit more from 
splitting, then such splitting would have to be performed before 
register allocation -- which is not supported as LRA is restricted to 
cases that don't clobber CC (which is not possible for avr), and reload 
will also die soon IIUC.  Hence any CCmode implementation for avr will 
jump in after RA, even if reload might still be in use.


- Currently the best instruction sequence is printed during asm out. 
Often, there is a multitude of different sequences that could perform 
the same task, and the best (usually w.r.t. length) is chosen ad hoc. 
The cc0 setting and insn length computation follow the sequence, i.e. 
there are implicit assertions between printed code, insn lengths and cc0 
effect.  All this is in a single place, cf. avr_out_plus and all its callees 
like avr_out_plus_1.  Usually, it is not possible to describe the 
conditions for specific sequences on the RTL or constraint level (one would 
need to filter depending on the cartesian product over all constraints, and 
this is not possible because it is not allowed in insn conditions).


Switching to CCmode would basically require throwing avr into the dust 
bin and starting the backend from scratch, at least considerable parts of 
the .c and .md files.  Even if switching to a clobbers-all solution is 
feasible, this will result in a performance drop, and fixing that will 
likely be no easy task (if possible with a reasonable effort at all) and 
greatly destabilize the backend.


Actually, LLVM has an evolving avr backend.  It's far from being mature 
and is less stable and less optimizing than gcc's.  But instead of putting 
work into a dead horse (at least re. avr) like gcc, with its strong 
tendencies of self-destruction (which removing cc0 is, IMO), it seems much 
more reasonable to switch to an infrastructure that's more friendly to such 
architectures and not in decline.


In my opinion, one of the great upsides of GCC is that it supports so 
many targets, many deeply embedded, mature targets amongst them.  With 
that background, it may very well make sense to have considerably 
different approaches to the same problem.  And most of the cc0 code is in 
final, plus a few parts that keep comparisons and branches together.  So 
kicking out cc0 doesn't seem to bring a maintenance boost, either...


Johann


Re: extern const initialized warns in C

2018-01-21 Thread Georg-Johann Lay

Jay K schrieb:

extern const int foo = 123;

Why does this warn?
This is a valid portable form, with the same meaning
across all compilers, and, importantly, portably
to C and C++.


I also wondered about this.

In C99 §6.9.2 "External object definitions" there's actually
the following example in clause 4:

extern int i3 = 3; // definition, external linkage

Johann


Re: extern const initialized warns in C

2018-01-25 Thread Georg-Johann Lay

On 22.01.2018 16:20, Jonathan Wakely wrote:

On 21 January 2018 at 12:08, Georg-Johann Lay wrote:

Jay K schrieb:


extern const int foo = 123;

Why does this warn?
This is a valid portable form, with the same meaning
across all compilers, and, importantly, portably
to C and C++.


I also wondered about this.

In C99 §6.9.2 "External object definitions" there's actually
the following example in clause 4:

extern int i3 = 3; // definition, external linkage


That's a different case. There's no advantage to the 'extern' here,
because the code means the same thing in C and C++ without the
'extern', so just leave it out.


I'd rather like to know why GCC is throwing a warning here.

It's clear how to hack the C source, but that's a different point.

It's just that I don't see any problem with that construct, and it
was even worth an explicit example in the standard.  Or is it common
practice to warn about constructs that bring "no advantage"?

Johann




Re: $target.h vs $target-protos.h

2018-02-25 Thread Georg-Johann Lay

Sandra Loosemore schrieb:
The internals manual says that a backend for $target should have 
$target.h and $target-protos.h files, but doesn't say what the 
difference between them is or what belongs in which file.  Current 
practice seems to be a mix of


(1) $target.h contains macro definitions and $target-protos.h contains 
extern declarations.


(2) $target.h defines the external interface to the backend (the macros 
documented in the internals manual) and $target-protos.h contains things 
shared between $target.c and the various .md files.


But some generic source files include $target.h only and some source 
files include both, which wouldn't be true if either of the above models 
applied.  So is there some other logical way to explain the difference 
and what goes where?


IIRC, one difference is scanning for GTY markers used to tag ggc roots: 
$target.h is scanned whereas $target-protos.h is not (and AFAIR adding 
$target-protos.h in config.gcc to the files being scanned pops up other 
problems). Hence when you have a target-specific GTYed structure that's 
shared by several back-end modules, you'd add the struct definition to 
$target.h (If only one module needs such objects, then you'd add the 
type definition to, say, $target.c which is scanned — or can be rendered 
to a scanned one by adjusting config.gcc).
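
As a purely hypothetical sketch (invented names, not actual avr code, and
only meaningful inside a GCC build where gengtype scans $target.h):

/* In $target.h -- scanned by gengtype, so the GTY markers are honored.  */
struct GTY(()) mytarget_label_info
{
  rtx label;    /* GC-managed object, hence the GTY marker on the struct.  */
  int flags;
};

/* A GC root shared by several back-end modules.  */
extern GTY(()) struct mytarget_label_info *mytarget_labels;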


The bulk of code is not GTYed, of course, and from my experience the 
usage of the mentioned files is like you summarized: $target-protos.h is 
usually a blob of prototypes used internally to communicate within a 
back-end, whereas $target.h "defines" the backend part that's not yet 
hookized, e.g. the TARGET_ macros that define (initializers for) 
register classes etc.


And the usage in libgcc might be different: $target.h is used in libgcc 
(which is the reason why $target.h might need the runtime library 
exception, cf. PR61152: 
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61152#c0 ).  Ideally, all 
information needed by libgcc and other target libraries would be shipped 
via built-in macros, but such a separation of the libgcc sources from the 
compiler sources is not yet complete to the best of my knowledge. 
$target-protos.h (via tm_p.h), however, is not used by libgcc.


Johann

The thing that got me thinking about this is looking at a new port 
originally written by a customer, where it seems like there is too much 
stuff in $target.h.  Then I started chasing it down


- FUNCTION_BOUNDARY depends on which arch variant is being compiled for

- Its expansion references some internal target data structures and 
enumerators to look up that info


- The default definition of TARGET_PTRMEMFUNC_VBIT_LOCATION uses 
FUNCTION_BOUNDARY


- ipa-prop.c uses TARGET_PTRMEMFUNC_VBIT_LOCATION but doesn't include 
$target-protos.h


- tree.c also uses FUNCTION_BOUNDARY directly without including 
$target-protos.h


- Probably there are many other target macros that potentially have 
similar issues


- Presumably this means everything referenced in the expansion of any 
target macro in $target.h also needs to be declared in $target.h and not 
depend on $target-protos.h also being included at the point of use.


So what is the purpose of having a separate $target-protos.h?

-Sandra the confused





TARGET_HELP: --target-help vs. --help=target

2011-02-23 Thread Georg-Johann Lay
Hi, I noticed a difference between --target-help and --help=target

Whilst --target-help also calls the target hook TARGET_HELP to display
additional information, using --help=target does not.

This is in sync with the documentation, but appears a bit odd.  Is this
the desired behaviour, and is there a specific rationale for not
displaying all information with --help=target?

Thanks, Johann


How to work with Bugzilla?

2011-02-23 Thread Georg-Johann Lay
Hi, what is needed to work with Bugzilla for things like:

- assigning a bug to me
- confirming a bug (changing status)
- adding affected versions like "known to fail"

My user name is avr at gjlay dot de.

Is there a connection between a GCC role like maintainer and Bugzilla
access rights?

Thanks, Johann


gcc-4.5/4.4: Bug in .subreg1 pass?

2011-02-24 Thread Georg-Johann Lay
Hi, I am trying to track down/fix PR target/45291. The problem is that
pass .subreg1 generates invalid subregs.

According to internals "10.8 Registers and Memory, Normal Subregs",
a normal (non-paradoxical) subreg as lvalue sets the specified subreg
and leaves the remaining part of the target word undefined.  This
happens in the following test case

unsigned char func1(void);
void funct2(int l);

void dummy(void)
{
int l = 256 * func1() + func1();
if (l > 256)
return;

func1();
funct2(l);
}

compiled with

avr-gcc -S pr45291.c -O2 -mmcu=atmega8 -dP -da -v

Using built-in specs.
COLLECT_GCC=/home/georg/gcc/install/gcc-4.5/bin/avr-gcc
COLLECT_LTO_WRAPPER=/mnt/nfs/home/georg/gcc/install/gcc-4.5/bin/../libexec/gcc/avr/4.5.3/lto-wrapper
Target: avr
Configured with:
/gnu/source/gcc.gnu.org/branches/gcc-4_5-branch/configure --target=avr
--prefix=/home/georg/gcc/install/gcc-4.5 --enable-languages=c,c++
--disable-libssp --disable-libada --disable-nls --disable-shared
Thread model: single
gcc version 4.5.3 20110223 (prerelease) (GCC)
COLLECT_GCC_OPTIONS='-O2' '-S' '-mmcu=atmega8' '-dP' '-da' '-v'
 /mnt/nfs/home/georg/gcc/install/gcc-4.5/bin/../libexec/gcc/avr/4.5.3/cc1
-quiet -v -imultilib avr4 -iprefix
/mnt/nfs/home/georg/gcc/install/gcc-4.5/bin/../lib/gcc/avr/4.5.3/
pr45291.c -quiet -dumpbase pr45291.c -dP -da -mmcu=atmega8 -auxbase
pr45291 -O2 -version -o pr45291.s
GNU C (GCC) version 4.5.3 20110223 (prerelease) (avr)
compiled by GNU C version 4.3.2 [gcc-4_3-branch revision
141291], GMP version 4.3.1, MPFR version 2.4.2, MPC version 0.8.1
[...]
COLLECT_GCC_OPTIONS='-O2' '-S' '-mmcu=atmega8' '-dP' '-da' '-v'

==

In pass .148r.jump everything looks fine:

(note 2 3 6 2 NOTE_INSN_FUNCTION_BEG)

(call_insn 6 2 7 2 pr45291.c:6 (set (reg:QI 24 r24)
(call (mem:HI (symbol_ref:HI ("func1") [flags 0x41]
) [0 S2 A8])
(const_int 0 [0x0]))) 123 {call_value_insn} (nil)
(nil))

(insn 7 6 9 2 pr45291.c:6 (set (reg:QI 41 [ D.1925 ])
(reg:QI 24 r24)) 4 {*movqi} (nil))

(call_insn 9 7 10 2 pr45291.c:6 (set (reg:QI 24 r24)
(call (mem:HI (symbol_ref:HI ("func1") [flags 0x41]
) [0 S2 A8])
(const_int 0 [0x0]))) 123 {call_value_insn} (nil)
(nil))

(insn 10 9 11 2 pr45291.c:6 (set (reg:QI 44 [ D.1928 ])
(reg:QI 24 r24)) 4 {*movqi} (nil))

(insn 11 10 12 2 pr45291.c:6 (set (reg:HI 49)
(zero_extend:HI (reg:QI 41 [ D.1925 ]))) 94 {zero_extendqihi2}
(nil))

(insn 12 11 13 2 pr45291.c:6 (set (reg:HI 50)
(ashift:HI (reg:HI 49)
(const_int 8 [0x8]))) 68 {ashlhi3} (nil))

(insn 13 12 14 2 pr45291.c:6 (set (reg:HI 51)
(zero_extend:HI (reg:QI 44 [ D.1928 ]))) 94 {zero_extendqihi2}
(nil))

(insn 14 13 15 2 pr45291.c:6 (set (reg/v:HI 46 [ l ])
(plus:HI (reg:HI 50)
(reg:HI 51))) 22 {*addhi3} (nil))




Then .149r.subreg1 generates insns 33 and 34 with insn 34 nullifying
insn 33.


(note 2 3 6 2 NOTE_INSN_FUNCTION_BEG)

(call_insn 6 2 7 2 pr45291.c:6 (set (reg:QI 24 r24)
(call (mem:HI (symbol_ref:HI ("func1") [flags 0x41]
) [0 S2 A8])
(const_int 0 [0x0]))) 123 {call_value_insn} (nil)
(nil))

(insn 7 6 9 2 pr45291.c:6 (set (reg:QI 41 [ D.1925 ])
(reg:QI 24 r24)) 4 {*movqi} (nil))

(call_insn 9 7 10 2 pr45291.c:6 (set (reg:QI 24 r24)
(call (mem:HI (symbol_ref:HI ("func1") [flags 0x41]
) [0 S2 A8])
(const_int 0 [0x0]))) 123 {call_value_insn} (nil)
(nil))

(insn 10 9 31 2 pr45291.c:6 (set (reg:QI 44 [ D.1928 ])
(reg:QI 24 r24)) 4 {*movqi} (nil))

(insn 31 10 32 2 pr45291.c:6 (set (reg:QI 54)
(reg:QI 41 [ D.1925 ])) 4 {*movqi} (nil))

(insn 32 31 33 2 pr45291.c:6 (set (reg:QI 55 [+1 ])
(const_int 0 [0x0])) 4 {*movqi} (nil))

(insn 33 32 34 2 pr45291.c:6 (set (subreg:QI (reg:HI 50) 1)
(reg:QI 54)) 4 {*movqi} (nil))

(insn 34 33 13 2 pr45291.c:6 (set (subreg:QI (reg:HI 50) 0)
(const_int 0 [0x0])) 4 {*movqi} (nil))

(insn 13 34 14 2 pr45291.c:6 (set (reg:HI 51)
(zero_extend:HI (reg:QI 44 [ D.1928 ]))) 94 {zero_extendqihi2}
(nil))

(insn 14 13 15 2 pr45291.c:6 (set (reg/v:HI 46 [ l ])
(plus:HI (reg:HI 50)
(reg:HI 51))) 22 {*addhi3} (nil))


Insn 33 survives until .188r.ira and is kicked out there.  So despite
my comment in the original bug report where I suspected the avr backend
of emitting the bad subregs, it is actually a middle-end bug, right?

Target avr suffers from similar problems in 4.3, 4.4 and 4.6, which are
all reported against the avr backend, not against the middle-end/RTL
optimizers.



PR46779 shows the bug for avr-gcc-4.4.6 (prerelease)

Thanks, Johann








Re: gcc-4.5/4.4: Bug in .subreg1 pass?

2011-02-25 Thread Georg Johann Lay

Paul Koning schrieb:

On Feb 24, 2011, at 12:46 PM, Eric Botcazou wrote:



Maybe the misunderstanding occurs when the mode of the subreg is
 less than word_size?  It would certainly make sense that a
subreg write of less than word_size leaves the bits undefined.

ie, if word_size is SImode and we had a write to

(subreg:HI (reg:SI) 0)

Then the upper bits are left undefined.


Contrast that to the same RTL, but word_size is HImode

(subreg:HI (reg:SI) 0)

Would leave the upper bits untouched.


That's the conventional wisdom I think.  Sub-word and word-sized 
subregs don't behave the same, in particular when they are on the 
LHS of an assignment.



That's certainly what the manual says.



I still don't understand.

The manual says:

http://gcc.gnu.org/onlinedocs/gccint/Regs-and-Memory.html#Regs-and-Memory

When storing to a normal subreg that is smaller than a word, the 
other bits of the referenced word are usually left in an undefined 
state. This laxity makes it easier to generate efficient code for 
such instructions. To represent an instruction that preserves all the

bits outside of those in the subreg, use strict_low_part or
zero_extract around the subreg.


What does "word" mean here? Is it a 32-bit entity or is it according to 
word_mode which is QImode for avr?


In the latter case, all code generated by subreg lowering and RTL 
lowering (which expands a DI operation into a bulk of 
(set (subreg:QI (reg:DI)) ...) insns, as the avr backend does not have 
DI insns) is fine, and I will have to dig into IRA for PR45291, PR46779 
and maybe others.


In the first case all code that RTL lowering produces for 64-bit stuff 
would be incorrect which is hard to imagine...


Some lines after that, internals give an example for HI subregs of PSI:

> If "UNITS_PER_WORD <= 2" then these two subregs:
>  (subreg:HI (reg:PSI 0) 0)
>  (subreg:HI (reg:PSI 0) 2)
> represent independent 2-byte accesses that together span the whole of
> "(reg:PSI 0)". Storing to the first subreg does not affect the value
> of the second, and vice versa.

So the same should be true for QI-subregs of scalar modes if 
UNITS_PER_WORD = 1. Right?


Georg-Johann


Re: gcc-4.5/4.4: Bug in .subreg1 pass?

2011-02-25 Thread Georg Johann Lay

Eric Botcazou schrieb:

What does "word" mean here? Is it a 32-bit entity or is it according to
word_mode which is QImode for avr?


The latter, it is machine-dependent.



So the same should be true for QI-subregs of scalar modes if
UNITS_PER_WORD = 1. Right?


Right.


Thanks for that definite clarification.

The internals docs are sometimes confusing, incomplete or outdated, so I 
prefer to reassure myself here.  Thanks again :-)


Georg-Johann





Re: gcc-4.5/4.4: Bug in .subreg1 pass?

2011-02-26 Thread Georg-Johann Lay

Denis Chertykov schrieb:

2011/2/26 Georg Johann Lay 


Eric Botcazou schrieb:


What does "word" mean here? Is it a 32-bit entity or is it according to
word_mode which is QImode for avr?


The latter, it is machine-dependent.




So the same should be true for QI-subregs of scalar modes if
UNITS_PER_WORD = 1. Right?


Right.


Thanks for that definite clarification.



As I understand PR46779 and PR45291 no longer an AVR port bugs.
And the following comment is not a true anymore:
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45291#c2
Am I right ?


At least the analysis I gave there is incorrect.

I remember a problem in avr backend concerning the frame pointer because 
QImode was allowed in FP's hard regs... that sort of problem. Was that 
fixed here or just in some distribution like WinAVR? So perhaps it's 
still this FP problem?


Johann


Re: gcc-4.5/4.4: Bug in .subreg1 pass?

2011-02-26 Thread Georg-Johann Lay

Georg-Johann Lay schrieb:

Denis Chertykov schrieb:


2011/2/26 Georg Johann Lay 


Eric Botcazou schrieb:

What does "word" mean here? Is it a 32-bit entity or is it 
according to

word_mode which is QImode for avr?



The latter, it is machine-dependent.




So the same should be true for QI-subregs of scalar modes if
UNITS_PER_WORD = 1. Right?



Right.



Thanks for that definite clarification.




As I understand PR46779 and PR45291 no longer an AVR port bugs.
And the following comment is not a true anymore:
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45291#c2
Am I right ?



At least the analysis I gave there is incorrect.

I remember a problem in avr backend concerning the frame pointer because 
QImode was allowed in FP's hard regs... that sort of problem. Was that 
fixed here or just in some distribution like WinAVR? So perhaps it's 
still this FP problem?


Ok, here it is:

http://gcc.gnu.org/viewcvs?view=revision&revision=123263

But now there is the problem that we have

(insn 33 35 34 2 foo.c:5 (set (subreg:QI (reg:HI 50 [ D.1955 ]) 1)
(reg:QI 41 [ D.1955 ])) ...

and reg 50 is allowed to go into R29:R28 (the frame pointer regs) but the 
subreg is not, i.e. because QImode is not allowed in the FP regs, it is 
not possible to set the low/high parts separately.


Johann


Re: gcc-4.5/4.4: Bug in .subreg1 pass?

2011-02-26 Thread Georg-Johann Lay

Georg-Johann Lay schrieb:

Denis Chertykov schrieb:


2011/2/26 Georg Johann Lay 


Eric Botcazou schrieb:

What does "word" mean here? Is it a 32-bit entity or is it 
according to

word_mode which is QImode for avr?



The latter, it is machine-dependent.




So the same should be true for QI-subregs of scalar modes if
UNITS_PER_WORD = 1. Right?



Right.



Thanks for that definite clarification.




As I understand PR46779 and PR45291 no longer an AVR port bugs.
And the following comment is not a true anymore:
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45291#c2
Am I right ?



At least the analysis I gave there is incorrect.

I remember a problem in avr backend concerning the frame pointer because 
QImode was allowed in FP's hard regs... that sort of problem. Was that 
fixed here or just in some distribution like WinAVR? So perhaps it's 
still this FP problem?


Ok, this is the patch I meant:

http://gcc.gnu.org/viewcvs?view=revision&revision=86842

it allows just Pmode in r29:r28 because of some spill failures in 
PR15417 and PR12017.


Johann



Re: gcc-4.5/4.4: Bug in .subreg1 pass?

2011-02-28 Thread Georg-Johann Lay

Denis Chertykov schrieb:

2011/2/26 Georg-Johann Lay :


Ok, this is the patch I meant:

http://gcc.gnu.org/viewcvs?view=revision&revision=86842

it allows just Pmode in r29:r28 because of some spill failures in PR15417
and PR12017.


It was a stupid workaround.
I think that the problem exists anyway because it's not a port problem.

Denis.


PR41894 is yet another variation of the problem.

Removing the restricting code like so

int
avr_hard_regno_mode_ok (int regno, enum machine_mode mode)
{
  /* Disallow QImode in stack pointer regs.  */
  if ((regno == REG_SP || regno == (REG_SP + 1)) && mode == QImode)
    return 0;

-  /* The only thing that can go into registers r28:r29 is a Pmode.  */
-  if (regno == REG_Y && mode == Pmode)
-    return 1;
-
-  /* Otherwise disallow all regno/mode combinations that span r28:r29.  */
-  if (regno <= (REG_Y + 1) && (regno + GET_MODE_SIZE (mode)) >= (REG_Y + 1))
-    return 0;
-
  if (mode == QImode)
    return 1;

  /* Modes larger than QImode occupy consecutive registers.  */
  if (regno + GET_MODE_SIZE (mode) > FIRST_PSEUDO_REGISTER)
    return 0;

  /* All modes larger than QImode should start in an even register.  */
  return !(regno & 1);
}

leads to correct code.

But I don't understand enough of reload / the internals of FP elimination 
to estimate all undesired side effects.


Maybe someone with more insight in reload can comment on this issue?

Johann


RFC: [4.7] Adding CUMULATIVE_ARGS to targetm.calls.function_ok_for_sibcall?

2011-03-03 Thread Georg-Johann Lay
Is it ok to extend targetm.function_ok_for_sibcall so that it passes
also a pointer to the callee's CUMULATIVE_ARGS structure?

In some situations it is quite tedious to recompute information like
which hard regs are used to pass arguments from the passed trees (call
expression resp. decl).  This information is readily available in
args_so_far.

Ports like s390 or avr could avoid recomputation of information just
computed. (avr does not implement tail calls yet).
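
Just as an illustration (hypothetical backend code with an invented
CUMULATIVE_ARGS field, not part of the patch below), a port could then
look at the accumulated state directly:

/* Sketch only: 'nregs_used' is an invented CUMULATIVE_ARGS field that
   stands in for whatever the port records while scanning arguments.  */
static bool
mytarget_function_ok_for_sibcall (tree decl, tree exp ATTRIBUTE_UNUSED,
                                  CUMULATIVE_ARGS *cum)
{
  /* No need to re-walk EXP/DECL: the argument registers are already known.  */
  if (cum->nregs_used > 4)
    return false;

  /* For example, only allow direct calls.  */
  return decl != NULL_TREE;
}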

Patch lined out below (not yet patched any backends).

Technically, shall the docs patch be against doc/tm.texi or doc/tm.texi.in?

Johann

--

Index: doc/tm.texi.in
===================================================================
--- doc/tm.texi.in  (revision 170651)
+++ doc/tm.texi.in  (working copy)
@@ -4889,7 +4889,14 @@ the function prologue.  Normally, the pr
 @hook TARGET_FUNCTION_OK_FOR_SIBCALL
 True if it is ok to do sibling call optimization for the specified
 call expression @var{exp}.  @var{decl} will be the called function,
-or @code{NULL} if this is an indirect call.
+or @code{NULL} if this is an indirect call. The argument @var{cum}
+points to the @code{CUMULATIVE_ARGS} data structure of the called
+function. This hook runs just before the last call to
+@code{FUNCTION_ARG} resp. @code{FUNCTION_INCOMING_ARG} with
+@code{mode=VOIDmode}.  @var{cum} serves informational purposes like,
+e.g. what hard registers are used to pass arguments to the
+callee (which might be tedious to recompute from @var{exp} or
+@var{decl} alone).

 It is not uncommon for limitations of calling conventions to prevent
 tail calls to functions outside the current unit of translation, or
Index: target.def
===================================================================
--- target.def  (revision 170651)
+++ target.def  (working copy)
@@ -1424,7 +1424,7 @@ DEFHOOK
 DEFHOOK
 (function_ok_for_sibcall,
  "",
- bool, (tree decl, tree exp),
+ bool, (tree decl, tree exp, CUMULATIVE_ARGS *cum),
  hook_bool_tree_tree_false)

 /* Establish appropriate back-end context for processing the function
Index: calls.c
===================================================================
--- calls.c (revision 170651)
+++ calls.c (working copy)
@@ -2323,7 +2323,7 @@ expand_call (tree exp, rtx target, int i
 #endif
   /* Check whether the target is able to optimize the call
 into a sibcall.  */
-  || !targetm.function_ok_for_sibcall (fndecl, exp)
+  || !targetm.function_ok_for_sibcall (fndecl, exp, &args_so_far)
   /* Functions that do not return exactly once may not be sibcall
 optimized.  */
   || (flags & (ECF_RETURNS_TWICE | ECF_NORETURN))


Re: RFC: [4.7] Adding CUMULATIVE_ARGS to targetm.calls.function_ok_for_sibcall?

2011-03-03 Thread Georg-Johann Lay

Joern Rennecke wrote:

Quoting Georg-Johann Lay :


Is it ok to extend targetm.function_ok_for_sibcall so that it passes
also a pointer to the callee's CUMULATIVE_ARGS structure?



CUMULATIVE_ARGS is a target-dependent type, and thus every use of it
in the interface of target hooks should be considered a bug.
See PR46500.


Thanks for pointing me to that thread.

So the question is if such an extension would be okay in principle after 
cumulative_args_t, or whatever it will eventually be, has found its way 
into mainline?


Johann





Why IRA stores in frame, not in callee-saved reg?

2011-03-07 Thread Georg-Johann Lay
In current trunk (r170704), 4.4-branch and 4.5-branch I observe the
following optimization issue in IRA: It saves regs in the frame
instead of in callee-saved registers which would be much smarter.

In the following C source, foo2 is compiled as desired (push/pop r17
to save r24). In foo1 and foo3 r24 is saved in the frame. The old
lreg/greg allocator of 4.3-branch generates fine code for all functions.

Saving a reg in the frame should only be done if running out of hard
registers because setting up frame(pointer) and accessing frame is
very expensive on avr.

Maybe someone can give me a hint what's going wrong.

gcc configured

/gnu/source/gcc.gnu.org/trunk/configure --target=avr --prefix=...
--enable-languages=c,c++ --disable-libssp --disable-libada
--disable-nls --disable-shared

and sources compiled

with -Os -mmcu=atmega8 -c -dp -da -fira-verbose=100

/*/
void bar0 (void);
void bar1 (char);

void foo1 (char x)
{
bar0();
bar1(x);
}


char foo2 (char x)
{
bar1(x);

return x;
}

char foo3 (char x)
{
bar0();

return x;
}
/*/

FYI, I attached IRA dumps and asm output

As far as I can see, the avr target gives appropriate costs for memory and
register moves.

IRA printout is as follows:

Thanks for any hints on that!

Johann

===

Building IRA IR

Pass 0 for finding pseudo/allocno costs

a0 (r42,l0) best GENERAL_REGS, cover GENERAL_REGS

  a0(r42,l0) costs: POINTER_X_REGS:0,0 POINTER_Z_REGS:0,0
BASE_POINTER_REGS:0,0 POINTER_REGS:0,0 ADDW_REGS:0,0
SIMPLE_LD_REGS:0,0 LD_REGS:0,0 NO_LD_REGS:0,0 GENERAL_REGS:0,0 MEM:4000


Pass 1 for finding pseudo/allocno costs

r42: preferred GENERAL_REGS, alternative NO_REGS, cover GENERAL_REGS

  a0(r42,l0) costs: POINTER_X_REGS:0,0 POINTER_Z_REGS:0,0
BASE_POINTER_REGS:0,0 POINTER_REGS:0,0 ADDW_REGS:0,0
SIMPLE_LD_REGS:0,0 LD_REGS:0,0 NO_LD_REGS:0,0 GENERAL_REGS:0,0 MEM:4000

   Insn 10(l0): point = 0
   Insn 9(l0): point = 2
   Insn 7(l0): point = 4
   Insn 2(l0): point = 6
 a0(r42): [3..6]
Compressing live ranges: from 9 to 2 - 22%
Ranges after the compression:
 a0(r42): [0..1]
+++Allocating 0 bytes for conflict table (uncompressed size 4)
;; a0(r42,l0) conflicts:  regions=1, blocks=3, points=2
allocnos=1 (big 0), copies=0, conflicts=0, ranges=1

 Allocnos coloring:


  Loop 0 (parent -1, header bb0, depth 0)
bbs: 2
all: 0r42
modified regnos: 42
border:
Pressure: GENERAL_REGS=1
Reg 42 of GENERAL_REGS has 6 regs less
  Pushing a0(r42,l0)
  Popping a0(r42,l0)  -- assign reg 24
Disposition:
0:r42  l024
New iteration of spill/restore move
+++Costs: overall -4000, reg -4000, mem 0, ld 0, st 0, move 0
+++   move loops 0, new jumps 0

===

Building IRA IR

Pass 0 for finding pseudo/allocno costs

a0 (r43,l0) best GENERAL_REGS, cover GENERAL_REGS

  a0(r43,l0) costs: POINTER_X_REGS:0,0 POINTER_Z_REGS:0,0
BASE_POINTER_REGS:0,0 POINTER_REGS:0,0 ADDW_REGS:0,0
SIMPLE_LD_REGS:0,0 LD_REGS:0,0 NO_LD_REGS:0,0 GENERAL_REGS:0,0 MEM:6000


Pass 1 for finding pseudo/allocno costs

r43: preferred GENERAL_REGS, alternative NO_REGS, cover GENERAL_REGS

  a0(r43,l0) costs: POINTER_X_REGS:0,0 POINTER_Z_REGS:0,0
BASE_POINTER_REGS:0,0 POINTER_REGS:0,0 ADDW_REGS:0,0
SIMPLE_LD_REGS:0,0 LD_REGS:0,0 NO_LD_REGS:0,0 GENERAL_REGS:0,0 MEM:6000

   Insn 16(l0): point = 0
   Insn 13(l0): point = 2
   Insn 8(l0): point = 4
   Insn 7(l0): point = 6
   Insn 2(l0): point = 8
 a0(r43): [3..8]
Compressing live ranges: from 11 to 2 - 18%
Ranges after the compression:
 a0(r43): [0..1]
+++Allocating 0 bytes for conflict table (uncompressed size 4)
;; a0(r43,l0) conflicts:  regions=1, blocks=3, points=2
allocnos=1 (big 0), copies=0, conflicts=0, ranges=1

 Allocnos coloring:


  Loop 0 (parent -1, header bb0, depth 0)
bbs: 2
all: 0r43
modified regnos: 43
border:
Pressure: GENERAL_REGS=2
Reg 43 of GENERAL_REGS has 7 regs less
  Pushing a0(r43,l0)
  Popping a0(r43,l0)  -- assign reg 17
Disposition:
0:r43  l017
New iteration of spill/restore move
+++Costs: overall 0, reg 0, mem 0, ld 0, st 0, move 0
+++   move loops 0, new jumps 0


===

Building IRA IR

Pass 0 for finding pseudo/allocno costs

a0 (r43,l0) best GENERAL_REGS, cover GENERAL_REGS

  a0(r43,l0) costs: POINTER_X_REGS:0,0 POINTER_Z_REGS:0,0
BASE_POINTER_REGS:0,0 POINTER_REGS:0,0 ADDW_REGS:0,0
SIMPLE_LD_REGS:0,0 LD_REGS:0,0 NO_LD_REGS:0,0 GENERAL_REGS:0,0 MEM:4000


Pass 1 for finding pseudo/allocno costs

r43: preferred GENERAL_REGS, alternative NO_REGS, cover GENERAL_REGS

  a0(r43,l0) costs: POINTER_X_REGS:0,0 POINTER_Z_REGS:0,0
BASE_POINTER_REGS:0,0 POINTER_REGS:0,0 ADDW_REGS:0,0
SIMPLE_LD_REGS:0,0 LD_REGS:0,0 NO_LD_REGS:0,0 GENERAL_REGS:0,0 MEM:400

Re: Why IRA stores in frame, not in callee-saved reg?

2011-03-08 Thread Georg-Johann Lay
Georg-Johann Lay schrieb:
> In current trunk (r170704), 4.4-branch and 4.5-branch I observe the
> following optimization issue in IRA: It saves regs in the frame
> instead of in callee-saved registers which would be much smarter.
> 
> In the following C source, foo2 is compiled as desired (push/pop r17
> to save r24). In foo1 and foo3 r24 is saved in the frame. The old
> lreg/greg allocator of 4.3-branch generates fine code for all functions.
> 
> Saving a reg in the frame should only be done if running out of hard
> registers because setting up frame(pointer) and accessing frame is
> very expensive on avr.
> 
> Maybe someone can give me a hint what's going wrong.
> 
> gcc configured
> 
> /gnu/source/gcc.gnu.org/trunk/configure --target=avr --prefix=...
> --enable-languages=c,c++ --disable-libssp --disable-libada
> --disable-nls --disable-shared
> 
> and sources compiled
> 
> with -Os -mmcu=atmega8 -c -dp -da -fira-verbose=100
> 
> /*/
> void bar0 (void);
> void bar1 (char);
> 
> void foo1 (char x)
> {
> bar0();
> bar1(x);
> }
> 
> 
> char foo2 (char x)
> {
> bar1(x);
> 
> return x;
> }
> 
> char foo3 (char x)
> {
> bar0();
> 
> return x;
> }
> /*/
> 
> FYI, I attached IRA dumps and asm output
> 
> As far I can see target avr gives appropriate costs for memory and
> register moves.
> 
> IRA printout is as follows:
> 

When returning memory move costs 4 times higher (8 instead of 2) for QImode
memory moves, IRA/reload generates code as expected.

Is there any other way than lying about the costs?

IRA does not take into account the costs implied by generating new stack
slots and setting up the frame pointer.  AFAIK there is no hook to
influence that.  How can a target describe the costs of setting up stack
slots and accessing them?
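
For illustration, the knob used in the experiment above would be the
target's memory move cost; a sketch with the numbers from above
(hypothetical GCC-internal code, assuming the TARGET_MEMORY_MOVE_COST
hook) could look like:

/* Sketch only: make QImode loads/stores look more expensive than a
   register-register move so that IRA prefers a callee-saved register
   over a new stack slot.  The numbers are the ones from the experiment
   above, not a tuned cost model.  */
static int
mytarget_memory_move_cost (enum machine_mode mode,
                           reg_class_t rclass ATTRIBUTE_UNUSED,
                           bool in ATTRIBUTE_UNUSED)
{
  return mode == QImode ? 8 : 2;
}

#undef  TARGET_MEMORY_MOVE_COST
#define TARGET_MEMORY_MOVE_COST mytarget_memory_move_cost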

Johann


Re: Why IRA stores in frame, not in callee-saved reg?

2011-03-09 Thread Georg-Johann Lay

Vladimir Makarov schrieb:

On 03/08/2011 06:36 AM, Georg-Johann Lay wrote:


Georg-Johann Lay schrieb:


In current trunk (r170704), 4.4-branch and 4.5-branch I observe the
following optimization issue in IRA: It saves regs in the frame
instead of in callee-saved registers which would be much smarter.

In the following C source, foo2 is compiled as desired (push/pop r17
to save r24). In foo1 and foo3 r24 is saved in the frame. The old
lreg/greg allocator of 4.3-branch generates fine code for all functions.

Saving a reg in the frame should only be done if running out of hard
registers because setting up frame(pointer) and accessing frame is
very expensive on avr.

Maybe someone can give me a hint what's going wrong.

gcc configured

/gnu/source/gcc.gnu.org/trunk/configure --target=avr --prefix=...
--enable-languages=c,c++ --disable-libssp --disable-libada
--disable-nls --disable-shared

and sources compiled

with -Os -mmcu=atmega8 -c -dp -da -fira-verbose=100

/*/
void bar0 (void);
void bar1 (char);

void foo1 (char x)
{
 bar0();
 bar1(x);
}


char foo2 (char x)
{
 bar1(x);

 return x;
}

char foo3 (char x)
{
 bar0();

 return x;
}
/*/

FYI, I attached IRA dumps and asm output

As far I can see target avr gives appropriate costs for memory and
register moves.

IRA printout is as follows:


Returning memory move costs 4 times higher (8 instead of 2) for QI
memory moves, IRA/reload generates code as expected.

Is there any other way than lying about the costs?

IRA doues not take into account costs implied by generating new stack
slots and setting up frame pointer. AFAIK there is no hook to
influence that. How can a target describe costs generated by setting
up stack slots and accessing them?


First of all, defining equal costs for moving into memory and into 
register (2 in both cases) is wrong way to direct IRA.


In case of test1, pseudo 42 is in two insns involving hard register 24.  
Therefore its cost is decreased in ira-costs.c.  After that it is 
increased on the same value because the pseudo intersects one call and 
the cost of ld/st (for save/restore) is the same as for moving its value 
into a general register.  Because r24 is the first in the hard register 
allocation order, IRA chooses r24.


So making ld/st a bit more costly than hard register moving would solve 
the problem.  Unfortunately, it does not happens because the two move 
insns involving p42 and hard register 24 are taken into account twice 
(once in ira-costs.c and another one ira-conflicts.c:process_regs_for 
copy).


The following patch would solve the problem.  I'll submit it when gcc is 
in stage 1.  The patch looks harmless but changes in ira-costs.c 
frequently trigger reload failures.  Therefore the patch needs thorough 
testing and stage1 is best time to do it.


Thanks for that patch. With that patch, slightly increasing memory costs 
(2 instead of 4 without the patch) is sufficient to get good code.


Isn't reload supposed to work in all situations and for different cost 
setups?  Sounds as if that is working around problems in reload.


Thanks for pointing out the problem.  It helped to find a cost 
calculation pitfall which was probably introduced by an IRA change in 
ira-costs.c that is inconsistent with the code in ira-conflicts.c.


Maybe you would like to take care of
http://gcc.gnu.org/ml/gcc-patches/2011-02/msg01443.html
in stage 1? It takes into account LOCAL_REGNO in 
ira-color.c:assign_hard_reg.


Johann


Index: ira-conflicts.c
===================================================================
--- ira-conflicts.c(revision 170786)
+++ ira-conflicts.c(working copy)
@@ -432,8 +432,7 @@ process_regs_for_copy (rtx reg1, rtx reg
   rclass = REGNO_REG_CLASS (allocno_preferenced_hard_regno);
   mode = ALLOCNO_MODE (a);
   cover_class = ALLOCNO_COVER_CLASS (a);
-  if (only_regs_p && insn != NULL_RTX
- && reg_class_size[rclass] <= (unsigned) CLASS_MAX_NREGS (rclass, mode))
+  if (only_regs_p && insn != NULL_RTX)
 /* It is already taken into account in ira-costs.c.  */
 return false;
   index = ira_class_hard_reg_index[cover_class][allocno_preferenced_hard_regno];





Re: Using cc (question from avr)

2011-03-09 Thread Georg-Johann Lay

Paulo J. Matos schrieb:

Hi,

I am having some trouble really understanding the working of cc_status.
In order to understand it better I was looking at the code for avr under 
gcc 4.3.


My assumption is that set_zn and the other set_* values mean that an instruction 
_changes_ these flags. So an instruction with set_zn means that Z and N are 
modified, but we are not sure to which values. I assume clobber means 
that an instruction changes some flags but we don't know which.


set_* tells you which condition code flags have been set in a usable 
way, e.g. to be used in a subsequent branch instruction. For example, if 
* contains 'z' you can use the Z flag to test whether the value just set is 
equal to zero or not. Look at the "andqi3" insn, which sets Z and N, but 
"andhi3" just sets N or clobbers CC. There, Z only reflects the andi action on 
the high byte of the result, so you cannot use the Z flag to test whether or 
not the result is zero. However, for alternatives 0 and 2 you can 
use N to test the result (interpreted as signed) for <0 resp. >= 0.
'clobber' means the instruction leaves CC in a mess. 'none' means CC is 
unchanged.
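To make this concrete, here is a sketch of such a pattern (loosely modelled
on the 4.3 avr "andqi3" insn; the constraints and output templates are from
memory, so treat this as an illustration only):

(define_insn "andqi3"
  [(set (match_operand:QI 0 "register_operand" "=r,d")
        (and:QI (match_operand:QI 1 "register_operand" "%0,0")
                (match_operand:QI 2 "nonmemory_operand" "r,i")))]
  ""
  "@
	and %0,%2
	andi %0,lo8(%2)"
  ;; AND and ANDI both set Z and N according to the full QI result,
  ;; so a following conditional branch may rely on them.
  [(set_attr "length" "1,1")
   (set_attr "cc" "set_zn,set_zn")])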


Now, the first thing that surprises me is clobber. Given a processor 
instruction description we know exactly which instructions set what. 
What's the need for clobber?


Check the following example from avr:

(define_insn "*strlenhi"
  [(set (match_operand:HI 0 "register_operand" "=e")
(unspec:HI [(mem:BLK (match_operand:HI 1 "register_operand" "%0"))
(const_int 0)
(match_operand:HI 2 "immediate_operand" "i")]
    UNSPEC_STRLEN))]
  ""
  "ld __tmp_reg__,%a0+
	tst __tmp_reg__
	brne .-6"
  [(set_attr "length" "3")
   (set_attr "cc" "clobber")])


From the instruction manual [1] I have: ld changes none, tst changes Z and N, 
and brne changes none, so I would expect this instruction to have 
(set_attr "cc" "set_zn") instead of the clobber. Why is this?


No. Z will always be set after that insn, even if op0 is not zero. Thus, 
Z has nothing to do with op0, ditto for N. Therefore, CC is 'clobber' 
because it is not unchanged.


Johann


Re: avr compilation

2011-03-18 Thread Georg-Johann Lay
Paulo J. Matos schrieb:
> Hi all,
> 
> I am looking at the avr backend in order to try to sort some things out
> on my own backend.
> 
> One of the tests I am doing is by compiling the following:
> int x = 0x1010;
> int y = 0x0101;
> 
> int add(void)
> {
>   return x+y;
> }
> 
> It compiles to (in gcc-4.3.5_avr with -Os)
> add:
> /* prologue: function */
> /* frame size = 0 */
> lds r18,y
> lds r19,(y)+1
> lds r24,x
> lds r25,(x)+1
> add r18,r24
> adc r19,r25
> mov r24,r18
> mov r25,r19
> /* epilogue start */
> ret
> 
> I don't know much avr assembler so bear with me but I would expect this

Note that the last two moves are QI moves, whereas the add is HI.

Without splitting HI, the moves will disappear; try -fno-split-wide-types.
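For example (a sketch; add.c is just a stand-in name for the file holding
the test case):

$ avr-gcc -Os -mmcu=atmega8 -fno-split-wide-types -S add.c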

Johann

> to be written:
> add:
> /* prologue: function */
> /* frame size = 0 */
> lds r18,y
> lds r19,(y)+1
> lds r24,x
> lds r25,(x)+1
> add r24,r18
> adc r25,r19
> /* epilogue start */
> ret
> 
> By inverting the add arguments we save two mov instructions.
> 
> If it can be written like this any ideas on why GCC is avoiding it?

Try a newer version of gcc, like 4.5.2.
> 
> Cheers,
> 
> -- 
> PMatos



Using secondary reload to reload CONST_INT?

2011-03-20 Thread Georg-Johann Lay

The AVR controller basically has two kinds of hard registers:

* LD_REGS (constraint "d") that can move immediates
* NO_LD_REGS (constraint "l") that cannot move immediates

The movsi insn of the avr backend does not supply an "l,i" constraint 
alternative, so reload takes care of that and allocates an 
intermediate SImode d-reg.


The drawback is that this allocates 4 GPRs (AVR is an 8-bit machine). 
However, one "d"-reg would be sufficient to copy a const into class "l".


So I defined TARGET_SECONDARY_RELOAD to take care of input reloads of 
CONST_INT for SImode. The hook returns NO_REGS and sets sri->icode to the 
insn code of an insn similar to an old-fashioned input reload pattern.
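As a minimal sketch of what I mean (not the actual avr code; the hook
signature is the current one, and CODE_FOR_reload_insi as well as the cost
value are purely illustrative):

static reg_class_t
my_secondary_reload (bool in_p, rtx x, reg_class_t rclass,
                     machine_mode mode, secondary_reload_info *sri)
{
  /* An SImode constant that goes into a class not contained in LD_REGS
     needs a helper pattern which clobbers one d-reg.  */
  if (in_p
      && CONST_INT_P (x)
      && mode == SImode
      && !reg_class_subset_p (rclass, LD_REGS))
    {
      sri->icode = CODE_FOR_reload_insi;  /* hypothetical helper insn */
      sri->extra_cost = 2;                /* illustrative value */
    }

  /* NO_REGS: no intermediate register class, just the helper insn.  */
  return NO_REGS;
}

#undef  TARGET_SECONDARY_RELOAD
#define TARGET_SECONDARY_RELOAD my_secondary_reload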


All this works fine and as expected, but I have some questions:

1) The internals just mention TARGET_SECONDARY_RELOAD for REG-MEM and
   for REG-REG moves, and say nothing about REG-CONST moves. So is using
   secondary reloads for CONST_INT (or other consts) as outlined
   above a defined use case I can rely on?

2) The secondary reload hook is always called with
   regclass = GENERAL_REGS, even in cases where the class
   is known to be NO_LD_REGS, e.g. when preparing arguments
   for a function call. Why is this? I would expect to get the
   smallest available regclass. If a reg lies in LD_REGS, a
   secondary reload is not needed, but how can I tell whether the
   class is always GENERAL_REGS? Is it ensured that the secondary
   reload hook is never called when a constraint alternative
   matches, like "d,i"?

3) What is the "unit" of sri->extra_cost? Compared to COSTS_N_INSNS?
   Or compared to "?" constraint cost?

TARGET_CANNOT_FORCE_CONST_MEM always returns true.
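Spelled out as a sketch (assuming the current two-argument form of the
hook; the function name is mine):

static bool
my_cannot_force_const_mem (machine_mode mode ATTRIBUTE_UNUSED,
                           rtx x ATTRIBUTE_UNUSED)
{
  /* Never put constants into the constant pool on this target.  */
  return true;
}

#undef  TARGET_CANNOT_FORCE_CONST_MEM
#define TARGET_CANNOT_FORCE_CONST_MEM my_cannot_force_const_mem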

Thanks, Johann


Re: Using secondary reload to reload CONST_INT?

2011-03-21 Thread Georg-Johann Lay
Denis Chertykov schrieb:
> 2011/3/20 Georg-Johann Lay :
>> The AVR controller basically has two kinds of hard registers:
>>
>> * LD_REGS (constraint "d") that can move immediates
>> * NO_LD_REGS (constraint "l") that cannot move immediates
>>
>> movsi insn of avr backend does not supply an "l,i" constraint alternative,
>> so that reload takes care of that and allocates an intermediate SImode
>> d-reg.
> 
> The *movsi pattern:
> (define_insn "*movsi"
>   [(set (match_operand:SI 0 "nonimmediate_operand" "=r,r,r,Qm,!d,r")
> (match_operand:SI 1 "general_operand"   "r,L,Qm,rL,i,i"))]
> 
> Last constraints pair `r' and `i' supply an `l' and `i' because `r' = `dl'.
> 
> Also, to optimize movsi exists the following peephole2:
> (define_peephole2 ; movsi_lreg_const
>   [(match_scratch:QI 2 "d")
>(set (match_operand:SI 0 "l_register_operand" "")
> (match_operand:SI 1 "immediate_operand" ""))
>(match_dup 2)]
>   "(operands[1] != const0_rtx
> && operands[1] != constm1_rtx)"
>   [(parallel [(set (match_dup 0) (match_dup 1))
> (clobber (match_dup 2))])]
>   "")
> 
> 
>> The drawback is that this allocates 4 GPRs (AVR is 8-bit machine).
> 
> Please, provide an example.

Yes, you're right. There is this alternative "r,i" which uses
__tmp_reg__ to temporarily back up a d-reg.

So my intention was actually to remove the "r,i" alternative and have
it handled by secondary reload; the original question is whether secondary
reload is designed to handle the reg-const combination that will occur
in that case.

The default code to load a const needs 10 instructions:

  /* Last resort, better than loading from memory.  */
  *l = 10;
  return (AS2 (mov,__tmp_reg__,r31) CR_TAB
  AS2 (ldi,r31,lo8(%1)) CR_TAB
  AS2 (mov,%A0,r31) CR_TAB
  AS2 (ldi,r31,hi8(%1)) CR_TAB
  AS2 (mov,%B0,r31) CR_TAB
  AS2 (ldi,r31,hlo8(%1))CR_TAB
  AS2 (mov,%C0,r31) CR_TAB
  AS2 (ldi,r31,hhi8(%1))CR_TAB
  AS2 (mov,%D0,r31) CR_TAB
  AS2 (mov,r31,__tmp_reg__));

With a d-reg from secondary reload it's at most 8 instructions.
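For illustration, with a spare d-reg (r24 picked arbitrarily here) the
sequence I have in mind is something like:

	ldi r24,lo8(%1)   ; d-reg supplied by the secondary reload
	mov %A0,r24
	ldi r24,hi8(%1)
	mov %B0,r24
	ldi r24,hlo8(%1)
	mov %C0,r24
	ldi r24,hhi8(%1)
	mov %D0,r24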

The current code relies on a d-reg being available in peep2, and the
overhead for HI resp. QI is even higher.

Thus, if no d-reg is at hand in peep2, we get the big code.

A secondary reload would always supply a d-reg because the register
allocator would arrange for that, and we would not have to rely on peep2.

Moreover, the extra_cost of the secondary reload might allow giving more
accurate costs.

Johann



Re: Using secondary reload to reload CONST_INT?

2011-03-21 Thread Georg-Johann Lay

Denis Chertykov schrieb:
>

Please, provide test results.

Denis.


Ok, will take some time.

Johann




