VRP causing extra register usage

2015-11-12 Thread Senthil Kumar Selvaraj
Hi,

  When analyzing code size differences between a 4.9.x compiler and
  trunk for the AVR target, I found quite a few cases where extra
  registers were being used to hold smaller types (two 8 bit registers
  for a uint8_t, for example).

  On deeper analysis, I found that the VRP pass (gcc/tree-vrp.c) was the point
  at which the dumps start to diverge.

  For code like this

#include 

uint16_t get(const uint16_t wvalue)
{
  const uint8_t type = (wvalue >> 8);
  uint16_t size = 0;

  if (type == 1)
  {
size = 20;
  }
  else if (type == 2)
  {
size = 32;
  }
  return size;
}

  VRP substitutes wvalue for the type variable in the conditionals 
  (simplify_cond_using_ranges:9506), as it figures out that the range 
  of wvalue is the same as a uint8_t). That is, code goes from

:
_3 = wvalue_2(D) >> 8;
type_4 = (const uint8_t) _3;
if (type_4 == 1)
  goto ;
else
  goto ;

to

:
_3 = wvalue_2(D) >> 8;
type_4 = (const uint8_t) _3;
if (_3 == 1)
  goto ;
else
  goto ;

  This "widening" causes later passes to allocate extra registers 
  (holding zeros for the extra bits) and generate extra comparisons
  for the AVR target.

  I found that in the 4.9.2 compiler I was benchmarking against, VRP
  didn't figure out the range for wvalue and therefore the folding
  didn't happen, which was why the code was better.

  With the native compiler on my machine (gcc 5.2 x86_64) - replacing 
  uint8_t by uint32_t and uint16_t by uint64_t, and shifting right by 
  32 bits instead of 8 shows the same effect - the generated code uses
  rdi instead of just di to hold the type variable.

  Is this a bug? Should the folding happen only if the type
  conversion was from a smaller type to a bigger one? Or is the backend
  supposed to detect this pattern and deal with it?

Regards
Senthil


details-raw vrp1 dump

;; Function get (get, funcdef_no=0, decl_uid=1522, cgraph_uid=0, symbol_order=0)

;; 1 loops found
;;
;; Loop 0
;;  header 0, latch 1
;;  depth 0, outer -1
;;  nodes: 0 1 2 3 4 5
;; 2 succs { 5 3 }
;; 3 succs { 4 5 }
;; 4 succs { 5 }
;; 5 succs { 1 }

ASSERT_EXPRs to be inserted

Assertions to be inserted for type_4
if (type_4 == 1)

BB #3
EDGE 2->3 2 [55.9%]  (FALSE_VALUE,EXECUTABLE)
PREDICATE: type_4 ne_expr 1




Updating SSA:
Registering new PHI nodes in block #2
Updating SSA information for statement type_4 = (const uint8_t) _3;
Updating SSA information for statement if (type_4 == 1)
Registering new PHI nodes in block #3
Updating SSA information for statement type_6 = ASSERT_EXPR ;
Updating SSA information for statement if (type_4 == 2)
Registering new PHI nodes in block #4
Registering new PHI nodes in block #5

SSA replacement table
N_i -> { O_1 ... O_j } means that N_i replaces O_1, ..., O_j

type_6 -> { type_4 }
Incremental SSA update started at block: 2
Number of blocks in CFG: 6
Number of blocks to update: 2 ( 33%)
Affected blocks: 2 3



SSA form after inserting ASSERT_EXPRs
get (const uint16_t wvalue, const uint8_t windex, const void * * const address)
{
  uint16_t size;
  const uint8_t type;
  unsigned int _3;

  :
  _3 = wvalue_2(D) >> 8;
  type_4 = (const uint8_t) _3;
  if (type_4 == 1)
goto ;
  else
goto ;

  :
  type_6 = ASSERT_EXPR ;
  if (type_6 == 2)
goto ;
  else
goto ;

  :

  :
  # size_1 = PHI <20(2), 0(3), 32(4)>
  return size_1;

}


Immediate_uses: 

size_1 : --> single use.
return size_1;

wvalue_2(D) : --> single use.
_3 = wvalue_2(D) >> 8;

_3 : --> single use.
type_4 = (const uint8_t) _3;

type_4 : -->3 uses.
type_6 = ASSERT_EXPR ;
type_6 = ASSERT_EXPR ;
if (type_4 == 1)

.MEM_5(D) : --> single use.
# VUSE <.MEM_5(D)>
return size_1;

type_6 : --> single use.
if (type_6 == 2)

Adding destination of edge (0 -> 2) to worklist

Simulating block 2

Visiting statement:
_3 = wvalue_2(D) >> 8;
Intersecting
  [0, 255]
and
  [0, +INF]
to
  [0, 255]
Found new range for _3: [0, 255]
interesting_ssa_edges: adding SSA use in type_4 = (const uint8_t) _3;
marking stmt to be not simulated again

Visiting statement:
type_4 = (const uint8_t) _3;
Found new range for type_4: [0, +INF]
interesting_ssa_edges: adding SSA use in type_6 = ASSERT_EXPR ;
interesting_ssa_edges: adding SSA use in if (type_4 == 1)
marking stmt to be not simulated again

Visiting statement:
if (type_4 == 1)

Visiting conditional with predicate: if (type_4 == 1)

With known ranges
type_4: [0, +INF]

Predicate evaluates to: DON'T KNOW
Adding destination of edge (2 -> 5) to worklist
Adding destination of edge (2 -> 3) to worklist

Simulating block 3

Visiting statement:
type_6 = ASSERT_EXPR ;
Intersecting
  ~[1, 1]  EQUIVALENCES: { type_4 } (1 elements)
and
  [0, +INF]
to
  ~[1, 1]  EQUIVALENCES: { type_4 } (1 elements)
Found new range for type_6: ~[1, 1]
interesting_ssa_edges: adding SSA use in if (type_6 == 2)
marking stmt to be not simulated again

Visiting statement:
if (type_6 == 2)

Visiting conditional with predicate: if (type_6 == 2)

With known ranges
type_6: ~[1,

Re: VRP causing extra register usage

2015-11-13 Thread Senthil Kumar Selvaraj

On Thu, Nov 12, 2015 at 08:37:02PM +0100, Richard Biener wrote:
> On November 12, 2015 8:10:05 PM GMT+01:00, Senthil Kumar Selvaraj 
>  wrote:
> >Hi,
> >
> >  When analyzing code size differences between a 4.9.x compiler and
> >  trunk for the AVR target, I found quite a few cases where extra
> >  registers were being used to hold smaller types (two 8 bit registers
> >  for a uint8_t, for example).
> >
> >On deeper analysis, I found that the VRP pass (gcc/tree-vrp.c) was the
> >point
> >  at which the dumps start to diverge.
> >
> >  For code like this
> >
> >#include 
> >
> >uint16_t get(const uint16_t wvalue)
> >{
> >  const uint8_t type = (wvalue >> 8);
> >  uint16_t size = 0;
> >
> >  if (type == 1)
> >  {
> >size = 20;
> >  }
> >  else if (type == 2)
> >  {
> >size = 32;
> >  }
> >  return size;
> >}
> >
> >  VRP substitutes wvalue for the type variable in the conditionals 
> >  (simplify_cond_using_ranges:9506), as it figures out that the range 
> >  of wvalue is the same as a uint8_t). That is, code goes from
> >
> >:
> >_3 = wvalue_2(D) >> 8;
> >type_4 = (const uint8_t) _3;
> >if (type_4 == 1)
> >  goto ;
> >else
> >  goto ;
> >
> >to
> >
> >:
> >_3 = wvalue_2(D) >> 8;
> >type_4 = (const uint8_t) _3;
> >if (_3 == 1)
> >  goto ;
> >else
> >  goto ;
> >
> >  This "widening" causes later passes to allocate extra registers 
> >  (holding zeros for the extra bits) and generate extra comparisons
> >  for the AVR target.
> >
> >  I found that in the 4.9.2 compiler I was benchmarking against, VRP
> >  didn't figure out the range for wvalue and therefore the folding
> >  didn't happen, which was why the code was better.
> >
> >  With the native compiler on my machine (gcc 5.2 x86_64) - replacing 
> >  uint8_t by uint32_t and uint16_t by uint64_t, and shifting right by 
> >  32 bits instead of 8 shows the same effect - the generated code uses
> >  rdi instead of just di to hold the type variable.
> >
> >  Is this a bug? Should the folding happen only if the type
> >  conversion was from a smaller type to a bigger one? Or is the backend
> >  supposed to detect this pattern and deal with it?
> 
> We should probably avoid widening beyond word_mode (or sth else if there is 
> sth suitable).
> 

Hmm, that does fix this problem. The below patch allows folding only if
the operand size is smaller or equal to a word.

diff --git a/gcc/tree-vrp.c b/gcc/tree-vrp.c
index e2393e4..3f5668c 100644
--- a/gcc/tree-vrp.c
+++ b/gcc/tree-vrp.c
@@ -9467,6 +9467,8 @@ simplify_cond_using_ranges (gcond *stmt)
   innerop = gimple_assign_rhs1 (def_stmt);
 
   if (TREE_CODE (innerop) == SSA_NAME
+&& (GET_MODE_SIZE(TYPE_MODE(TREE_TYPE(innerop)))
+  <= GET_MODE_SIZE(word_mode))
  && !POINTER_TYPE_P (TREE_TYPE (innerop)))
{
  value_range *vr = get_value_range (innerop);

However, if the type conversion was from a smaller type to a larger type, and 
the
smaller type was bigger than a word, this patch results in worse code. For 
example.

#include 

uint16_t get(const uint16_t wvalue)
{
  const uint32_t type = (wvalue >> 8);
  uint16_t size = 0;

  if (type == 1)
  {
size = 20;
  }
  else if (type == 2)
  {
size = 32;
  }
  return size;
}

Before the patch, the folding done caused type to be substituted with wvalue, 
and the
smaller size resulted in better code. After the patch, since wvalue is bigger 
than a 
word (for AVR), the folding doesn't happen and registers are allocated to hold 
all 32
bits.

How does the below patch look? It does the folding only if it is beneficial 
i.e., the
value being substituted (innerop) is smaller than or equal the current one 
(op0)?

diff --git a/gcc/tree-vrp.c b/gcc/tree-vrp.c
index e2393e4..3f5668c 100644
--- a/gcc/tree-vrp.c
+++ b/gcc/tree-vrp.c
@@ -9467,6 +9467,8 @@ simplify_cond_using_ranges (gcond *stmt)
   innerop = gimple_assign_rhs1 (def_stmt);
 
   if (TREE_CODE (innerop) == SSA_NAME
+&& (GET_MODE_SIZE(TYPE_MODE(TREE_TYPE(innerop)))
+  <= GET_MODE_SIZE(TYPE_MODE(TREE_TYPE(op0
  && !POINTER_TYPE_P (TREE_TYPE (innerop)))
{
  value_range *vr = get_value_range (innerop);


Regards
Senthil

> Richard.
> 
> >Regards
> >Senthil
> >
> >
> >details-raw vrp1 dump
> >
> >;; Function get (get, funcdef_no=0, decl_uid=1522, cgraph_uid=0,
> >symbol_order=0)
> >
> >;; 1 loops found
> >;;
> >;; Loop 0
> >;;  header

split_live_ranges_for_shrink_wrap and !SHRINK_WRAPPING_ENABLED?

2016-04-18 Thread Senthil Kumar Selvaraj
Hi,

  While tracking down a performance regression for the AVR target from
  4.9.x to trunk, I noticed that failing the SHRINK_WRAPPING_ENABLED
  check in ira.c led to noticeably worse code for the following
  case. That check prevents live range splitting of pseudos containing
  formal args, and between 4.9.x and now, the check was modified from
  just flag_shrink_wrap to now flag_shrink_wrap && targetm.have_simple_return.

#include 

extern int bar(uint32_t, uint16_t);
extern int baz(void);

int foo(uint8_t x, uint32_t y, uint16_t z)
{
   uint8_t status;
   switch(x)
   {
  case 0:
status = bar(y, z); 
if (status == 0)
   status = baz();
break;

   }
   return status;
}

If the live range splitting of pseudos containing formal args is not
done in IRA, then reload uses callee-saved registers (SI r12), resulting in
extra saves and restores in the prologue/epilogue.


(insn 3 2 5 2 (set (reg/v:SI 12 r12 [orig:47 y ] [47])
(reg:SI 20 r20 [ y ])) ../ctrl_access.c:7 95 {*movsi}
 (nil))
(note 5 3 8 2 NOTE_INSN_FUNCTION_BEG)
(insn 8 5 9 2 (set (cc0)
(compare (reg:QI 24 r24 [ x ])
(const_int 0 [0]))) ../ctrl_access.c:9 404 {*cmpqi}
 (nil))
(jump_insn 9 8 13 2 (set (pc)
(if_then_else (ne (cc0)
(const_int 0 [0]))
(label_ref:HI 25)
(pc))) ../ctrl_access.c:9 428 {branch}
 (int_list:REG_BR_PROB 6102 (nil))
 -> 25)
(note 13 9 14 3 [bb 3] NOTE_INSN_BASIC_BLOCK)
(insn 14 13 15 3 (set (reg:HI 20 r20)
(reg/v:HI 18 r18 [orig:48 z ] [48])) ../ctrl_access.c:12 83 {*movhi}
 (nil))
(insn 15 14 16 3 (set (reg:SI 22 r22)
(reg/v:SI 12 r12 [orig:47 y ] [47])) ../ctrl_access.c:12 95 {*movsi}
 (nil))


leading to code like

push r12
push r13
push r14
push r15
/* prologue: function */
/* frame size = 0 */
/* stack size = 4 */
.L__stack_usage = 4
movw r12,r20
movw r14,r22
cpse r24,__zero_reg__
rjmp .L2
movw r20,r18
movw r24,r14
movw r22,r12
call bar


Whereas in 4.9.2, live range splitting allows reload to generate far
better RTL, like so


(insn 8 5 9 2 (set (cc0)
(compare (reg:QI 24 r24 [ x ])
(const_int 0 [0]))) ../ctrl_access.c:9 404 {*cmpqi}
 (nil))
(jump_insn 9 8 13 2 (set (pc)
(if_then_else (ne (cc0)
(const_int 0 [0]))
(label_ref:HI 25)
(pc))) ../ctrl_access.c:9 428 {branch}
 (int_list:REG_BR_PROB 6102 (nil))
 -> 25)
(note 13 9 42 3 [bb 3] NOTE_INSN_BASIC_BLOCK)
(insn 42 13 14 3 (set (reg/v:SI 22 r22 [orig:48 y ] [48])
(reg/v:SI 20 r20 [orig:48 y ] [48])) 95 {*movsi}
 (nil))
(insn 14 42 16 3 (set (reg:HI 20 r20)
(reg/v:HI 18 r18 [orig:49 z ] [49])) ../ctrl_access.c:12 83 {*movhi}
 (nil))


leading to code like

cpse r24,__zero_reg__
rjmp .L2
movw r24,r22
movw r22,r20
movw r20,r18
call bar


In either case, shrink wrapping isn't done in the end, but the live
range splitting ends up helping reload, AFAICT.

I verified that adding simple_return pattern to avr.md fixes the
regression, but the code size increase if shrink wrapping
really does happen would badly hurt a flash constrained target like the
AVR. I also noticed that targets like sh and arm (for thumb) disable it when
optimizing for size (using the condition part in simple_return pattern)
- understandable.

What do you guys think is the right way to deal with this? Should I look
into why range splitting helps reload and have reload itself handle this?

Regards
Senthil



Re: show size of stack needed by functions

2016-05-09 Thread Senthil Kumar Selvaraj

Eric Botcazou writes:

>> Output of -fstack-usage is not accurate
>> ===
>> 
>> This article mentions a "call cost":
>> https://mcuoneclipse.com/2015/08/21/gnu-static-stack-usage-analysis/
>> 
>> I checked for myself, by looking at the change of the stackpointer with a
>> debugger, and, yes, there seems to be a constant mismatch (2 bytes with
>> avr-gcc-5.3) between change of stack pointer and output of -fstack-usage.
>> In some rare cases there are more differences, which I didn't understand
>> yet.
>
> That's a bug, very likely in the AVR back-end, which must be fixed by someone 
> who knows the AVR architecture.

I'll take a look.

Regards
Senthil


Re: How to improve the location of a gcc diagnostic

2016-06-23 Thread Senthil Kumar Selvaraj

David Malcolm writes:

> A user filed a bug about a bad location in a warning.  It was marked as
> an "easyhack" in bugzilla, and I had a go at fixing it.
>
> I though it may be useful for new GCC developers if I document what I
> did to fix it.
>
> FWIW, the bug was PR c/71610
>   i.e. https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71610)
> ("Improve location for "warning: ISO C restricts enumerator values to
> range of 'int' [-Wpedantic]"?").
>
>
> Step 1: create a minimal reproducer for the diagnostic
>
> This was easy for this bug: I copied it from the bug report.  Try to
> avoid #include if possible; start with the simplest possible reproducer
> (to minimize effort in the debugger), and work from there.
>
> If you're working on a bug in our bugzilla it's good to click on the
> "take" button next to "Assignee" to mark yourself as assignee, and to
> set the status to ASSIGNED, so that we don't duplicate effort.
>
>
> Step 2: create a file under the relevant subdirectory of gcc/testsuite.
>
> This will be under one of gcc.dg, g++.dg or c-c++-common. If possible,
> create it under c-c++-common so that we can test both the C and C++
> frontends.
>
> I created a file called "pr71610.c" under c-c++-common.  (If this was
> for a specific warning, the name of the warning may have been better,
> but my case was part of "-Wpedantic", which is too broad for a testcase
> title).
>
>
> Step 3: run the reproducer under DejaGnu
>
> With a freshly built gcc, I ran the following from the "gcc" build
> directory:
>
>   make -jN -k && make check-gcc RUNTESTFLAGS="-v -v dg.exp=pr71610*"
>

Worth mentioning that you need to configure/make with CXXFLAGS or BOOT_*
flags set to -O0 -g3 so that breakpoints actually work?

Regards
Senthil


Re: ubsan and Dejagnu not having a big enough buffer in some cases

2016-07-20 Thread Senthil Kumar Selvaraj

Richard Biener writes:

> On July 20, 2016 2:01:18 AM GMT+02:00, Andrew Pinski  
> wrote:
>>Hi,
>>  I noticed that ubsan testsuite sometimes has failures due to dejagnu
>>buffer gets full and we no longer match on the output any more.
>>As you can see from the .log file:
>>/data1/jenkins/workspace/BuildThunderX_native_gcc_6/gcc/gcc/testsuite/c-c++-common/ubsan/float-cast-overflow-1.c:88:3:
>>runtime error: value iPASS: c-c++-common/ubsan/float-cast-overflow-1.c
>>  -O0  execution test
>>
>>This looks like I am using a much bigger path name than what most
>>people use which is why I am seeing it fail.  Is there a way to
>>increase the buffer for dejagnu/expect so dg-output matches?  Looks
>>like it is limited to 500k (if I read dejagnu code correctly).
>>
>>Or can/should we split up float-cast-overflow-1.c instead?
>
> I see this for some of the larger C frontend tests with lots of expected 
> errors/warnings as well.

Are you guys getting this everytime or is it sporadic?

I recently ran into something similar, except that there were more parts
involved (Windows subsystem for Linux, wrapper process for gcc etc..)
and that it was random. After messing around with stdout/stderr buffer
sizes and finding they didn't really help, I eventually ran strace to
figure out what the heck was happening. The runtest process stops when a
read syscall gets EIO, but if it so happens that there's buffered data that's
already been read and not written yet, they get dropped.

9631  read(8, "\r\n/gcc/gcc/testsuite/gcc.dg/c99-vla-jump-1.c: 
I..e/gcc.dg/c99-vl", 4096) = 4096
9631  read(8, "a-jump-1.c:300:124: note...ote: 'f' declared here\r\r\n", 4096) 
= 1380
9631  write(7, "1.c:283:124: note: la..er with var", 4096) = 4096
9631  write(7, "iably modified typ..99-vla-jump-1.c:300:163: error: j", 1905) = 
1905
...
9631  write(4, "\0", 1) = 1
...
9631  read(8, 0x13a49b8, 4096)  = -1 EIO (Input/output error)
9631  fcntl(8, F_GETFL) = 0x802 (flags O_RDWR|O_NONBLOCK)
9631  fcntl(8, F_SETFL, O_RDWR) = 0
9631  fcntl(8, F_GETFL) = 0x2 (flags O_RDWR)
9631  close(8)  = 0
9631  open("/dev/null", O_RDONLY)   = 8
9631  fcntl(8, F_SETFD, FD_CLOEXEC) = 0
9631  fcntl(8, F_SETFD, FD_CLOEXEC) = 0
9631  wait4(9691, [{WIFEXITED(s) && WEXITSTATUS(s) == 1}], 0, NULL) = 9691
9631  close(8)  = 0
...
9631  write(7, "compiler exited with status 1\n", 30) = 30

The 1380 bytes read in the last read call end up getting dropped on the
floor. 

Andrew, is the 500K limit you see in remote.exp:standard_wait?

Regards
Senthil


[avr] fno-caller-saves and regression tests

2016-08-08 Thread Senthil Kumar Selvaraj
Hi Johann,

  Turning off -fcaller-saves for AVR makes the testcase I had for PR 71873
  pass unless I explicitly add -fcaller-saves to force the compiler to
  generate the triggering insn patterns.

  Wonder if we should modify the existing test cases in gcc.target/avr to
  be tested both ways (with and without caller saves)? At least the
  register allocation related ones probably won't catch regressions.

Regards
Senthil



Worse code after bbro?

2016-12-21 Thread Senthil Kumar Selvaraj
Hi,

  For this C code (slightly modified from PR 30908)

void wait(int i)
{
while (i-- > 0)
asm volatile("nop" ::: "memory");
}

  gcc 4.8 at -Os produces

jmp .L2
.L3:
nop
decl%edi
.L2:
testl   %edi, %edi
jg  .L3
ret

whereas gcc trunk (and 4.9 onwards, from a quick check) produces

.L2:
testl   %edi, %edi
jle .L5
nop
decl%edi
jmp .L2
.L5:
ret

The code size is identical, but the trunk version executes one more
instruction everytime the loop runs (explicit jump to .L5 with trunk vs
fallthrough with 4.8) - it's faster only if the loop never runs. This
happens irrespective of the memory clobber inline assembler statement.

Digging into the dump files, I found that the transformation occurs in
the bb reorder pass, when it calls cfg_layout_initialize, which
eventually calls try_redirect_by_replacing_jump with in_cfglayout set to
true. That function then removes the jump and causes the RTL
transformation that eventually results in slower code.

Is this intentional? If not, what would be the best way to fix this?

Regards
Senthil

RTL before and after bbro.

Before:

(jump_insn 24 6 25 2 (set (pc)
(label_ref 15)) "pr30908.c":3 678 {jump}
 (nil)
 -> 15)
(barrier 25 24 17)
(code_label 17 25 12 3 3 "" [1 uses])
(note 12 17 13 3 [bb 3] NOTE_INSN_BASIC_BLOCK)
(insn 13 12 14 3 (parallel [
(asm_operands/v ("nop") ("") 0 []
 []
 [] pr30908.c:4)
(clobber (mem:BLK (scratch) [0  A8]))
(clobber (reg:CCFP 18 fpsr))
(clobber (reg:CC 17 flags))
]) "pr30908.c":4 -1
 (expr_list:REG_UNUSED (reg:CCFP 18 fpsr)
(expr_list:REG_UNUSED (reg:CC 17 flags)
(nil
(insn 14 13 15 3 (parallel [
(set (reg:SI 5 di [orig:90 ivtmp.9 ] [90])
(plus:SI (reg:SI 5 di [orig:90 ivtmp.9 ] [90])
(const_int -1 [0x])))
(clobber (reg:CC 17 flags))
]) 210 {*addsi_1}
 (expr_list:REG_UNUSED (reg:CC 17 flags)
(nil)))
(code_label 15 14 16 4 2 "" [1 uses])
(note 16 15 18 4 [bb 4] NOTE_INSN_BASIC_BLOCK)
(insn 18 16 19 4 (set (reg:CCNO 17 flags)
(compare:CCNO (reg:SI 5 di [orig:90 ivtmp.9 ] [90])
(const_int 0 [0]))) "pr30908.c":3 3 {*cmpsi_ccno_1}
 (nil))
(jump_insn 19 18 30 4 (set (pc)
(if_then_else (gt (reg:CCNO 17 flags)
(const_int 0 [0]))
(label_ref 17)
(pc))) "pr30908.c":3 646 {*jcc_1}
 (expr_list:REG_DEAD (reg:CCNO 17 flags)
(int_list:REG_BR_PROB 8500 (nil)))
 -> 17)
(note 30 19 28 5 [bb 5] NOTE_INSN_BASIC_BLOCK)
(note 28 30 29 5 NOTE_INSN_EPILOGUE_BEG)
(jump_insn 29 28 31 5 (simple_return) "pr30908.c":5 708 {simple_return_internal}
 (nil)
 -> simple_return)

After:


(code_label 15 6 16 3 2 "" [1 uses])
(note 16 15 18 3 [bb 3] NOTE_INSN_BASIC_BLOCK)
(insn 18 16 19 3 (set (reg:CCNO 17 flags)
(compare:CCNO (reg:SI 5 di [orig:90 ivtmp.9 ] [90])
(const_int 0 [0]))) "pr30908.c":3 3 {*cmpsi_ccno_1}
 (nil))
(jump_insn 19 18 12 3 (set (pc)
(if_then_else (le (reg:CCNO 17 flags)
(const_int 0 [0]))
(label_ref:DI 34)
(pc))) "pr30908.c":3 646 {*jcc_1}
 (expr_list:REG_DEAD (reg:CCNO 17 flags)
(int_list:REG_BR_PROB 1500 (nil)))
 -> 34)
(note 12 19 13 4 [bb 4] NOTE_INSN_BASIC_BLOCK)
(insn 13 12 14 4 (parallel [
(asm_operands/v ("nop") ("") 0 []
 []
 [] pr30908.c:4)
(clobber (mem:BLK (scratch) [0  A8]))
(clobber (reg:CCFP 18 fpsr))
(clobber (reg:CC 17 flags))
]) "pr30908.c":4 -1
 (expr_list:REG_UNUSED (reg:CCFP 18 fpsr)
(expr_list:REG_UNUSED (reg:CC 17 flags)
(nil
(insn 14 13 35 4 (parallel [
(set (reg:SI 5 di [orig:90 ivtmp.9 ] [90])
(plus:SI (reg:SI 5 di [orig:90 ivtmp.9 ] [90])
(const_int -1 [0x])))
(clobber (reg:CC 17 flags))
]) 210 {*addsi_1}
 (expr_list:REG_UNUSED (reg:CC 17 flags)
(nil)))
(jump_insn 35 14 36 4 (set (pc)
(label_ref 15)) -1
 (nil)
 -> 15)
(barrier 36 35 34)
(code_label 34 36 30 5 5 "" [1 uses])
(note 30 34 28 5 [bb 5] NOTE_INSN_BASIC_BLOCK)
(note 28 30 29 5 NOTE_INSN_EPILOGUE_BEG)
(jump_insn 29 28 31 5 (simple_return) "pr30908.c":5 708 {simple_return_internal}
 (nil)
 -> simple_return)


Re: Worse code after bbro?

2017-01-05 Thread Senthil Kumar Selvaraj

Segher Boessenkool writes:

> On Wed, Jan 04, 2017 at 10:05:49AM +0100, Richard Biener wrote:
>> > The code size is identical, but the trunk version executes one more
>> > instruction everytime the loop runs (explicit jump to .L5 with trunk vs
>> > fallthrough with 4.8) - it's faster only if the loop never runs. This
>> > happens irrespective of the memory clobber inline assembler statement.
>
> With -Os you've asked for smaller code, not faster code.
>
> All of the block reordering is based on heuristics -- there is no polynomial
> time and space algorithm to do it optimally, let alone the linear time and
> space we need in GCC -- so there always will be cases we do not handle
> optimally.  -Os does not get as much attention as -O2 etc., as well.
>
> OTOH this seems to be a pretty common case that we could handle.  Please
> open a PR to keep track of this?
>

Filed PR 79012 (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79012)

Regards
Senthil


Re: GCC ARM: aligned access

2014-08-31 Thread Senthil Kumar Selvaraj
On Mon, Sep 01, 2014 at 09:14:31AM +0800, Peng Fan wrote:
> 
> 
> On 09/01/2014 08:09 AM, Matt Thomas wrote:
> > 
> > On Aug 31, 2014, at 11:32 AM, Joel Sherrill  
> > wrote:
> > 
> >>> Hi,
> >>>
> >>> I am writing some code and found that system crashed. I found it was
> >>> unaligned access which causes `data abort` exception. I write a piece
> >>> of code and objdump
> >>> it. I am not sure this is right or not.
> >>>
> >>> command:
> >>> arm-poky-linux-gnueabi-gcc -marm -mno-thumb-interwork -mabi=aapcs-linux
> >>> -mword-relocations -march=armv7-a -mno-unaligned-access
> >>> -ffunction-sections -fdata-sections -fno-common -ffixed-r9 -msoft-float
> >>> -pipe  -O2 -c 2.c -o 2.o
> >>>
> >>> arch is armv7-a and used '-mno-unaligned access'
> >>
> >> I think this is totally expected. You were passed a u8 pointer which is 
> >> aligned for that type (no restrictions likely). You cast it to a type with 
> >> stricter alignment requirements. The code is just flawed. Some CPUs handle 
> >> unaligned accesses but not your ARM.
> > 
> armv7 and armv6 arch except armv6-m support unaligned access. a u8 pointer is 
> casted to u32 pointer, should gcc take the align problem into consideration 
> to avoid possible errors? because -mno-unaligned-access.
> > While armv7 and armv6 supports unaligned access, that support has to be 
> > enabled by the underlying O/S.  Not knowing the underlying environment, 
> > I can't say whether that support is enabled.  One issue we had in NetBSD
> > in moving to gcc4.8 was that the NetBSD/arm kernel didn't enable unaligned
> > access for armv[67] CPUs.  We quickly changed things so unaligned access
> > is supported.
> 
> Yeah. by set a hardware bit in arm coprocessor, unaligned access will not 
> cause data abort exception.
> I just wonder is the following correct? should gcc take the responsibility to 
> take care possible unaligned pointer `u8 *data`? because 
> -mno-unaligned-access is passed to gcc.
> 
> int func(u8 *data)
>   
> { 
>   
> return *(unsigned int *)data; 
>   
> }

I guess -mno-unaligned-access only applies to (potentially) unaligned addresses 
that the compiler itself is aware of, like packed struct members.
Otherwise gcc would have to consider *every* memory load/store as
unaligned and break them down into byte loads/stores. In the above case,
you are telling the compiler that you know it is word aligned (by
casting), and the compiler believes you :).

Regards
Senthil


Missed optimizations at -Os

2017-01-17 Thread Senthil Kumar Selvaraj
Hi,

  For this (reduced) test case

  
extern int x, y, z;
void foo(void);
void bar(void);
void blah(void);

void test (void)
{
  int flag = 0;
  flag = ((x && y) || z);

  if (flag && x && y) {
 bar();
  }
}

  I expected gcc -Os (x86_64, if it matters) to generate code equivalent to

if (x && y)
  bar();


  Instead, I get

test():
mov eax, DWORD PTR x[rip]
testeax, eax
je  .L2
cmp DWORD PTR y[rip], 0
jne .L3
.L2:
cmp DWORD PTR z[rip], 0
je  .L1
testeax, eax
je  .L1
.L3:
cmp DWORD PTR y[rip], 0
je  .L1
jmp bar()
.L1:
ret

At -O1 and above though, I get what I expected. At -O3
test():
mov edx, DWORD PTR x[rip]
testedx, edx
je  .L1
mov eax, DWORD PTR y[rip]
testeax, eax
je  .L1
jmp bar()
.L1:
rep ret


Tracing through the dumps, I see that dom2 is where the gimple starts
diverging. At -O3, dom2 clones the bb that tests z into two copies, and
I guess that enables jump threading and subsequent dse to optimize away the
second (redundant) check for x and y, as also the check for z. At -Os,
dom2 doesn't attemp the bb clone as it thinks it would increase code
size. 

I have two questions.

1. Is the analysis right? Is there anything that can be done to fix this?

2. If nothing can be done to fix this, is there some pass that can
rewire goto  in  to goto ?

Regards
Senthil


.dom2 at Os


test ()
{
  int x.1_1;
  int y.2_2;
  int z.3_3;
  int y.5_4;
  _Bool _8;
  _Bool _9;
  _Bool _10;

   [100.00%]:
  x.1_1 = x;
  if (x.1_1 != 0)
goto ; [50.00%]
  else
goto ; [50.00%]

   [50.00%]:
  y.2_2 = y;
  if (y.2_2 != 0)
goto ; [50.00%]
  else
goto ; [50.00%]

   [75.00%]:
  z.3_3 = z;
  _8 = x.1_1 != 0;
  _9 = z.3_3 != 0;
  _10 = _8 & _9;
  if (_10 != 0)
goto ; [25.60%]
  else
goto ; [74.40%]

   [35.37%]:
  y.5_4 = y;
  if (y.5_4 != 0)
goto ; [48.99%]
  else
goto ; [51.01%]

   [17.33%]:
  bar ();

   [100.00%]:
  return;

}

.dom2 at O3


test ()
{
  int x.1_1;
  int y.2_2;
  int z.3_11;
  _Bool _12;
  _Bool _13;
  _Bool _14;
  int y.5_15;
  int z.3_16;
  _Bool _17;
  _Bool _18;
  _Bool _19;
  int y.5_20;

   [100.00%]:
  x.1_1 = x;
  if (x.1_1 != 0)
goto ; [50.00%]
  else
goto ; [50.00%]

   [50.00%]:
  y.2_2 = y;
  if (y.2_2 != 0)
goto ; [50.00%]
  else
goto ; [50.00%]

   [100.00%]:
  return;

   [50.00%]:
  z.3_11 = z;
  _12 = x.1_1 != 0;
  _13 = z.3_11 != 0;
  _14 = _12 & _13;
  goto ; [100.00%]

   [18.04%]:
  y.5_15 = y;
  goto ; [100.00%]

   [25.00%]:
  z.3_16 = z;
  _17 = x.1_1 != 0;
  _18 = z.3_16 != 0;
  _19 = _17 & _18;
  if (_19 != 0)
goto ; [72.17%]
  else
goto ; [27.83%]

   [25.00%]:
  y.5_20 = y;
  bar ();
  goto ; [100.00%]

}





[Patch, testsuite] Add missing -gdwarf-2 flag in debug/dwarf2 testcase

2013-03-27 Thread Senthil Kumar Selvaraj
Hi,

global-used-types.c in gcc/testsuite/gcc.dg/debug/dwarf2 only specifies
-g in dg-options. For a target that is not configured to generate
dwarf-2 by default, the test fails looking for specific DWARF strings in
the generated assembly.

The patch below changes dg-options to -gdwarf-2. Can someone
apply if it is ok?

Regards
Senthil


2013-03-27  Senthil Kumar Selvaraj  

* gcc.dg/debug/dwarf2/global-used-types.c: Specify -gdwarf-2 in
dg-options


diff --git gcc/testsuite/gcc.dg/debug/dwarf2/global-used-types.c 
gcc/testsuite/gcc.dg/debug/dwarf2/global-used-types.c
index 54fa58a..03c6ede 100644
--- gcc/testsuite/gcc.dg/debug/dwarf2/global-used-types.c
+++ gcc/testsuite/gcc.dg/debug/dwarf2/global-used-types.c
@@ -1,6 +1,6 @@
 /*
  Contributed by Dodji Seketeli 
- { dg-options "-g -dA -fno-merge-debug-strings" }
+ { dg-options "-gdwarf-2 -dA -fno-merge-debug-strings" }
  { dg-do compile }
  { dg-final { scan-assembler-times "DIE \\(0x\[^\n\]*\\) 
DW_TAG_enumeration_type" 1 } }
  { dg-final { scan-assembler-times "DIE \\(0x\[^\n\]*\\) DW_TAG_enumerator" 2 
} }


_Alignas attribute and HOST_BITS_PER_INT

2013-03-27 Thread Senthil Kumar Selvaraj
Hi,

I was looking at why gcc.dg/c1x-align-3.c (test for errors, line 15) is
failing for the AVR target, and I found that the test expects _Alignas
with -__INT_MAX__ - 1 to fail with a "too large" error.

I looked at the code responsible for generating the error (c-common.c,
check_user_alignment) and found that it checks if the number of
bits in the user provided alignment is more than HOST_BITS_PER_INT -
BITS_PER_UNIT_LOG. For AVR, integer size is 16 bits, and therefore 
__INT_MAX__ is 2^15 - 1. HOST_BITS_PER_INT, however, is 32 bits, and
hence the error is not triggered.

Is it right to check against HOST_BITS_PER_INT, when the alignment
attribute only applies to the target? If the check is indeed correct,
should the test be modified to handle targets whose __INT__MAX__ is less
than 2^HOST_BITS_PER_INT - 1 ?

Regards
Senthil


Re: [Patch, testsuite] Add missing -gdwarf-2 flag in debug/dwarf2 testcase

2013-03-28 Thread Senthil Kumar Selvaraj
On Wed, Mar 27, 2013 at 08:43:53AM -0700, Mike Stump wrote:
> On Mar 27, 2013, at 1:02 AM, Senthil Kumar Selvaraj 
>  wrote:
> > global-used-types.c in gcc/testsuite/gcc.dg/debug/dwarf2 only specifies
> > -g in dg-options. For a target that is not configured to generate
> > dwarf-2 by default, the test fails looking for specific DWARF strings in
> > the generated assembly.
> > 
> > The patch below changes dg-options to -gdwarf-2. Can someone
> > apply if it is ok?
> 
> Ok.  [ that clears the way for application. ]

I found a bunch of other testcases missing -gdwarf-2, attached is a 
patch that includes fixes for them as well. Finding these many testcases
with just -g makes me a bit suspicious though, could there be a reason why
-gdwarf-2 is not specified?

Regards
Senthil

2013-03-28  Senthil Kumar Selvaraj  

* gcc.dg/debug/dwarf2/global-used-types.c: Specify -gdwarf-2 in
dg-options
* gcc.dg/debug/dwarf2/inline2.c: Likewise
* gcc.dg/debug/dwarf2/inline3.c: Likewise
* gcc.dg/debug/dwarf2/pr37726.c: Likewise 
* gcc.dg/debug/dwarf2/pr41445-1.c: Likewise 
* gcc.dg/debug/dwarf2/pr41445-2.c: Likewise
* gcc.dg/debug/dwarf2/pr41445-3.c: Likewise
* gcc.dg/debug/dwarf2/pr41445-4.c: Likewise
* gcc.dg/debug/dwarf2/pr41445-5.c: Likewise
* gcc.dg/debug/dwarf2/pr41445-6.c: Likewise
* gcc.dg/debug/dwarf2/pr47939-1.c: Likewise
* gcc.dg/debug/dwarf2/pr47939-2.c: Likewise
* gcc.dg/debug/dwarf2/pr47939-3.c: Likewise
* gcc.dg/debug/dwarf2/pr47939-4.c: Likewise
* gcc.dg/debug/dwarf2/pr53948.c: Likewise
* gcc.dg/debug/dwarf2/struct-loc1.c: Likewise


diff --git gcc/testsuite/gcc.dg/debug/dwarf2/global-used-types.c 
gcc/testsuite/gcc.dg/debug/dwarf2/global-used-types.c
index 54fa58a..03c6ede 100644
--- gcc/testsuite/gcc.dg/debug/dwarf2/global-used-types.c
+++ gcc/testsuite/gcc.dg/debug/dwarf2/global-used-types.c
@@ -1,6 +1,6 @@
 /*
  Contributed by Dodji Seketeli 
- { dg-options "-g -dA -fno-merge-debug-strings" }
+ { dg-options "-gdwarf-2 -dA -fno-merge-debug-strings" }
  { dg-do compile }
  { dg-final { scan-assembler-times "DIE \\(0x\[^\n\]*\\) 
DW_TAG_enumeration_type" 1 } }
  { dg-final { scan-assembler-times "DIE \\(0x\[^\n\]*\\) DW_TAG_enumerator" 2 
} }
diff --git gcc/testsuite/gcc.dg/debug/dwarf2/inline2.c 
gcc/testsuite/gcc.dg/debug/dwarf2/inline2.c
index 20edb58..b9ce3a5 100644
--- gcc/testsuite/gcc.dg/debug/dwarf2/inline2.c
+++ gcc/testsuite/gcc.dg/debug/dwarf2/inline2.c
@@ -14,7 +14,7 @@
   properly nested DW_TAG_inlined_subroutine DIEs for third, second and first.
 */
 
-/* { dg-options "-O -g3 -dA" } */
+/* { dg-options "-O -g3 -gdwarf-2 -dA" } */
 /* { dg-do compile } */
 
 /* There are 6 inlined subroutines:
diff --git gcc/testsuite/gcc.dg/debug/dwarf2/inline3.c 
gcc/testsuite/gcc.dg/debug/dwarf2/inline3.c
index d2d3e0f..caa397e 100644
--- gcc/testsuite/gcc.dg/debug/dwarf2/inline3.c
+++ gcc/testsuite/gcc.dg/debug/dwarf2/inline3.c
@@ -1,7 +1,7 @@
 /* Verify that only one DW_AT_const_value is emitted for baz,
not for baz abstract DIE and again inside of
DW_TAG_inlined_subroutine.  */
-/* { dg-options "-O2 -g -dA -fmerge-all-constants" } */
+/* { dg-options "-O2 -gdwarf-2 -dA -fmerge-all-constants" } */
 /* { dg-do compile } */
 /* { dg-final { scan-assembler-times " DW_AT_const_value" 1 } } */
 
diff --git gcc/testsuite/gcc.dg/debug/dwarf2/pr37726.c 
gcc/testsuite/gcc.dg/debug/dwarf2/pr37726.c
index 60fb839..2f2d26c 100644
--- gcc/testsuite/gcc.dg/debug/dwarf2/pr37726.c
+++ gcc/testsuite/gcc.dg/debug/dwarf2/pr37726.c
@@ -1,6 +1,6 @@
 /* PR debug/37726 */
 /* { dg-do compile } */
-/* { dg-options "-g -O0 -dA -fno-merge-debug-strings" } */
+/* { dg-options "-gdwarf-2 -O0 -dA -fno-merge-debug-strings" } */
 
 int foo (int parm)
 {
diff --git gcc/testsuite/gcc.dg/debug/dwarf2/pr41445-1.c 
gcc/testsuite/gcc.dg/debug/dwarf2/pr41445-1.c
index 452c0f6..f91a79d 100644
--- gcc/testsuite/gcc.dg/debug/dwarf2/pr41445-1.c
+++ gcc/testsuite/gcc.dg/debug/dwarf2/pr41445-1.c
@@ -2,7 +2,7 @@
 /* Test that token after multi-line function-like macro use
gets correct locus even when preprocessing separately.  */
 /* { dg-do compile } */
-/* { dg-options "-save-temps -g -O0 -dA -fno-merge-debug-strings" } */
+/* { dg-options "-save-temps -gdwarf-2 -O0 -dA -fno-merge-debug-strings" } */
 
 #define A(a,b)
 int varh;A(1,
diff --git gcc/testsuite/gcc.dg/debug/dwarf2/pr41445-2.c 
gcc/testsuite/gcc.dg/debug/dwarf2/pr41445-2.c
index d2ee408..a39419e 100644
--- gcc/testsuite/gcc.dg/debug/dwarf2/pr41445-2.c
+++ gcc/testsuite/gcc.dg/debug/dwarf2/pr41445-2.c
@@ -1,6 +1,6 @@
 /* PR preprocessor/41445 */
 /* { dg-do compile } */
-/* { dg-options "-g -O0 -dA -fno-merge-debug-strings" } */
+/* { dg-o

Re: _Alignas attribute and HOST_BITS_PER_INT

2013-03-28 Thread Senthil Kumar Selvaraj
On Wed, Mar 27, 2013 at 03:13:13PM +, Joseph S. Myers wrote:
> On Wed, 27 Mar 2013, Senthil Kumar Selvaraj wrote:
> 
> > Hi,
> > 
> > I was looking at why gcc.dg/c1x-align-3.c (test for errors, line 15) is
> > failing for the AVR target, and I found that the test expects _Alignas
> > with -__INT_MAX__ - 1 to fail with a "too large" error.
> 
> It expects either an error either about being too large, or about not 
> being a power of 2.
> 

Yes, that's right. I was focusing on the "too large" error, because
that's what is causing the test to pass for a native build
(x86_64-pc-linux).

> > Is it right to check against HOST_BITS_PER_INT, when the alignment
> 
> A check against HOST_BITS_PER_INT would be because of code inside GCC that 
> uses host "int" to store alignment values.  Ideally there wouldn't be such 
> code - ideally any alignment up to and including the size of the whole 
> target address space could be used.  (For example, alignments could always 
> be represented internally as a base-2 log.)  But given the existence of 
> such code, such a check is needed.
> 
> However, a size that is not a power of 2 (such as this one, minus a power 
> of 2) should still be detected and get an error that this test accepts, 
> whether or not that size is also too large for host int.  So look at why 
> you don't get the "requested alignment is not a power of 2" error for this 
> code with a negative alignment.
> 

The tree_log2 function, called by check_user_alignment, uses a HOST_WIDE_INT
to hold the alignment value, but masks out everything except the number of 
bits in the alignment value i.e.

low &= ~((HOST_WIDE_INT) (-1) << prec);

where prec is 16 for the AVR target. This results in -32768 (__INT_MAX__
- 1) becoming 32768, and that is of course a power of 2. The same
behavior occurs for a native build (x86_64), except that prec is 32 in
that case.

tree_log2 appears to be a general function, so I suppose the check for
negative integers must be made in check_user_alignment. Will the
following patch work? (Bootstrapped x86_64, all alignment tests pass).

Regards
Senthil

ChangeLog

2013-03-28  Senthil Kumar Selvaraj  

* c-common.c (check_user_alignment): Emit error for negative
constants.

diff --git gcc/c-family/c-common.c gcc/c-family/c-common.c
index c7cdd0f..dfdfbb6 100644
--- gcc/c-family/c-common.c
+++ gcc/c-family/c-common.c
@@ -7308,9 +7308,10 @@ check_user_alignment (const_tree align, bool allow_zero)
 }
   else if (allow_zero && integer_zerop (align))
 return -1;
-  else if ((i = tree_log2 (align)) == -1)
+  else if (tree_int_cst_sgn (align) == -1
+   || (i = tree_log2 (align)) == -1)
 {
-  error ("requested alignment is not a power of 2");
+  error ("requested alignment is not a positive power of 2");
   return -1;
 }
   else if (i >= HOST_BITS_PER_INT - BITS_PER_UNIT_LOG)


Re: [Patch, testsuite] Add missing -gdwarf-2 flag in debug/dwarf2 testcase

2013-03-29 Thread Senthil Kumar Selvaraj
On Fri, Mar 29, 2013 at 02:46:49PM -0400, Jason Merrill wrote:
> On 03/28/2013 06:09 PM, Mike Stump wrote:
> >Hum… I can't help but wonder if there was supposed to be code that checks to 
> >ensure dwarf is supported and the default before doing the entire test suite.
> 
> That's what I thought, yes.  And we don't want to specify -gdwarf-2
> explicitly, as now -gdwarf-4 is the default.
> 
 
dwarf2.exp does check if DWARF is supported by compiling a test file with 
-gdwarf-2, but it doesn't check if it's the default.

I couldn't find a way to ask gcc to just generate DWARF (default
version) either. How should this be fixed?

Regards
Senthil


Re: [Patch, testsuite] Add missing -gdwarf-2 flag in debug/dwarf2 testcase

2013-04-02 Thread Senthil Kumar Selvaraj
On Mon, Apr 01, 2013 at 06:46:29PM -0700, Mike Stump wrote:
> On Apr 1, 2013, at 6:43 PM, Jason Merrill  wrote:
> > On 03/30/2013 02:23 AM, Senthil Kumar Selvaraj wrote:
> >> I couldn't find a way to ask gcc to just generate DWARF (default
> >> version) either. How should this be fixed?
> > 
> > Maybe we could use -gdwarf for that now, since we stopped using it for 
> > DWARF 1 in GCC 3.4.
> 
> I like that.
> 
Ok, how about the following (tentative) patch? If -gdwarf- is
specified without any argument, it picks DWARF 4 as the default.

Regards
Senthil


diff --git gcc/common.opt gcc/common.opt
index bdbd3b6..58311e7 100644
--- gcc/common.opt
+++ gcc/common.opt
@@ -2307,7 +2307,7 @@ Common JoinedOrMissing Negative(gdwarf-)
 Generate debug information in COFF format
 
 gdwarf-
-Common Joined UInteger Var(dwarf_version) Init(4) Negative(gstabs)
+Common JoinedOrMissing UInteger Var(dwarf_version) Init(4) Negative(gstabs)
 Generate debug information in DWARF v2 (or later) format
 
 ggdb
diff --git gcc/opts.c gcc/opts.c
index 45b12fe..336554b 100644
--- gcc/opts.c
+++ gcc/opts.c
@@ -1700,12 +1700,24 @@ common_handle_option (struct gcc_options *opts,
   break;
 
 case OPT_gdwarf_:
-  if (value < 2 || value > 4)
-   error_at (loc, "dwarf version %d is not supported", value);
-  else
-   opts->x_dwarf_version = value;
-  set_debug_level (DWARF2_DEBUG, false, "", opts, opts_set, loc);
-  break;
+  {
+/* Treat -gdwarf- without any argument (dwarf version) as equivalent to
+   -gdwarf- */
+const int max_dwarf_version = 4;
+if (strlen (arg) == 0)
+  {
+opts->x_dwarf_version = max_dwarf_version;
+  }
+else
+  {
+if (value < 2 || value > max_dwarf_version)
+  error_at (loc, "dwarf version %d is not supported", value);
+else
+  opts->x_dwarf_version = value;
+  }
+set_debug_level (DWARF2_DEBUG, false, "", opts, opts_set, loc);
+break;
+  }
 
 case OPT_gsplit_dwarf:
   set_debug_level (NO_DEBUG, DEFAULT_GDB_EXTENSIONS, "", opts, opts_set,


Re: [Patch, testsuite] Add missing -gdwarf-2 flag in debug/dwarf2 testcase

2013-04-02 Thread Senthil Kumar Selvaraj
On Tue, Apr 02, 2013 at 11:09:12AM -0400, Jason Merrill wrote:
> On 04/02/2013 09:07 AM, Senthil Kumar Selvaraj wrote:
> >Ok, how about the following (tentative) patch? If -gdwarf- is
> >specified without any argument, it picks DWARF 4 as the default.
> 
> -gdwarf- looks a bit odd to me; I was thinking -gdwarf without the
> trailing -.

Does the below patch look good?

Regards
Senthil


diff --git gcc/common.opt gcc/common.opt
index bdbd3b6..5af41d9 100644
--- gcc/common.opt
+++ gcc/common.opt
@@ -2306,8 +2306,12 @@ gcoff
 Common JoinedOrMissing Negative(gdwarf-)
 Generate debug information in COFF format
 
+gdwarf
+Common UInteger Var(dwarf_default_version, 4) Negative(gdwarf-)
+Generate debug information in the default DWARF version format
+
 gdwarf-
-Common Joined UInteger Var(dwarf_version) Init(4) Negative(gstabs)
+Common Joined UInteger Var(dwarf_version) Init(4) Negative(gstabs) 
Negative(gdwarf)
 Generate debug information in DWARF v2 (or later) format
 
 ggdb
diff --git gcc/opts.c gcc/opts.c
index 45b12fe..c6823a6 100644
--- gcc/opts.c
+++ gcc/opts.c
@@ -1698,7 +1698,9 @@ common_handle_option (struct gcc_options *opts,
 case OPT_gcoff:
   set_debug_level (SDB_DEBUG, false, arg, opts, opts_set, loc);
   break;
-
+  
+case OPT_gdwarf:
+  value = opts->x_dwarf_default_version;
 case OPT_gdwarf_:
   if (value < 2 || value > 4)
error_at (loc, "dwarf version %d is not supported", value);


Re: [Patch, testsuite] Add missing -gdwarf-2 flag in debug/dwarf2 testcase

2013-04-04 Thread Senthil Kumar Selvaraj
On Wed, Apr 03, 2013 at 11:07:04PM -0400, Jason Merrill wrote:
> On 04/02/2013 03:25 PM, Senthil Kumar Selvaraj wrote:
> >+gdwarf
> >+Common UInteger Var(dwarf_default_version, 4) Negative(gdwarf-)
> >+Generate debug information in the default DWARF version format
> 
> The Negative options need to form a circular chain, so gcoff should
> have Negative(gdwarf) and gdwarf should have Negative(gdwarf-).
> 
> I don't think you need a dwarf_default_version variable since
> there's already dwarf_version.
> 

Fixed those in the below patch.

> It would be nice to give a helpful diagnostic if people try to write
> -gdwarf2 to suggest that they either write -gdwarf-2 or -gdwarf -g2.
> 

I wasn't able to get this done in a clean way. To make the gdwarf option
handling code detect this error, the only I found was to make it 
JoinedOrMissing, and then raise the error if there was an argument
provided. Otherwise, I'd have to do it in the -g option case, which I
think is messy.

Regards
Senthil


diff --git gcc/common.opt gcc/common.opt
index e02e7ed..b9ba416 100644
--- gcc/common.opt
+++ gcc/common.opt
@@ -2308,9 +2308,13 @@ Common JoinedOrMissing
 Generate debug information in default format
 
 gcoff
-Common JoinedOrMissing Negative(gdwarf-)
+Common JoinedOrMissing Negative(gdwarf)
 Generate debug information in COFF format
 
+gdwarf
+Common Var(dwarf_version, 4) Negative(gdwarf-)
+Generate debug information in the default DWARF version format
+
 gdwarf-
 Common Joined UInteger Var(dwarf_version) Init(4) Negative(gstabs)
 Generate debug information in DWARF v2 (or later) format
diff --git gcc/opts.c gcc/opts.c
index 45b12fe..60f290d 100644
--- gcc/opts.c
+++ gcc/opts.c
@@ -1699,6 +1699,8 @@ common_handle_option (struct gcc_options *opts,
   set_debug_level (SDB_DEBUG, false, arg, opts, opts_set, loc);
   break;
 
+case OPT_gdwarf:
+  value = opts->x_dwarf_version;
 case OPT_gdwarf_:
   if (value < 2 || value > 4)
error_at (loc, "dwarf version %d is not supported", value);


Re: [Patch, testsuite] Add missing -gdwarf-2 flag in debug/dwarf2 testcase

2013-04-04 Thread Senthil Kumar Selvaraj
On Thu, Apr 04, 2013 at 01:41:30PM -0400, Jason Merrill wrote:
> On 04/04/2013 11:48 AM, Senthil Kumar Selvaraj wrote:
> >+Common Var(dwarf_version, 4) Negative(gdwarf-)
> 
> I don't think you need to mention the variable on this line; it's
> already there and statically initialized for gdwarf-.
> 
Yeah, you're right. I was originally attempting to use "value"
directly, and that of course, didn't work as it was set to 1.
Sorry, I should have been more careful.

> >I wasn't able to get this done in a clean way. To make the gdwarf option
> >handling code detect this error, the only I found was to make it
> >JoinedOrMissing, and then raise the error if there was an argument
> >provided.
> 
> That sounds clean enough to me.
> 

Ok, I thought it was odd to specify that gdwarf has an optional
argument in the opt file, only to reject it outright when processing it.
Will send another patch tomorrow.

Regards
Senthil


Re: [Patch, testsuite] Add missing -gdwarf-2 flag in debug/dwarf2 testcase

2013-04-09 Thread Senthil Kumar Selvaraj
On Fri, Apr 05, 2013 at 12:02:49AM +0530, Senthil Kumar Selvaraj wrote:
> On Thu, Apr 04, 2013 at 01:41:30PM -0400, Jason Merrill wrote:
> > On 04/04/2013 11:48 AM, Senthil Kumar Selvaraj wrote:
> > >+Common Var(dwarf_version, 4) Negative(gdwarf-)
> > 
> > I don't think you need to mention the variable on this line; it's
> > already there and statically initialized for gdwarf-.
> > 
> Yeah, you're right. I was originally attempting to use "value"
> directly, and that of course, didn't work as it was set to 1.
> Sorry, I should have been more careful.
> 
> > >I wasn't able to get this done in a clean way. To make the gdwarf option
> > >handling code detect this error, the only I found was to make it
> > >JoinedOrMissing, and then raise the error if there was an argument
> > >provided.
> > 
> > That sounds clean enough to me.
> > 
> 
> Ok, I thought it was odd to specify that gdwarf has an optional
> argument in the opt file, only to reject it outright when processing it.
> Will send another patch tomorrow.

I added detection and error emission if version is specified with
gdwarf. Does this look ok?

Regards
Senthil


diff --git gcc/common.opt gcc/common.opt
index e02e7ed..e3645c3 100644
--- gcc/common.opt
+++ gcc/common.opt
@@ -2308,9 +2308,13 @@ Common JoinedOrMissing
 Generate debug information in default format
 
 gcoff
-Common JoinedOrMissing Negative(gdwarf-)
+Common JoinedOrMissing Negative(gdwarf)
 Generate debug information in COFF format
 
+gdwarf
+Common JoinedOrMissing UInteger Negative(gdwarf-)
+Generate debug information in the default DWARF version format
+
 gdwarf-
 Common Joined UInteger Var(dwarf_version) Init(4) Negative(gstabs)
 Generate debug information in DWARF v2 (or later) format
diff --git gcc/opts.c gcc/opts.c
index 45b12fe..4bfee42 100644
--- gcc/opts.c
+++ gcc/opts.c
@@ -1699,6 +1699,17 @@ common_handle_option (struct gcc_options *opts,
   set_debug_level (SDB_DEBUG, false, arg, opts, opts_set, loc);
   break;
 
+case OPT_gdwarf:
+  if (arg && strlen(arg) != 0)
+{
+  error_at (loc, "dwarf version is not allowed. "
+  "Try -gdwarf-%d or -g%d instead.", value, value);
+  break;
+}
+  else
+{
+  value = opts->x_dwarf_version;
+}
 case OPT_gdwarf_:
   if (value < 2 || value > 4)
error_at (loc, "dwarf version %d is not supported", value);


[Testsuite] tree-ssa failures for targets with non 32 bit int size

2013-04-25 Thread Senthil Kumar Selvaraj
I noticed that there is a bunch of testcases in gcc.dg/tree-ssa 
(slsr-27.c, for e.g.) that assume that the size of the integer is 4
bytes. For example, slsr-27.c has

struct x
{
  int a[16];
  int b[16];
  int c[16];
 };

and 

void
f (struct x *p, unsigned int n)
{
  foo (p->a[n], p->c[n], p->b[n]);
}

and expects a "* 4" to be present in the dump, assuming the size of an int to 
be 4 bytes (n * 4 gives the array offset).

What is right way to fix these? I saw one testcase that did

 typedef int int32_t __attribute__ ((__mode__ (__SI__)));

and used int32_t everywhere where a 32 bit int is assumed. Is this the right
way to go? Or maybe some preprocessor magic that replaces the "int" token with
one that has the SI attribute? Or should the test assertion be branched for 
differing sizes of int?

Regards
Senthil



Re: [Testsuite] tree-ssa failures for targets with non 32 bit int size

2013-04-26 Thread Senthil Kumar Selvaraj
On Fri, Apr 26, 2013 at 10:03:43AM +0200, Richard Biener wrote:
> On Thu, Apr 25, 2013 at 6:00 PM, Mike Stump  wrote:
> > On Apr 25, 2013, at 7:44 AM, Senthil Kumar Selvaraj 
> >  wrote:
> >> What is right way to fix these? I saw one testcase that did
> >>
> >> typedef int int32_t __attribute__ ((__mode__ (__SI__)));
> >>
> >> Is this the right way to go?
> >
> > I like this.  Pre-approved.
> 
> We also have
> 
> /* { dg-require-effective-target int32plus } */
> 
> which is used throughout the testsuite in some cases.

Wouldn't that prevent the tests from running for non 32 bit int targets?
Correct me if I'm wrong, but from what I understand, the tree-ssa tests
don't really depend on ints to be 32 bits - it's just an incorrect
assumption?

Regards
Senthil


--with-dwarf2 and default DWARF version

2013-08-26 Thread Senthil Kumar Selvaraj
Hi,

  The help text for the --with-dwarf2 configure option says it forces
  the default debug format to be DWARF 2. However, the built compiler
  still produces DWARF 4, unless -gdwarf-2 is explicitly specified when
  compiling.

  I see that the darwin target gets around this by using the
  SUBSUBTARGET_OVERRIDE_OPTIONS hook and setting dwarf_version to 2 if no
  specific dwarf version was requested.

  Is this the recommended way to go? If yes, isn't the help text for
  --with-dwarf2 misleading? Or should --with-dwarf2 be fixed to do what
  it says?

Regards
Senthil


Re: --with-dwarf2 and default DWARF version

2013-08-26 Thread Senthil Kumar Selvaraj
On Mon, Aug 26, 2013 at 10:23:24AM +0200, Jakub Jelinek wrote:
> On Mon, Aug 26, 2013 at 01:42:18PM +0530, Senthil Kumar Selvaraj wrote:
> >   The help text for the --with-dwarf2 configure option says it forces
> >   the default debug format to be DWARF 2. However, the built compiler
> >   still produces DWARF 4, unless -gdwarf-2 is explicitly specified when
> >   compiling.
> 
> This configure option is really --with-dwarf[234], it was added when
> there was dwarf (1) support and dwarf2 support.  In several places
> (including the filename of dwarf2out.c) GCC still talks about dwarf2 when it
> means DWARF2 or later.

Ah ok.
> 
> >   I see that the darwin target gets around this by using the
> >   SUBSUBTARGET_OVERRIDE_OPTIONS hook and setting dwarf_version to 2 if no
> >   specific dwarf version was requested.
> 
> It isn't any kind of workaround the above.  Apple tools are simply so buggy
> that they can't grok DWARF later than 2, crash in lots of ways on it.
> Why do you want DWARF2 rather than DWARF3 or 4?

Same reason, some tools (proprietary debugger etc..) only work with
DWARF2. So I guess the OVERRIDE_OPTIONS hook is the way to go.

Regards
Senthil


[IRA] Segfault in ira_costs.c:find_costs_and_classes for AVR target

2013-12-26 Thread Senthil Kumar Selvaraj
Hi,

  gcc.c-torture/compile/pr34856.c and a couple of other tests segfault 
  for the AVR target. Looking at the code, I found that the
  x_ira_register_move_cost array[TImode] is NULL, while the code goes on
  to dereference it (ira_register_move_cost
  [ALLOCNO_MODE (a)][best][aclass] at line 1832).

  I looked at the code that populates the array, and found that it does
  explicitly allow the array to have NULL entries for certain modes,
  atleast during initialization.

  I'm not really sure how this should be fixed - is NULL valid or is the
  AVR target violating some invariant?

Regards
Senthil
  
  



[Ping, IRA] Segfault in ira_costs.c:find_costs_and_classes for AVR target

2014-01-13 Thread Senthil Kumar Selvaraj
Ping !

Regards
Senthil

On Thu, Dec 26, 2013 at 03:11:25PM +0530, Senthil Kumar Selvaraj wrote:
> Hi,
> 
>   gcc.c-torture/compile/pr34856.c and a couple of other tests segfault 
>   for the AVR target. Looking at the code, I found that the
>   x_ira_register_move_cost array[TImode] is NULL, while the code goes on
>   to dereference it (ira_register_move_cost
>   [ALLOCNO_MODE (a)][best][aclass] at line 1832).
> 
>   I looked at the code that populates the array, and found that it does
>   explicitly allow the array to have NULL entries for certain modes,
>   atleast during initialization.
> 
>   I'm not really sure how this should be fixed - is NULL valid or is the
>   AVR target violating some invariant?
> 
> Regards
> Senthil
>   
>   
> 


Emitting discardable per-function DWARF debug_info

2014-01-13 Thread Senthil Kumar Selvaraj
I have been hacking a bit on dwarf2out.c to make gcc generate DWARF 
information that gets discarded along with the appropriate function section 
code when compiled with -ffunction-sections and then linked with 
--gc-sections. Currently, debug information for discarded functions is not 
removed, and this could cause the debugger to get confused.

The assembler already has a -gdwarf-sections flag to generate per
function sections for whatever debug information it generates, so I
figured I would do it for the compiler. Based on a previous discussion
on the binutils mailing list, I thought I'd generate separate
debug_info sections (for starters) for each function, and then tie it 
with the code section for that function by assigning them both to the 
same section group.

Based on the approach suggested by section E.3.3 of the DWARF 4
standard (using DW_TAG_imported_unit and the DW_AT_import attribute), I
managed to make the compiler to generate DWARF info as shown below.

Before going on, I thought I'd ask if my approach is right (and doable). 
Did I read the standard right? 

gdb currently crashes when presented with this DWARF. I guess both gcc and 
gdb currently use DW_TAG_imported_unit only to refer to other (partial) 
units from a DW_TAG_compile_unit, not the other way around?

Regards
Senthil

~  cat test.c
int _exit(int code) { return code; }

volatile int x;
static inline int add()
{
return x++;
}
int foo()
{
return add();
}

int doo()
{
x--;
return add();
}
int main() { 
return doo();
}
~  /scratch/arm/install/bin/arm-none-eabi-gcc test.c -g3 -ffunction-sections


~  /scratch/arm/install/bin/arm-none-eabi-objdump -Wi a.out

a.out: file format elf32-littlearm

Contents of the .debug_info section:

  Compilation Unit @ offset 0x0:
   Length:0x41 (32-bit)
   Version:   4
   Abbrev Offset: 0x0
   Pointer Size:  4
 <0>: Abbrev Number: 1 (DW_TAG_compile_unit)
   DW_AT_producer: (indirect string, offset: 0x13f7): GNU C 4.9.0 
20131226 (experimental) -g3 -ffunction-sections
<10>   DW_AT_language: 1(ANSI C)
<11>   DW_AT_name: (indirect string, offset: 0x6b3): test.c
<15>   DW_AT_comp_dir: (indirect string, offset: 0x23): /home/saaadhu
<19>   DW_AT_ranges  : 0x0
<1d>   DW_AT_low_pc  : 0x0
<21>   DW_AT_stmt_list   : 0x0
<25>   DW_AT_GNU_macros  : 0x0
 <1><29>: Abbrev Number: 4 (DW_TAG_base_type)
<2a>   DW_AT_byte_size   : 4
<2b>   DW_AT_encoding: 5(signed)
<2c>   DW_AT_name: int
 <1><30>: Abbrev Number: 8 (DW_TAG_variable)
<31>   DW_AT_name: x
<33>   DW_AT_decl_file   : 1
<34>   DW_AT_decl_line   : 3
<35>   DW_AT_type: <0x3f>
<39>   DW_AT_external: 1
<39>   DW_AT_location: 5 byte block: 3 f0 c 1 0 (DW_OP_addr: 10cf0)
 <1><3f>: Abbrev Number: 9 (DW_TAG_volatile_type)
<40>   DW_AT_type: <0x29>
 <1><44>: Abbrev Number: 0
  Compilation Unit @ offset 0x45:
   Length:0x31 (32-bit)
   Version:   4
   Abbrev Offset: 0x0
   Pointer Size:  4
 <0><50>: Abbrev Number: 10 (DW_TAG_imported_unit)
<51>   DW_AT_import  : <0xb>[Abbrev Number: 1]
 <1><55>: Abbrev Number: 2 (DW_TAG_subprogram)
<56>   DW_AT_external: 1
<56>   DW_AT_name: (indirect string, offset: 0xcf8): _exit
<5a>   DW_AT_decl_file   : 1
<5b>   DW_AT_decl_line   : 1
<5c>   DW_AT_prototyped  : 1
<5c>   DW_AT_type: <0x6e>
<60>   DW_AT_low_pc  : 0x8350
<64>   DW_AT_high_pc : 0x24
<68>   DW_AT_frame_base  : 1 byte block: 9c (DW_OP_call_frame_cfa)
<6a>   DW_AT_GNU_all_call_sites: 1
 <2><6a>: Abbrev Number: 3 (DW_TAG_formal_parameter)
<6b>   DW_AT_name: (indirect string, offset: 0x1d41): code
<6f>   DW_AT_decl_file   : 1
<70>   DW_AT_decl_line   : 1
<71>   DW_AT_type: <0x6e>
<75>   DW_AT_location: 2 byte block: 91 74  (DW_OP_fbreg: -12)
 <2><78>: Abbrev Number: 0
 <1><79>: Abbrev Number: 0
  Compilation Unit @ offset 0x7a:
   Length:0x22 (32-bit)
   Version:   4
   Abbrev Offset: 0x0
   Pointer Size:  4
 <0><85>: Abbrev Number: 10 (DW_TAG_imported_unit)
<86>   DW_AT_import  : <0xb>[Abbrev Number: 1]
 <1><8a>: Abbrev Number: 5 (DW_TAG_subprogram)
<8b>   DW_AT_name: add
<8f>   DW_AT_decl_file   : 1
<90>   DW_AT_decl_line   : 4
<91>   DW_AT_type: <0xa3>
<95>   DW_AT_low_pc  : 0x8374
<99>   DW_AT_high_pc : 0x30
<9d>   DW_AT_frame_base  : 1 byte block: 9c (DW_OP_call_frame_cfa)
<9f>   DW_AT_GNU_all_call_sites: 1
 <1><9f>: Abbrev Number: 0
  Compilation Unit @ offset 0xa0:
   Length:0x22 (32-bit)
   Version:   4
   Abbrev Offset: 0x0
   Pointer Size:  4
 <0>: Abbrev Number: 10 (DW_TAG_imported_unit)
   DW_AT_import  : <0xb>[Abbrev Number: 1]
 <1>: Abbrev Number: 6 (DW_TAG_subprogram)
   DW_AT_external: 1
   DW_AT_name: foo
 

[Testsuite] getpid in gcc.c-torture/execute/pr58419.c

2014-01-27 Thread Senthil Kumar Selvaraj
All,

  gcc.c-torture/execute/pr58419.c has a call to getpid, and this causes
  a linker error on the AVR (embedded) target. Is the call intentional,
  and if yes, how should this be fixed for targets that don't support an
  OS?

Regards
Senthil


Re: [Testsuite] getpid in gcc.c-torture/execute/pr58419.c

2014-01-27 Thread Senthil Kumar Selvaraj
This is on trunk - I was under the impression that it is always trunk,
unless otherwise stated?

getpid doesn't really make sense for bare metal targets, I would think?

Regards
Senthil

On Mon, Jan 27, 2014 at 01:04:48PM +, Umesh Kalappa wrote:
> Senthil,
> Please do let us know the gcc version ,I couldn't locate the file pr58419.c  
> in  the GCC 4.8.1 source .
>  
> To go with the below problem ,you can attributed the getpid() function as 
> weak (http://www.embedded-bits.co.uk/2008/gcc-weak-symbols/).
> 
> ~Umesh
> 
> -Original Message-
> From: gcc-ow...@gcc.gnu.org [mailto:gcc-ow...@gcc.gnu.org] On Behalf Of 
> Senthil Kumar Selvaraj
> Sent: 27 January 2014 15:18
> To: gcc@gcc.gnu.org
> Subject: [Testsuite] getpid in gcc.c-torture/execute/pr58419.c
> 
> All,
> 
>   gcc.c-torture/execute/pr58419.c has a call to getpid, and this causes
>   a linker error on the AVR (embedded) target. Is the call intentional,
>   and if yes, how should this be fixed for targets that don't support an
>   OS?
> 
> Regards
> Senthil


Re: Testing machine descriptions

2014-03-27 Thread Senthil Kumar Selvaraj
On Thu, Mar 27, 2014 at 07:51:06PM -0400, Niranjan Hasabnis wrote:
> Hi DJ Delorie,
> 
> Thank you for your answer. It is useful. One more question: so does the
> main testsuite cover all md entries? Meaning all possible assembly
> instructions that could be generated by that md are checked by the
> main testsuite? Thank you again.
> 

You can check it out yourself - just build gcc with coverage
instrumentation turned on, and then use gcov.

http://tryout.senthilthecoder.com/view/coverage/gcc/config/avr/avr.md.gcov.html
shows coverage information for the AVR target's md file.

Regards
Senthil


Function parameter debug info at -O0

2012-08-06 Thread Senthil Kumar Selvaraj
Hi,

The following program, when compiled with -O0 -g3 (x86_64 target, but
doesn't seem to matter), shows wrong values for p (function parameter)
when debugging.

[saaadhu@jaguar scratch]$ cat test.c
int func(int p)
{
p = 20;
p = 30;
}

int main()
{
int local = 42;
func(local);
}

The assembly generated looks like this

func:
pushq   %rbp
movq%rsp, %rbp
movl%edi, -20(%rbp)
movl$20, -4(%rbp)
movl$30, -4(%rbp)
popq %rbp
ret

I guess the DWARF debug information generated specifies -20(%rbp)
as the offset to look for p, and that's why the debugger keeps showing
the passed value (42) instead of 20 and 30 (see gdb session dump below).

The same behavior occurs for the avr target as well, and debugging
cc1 showed that the initial move happens from gimple_expand_cfg's 
expand_function_start, when it eventually calls assign_parm_setup_stack.
That in itself is ok I guess, but the frame offset doesn't match other
reads from and writes to the same variable.

Debugging further, I found that that the offset is bumped up because
expand_vars, called earlier from gimple_expand_cfg, allocates space on
the frame for p as well. Specifically, expand_used_vars, when looping
through SA.map->num_partitions, sees that SA.partition_has_default_def
is false for the partition corresponding to p, and calls expand_one_var 
to allocate space for it.

remove_ssa_form sets partition_has_default_def, but looping
over num_ssa_names does not include the tree node for the parameter.

Before investigating further, I wanted to know if this actually is a bug
and if I'm debugging it right. Shouldn't function parameters be defined by 
default 
on entry into the function? 

Regards
Senthil


[saaadhu@jaguar scratch]$ gcc -g3 -O0 test.c
[saaadhu@jaguar scratch]$ gdb a.out
GNU gdb (GDB) 7.4.1
...
(gdb) b func
Breakpoint 1 at 0x4004b3: file test.c, line 3.
(gdb) start
Temporary breakpoint 2 at 0x4004cb: file test.c, line 9.
Starting program: /home/saaadhu/scratch/a.out 
Temporary breakpoint 2, main () at test.c:9
9   int local = 42;
(gdb) c
Continuing.

Breakpoint 1, func (p=42) at test.c:3
3   p = 20;
(gdb) n
4   p = 30;
(gdb) print p
$1 = 42
(gdb) 



Re: Function parameter debug info at -O0

2012-08-06 Thread Senthil Kumar Selvaraj
On Mon, Aug 06, 2012 at 01:40:57PM -0400, Frank Ch. Eigler wrote:
> Senthil Kumar Selvaraj  writes:
> 
> > [...]
> > The following program, when compiled with -O0 -g3 (x86_64 target, but
> > doesn't seem to matter), shows wrong values for p (function parameter)
> > when debugging. [...]
> 
> This sounds like <http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51358>.
> 

I guess that's a different issue - that bug talks about debug info for
formal parameters being invalid before they are pushed on the stack. In
the case I described, it stays wrong for the entire body of the
function. 

The core of the problem appears to be that a formal parameter
that is not read from but written to, is allocated space on the stack twice - 
once when expand_used_vars runs, and once when expand_function_start
runs. The debug information generated for the formal parameter only
points to the space allocated by the former, and thus does not reflect
writes to the variable that occur on the space allocated by the
latter.

Regards
Senthil


Excluding dejagnu testcases for subtargets

2012-08-09 Thread Senthil Kumar Selvaraj
Hi,

 What is the recommended way to skip specific (non target specific) testcases 
for a  subtargets?

 There are a bunch of tests in the gcc testsuite that are too big (in
 terms of code size or memory) for a subtarget of the avr target. The
 subtarget is specified in the dejagnu board configuration file
 (set_board_info cflags "-mmcu ").

 Using dg-skip-if with "-mmcu " for the include part did
 not work. Looking at the source (target-supports-dg.exp) showed that it 
 doesn't consider board_info cflags. Only board_info multilib_flags, 
 flags specified in dg-options, $TOOL_OPTIONS and $TEST_ALWAYS_FLAGS 
 appear to be considered.

 Should we set the -mmcu option to  multilib_flags instead of cflags in 
 the board config? Should we use --tool_opt in RUNTESTFLAGS? How do
 other targets handle this?

 Regards
 Senthil




Re: Excluding dejagnu testcases for subtargets

2012-08-11 Thread Senthil Kumar Selvaraj
On Fri, Aug 10, 2012 at 09:54:17AM -0700, Janis Johnson wrote:
> On 08/09/2012 10:52 PM, Senthil Kumar Selvaraj wrote:
> > Hi,
> > 
> >  What is the recommended way to skip specific (non target specific) 
> > testcases for a  subtargets?
> > 
> >  There are a bunch of tests in the gcc testsuite that are too big (in
> >  terms of code size or memory) for a subtarget of the avr target. The
> >  subtarget is specified in the dejagnu board configuration file
> >  (set_board_info cflags "-mmcu ").
> > 
> >  Using dg-skip-if with "-mmcu " for the include part did
> >  not work. Looking at the source (target-supports-dg.exp) showed that it 
> >  doesn't consider board_info cflags. Only board_info multilib_flags, 
> >  flags specified in dg-options, $TOOL_OPTIONS and $TEST_ALWAYS_FLAGS 
> >  appear to be considered.
> > 
> >  Should we set the -mmcu option to  multilib_flags instead of cflags in 
> >  the board config? Should we use --tool_opt in RUNTESTFLAGS? How do
> >  other targets handle this?
> > 
> >  Regards
> >  Senthil
> 
> Probably check-flags in target-supports-dg.exp should check cflags
> in the board_info along with the other flags.  Can you try that to
> see if it does what you need?
> 

Yes it does. The patch below did the job.

Is there a reason why cflags wasn't included before?

Regards
Senthil


diff --git a/gcc/testsuite/lib/target-supports-dg.exp 
b/gcc/testsuite/lib/target-supports-dg.exp
index 2f6c4c2..bdf7476 100644
--- a/gcc/testsuite/lib/target-supports-dg.exp
+++ b/gcc/testsuite/lib/target-supports-dg.exp
@@ -304,6 +304,9 @@ proc check-flags { args } {
 # If running a subset of the test suite, $TEST_ALWAYS_FLAGS may not exist.
 catch {append compiler_flags " $TEST_ALWAYS_FLAGS "}
 set dest [target_info name]
+if [board_info $dest exists cflags] {
+append compiler_flags "[board_info $dest cflags] "
+}
 if [board_info $dest exists multilib_flags] {
append compiler_flags "[board_info $dest multilib_flags] "
 }


Re: Excluding dejagnu testcases for subtargets

2012-08-13 Thread Senthil Kumar Selvaraj
On Sat, Aug 11, 2012 at 09:40:52AM -0700, Janis Johnson wrote:
> On 08/11/2012 09:18 AM, Senthil Kumar Selvaraj wrote:
> > On Fri, Aug 10, 2012 at 09:54:17AM -0700, Janis Johnson wrote:
> >> On 08/09/2012 10:52 PM, Senthil Kumar Selvaraj wrote:
> >>> Hi,
> >>>
> >>>  What is the recommended way to skip specific (non target specific) 
> >>> testcases for a  subtargets?
> >>>
> >>>  There are a bunch of tests in the gcc testsuite that are too big (in
> >>>  terms of code size or memory) for a subtarget of the avr target. The
> >>>  subtarget is specified in the dejagnu board configuration file
> >>>  (set_board_info cflags "-mmcu ").
> >>>
> >>>  Using dg-skip-if with "-mmcu " for the include part did
> >>>  not work. Looking at the source (target-supports-dg.exp) showed that it 
> >>>  doesn't consider board_info cflags. Only board_info multilib_flags, 
> >>>  flags specified in dg-options, $TOOL_OPTIONS and $TEST_ALWAYS_FLAGS 
> >>>  appear to be considered.
> >>>
> >>>  Should we set the -mmcu option to  multilib_flags instead of cflags in 
> >>>  the board config? Should we use --tool_opt in RUNTESTFLAGS? How do
> >>>  other targets handle this?
> >>>
> >>>  Regards
> >>>  Senthil
> >>
> >> Probably check-flags in target-supports-dg.exp should check cflags
> >> in the board_info along with the other flags.  Can you try that to
> >> see if it does what you need?
> >>
> > 
> > Yes it does. The patch below did the job.
> 
> Please submit it, with a ChangeLog entry, to gcc-patc...@gcc.gnu.org.
> 

Sent.
http://gcc.gnu.org/ml/gcc-patches/2012-08/msg00689.html

> > Is there a reason why cflags wasn't included before?
> 
> Because I didn't know about it.  It wasn't intentional.
> 
> Janis
> 
> > Regards
> > Senthil
> > 
> > 
> > diff --git a/gcc/testsuite/lib/target-supports-dg.exp 
> > b/gcc/testsuite/lib/target-supports-dg.exp
> > index 2f6c4c2..bdf7476 100644
> > --- a/gcc/testsuite/lib/target-supports-dg.exp
> > +++ b/gcc/testsuite/lib/target-supports-dg.exp
> > @@ -304,6 +304,9 @@ proc check-flags { args } {
> >  # If running a subset of the test suite, $TEST_ALWAYS_FLAGS may not 
> > exist.
> >  catch {append compiler_flags " $TEST_ALWAYS_FLAGS "}
> >  set dest [target_info name]
> > +if [board_info $dest exists cflags] {
> > +append compiler_flags "[board_info $dest cflags] "
> > +}
> >  if [board_info $dest exists multilib_flags] {
> > append compiler_flags "[board_info $dest multilib_flags] "
> >  }
> 

Regards
Senthil


[AVR] Target specific regression test causes virtual memory exhaustion

2012-09-03 Thread Senthil Kumar Selvaraj
On a 64 bit machine, executing

$ make check RUNTESTFLAGS="avr-torture.exp=builtins-1.c 
--target_board=atxmega128a1"

causes virtual memory allocation failure and/or large scale machine
slowdown, with cc1 using up gobs (>35G) virtual memory.

I tracked this down to

void delay_4 (void)  { __builtin_avr_delay_cycles (-1ul); }

In avr_expand_delay_cycles, the operand is converted into a
HOST_WIDE_UINT, which on a 64 bit machine is 0x. The
range checks therefore get bypassed and the while(cycles >= 2) loop near
the end keeps generating nops, exhausting memory.

Given that avr_init_builtins declares delay_cycles builtin to take
take an unsigned long (which I guess is 4 bytes for the AVR target),
should the code be changed to consider only the lower 4 bytes? Or did I
do something wrong?

Regards
Senthil






[AVR] Missing avr51-flash1.x in avr target specific tests?

2012-10-04 Thread Senthil Kumar Selvaraj
Some tests in gcc/testsuite/gcc.target/avr/torture (builtins-2.c, for
e.g.) have -Tavr51-flash1.x specified in dg-options. The tests currently
fail with an unable to open linker script error for that file.

Is that linker script supposed to be checked into source control? Or am
I missing some configure/build option?

Regards
Senthil


DWARF location descriptor and multi-register frame pointer

2012-11-20 Thread Senthil Kumar Selvaraj
Hi all,

 For the AVR target, the FP register spans two registers, yet the DWARF
 location information (DW_AT_location) for a local variable refers to only one 
 of them.

 Looking at the code, I found that based_loc_descr in dwar2out.c, which handles
 base+offset based location descriptions, gets the register number from
 the rtx representing the reg and directly proceeds to create DW_OP_breg0+regno,
 without considering the mode. This is unlike reg_loc_descriptor, which
 checks hard_regno_nregs[regno][mode] and acts accordingly.

 How are frame pointer registers that span more than one hard register
 handled? Would it be appropriate to check the mode and do a
 multiple_reg_loc_descriptor call or something similar to handle this
 case?

 A related question - why does based_loc_descr generate DW_OP_fbreg 
 only if eliminate_regs eliminates the frame/arg pointer register? I
 traced through the execution for my native (x86_64) compiler, and found
 that the elimination is actually from the soft frame pointer to the hard frame
 pointer. What if HARD_FRAME_POINTER_REGNUM is the same as
 FRAME_POINTER_REGNUM? Shouldn't DW_OP_fbreg be generated for those
 targets too?

 Regards
 Senthil




Clobber REG_CC only for some constraint alternatives?

2020-08-14 Thread Senthil Kumar Selvaraj via Gcc
Hi,

  I'm working on porting AVR to MODE_CC, and there are quite a few
  patterns that clobber the condition code reg only for certain
  constraint alternatives. For e.g.,

(define_insn "mov_insn"
  [(set (match_operand:ALL1 0 "nonimmediate_operand" "=r,d,Qm   ,r 
,q,r,*r")
(match_operand:ALL1 1 "nox_general_operand"   "r Y00,n Ynn,r 
Y00,Qm,r,q,i"))]
 "register_operand (operands[0], mode)
 || reg_or_0_operand (operands[1], mode)"
 {
  return output_movqi (insn, operands, NULL);
 }
 [(set_attr "length" "1,1,5,5,1,1,4")
 (set_attr "adjust_len" "mov8")
 (set_attr "cc" "ldi,none,clobber,clobber,none,none,clobber")])

As you can deduce from the (set_attr "cc" ..), only constraint
alternatives 0,2,3 and 6 clobber CC - others leave it unchanged.

My first version of the port adds a post-reload splitter that adds a
(clobber (reg:CC REG_CC)) unconditionally, and it appears to work. If I
do want to add the clobber conditionally, would something like the below
be a good way to do it (get_cc_reg_clobber_rtx returns either const0_rtx
or cc_reg_rtx based on get_attr_cc (insn))? Or is there a better/cleaner way?

(define_insn_and_split "mov_insn"
  [(set (match_operand:ALL1 0 "nonimmediate_operand" "=r,d,Qm   ,r 
,q,r,*r")
(match_operand:ALL1 1 "nox_general_operand"   "r Y00,n Ynn,r 
Y00,Qm,r,q,i"))]
  "register_operand (operands[0], mode)
|| reg_or_0_operand (operands[1], mode)"
  "#"
  "&& reload_completed"
  [(parallel [(set (match_dup 0)
   (match_dup 1))
  (clobber (match_dup 2))])]
  {
operands[2] = get_cc_reg_clobber_rtx (curr_insn);
  }
  [(set_attr "cc" "ldi,none,clobber,clobber,none,none,clobber")])

(define_insn "*mov_insn_clobber_flags"
  [(set (match_operand:ALL1 0 "nonimmediate_operand" "=r,Qm   ,r ,*r")
(match_operand:ALL1 1 "nox_general_operand"   "r Y00,r Y00,Qm,i"))
   (clobber (reg:CC REG_CC))]
  "(register_operand (operands[0], mode)
|| reg_or_0_operand (operands[1], mode))
   && reload_completed"
  {
return output_movqi (insn, operands, NULL);
  }
  [(set_attr "length" "1,5,5,4")
   (set_attr "adjust_len" "mov8")])

(define_insn "*mov_insn_noclobber_flags"
  [(set (match_operand:ALL1 0 "nonimmediate_operand" "=r,d   ,q,r")
(match_operand:ALL1 1 "nox_general_operand"   "r,n Ynn,r,q"))
   (clobber (const_int 0))]
  "(register_operand (operands[0], mode)
|| reg_or_0_operand (operands[1], mode))
   && reload_completed"
  {
return output_movqi (insn, operands, NULL);
  }
  [(set_attr "length" "1,1,1,1")
   (set_attr "adjust_len" "mov8")])

Regards
Senthil


Re: Clobber REG_CC only for some constraint alternatives?

2020-08-17 Thread Senthil Kumar Selvaraj via Gcc


Pip Cet writes:

> On Sun, Aug 16, 2020 at 12:50 AM Segher Boessenkool
>  wrote:
>> On Sat, Aug 15, 2020 at 10:18:27AM +, Pip Cet wrote:
>> > > > What I'm currently doing is this:
>> > > >
>> > > > (define_split
>> > > >   [(set (match_operand 0 "nonimmediate_operand")
>> > > > (match_operand 1 "nox_general_operand"))]
>> > > >   "reload_completed && mov_clobbers_cc (insn, operands)"
>> > > >   [(parallel [(set (match_dup 0) (match_dup 1))
>> > > >   (clobber (reg:CC REG_CC))])])
>> > > >
>> > > > which, if I understand correctly, introduces the parallel-clobber
>> > > > expression only when necessary, at the same time as post-reload
>> > > > splitters introduce REG_CC references. Is that correct?
>> > >
>> > > Yes.  And this will work correctly if somehow you ensure that REG_CC
>> > > isn't live anywhere you introduce such a clobber.
>> >
>> > IIUC, the idea is that references to REG_CC, except for clobbers, are
>> > only introduced in the post-reload split pass, so it cannot be live
>> > before our define_split runs. Does that make sense?
>>
>> Yes, it does.  It has some huge restrictions (using the reg in inline
>> assembler can never work reliably, for example, so you'll have to
>> disallow that).
>
> Is there any approach that doesn't suffer from that problem? My
> understanding was that we need to allow reload to insert CC-clobbering
> insns on this (and many other) architectures, and that there are so
> many places where reload might choose to do so (including before and
> after inline asm) that using the register prior to reload just isn't
> possible. I'd be glad to be wrong, though :-)
>
> Is it true that reload chooses which constraint alternative is used
> for each insn? Is that information available somehow to post-reload
> splits? The problem is that the "X" constraint matches whatever's
> there already, and doesn't replace it with the (scratch:CC) expression
> that would work, so I can't rewrite those insns not to clobber the CC
> reg.
>
> For example, here's what I currently have:
>
> (define_expand "mov"
>   [(parallel [(set (match_operand:MOVMODE 0 "nonimmediate_operand" "")
>(match_operand:MOVMODE 1 "general_operand" ""))
>   (clobber (reg:CC REG_CC))])]
>   ...)
>
> (define_insn "mov_insn"
>[(set (match_operand:ALL1 0 "nonimmediate_operand" "=r,r  ,d,Qm
>   ,r ,q,r,*r")
> (match_operand:ALL1 1 "nox_general_operand"   "r,Y00,n Ynn,r
> Y00,Qm,r,q,i"))
>(clobber (match_scratch:CC 2 "=X,c,X,c,c,X,X,c"))]
>   ...)
>
> That works, but it results in an incorrect CC clobber for, say,
> register-to-register movqi. For that, I'd need something like
>
> (define_split
>   [(parallel [(set (match_operand:ALL1 0 "nonimmediate_operand")
>  (match_operand:ALL1 1 "nox_general_operand"))
>   (clobber (reg:CC REG_CC))])]
>   "reload_completed && REG_P (operands[0]) && REG_P (operands[1])"
>   [(parallel [(set (match_dup 0)
>(match_dup 1))
>   (clobber (scratch:CC))])])
>
> and so on, for all four constraint alternatives that don't actually
> clobber REG_CC (and also for a fifth which only rarely clobbers
> REG_CC). That's duplicated code, of course.

The (setattr "cc" ...) that is currently present for all
patterns accounts for the constraint alternatives,so  using
get_attr_cc to split to a (clobber) of either cc_reg_rtx or a
gen_rtx_SCRATCH (CCmode) appears to work.

(define_insn_and_split "mov_insn"
  [(set (match_operand:ALL1 0 "nonimmediate_operand" "=r,d,Qm   ,r 
,q,r,*r")
(match_operand:ALL1 1 "nox_general_operand"   "r Y00,n Ynn,r 
Y00,Qm,r,q,i"))]
  "register_operand (operands[0], mode)
|| reg_or_0_operand (operands[1], mode)"
  "#"
  "&& reload_completed"
  [(parallel [(set (match_dup 0)
   (match_dup 1))
  (clobber (match_dup 2))])]
  {
operands[2] = get_cc_reg_clobber_rtx (curr_insn);
  }
  [(set_attr "cc" "ldi,none,clobber,clobber,none,none,clobber")])

(define_insn "*mov_insn"
  [(set (match_operand:ALL1 0 "nonimmediate_operand" "=r,d,Qm   ,r 
,q,r,*r")
(match_operand:ALL1 1 "nox_general_operand"   "r Y00,n Ynn,r 
Y00,Qm,r,q,i"))
   (clobber (match_scratch:CC 2  "=X,X ,c   ,c 
,X,X,c"))]
  "(register_operand (operands[0], mode)
|| reg_or_0_operand (operands[1], mode))
   && reload_completed"
  {
return output_movqi (insn, operands, NULL);
  }
  [(set_attr "length" "1,1,5,5,1,1,4")
   (set_attr "adjust_len" "mov8")])

As you mentioned elsewhere, keeping the clobber with "X" helps keep a
single pattern for both CC reg clobbering and non-clobbering insns.


>
> All this is at least somewhat academic: the code produced isn't
> drastically better after my cc conversion, but it's not usually any
> worse, and I'm still looking at assembler examples that are pessimized
> a little to find out how to fix them...
>
>> And earlier passes (like combine) cannot optimise this,
>
> I'm not sure what "this" refe

Re: Clobber REG_CC only for some constraint alternatives?

2020-08-17 Thread Senthil Kumar Selvaraj via Gcc


Pip Cet writes:

> On Mon, Aug 17, 2020 at 7:31 AM Senthil Kumar Selvaraj
>  wrote:
>> > (define_split
>> >   [(parallel [(set (match_operand:ALL1 0 "nonimmediate_operand")
>> >  (match_operand:ALL1 1 "nox_general_operand"))
>> >   (clobber (reg:CC REG_CC))])]
>> >   "reload_completed && REG_P (operands[0]) && REG_P (operands[1])"
>> >   [(parallel [(set (match_dup 0)
>> >(match_dup 1))
>> >   (clobber (scratch:CC))])])
>> >
>> > and so on, for all four constraint alternatives that don't actually
>> > clobber REG_CC (and also for a fifth which only rarely clobbers
>> > REG_CC). That's duplicated code, of course.
>>
>> The (setattr "cc" ...) that is currently present for all
>> patterns accounts for the constraint alternatives,so  using
>> get_attr_cc to split to a (clobber) of either cc_reg_rtx or a
>> gen_rtx_SCRATCH (CCmode) appears to work.
>
> Thanks! Using an insn attribute is actually what I ended up doing
> (https://github.com/pipcet/gcc/commit/d4509afae9238e0ade4d3e1e97dd8577dae96115)
> :-)
>
> It's still confusing, IMHO, that insn attributes (but not the
> get_attr_alternative attribute which is mentioned in the
> documentation) are available when which_alternative is not.
>
>> (define_insn "*mov_insn"
>>   [(set (match_operand:ALL1 0 "nonimmediate_operand" "=r,d,Qm   ,r 
>> ,q,r,*r")
>> (match_operand:ALL1 1 "nox_general_operand"   "r Y00,n Ynn,r 
>> Y00,Qm,r,q,i"))
>>(clobber (match_scratch:CC 2  "=X,X ,c   ,c 
>> ,X,X,c"))]
>
> Hmm. Technically, of course, clearing register 0 (a special case of
> the first alternative) would clobber the flags, but as it happens, the
> rewrite above produces the right clobber expression which is simply
> accepted by the "X"... I'm not sure whether anything else is trying to
> recognize such insns, but as it stands that define_insn would
> recognize the incorrect insn:
>
> [(set (reg:QI 0) (const_int 0))
>  (clobber (scratch:CC))]

get_cc_reg_clobber_rtx also looks at the insn itself (i.e. whatever
the erstwhile NOTICE_UPDATE_CC hook did), so if the cc attribute is LDI,
and operands[1] is const0_rtx, it would return cc_reg_rtx as the clobber reg.

AFAICT, some passes after reload verify if operands match constraints,
and that would work in this case - I'm not sure if the pattern you
mentioned could show up, outside of wrong splitters/peepholes in the md file.

Another approach would be to conservatively use a 'c' constraint and
clobber REG_CC for all reg-reg moves.

Regards
Senthil


Re: Clobber REG_CC only for some constraint alternatives?

2020-08-18 Thread Senthil Kumar Selvaraj via Gcc


Hans-Peter Nilsson writes:

> On Fri, 14 Aug 2020, Senthil Kumar Selvaraj via Gcc wrote:
>> As you can deduce from the (set_attr "cc" ..), only constraint
>> alternatives 0,2,3 and 6 clobber CC - others leave it unchanged.
>
> Yes, I recognize that.
>
>> My first version of the port adds a post-reload splitter that adds a
>> (clobber (reg:CC REG_CC)) unconditionally, and it appears to work.
>
> Ouch, temporarily lying to gcc here.
>
>> If I
>> do want to add the clobber conditionally, would something like the below
>> be a good way to do it (get_cc_reg_clobber_rtx returns either const0_rtx
>> or cc_reg_rtx based on get_attr_cc (insn))? Or is there a better/cleaner way?
>
> I suggest having a look at what I did for the CRIS port.
> Check the git history.
>
> In short:
> - Add the clobber initially, to *all* patterns.
> - Add postreload splitters for the special combinations that
> don't clobber CC (ones without clobbering CC).
> - Use the old "cc" attribute to control and generate
> clobber/useful CC-setting alternatives (for various new CC_
> modes) with use of define_subst.

This sounds like a great plan - the idea of always generating insn
variants for however many CCmodes are available, and then using
cc_enabled  to disable them selectively (based on the attribute value
corresponding to the alternative chosen) looks really neat. I did not
understand the purpose of subst_attr for "ccnz" "ccnzvc" and ""
though - wouldn't a (set_attr "cc_enabled", "...") do the same thing?

Also, I already have a version that hides REG_CC until reload (based on
the suggestion in the wiki page), but I guess this approach will work
equally well with that too?

Regards
Senthil


Re: Clobber REG_CC only for some constraint alternatives?

2020-08-19 Thread Senthil Kumar Selvaraj via Gcc


Hans-Peter Nilsson writes:

> On Wed, 19 Aug 2020, Senthil Kumar Selvaraj wrote:
>>
>> Hans-Peter Nilsson writes:
>>
>> > On Fri, 14 Aug 2020, Senthil Kumar Selvaraj via Gcc wrote:
>> >> As you can deduce from the (set_attr "cc" ..), only constraint
>> >> alternatives 0,2,3 and 6 clobber CC - others leave it unchanged.
>> >
>> > Yes, I recognize that.
>> >
>> >> My first version of the port adds a post-reload splitter that adds a
>> >> (clobber (reg:CC REG_CC)) unconditionally, and it appears to work.
>> >
>> > Ouch, temporarily lying to gcc here.
>> >
>> >> If I
>> >> do want to add the clobber conditionally, would something like the below
>> >> be a good way to do it (get_cc_reg_clobber_rtx returns either const0_rtx
>> >> or cc_reg_rtx based on get_attr_cc (insn))? Or is there a better/cleaner 
>> >> way?
>> >
>> > I suggest having a look at what I did for the CRIS port.
>> > Check the git history.
>> >
>> > In short:
>> > - Add the clobber initially, to *all* patterns.
>> > - Add postreload splitters for the special combinations that
>> > don't clobber CC (ones without clobbering CC).
>> > - Use the old "cc" attribute to control and generate
>> > clobber/useful CC-setting alternatives (for various new CC_
>> > modes) with use of define_subst.
>>
>> This sounds like a great plan - the idea of always generating insn
>> variants for however many CCmodes are available, and then using
>> cc_enabled  to disable them selectively (based on the attribute value
>> corresponding to the alternative chosen) looks really neat. I did not
>> understand the purpose of subst_attr for "ccnz" "ccnzvc" and ""
>> though - wouldn't a (set_attr "cc_enabled", "...") do the same thing?
>
> You mean adding a whole new line with separate per-alternative
> settings instead of just a suffix to the name, per instruction,
> to get it compare-elim-optimized?  (Ok, plus a tweak to
> SELECT_CC_MODE; see a33649e).  That's not simpler to me, and
> error-prone without benefits.  Note that the "cc" attribute was
> already there from cc0 times, just as for AVR!
>
> But beware, the cc attributes *might* need tweaking, and also
> note that disabling an alternative for an insn may make gcc pick
> a subsequent one, which may be invalid when matching operands
> for the disabled alternative.
>
> But that's the simplistic and obvious reply, perhaps I'm
> misinterpreting and you mean something else?  For the purpose of
> the different CCmodes, see cris-modes.def, in particular the
> second paragraph, which I'm not repeating here.

My question was a lot more basic. From the internals documentation,
I understand that, with define_subst_attr for setcc, setnz and setnzc,
and the corresponding define_substs,

(define_insn "*movsi_internal ...)

results in a

(define_insn "*movsi_internal ...)

and in addition to that,

(define_insn "*movsi_internal_setcc ...) // output template of setcc_subst
(define_insn "*movsi_internal_setnz ...) // output template of setnz_subst
(define_insn "*movsi_internal_setnzvc ...) // output template of setnzvc_subst

I understood why this was done.

What I didn't understand was the (set-attr "cc")
part - as far I can tell, this results in (set_attr "cc_enabled" ...) in
all of the three substituted patterns, so I wondered why not just have
(set_attr "cc_enabled" ...) in the original define_insn instead.

I now realize that with (set-attr "cc"), the original
unsubstituted pattern will have only a (set_attr "cc" ...) and would
therefore not match the attr check for "enabled" - correctly so, as the
original insn pattern clobbers CRIS_CC0_REGNUM. Did I get that right?

>
>> Also, I already have a version that hides REG_CC until reload (based on
>> the suggestion in the wiki page), but I guess this approach will work
>> equally well with that too?
>
> Ow, I see it does!  I don't know, that's unchartered territory
> to me, but it does seem like visium was written that way.  It
> seems a bit wordy though; the define_split isn't necessary if
> you have the clobber from the start and use define_subst for the
> other alternatives.  You still need splitters for special-cases
> of insns where condition codes aren't affected though.  If these
> "special cases" approach being the norm, then I don't know.

Ok, I'll try the cris approach too, and see if it makes any difference.
>
> IMHO, introducing REG_CC clobbers only *after* reload seems like
> it could backfire.  Doing things behind gcc's back is what we're
> eliminating, not re-introducing, to be dogmatic.
>
> That wiki page mentions using define_subst to avoid all
> (near-)duplicates, but not in its examples.

Yes, I saw that too.

Regards
Senthil


Re: Clobber REG_CC only for some constraint alternatives?

2020-08-20 Thread Senthil Kumar Selvaraj via Gcc


Pip Cet writes:

> On Tue, Aug 18, 2020 at 6:52 AM Senthil Kumar Selvaraj
>  wrote:
>> > recognize such insns, but as it stands that define_insn would
>> > recognize the incorrect insn:
>> >
>> > [(set (reg:QI 0) (const_int 0))
>> >  (clobber (scratch:CC))]
>>
>> get_cc_reg_clobber_rtx also looks at the insn itself (i.e. whatever
>> the erstwhile NOTICE_UPDATE_CC hook did), so if the cc attribute is LDI,
>> and operands[1] is const0_rtx, it would return cc_reg_rtx as the clobber reg.
>>
>> AFAICT, some passes after reload verify if operands match constraints,
>> and that would work in this case - I'm not sure if the pattern you
>> mentioned could show up, outside of wrong splitters/peepholes in the md file.
>
> I don't think they can, but it's still another case of lying to GCC.
>
> At the very least, output_movqi should assert that it isn't asked to
> produce code for this situation.

I agree.
>
>> Another approach would be to conservatively use a 'c' constraint and
>> clobber REG_CC for all reg-reg moves.
>
> I'd prefer splitting the constraint alternatives to have one clear-r0
> alternative, an ldi alternative, and a clear -r1_31 alternative.
>
> As for define_subst, is it really worth it? If I'm reading the
> documentation correctly, it's not powerful enough to deal with scratch
> operands on its own, so we'd probably need three or four variants of
> define_subst just to handle those cases.

Is this related to cmpelim not looking at REG_CC clobber if there are
other (clobber scratch..) preceding it?
>
> I'm probably missing something obvious, but what's the reason for
> keeping the CC-clobbering post-reload splitter when we already have a
> CC-setting one? Are there significant post-reload optimizations that
> can deal with a clobber but not an arbitrary assignment to CC? If so,
> wouldn't it be better to fix those optimizations?

The post-reload splitter introduces the clobber. The wiki
suggests that approach if most insns clobber REG_CC, perhaps because of
the missed optimizations you describe below?
>
> (There are at least some pre-reload optimizations that don't work with
> the CC-clobbering insn patterns. lower_subreg.c's simple_move, for
> example, recognizes only two-operand sets. This resulted in pessimized
> code, but is easy to fix.)

Regards
Senthil