[Bug target/85683] New: [8 Regression] GCC 8 stopped using RMW (Read Modify Write) instructions on x86[_64]

2018-05-07 Thread dmitry at zend dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85683

            Bug ID: 85683
           Summary: [8 Regression] GCC 8 stopped using RMW (Read Modify
                    Write) instructions on x86[_64]
           Product: gcc
           Version: 8.0.1
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: dmitry at zend dot com
  Target Milestone: ---

GCC 8 stopped generating RMW instructions and now uses three instructions instead
of one. Although this shouldn't affect the number of executed uops, it increases
code size (adding pressure on the instruction decoder and instruction cache) and
requires an extra register (which may cause extra spills and related degradation).

=== test_rmw.c ===
typedef struct _refcounted {
    unsigned int refcount;
} refcounted;

typedef struct _value {
    refcounted *val;
} value;

extern void value_dtor_func(refcounted *val);

void ptr_dtor(value *ptr)
{
    refcounted *ref = ptr->val;

    --ref->refcount;
    if (ref->refcount == 0) {
        value_dtor_func(ref);
    }
}

===

$ gcc -O2 -S test_rmw.c
$ cat test_rmw.s

        movq    (%rdi), %rdi
        movl    (%rdi), %eax
        subl    $1, %eax
        movl    %eax, (%rdi)
        je      .L4
        ret
.L4:
        jmp     value_dtor_func
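
Whether the decrement is emitted as a single memory-destination subl or as the load/sub/store sequence shown above, the observable behaviour of ptr_dtor() has to stay the same. The following is a minimal sketch for checking that behaviour in isolation. The stub body of value_dtor_func() and the names dtor_calls and last_destroyed are assumptions made for illustration; the report only declares value_dtor_func() as extern.

```c
#include <stddef.h>

typedef struct _refcounted {
    unsigned int refcount;
} refcounted;

typedef struct _value {
    refcounted *val;
} value;

/* Hypothetical bookkeeping, not part of the original report. */
static unsigned int dtor_calls = 0;
static refcounted *last_destroyed = NULL;

/* Hypothetical stub: records that the destructor was invoked. */
void value_dtor_func(refcounted *val)
{
    ++dtor_calls;
    last_destroyed = val;
}

/* Same body as ptr_dtor() in test_rmw.c. */
void ptr_dtor(value *ptr)
{
    refcounted *ref = ptr->val;

    --ref->refcount;              /* the read-modify-write under discussion */
    if (ref->refcount == 0) {
        value_dtor_func(ref);     /* emitted as a tail call in the listing above */
    }
}
```

Calling ptr_dtor() on a value whose refcount starts at 2 should leave the destructor uncalled after the first call and invoke it exactly once after the second. To inspect the instruction selection itself, the original test_rmw.c with the extern declaration is the better input, since the stub here may be inlined.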

[Bug c/43686] New: GCC doesn't duplicate computed gotos for functions marked as "hot"

2010-04-08 Thread dmitry at zend dot com
I found this bug while working on a direct-threaded interpreter for PHP. Moving
from GCC 4.3 to GCC 4.4 caused a significant performance degradation. Looking
into the generated assembler code, I realized that GCC 4.4 doesn't replace every
jmp to the indirect jmp with a copy of the indirect jmp itself. The reason is the
following new condition in the function duplicate_computed_gotos() in bb-reorder.c:

if (!optimize_bb_for_size_p (bb))
  continue;

I thought I would be able to fix the problem using the "hot" attribute, but
according to this condition, if I mark the function with __attribute__((hot)),
duplication doesn't work, and if I mark it with __attribute__((cold)), it
starts to work. As a result, the "hot" function runs slower than the "cold" one.
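
The effect of the quoted condition can be sketched with a toy model. This is not GCC code: func_freq, model_optimize_bb_for_size_p() and model_duplicates_computed_goto() are hypothetical names, and the model assumes that only blocks of a function marked "cold" are treated as optimized for size.

```c
#include <stdbool.h>

/* Hypothetical stand-in for how a function is marked. */
enum func_freq { FREQ_COLD, FREQ_NORMAL, FREQ_HOT };

/* Assumption: only blocks in a cold function are optimized for size. */
static bool model_optimize_bb_for_size_p(enum func_freq freq)
{
    return freq == FREQ_COLD;
}

/* Mirrors the quoted check: a block that is NOT optimized for size
 * is skipped ("continue"), so its computed goto is not duplicated. */
static bool model_duplicates_computed_goto(enum func_freq freq)
{
    if (!model_optimize_bb_for_size_p(freq))
        return false;   /* corresponds to "continue" in the loop */
    return true;        /* block is considered for duplication */
}
```

Under these assumptions the model returns false for FREQ_HOT and FREQ_NORMAL and true for FREQ_COLD, which matches the observed behaviour: duplication happens only in the function marked "cold".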

You can use the following simplified code to verify it. I ran it with
'gcc -O2 -S direct.c'.

direct.c

#define NEXT goto **ip++
#define guard(n) asm("#" #n)

__attribute__((cold)) void *emu (void **prog)
{
  static void *labels[] =
    {&&next1,&&next2,&&next3,&&next4,&&next5,&&next6,&&next7,&&next8,&&next9,&&loop};
  void **ip;
  int count;

  if (!prog) {
    return labels;
  }

  ip=prog;
  count = 1000;


  NEXT;
 next1:
  guard(1);
  NEXT;
 next2:
  guard(2);
  NEXT;
 next3:
  guard(3);
  NEXT;
 next4:
  guard(4);
  NEXT;
 next5:
  guard(5);
  NEXT;
 next6:
  guard(6);
  NEXT;
 next7:
  guard(7);
  NEXT;
 next8:
  guard(8);
  NEXT;
 next9:
  guard(9);
  NEXT;
 loop:
  if (count>0) {
    count--;
    ip=prog;
    NEXT;
  }
  return 0;
}


int main() {
  void *prog[]   = {(void*)0,(void*)1,
                    (void*)0,(void*)2,
                    (void*)0,(void*)3,
                    (void*)0,(void*)4,
                    (void*)0,(void*)9};
  void **labels = emu(0);
  int i;
  for (i=0; i < sizeof(prog)/sizeof(prog[0]); i++) {
    prog[i] = labels[(int)prog[i]];
  }
  emu(prog);
  return 0;
}

I saw that the check causing the slowdown was removed in trunk; however, I can't
verify that it was done in a proper way.


-- 
           Summary: GCC doesn't duplicate computed gotos for functions
                    marked as "hot"
           Product: gcc
           Version: 4.4.3
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: dmitry at zend dot com
 GCC build triplet: i686-redhat-linux
  GCC host triplet: i686-redhat-linux
GCC target triplet: i686-redhat-linux


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43686



[Bug c/43686] GCC doesn't duplicate computed gotos for functions marked as "hot"

2010-04-08 Thread dmitry at zend dot com


--- Comment #2 from dmitry at zend dot com  2010-04-08 13:54 ---
Yes. It's definitely the same issue.

The only additional note is that __attribute__((hot)) doesn't fix the problem (as
I would have expected from tracing through optimize_bb_for_size_p()), but causes
an additional slowdown. By contrast, __attribute__((cold)) solves the issue. It
looks very strange.

I suppose some condition has to be inverted :)


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43686