Register pressure aware loop unrolling

2015-12-10 Thread Shiva Chen
Hi all,

Loop unrolling can decrease performance in some cases: unrolling loops that already
have high register pressure produces lots of spill code.

I tried to implement register pressure aware loop
unrolling (-funroll-loops-pressure-aware)
to avoid unrolling loops with high register pressure.

The idea is to calculate the number of live registers for each basic block to estimate
register pressure.
If the loop contains a basic block with high register pressure, don't
unroll it.

Register pressure is estimated to be high if (live_reg_num > available_hard_reg_num),
as ira and gcse do.

However, loop unrolling itself may increase the register pressure.
Therefore, when the live register number is close to the available hard register number,
it's not profitable to unroll the loop.
E.g. live_reg_num = 12, available_hard_reg_num = 14.

To estimate the register pressure increase caused by unrolling, the parameter
PARAM_LOOP_UNROLL_PRESSURE_INCREMENT is added.

The condition becomes

high_reg_pressure_p = true if
  (live_reg_num + PARAM_LOOP_UNROLL_PRESSURE_INCREMENT > available_hard_reg_num)
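
In plain C, the per-loop check amounts to the following standalone sketch
(not the actual patch; live_regs[] and the parameter values stand in for what
the pass computes from the RTL):

#include <stdbool.h>

/* Sketch only: return true if any basic block of the loop is estimated to
   exceed the available hard registers once the expected unrolling increment
   is added.  */
static bool
loop_high_reg_pressure_p (const int *live_regs, int num_bbs,
                          int available_hard_regs,
                          int unroll_pressure_increment)
{
  for (int i = 0; i < num_bbs; i++)
    if (live_regs[i] + unroll_pressure_increment > available_hard_regs)
      return true;   /* Skip unrolling this loop.  */
  return false;
}

With the numbers above (live_reg_num = 12, available_hard_reg_num = 14), an
increment parameter of 3 or more would mark the loop as high pressure and keep
it from being unrolled.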

Bootstrapped and tested on x86-64.

Any suggestions?

Thanks,
Shiva


2015-12-11  Shiva Chen  

* cfgloop.h (struct loop): Add high_reg_pressure_p field.
(UAP_UNROLL_PRESSURE_AWARE): New enum value.
* reg-pressure.h (struct bb_data): Data stored for each basic block.
* common.opt: Add new flag -funroll-loops-pressure-aware.
* params.def: Add parameter PARAM_LOOP_UNROLL_PRESSURE_INCREMENT.
* loop-init.c (pass_rtl_unroll_loops::gate):
Enable unrolling when -funroll-loops-pressure-aware is enabled.
(pass_rtl_unroll_loops::execute): Likewise.
* toplev.c (process_options): Likewise.
* loop-unroll.c (decide_unroll_constant_iterations):
Do not unroll loops with high register pressure
when -funroll-loops-pressure-aware is enabled.
(decide_unroll_runtime_iterations): Likewise.
(decide_unroll_stupid): Likewise.
* reg-pressure.c (get_regno_pressure_class): Get pressure class.
(change_pressure): Change register pressure while scanning basic blocks.
(calculate_bb_reg_pressure): Calculate register pressure
for each basic block.
(check_register_pressure): Determine high_reg_pressure_p for each loop.
(calculate_loop_pressure): Calculate register pressure for each loop.

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index d2d09f6..6c62b63 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1384,6 +1384,7 @@ OBJS = \
reginfo.o \
regrename.o \
regstat.o \
+   reg-pressure.o \
reload.o \
reload1.o \
reorg.o \
diff --git a/gcc/cfgloop.h b/gcc/cfgloop.h
index ee73bf9..c4a06e7 100644
--- a/gcc/cfgloop.h
+++ b/gcc/cfgloop.h
@@ -191,6 +191,9 @@ struct GTY ((chain_next ("%h.next"))) loop {
   /* True if we should try harder to vectorize this loop.  */
   bool force_vectorize;

+  /* True if the loop have high register pressure.  */
+  bool high_reg_pressure_p;
+
   /* True if the loop is part of an oacc kernels region.  */
   bool in_oacc_kernels_region;

@@ -742,7 +745,8 @@ loop_optimizer_finalize ()
 enum
 {
   UAP_UNROLL = 1,  /* Enables unrolling of loops if it seems profitable.  */
-  UAP_UNROLL_ALL = 2   /* Enables unrolling of all loops.  */
+  UAP_UNROLL_ALL = 2,  /* Enables unrolling of all loops.  */
+  UAP_UNROLL_PRESSURE_AWARE = 4  /* Enables unrolling of loops with pressure aware.  */
 };

 extern void doloop_optimize_loops (void);
diff --git a/gcc/common.opt b/gcc/common.opt
index e593631..a44c7dc 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -2425,6 +2425,10 @@ funroll-all-loops
 Common Report Var(flag_unroll_all_loops) Optimization
 Perform loop unrolling for all loops.

+funroll-loops-pressure-aware
+Common Report Var(flag_unroll_loops_pressure_aware) Optimization
+Perform loop unrolling for low register pressure loops
+
 ; Nonzero means that loop optimizer may assume that the induction variables
 ; that control loops do not overflow and that the loops with nontrivial
 ; exit condition are not infinite
diff --git a/gcc/loop-init.c b/gcc/loop-init.c
index e32c94a..232465c 100644
--- a/gcc/loop-init.c
+++ b/gcc/loop-init.c
@@ -560,7 +560,8 @@ public:
   /* opt_pass methods: */
   virtual bool gate (function *)
 {
-  return (flag_peel_loops || flag_unroll_loops || flag_unroll_all_loops);
+  return (flag_peel_loops || flag_unroll_loops || flag_unroll_all_loops
+ || flag_unroll_loops_pressure_aware);
 }

   virtual unsigned int execute (function *);
@@ -580,6 +581,8 @@ pass_rtl_unroll_loops::execute (function *fun)
flags |= UAP_UNROLL;
   if (flag_unroll_all_loops)
flags |= UAP_UNROLL_ALL;
+  if (flag_unroll_loops_pressure_aware)
+   flags |= UAP_UNROLL_PRESSURE_AWARE;

   unroll_loops (flags);
 }
diff --git a/gcc/loop-unroll.c b/gcc/loop-unroll.c
index 3e26b21..cffd9140 100644
--- a/gcc/lo

[libatomic PATCH] Fix libatomic behavior for big endian toolchain

2014-10-17 Thread Shiva Chen
Hi,

I noticed that the libatomic implementation for builtin functions whose parameter
is smaller than a word
shifts the parameter value into position within the containing word and then
stores the whole word.
However, the shift amount it computes is wrong for big endian.
This patch fixes the libatomic builtin function behavior for big endian toolchains.
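
The shift computation the patch installs can be sketched on its own like this
(illustrative only, not the libatomic sources; N is the access size in bytes
and WORDSIZE the word size in bytes, as in libatomic):

#include <limits.h>
#include <stdint.h>

/* Where, in bits, does an N-byte value at address ADDR sit inside its
   aligned WORDSIZE-byte containing word?  */
static unsigned int
subword_shift (uintptr_t addr, unsigned int n, unsigned int wordsize,
               int big_endian)
{
  unsigned int byte_offset = addr % wordsize;

  if (big_endian)
    /* On big endian the lowest address holds the most significant byte,
       so the value sits at the opposite end of the containing word.  */
    return (wordsize - n - byte_offset) * CHAR_BIT;
  else
    return byte_offset * CHAR_BIT;
}

For example, a 1-byte access at byte offset 2 of a 4-byte word gives a shift
of 16 on little endian but 8 on big endian.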

Is it OK for trunk?

Shiva


2014-10-17 Shiva Chen 

Fix libatomic behavior for big endian toolchains.
* libatomic/cas_n.c: Fix shift amount for big endian.
* libatomic/config/arm/exch_n.c: Likewise.
* libatomic/exch_n.c: Likewise.
* libatomic/fop_n.c: Likewise.
diff --git a/libatomic/cas_n.c b/libatomic/cas_n.c
index 801262d..aea49f0 100644
--- a/libatomic/cas_n.c
+++ b/libatomic/cas_n.c
@@ -60,7 +60,11 @@ SIZE(libat_compare_exchange) (UTYPE *mptr, UTYPE *eptr, 
UTYPE newval,
   if (N < WORDSIZE)
 {
   wptr = (UWORD *)((uintptr_t)mptr & -WORDSIZE);
-  shift = (((uintptr_t)mptr % WORDSIZE) * CHAR_BIT) ^ SIZE(INVERT_MASK);
+#ifdef __ARMEB__
+  shift = ((WORDSIZE - N - ((uintptr_t)mptr % WORDSIZE)) * CHAR_BIT);
+#else
+  shift = (((uintptr_t)mptr % WORDSIZE) * CHAR_BIT);
+#endif
   mask = SIZE(MASK) << shift;
 }
   else
diff --git a/libatomic/config/arm/exch_n.c b/libatomic/config/arm/exch_n.c
index c90d57f..0d71c5a 100644
--- a/libatomic/config/arm/exch_n.c
+++ b/libatomic/config/arm/exch_n.c
@@ -88,7 +88,11 @@ SIZE(libat_exchange) (UTYPE *mptr, UTYPE newval, int smodel)
   __atomic_thread_fence (__ATOMIC_SEQ_CST);
 
   wptr = (UWORD *)((uintptr_t)mptr & -WORDSIZE);
-  shift = (((uintptr_t)mptr % WORDSIZE) * CHAR_BIT) ^ INVERT_MASK_1;
+#ifdef __ARMEB__
+  shift = ((WORDSIZE - N - ((uintptr_t)mptr % WORDSIZE)) * CHAR_BIT);
+#else
+  shift = (((uintptr_t)mptr % WORDSIZE) * CHAR_BIT);
+#endif
   mask = MASK_1 << shift;
   wnewval = newval << shift;
 
diff --git a/libatomic/exch_n.c b/libatomic/exch_n.c
index 23558b0..e293d0b 100644
--- a/libatomic/exch_n.c
+++ b/libatomic/exch_n.c
@@ -77,7 +77,11 @@ SIZE(libat_exchange) (UTYPE *mptr, UTYPE newval, int smodel)
   if (N < WORDSIZE)
 {
   wptr = (UWORD *)((uintptr_t)mptr & -WORDSIZE);
-  shift = (((uintptr_t)mptr % WORDSIZE) * CHAR_BIT) ^ SIZE(INVERT_MASK);
+#ifdef __ARMEB__
+  shift = ((WORDSIZE - N - ((uintptr_t)mptr % WORDSIZE)) * CHAR_BIT);
+#else
+  shift = (((uintptr_t)mptr % WORDSIZE) * CHAR_BIT);
+#endif
   mask = SIZE(MASK) << shift;
 }
   else
diff --git a/libatomic/fop_n.c b/libatomic/fop_n.c
index 4a18da9..b3184b7 100644
--- a/libatomic/fop_n.c
+++ b/libatomic/fop_n.c
@@ -113,7 +113,11 @@ SIZE(C2(libat_fetch_,NAME)) (UTYPE *mptr, UTYPE opval, int 
smodel)
   pre_barrier (smodel);
 
   wptr = (UWORD *)mptr;
+#ifdef __ARMEB__
+  shift = (WORDSIZE - N) * CHAR_BIT;
+#else
   shift = 0;
+#endif
   mask = -1;
 
   wopval = (UWORD)opval << shift;
@@ -137,7 +141,11 @@ SIZE(C3(libat_,NAME,_fetch)) (UTYPE *mptr, UTYPE opval, 
int smodel)
   pre_barrier (smodel);
 
   wptr = (UWORD *)mptr;
+#ifdef __ARMEB__
+  shift = (WORDSIZE - N) * CHAR_BIT;
+#else
   shift = 0;
+#endif
   mask = -1;
 
   wopval = (UWORD)opval << shift;


Re: [libatomic PATCH] Fix libatomic behavior for big endian toolchain

2014-10-17 Thread Shiva Chen
Hi, Joseph

I have modified the patch as you suggested to
use #if __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
in the architecture-independent files.
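
For reference, these predefined macros can be checked with a trivial
standalone program (illustrative only):

#include <stdio.h>

/* Print the byte order the compiler is targeting, using the same
   predefined macros as the patch.  */
int
main (void)
{
#if __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
  puts ("big endian");
#else
  puts ("little endian");
#endif
  return 0;
}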

Is it OK for trunk?

Also, I don't have svn write access.
Could you help me commit this patch?

Shiva

2014-10-17 23:41 GMT+08:00 Joseph S. Myers :
> Changes to architecture-independent files must use
> architecture-independent conditionals, so __BYTE_ORDER__ not __ARMEB__.
>
> --
> Joseph S. Myers
> jos...@codesourcery.com
diff --git a/libatomic/cas_n.c b/libatomic/cas_n.c
index 801262d..5b5b8d7 100644
--- a/libatomic/cas_n.c
+++ b/libatomic/cas_n.c
@@ -60,7 +60,11 @@ SIZE(libat_compare_exchange) (UTYPE *mptr, UTYPE *eptr, 
UTYPE newval,
   if (N < WORDSIZE)
 {
   wptr = (UWORD *)((uintptr_t)mptr & -WORDSIZE);
-  shift = (((uintptr_t)mptr % WORDSIZE) * CHAR_BIT) ^ SIZE(INVERT_MASK);
+#if __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
+  shift = ((WORDSIZE - N - ((uintptr_t)mptr % WORDSIZE)) * CHAR_BIT);
+#else
+  shift = (((uintptr_t)mptr % WORDSIZE) * CHAR_BIT);
+#endif
   mask = SIZE(MASK) << shift;
 }
   else
diff --git a/libatomic/config/arm/exch_n.c b/libatomic/config/arm/exch_n.c
index c90d57f..0d71c5a 100644
--- a/libatomic/config/arm/exch_n.c
+++ b/libatomic/config/arm/exch_n.c
@@ -88,7 +88,11 @@ SIZE(libat_exchange) (UTYPE *mptr, UTYPE newval, int smodel)
   __atomic_thread_fence (__ATOMIC_SEQ_CST);
 
   wptr = (UWORD *)((uintptr_t)mptr & -WORDSIZE);
-  shift = (((uintptr_t)mptr % WORDSIZE) * CHAR_BIT) ^ INVERT_MASK_1;
+#ifdef __ARMEB__
+  shift = ((WORDSIZE - N - ((uintptr_t)mptr % WORDSIZE)) * CHAR_BIT);
+#else
+  shift = (((uintptr_t)mptr % WORDSIZE) * CHAR_BIT);
+#endif
   mask = MASK_1 << shift;
   wnewval = newval << shift;
 
diff --git a/libatomic/exch_n.c b/libatomic/exch_n.c
index 23558b0..4418f1f 100644
--- a/libatomic/exch_n.c
+++ b/libatomic/exch_n.c
@@ -77,7 +77,11 @@ SIZE(libat_exchange) (UTYPE *mptr, UTYPE newval, int smodel)
   if (N < WORDSIZE)
 {
   wptr = (UWORD *)((uintptr_t)mptr & -WORDSIZE);
-  shift = (((uintptr_t)mptr % WORDSIZE) * CHAR_BIT) ^ SIZE(INVERT_MASK);
+#if __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
+  shift = ((WORDSIZE - N - ((uintptr_t)mptr % WORDSIZE)) * CHAR_BIT);
+#else
+  shift = (((uintptr_t)mptr % WORDSIZE) * CHAR_BIT);
+#endif
   mask = SIZE(MASK) << shift;
 }
   else
diff --git a/libatomic/fop_n.c b/libatomic/fop_n.c
index 4a18da9..5d26cde 100644
--- a/libatomic/fop_n.c
+++ b/libatomic/fop_n.c
@@ -113,7 +113,11 @@ SIZE(C2(libat_fetch_,NAME)) (UTYPE *mptr, UTYPE opval, int 
smodel)
   pre_barrier (smodel);
 
   wptr = (UWORD *)mptr;
+#if __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
+  shift = (WORDSIZE - N) * CHAR_BIT;
+#else
   shift = 0;
+#endif
   mask = -1;
 
   wopval = (UWORD)opval << shift;
@@ -137,7 +141,11 @@ SIZE(C3(libat_,NAME,_fetch)) (UTYPE *mptr, UTYPE opval, 
int smodel)
   pre_barrier (smodel);
 
   wptr = (UWORD *)mptr;
+#if __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
+  shift = (WORDSIZE - N) * CHAR_BIT;
+#else
   shift = 0;
+#endif
   mask = -1;
 
   wopval = (UWORD)opval << shift;


Changelog
Description: Binary data


aarch64 simd index out of range message not correct on 32 bit host

2015-05-28 Thread Shiva Chen
Hi,

I noticed that the aarch64 SIMD index range message is not correct on a 32-bit host.

The message is printed by the function aarch64_simd_lane_bounds in aarch64.c.

The function prints HOST_WIDE_INT variables with %ld, which is correct on a
64-bit host.

However, on a 32-bit host HOST_WIDE_INT is long long.

Therefore, the message printed on a 32-bit host is incorrect.

Fix the message by printing HOST_WIDE_INT variables with %lld on a 32-bit host.
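
The mismatch can be seen with a small standalone example (a sketch assuming,
as in GCC, that HOST_WIDE_INT is a 64-bit type: long on a 64-bit host and
long long on a 32-bit host):

#include <limits.h>
#include <stdio.h>

#if ULONG_MAX == 0xffffffffffffffffUL   /* long is 64 bits */
typedef long host_wide_int;
# define HWI_FMT "%ld"
#else                                   /* long is 32 bits */
typedef long long host_wide_int;
# define HWI_FMT "%lld"
#endif

int
main (void)
{
  host_wide_int lane = 4, low = 0, high = 4;
  /* A hard-coded "%ld" here would be wrong whenever long is 32 bits.  */
  printf ("lane " HWI_FMT " out of range " HWI_FMT " - " HWI_FMT "\n",
          lane, low, high - 1);
  return 0;
}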


Shiva
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 083b9b4..49b007e 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -8903,9 +8903,21 @@ aarch64_simd_lane_bounds (rtx operand, HOST_WIDE_INT 
low, HOST_WIDE_INT high,
   if (lane < low || lane >= high)
   {
 if (exp)
-  error ("%Klane %ld out of range %ld - %ld", exp, lane, low, high - 1);
+  {
+#if INT64_T_IS_LONG
+error ("%Klane %ld out of range %ld - %ld", exp, lane, low, high - 1);
+#else
+error ("%Klane %lld out of range %lld - %lld", exp, lane, low, high - 
1);
+#endif
+  }
 else
-  error ("lane %ld out of range %ld - %ld", lane, low, high - 1);
+  {
+#if INT64_T_IS_LONG
+error ("lane %ld out of range %ld - %ld", lane, low, high - 1);
+#else
+error ("lane %lld out of range %lld - %lld", lane, low, high - 1);
+#endif
+  }
   }
 }
 


ChangeLog.fix_simd_index_message
Description: Binary data


Re: aarch64 simd index out of range message not correct on 32 bit host

2015-05-29 Thread Shiva Chen
Hi, Andrew

You are right, it's much simpler.

Could you give me a tip on how %wd works in GCC?

Could you send a new patch to fix the message, since you have the better solution?

Thanks,

Shiva

2015-05-29 15:13 GMT+08:00 Andrew Pinski :
> On Fri, May 29, 2015 at 2:33 PM, Shiva Chen  wrote:
>> Hi,
>>
>> I notice that aarch64 simd index range message not correct on 32 bit host.
>>
>> The message print by the function aarch64_simd_lane_bounds in aarch64.c.
>>
>> The function print HOST_WIDE_INT variable by %ld which is correct on
>> 64 bit host.
>>
>> However, on 32 bit host HOST_WIDE_INT would be long long.
>>
>> Therefore, print out incorrect message on 32 bit host.
>>
>> Fix the message by printing HOST_WIDE_INT variables on 32 bit host by %lld.
>
>
> Actually there is a simpler way.  Since this is error which uses the
> GCC diagnostic format you can just use %wd instead.
> So:
> error ("%Klane %ld out of range %ld - %ld", exp, lane, low, high - 1);
> Becomes:
> error ("%Klane %wd out of range %wd - %wd", exp, lane, low, high - 1);
>
> Thanks,
> Andrew Pinski
>
>>
>>
>> Shiva


Re: aarch64 simd index out of range message not correct on 32 bit host

2015-05-29 Thread Shiva Chen
Hi, Andrew

I modified the patch as you suggested and tested it on 32-bit and 64-bit hosts.

Thanks for your tips.

I really appreciate your help.

Shiva

2015-05-29 15:57 GMT+08:00 Andrew Pinski :
> On Fri, May 29, 2015 at 3:46 PM, Shiva Chen  wrote:
>> Hi, Andrew
>>
>> You are right, it's much simpler.
>>
>> Could you give me a tip how %wd works on GCC ?
>
> the w modifier is the GCC diagnostic format modifier that specifies
> HOST_WIDE_INT.
>
>>
>> Could you send a new patch to fix the message since you have better solution 
>> ?
>
> I don't build using a 32bit host so I can't test it.
>
> Thanks,
> Andrew
>
>>
>> Thanks,
>>
>> Shiva
>>
>> 2015-05-29 15:13 GMT+08:00 Andrew Pinski :
>>> On Fri, May 29, 2015 at 2:33 PM, Shiva Chen  wrote:
>>>> Hi,
>>>>
>>>> I notice that aarch64 simd index range message not correct on 32 bit host.
>>>>
>>>> The message print by the function aarch64_simd_lane_bounds in aarch64.c.
>>>>
>>>> The function print HOST_WIDE_INT variable by %ld which is correct on
>>>> 64 bit host.
>>>>
>>>> However, on 32 bit host HOST_WIDE_INT would be long long.
>>>>
>>>> Therefore, print out incorrect message on 32 bit host.
>>>>
>>>> Fix the message by printing HOST_WIDE_INT variables on 32 bit host by %lld.
>>>
>>>
>>> Actually there is a simpler way.  Since this is error which uses the
>>> GCC diagnostic format you can just use %wd instead.
>>> So:
>>> error ("%Klane %ld out of range %ld - %ld", exp, lane, low, high - 1);
>>> Becomes:
>>> error ("%Klane %wd out of range %wd - %wd", exp, lane, low, high - 1);
>>>
>>> Thanks,
>>> Andrew Pinski
>>>
>>>>
>>>>
>>>> Shiva
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 083b9b4..5343d09 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -8903,9 +8903,9 @@ aarch64_simd_lane_bounds (rtx operand, HOST_WIDE_INT low, 
HOST_WIDE_INT high,
   if (lane < low || lane >= high)
   {
 if (exp)
-  error ("%Klane %ld out of range %ld - %ld", exp, lane, low, high - 1);
+  error ("%Klane %wd out of range %wd - %wd", exp, lane, low, high - 1);
 else
-  error ("lane %ld out of range %ld - %ld", lane, low, high - 1);
+  error ("lane %wd out of range %wd - %wd", lane, low, high - 1);
   }
 }
 


ChangeLog.fix_simd_index_message
Description: Binary data


[GCC, ARM] armv8 linux toolchain asan testcase fail due to stl missing conditional code

2015-06-02 Thread Shiva Chen
Hi,

I noticed that with an armv8 (32-bit target) linux toolchain,

running the asan testcases gives the following message:


FAIL: c-c++-common/asan/heap-overflow-1.c -O0 output pattern test, is
Executing on host:
/home/gccbuilder-x86/test/mgcc5.0/testsuite/../tools/x86_64/install/bin/qemu-arm
-E 
LD_LIBRARY_PATH=/home/gccbuilder-x86/test/mgcc5.0/Release/install/armv8-marvell-linux-gnueabihf-hard-5.1.1_x86_64/bin/../arm-linux-gnueabihf/libc/lib/arm-linux-gnueabihf:/home/gccbuilder-x86/test/mgcc5.0/Release/install/armv8-marvell-linux-gnueabihf-hard-5.1.1_x86_64/bin/../arm-linux-gnueabihf/libc/usr/lib/arm-linux-gnueabihf
-L 
/home/gccbuilder-x86/test/mgcc5.0/Release/install/armv8-marvell-linux-gnueabihf-hard-5.1.1_x86_64/bin/../arm-linux-gnueabihf/libc
./heap-overflow-1.exe
=
==2182==ERROR: AddressSanitizer: heap-buffer-overflow on address
0xf4a007fa at pc 0x000108a0 bp 0xf6ffc264 sp 0xf6ffc25c
READ of size 1 at 0xf4a007fa thread T0
ASAN:SIGSEGV


The sanitizer library uses the source in gcc-src/libbacktrace to allocate memory.

The error is caused by a null pointer dereference in libbacktrace/mmap.c:


void
backtrace_free (struct backtrace_state *state, void *addr, size_t size,
backtrace_error_callback error_callback ATTRIBUTE_UNUSED,
void *data ATTRIBUTE_UNUSED)
...
  if (locked)
{
  backtrace_free_locked (state, addr, size);

  if (state->threaded) <= line 201
__sync_lock_release (&state->lock_alloc); <= line 202
}
}

.loc 1 201 0
cmp r3, #0 <= r3 contain the value of state->threaded
.loc 1 202 0
addne r3, r5, #32
movne r2, #0
stl r2, [r3] <= should be conditional execution

When r3 is 0, line 202 should not execute.

It seems that the stl should be generated as stlne.

Otherwise, the stl will dereference a null pointer when r3 is 0.


To fix the issue, add %? when outputting the stl assembly pattern in sync.md.

Is this patch ok for trunk?

Thanks,
Shiva
diff --git a/gcc/config/arm/sync.md b/gcc/config/arm/sync.md
index 44cda61..79b039e 100644
--- a/gcc/config/arm/sync.md
+++ b/gcc/config/arm/sync.md
@@ -75,9 +75,9 @@
   {
 enum memmodel model = memmodel_from_int (INTVAL (operands[2]));
 if (is_mm_relaxed (model) || is_mm_consume (model) || is_mm_release 
(model))
-  return \"ldr\\t%0, %1\";
+  return \"ldr%?\\t%0, %1\";
 else
-  return \"lda\\t%0, %1\";
+  return \"lda%?\\t%0, %1\";
   }
 )
 
@@ -91,9 +91,9 @@
   {
 enum memmodel model = memmodel_from_int (INTVAL (operands[2]));
 if (is_mm_relaxed (model) || is_mm_consume (model) || is_mm_acquire 
(model))
-  return \"str\t%1, %0\";
+  return \"str%?\t%1, %0\";
 else
-  return \"stl\t%1, %0\";
+  return \"stl%?\t%1, %0\";
   }
 )
 


ChangeLog.fix_slt_lda_missing_conditional_code
Description: Binary data


Re: [GCC, ARM] armv8 linux toolchain asan testcase fail due to stl missing conditional code

2015-06-03 Thread Shiva Chen
Hi, Ramana

I'm not sure what copyright assignment means.

Does it mean the patch has a copyright assignment or not?

I updated the patch to add the "predicable" and "predicable_short_it"
attributes as suggested.

However, I don't have svn write access yet.

Shiva

2015-06-03 16:36 GMT+08:00 Kyrill Tkachov :
>
> On 03/06/15 09:32, Ramana Radhakrishnan wrote:
>>>
>>> This pattern is not predicable though, i.e. it doesn't have the
>>> "predicable" attribute set to "yes".
>>> Therefore the compiler should be trying to branch around here rather than
>>> try to do a cond_exec.
>>> Why does the generated code above look like it's converted to conditional
>>> execution?
>>> Could you produce a self-contained reduced testcase for this?
>>
>> CCFSM state machine in ARM state.
>>
>> arm.c (final_prescan_insn).
>
>
> Ah ok.
> This patch makes sense then.
> As Ramana mentioned, please mark the pattern with "predicable" and also set
> the "predicable_short_it" attribute to "no" so that it will not be
> conditionalised in Thumb2 mode or when -mrestrict-it is enabled.
>
> Thanks,
> Kyrill
>
>
>
>>
>> Ramana
>>
>>> Thanks,
>>> Kyrill
>>>
 @@ -91,9 +91,9 @@
   {
 enum memmodel model = memmodel_from_int (INTVAL (operands[2]));
 if (is_mm_relaxed (model) || is_mm_consume (model) ||
 is_mm_acquire (model))
 -  return \"str\t%1, %0\";
 +  return \"str%?\t%1, %0\";
 else
 -  return \"stl\t%1, %0\";
 +  return \"stl%?\t%1, %0\";
   }
 )

>
diff --git a/gcc/config/arm/sync.md b/gcc/config/arm/sync.md
index 44cda61..cf8f3a3 100644
--- a/gcc/config/arm/sync.md
+++ b/gcc/config/arm/sync.md
@@ -75,11 +75,12 @@
   {
 enum memmodel model = memmodel_from_int (INTVAL (operands[2]));
 if (is_mm_relaxed (model) || is_mm_consume (model) || is_mm_release 
(model))
-  return \"ldr\\t%0, %1\";
+  return \"ldr%?\\t%0, %1\";
 else
-  return \"lda\\t%0, %1\";
+  return \"lda%?\\t%0, %1\";
   }
-)
+  [(set_attr "predicable" "yes")
+   (set_attr "predicable_short_it" "no")])
 
 (define_insn "atomic_store"
   [(set (match_operand:QHSI 0 "memory_operand" "=Q")
@@ -91,11 +92,12 @@
   {
 enum memmodel model = memmodel_from_int (INTVAL (operands[2]));
 if (is_mm_relaxed (model) || is_mm_consume (model) || is_mm_acquire 
(model))
-  return \"str\t%1, %0\";
+  return \"str%?\t%1, %0\";
 else
-  return \"stl\t%1, %0\";
+  return \"stl%?\t%1, %0\";
   }
-)
+  [(set_attr "predicable" "yes")
+   (set_attr "predicable_short_it" "no")])
 
 ;; Note that ldrd and vldr are *not* guaranteed to be single-copy atomic,
 ;; even for a 64-bit aligned address.  Instead we use a ldrexd unparied


Re: [GCC, ARM] armv8 linux toolchain asan testcase fail due to stl missing conditional code

2015-06-03 Thread Shiva Chen
Hi, Ramana

Currently I work for Marvell, and the company has a copyright assignment on file.

Hi, all

After adding the attributes and rebuilding gcc, I got the following assembler error message:

load_n.s:39: Error: bad instruction `ldrbeq r0,[r0]'

When I looked into the armv8 ISA document, it seems that ldrb encoding A1 has a
condition code field.

Does it mean we should also patch the assembler, or am I just
misunderstanding something?

The following command was used to generate load_n.s:

/home/shivac/build-system-trunk/Release/build/armv8-marvell-linux-gnueabihf-hard/gcc-final/./gcc/cc1
-fpreprocessed load_n.i -quiet -dumpbase load_n.c -march=armv8-a
-mfloat-abi=hard -mfpu=fp-armv8  -mtls-dialect=gnu -auxbase-strip
.libs/load_1_.o -g3 -O2 -Wall -Werror -version -fPIC -funwind-tables
-o load_n.s


test.c is a simple test case to reproduce the missing conditional code
seen in mmap.c.

Any suggestions?


Shiva

2015-06-03 17:29 GMT+08:00 Shiva Chen :
> Hi, Ramana
>
> I'm not sure what copyright assignment means ?
>
> Does it mean the patch have copyright assignment or not ?
>
> I update the patch to add "predicable" and  "predicable_short_it"
> attribute as suggestion.
>
> However, I don't have svn write access yet.
>
> Shiva
>
> 2015-06-03 16:36 GMT+08:00 Kyrill Tkachov :
>>
>> On 03/06/15 09:32, Ramana Radhakrishnan wrote:
>>>>
>>>> This pattern is not predicable though, i.e. it doesn't have the
>>>> "predicable" attribute set to "yes".
>>>> Therefore the compiler should be trying to branch around here rather than
>>>> try to do a cond_exec.
>>>> Why does the generated code above look like it's converted to conditional
>>>> execution?
>>>> Could you produce a self-contained reduced testcase for this?
>>>
>>> CCFSM state machine in ARM state.
>>>
>>> arm.c (final_prescan_insn).
>>
>>
>> Ah ok.
>> This patch makes sense then.
>> As Ramana mentioned, please mark the pattern with "predicable" and also set
>> the "predicable_short_it" attribute to "no" so that it will not be
>> conditionalised in Thumb2 mode or when -mrestrict-it is enabled.
>>
>> Thanks,
>> Kyrill
>>
>>
>>
>>>
>>> Ramana
>>>
>>>> Thanks,
>>>> Kyrill
>>>>
>>>>> @@ -91,9 +91,9 @@
>>>>>   {
>>>>> enum memmodel model = memmodel_from_int (INTVAL (operands[2]));
>>>>> if (is_mm_relaxed (model) || is_mm_consume (model) ||
>>>>> is_mm_acquire (model))
>>>>> -  return \"str\t%1, %0\";
>>>>> +  return \"str%?\t%1, %0\";
>>>>> else
>>>>> -  return \"stl\t%1, %0\";
>>>>> +  return \"stl%?\t%1, %0\";
>>>>>   }
>>>>> )
>>>>>
>>


load_n.i
Description: Binary data
struct backtrace_state
{
  int threaded;
  int lock_alloc;
};

void foo (struct backtrace_state *state)
{
  if (state->threaded)
__sync_lock_release (&state->lock_alloc);
}



RE: [GCC, ARM] armv8 linux toolchain asan testcase fail due to stl missing conditional code

2015-06-04 Thread Shiva Chen
Hi, Kyrill

Thanks for the tips on the syntax.

It seems that the correct syntax for

ldrb with a condition code is ldreqb, while for

ldab with a condition code it is ldabeq.


So I modified the pattern as follows:

  {
enum memmodel model = (enum memmodel) INTVAL (operands[2]);
if (model == MEMMODEL_RELAXED
|| model == MEMMODEL_CONSUME
|| model == MEMMODEL_RELEASE)
  return \"ldr%?\\t%0, %1\";
else
  return \"lda%?\\t%0, %1\";
  }
  [(set_attr "predicable" "yes")
   (set_attr "predicable_short_it" "no")])

It seems we don't have to worry about thumb mode,

because we already set "predicable" to "yes" and "predicable_short_it" to "no" for the
pattern.

With the new patch, gcc builds and the gcc regression tests pass successfully.

Please correct me if I'm still missing something.

Thanks,

Shiva   

-Original Message-
From: Richard Earnshaw [mailto:richard.earns...@foss.arm.com] 
Sent: Thursday, June 04, 2015 4:42 PM
To: Kyrill Tkachov; Shiva Chen
Cc: Ramana Radhakrishnan; GCC Patches; ni...@redhat.com; Shiva Chen
Subject: Re: [GCC, ARM] armv8 linux toolchain asan testcase fail due to stl 
missing conditional code

On 04/06/15 09:17, Kyrill Tkachov wrote:
> Hi Shiva,
> 
> On 04/06/15 04:13, Shiva Chen wrote:
>> Hi, Ramana
>>
>> Currently, I work for Marvell and the company have copyright assignment on 
>> file.
>>
>> Hi, all
>>
>> After adding the attribute and rebuild gcc, I got the assembler error 
>> message
>>
>> load_n.s:39: Error: bad instruction `ldrbeq r0,[r0]'
>>
>> When i look into armv8 ISA document, it seems ldrb Encoding A1 have 
>> conditional code field.
>>
>> Does it mean we should also patch assembler or I just miss 
>> understanding something ?
>>
>> Following command use to generate load_n.s:
>>
>> /home/shivac/build-system-trunk/Release/build/armv8-marvell-linux-gnu
>> eabihf-hard/gcc-final/./gcc/cc1 -fpreprocessed load_n.i -quiet 
>> -dumpbase load_n.c -march=armv8-a -mfloat-abi=hard -mfpu=fp-armv8  
>> -mtls-dialect=gnu -auxbase-strip .libs/load_1_.o -g3 -O2 -Wall 
>> -Werror -version -fPIC -funwind-tables -o load_n.s
>>
>>
>> The test.c is a simple test case to reproduce missing conditional 
>> code in mmap.c.
>>
>> Any suggestion ?
> 
> I reproduced the assembler failure with your patch.
> 
> The reason is that for arm mode we use divided syntax, where the 
> condition field goes in a different place. So, while ldrbeq r0,[r0] is 
> rejected, ldreqb r0, [r0] works.
> Since we always use divided syntax for arm mode, I think you'll need 
> to put the condition field in the right place depending on arm or thumb mode.
> Ugh, this is becoming ugly :(
> 

Use %(%)\t%1, %0\";

R.

> Kyrill
> 
>>
>>
>> Shiva
>>
>> 2015-06-03 17:29 GMT+08:00 Shiva Chen :
>>> Hi, Ramana
>>>
>>> I'm not sure what copyright assignment means ?
>>>
>>> Does it mean the patch have copyright assignment or not ?
>>>
>>> I update the patch to add "predicable" and  "predicable_short_it"
>>> attribute as suggestion.
>>>
>>> However, I don't have svn write access yet.
>>>
>>> Shiva
>>>
>>> 2015-06-03 16:36 GMT+08:00 Kyrill Tkachov :
>>>> On 03/06/15 09:32, Ramana Radhakrishnan wrote:
>>>>>> This pattern is not predicable though, i.e. it doesn't have the 
>>>>>> "predicable" attribute set to "yes".
>>>>>> Therefore the compiler should be trying to branch around here 
>>>>>> rather than try to do a cond_exec.
>>>>>> Why does the generated code above look like it's converted to 
>>>>>> conditional execution?
>>>>>> Could you produce a self-contained reduced testcase for this?
>>>>> CCFSM state machine in ARM state.
>>>>>
>>>>> arm.c (final_prescan_insn).
>>>>
>>>> Ah ok.
>>>> This patch makes sense then.
>>>> As Ramana mentioned, please mark the pattern with "predicable" and 
>>>> also set the "predicable_short_it" attribute to "no" so that it 
>>>> will not be conditionalised in Thumb2 mode or when -mrestrict-it is 
>>>> enabled.
>>>>
>>>> Thanks,
>>>> Kyrill
>>>>
>>>>
>>>>
>>>>> Ramana
>>>>>
>>>>>> Thanks,
>>>>>> Kyrill
>>>>>>
>>>>>>> @@ -91,9 +91,9 @@
>>>>>>>{
>>>>>>>  enum memmodel model = memmodel_from_int (INTVAL (operands[2]));
>>>>>>>  if (is_mm_relaxed (model) || is_mm_consume (model) || 
>>>>>>> is_mm_acquire (model))
>>>>>>> -  return \"str\t%1, %0\";
>>>>>>> +  return \"str%?\t%1, %0\";
>>>>>>>  else
>>>>>>> -  return \"stl\t%1, %0\";
>>>>>>> +  return \"stl%?\t%1, %0\";
>>>>>>>}
>>>>>>>  )
>>>>>>>
> 



Fix_slt_lda_missing_conditional_code.diff
Description: Fix_slt_lda_missing_conditional_code.diff


Re: [GCC, ARM] armv8 linux toolchain asan testcase fail due to stl missing conditional code

2015-06-05 Thread Shiva Chen
Hi, Kyrill

I updated the patch following Richard's suggestion.

-  return \"str\t%1, %0\";
+  return \"str%(%)\t%1, %0\";
 else
-  return \"stl\t%1, %0\";
+  return \"stl%?\t%1, %0\";
   }
-)
+  [(set_attr "predicable" "yes")
+   (set_attr "predicable_short_it" "no")])
+  [(set_attr "predicable" "yes")
+   (set_attr "predicable_short_it" "no")])


Let me sum up.

We add the predicable attribute to allow gcc to do if-conversion in
ce1/ce2/ce3, not only in the final phase via the final_prescan_insn finite state
machine.

We set predicable_short_it to "no" to restrict conditional code
generation on armv8 in thumb mode.

However, the flag -mno-restrict-it could be used to force generating
conditional code in thumb mode.

Therefore, we have to consider the assembly output format for strb
with a condition code in both arm and thumb mode.

Because arm and thumb mode use different syntax for strb,
we output the assembly as str%(%),
which will put the condition code in the right place according to
TARGET_UNIFIED_ASM.

Is there anything still missing?

Thanks,

Shiva

2015-06-04 18:00 GMT+08:00 Kyrill Tkachov :
> Hi Shiva,
>
> On 04/06/15 10:57, Shiva Chen wrote:
>>
>> Hi, Kyrill
>>
>> Thanks for the tips of syntax.
>>
>> It seems that correct syntax for
>>
>> ldrb with condition code is ldreqb
>>
>> ldab with condition code is ldabeq
>>
>>
>> So I modified the pattern as follow
>>
>>{
>>  enum memmodel model = (enum memmodel) INTVAL (operands[2]);
>>  if (model == MEMMODEL_RELAXED
>>  || model == MEMMODEL_CONSUME
>>  || model == MEMMODEL_RELEASE)
>>return \"ldr%?\\t%0, %1\";
>>  else
>>return \"lda%?\\t%0, %1\";
>>}
>>[(set_attr "predicable" "yes")
>> (set_attr "predicable_short_it" "no")])
>>
>> It seems we don't have to worry about thumb mode,
>
>
> I suggest you use Richard's suggestion from:
>  https://gcc.gnu.org/ml/gcc-patches/2015-06/msg00384.html
> to write this in a clean way.
>
>>
>> Because we already set "predicable" "yes" and predicable_short_it" "no"
>> for the pattern.
>
>
> That's not quite true. The user may compile for armv8-a with
> -mno-restrict-it which will turn off this
> restriction for Thumb and allow the conditional execution of this.
> In any case, I think Richard's suggestion above should work.
>
> Thanks,
> Kyrill
>
>
>>
>> The new patch could build gcc and run gcc regression test successfully.
>>
>> Please correct me if I still missing something.
>>
>> Thanks,
>>
>> Shiva
>>
>> -Original Message-
>> From: Richard Earnshaw [mailto:richard.earns...@foss.arm.com]
>> Sent: Thursday, June 04, 2015 4:42 PM
>> To: Kyrill Tkachov; Shiva Chen
>> Cc: Ramana Radhakrishnan; GCC Patches; ni...@redhat.com; Shiva Chen
>> Subject: Re: [GCC, ARM] armv8 linux toolchain asan testcase fail due to
>> stl missing conditional code
>>
>> On 04/06/15 09:17, Kyrill Tkachov wrote:
>>>
>>> Hi Shiva,
>>>
>>> On 04/06/15 04:13, Shiva Chen wrote:
>>>>
>>>> Hi, Ramana
>>>>
>>>> Currently, I work for Marvell and the company have copyright assignment
>>>> on file.
>>>>
>>>> Hi, all
>>>>
>>>> After adding the attribute and rebuild gcc, I got the assembler error
>>>> message
>>>>
>>>> load_n.s:39: Error: bad instruction `ldrbeq r0,[r0]'
>>>>
>>>> When i look into armv8 ISA document, it seems ldrb Encoding A1 have
>>>> conditional code field.
>>>>
>>>> Does it mean we should also patch assembler or I just miss
>>>> understanding something ?
>>>>
>>>> Following command use to generate load_n.s:
>>>>
>>>> /home/shivac/build-system-trunk/Release/build/armv8-marvell-linux-gnu
>>>> eabihf-hard/gcc-final/./gcc/cc1 -fpreprocessed load_n.i -quiet
>>>> -dumpbase load_n.c -march=armv8-a -mfloat-abi=hard -mfpu=fp-armv8
>>>> -mtls-dialect=gnu -auxbase-strip .libs/load_1_.o -g3 -O2 -Wall
>>>> -Werror -version -fPIC -funwind-tables -o load_n.s
>>>>
>>>>
>>>> The test.c is a simple test case to reproduce missing conditional
>>>> code in mmap.c.
>>>>
>>>> Any suggestion ?
>>&

Re: [GCC, ARM] armv8 linux toolchain asan testcase fail due to stl missing conditional code

2015-06-05 Thread Shiva Chen
Hi, Kyrill

I added the test case as stl-cond.c.

Could you help check the test case?

If it's OK, could you help me apply the patch?


Thanks,

Shiva

2015-06-05 16:34 GMT+08:00 Kyrill Tkachov :
> Hi Shiva,
>
> On 05/06/15 09:29, Shiva Chen wrote:
>>
>> Hi, Kyrill
>>
>> I update the patch as Richard's suggestion.
>>
>> -  return \"str\t%1, %0\";
>> +  return \"str%(%)\t%1, %0\";
>>   else
>> -  return \"stl\t%1, %0\";
>> +  return \"stl%?\t%1, %0\";
>> }
>> -)
>> +  [(set_attr "predicable" "yes")
>> +   (set_attr "predicable_short_it" "no")])
>> +  [(set_attr "predicable" "yes")
>> +   (set_attr "predicable_short_it" "no")])
>>
>>
>> Let me sum up.
>>
>> We add predicable attribute to allow gcc do if-conversion in
>> ce1/ce2/ce3 not only in final phase by final_prescan_insn finite state
>> machine.
>>
>> We set predicalble_short_it to "no" to restrict conditional code
>> generation on armv8 with thumb mode.
>>
>> However, we could use the flags -mno-restrict-it to force generating
>> conditional code on thumb mode.
>>
>> Therefore, we have to consider the assembly output format for strb
>> with condition code on arm/thumb mode.
>>
>> Because arm/thumb mode use different syntax for strb,
>> we output the assembly as str%(%)
>> which will put the condition code in the right place according to
>> TARGET_UNIFIED_ASM.
>>
>> Is there still missing something ?
>
>
> That's all correct, and well summarised :)
> The patch looks good to me, but please include the testcase
> (test.c from earlier) appropriately marked up for the testsuite.
> I think to the level of dg-assemble, just so we know everything is
> wired up properly.
>
> Thanks for dealing with this.
> Kyrill
>
>
>>
>> Thanks,
>>
>> Shiva
>>
>> 2015-06-04 18:00 GMT+08:00 Kyrill Tkachov :
>>>
>>> Hi Shiva,
>>>
>>> On 04/06/15 10:57, Shiva Chen wrote:
>>>>
>>>> Hi, Kyrill
>>>>
>>>> Thanks for the tips of syntax.
>>>>
>>>> It seems that correct syntax for
>>>>
>>>> ldrb with condition code is ldreqb
>>>>
>>>> ldab with condition code is ldabeq
>>>>
>>>>
>>>> So I modified the pattern as follow
>>>>
>>>> {
>>>>   enum memmodel model = (enum memmodel) INTVAL (operands[2]);
>>>>   if (model == MEMMODEL_RELAXED
>>>>   || model == MEMMODEL_CONSUME
>>>>   || model == MEMMODEL_RELEASE)
>>>> return \"ldr%?\\t%0, %1\";
>>>>   else
>>>> return \"lda%?\\t%0, %1\";
>>>> }
>>>> [(set_attr "predicable" "yes")
>>>>  (set_attr "predicable_short_it" "no")])
>>>>
>>>> It seems we don't have to worry about thumb mode,
>>>
>>>
>>> I suggest you use Richard's suggestion from:
>>>   https://gcc.gnu.org/ml/gcc-patches/2015-06/msg00384.html
>>> to write this in a clean way.
>>>
>>>> Because we already set "predicable" "yes" and predicable_short_it" "no"
>>>> for the pattern.
>>>
>>>
>>> That's not quite true. The user may compile for armv8-a with
>>> -mno-restrict-it which will turn off this
>>> restriction for Thumb and allow the conditional execution of this.
>>> In any case, I think Richard's suggestion above should work.
>>>
>>> Thanks,
>>> Kyrill
>>>
>>>
>>>> The new patch could build gcc and run gcc regression test successfully.
>>>>
>>>> Please correct me if I still missing something.
>>>>
>>>> Thanks,
>>>>
>>>> Shiva
>>>>
>>>> -Original Message-
>>>> From: Richard Earnshaw [mailto:richard.earns...@foss.arm.com]
>>>> Sent: Thursday, June 04, 2015 4:42 PM
>>>> To: Kyrill Tkachov; Shiva Chen
>>>> Cc: Ramana Radhakrishnan; GCC Patches; ni...@redhat.com; Shiva Chen
>>>> Subject: Re: [GCC, ARM] armv8 linux toolchain asan testcase fail due to
>>>> stl missing conditional code
>>>>
>>>> On 04

Re: ira.c update_equiv_regs patch causes gcc/testsuite/gcc.target/arm/pr43920-2.c regression

2015-04-20 Thread Shiva Chen
Hi, Jeff

Thanks for your advice.

can_replace_by.patch is the new patch to handle both cases.

pr43920-2.c.244r.jump2.ori is the original jump2 RTL dump.

pr43920-2.c.244r.jump2.patch_can_replace_by is the jump2 RTL dump
after applying can_replace_by.patch.

Could you help me to review the patch?

Thanks again.

Shiva

2015-04-18 0:03 GMT+08:00 Jeff Law :
> On 04/17/2015 03:57 AM, Shiva Chen wrote:
>>
>> Hi,
>>
>> I think the rtl dump in
>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64916
>> is not jump2 phase rtl dump.
>>
>> Because jump2 is after ira, the register number should be hardware
>> register number.
>>
>> the jump2 rtl dump should as follow
>>
>> ...
>> 31: NOTE_INSN_BASIC_BLOCK 5
>> 32: [r6:SI]=r4:SI
>>REG_DEAD r6:SI
>>REG_DEAD r4:SI
>> 33: [r5:SI]=r0:SI
>>REG_DEAD r5:SI
>>REG_DEAD r0:SI
>>  7: r0:SI=0
>>REG_EQUAL 0
>> 85: use r0:SI
>> 86:
>> {return;sp:SI=sp:SI+0x18;r3:SI=[sp:SI];r4:SI=[sp:SI+0x4];r5:SI=[sp:SI+0x8];r6:SI=[sp:SI+0xc];r7:SI=[sp:SI+0x10];pc:SI=[sp:SI+0x14];}
>>REG_UNUSED pc:SI
>>REG_UNUSED r3:SI
>>REG_CFA_RESTORE r7:SI
>>REG_CFA_RESTORE r6:SI
>>REG_CFA_RESTORE r5:SI
>>REG_CFA_RESTORE r4:SI
>>REG_CFA_RESTORE r3:SI
>> 77: barrier
>> 46: L46:
>> 45: NOTE_INSN_BASIC_BLOCK 6
>>  8: r0:SI=r4:SI
>>REG_DEAD r4:SI
>>REG_EQUAL 0x
>> 87: use r0:SI
>> 88:
>> {return;sp:SI=sp:SI+0x18;r3:SI=[sp:SI];r4:SI=[sp:SI+0x4];r5:SI=[sp:SI+0x8];r6:SI=[sp:SI+0xc];r7:SI=[sp:SI+0x10];pc:SI=[sp:SI+0x14];}
>>REG_UNUSED pc:SI
>>REG_UNUSED r3:SI
>>REG_CFA_RESTORE r7:SI
>>REG_CFA_RESTORE r6:SI
>>REG_CFA_RESTORE r5:SI
>>REG_CFA_RESTORE r4:SI
>>REG_CFA_RESTORE r3:SI
>> 79: barrier
>> 54: L54:
>> 53: NOTE_INSN_BASIC_BLOCK 7
>>  9: r0:SI=0x <== lost REG_EQUAL after patch
>> 34: L34:
>> 35: NOTE_INSN_BASIC_BLOCK 8
>> 41: use r0:SI
>> 90:
>> {return;sp:SI=sp:SI+0x18;r3:SI=[sp:SI];r4:SI=[sp:SI+0x4];r5:SI=[sp:SI+0x8];r6:SI=[sp:SI+0xc];r7:SI=[sp:SI+0x10];pc:SI=[sp:SI+0x14];}
>>REG_UNUSED pc:SI
>>REG_UNUSED r3:SI
>>REG_CFA_RESTORE r7:SI
>>REG_CFA_RESTORE r6:SI
>>REG_CFA_RESTORE r5:SI
>>REG_CFA_RESTORE r4:SI
>>REG_CFA_RESTORE r3:SI
>> 89: barrier
>
> Intead of the slim dump, can you please include the full RTL dump.  I find
> those much easier to read.
>
>
>
>>
>> Possible patch for  can_replace_by in cfgcleanup.c.
>>
>> -  if (!note1 || !note2 || !rtx_equal_p (XEXP (note1, 0), XEXP (note2, 0))
>> -  || !CONST_INT_P (XEXP (note1, 0)))
>> +
>> +  if (!note1 || !CONST_INT_P (XEXP (note1, 0)))
>>   return dir_none;
>>
>> +  if (note2)
>> +{
>> +  if (!rtx_equal_p (XEXP (note1, 0), XEXP (note2, 0)))
>> +   return dir_none;
>> +}
>> +  else
>> +{
>> +  if (!CONST_INT_P (SET_SRC (s2))
>> + || !rtx_equal_p (XEXP (note1, 0), SET_SRC (s2)))
>> +   return dir_none;
>> +}
>> +
>>
>> I'm not sure the idea is ok or it might crash something.
>> Any suggestion would be very helpful.
>
> Seems like you're on a reasonable path to me.  I suggest you stick with it.
>
> Basically what it appears you're trying to do is unify insns from different
> blocks where one looks like
>
> (set x y)  with an attached REG_EQUAL note
>
> And the other looks like
>
> (set x const_int)
>
> Where the REG_EQUAL note has the same value as the const_int in the second
> set.
>
> I think you'd want to handle both cases i1 has the note i2, no note and i1
> has no note and i2 has a note.
>
> Jeff
>
> jeff


can_replace_by.patch
Description: Binary data


pr43920-2.c.244r.jump2.ori
Description: Binary data


pr43920-2.c.244r.jump2.patch_can_replace_by
Description: Binary data


Changelog.can_replace_by
Description: Binary data


Re: [wwwdocs] nds32 documentation - remove broken reference

2019-07-30 Thread Shiva Chen
Hi Gerald,

The updated link is
http://www.andestech.com/en/products-solutions/product-documentation/
Thanks for the kind reminder,
Shiva

Gerald Pfeifer  於 2019年7月28日 週日 下午4:54寫道:

> I could not find an updated link on www.andestech.com, in fact the
> reference I could find there was broken as well.
>
> If anyone has an update link, happy to add that again!
>
> Applied for now.
>
> Gerald
>
> Index: readings.html
> ===
> RCS file: /cvs/gcc/wwwdocs/htdocs/readings.html,v
> retrieving revision 1.315
> diff -u -r1.315 readings.html
> --- readings.html   19 Jun 2019 20:27:02 -  1.315
> +++ readings.html   28 Jul 2019 08:52:46 -
> @@ -70,7 +70,6 @@
>   andes (nds32)
>Manufacturer: Various licenses of Andes Technology Corporation.
>CPUs include: AndesCore families N7, N8, SN8, N9, N10, N12 and
> N13.
> -  <a href="http://www.andestech.com/product.php?cls=9">Andes Documentation</a>
>GDB includes a simulator for all CPUs.
>   
>
>


Re: [PATCH] [NDS32] Fix nds32_split_ashiftdi3 with large shift amount

2019-03-22 Thread Shiva Chen
Hi Kito,
Thanks for the patch.

Kito Cheng  於 2019年2月22日 週五 下午4:42寫道:

> From: Kito Cheng 
>
> ChangeLog:
> gcc/
> * config/nds32/nds32-md-auxiliary.c (nds32_split_ashiftdi3):
> Fix wrong code gen with large shift amount.
> ---
>  gcc/config/nds32/nds32-md-auxiliary.c | 21 ++---
>  1 file changed, 14 insertions(+), 7 deletions(-)
>
> diff --git a/gcc/config/nds32/nds32-md-auxiliary.c
> b/gcc/config/nds32/nds32-md-auxiliary.c
> index 33844be..71fa421 100644
> --- a/gcc/config/nds32/nds32-md-auxiliary.c
> +++ b/gcc/config/nds32/nds32-md-auxiliary.c
> @@ -3304,15 +3304,22 @@ nds32_split_ashiftdi3 (rtx dst, rtx src, rtx
> shiftamount)
>ext_start = gen_reg_rtx (SImode);
>
>/*
> -if (shiftamount < 32)
> +# In fact, we want to check shift amonut is great than or equal
> 32,
> +# but in some corner case, the shift amount might be very large
> value,
> +# however we've defined SHIFT_COUNT_TRUNCATED, so GCC think we've
> +# handle that correctly without any truncate.
>
The comment seems to say that defining SHIFT_COUNT_TRUNCATED is what triggers
the issue.
But if we define SHIFT_COUNT_TRUNCATED as 0, the issue still occurs, right?

+# so check the the condition of (shiftamount & 32) is most
> +# safe way to do.
> +if (shiftamount & 32)
> +  dst_low_part = 0
> +  dst_high_part = src_low_part << shiftamount & 0x1f
>
The shiftamount & 0x1f could probably be removed, because the nds32 sll
instruction only reads the low 5 bits as the shift amount.

+else
>dst_low_part = src_low_part << shiftamout
>dst_high_part = wext (src, 32 - shiftamount)
># wext can't handle wext (src, 32) since it's only take rb[0:4]
># for extract.
>dst_high_part = shiftamount == 0 ? src_high_part : dst_high_part
> -else
> -  dst_low_part = 0
> -  dst_high_part = src_low_part << shiftamount & 0x1f
> +
>*/
>
>emit_insn (gen_subsi3 (ext_start,
> @@ -3331,11 +3338,11 @@ nds32_split_ashiftdi3 (rtx dst, rtx src, rtx
> shiftamount)
>emit_insn (gen_ashlsi3 (dst_high_part_g32, src_low_part,
>  new_shift_amout));
>
> -  emit_insn (gen_slt_compare (select_reg, shiftamount, GEN_INT (32)));
> +  emit_insn (gen_andsi3 (select_reg, shiftamount, GEN_INT (32)));
>
> -  emit_insn (gen_cmovnsi (dst_low_part, select_reg,
> +  emit_insn (gen_cmovzsi (dst_low_part, select_reg,
>   dst_low_part_l32, dst_low_part_g32));
> -  emit_insn (gen_cmovnsi (dst_high_part, select_reg,
> +  emit_insn (gen_cmovzsi (dst_high_part, select_reg,
>   dst_high_part_l32, dst_high_part_g32));
>  }
>  }
>
Could you add a test case to illustrate the code generation changed by the patch?

Thanks,
Shiva
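
For reference, the split described in the patch comment amounts to the
following standalone C (a sketch assuming 32-bit int and a hardware shifter
that only reads the low 5 bits of the shift amount; not the nds32 backend
code itself):

/* Sketch of the intended DImode left-shift split.  */
unsigned long long
ashldi3_split (unsigned int src_low, unsigned int src_high,
               unsigned int shiftamount)
{
  unsigned int dst_low, dst_high;

  if (shiftamount & 32)
    {
      /* Shift amount of 32 or more: the low part becomes zero and the
         high part gets the low part shifted by the low 5 bits.  */
      dst_low = 0;
      dst_high = src_low << (shiftamount & 0x1f);
    }
  else
    {
      dst_low = src_low << shiftamount;
      /* This is what wext (src, 32 - shiftamount) extracts; the
         shiftamount == 0 case must keep src_high unchanged because wext
         cannot take an extract position of 32.  */
      dst_high = shiftamount == 0
                 ? src_high
                 : (src_high << shiftamount)
                   | (src_low >> (32 - shiftamount));
    }

  return ((unsigned long long) dst_high << 32) | dst_low;
}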


Re: [PATCH 2/7] [NDS32] Fix testsuite for nds32 target

2019-03-31 Thread Shiva Chen
LGTM.

Kito Cheng  於 2019年3月26日 週二 下午1:29寫道:

> From: Kito Cheng 
>
> Chung-Ju Wu  
> Shiva Chen  
>
> ChangeLog:
>
> gcc/testsuite/
> * gcc.c-torture/execute/20010122-1.c: Add -malways-save-lp for
> nds32.
> * gcc.c-torture/execute/pr60822.c: Add -mcmodel=large for nds32.
> * gcc.c-torture/execute/pr79286.c: Ditto.
> * gcc.dg/graphite/interchange-15.c: Ditto.
> * gcc.dg/graphite/interchange-3.c: Ditto.
> * gcc.dg/graphite/interchange-7.c: Ditto.
> * gcc.dg/graphite/interchange-9.c: Ditto.
> * gcc.dg/graphite/interchange-mvt.c: Ditto.
> * gcc.dg/graphite/uns-interchange-15.c: Ditto.
> * gcc.dg/graphite/uns-interchange-9.c: Ditto.
> * gcc.dg/graphite/uns-interchange-mvt.c: Ditto.
> * gcc.dg/lower-subreg-1.c: Skip nds32.
> * gcc.dg/stack-usage-1.c: Add -fno-omit-frame-pointer for nds32.
> ---
>  gcc/testsuite/gcc.c-torture/execute/20010122-1.c| 1 +
>  gcc/testsuite/gcc.c-torture/execute/pr60822.c   | 1 +
>  gcc/testsuite/gcc.c-torture/execute/pr79286.c   | 1 +
>  gcc/testsuite/gcc.dg/graphite/interchange-15.c  | 1 +
>  gcc/testsuite/gcc.dg/graphite/interchange-3.c   | 1 +
>  gcc/testsuite/gcc.dg/graphite/interchange-7.c   | 1 +
>  gcc/testsuite/gcc.dg/graphite/interchange-9.c   | 1 +
>  gcc/testsuite/gcc.dg/graphite/interchange-mvt.c | 1 +
>  gcc/testsuite/gcc.dg/graphite/uns-interchange-15.c  | 1 +
>  gcc/testsuite/gcc.dg/graphite/uns-interchange-9.c   | 1 +
>  gcc/testsuite/gcc.dg/graphite/uns-interchange-mvt.c | 1 +
>  gcc/testsuite/gcc.dg/lower-subreg-1.c   | 2 +-
>  gcc/testsuite/gcc.dg/stack-usage-1.c| 1 +
>  13 files changed, 13 insertions(+), 1 deletion(-)
>
> diff --git a/gcc/testsuite/gcc.c-torture/execute/20010122-1.c
> b/gcc/testsuite/gcc.c-torture/execute/20010122-1.c
> index 4eeb8c7..6cd02bc 100644
> --- a/gcc/testsuite/gcc.c-torture/execute/20010122-1.c
> +++ b/gcc/testsuite/gcc.c-torture/execute/20010122-1.c
> @@ -1,4 +1,5 @@
>  /* { dg-skip-if "requires frame pointers" { *-*-* }
> "-fomit-frame-pointer" "" } */
> +/* { dg-additional-options "-malways-save-lp" { target nds32*-*-* } } */
>  /* { dg-require-effective-target return_address } */
>
>  extern void exit (int);
> diff --git a/gcc/testsuite/gcc.c-torture/execute/pr60822.c
> b/gcc/testsuite/gcc.c-torture/execute/pr60822.c
> index dcd2447..f658827 100644
> --- a/gcc/testsuite/gcc.c-torture/execute/pr60822.c
> +++ b/gcc/testsuite/gcc.c-torture/execute/pr60822.c
> @@ -1,4 +1,5 @@
>  /* { dg-require-effective-target int32plus } */
> +/* { dg-options "-mcmodel=large" { target nds32*-*-* } } */
>  struct X {
>  char fill0[80];
>  int a;
> diff --git a/gcc/testsuite/gcc.c-torture/execute/pr79286.c
> b/gcc/testsuite/gcc.c-torture/execute/pr79286.c
> index e6d0e93..fef026d 100644
> --- a/gcc/testsuite/gcc.c-torture/execute/pr79286.c
> +++ b/gcc/testsuite/gcc.c-torture/execute/pr79286.c
> @@ -1,3 +1,4 @@
> +/* { dg-options "-mcmodel=large" { target nds32*-*-* } } */
>  int a = 0, c = 0;
>  static int d[][8] = {};
>
> diff --git a/gcc/testsuite/gcc.dg/graphite/interchange-15.c
> b/gcc/testsuite/gcc.dg/graphite/interchange-15.c
> index 7410f29..85d3645 100644
> --- a/gcc/testsuite/gcc.dg/graphite/interchange-15.c
> +++ b/gcc/testsuite/gcc.dg/graphite/interchange-15.c
> @@ -1,4 +1,5 @@
>  /* { dg-require-effective-target size32plus } */
> +/* { dg-options "-mcmodel=large" { target nds32*-*-* } } */
>
>  #define DEBUG 0
>  #if DEBUG
> diff --git a/gcc/testsuite/gcc.dg/graphite/interchange-3.c
> b/gcc/testsuite/gcc.dg/graphite/interchange-3.c
> index 4aec824..7e60385 100644
> --- a/gcc/testsuite/gcc.dg/graphite/interchange-3.c
> +++ b/gcc/testsuite/gcc.dg/graphite/interchange-3.c
> @@ -1,4 +1,5 @@
>  /* { dg-require-effective-target size32plus } */
> +/* { dg-options "-mcmodel=large" { target nds32*-*-* } } */
>
>  /* Formerly known as ltrans-3.c */
>
> diff --git a/gcc/testsuite/gcc.dg/graphite/interchange-7.c
> b/gcc/testsuite/gcc.dg/graphite/interchange-7.c
> index 50f7dd7..5be43a5 100644
> --- a/gcc/testsuite/gcc.dg/graphite/interchange-7.c
> +++ b/gcc/testsuite/gcc.dg/graphite/interchange-7.c
> @@ -1,5 +1,6 @@
>  /* { dg-require-effective-target size32plus } */
>  /* { dg-require-stack-size "8*111*" } */
> +/* { dg-options "-mcmodel=large" { target nds32*-*-* } } */
>
>  /* Formerly known as ltrans-8.c */
>
> diff --git a/gcc/testsuite/gcc.dg/graphite/interchange-9.c
> b/gcc/testsuite/gcc.dg/graphite/interchange-9.c
> index 88a3578..dd3bc

Re: [PATCH 1/7] [NDS32] Provide one valid nds32 assembly instruction to check assembler debugging options and features.

2019-03-31 Thread Shiva Chen
LGTM.

Kito Cheng  於 2019年3月26日 週二 下午1:29寫道:

> From: Chung-Ju Wu 
>
> Chung-Ju Wu  
>
> ChangeLog
> gcc/
> * configure.ac: Add nds32 target for dwarf2 debug_line checking.
> * configure: Regenerated.
> ---
>  gcc/configure| 2 +-
>  gcc/configure.ac | 2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/gcc/configure b/gcc/configure
> index ba9c3dc..3dcf775 100755
> --- a/gcc/configure
> +++ b/gcc/configure
> @@ -27813,7 +27813,7 @@ esac
>  # version to the per-target configury.
>  case "$cpu_type" in
>aarch64 | alpha | arc | arm | avr | bfin | cris | csky | i386 | m32c |
> m68k \
> -  | microblaze | mips | nios2 | pa | riscv | rs6000 | score | sparc | spu
> \
> +  | microblaze | mips | nds32 | nios2 | pa | riscv | rs6000 | score |
> sparc | spu \
>| tilegx | tilepro | visium | xstormy16 | xtensa)
>  insn="nop"
>  ;;
> diff --git a/gcc/configure.ac b/gcc/configure.ac
> index b49670a..51f520c 100644
> --- a/gcc/configure.ac
> +++ b/gcc/configure.ac
> @@ -4941,7 +4941,7 @@ esac
>  # version to the per-target configury.
>  case "$cpu_type" in
>aarch64 | alpha | arc | arm | avr | bfin | cris | csky | i386 | m32c |
> m68k \
> -  | microblaze | mips | nios2 | pa | riscv | rs6000 | score | sparc | spu
> \
> +  | microblaze | mips | nds32 | nios2 | pa | riscv | rs6000 | score |
> sparc | spu \
>| tilegx | tilepro | visium | xstormy16 | xtensa)
>  insn="nop"
>  ;;
> --
> 1.8.3.1
>
>


Re: [PATCH 3/7] [NDS32] Rewrite PIC/TLS patterns.

2019-03-31 Thread Shiva Chen
LGTM.

Kito Cheng  於 2019年3月26日 週二 下午1:29寫道:

> From: Monk Chiang 
>
> Monk Chiang  
> Kito Cheng  
> Shiva Chen  
>
> ChangeLog
> gcc/
> * config/nds32/nds32-md-auxiliary.c (nds32_legitimize_pic_address):
> Use new PIC pattern.
> (nds32_legitimize_tls_address): Use new TLS pattern.
> (nds32_output_symrel): New.
> * config/nds32/nds32-protos.h (nds32_output_symrel): Declare.
> (nds32_alloc_relax_group_id): Ditto.
> * config/nds32/nds32-relax-opt.c (nds32_alloc_relax_group_id):
> New.
> (nds32_group_insns): Use nds32_alloc_relax_group_id instead of
> use relax_group_id.
> (nds32_group_tls_insn): Ditto.
> (nds32_group_float_insns): Ditto.
> * config/nds32/nds32.md (tls_le): New.
> (sym_got): Ditto.
>
> gcc/testsuite/
>
> * gcc.target/nds32/pic.c: New.
> * gcc.target/nds32/tls-le.c: Ditto.
> ---
>  gcc/config/nds32/nds32-md-auxiliary.c   | 43
> ++---
>  gcc/config/nds32/nds32-protos.h |  3 +++
>  gcc/config/nds32/nds32-relax-opt.c  | 19 ---
>  gcc/config/nds32/nds32.md   | 27 +
>  gcc/testsuite/gcc.target/nds32/pic.c| 42
> 
>  gcc/testsuite/gcc.target/nds32/tls-le.c | 17 +
>  6 files changed, 128 insertions(+), 23 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/nds32/pic.c
>  create mode 100644 gcc/testsuite/gcc.target/nds32/tls-le.c
>
> diff --git a/gcc/config/nds32/nds32-md-auxiliary.c
> b/gcc/config/nds32/nds32-md-auxiliary.c
> index 3c510cf..35fcc64 100644
> --- a/gcc/config/nds32/nds32-md-auxiliary.c
> +++ b/gcc/config/nds32/nds32-md-auxiliary.c
> @@ -3493,6 +3493,7 @@ nds32_legitimize_pic_address (rtx x)
>rtx addr = x;
>rtx reg = gen_reg_rtx (Pmode);
>rtx pat;
> +  int relax_group_id = nds32_alloc_relax_group_id ();
>
>if (GET_CODE (x) == LABEL_REF
>|| (GET_CODE (x) == SYMBOL_REF
> @@ -3501,16 +3502,14 @@ nds32_legitimize_pic_address (rtx x)
>  {
>addr = gen_rtx_UNSPEC (SImode, gen_rtvec (1, x), UNSPEC_GOTOFF);
>addr = gen_rtx_CONST (SImode, addr);
> -  emit_insn (gen_sethi (reg, addr));
> -  emit_insn (gen_lo_sum (reg, reg, addr));
> +  emit_insn (gen_sym_got (reg, addr, GEN_INT (relax_group_id)));
>x = gen_rtx_PLUS (Pmode, reg, pic_offset_table_rtx);
>  }
>else if (GET_CODE (x) == SYMBOL_REF)
>  {
>addr = gen_rtx_UNSPEC (SImode, gen_rtvec (1, x), UNSPEC_GOT);
>addr = gen_rtx_CONST (SImode, addr);
> -  emit_insn (gen_sethi (reg, addr));
> -  emit_insn (gen_lo_sum (reg, reg, addr));
> +  emit_insn (gen_sym_got (reg, addr, GEN_INT (relax_group_id)));
>
>x = gen_const_mem (SImode, gen_rtx_PLUS (Pmode,
> pic_offset_table_rtx,
>reg));
> @@ -3534,8 +3533,7 @@ nds32_legitimize_pic_address (rtx x)
>   pat = gen_rtx_UNSPEC (Pmode, gen_rtvec (1, op0), UNSPEC_GOTOFF);
>   pat = gen_rtx_PLUS (Pmode, pat, op1);
>   pat = gen_rtx_CONST (Pmode, pat);
> - emit_insn (gen_sethi (reg, pat));
> - emit_insn (gen_lo_sum (reg, reg, pat));
> + emit_insn (gen_sym_got (reg, pat, GEN_INT (relax_group_id)));
>   x = gen_rtx_PLUS (Pmode, reg, pic_offset_table_rtx);
> }
>else if (GET_CODE (op0) == SYMBOL_REF
> @@ -3544,8 +3542,8 @@ nds32_legitimize_pic_address (rtx x)
>   /* This is a constant offset from a @GOT symbol reference.  */
>   addr = gen_rtx_UNSPEC (SImode, gen_rtvec (1, op0), UNSPEC_GOT);
>   addr = gen_rtx_CONST (SImode, addr);
> - emit_insn (gen_sethi (reg, addr));
> - emit_insn (gen_lo_sum (reg, reg, addr));
> + emit_insn (gen_sym_got (reg, addr, GEN_INT (relax_group_id)));
> +
>   addr = gen_const_mem (SImode, gen_rtx_PLUS (Pmode,
>   pic_offset_table_rtx,
>   reg));
> @@ -3668,6 +3666,7 @@ nds32_legitimize_tls_address (rtx x)
>rtx tmp_reg;
>rtx tp_reg = gen_rtx_REG (Pmode, TP_REGNUM);
>rtx pat, insns, reg0;
> +  int relax_group_id = nds32_alloc_relax_group_id ();
>
>if (GET_CODE (x) == SYMBOL_REF)
>  switch (SYMBOL_REF_TLS_MODEL (x))
> @@ -3685,7 +3684,7 @@ nds32_legitimize_tls_address (rtx x)
> reg0 = gen_rtx_REG (Pmode, 0);
> /* If we can confirm all clobber reigsters, it doesn't have to use
> call
>instruction.  */
> -   insns = emit_call_insn (gen_tls_desc (pat, GEN_INT (0)));
>

Re: [PATCH 4/7] [NDS32] nds32*-linux target using init_array/finit_array for ctor/dtor.

2019-03-31 Thread Shiva Chen
LGTM.

Kito Cheng  於 2019年3月26日 週二 下午1:29寫道:

> From: Monk Chiang 
>
> Monk Chiang  
>
> ChangeLog
>
> * config.gcc (nds32*-*-linux*): Set gcc_cv_initfini_array to yes.
> ---
>  gcc/config.gcc | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/gcc/config.gcc b/gcc/config.gcc
> index 3eb2e80..a6d63dd 100644
> --- a/gcc/config.gcc
> +++ b/gcc/config.gcc
> @@ -2486,6 +2486,7 @@ nds32*-*-*)
>   nds32*-*-linux*)
> tm_file="dbxelf.h elfos.h ${tm_file} gnu-user.h linux.h
> glibc-stdint.h nds32/linux.h nds32/nds32_intrinsic.h"
> tmake_file="${tmake_file} nds32/t-nds32 nds32/t-linux"
> +   gcc_cv_initfini_array=yes
> ;;
> esac
>
> --
> 1.8.3.1
>
>


Re: [PATCH 6/7] [NDS32] Handle subreg correctly in wext_odd_dep_p.

2019-03-31 Thread Shiva Chen
LGTM.

Kito Cheng  於 2019年3月26日 週二 下午1:29寫道:

> From: Kito Cheng 
>
> Kito Cheng  
> Shiva Chen  
>
> ChangeLog:
> gcc/
> * config/nds32/nds32-pipelines-auxiliary.c (wext_odd_dep_p):
> Handle subreg.
>
> gcc/testsuite
> * gcc.target/nds32/wext-dep.c: New.
> ---
>  gcc/config/nds32/nds32-pipelines-auxiliary.c | 16 
>  gcc/testsuite/gcc.target/nds32/wext-dep.c| 11 +++
>  2 files changed, 23 insertions(+), 4 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/nds32/wext-dep.c
>
> diff --git a/gcc/config/nds32/nds32-pipelines-auxiliary.c
> b/gcc/config/nds32/nds32-pipelines-auxiliary.c
> index 0c043d4..07ba037 100644
> --- a/gcc/config/nds32/nds32-pipelines-auxiliary.c
> +++ b/gcc/config/nds32/nds32-pipelines-auxiliary.c
> @@ -363,14 +363,19 @@ wext_odd_dep_p (rtx insn, rtx def_reg)
>  return reg_overlap_p (def_reg, use_reg);
>
>gcc_assert (REG_P (def_reg) || GET_CODE (def_reg) == SUBREG);
> -  gcc_assert (REG_P (use_reg));
> +  gcc_assert (REG_P (use_reg) || GET_CODE (use_reg) == SUBREG);
>
>if (REG_P (def_reg))
>  {
> -  if (!TARGET_BIG_ENDIAN)
> -   return REGNO (def_reg) == REGNO (use_reg) + 1;
> +  if REG_P (use_reg)
> +   {
> + if (!TARGET_BIG_ENDIAN)
> +   return REGNO (def_reg) == REGNO (use_reg) + 1;
> + else
> +   return REGNO (def_reg) == REGNO (use_reg);
> +   }
>else
> -   return  REGNO (def_reg) == REGNO (use_reg);
> +   return true;
>  }
>
>if (GET_CODE (def_reg) == SUBREG)
> @@ -378,6 +383,9 @@ wext_odd_dep_p (rtx insn, rtx def_reg)
>if (!reg_overlap_p (def_reg, use_reg))
> return false;
>
> +  if (GET_CODE (use_reg) == SUBREG)
> +   return true;
> +
>if (!TARGET_BIG_ENDIAN)
> return SUBREG_BYTE (def_reg) == 4;
>else
> diff --git a/gcc/testsuite/gcc.target/nds32/wext-dep.c
> b/gcc/testsuite/gcc.target/nds32/wext-dep.c
> new file mode 100644
> index 000..2af04d0
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/nds32/wext-dep.c
> @@ -0,0 +1,11 @@
> +/* { dg-do compile } */
> +/* { dg-options "-mext-dsp -mcpu=n15" { target nds32*-*-* } } */
> +
> +/* The test case check that wext_odd_dep_p in nds32-pipelines-auxiliary.c
> +   could handle the use_reg is SUBREG. */
> +typedef long long T;
> +typedef T vl_t __attribute__((vector_size(2 * sizeof (T))));
> +
> +void foo(vl_t *a, vl_t *b, int c) {
> +  *a = *b >> c;
> +}
> --
> 1.8.3.1
>
>


Re: [PATCH 5/7] [NDS32] Refine force unwind, linux kernel only using RT_SIGRETURN.

2019-03-31 Thread Shiva Chen
LGTM.

Kito Cheng  於 2019年3月26日 週二 下午1:29寫道:

> From: Monk Chiang 
>
> Monk Chiang  
>
> ChangeLog
> libgcc/
> * config/nds32/linux-unwind.h (SIGRETURN): Remove.
> (RT_SIGRETURN): Update.
> (nds32_fallback_frame_state): Update.
> ---
>  libgcc/config/nds32/linux-unwind.h | 9 +++--
>  1 file changed, 3 insertions(+), 6 deletions(-)
>
> diff --git a/libgcc/config/nds32/linux-unwind.h
> b/libgcc/config/nds32/linux-unwind.h
> index 0c4df4dd..adb6f9f 100644
> --- a/libgcc/config/nds32/linux-unwind.h
> +++ b/libgcc/config/nds32/linux-unwind.h
> @@ -42,8 +42,7 @@ struct _rt_sigframe {
>struct ucontext_t uc;
>  };
>
> -#define SIGRETURN 0xeb0e0a64
> -#define RT_SIGRETURN 0xab150a64
> +#define RT_SIGRETURN 0x8b00f044
>
>  #define MD_FALLBACK_FRAME_STATE_FOR nds32_fallback_frame_state
>
> @@ -74,16 +73,14 @@ nds32_fallback_frame_state (struct _Unwind_Context
> *context,
>
>/* Check if we are going through a signal handler.
>   See arch/nds32/kernel/signal.c implementation.
> -   SWI_SYS_SIGRETURN-> (0xeb0e0a64)
> -   SWI_SYS_RT_SIGRETURN -> (0xab150a64)
>   FIXME: Currently we only handle little endian (EL) case.  */
> -  if (pc[0] == SIGRETURN || pc[0] == RT_SIGRETURN)
> +  if (pc[0] == RT_SIGRETURN)
>  {
>/* Using '_sigfame' memory address to locate kernal's sigcontext.
>  The sigcontext structures in
> arch/nds32/include/asm/sigcontext.h.  */
>struct _rt_sigframe *rt_;
>rt_ = context->cfa;
> -  sc_ = &rt_->sig.uc.uc_mcontext;
> +  sc_ = &rt_->uc.uc_mcontext;
>  }
>else
>  return _URC_END_OF_STACK;
> --
> 1.8.3.1
>
>


Re: [PATCH 7/7] [NDS32] Fix nds32_split_ashiftdi3 with large shift amount

2019-03-31 Thread Shiva Chen
LGTM.

Kito Cheng  於 2019年3月26日 週二 下午1:29寫道:

> From: Kito Cheng 
>
> Kito Cheng  
> Shiva Chen  
>
> ChangeLog:
> gcc/
> * config/nds32/nds32-md-auxiliary.c (nds32_split_ashiftdi3):
> Fix wrong code gen with large shift amount.
>
> gcc/testsuite/
> * gcc.target/nds32/ashiftdi3.c: New.
> ---
>  gcc/config/nds32/nds32-md-auxiliary.c  | 21 ++---
>  gcc/testsuite/gcc.target/nds32/ashiftdi3.c | 23 +++
>  2 files changed, 37 insertions(+), 7 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/nds32/ashiftdi3.c
>
> diff --git a/gcc/config/nds32/nds32-md-auxiliary.c
> b/gcc/config/nds32/nds32-md-auxiliary.c
> index 35fcc64..eadd841 100644
> --- a/gcc/config/nds32/nds32-md-auxiliary.c
> +++ b/gcc/config/nds32/nds32-md-auxiliary.c
> @@ -3304,15 +3304,22 @@ nds32_split_ashiftdi3 (rtx dst, rtx src, rtx
> shiftamount)
>ext_start = gen_reg_rtx (SImode);
>
>/*
> -if (shiftamount < 32)
> +# In fact, we want to check shift amonut is great than or equal
> 32,
> +# but in some corner case, the shift amount might be very large
> value,
> +# however we've defined SHIFT_COUNT_TRUNCATED, so GCC think we've
> +# handle that correctly without any truncate.
> +# so check the the condition of (shiftamount & 32) is most
> +# safe way to do.
> +if (shiftamount & 32)
> +  dst_low_part = 0
> +  dst_high_part = src_low_part << shiftamount & 0x1f
> +else
>dst_low_part = src_low_part << shiftamout
>dst_high_part = wext (src, 32 - shiftamount)
># wext can't handle wext (src, 32) since it's only take rb[0:4]
># for extract.
>dst_high_part = shiftamount == 0 ? src_high_part : dst_high_part
> -else
> -  dst_low_part = 0
> -  dst_high_part = src_low_part << shiftamount & 0x1f
> +
>*/
>
>emit_insn (gen_subsi3 (ext_start,
> @@ -3331,11 +3338,11 @@ nds32_split_ashiftdi3 (rtx dst, rtx src, rtx
> shiftamount)
>emit_insn (gen_ashlsi3 (dst_high_part_g32, src_low_part,
>  new_shift_amout));
>
> -  emit_insn (gen_slt_compare (select_reg, shiftamount, GEN_INT (32)));
> +  emit_insn (gen_andsi3 (select_reg, shiftamount, GEN_INT (32)));
>
> -  emit_insn (gen_cmovnsi (dst_low_part, select_reg,
> +  emit_insn (gen_cmovzsi (dst_low_part, select_reg,
>   dst_low_part_l32, dst_low_part_g32));
> -  emit_insn (gen_cmovnsi (dst_high_part, select_reg,
> +  emit_insn (gen_cmovzsi (dst_high_part, select_reg,
>   dst_high_part_l32, dst_high_part_g32));
>  }
>  }
> diff --git a/gcc/testsuite/gcc.target/nds32/ashiftdi3.c
> b/gcc/testsuite/gcc.target/nds32/ashiftdi3.c
> new file mode 100644
> index 000..e3faff5
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/nds32/ashiftdi3.c
> @@ -0,0 +1,23 @@
> +/* { dg-options "-mext-dsp" { target nds32 } } */
> +
> +#include
> +extern void abort (void);
> +
> +unsigned long long int ull;
> +
> +unsigned long long int __attribute__ ((noinline))
> +foo (unsigned long long int a, unsigned long long int b)
> +{
> +  return a << b;
> +}
> +
> +int
> +main (void)
> +{
> +  unsigned long long shiftamt = 0x45806aca;
> +  ull = foo(0xf, shiftamt);
> +  if (ull != 70368744176640)
> +abort ();
> +
> +  return 0;
> +}
> --
> 1.8.3.1
>
>


RE: [PATCH 1/7] SMS remove dependence on doloop: Use loop induction variable analysis in SMS pass

2016-07-12 Thread Shiva Chen
Hi, Jeff

Thanks for the tips.
I updated the patches so that they pass regression testing and an x86
bootstrap with SMS enabled by default.

Shiva

-Original Message-
From: Jeff Law [mailto:l...@redhat.com] 
Sent: Thursday, June 23, 2016 12:43 AM
To: Shiva Chen; GCC Patches; bschm...@redhat.com; Shiva Chen
Subject: Re: [PATCH 1/7] SMS remove dependence on doloop: Use loop induction 
variable analysis in SMS pass

On 05/05/2016 12:01 AM, Shiva Chen wrote:
> Hi,
>
> SMS transformation would change the kernel loop iteration count.
> To do this, SMS pass will find the register contain loop count and 
> generate the instructions to adjust loop count.
>
> Currently, SMS will try to find count_reg by recognizing doloop_end 
> pattern which means if the target didn't define doloop_end pattern or 
> the loop not suitable for doloop optimization, SMS pass won't active.
>
> The patch use induction variable analysis to find count_reg instead of 
> find doloop_end pattern.
So these patches need to be bootstrapped and regression tested.  Since SMS is 
not the default on any major platforms, I would recommend first hacking SMS to 
be enabled by default.  That isn't a patch you're going to submit, but instead 
it allows you to do bootstrap and regression testing on x86_64, ppc64 or 
whatever desktop/server machines you have access to.

I did that with patch #1 just to see what would happen and I get an assert 
triggered in generate_prolog_epilog:

   gcc_assert (REG_P (sub_reg));

Where sub_reg and count_reg are:

(subreg:SI (reg:DI 146 [ ivtmp.11 ]) 0)


AFAICT (reg:DI 146) is the actual IV, but the test actually occurs in
SImode:


(insn 80 79 82 7 (parallel [
 (set (reg:DI 146 [ ivtmp.11 ])
 (plus:DI (reg:DI 146 [ ivtmp.11 ])
 (const_int -1 [0x])))
 (clobber (reg:CC 17 flags))
 ]) 212 {*adddi_1}
  (expr_list:REG_UNUSED (reg:CC 17 flags)
 (nil)))

(insn 82 80 83 7 (set (reg:CCGOC 17 flags)
 (compare:CCGOC (subreg:SI (reg:DI 146 [ ivtmp.11 ]) 0)
 (const_int 0 [0]))) 
../../../../gcc/libstdc++-v3/libsupc++/hash_bytes.cc:54 3 {*cmpsi_ccno_1}
  (nil))

(jump_insn 83 82 84 7 (set (pc)
 (if_then_else (ge (reg:CCGOC 17 flags)
 (const_int 0 [0]))
 (label_ref:DI 81)
 (pc))) 
../../../../gcc/libstdc++-v3/libsupc++/hash_bytes.cc:54 635 {*jcc_1}
  (expr_list:REG_DEAD (reg:CCGOC 17 flags)
 (int_list:REG_BR_PROB 8500 (nil)))
  -> 81)


So you either need to be filtering out cases where the IV is in a 
different mode than its test or handling SUBREGs better throughout the code.

I haven't looked deeply at any of the patches yet.  I won't until 
they're in better shape from a correctness standpoint.

Jeff



0001-Use-loop-induction-variable-analysis-in-SMS-pass.patch
Description: 0001-Use-loop-induction-variable-analysis-in-SMS-pass.patch


0002-Use-get_simple_loop_desc-to-get-loop-iteration-count.patch
Description: 0002-Use-get_simple_loop_desc-to-get-loop-iteration-count.patch


0003-Update-loop-versioning-for-the-loop-with-step-1.patch
Description: 0003-Update-loop-versioning-for-the-loop-with-step-1.patch


0004-update-kernel-loop-iteration-adjustment-instruction-.patch
Description: 0004-update-kernel-loop-iteration-adjustment-instruction-.patch


0005-update-generate_prolog_epilog-for-the-case-without-d.patch
Description: 0005-update-generate_prolog_epilog-for-the-case-without-d.patch


0006-skip-the-loop-if-the-loop-size-too-small.patch
Description: 0006-skip-the-loop-if-the-loop-size-too-small.patch


0007-To-identify-READ_WRITE_REG-as-loop-induction-variabl.patch
Description: 0007-To-identify-READ_WRITE_REG-as-loop-induction-variabl.patch


0008-Fix-True-dependence-by-CC-register-wrap-around-and-c.patch
Description: 0008-Fix-True-dependence-by-CC-register-wrap-around-and-c.patch


[PATCH][SMS] SMS use loop induction variable analysis instead of depending on doloop optimization

2016-04-27 Thread Shiva Chen
Hi, 

According to Richard's suggestion in
https://gcc.gnu.org/ml/gcc-patches/2015-12/msg01240.html
I try to remove the SMS dependency on the doloop pass.

SMS needs to adjust the kernel loop iteration count
during the transformation.

To adjust the loop iteration count, SMS needs to find
count_reg, the register which contains the loop iteration count,
and then generate the adjustment instructions.

Currently, SMS finds the doloop_end pattern to get
count_reg, and generates the adjustment instructions according
to the doloop optimization result (the loop is transformed to
count down to zero with step = 1).

If SMS can't find a doloop_end pattern, or the loop is not in
the form produced by doloop optimization, it skips the loop.

Doloop optimization can be beneficial for some targets
even if the target doesn't support a special loop instruction.

E.g. for arm, doloop optimization can turn the loop-control
 instructions into subs and branch, which saves the
 comparison instruction.

However, if the loop iteration count computation introduced by
doloop optimization is too complicated, it hurts performance.
(PARAM_MAX_ITERATIONS_COMPUTATION_COST defaults to 10,
which may be too high for targets without a special loop instruction.)

Such loops are not suitable for doloop optimization, so SMS
cannot be activated for them.

To remove the SMS dependency on doloop optimization,
I try to use loop induction variable analysis to find count_reg
and to generate the kernel loop adjustment instructions for loop
forms not produced by doloop optimization (increment/decrement
loops with step != 1).

Without doloop optimization, the induction variable could be a
POST_INC/POST_DEC/PRE_INC/PRE_DEC inside a memory reference, which
the current implementation won't identify as a loop iv.  So I modify
the relevant code in loop-iv.c to identify this case.

With the patch, a backend can activate SMS without defining a
doloop_end pattern.

Could anyone help me to review the patch?
Any suggestion would be very helpful.

Thanks,
Shiva



0001-SMS-use-loop-induction-variable-analysis-instead-of-.patch
Description: 0001-SMS-use-loop-induction-variable-analysis-instead-of-.patch


RE: [PATCH][SMS] SMS use loop induction variable analysis instead of depending on doloop optimization

2016-04-28 Thread Shiva Chen
Hi, 

I fixed some bugs so the patch passes testing on x86-64, and updated it
as 0001-SMS-use-loop-induction-variable-analysis-v1.patch.

Thanks,
Shiva

-Original Message-
From: Shiva Chen 
Sent: Thursday, April 28, 2016 2:07 PM
To: GCC Patches ; Shiva Chen 
Subject: [PATCH][SMS] SMS use loop induction variable analysis instead of 
depending on doloop optimization

Hi, 

According to Richard's suggestion in
https://gcc.gnu.org/ml/gcc-patches/2015-12/msg01240.html
I try to remove the SMS dependency on the doloop pass.

SMS needs to adjust the kernel loop iteration count during the transformation.

To adjust the loop iteration count, SMS needs to find count_reg, the register
which contains the loop iteration count, and then generate the adjustment
instructions.

Currently, SMS finds the doloop_end pattern to get count_reg, and generates
the adjustment instructions according to the doloop optimization result
(the loop is transformed to count down to zero with step = 1).

If SMS can't find a doloop_end pattern, or the loop is not in the form
produced by doloop optimization, it skips the loop.

Doloop optimization can be beneficial for some targets even if the target
doesn't support a special loop instruction.

E.g. for arm, doloop optimization can turn the loop-control
 instructions into subs and branch, which saves the
 comparison instruction.

However, if the loop iteration count computation introduced by doloop
optimization is too complicated, it hurts performance.
(PARAM_MAX_ITERATIONS_COMPUTATION_COST defaults to 10, which may be too high
for targets without a special loop instruction.)

Such loops are not suitable for doloop optimization, so SMS cannot be
activated for them.

To remove the SMS dependency on doloop optimization, I try to use loop
induction variable analysis to find count_reg and to generate the kernel
loop adjustment instructions for loop forms not produced by doloop
optimization (increment/decrement loops with step != 1).

Without doloop optimization, the induction variable could be a
POST_INC/POST_DEC/PRE_INC/PRE_DEC inside a memory reference, which the
current implementation won't identify as a loop iv.  So I modify the
relevant code in loop-iv.c to identify this case.

With the patch, a backend can activate SMS without defining a doloop_end
pattern.

Could anyone help me to review the patch?
Any suggestion would be very helpful.

Thanks,
Shiva



0001-SMS-use-loop-induction-variable-analysis-v1.patch
Description: 0001-SMS-use-loop-induction-variable-analysis-v1.patch


[PATCH 1/7] SMS remove dependence on doloop: Use loop induction variable analysis in SMS pass

2016-05-04 Thread Shiva Chen
Hi,

The SMS transformation changes the kernel loop iteration count.
To do this, the SMS pass finds the register containing the loop count
and generates the instructions to adjust it.

Currently, SMS tries to find count_reg by recognizing the doloop_end
pattern, which means that if the target doesn't define a doloop_end
pattern or the loop is not suitable for doloop optimization, the SMS
pass stays inactive.

The patch uses induction variable analysis to find count_reg instead of
looking for the doloop_end pattern.

Thanks,
Shiva


0001-Use-loop-induction-variable-analysis-in-SMS-pass.patch
Description: 0001-Use-loop-induction-variable-analysis-in-SMS-pass.patch


[PATCH 3/7] SMS remove dependence on doloop: Update loop-versioning for the loop with step != 1

2016-05-04 Thread Shiva Chen
Hi,

To prevent loops with loop_count < stage_count from executing the SMS
version of the loop, SMS tries to get the loop iteration count.
If the iteration count can't be determined at compile time, SMS does
loop versioning, which generates the original loop and the SMS version
of the loop guarded by a runtime check.

With doloop optimization, the loop is transformed to count down
to 0 with step 1.  Therefore, the loop count is held in
count_reg at the loop entry.

E.g. (for i = 23; i > 3; i = i - 5) => (for i = 4; i > 0; i--)

The loop versioning condition can then be generated as

if (count_reg > stage_count)
  execute sms version loop

To remove the dependency on doloop optimization, we need to generate
the loop versioning condition without the doloop pass.
Without the doloop pass, the loop can be an increment/decrement loop
with step != 1.

In that case the loop versioning condition is generated as

if (count_reg - loop_end_cond > stage_count * step) for a decrement loop
  execute sms version loop

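As a hand-written C illustration of the versioned code for a decrement
loop with step 5 (STAGE_COUNT stands for the stage count computed by
SMS; this is only a sketch, not the RTL the pass would emit):

  #define STAGE_COUNT 3   /* stand-in for the SMS stage count */

  void
  foo (int *a, int start, int end)
  {
    int i;
    if (start - end > STAGE_COUNT * 5)
      {
        /* Would run the SMS (modulo-scheduled) version of the loop.  */
        for (i = start; i > end; i -= 5)
          a[i] = 0;
      }
    else
      {
        /* Original loop.  */
        for (i = start; i > end; i -= 5)
          a[i] = 0;
      }
  }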

Thanks,
Shiva


0003-Update-loop-versioning-for-the-loop-with-step-1.patch
Description: 0003-Update-loop-versioning-for-the-loop-with-step-1.patch


[PATCH 2/7] SMS remove dependence on doloop: Use get_simple_loop_desc to get loop iteration count

2016-05-04 Thread Shiva Chen
Hi,

The loop after the SMS transformation executes at least stage_count
times (due to the duplication in the prolog/epilog).
To prevent loops with loop_count < stage_count from executing the SMS
version of the loop, SMS tries to get the loop iteration count.

Currently, SMS gets the iteration count by finding the instruction
which sets up the iteration count; that instruction is generated by
doloop optimization.

To remove the dependency on the doloop pass, we generalize the method
by using the get_simple_loop_desc function to get the loop iteration count.

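For illustration, a rough sketch of how the descriptor might be queried
(not the patch itself; loop_count and count_init_expr are made-up names,
while get_simple_loop_desc and struct niter_desc come from cfgloop.h):

  struct niter_desc *desc = get_simple_loop_desc (loop);

  if (!desc->simple_p)
    return false;                       /* Skip SMS for this loop.  */
  if (desc->const_iter)
    loop_count = desc->niter;           /* Count known at compile time.  */
  else
    count_init_expr = desc->niter_expr; /* Runtime expression, used for
                                           loop versioning.  */
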
Thanks,
Shiva


0002-Use-get_simple_loop_desc-to-get-loop-iteration-count.patch
Description: 0002-Use-get_simple_loop_desc-to-get-loop-iteration-count.patch


[PATCH 5/7] SMS remove dependence on doloop: update generate_prolog_epilog for the case without doloop pass

2016-05-04 Thread Shiva Chen
Hi,

The SMS transformation generates a prolog and epilog, which duplicate
part of the instructions in the loop.

Doloop optimization generates a new register (count_reg) to hold the
iteration count.  Therefore, there are no count_reg references in the
loop body (other than the increment/decrement and comparison
instructions), and when generating the prolog/epilog for SMS there is
no need to duplicate the count_reg increment/decrement instruction.

Without the doloop pass, the loop body might reference count_reg.
In this case, we have to duplicate the increment/decrement instructions
so that the references in the prolog/epilog see the correct value.

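For example (hand-written, not taken from the patch), in the loop below
the counter i is also used as an array index, so the prolog/epilog
copies of the body must include the i-- update for those references to
see the right value:

  void
  scale (int *a, int n)
  {
    int i;
    for (i = n; i > 0; i--)
      a[i - 1] = a[i - 1] * 3;
  }
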
Thanks,
Shiva


0005-update-generate_prolog_epilog-for-the-case-without-d.patch
Description: 0005-update-generate_prolog_epilog-for-the-case-without-d.patch


[PATCH 6/7] SMS remove dependence on doloop: skip the loop if the loop size too small

2016-05-04 Thread Shiva Chen
Hi,

SMS loop versioning generates extra condition-checking code.
If SMS cannot find enough overlap in the loop,
this overhead may hurt performance.

Add the parameter PARAM_SMS_LOOP_MIN_SIZE, with a default
value of 8, to skip small loops which are unlikely to provide
enough overlapping instructions.

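For reference, a params.def entry along these lines is what I mean (the
option name and help text here are my guesses, not necessarily what the
attached patch uses):

  DEFPARAM (PARAM_SMS_LOOP_MIN_SIZE,
            "sms-loop-min-size",
            "The minimum number of insns in a loop for SMS to consider it.",
            8, 0, 0)
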
Thanks, 
Shiva


0006-skip-the-loop-if-the-loop-size-too-small.patch
Description: 0006-skip-the-loop-if-the-loop-size-too-small.patch


[PATCH 7/7] SMS remove dependence on doloop: To identify read/write register as loop induction variable

2016-05-04 Thread Shiva Chen
Hi,

We use loop induction variable analysis to find count_reg.
Without doloop optimization, count_reg might be a READ_WRITE_REG.

E.g.  while (reg < 0x200) { MEM [++reg:SI] = 5}

A READ_WRITE_REG is not identified as a loop induction variable,
so SMS skips the loop.  We modify loop-iv.c to identify this case.

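For example, a C loop like the following (an illustration, not a test
case from the patch) typically produces such a pre-increment memory
reference on targets with auto-inc addressing, so the pointer register
becomes a READ_WRITE def:

  void
  fill (int *reg, int *end)
  {
    while (reg < end)
      *++reg = 5;
  }
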
Thanks,
Shiva


0007-To-identify-READ_WRITE_REG-as-loop-induction-variabl.patch
Description: 0007-To-identify-READ_WRITE_REG-as-loop-induction-variabl.patch


[PATCH 4/7] SMS remove dependence on doloop: update kernel loop iteration adjustment instruction generation

2016-05-04 Thread Shiva Chen
Hi,

The SMS transformation changes the kernel loop iteration count
to new_loop_count = ori_loop_count - stage_count.

With doloop optimization, the loop count is held in count_reg
at the loop entry.

The loop count adjustment instruction can be generated as

count_reg = count_reg - stage_count

before entering the loop,

or, if the loop count is known at compile time, the loop count
initialization instruction can be modified to

count_reg = new_loop_count_constant

To remove the dependency on the doloop pass, we need to generate
the loop adjustment instructions without the doloop pass.

Without the doloop pass, the loop body might contain instructions
that reference count_reg.  In that case we can't modify the loop
count's initial value to adjust the iteration count.

If count_reg is referenced in the loop, we modify the loop
end condition instead to adjust the iteration count.

The function loop_iteration_adjust_by_replace in the patch tries to
replace a constant end condition in the comparison instruction.

The function gen_loop_iteration_adjust in the patch generates a
loop end condition adjustment instruction if the loop end condition
is held in a register.

If there is no count_reg reference in the loop, we modify the
loop count's initial value as before.

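A hand-written C illustration of the two strategies (STAGE_COUNT stands
for the stage count chosen by SMS; both functions are only sketches, not
the patch's generated code):

  #define STAGE_COUNT 3   /* stand-in for the SMS stage count */

  /* 1) The counter is not referenced in the body: adjust its initial
        value once, before the loop.  */
  void
  adjust_init (int *a, int n)
  {
    int count = n - STAGE_COUNT;
    while (count-- > 0)
      *a++ = 0;
  }

  /* 2) The counter is referenced in the body (as an index), so its
        initial value must stay intact; raise the end condition
        instead.  */
  void
  adjust_end_cond (int *a, int n)
  {
    int i;
    for (i = n; i > STAGE_COUNT; i--)
      a[i - 1] = 0;
  }
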
Thanks, 
Shiva


0004-update-kernel-loop-iteration-adjustment-instruction-.patch
Description: 0004-update-kernel-loop-iteration-adjustment-instruction-.patch


[PATCH 0/7] SMS remove dependence on doloop

2016-05-04 Thread Shiva Chen
Hi,

The patch series tries to remove the SMS dependency on the doloop pass.
With the patches, a backend doesn't have to define a doloop_end pattern
or activate doloop optimization for the SMS pass.

According to the review comments from Bernd at
https://gcc.gnu.org/ml/gcc-patches/2016-04/msg01960.html

I have rewritten some comments and split the patch into the following parts.

[1/7] https://gcc.gnu.org/ml/gcc-patches/2016-05/msg00370.html
[2/7] https://gcc.gnu.org/ml/gcc-patches/2016-05/msg00372.html
[3/7] https://gcc.gnu.org/ml/gcc-patches/2016-05/msg00371.html
[4/7] https://gcc.gnu.org/ml/gcc-patches/2016-05/msg00374.html
[5/7] https://gcc.gnu.org/ml/gcc-patches/2016-05/msg00373.html
[6/7] https://gcc.gnu.org/ml/gcc-patches/2016-05/msg00376.html
[7/7] https://gcc.gnu.org/ml/gcc-patches/2016-05/msg00375.html

Thanks,
Shiva



Re: [PATCH][SMS] SMS use loop induction variable analysis instead of depending on doloop optimization

2016-05-04 Thread Shiva Chen
Hi, Bernd

Thanks for the review.

>
> You might want to split it up if there are several logically independent 
> pieces. I can't quite make sense of it all, and I'm not too familiar with SMS 
> anyway, so the following is not a complete review, just a selection of issues 
> I observed.

OK, I have split the patch into several pieces; see the following link.
https://gcc.gnu.org/ml/gcc-patches/2016-05/msg00377.html

> There are a large number of formatting and style problems. I'll be pointing 
> out some instances, but please read
>http://www.gnu.org/prep/standards/standards.html#Writing-C
> and self-review your patch before resubmission.

I have rewritten most of the comments.
Please correct me if I'm still missing something.

>> -  /* We do not handle setting only part of the register.  */
>> -  if (DF_REF_FLAGS (adef) & DF_REF_READ_WRITE)
>> -return GRD_INVALID;
>> -
>
>
> Why this change?

The patch uses loop induction variable analysis to find the register
containing the loop count.
If that register is a read/write register, it would not be identified
as a loop iv and SMS would skip the loop.
I removed the condition above and added other code in loop-iv.c to
identify this case.

>>
>>   }
>>
>> +static rtx
>> +get_rhs (rtx_insn *insn, rtx reg)
>
>
> get_rhs might not be the most meaningful function name. We require 
> documentation before every function that says what it does, and what the 
> arguments mean. Please examine the surrounding code for examples.

OK, I have renamed get_rhs to get_reg_calculate_expr and added
documentation before the function.
>>
>>
>
> This all looks a little odd; if you're looking for autoincs, why not just 
> scan the entire INSN for a MEM, rather than classify it as mem_write or 
> mem_read_insn?

Yes, you're right.  I have removed mem_read/write_insn_p and now scan
the INSN directly.
>
>
>
> Bernd

Please see the new submission at the following link.
https://gcc.gnu.org/ml/gcc-patches/2016-05/msg00375.html

Thanks,
Shiva


Re: [PATCH 7/7] SMS remove dependence on doloop: To identify read/write register as loop induction variable

2016-05-18 Thread Shiva Chen
Hi, Bernd

2016-05-13 20:33 GMT+08:00 Bernd Schmidt :
> On 05/05/2016 08:03 AM, Shiva Chen wrote:
>>
>>
>> -  /* We do not handle setting only part of the register.  */
>> -  if (DF_REF_FLAGS (adef) & DF_REF_READ_WRITE)
>> -return GRD_INVALID;
>> -
> This isn't right, at least not without other changes. This prevents using
> references where the register is set as part of a ZERO_EXTRACT or
> STRICT_LOW_PART, and you seem not to handle such cases in the new code.
>
> I think this interface is pretty bad. I think what you should do is check in
> the caller whether the def is of READ_WRITE type. If so, do a
> FOR_EACH_SUBRTX thing to look for a memory reference containing an autoinc
> expression, otherwise this new function (without the MEM parts) could be
> used. That'll also solve the problem above with the loss of
> DF_REF_READ_WRITE checking.
>
> The walk to find the appropriate MEM would be in a separate function, used
> in the several places that

Yes, the interface is bad.
I split the function into read_write_reg_mem and get_reg_calculate_expr.
The function read_write_reg_mem tries to find an autoinc in a memory
reference via FOR_EACH_SUBRTX when the def is of READ_WRITE type, which
restores the check that was lost, as you suggested.

>> +  /* Find REG increment/decrement expr in following pattern
>> +
>> + (parallel
>> +   (CC = compare (REG - 1, 0))
>> +   (REG = REG - 1))
>> +   */
>
>
> Comment formatting, "*/" doesn't go on its own line. I think it's best not
> to quote a pattern here since you're not actually looking for it.  A better
> comment would be "For identifying ivs, we can actually ignore any other SETs
> occurring in parallel, so look for one with the correct SET_DEST."  I'm not
> actually sure, however, whether this is valid. It would depend on how this
> information is used later on, and what transformation other passes would
> want to do on the ivs.  Come to think of it, this is also true for autoinc
> ivs, but I guess these would only appear when run late as analysis for
> modulo-sched, so I guess that would be OK?
Thanks for correcting the comment.

>> @@ -2354,6 +2489,20 @@ iv_number_of_iterations (struct loop *loop,
>> rtx_insn *insn, rtx condition,
>>   goto fail;
>>
>> op0 = XEXP (condition, 0);
>> +
>> +  /* We would use loop induction variable analysis
>> + to get loop iteration count in SMS pass
>> + which should work with/without doloop pass
>> + active or not.
>> +
>> + With doloop pass enabled, doloop pass would
>> + generate pattern as (ne (REG - 1), 0) and
>> + we recognize it by following code.  */
>> +  if (GET_CODE (op0) == PLUS
>> +  && CONST_INT_P (XEXP (op0, 1))
>> +  && REG_P (XEXP (op0, 0)))
>> +op0 = XEXP (op0, 0);
>
>
> That really looks somewhat suspicious, and I can't really tell whether it's
> right. My instinct says no; how can you just drop a constant on the floor?

I think your instinct is right.
op0 and op1, split from the loop end condition, are passed to iv_analyze
to get iv0 and iv1.
The following code uses iv0 and iv1 to estimate the loop iteration count.
I think the correct thing to do is to transform (ne (REG - 1), 0) into
(ne REG, 1), which means setting op0 = REG, op1 = 1.
Then iv1.base can be set correctly by iv_analyze.

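Something like this untested sketch (at the same spot as the quoted
hunk above) is what I have in mind:

  if (GET_CODE (op0) == PLUS
      && CONST_INT_P (XEXP (op0, 1))
      && REG_P (XEXP (op0, 0))
      && CONST_INT_P (op1))
    {
      /* Move the constant to the other side instead of dropping it:
         (ne (plus REG -1) 0) becomes (ne REG 1).  */
      op1 = GEN_INT (INTVAL (op1) - INTVAL (XEXP (op0, 1)));
      op0 = XEXP (op0, 0);
    }
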
Thanks for your comments; they're truly helpful.
Please correct me if I'm still missing something.

Shiva


0007-To-identify-READ_WRITE_REG-as-loop-induction-variable-v1.patch
Description: Binary data


[libatomic PATCH] [PING] Fix libatomic behavior for big endian toolchain

2015-01-07 Thread Shiva Chen
PING

https://gcc.gnu.org/ml/gcc-patches/2014-10/msg01763.html

2014-10-17 Shiva Chen 

Fix libatomic behavior for big-endian toolchains.
* cas_n.c: Fix shift amount for big-endian toolchains.
* config/arm/exch_n.c: Likewise.
* exch_n.c: Likewise.
* fop_n.c: Likewise.


Shiva


Re: LRA assign same hard register with live range overlapped pseduos

2013-04-22 Thread Shiva Chen
Hi, Vladimir

I added the code, and the patch has passed bootstrap/regression testing
on i686-pc-linux-gnu with the current main trunk.

However, I don't have svn write access yet.
Could you please help to commit this patch?

Thanks for your comment and your help. I really appreciate it.


A plain-text gcc/ChangeLog entry is below:

2013-04-23  Shiva Chen  

* lra-assigns.c (find_hard_regno_for): Use lra_reg_val_equal_p
to check the register content is equal or not.
* lra-constraints.c (match_reload): Use lra_assign_reg_val
to assign register content record.
* lra-eliminations.c (update_reg_eliminate): Use lra_set_up_reg_val
to update register content offset.
* lra-int.h (struct lra_reg): Add offset member.
(lra_reg_val_equal_p): New static inline function.
(lra_set_up_reg_val): New static inline function.
(lra_assign_reg_val): New static inline function.
* lra.c (lra_create_new_reg): Use lra_assign_reg_val
to assign register content record.
(initialize_lra_reg_info_element): Initialize offset to zero.

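For reference, the new lra-int.h predicate presumably looks like the
following (a sketch based on the ChangeLog and on how it is called in
the diff below, not copied from the actual patch):

  static inline bool
  lra_reg_val_equal_p (int regno, int val, int offset)
  {
    return (lra_reg_info[regno].val == val
            && lra_reg_info[regno].offset == offset);
  }
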
2013/4/23 Vladimir Makarov :
> On 13-04-22 2:26 AM, Shiva Chen wrote:
>>
>> Hi, Vladimir
>>
>> I write the new patch as your suggestion.
>> Could you help me to check is there something missing ?
>
> I think there is one more place to use lra_assign_reg_val:
>
> lra.c::lra_create_new_reg
>
> Please add the code and right changelog entry for the patch and you can
> commit the patch into trunk.
>
> Thanks, Shiva.
>
>> Thanks, Shiva
>>
>>   gcc/lra-assigns.c  |   12 +++-
>>   gcc/lra-constraints.c  |5 ++---
>>   gcc/lra-eliminations.c |   10 --
>>   gcc/lra-int.h  |   33 +
>>   gcc/lra.c  |1 +
>>   5 files changed, 51 insertions(+), 10 deletions(-)
>>
>> diff --git a/gcc/lra-assigns.c b/gcc/lra-assigns.c
>> index b204513..3f8a899 100644
>> --- a/gcc/lra-assigns.c
>> +++ b/gcc/lra-assigns.c
>> @@ -448,7 +448,7 @@ find_hard_regno_for (int regno, int *cost, int
>> try_only_hard_regno)
>> int hr, conflict_hr, nregs;
>> enum machine_mode biggest_mode;
>> unsigned int k, conflict_regno;
>> -  int val, biggest_nregs, nregs_diff;
>> +  int offset, val, biggest_nregs, nregs_diff;
>> enum reg_class rclass;
>> bitmap_iterator bi;
>> bool *rclass_intersect_p;
>> @@ -508,9 +508,10 @@ find_hard_regno_for (int regno, int *cost, int
>> try_only_hard_regno)
>>   #endif
>> sparseset_clear_bit (conflict_reload_and_inheritance_pseudos, regno);
>> val = lra_reg_info[regno].val;
>> +  offset = lra_reg_info[regno].offset;
>> CLEAR_HARD_REG_SET (impossible_start_hard_regs);
>> EXECUTE_IF_SET_IN_SPARSESET (live_range_hard_reg_pseudos,
>> conflict_regno)
>> -if (val == lra_reg_info[conflict_regno].val)
>> +if (lra_reg_val_equal_p (conflict_regno, val, offset))
>> {
>>  conflict_hr = live_pseudos_reg_renumber[conflict_regno];
>>  nregs = (hard_regno_nregs[conflict_hr]
>> @@ -538,7 +539,7 @@ find_hard_regno_for (int regno, int *cost, int
>> try_only_hard_regno)
>> }
>> EXECUTE_IF_SET_IN_SPARSESET (conflict_reload_and_inheritance_pseudos,
>> conflict_regno)
>> -if (val != lra_reg_info[conflict_regno].val)
>> +if (!lra_reg_val_equal_p (conflict_regno, val, offset))
>> {
>>  lra_assert (live_pseudos_reg_renumber[conflict_regno] < 0);
>>  if ((hard_regno
>> @@ -1007,7 +1008,7 @@
>> setup_live_pseudos_and_spill_after_risky_transforms (bitmap
>>   {
>> int p, i, j, n, regno, hard_regno;
>> unsigned int k, conflict_regno;
>> -  int val;
>> +  int val, offset;
>> HARD_REG_SET conflict_set;
>> enum machine_mode mode;
>> lra_live_range_t r;
>> @@ -1050,8 +1051,9 @@
>> setup_live_pseudos_and_spill_after_risky_transforms (bitmap
>> COPY_HARD_REG_SET (conflict_set, lra_no_alloc_regs);
>> IOR_HARD_REG_SET (conflict_set,
>> lra_reg_info[regno].conflict_hard_regs);
>> val = lra_reg_info[regno].val;
>> +  offset = lra_reg_info[regno].offset;
>> EXECUTE_IF_SET_IN_SPARSESET (live_range_hard_reg_pseudos,
>> conflict_regno)
>> -   if (val != lra_reg_info[conflict_regno].val
>> +   if (!lra_reg_val_equal_p (conflict_regno, val, offset)
>>  /* If it is multi-register pseudos they should start on
>> the same hard register.  */
>>  || hard_regno != reg_renumber[