Re: Question regarding usage of vec.h

2020-08-03 Thread Chung-Lin Tang

On 2020/8/3 12:24 AM, Thomas Koenig via Gcc wrote:

> Hi,
>
> in a patch I am developing, I have the following code:
>
> typedef struct {
>    gfc_code *c;
>    int branch_level;
>    bool seen_goto;
>    vec  extrema;
> } do_t;
>
> static vec doloop_list;
>
> [...]
>
>    do_t loop;
>
>    doloop_list.safe_push (loop);
>    doloop_list.last().extrema.reserve (1 << doloop_level);
>
> where the last statement segfaults.  It also segfaults if
> I replace the last line with
>
>   doloop_list[0].extrema.reserve (1 << doloop_level);
>
> Is there something glaringly obvious I am doing wrong, and
> could this be easily fixed?


I think you need to initialize the 'vec extrema' field with vNULL,
or maybe some other constructor.

Chung-Lin
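
The suggestion above can be illustrated outside of GCC with a small
stand-in. This is only a sketch under stated assumptions: mini_vec and
make_loop are hypothetical names, not part of vec.h, and the struct merely
imitates a vector type that has no constructor. The point it shows is that
a local 'do_t loop;' leaves the embedded pointer indeterminate, whereas
zero-initializing the object (the role vNULL plays for a real vec field)
makes a later reserve() safe.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical stand-in for a constructor-less vector type: a plain struct,
// so an uninitialized local instance holds an indeterminate pointer.
struct mini_vec {
  int *data;
  std::size_t length;
  std::size_t capacity;

  // Grow the storage so that at least n more elements fit.
  void reserve(std::size_t n) {
    if (length + n <= capacity)
      return;
    std::size_t new_cap = length + n;
    // realloc on a null pointer behaves like malloc, so a zeroed
    // mini_vec is a valid starting state.
    int *p = static_cast<int *>(std::realloc(data, new_cap * sizeof(int)));
    assert(p != nullptr);
    data = p;
    capacity = new_cap;
  }

  // Free the storage and return to the empty state.
  void release() {
    std::free(data);
    data = nullptr;
    length = capacity = 0;
  }
};

struct do_t {
  int branch_level;
  bool seen_goto;
  mini_vec extrema;
};

// Value-initialization ("{}") zeroes every member, including the pointer
// inside extrema; this is the stand-in for assigning vNULL to the field.
do_t make_loop() {
  do_t loop{};
  return loop;
}
```

In the patch itself, the corresponding change would presumably be to
assign vNULL to loop.extrema before the safe_push.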


> If not, I guess I'll just implement this using standard C
> techniques (i.e. malloc/realloc). It will look a bit strange,
> but, as a wise man once said, a patch that works beats
> an elegant idea every time...
>
> Best regards
>
>  Thomas


Implementing OpenMP 5.0 requires directive

2020-10-30 Thread Chung-Lin Tang

Hi Jakub,
We've been going over how to implement the requires directive more
completely than in the current state (i.e. only atomic_default_mem_order
working).

The specification requires that three clauses "must appear in all
compilation units of a program that contain device constructs or device
routines or in none of them":
   - reverse_offload
   - unified_address
   - unified_shared_memory

The current design we're contemplating is to generate a mask variable of
these 3 clauses for each compilation unit built with -fopenmp, and tag it
with an attribute so that all of them are collected into a special section
(e.g. ".gnu.gomp.requires"). Later, at runtime device startup, the runtime
checks them against the capabilities of the libgomp offload target.
Cross-checking the words against each other (assuming we generate one word
per compilation unit) can also implement the consistency requirement.
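
As a rough illustration of the two checks described above, here is a
minimal sketch. All names and bit assignments are hypothetical; nothing
here is existing libgomp API. It also assumes that every word passed in
comes from a compilation unit that contains device constructs or device
routines, since only those units are covered by the consistency rule.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical bit assignments for the three clauses.
enum requires_bit : std::uint32_t {
  REQ_REVERSE_OFFLOAD       = 1u << 0,
  REQ_UNIFIED_ADDRESS       = 1u << 1,
  REQ_UNIFIED_SHARED_MEMORY = 1u << 2
};

// Consistency check: every per-compilation-unit word must carry the same
// set of clauses, so all words have to be equal to the first one.
bool requires_consistent(const std::vector<std::uint32_t> &words) {
  for (std::uint32_t w : words)
    if (w != words.front())
      return false;
  return true;
}

// Capability check: the device must support every clause that is requested.
bool device_satisfies(std::uint32_t required, std::uint32_t device_caps) {
  return (required & ~device_caps) == 0;
}
```

With one word per compilation unit, the runtime would only need to walk
the collected section once at device startup to perform both checks.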

(actually, as a first stage implementation, we were hoping to just have the
special section implemented, which allows compilation of OpenMP programs
using requires-directive, and implement any runtime checking at a later
stage)

We hope to check with you first on any design issues. Have you given any
thought to this directive?

Thanks,
Chung-Lin


Re: Question abt getting the number of available registers

2011-10-27 Thread Chung-Lin Tang
On 2011/10/27 04:33 PM, Revital Eres wrote:
> Hello,
> 
> I'm working on estimating register pressure in SMS and using
> ira_available_class_regs for getting the number of available
> registers.
> However I encounter a case where ira_available_class_regs showed 64
> available regs for a certain class while ira_class_hard_regs_num
> showed 61 so I am not sure which one I should use. (I'm also not sure
> if ira_class_hard_regs_num  can be used outside of ira-- if not, I
> would need to do that calculation myself)
> 
> Thanks,
> Revital

Looking at the calculations of both values, I would say they should be
the same, unless maybe... REG_ALLOC_ORDER for your target is missing
some registers?

Chung-Lin
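
To make the guess above concrete, here is a simplified model of how the two
counts could diverge. It is a sketch, not the IRA source: the function
names are hypothetical, and it assumes one count is taken directly over the
class's non-fixed registers while the other is built by walking an
allocation-order table.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Count the registers of a class that are not fixed (the "available" figure
// in this model).
std::size_t count_available(const std::set<int> &class_regs,
                            const std::set<int> &fixed_regs) {
  std::size_t n = 0;
  for (int r : class_regs)
    if (fixed_regs.count(r) == 0)
      ++n;
  return n;
}

// Walk an allocation-order table and count each non-fixed class register
// once. A register that is absent from the table is never counted, which
// is how the two figures can disagree.
std::size_t count_ordered(const std::vector<int> &alloc_order,
                          const std::set<int> &class_regs,
                          const std::set<int> &fixed_regs) {
  std::set<int> seen;
  for (int r : alloc_order)
    if (class_regs.count(r) != 0 && fixed_regs.count(r) == 0)
      seen.insert(r);
  return seen.size();
}
```

In this model, a 64-register class whose order table lists only 61 of its
registers gives 64 from the first function and 61 from the second, matching
the numbers reported in the question.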


Re: [MIPS] Test case dspr2-MULT is failed

2011-01-06 Thread Chung-Lin Tang
On 2010/12/31 09:38 PM, Richard Sandiford wrote:
> Mingjie Xing  writes:
>> There are two test cases failed when run 'make check-gcc
>> RUNTESTFLAGS="mips.exp"'.  The log is,
>>
>> Executing on host: /home/xmj/tools/build-test-trunk-mips/gcc/xgcc
>> -B/home/xmj/tools/build-test-trunk-mips/gcc/
>> /home/xmj/tools/test-trunk/gcc/testsuite/gcc.target/mips/dspr2-MULT.c
>>  -DNOMIPS16=__attribute__((nomips16)) -mabi=32 -mips32r2 -mgp32 -O2
>> -mdspr2 -mtune=74kc -ffixed-hi -ffixed-lo -S  -o dspr2-MULT.s
>> (timeout = 300)
>> PASS: gcc.target/mips/dspr2-MULT.c (test for excess errors)
>> PASS: gcc.target/mips/dspr2-MULT.c scan-assembler \tmult\t
>> PASS: gcc.target/mips/dspr2-MULT.c scan-assembler ac1
>> FAIL: gcc.target/mips/dspr2-MULT.c scan-assembler ac2
>> Executing on host: /home/xmj/tools/build-test-trunk-mips/gcc/xgcc
>> -B/home/xmj/tools/build-test-trunk-mips/gcc/
>> /home/xmj/tools/test-trunk/gcc/testsuite/gcc.target/mips/dspr2-MULTU.c
>>   -DNOMIPS16=__attribute__((nomips16)) -mabi=32 -mips32r2 -mgp32 -O2
>> -mdspr2 -mtune=74kc -ffixed-hi -ffixed-lo -S  -o dspr2-MULTU.s
>> (timeout = 300)
>> PASS: gcc.target/mips/dspr2-MULTU.c (test for excess errors)
>> PASS: gcc.target/mips/dspr2-MULTU.c scan-assembler \tmultu\t
>> PASS: gcc.target/mips/dspr2-MULTU.c scan-assembler ac1
>> FAIL: gcc.target/mips/dspr2-MULTU.c scan-assembler ac2
>>
>> Is it a bug?
> 
> It's a register-allocation optimisation regression that's been around
> for quite a long time now (probably over a year).  We're not making as
> much use of the 4 accumulator registers as we should.
> 
> In truth, I don't think we ever really made good use of them anyway.
> ISTR that trivial modifications of the testcase failed even before
> the regression.

I analyzed this testcase regression a while ago; the direct cause is
mips_order_regs_for_local_alloc(), which now serves as MIPS'
ADJUST_REG_ALLOC_ORDER macro.

The mips_order_regs_for_local_alloc() function seems to have been written
for the old local-alloc.c. It was left in place as the deprecated
ORDER_REGS_FOR_LOCAL_ALLOC macro after the transition to IRA (and was not
actually called at all during that period), and was 'revived' relatively
recently when a patch by Bernd that created the ADJUST_REG_ALLOC_ORDER
macro went in.

So you have a local-alloc.c heuristic working in IRA, which seems to
cause these regressions.

Removing mips_order_regs_for_local_alloc() will let this testcase pass;
of course the real fix should be to review the MIPS reg-ordering logic,
left for you MIPS people...

Chung-Lin


Re: Question about conds attribute for *thumb2_alusi3_short

2013-06-24 Thread Chung-Lin Tang
On 13/6/24 11:43 PM, Tom de Vries wrote:
> Richard,
> 
> I've noticed that f.i. *thumb2_alusi3_short has no explicit setting of the
> conds attribute, which means the value of the conds attribute for this insn
> is nocond:
> ...
> (define_insn "*thumb2_alusi3_short"
>   [(set (match_operand:SI  0 "s_register_operand" "=l")
> (match_operator:SI 3 "thumb_16bit_operator"
>[(match_operand:SI 1 "s_register_operand" "0")
> (match_operand:SI 2 "s_register_operand" "l")]))
>(clobber (reg:CC CC_REGNUM))]
>   "TARGET_THUMB2 && reload_completed
>&& GET_CODE(operands[3]) != PLUS
>&& GET_CODE(operands[3]) != MINUS"
>   "%I3%!\\t%0, %1, %2"
>   [(set_attr "predicable" "yes")
>(set_attr "length" "2")]
> )
> ...
> 
> AFAIU, this insn is either:
> - conditional, and does not modify cc, or
> - unconditional, and sets cc.
> So the clobber of CC in the RTL conservatively describes both cases.
> 
> It seems to me the logical conds setting for the conditional case is nocond,
> set (or perhaps clob) for the unconditional case. So, is this a more accurate
> value of conds for this insn:
> ...
>(set (attr "conds")
> (if_then_else
>   (match_test "GET_CODE (PATTERN (insn)) == COND_EXEC")
>   (const_string "nocond")
>   (const_string "set")))]
> ...
> ?
> 
> Is there a generic need to have this attribute accurate for all insns?

Following this thread that Tom pointed me to earlier in an internal
discussion:
http://gcc.gnu.org/ml/gcc-patches/2012-02/msg00723.html

If the short-CC-clobbered form is now selected very late, I think this
pattern simply is not (or should not be) used for the conditional (within
IT-block) case. The conds attribute should simply be set to "clob".
Predicable might be set to "no" as well...

Chung-Lin



Altera Nios II port submission

2013-07-22 Thread Chung-Lin Tang
To the GCC Steering Committee,

Mentor Graphics has submitted, and recently re-submitted in updated form,
a GCC backend port for the Altera Nios II architecture, which is currently
on gcc-patches awaiting technical review [1].

We propose that, upon port approval and commit to trunk, Sandra
Loosemore and I (Chung-Lin Tang), both of Mentor Graphics, serve as
target maintainers.

Thank you,
Chung-Lin

[1] http://gcc.gnu.org/ml/gcc-patches/2013-07/msg00526.html
http://gcc.gnu.org/ml/gcc-patches/2013-07/msg00527.html
http://gcc.gnu.org/ml/gcc-patches/2013-07/msg00528.html


Re: Question about simple_return pattern for the GCC ARM backend.

2013-12-28 Thread Chung-Lin Tang
On 2013/12/28 09:31 AM, Yangfei (Felix) wrote:
> Hi, 
> 
>   I think that simple_return standard pattern is useful for the ARM. I mean 
> it should be good for target code performance.
>  But seems this pattern is not there for the GCC ARM backend. Can anyone 
> explain the reason why we don’t need this?
> 
> Cheers,
> Fei
> 

It does use it. Search for the "return" expand pattern, and
the "returns" code iterator in config/arm/iterators.md.

Chung-Lin



Re: Register allocation cost question

2023-10-10 Thread Chung-Lin Tang via Gcc



On 2023/10/10 11:11 PM, Andrew Stubbs wrote:
> Hi all,
> 
> I'm trying to add a new register set to the GCN port, but I've hit a 
> problem I don't understand.
> 
> There are 256 new registers (each 2048 bit vector register) but the 
> register file has to be divided between all the running hardware 
> threads; if you can use fewer registers you can get more parallelism, 
> which means that it's important that they're allocated in order.
> 
> The problem is that they're not allocated in order. Somehow the IRA pass 
> is calculating different costs for the registers within the class. It 
> seems to prefer registers a32, a96, a160, and a224.
> 
> The internal regno are 448, 512, 576, 640. These are not random numbers! 
> They all have zero for the 6 LSB.
> 
> What could cause this? Did I overrun some magic limit? What target hook 
> might I have miscoded?
> 
> I'm also seeing wrong-code bugs when I allow more than 32 new registers, 
> but that might be an unrelated problem. Or the allocation is broken? I'm 
> still analyzing this.
> 
> If it matters, ... the new registers can't be used for general purposes, 
> so I'm trying to set them up as a temporary spill destination. This 
> means they're typically not busy. It feels like it shouldn't be this 
> hard... :(

Have you tried experimenting with REG_ALLOC_ORDER? I see that the GCN port 
currently isn't using this target macro.

Chung-Lin
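
For reference, REG_ALLOC_ORDER is an initializer listing hard register
numbers from most to least preferred. The sketch below only illustrates the
numbers in this thread. The base value 416 is an assumption inferred from
the quoted a32/448 pairing, and the names are hypothetical, not taken from
the GCN port. A real REG_ALLOC_ORDER would also have to cover every hard
register, not just the new class.

```cpp
#include <cassert>
#include <vector>

// Assumption: the new registers occupy 256 consecutive hard register
// numbers starting at 416 (inferred from a32 corresponding to regno 448).
constexpr int kFirstNewReg = 416;  // hypothetical name and value
constexpr int kNumNewRegs = 256;

// True when the six least significant bits of regno are zero,
// i.e. regno is a multiple of 64.
constexpr bool low6_zero(int regno) { return (regno & 63) == 0; }

// Ascending list of the new register numbers: the kind of "most preferred
// first" sequence an in-order REG_ALLOC_ORDER would spell out for them.
std::vector<int> in_order_allocation() {
  std::vector<int> order;
  order.reserve(kNumNewRegs);
  for (int i = 0; i < kNumNewRegs; ++i)
    order.push_back(kFirstNewReg + i);
  return order;
}
```

Under that assumption, a32, a96, a160 and a224 map to 448, 512, 576 and
640, which are exactly the multiples of 64 that fall inside the new range.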