Re: Question about dynamic choosing vectorization factor for RVV

2023-08-31 Thread Richard Biener via Gcc
On Thu, 31 Aug 2023, juzhe.zh...@rivai.ai wrote:

> Hi, Richard and Richi.
> 
> Currently, we are statically returning vectorization factor in 
> 'TARGET_VECTORIZE_PREFERRED_SIMD_MODE'
> according to compile option.
> 
> For example:
> void
> foo (int32_t *__restrict a, int32_t *__restrict b, int n)
> {
>   for (int i = 0; i < n; i++)
>     a[i] = a[i] + b[i];
> }
> 
> with --param=riscv-autovec-lmul=m1:
> 
> vsetvli a5,a2,e32,m1,ta,ma
> vle32.v v2,0(a0)
> vle32.v v1,0(a1)
> vsetvli a6,zero,e32,m1,ta,ma
> slli a3,a5,2
> vadd.vv v1,v1,v2
> sub a2,a2,a5
> vsetvli zero,a5,e32,m1,ta,ma
> vse32.v v1,0(a4)
> add a0,a0,a3
> add a1,a1,a3
> add a4,a4,a3
> bne a2,zero,.L3
> 
> The 'vadd.vv' is only performing operations on a single register.
> 
> with --param=riscv-autovec-lmul=m8:
> 
>   vsetvli a5,a2,e8,m2,ta,ma
>   vle32.v v16,0(a0)
>   vle32.v v8,0(a1)
>   vsetvli a6,zero,e32,m8,ta,ma
>   slli a3,a5,2
>   vadd.vv v8,v8,v16
>   vsetvli zero,a2,e32,m8,ta,ma
>   sub a2,a2,a5
>   vse32.v v8,0(a4)
>   add a0,a0,a3
>   add a1,a1,a3
>   add a4,a4,a3
>   bne a2,zero,.L3
> 
> The 'vadd.vv' here is performing operations on 8 consecutive registers:
> 
> vadd.vv [v8 - v15], [v8 - v15], [v16 - v23]
> 
> Having users set the vectorization factor statically is not ideal.
> 
> We want GCC to choose the vectorization factor dynamically, based on 
> loop analysis, when it auto-vectorizes.
> 
> Currently, I have implemented a simplistic analysis that computes the live 
> range of each local decl in the current function.
> 
> The analysis works as follows. RVV has 32 vector registers, so the live 
> ranges of the function's local decls must satisfy:
> 
> the number of decls live at the same time * LMUL <= 32. 
> Based on this analysis, I set the vectorization factor in 
> TARGET_VECTORIZE_PREFERRED_SIMD_MODE.
> 
> This simplistic algorithm (implemented in the RISC-V backend) works well for 
> the testcases I have produced.
> 
> However, I can only choose the optimal vectorization factor for a whole 
> function, not for a specific loop.
> 
> Here is the example:
> 
> void foo2 (int32_t *__restrict a,
>   int32_t *__restrict b,
>   int32_t *__restrict c,
>   int32_t *__restrict a2,
>   int32_t *__restrict b2,
>   int32_t *__restrict c2,
>   int32_t *__restrict a3,
>   int32_t *__restrict b3,
>   int32_t *__restrict c3,
>   int32_t *__restrict a4,
>   int32_t *__restrict b4,
>   int32_t *__restrict c4,
>   int32_t *__restrict a5,
>   int32_t *__restrict b5,
>   int32_t *__restrict c5,
>   int n)
> {
>   // Loop 1
>   for (int i = 0; i < n; i++)
>     a[i] = a[i] + b[i];
>   // Loop 2
>   for (int i = 0; i < n; i++) {
>     a[i] = b[i] + c[i];
>     a2[i] = b2[i] + c2[i];
>     a3[i] = b3[i] + c3[i];
>     a4[i] = b4[i] + c4[i];
>     a5[i] = a[i] + a4[i];
>     a[i] = a3[i] + a2[i] + a5[i];
>   }
> }
> 
> For Loop 1 we can aggressively choose LMUL = 8, but Loop 2 should use LMUL = 4 
> (since LMUL = 8 would cause vector register spills).
> 
> If I split Loop 1 and Loop 2 into two separate functions, my algorithm works 
> well.
> 
> However, if both loops are in the same function, I end up picking LMUL = 4 
> for both Loop 1 and Loop 2 because, as I said above, the analysis is done per 
> function, not per loop.
> 
> I am struggling to find a good solution for this issue. Can we pass 
> loop_vec_info through to the 'preferred_simd_mode' target hook?

That's not how it's currently designed to work - there's
the autovectorize_vector_modes hook where you should provide a vector
of modes the vectorizer iterates over and return VECT_COMPARE_COSTS
if you want to evaluate costs between choices.  Your analysis should
then happen in the finish_cost method.

That's how it's currently designed.  It might not be optimal for
compile-time reasons when there are many modes; giving the target
more control (and context) might be possible.

Richard.

> Thanks.
> 
> 
> juzhe.zh...@rivai.ai
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)
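
The register-pressure rule from the original question (number of decls live at the same time * LMUL <= 32) can be written as a small standalone sketch. This is not GCC code: the name max_lmul_without_spill is made up for illustration, the live-value counts in the note below are hypothetical, and the sketch assumes LMUL is limited to the integer values 8, 4, 2 and 1.

```cpp
#include <cassert>
#include <initializer_list>

// RVV provides 32 architectural vector registers (v0 - v31).
constexpr int kNumVectorRegs = 32;

// Return the largest LMUL in {8, 4, 2, 1} for which
//   live_values * LMUL <= 32
// holds, or 0 if even LMUL = 1 does not fit (spilling is unavoidable).
// Hypothetical helper for illustration only; it is not a GCC function.
int
max_lmul_without_spill (int live_values)
{
  assert (live_values > 0);
  for (int lmul : {8, 4, 2, 1})
    if (live_values * lmul <= kNumVectorRegs)
      return lmul;
  return 0;
}
```

Under these assumptions a loop with 3 live vector values gets LMUL = 8 (3 * 8 = 24 registers), while a loop with 8 live values gets LMUL = 4 (8 * 4 = 32). Applying the rule per loop rather than per function is what the question is asking for.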


Re: Re: Question about dynamic choosing vectorization factor for RVV

2023-08-31 Thread juzhe.zh...@rivai.ai
Thanks Richi.

I am trying to figure out how to adjust finish_cost to lower the LMUL

For example:

void
foo (int32_t *__restrict a, int32_t *__restrict b, int n)
{
  for (int i = 0; i < n; i++)
    a[i] = a[i] + b[i];
}

preferred_simd_mode picks LMUL = 8 (RVVM8SImode).

Is it possible to adjust the cost in finish_cost so that the loop vectorizer 
picks LMUL = 4?

I am experimenting with the following cost:

  if (loop_vinfo)
    {
      if (loop_vinfo->vector_mode == RVVM8SImode)
        {
          m_costs[vect_prologue] = 2;
          m_costs[vect_body] = 20;
          m_costs[vect_epilogue] = 2;
        }
      else
        {
          m_costs[vect_prologue] = 1;
          m_costs[vect_body] = 1;
          m_costs[vect_epilogue] = 1;
        }
    }

I increased the cost for LMUL = 8, but the resulting codegen is odd:

foo:
ble a2,zero,.L12
addiw a5,a2,-1
li a4,30
sext.w t1,a2
bleu a5,a4,.L7
srliw a7,t1,5
slli a7,a7,7
li a4,32
add a7,a7,a0
mv a5,a0
mv a3,a1
vsetvli zero,a4,e32,m8,ta,ma
.L4:
vle32.v v8,0(a5)
vle32.v v16,0(a3)
vadd.vv v8,v8,v16
vse32.v v8,0(a5)
addi a5,a5,128
addi a3,a3,128
bne a5,a7,.L4
andi a2,a2,-32
beq t1,a2,.L14
.L3:
slli a4,a2,32
subw a5,t1,a2
srli a4,a4,32
slli a5,a5,32
slli a4,a4,2
srli a5,a5,32
add a0,a0,a4
add a1,a1,a4
vsetvli a4,a5,e8,m1,ta,ma
vle32.v v8,0(a0)
vle32.v v4,0(a1)
vsetvli a2,zero,e32,m4,ta,ma
vadd.vv v4,v4,v8
vsetvli zero,a5,e32,m4,ta,ma
vse32.v v4,0(a0)
sub a3,a5,a4
beq a5,a4,.L12
slli a4,a4,2
vsetvli zero,a3,e8,m1,ta,ma
add a0,a0,a4
add a1,a1,a4
vle32.v v4,0(a0)
vle32.v v8,0(a1)
vsetvli a2,zero,e32,m4,ta,ma
vadd.vv v4,v4,v8
vsetvli zero,a3,e32,m4,ta,ma
vse32.v v4,0(a0)
.L12:
ret
.L7:
li a2,0
j .L3
.L14:
ret

I hoped it would generate code like this:

foo:
ble a2,zero,.L5
mv a4,a0
.L3:
vsetvli a5,a2,e32,m4,ta,ma
vle32.v v8,0(a0)
vle32.v v4,0(a1)
vsetvli a6,zero,e32,m4,ta,ma
slli a3,a5,2
vadd.vv v4,v4,v8
sub a2,a2,a5
vsetvli zero,a5,e32,m4,ta,ma
vse32.v v4,0(a4)
add a0,a0,a3
add a1,a1,a3
add a4,a4,a3
bne a2,zero,.L3
.L5:
ret

I am experimenting to see whether we can adjust the cost statically so that the 
loop vectorizer uses LMUL = 4 even though preferred_simd_mode returns LMUL = 8.
If we can, I think we can run the analysis and then adjust the cost 
accordingly.

Thanks.


juzhe.zh...@rivai.ai
 

Re: Question about dynamic choosing vectorization factor for RVV

2023-08-31 Thread Richard Sandiford via Gcc
"juzhe.zh...@rivai.ai"  writes:
> Thanks Richi.
>
> I am trying to figure out how to adjust finish_cost to lower the LMUL
>
> For example:
>
> void
> foo (int32_t *__restrict a, int32_t *__restrict b, int n)
> {
>   for (int i = 0; i < n; i++)
>     a[i] = a[i] + b[i];
> }
>
> preferred_simd_mode picks LMUL = 8 (RVVM8SImode).

But is the LMUL decided by the mode?  Like Richard says, the vectoriser
already provides a way of trying vectorisation with different modes and
picking the best one, via autovectorize_vector_modes, VECT_COMPARE_COSTS,
and the cost structures.  preferred_simd_mode then just picks the first
mode to try -- the choice isn't final.

The idea is that you get to see what vectorisation looks like with
multiple mode choices, and can pick the best one.

It's not clear from your reply whether you've tried that or not.

Thanks,
Richard
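
The "try several modes and keep the cheapest" flow described above can be modelled outside GCC. The sketch below is a toy, not the vectorizer's implementation: model_body_cost, pick_lmul, the base cost of 4 and the spill penalty of 10 are all invented for illustration. It mirrors only the idea that the preferred mode is tried first and is replaced when another candidate turns out cheaper per element.

```cpp
#include <cassert>
#include <vector>

// Hypothetical cost of one vector-loop iteration: a fixed base cost plus
// a penalty for every register group that does not fit in the 32 RVV
// vector registers.  The constants are made up for this sketch.
int
model_body_cost (int lmul, int live_values)
{
  const int regs_needed = lmul * live_values;
  const int excess = regs_needed > 32 ? regs_needed - 32 : 0;
  const int spilled_groups = (excess + lmul - 1) / lmul;  // round up
  return 4 + 10 * spilled_groups;
}

// Try every candidate LMUL, starting with the preferred (largest) one,
// and keep the candidate with the lowest cost per element.  One iteration
// at LMUL = k processes k times as many elements as at LMUL = 1, so
// cost / lmul is compared by cross-multiplying to stay in integers.
int
pick_lmul (int live_values)
{
  assert (live_values > 0);
  const std::vector<int> candidates = {8, 4, 2, 1};
  int best = candidates.front ();
  for (int lmul : candidates)
    if (model_body_cost (lmul, live_values) * best
        < model_body_cost (best, live_values) * lmul)
      best = lmul;
  return best;
}
```

With this toy model pick_lmul (3) keeps the preferred LMUL = 8, while pick_lmul (6) falls back to LMUL = 4 because LMUL = 8 would need 48 registers.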


Re: analyzer: Weekly update on extending C++ support (3)

2023-08-31 Thread Benjamin Priour via Gcc
Hi David,


On Wed, Aug 30, 2023 at 1:01 AM David Malcolm  wrote:

>
> > ?
>
> Yes; please submit it, so that we can work towards getting what you
> have into trunk.
>
> Given that we don't properly support C++ yet, improvements to C++
> support don't have to be perfect.
>
>
It is next in the queue for regstrapping, and great news: I nailed the
"supersedes" issue.
I'll split it into two patches. The first one, for operator new per se,
should fix PR105948; all non-user-defined variants of operator new should be
supported, with and without exceptions enabled. The second will be a fix for
PR110830, to prevent warnings from being superseded when their
saved_diagnostic paths are actually divergent.

My current implementation of the latter is a bit naive but works, so I'm
trying to improve its running time.


> >
> > About generalizing tests to C++, I'll soon have a second batch of
> > similar size ready, probably by Monday. I try to find exactly one
> > "real" bug to build each batch around, to not simply have a
> > collection of C made C++ friendly.
>
> (nods)
>
> Thanks for pushing the 1st patch.  I've updated my working copies to
> try to ensure that my new tests go in c-c++-common as far as possible.
>
> >
> > A few questions on that point.
> >
> > Test gcc.dg/analyzer/pr103892.c requires -O2.
> > As such the IPA generated from C and C++ differ,
> > C++ optimization cuts off function argstr_get_word from the IPA,
> > hence reducing the number of SN nodes from the SG.
> > This lesser number of SN is sufficient to make the analysis trip
> > over being too-complex.
> > The current formula for that upper limit is
> > limit = m_sg.num_nodes () * param_analyzer_bb_explosion_factor;
> > Thus shorter programs - such as the analyzer tests - are more likely
> > to be diagnosed as too complex. To avoid false positives perhaps we
> > should add an offset, so that short SG get their chance ?
>
> That's an interesting idea...
>
> > This is just a tweak, and pr103892.c could as well be discarded for
> > C++, divergent IPA between C and C++ are unavoidable at some point,
> > and some tests won't make the transition anyway.
>
> ...but this approach is simpler, so maybe go with that.
>
>
Nods.

>
> > In gcc.dg/analyzer/malloc-1.c:test_50{b,c}, C++ is imprecise as of
> > the memory freed.
> >
> > void test_50b (void)
> > {
> >   free ((void *) test_50a); /* { dg-warning "'free' of '&test_50a' which points to memory not on the heap \\\[CWE-590\\\]" } */
> >   /* { dg-bogus "'free' of 'test_50a' which points to memory not on the heap \\\[CWE-590\\\]" "c++ lacks precision of freed memory" { xfail c++ } .-1 } */
> > }
> >
> > void test_50c (void)
> > {
> >  my_label:
> >   free (&&my_label); /* { dg-warning "'free' of '&my_label' which points to memory not on the heap \\\[CWE-590\\\]" } */
> >   /* { dg-warning "'free' of '&& my_label' which points to memory not on the heap \\\[CWE-590\\\]" "" { xfail c++ } .-1 } */
> > }
> >
> > What do you think of this ?
> > I've checked malloc_state_machine::handle_free_of_non_heap, arg and
> > diag_arg are strictly equal.
> > There might be something to be done with how a tree is transformed
> > into a diagnostic tree by get_diagnostic_tree,
> > but I'll wait for your feedback first.
>
> What does g++ emit for this with your updated test?
>
>
I'm not sure what you meant here.
For free ((void *) test_50a); gcc emits "'free' of '&test_50a'", whereas
g++ emits "'free' of 'test_50a'",
which is less precise about the memory actually freed. However, this only
seems to occur with function pointers and labels.


> >
> > Test gcc.dg/analyzer/pr94362-1.c actually has an additional
> > null_deref warning in C++, which is not affected by exceptions
> > or optimizations. I will keep updated on that one.
>
> [...snip...; I see you covered this in a followup]
>
>
The fix worked, and even a few XFAILs in other tests now pass.
I am regstrapping the second batch of transitioned tests,
and will follow up with the "operator new" work.
I had an issue with the regstrap: I don't know why, but comparing the *.sum
files from my control and patched versions gives me unintelligible output.
It warns me about previous tests having disappeared, even some totally
unrelated to the analyzer, although all of them are still there, and manually
running dejagnu on each of them - one by one - gives exactly the same output
as control.

So I'm cleaning up my control and patched folders and re-regstrapping
everything.
Something broke and I don't know what.


> BTW, was there any other work of yours that I need to review? (I know
> about the work on placement-new and testsuite migration)
>
> Thanks again for your work on this
> Dave
>
>
Thanks,
Benjamin.
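
To make the arithmetic behind the "add an offset" idea concrete, here is a small standalone sketch. It is not analyzer code: the function names are invented, and the explosion factor of 5 and the offset of 50 used in the note below are assumed values chosen only for illustration.

```cpp
#include <cassert>

// Limit in the form quoted in the thread:
//   limit = num_nodes * explosion_factor
int
limit_scaled (int num_nodes, int explosion_factor)
{
  assert (num_nodes >= 0 && explosion_factor >= 0);
  return num_nodes * explosion_factor;
}

// Variant with a constant offset, so that very small supergraphs still
// get a usable budget.  The offset is a hypothetical tuning knob.
int
limit_with_offset (int num_nodes, int explosion_factor, int offset)
{
  assert (offset >= 0);
  return limit_scaled (num_nodes, explosion_factor) + offset;
}
```

With the assumed values, a 4-node supergraph goes from a limit of 20 to 70, while a 400-node supergraph only goes from 2000 to 2050, so the offset mostly affects short programs such as the analyzer tests.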


Re: Re: Question about dynamic choosing vectorization factor for RVV

2023-08-31 Thread Richard Biener via Gcc
On Thu, 31 Aug 2023, juzhe.zh...@rivai.ai wrote:

> Thanks Richi.
> 
> I am trying to figure out how to adjust finish_cost to lower the LMUL
> 
> For example:
> 
> void
> foo (int32_t *__restrict a, int32_t *__restrict b, int n)
> {
>   for (int i = 0; i < n; i++)
>     a[i] = a[i] + b[i];
> }
> 
> preferred_simd_mode picks LMUL = 8 (RVVM8SImode).
> 
> Is it possible to adjust the cost in finish_cost so that the loop 
> vectorizer picks LMUL = 4?

I see you have an autovectorize_vector_modes hook and you use
VECT_COMPARE_COSTS.  So the appropriate place would be to
amend your vector_costs::better_main_loop_than_p.


Re: Re: Question about dynamic choosing vectorization factor for RVV

2023-08-31 Thread juzhe.zh...@rivai.ai
Hi. Thanks Richard and Richi.

Now I have figured out how to choose a smaller LMUL.

void
costs::finish_cost (const vector_costs *scalar_costs)
{
  loop_vec_info loop_vinfo = dyn_cast<loop_vec_info> (m_vinfo);
  if (loop_vinfo)
    {
      if (loop_vinfo->vector_mode == RVVM8SImode
          || riscv_v_ext_vls_mode_p (loop_vinfo->vector_mode))
        {
          m_costs[vect_prologue] = 8;
          m_costs[vect_body] = 8;
          m_costs[vect_epilogue] = 8;
        }
      else
        {
          m_costs[vect_prologue] = 1;
          m_costs[vect_body] = 1;
          m_costs[vect_epilogue] = 1;
        }
    }
  // m_suggested_unroll_factor = 2;
  vector_costs::finish_cost (scalar_costs);
}

The earlier odd code was caused by the VLS modes.

Now I can get LMUL = 4 by adjusting the cost:
vsetvli a5,a2,e32,m4,ta,ma
vle32.v v8,0(a0)
vle32.v v4,0(a1)
vsetvli a6,zero,e32,m4,ta,ma
slli a3,a5,2
vadd.vv v4,v4,v8
sub a2,a2,a5
vsetvli zero,a5,e32,m4,ta,ma
vse32.v v4,0(a4)
add a0,a0,a3
add a1,a1,a3
add a4,a4,a3
bne a2,zero,.L3

Fantastic architecture of GCC Vector Cost model!

Thanks a lot.


juzhe.zh...@rivai.ai
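
For readers less familiar with RVV, the strip-mined loop in the assembly above can be modelled in plain C++. This is only a sketch under stated assumptions: it takes vl = min (remaining, VLMAX), which is the common vsetvli behaviour but not the only one the specification permits, and the VLEN of 128 bits used in the note below is an assumed hardware parameter, not something stated in the thread. The names vlmax and strip_mined_add are invented for this sketch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// VLMAX = VLEN / SEW * LMUL: elements processed per iteration at most.
constexpr std::size_t
vlmax (std::size_t vlen_bits, std::size_t sew_bits, std::size_t lmul)
{
  return vlen_bits / sew_bits * lmul;
}

// Scalar model of the vector loop computing a[i] = a[i] + b[i].
// Returns the number of loop iterations that were needed.
std::size_t
strip_mined_add (std::vector<std::int32_t> &a,
                 const std::vector<std::int32_t> &b, std::size_t max_vl)
{
  assert (a.size () == b.size () && max_vl > 0);
  std::size_t iterations = 0;
  std::size_t i = 0;
  std::size_t remaining = a.size ();
  while (remaining != 0)
    {
      // vsetvli a5,a2,...: choose how many elements this iteration handles.
      const std::size_t vl = std::min (remaining, max_vl);
      // vle32.v / vle32.v / vadd.vv / vse32.v on vl elements.
      for (std::size_t k = 0; k < vl; ++k)
        a[i + k] += b[i + k];
      i += vl;          // add a0,a0,a3 (and the other pointer bumps)
      remaining -= vl;  // sub a2,a2,a5
      ++iterations;
    }
  return iterations;
}
```

With the assumed VLEN of 128 bits, vlmax (128, 32, 4) is 16 and vlmax (128, 32, 8) is 32, so 100 elements take 7 iterations at LMUL = 4 and 4 iterations at LMUL = 8. That is the throughput being traded against register pressure when the cost model prefers the smaller LMUL.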
 

Re: Re: Question about dynamic choosing vectorization factor for RVV

2023-08-31 Thread Richard Biener via Gcc
On Thu, 31 Aug 2023, juzhe.zh...@rivai.ai wrote:

> Hi. Thanks Richard and Richi.
> 
> Now I have figured out how to choose a smaller LMUL.
> 
> void
> costs::finish_cost (const vector_costs *scalar_costs)
> {
>   loop_vec_info loop_vinfo = dyn_cast<loop_vec_info> (m_vinfo);
>   if (loop_vinfo)
>     {
>       if (loop_vinfo->vector_mode == RVVM8SImode
>           || riscv_v_ext_vls_mode_p (loop_vinfo->vector_mode))
>         {
>           m_costs[vect_prologue] = 8;
>           m_costs[vect_body] = 8;
>           m_costs[vect_epilogue] = 8;
>         }
>       else
>         {
>           m_costs[vect_prologue] = 1;
>           m_costs[vect_body] = 1;
>           m_costs[vect_epilogue] = 1;
>         }
>     }
>   // m_suggested_unroll_factor = 2;
>   vector_costs::finish_cost (scalar_costs);
> }

I don't think that's "good" use of the API.


Re: Re: Question about dynamic choosing vectorization factor for RVV

2023-08-31 Thread juzhe.zh...@rivai.ai
Hi, Richi.

>> I don't think that's "good" use of the API.
Do you mean I should use 'better_main_loop_than_p'?
Yes, I plan to use it.

Thanks.


juzhe.zh...@rivai.ai
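
The difference between overwriting costs with fixed constants and comparing two candidates directly can be sketched as a pairwise predicate. The struct below is purely illustrative: LoopCandidate, spills and better_than are invented names, the spill test reuses the "live values * LMUL <= 32" rule from earlier in the thread, and nothing here reproduces the real signature or logic of vector_costs::better_main_loop_than_p.

```cpp
#include <cassert>

constexpr int kNumVectorRegs = 32;

// Hypothetical description of one vectorization attempt of a loop.
struct LoopCandidate
{
  int lmul;         // register-group size of this candidate
  int live_values;  // vector values live at the same time in the loop

  // True if the candidate needs more than the 32 RVV vector registers.
  bool spills () const { return lmul * live_values > kNumVectorRegs; }

  // True if this candidate should replace OTHER as the main loop.
  // A candidate that fits in registers beats one that spills.  If both
  // fit, the larger LMUL wins; if both spill, the smaller LMUL wins.
  bool better_than (const LoopCandidate &other) const
  {
    assert (live_values == other.live_values);  // same loop, different mode
    if (spills () != other.spills ())
      return !spills ();
    return spills () ? lmul < other.lmul : lmul > other.lmul;
  }
};
```

For a loop with 6 live values, LoopCandidate{4, 6} is better than LoopCandidate{8, 6}; for a loop with 3 live values the LMUL = 8 candidate wins. The decision then depends on the loop being analysed rather than on a hard-coded mode check.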
 
From: Richard Biener
Date: 2023-08-31 19:29
To: juzhe.zh...@rivai.ai
CC: gcc; richard.sandiford
Subject: Re: Re: Question about dynamic choosing vectorization factor for RVV
On Thu, 31 Aug 2023, juzhe.zh...@rivai.ai wrote:
 
> Hi. Thanks Richard and Richi.
> 
> Now, I figure out how to choose smaller LMUL now.
> 
> void
> costs::finish_cost (const vector_costs *scalar_costs)
> {
>   loop_vec_info loop_vinfo = dyn_cast (m_vinfo);
>   if (loop_vinfo)
> {
>   if (loop_vinfo->vector_mode == RVVM8SImode
>   || riscv_v_ext_vls_mode_p (loop_vinfo->vector_mode))
> {
>   m_costs[vect_prologue] = 8;
>   m_costs[vect_body] = 8;
>   m_costs[vect_epilogue] = 8;
> }
>   else
> {
>   m_costs[vect_prologue] = 1;
>   m_costs[vect_body] = 1;
>   m_costs[vect_epilogue] = 1;
> }
> }
>// m_suggested_unroll_factor = 2;
>   vector_costs::finish_cost (scalar_costs);
> }
 
I don't think that's "good" use of the API.
 
> Previous odd codes are because of VLS modes
> 
> Now, I can get the LMUL = 4 by adjusting cost.
> vsetvli a5,a2,e32,m4,ta,ma
> vle32.v v8,0(a0)
> vle32.v v4,0(a1)
> vsetvli a6,zero,e32,m4,ta,ma
> slli a3,a5,2
> vadd.vv v4,v4,v8
> sub a2,a2,a5
> vsetvli zero,a5,e32,m4,ta,ma
> vse32.v v4,0(a4)
> add a0,a0,a3
> add a1,a1,a3
> add a4,a4,a3
> bne a2,zero,.L3
> 
> Fantastic architecture of GCC Vector Cost model!
> 
> Thanks a lot.
> 
> 
> juzhe.zh...@rivai.ai
>  
> From: Richard Biener
> Date: 2023-08-31 19:20
> To: juzhe.zh...@rivai.ai
> CC: gcc; richard.sandiford
> Subject: Re: Re: Question about dynamic choosing vectorization factor for RVV
> On Thu, 31 Aug 2023, juzhe.zh...@rivai.ai wrote:
>  
> > Thanks Richi.
> > 
> > I am trying to figure out how to adjust finish_cost to lower the LMUL
> > 
> > For example:
> > 
> > void
> > foo (int32_t *__restrict a, int32_t *__restrict b, int n)
> > {
> >   for (int i = 0; i < n; i++)
> >     a[i] = a[i] + b[i];
> > }
> > 
> > preferred_simd_mode picks LMUL = 8 (RVVM8SImode).
> > 
> > Is it possible to adjust the cost in finish_cost so that the loop
> > vectorizer picks LMUL = 4?
>  
> I see you have an autovectorize_vector_modes hook and you use
> VECT_COMPARE_COSTS.  So the appropriate place would be to
> amend your vector_costs::better_main_loop_than_p.
>  
> > I am experimenting with the following cost:
> > 
> >   if (loop_vinfo)
> >     {
> >       if (loop_vinfo->vector_mode == RVVM8SImode)
> >         {
> >           m_costs[vect_prologue] = 2;
> >           m_costs[vect_body] = 20;
> >           m_costs[vect_epilogue] = 2;
> >         }
> >       else
> >         {
> >           m_costs[vect_prologue] = 1;
> >           m_costs[vect_body] = 1;
> >           m_costs[vect_epilogue] = 1;
> >         }
> >     }
> > 
> > I increased the cost for LMUL = 8, but the codegen is odd:
> > 
> > foo:
> > ble a2,zero,.L12
> > addiw a5,a2,-1
> > li a4,30
> > sext.w t1,a2
> > bleu a5,a4,.L7
> > srliw a7,t1,5
> > slli a7,a7,7
> > li a4,32
> > add a7,a7,a0
> > mv a5,a0
> > mv a3,a1
> > vsetvli zero,a4,e32,m8,ta,ma
> > .L4:
> > vle32.v v8,0(a5)
> > vle32.v v16,0(a3)
> > vadd.vv v8,v8,v16
> > vse32.v v8,0(a5)
> > addi a5,a5,128
> > addi a3,a3,128
> > bne a5,a7,.L4
> > andi a2,a2,-32
> > beq t1,a2,.L14
> > .L3:
> > slli a4,a2,32
> > subw a5,t1,a2
> > srli a4,a4,32
> > slli a5,a5,32
> > slli a4,a4,2
> > srli a5,a5,32
> > add a0,a0,a4
> > add a1,a1,a4
> > vsetvli a4,a5,e8,m1,ta,ma
> > vle32.v v8,0(a0)
> > vle32.v v4,0(a1)
> > vsetvli a2,zero,e32,m4,ta,ma
> > vadd.vv v4,v4,v8
> > vsetvli zero,a5,e32,m4,ta,ma
> > vse32.v v4,0(a0)
> > sub a3,a5,a4
> > beq a5,a4,.L12
> > slli a4,a4,2
> > vsetvli zero,a3,e8,m1,ta,ma
> > add a0,a0,a4
> > add a1,a1,a4
> > vle32.v v4,0(a0)
> > vle32.v v8,0(a1)
> > vsetvli a2,zero,e32,m4,ta,ma
> > vadd.vv v4,v4,v8
> > vsetvli zero,a3,e32,m4,ta,ma
> > vse32.v v4,0(a0)
> > .L12:
> > ret
> > .L7:
> > li a2,0
> > j .L3
> > .L14:
> > ret
> > 
> > I hope it can generate code like this:
> > 
> > foo:
> > ble a2,zero,.L5
> > mv a4,a0
> > .L3:
> > vsetvli a5,a2,e32,m4,ta,ma
> > vle32.v v8,0(a0)
> > vle32.v v4,0(a1)
> > vsetvli a6,zero,e32,m4,ta,ma
> > slli a3,a5,2
> > vadd.vv v4,v4,v8
> > sub a2,a2,a5
> > vsetvli zero,a5,e32,m4,ta,ma
> > vse32.v v4,0(a4)
> > add a0,a0,a3
> > add a1,a1,a3
> > add a4,a4,a3
> > bne a2,zero,.L3
> > .L5:
> > ret
> > 
> > I am experimenting to see whether we can adjust the cost statically so that
> > the loop vectorizer uses LMUL = 4 even though preferred_simd_mode returns
> > LMUL = 8. If we can do that, I think we can run the analysis and then
> > adjust the cost according to its result.
> >
> > Thanks.
> > 
> > 
> > juzhe.zh...@rivai.ai
> >  
> > From: Richard Biener
> > Date: 2023-08-31 15:38
> > To: juzhe.zh...@rivai.ai
> > CC: gcc; richard.sandiford
> > Subject: Re: Question about dynamic choosing vector

Re: Re: Question about dynamic choosing vectorization factor for RVV

2023-08-31 Thread juzhe.zh...@rivai.ai
Hi, Richi.

  /* Keep track of the VF for each mode.  Initialize all to 0 which indicates
     a mode has not been analyzed.  */
  auto_vec<poly_uint64, 8> cached_vf_per_mode;
  for (unsigned i = 0; i < vector_modes.length (); ++i)
    cached_vf_per_mode.safe_push (0);

I saw this code:
'cached_vf_per_mode' is allocated with size 8.

But for RVV, I will need to push the following modes:

RVVM8QI, RVVM4QI, RVVM2QI, RVVM1QI, V128QI, V64QI, V32QI, V16QI, V8QI, V4QI, 
V2QI

There are 11 modes.
Should I increase the number from 8 to 11?

Thanks.


juzhe.zh...@rivai.ai
 
From: Richard Biener
Date: 2023-08-31 19:29
To: juzhe.zh...@rivai.ai
CC: gcc; richard.sandiford
Subject: Re: Re: Question about dynamic choosing vectorization factor for RVV
On Thu, 31 Aug 2023, juzhe.zh...@rivai.ai wrote:
 
> Hi. Thanks Richard and Richi.
> 
> I have now figured out how to choose a smaller LMUL.
> 
> void
> costs::finish_cost (const vector_costs *scalar_costs)
> {
>   loop_vec_info loop_vinfo = dyn_cast<loop_vec_info> (m_vinfo);
>   if (loop_vinfo)
>     {
>       if (loop_vinfo->vector_mode == RVVM8SImode
>           || riscv_v_ext_vls_mode_p (loop_vinfo->vector_mode))
>         {
>           m_costs[vect_prologue] = 8;
>           m_costs[vect_body] = 8;
>           m_costs[vect_epilogue] = 8;
>         }
>       else
>         {
>           m_costs[vect_prologue] = 1;
>           m_costs[vect_body] = 1;
>           m_costs[vect_epilogue] = 1;
>         }
>     }
>   // m_suggested_unroll_factor = 2;
>   vector_costs::finish_cost (scalar_costs);
> }
 
I don't think that's "good" use of the API.
 
> The earlier odd code was caused by the VLS modes.
> 
> Now I can get LMUL = 4 by adjusting the cost:
> vsetvli a5,a2,e32,m4,ta,ma
> vle32.v v8,0(a0)
> vle32.v v4,0(a1)
> vsetvli a6,zero,e32,m4,ta,ma
> slli a3,a5,2
> vadd.vv v4,v4,v8
> sub a2,a2,a5
> vsetvli zero,a5,e32,m4,ta,ma
> vse32.v v4,0(a4)
> add a0,a0,a3
> add a1,a1,a3
> add a4,a4,a3
> bne a2,zero,.L3
> 
> Fantastic architecture of GCC Vector Cost model!
> 
> Thanks a lot.
> 
> 
> juzhe.zh...@rivai.ai
>  
> From: Richard Biener
> Date: 2023-08-31 19:20
> To: juzhe.zh...@rivai.ai
> CC: gcc; richard.sandiford
> Subject: Re: Re: Question about dynamic choosing vectorization factor for RVV
> On Thu, 31 Aug 2023, juzhe.zh...@rivai.ai wrote:
>  
> > Thanks Richi.
> > 
> > I am trying to figure out how to adjust finish_cost to lower the LMUL
> > 
> > For example:
> > 
> > void
> > foo (int32_t *__restrict a, int32_t *__restrict b, int n)
> > {
> >   for (int i = 0; i < n; i++)
> >     a[i] = a[i] + b[i];
> > }
> > 
> > preferred_simd_mode picks LMUL = 8 (RVVM8SImode).
> > 
> > Is it possible to adjust the cost in finish_cost so that the loop
> > vectorizer picks LMUL = 4?
>  
> I see you have an autovectorize_vector_modes hook and you use
> VECT_COMPARE_COSTS.  So the appropriate place would be to
> amend your vector_costs::better_main_loop_than_p.
>  
> > I am experimenting with the following cost:
> > 
> >   if (loop_vinfo)
> >     {
> >       if (loop_vinfo->vector_mode == RVVM8SImode)
> >         {
> >           m_costs[vect_prologue] = 2;
> >           m_costs[vect_body] = 20;
> >           m_costs[vect_epilogue] = 2;
> >         }
> >       else
> >         {
> >           m_costs[vect_prologue] = 1;
> >           m_costs[vect_body] = 1;
> >           m_costs[vect_epilogue] = 1;
> >         }
> >     }
> > 
> > I increased the cost for LMUL = 8, but the codegen is odd:
> > 
> > foo:
> > ble a2,zero,.L12
> > addiw a5,a2,-1
> > li a4,30
> > sext.w t1,a2
> > bleu a5,a4,.L7
> > srliw a7,t1,5
> > slli a7,a7,7
> > li a4,32
> > add a7,a7,a0
> > mv a5,a0
> > mv a3,a1
> > vsetvli zero,a4,e32,m8,ta,ma
> > .L4:
> > vle32.v v8,0(a5)
> > vle32.v v16,0(a3)
> > vadd.vv v8,v8,v16
> > vse32.v v8,0(a5)
> > addi a5,a5,128
> > addi a3,a3,128
> > bne a5,a7,.L4
> > andi a2,a2,-32
> > beq t1,a2,.L14
> > .L3:
> > slli a4,a2,32
> > subw a5,t1,a2
> > srli a4,a4,32
> > slli a5,a5,32
> > slli a4,a4,2
> > srli a5,a5,32
> > add a0,a0,a4
> > add a1,a1,a4
> > vsetvli a4,a5,e8,m1,ta,ma
> > vle32.v v8,0(a0)
> > vle32.v v4,0(a1)
> > vsetvli a2,zero,e32,m4,ta,ma
> > vadd.vv v4,v4,v8
> > vsetvli zero,a5,e32,m4,ta,ma
> > vse32.v v4,0(a0)
> > sub a3,a5,a4
> > beq a5,a4,.L12
> > slli a4,a4,2
> > vsetvli zero,a3,e8,m1,ta,ma
> > add a0,a0,a4
> > add a1,a1,a4
> > vle32.v v4,0(a0)
> > vle32.v v8,0(a1)
> > vsetvli a2,zero,e32,m4,ta,ma
> > vadd.vv v4,v4,v8
> > vsetvli zero,a3,e32,m4,ta,ma
> > vse32.v v4,0(a0)
> > .L12:
> > ret
> > .L7:
> > li a2,0
> > j .L3
> > .L14:
> > ret
> > 
> > I hope it can generate code like this:
> > 
> > foo:
> > ble a2,zero,.L5
> > mv a4,a0
> > .L3:
> > vsetvli a5,a2,e32,m4,ta,ma
> > vle32.v v8,0(a0)
> > vle32.v v4,0(a1)
> > vsetvli a6,zero,e32,m4,ta,ma
> > slli a3,a5,2
> > vadd.vv v4,v4,v8
> > sub a2,a2,a5
> > vsetvli zero,a5,e32,m4,ta,ma
> > vse32.v v4,0(a4)
> > add a0,a0,a3
> > add a1,a1,a3
> > add a4,a4,a3
> > bne a2,zero,.L3
> > .L5:
> > ret
> > 
> > I am experimenting whether we can adjust cost statically to make loop 
> > vectorizer

Re: Re: Question about dynamic choosing vectorization factor for RVV

2023-08-31 Thread Richard Biener via Gcc
On Thu, 31 Aug 2023, juzhe.zh...@rivai.ai wrote:

> Hi, Richi.
> 
>   /* Keep track of the VF for each mode.  Initialize all to 0 which indicates
>      a mode has not been analyzed.  */
>   auto_vec<poly_uint64, 8> cached_vf_per_mode;
>   for (unsigned i = 0; i < vector_modes.length (); ++i)
>     cached_vf_per_mode.safe_push (0);
> 
> I saw this code:
> 'cached_vf_per_mode' is allocated with size 8.
> 
> But for RVV, I will need to push the following modes:
> 
> RVVM8QI, RVVM4QI, RVVM2QI, RVVM1QI, V128QI, V64QI, V32QI, V16QI, V8QI, V4QI, 
> V2QI
> 
> There are 11 modes.
> Should I increase the number from 8 to 11?

It will just perform dynamic allocation, no need to adjust.
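
To illustrate why no adjustment is needed: the number in the `auto_vec` declaration is only the amount of storage reserved inline, and pushing more elements than that moves the data to the heap. The sketch below is a simplified model of that behaviour, not GCC's implementation; `inline_vec`, `on_heap` and `push_zero_per_mode` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Simplified model of a vector with N inline slots that falls back to
// heap storage once more than N elements have been pushed.
template <typename T, std::size_t N>
class inline_vec {
  static_assert(N > 0, "need at least one inline slot");
  T inline_buf_[N];
  std::vector<T> heap_;   // used only after the inline slots are full
  bool spilled_ = false;
  std::size_t size_ = 0;

public:
  void safe_push(const T &v) {
    if (!spilled_ && size_ == N) {
      // Inline storage is full: copy what we have to the heap.
      heap_.assign(inline_buf_, inline_buf_ + size_);
      spilled_ = true;
    }
    if (spilled_)
      heap_.push_back(v);
    else
      inline_buf_[size_] = v;
    ++size_;
  }
  std::size_t length() const { return size_; }
  bool on_heap() const { return spilled_; }
  const T &operator[](std::size_t i) const {
    return spilled_ ? heap_[i] : inline_buf_[i];
  }
};

// Pushes one zero per mode, as the quoted loop does, and returns the length.
template <std::size_t N>
std::size_t push_zero_per_mode(std::size_t n_modes, bool *spilled) {
  inline_vec<unsigned, N> cached_vf_per_mode;
  for (std::size_t i = 0; i < n_modes; ++i)
    cached_vf_per_mode.safe_push(0);
  *spilled = cached_vf_per_mode.on_heap();
  return cached_vf_per_mode.length();
}
```

In this model, pushing 11 entries into a vector with 8 inline slots simply ends up with 11 elements held on the heap.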

> Thanks.
> 
> 
> juzhe.zh...@rivai.ai
>  
> From: Richard Biener
> Date: 2023-08-31 19:29
> To: juzhe.zh...@rivai.ai
> CC: gcc; richard.sandiford
> Subject: Re: Re: Question about dynamic choosing vectorization factor for RVV
> On Thu, 31 Aug 2023, juzhe.zh...@rivai.ai wrote:
>  
> > Hi. Thanks Richard and Richi.
> > 
> > I have now figured out how to choose a smaller LMUL.
> > 
> > void
> > costs::finish_cost (const vector_costs *scalar_costs)
> > {
> >   loop_vec_info loop_vinfo = dyn_cast<loop_vec_info> (m_vinfo);
> >   if (loop_vinfo)
> >     {
> >       if (loop_vinfo->vector_mode == RVVM8SImode
> >           || riscv_v_ext_vls_mode_p (loop_vinfo->vector_mode))
> >         {
> >           m_costs[vect_prologue] = 8;
> >           m_costs[vect_body] = 8;
> >           m_costs[vect_epilogue] = 8;
> >         }
> >       else
> >         {
> >           m_costs[vect_prologue] = 1;
> >           m_costs[vect_body] = 1;
> >           m_costs[vect_epilogue] = 1;
> >         }
> >     }
> >   // m_suggested_unroll_factor = 2;
> >   vector_costs::finish_cost (scalar_costs);
> > }
>  
> I don't think that's "good" use of the API.
>  
> > The earlier odd code was caused by the VLS modes.
> > 
> > Now I can get LMUL = 4 by adjusting the cost:
> > vsetvli a5,a2,e32,m4,ta,ma
> > vle32.v v8,0(a0)
> > vle32.v v4,0(a1)
> > vsetvli a6,zero,e32,m4,ta,ma
> > slli a3,a5,2
> > vadd.vv v4,v4,v8
> > sub a2,a2,a5
> > vsetvli zero,a5,e32,m4,ta,ma
> > vse32.v v4,0(a4)
> > add a0,a0,a3
> > add a1,a1,a3
> > add a4,a4,a3
> > bne a2,zero,.L3
> > 
> > Fantastic architecture of GCC Vector Cost model!
> > 
> > Thanks a lot.
> > 
> > 
> > juzhe.zh...@rivai.ai
> >  
> > From: Richard Biener
> > Date: 2023-08-31 19:20
> > To: juzhe.zh...@rivai.ai
> > CC: gcc; richard.sandiford
> > Subject: Re: Re: Question about dynamic choosing vectorization factor for 
> > RVV
> > On Thu, 31 Aug 2023, juzhe.zh...@rivai.ai wrote:
> >  
> > > Thanks Richi.
> > > 
> > > I am trying to figure out how to adjust finish_cost to lower the LMUL
> > > 
> > > For example:
> > > 
> > > void
> > > foo (int32_t *__restrict a, int32_t *__restrict b, int n)
> > > {
> > >   for (int i = 0; i < n; i++)
> > >     a[i] = a[i] + b[i];
> > > }
> > > 
> > > preferred_simd_mode picks LMUL = 8 (RVVM8SImode).
> > > 
> > > Is it possible to adjust the cost in finish_cost so that the loop
> > > vectorizer picks LMUL = 4?
> >  
> > I see you have an autovectorize_vector_modes hook and you use
> > VECT_COMPARE_COSTS.  So the appropriate place would be to
> > amend your vector_costs::better_main_loop_than_p.
> >  
> > > I am experimenting with the following cost:
> > > 
> > >   if (loop_vinfo)
> > >     {
> > >       if (loop_vinfo->vector_mode == RVVM8SImode)
> > >         {
> > >           m_costs[vect_prologue] = 2;
> > >           m_costs[vect_body] = 20;
> > >           m_costs[vect_epilogue] = 2;
> > >         }
> > >       else
> > >         {
> > >           m_costs[vect_prologue] = 1;
> > >           m_costs[vect_body] = 1;
> > >           m_costs[vect_epilogue] = 1;
> > >         }
> > >     }
> > > 
> > > I increased the cost for LMUL = 8, but the codegen is odd:
> > > 
> > > foo:
> > > ble a2,zero,.L12
> > > addiw a5,a2,-1
> > > li a4,30
> > > sext.w t1,a2
> > > bleu a5,a4,.L7
> > > srliw a7,t1,5
> > > slli a7,a7,7
> > > li a4,32
> > > add a7,a7,a0
> > > mv a5,a0
> > > mv a3,a1
> > > vsetvli zero,a4,e32,m8,ta,ma
> > > .L4:
> > > vle32.v v8,0(a5)
> > > vle32.v v16,0(a3)
> > > vadd.vv v8,v8,v16
> > > vse32.v v8,0(a5)
> > > addi a5,a5,128
> > > addi a3,a3,128
> > > bne a5,a7,.L4
> > > andi a2,a2,-32
> > > beq t1,a2,.L14
> > > .L3:
> > > slli a4,a2,32
> > > subw a5,t1,a2
> > > srli a4,a4,32
> > > slli a5,a5,32
> > > slli a4,a4,2
> > > srli a5,a5,32
> > > add a0,a0,a4
> > > add a1,a1,a4
> > > vsetvli a4,a5,e8,m1,ta,ma
> > > vle32.v v8,0(a0)
> > > vle32.v v4,0(a1)
> > > vsetvli a2,zero,e32,m4,ta,ma
> > > vadd.vv v4,v4,v8
> > > vsetvli zero,a5,e32,m4,ta,ma
> > > vse32.v v4,0(a0)
> > > sub a3,a5,a4
> > > beq a5,a4,.L12
> > > slli a4,a4,2
> > > vsetvli zero,a3,e8,m1,ta,ma
> > > add a0,a0,a4
> > > add a1,a1,a4
> > > vle32.v v4,0(a0)
> > > vle32.v v8,0(a1)
> > > vsetvli a2,zero,e32,m4,ta,ma
> > > vadd.vv v4,v4,v8
> > > vsetvli zero,a3,e32,m4,ta,ma
> > > vse32.v v4,0(a0)
> > > .L12:
> > > ret
> > > .L7:
> > > li a2,0
> > > j .L3
> > > .L14:
> > > ret
> > > 

CLZ when CLZ_DEFINED_VALUE_AT_ZERO is false

2023-08-31 Thread Krister Walfridsson via Gcc
My translation validation tool reports some miscompilations related to the 
internal call CLZ(0) when CLZ_DEFINED_VALUE_AT_ZERO is false, but I am not 
sure I use the correct semantics...


I started by modeling CLZ(0) as undefined behavior, but that made the tool 
report a miscompilation for gcc.c-torture/execute/920501-6.c where sccp 
changes a loop to


  _36 = t_10(D) != 0;
  _35 = .CLZ (t_10(D));
  _34 = 63 - _35;
  _33 = (unsigned int) _34;
  _32 = (long long unsigned int) _33;
  _31 = _32 + 1;
  b_38 = _36 ? _31 : 1;

This may call CLZ with 0, but the value is not used, so it seems 
reasonable to allow this. I therefore changed my tool to treat CLZ(0) as 
returning an undefined value instead (by generating it as a symbolic 
value, which then makes the tool test that the IR is valid for all 
values).


The tool still rejects this optimization because the calculation of _34 may 
overflow... :(  This could be a bug in the sccp pass (which could have 
done the calculation as unsigned), but fixing this would still make the 
tool report a miscompilation, as the ranges calculated during the dom3 pass 
claim that _35 has the range:

  _35  : [irange] int [0, 63]

So what is the correct semantics for CLZ when CLZ_DEFINED_VALUE_AT_ZERO is 
false?
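
The difference between the readings discussed above can be made concrete with a small model of the GIMPLE sequence. This is only a sketch: `model_b` is a hypothetical helper, `clz_at_zero` stands for whatever value `.CLZ (0)` happens to produce, the arithmetic for `_34` is done as unsigned (the variant mentioned above for sccp), and `__builtin_clzll` is the GCC/Clang builtin, assumed here to operate on a 64-bit `unsigned long long`.

```cpp
#include <cassert>
#include <climits>

static_assert(sizeof(unsigned long long) * CHAR_BIT == 64,
              "the model assumes a 64-bit unsigned long long");

// Models b_38 from the GIMPLE above.  For t == 0 the CLZ result is taken
// from clz_at_zero instead of calling the builtin, whose result is
// undefined for a zero argument.
unsigned long long model_b(unsigned long long t, int clz_at_zero) {
  bool v36 = t != 0;                                        // _36
  int v35 = v36 ? __builtin_clzll(t) : clz_at_zero;         // _35
  unsigned int v33 = 63u - static_cast<unsigned int>(v35);  // _34, _33 (unsigned)
  unsigned long long v32 = v33;                             // _32
  unsigned long long v31 = v32 + 1;                         // _31
  return v36 ? v31 : 1;                                     // b_38
}
```

In this model the result for `t == 0` is 1 for every `clz_at_zero`, which corresponds to the "undefined value that is not used" reading. With signed arithmetic for `_34`, a value such as `INT_MIN` would overflow, and any value outside [0, 63] would contradict the range recorded for `_35`.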


   /Krister


Confusing location of error in source code

2023-08-31 Thread Alejandro Colomar via Gcc
Hi!

I've been confused for some time with a compilation error that
pointed to a slightly-off location.  I wasn't seeing that I used
a temporary variable in a constant expression.  The code could be
reduced to:

$ cat const.c 
int
main(void)
{
        int x = 42;

        _Static_assert(0 || 7 > x, "");
}
$ cc -Wall -Wextra const.c 
const.c: In function ‘main’:
const.c:6:26: error: expression in static assertion is not constant
6 | _Static_assert(0 || 7 > x, "");
  |~~^~~~


I think the error report should instead point to this location:


6 | _Static_assert(0 || 7 > x, "");
  |~^
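
For comparison, a sketch of a variant that compiles, written in C++ (`static_assert` with a `constexpr` variable); the name `checked_value` and the value 5 are made up for illustration. The assertion is accepted once `x` is a constant expression. In C the operand would likewise have to be an integer constant expression, for example an enumeration constant.

```cpp
#include <cassert>

int checked_value() {
  constexpr int x = 5;  // a constant expression, unlike a plain `int x`
  static_assert(0 || 7 > x, "");
  return x;
}
```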

Cheers,
Alex

-- 

GPG key fingerprint: A9348594CE31283A826FBDD8D57633D441E25BB5




Re: [PATCH] analyzer: implement reference count checking for CPython plugin [PR107646]

2023-08-31 Thread David Malcolm via Gcc
On Wed, 2023-08-30 at 18:15 -0400, Eric Feng wrote:
> On Tue, Aug 29, 2023 at 5:14 PM David Malcolm 
> wrote:
> > 
> > On Tue, 2023-08-29 at 13:28 -0400, Eric Feng wrote:
> > > Additionally, by using the old model and the pointer per your
> > > suggestion,
> > > we are able to find the representative tree and emit a more
> > > accurate
> > > diagnostic!
> > > 
> > > rc3.c:23:10: warning: expected ‘item’ to have reference count:
> > > ‘1’
> > > but ob_refcnt field is: ‘2’
> > >    23 |   return list;
> > >   |  ^~~~
> > >   ‘create_py_object’: events 1-4
> > >     |
> > >     |    4 |   PyObject* item = PyLong_FromLong(3);
> > >     |  |    ^~
> > >     |  |    |
> > >     |  |    (1) when ‘PyLong_FromLong’
> > > succeeds
> > >     |    5 |   PyObject* list = PyList_New(1);
> > >     |  |    ~
> > >     |  |    |
> > >     |  |    (2) when ‘PyList_New’ succeeds
> > >     |..
> > >     |   14 |   PyList_Append(list, item);
> > >     |  |   ~
> > >     |  |   |
> > >     |  |   (3) when ‘PyList_Append’ succeeds, moving buffer
> > >     |..
> > >     |   23 |   return list;
> > >     |  |  
> > >     |  |  |
> > >     |  |  (4) here
> > >     |
> > 
> > Excellent, that's a big improvement.
> > 
> > > 
> > > If a representative tree is not found, I decided we should just
> > > bail
> > > out
> > > of emitting a diagnostic for now, to avoid confusing the user on
> > > what
> > > the problem is.
> > 
> > Fair enough.
> > 
> > > 
> > > I've attached the patch for this (on top of the previous one)
> > > below.
> > > If
> > > it also looks good, I can merge it with the last patch and push
> > > it in
> > > at
> > > the same time.
> > 
> > I don't mind either way, but please can you update the tests so
> > that we
> > have some automated test coverage that the correct name is being
> > printed in the warning.
> > 
> > Thanks
> > Dave
> > 
> 
> Sorry — forgot to hit 'reply all' in the previous e-mail. Resending
> to
> preserve our chain on the list:
> 
> ---
> 
> Thanks; pushed to trunk with nits fixed:
> https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=597b9ec69bca8acb7a3d65641c0a730de8b27ed4
> .

Thanks; looks good.

Do you want to add this to the GCC 14 part of the "History" section on
the wiki page:
  https://gcc.gnu.org/wiki/StaticAnalyzer
or should I?

> 
> Incidentally, I updated my formatting settings in VSCode, which I've
> previously mentioned in passing. In case anyone is interested:
> 
> "C_Cpp.clang_format_style": "{ BasedOnStyle: GNU, UseTab: Always,
> TabWidth: 8, IndentWidth: 2, BinPackParameters: false,
> AlignAfterOpenBracket: Align,
> AllowAllParametersOfDeclarationOnNextLine: true }",
> 
> This fixes some issues with the indent width and also ensures
> function
> parameters of appropriate length are aligned properly and on a new
> line each (like the rest of the analyzer code).

Thanks
Dave




Re: [PATCH] analyzer: implement reference count checking for CPython plugin [PR107646]

2023-08-31 Thread Eric Feng via Gcc
On Thu, Aug 31, 2023 at 1:01 PM David Malcolm  wrote:
>
> On Wed, 2023-08-30 at 18:15 -0400, Eric Feng wrote:
> > On Tue, Aug 29, 2023 at 5:14 PM David Malcolm 
> > wrote:
> > >
> > > On Tue, 2023-08-29 at 13:28 -0400, Eric Feng wrote:
> > > > Additionally, by using the old model and the pointer per your
> > > > suggestion,
> > > > we are able to find the representative tree and emit a more
> > > > accurate
> > > > diagnostic!
> > > >
> > > > rc3.c:23:10: warning: expected ‘item’ to have reference count:
> > > > ‘1’
> > > > but ob_refcnt field is: ‘2’
> > > >23 |   return list;
> > > >   |  ^~~~
> > > >   ‘create_py_object’: events 1-4
> > > > |
> > > > |4 |   PyObject* item = PyLong_FromLong(3);
> > > > |  |^~
> > > > |  ||
> > > > |  |(1) when ‘PyLong_FromLong’
> > > > succeeds
> > > > |5 |   PyObject* list = PyList_New(1);
> > > > |  |~
> > > > |  ||
> > > > |  |(2) when ‘PyList_New’ succeeds
> > > > |..
> > > > |   14 |   PyList_Append(list, item);
> > > > |  |   ~
> > > > |  |   |
> > > > |  |   (3) when ‘PyList_Append’ succeeds, moving buffer
> > > > |..
> > > > |   23 |   return list;
> > > > |  |  
> > > > |  |  |
> > > > |  |  (4) here
> > > > |
> > >
> > > Excellent, that's a big improvement.
> > >
> > > >
> > > > If a representative tree is not found, I decided we should just
> > > > bail
> > > > out
> > > > of emitting a diagnostic for now, to avoid confusing the user on
> > > > what
> > > > the problem is.
> > >
> > > Fair enough.
> > >
> > > >
> > > > I've attached the patch for this (on top of the previous one)
> > > > below.
> > > > If
> > > > it also looks good, I can merge it with the last patch and push
> > > > it in
> > > > at
> > > > the same time.
> > >
> > > I don't mind either way, but please can you update the tests so
> > > that we
> > > have some automated test coverage that the correct name is being
> > > printed in the warning.
> > >
> > > Thanks
> > > Dave
> > >
> >
> > Sorry — forgot to hit 'reply all' in the previous e-mail. Resending
> > to
> > preserve our chain on the list:
> >
> > ---
> >
> > Thanks; pushed to trunk with nits fixed:
> > https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=597b9ec69bca8acb7a3d65641c0a730de8b27ed4
> > .
>
> Thanks; looks good.
>
> Do you want to add this to the GCC 14 part of the "History" section on
> the wiki page:
>   https://gcc.gnu.org/wiki/StaticAnalyzer
> or should I?
Happy to add it myself, but I'm not finding an option to edit the page
(created an account under ef...@gcc.gnu.org). Do I need to be added to
the EditorGroup (https://gcc.gnu.org/wiki/EditorGroup) to do so?

>
> >
> > Incidentally, I updated my formatting settings in VSCode, which I've
> > previously mentioned in passing. In case anyone is interested:
> >
> > "C_Cpp.clang_format_style": "{ BasedOnStyle: GNU, UseTab: Always,
> > TabWidth: 8, IndentWidth: 2, BinPackParameters: false,
> > AlignAfterOpenBracket: Align,
> > AllowAllParametersOfDeclarationOnNextLine: true }",
> >
> > This fixes some issues with the indent width and also ensures
> > function
> > parameters of appropriate length are aligned properly and on a new
> > line each (like the rest of the analyzer code).
>
> Thanks
> Dave
>
>


Re: [PATCH] analyzer: implement reference count checking for CPython plugin [PR107646]

2023-08-31 Thread David Malcolm via Gcc
On Thu, 2023-08-31 at 15:09 -0400, Eric Feng wrote:
> On Thu, Aug 31, 2023 at 1:01 PM David Malcolm 
> wrote:
> > 
> > On Wed, 2023-08-30 at 18:15 -0400, Eric Feng wrote:

[...]

> > > 
> > > Thanks; pushed to trunk with nits fixed:
> > > https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=597b9ec69bca8acb7a3d65641c0a730de8b27ed4
> > > .
> > 
> > Thanks; looks good.
> > 
> > Do you want to add this to the GCC 14 part of the "History" section
> > on
> > the wiki page:
> >   https://gcc.gnu.org/wiki/StaticAnalyzer
> > or should I?
> Happy to add it myself, but I'm not finding an option to edit the
> page
> (created an account under ef...@gcc.gnu.org). Do I need to be added
> to
> the EditorGroup (https://gcc.gnu.org/wiki/EditorGroup) to do so?

I can do this.  What's your account's WikiName ?

Thanks
Dave



gcc-11-20230831 is now available

2023-08-31 Thread GCC Administrator via Gcc
Snapshot gcc-11-20230831 is now available on
  https://gcc.gnu.org/pub/gcc/snapshots/11-20230831/
and on various mirrors, see http://gcc.gnu.org/mirrors.html for details.

This snapshot has been generated from the GCC 11 git branch
with the following options: git://gcc.gnu.org/git/gcc.git branch 
releases/gcc-11 revision 0684267ab769f850d26a9b226806e570f1efe1af

You'll find:

 gcc-11-20230831.tar.xz   Complete GCC

  SHA256=f039343b5395207fabd6f93e03699dd941898a4aefde619a1e0936b6fffcba7f
  SHA1=1b99760f4fe1e193ac4cfe4daeaa707f17386b2e

Diffs from 11-20230824 are available in the diffs/ subdirectory.

When a particular snapshot is ready for public consumption the LATEST-11
link is updated and a message is sent to the gcc list.  Please do not use
a snapshot before it has been announced that way.


Re: [PATCH] analyzer: implement reference count checking for CPython plugin [PR107646]

2023-08-31 Thread Eric Feng via Gcc
On Thu, Aug 31, 2023 at 4:19 PM David Malcolm  wrote:
>
> On Thu, 2023-08-31 at 15:09 -0400, Eric Feng wrote:
> > On Thu, Aug 31, 2023 at 1:01 PM David Malcolm 
> > wrote:
> > >
> > > On Wed, 2023-08-30 at 18:15 -0400, Eric Feng wrote:
>
> [...]
>
> > > >
> > > > Thanks; pushed to trunk with nits fixed:
> > > > https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=597b9ec69bca8acb7a3d65641c0a730de8b27ed4
> > > > .
> > >
> > > Thanks; looks good.
> > >
> > > Do you want to add this to the GCC 14 part of the "History" section
> > > on
> > > the wiki page:
> > >   https://gcc.gnu.org/wiki/StaticAnalyzer
> > > or should I?
> > Happy to add it myself, but I'm not finding an option to edit the
> > page
> > (created an account under ef...@gcc.gnu.org). Do I need to be added
> > to
> > the EditorGroup (https://gcc.gnu.org/wiki/EditorGroup) to do so?
>
> I can do this.  What's your account's WikiName ?
Thank you — it is EricFeng.
>
> Thanks
> Dave
>


Re: [PATCH] analyzer: implement reference count checking for CPython plugin [PR107646]

2023-08-31 Thread Hans-Peter Nilsson via Gcc
(Looks like this was committed as r14-3580-g597b9ec69bca8a)

> Cc: gcc@gcc.gnu.org, gcc-patc...@gcc.gnu.org, Eric Feng 
> From: Eric Feng via Gcc 

> gcc/testsuite/ChangeLog:
>   PR analyzer/107646
>   * gcc.dg/plugin/analyzer_cpython_plugin.c: Implements reference count
>   * checking for PyObjects.
>   * gcc.dg/plugin/cpython-plugin-test-2.c: Moved to...
>   * gcc.dg/plugin/cpython-plugin-test-PyList_Append.c: ...here (and
>   * added more tests).
>   * gcc.dg/plugin/cpython-plugin-test-1.c: Moved to...
>   * gcc.dg/plugin/cpython-plugin-test-no-plugin.c: ...here (and added
>   * more tests).
>   * gcc.dg/plugin/plugin.exp: New tests.
>   * gcc.dg/plugin/cpython-plugin-test-PyList_New.c: New test.
>   * gcc.dg/plugin/cpython-plugin-test-PyLong_FromLong.c: New test.
>   * gcc.dg/plugin/cpython-plugin-test-refcnt-checking.c: New test.

It seems this was more or less a rewrite, but that said,
it's generally preferable to always *add* tests, never *modify* them.

>  .../gcc.dg/plugin/analyzer_cpython_plugin.c   | 376 +-

^^^ Ouch!  Was it not within reason to keep that test as it
was, and just add another test?

Anyway, the test after rewrite fails, and for some targets
like cris-elf and apparently m68k-linux, yields an error.
I see a PR was already opened.

Also, mostly for future reference, several files in the
patch miss a final newline, as seen by a "\ No newline at
end of file"-marker.

I think I found the problem; a mismatch in the default C++
language standard between host-gcc and target-gcc.

(It's actually *not* as simple as "auto var = typeofvar()"
not being recognized in C++11 --or else there'd be an error
for the hash_set declaration too, which I just changed for
consistency-- but it's close enough for me.)
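
One way such a standard-dependent difference can arise, shown here as a sketch with a made-up type rather than GCC's `hash_map` (whose exact behaviour is not established above): before C++17, `auto v = T ();` needs an accessible copy or move constructor even when the copy is elided, whereas a plain declaration never does. `no_copy_set` and `count_distinct` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <set>

// A container-like type that can be neither copied nor moved.
class no_copy_set {
  std::set<int> items_;

public:
  no_copy_set() = default;
  no_copy_set(const no_copy_set &) = delete;
  no_copy_set &operator=(const no_copy_set &) = delete;

  void add(int v) { items_.insert(v); }
  std::size_t size() const { return items_.size(); }
};

std::size_t count_distinct(const int *vals, std::size_t n) {
  // auto seen = no_copy_set ();  // rejected before C++17: copy constructor is deleted
  no_copy_set seen;               // accepted by every C++ standard
  for (std::size_t i = 0; i < n; ++i)
    seen.add(vals[i]);
  return seen.size();
}
```

The plain declaration is the same form the patch below switches to.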

With this, retesting plugin.exp for cris-elf works.

Ok to commit?

-- >8 --
From: Hans-Peter Nilsson 
Date: Fri, 1 Sep 2023 04:36:03 +0200
Subject: [PATCH] testsuite: Fix analyzer_cpython_plugin.c declarations, PR 
testsuite/111264

Also, add missing newline at end of file.

PR testsuite/111264
* gcc.dg/plugin/analyzer_cpython_plugin.c: Make declarations
C++11-compatible.
---
 gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c 
b/gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c
index 7af520436549..bf1982e79c37 100644
--- a/gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c
+++ b/gcc/testsuite/gcc.dg/plugin/analyzer_cpython_plugin.c
@@ -477,8 +477,8 @@ pyobj_refcnt_checker (const region_model *model,
   if (!ctxt)
 return;
 
-  auto region_to_refcnt = hash_map ();
-  auto seen_regions = hash_set ();
+  hash_map region_to_refcnt;
+  hash_set seen_regions;
 
   count_pyobj_references (model, region_to_refcnt, retval, seen_regions);
   check_refcnts (model, old_model, retval, ctxt, region_to_refcnt);
@@ -561,7 +561,7 @@ public:
 if (!ctxt)
   return;
 region_model *model = cd.get_model ();
-auto region_to_refcnt = hash_map ();
+hash_map region_to_refcnt;
 count_all_references(model, region_to_refcnt);
 dump_refcnt_info(region_to_refcnt, model, ctxt);
   }
@@ -1330,4 +1330,4 @@ plugin_init (struct plugin_name_args *plugin_info,
   sorry_no_analyzer ();
 #endif
   return 0;
-}
\ No newline at end of file
+}
-- 
2.30.2

brgds, H-P


Re: Confusing location of error in source code

2023-08-31 Thread Jonathan Wakely via Gcc
On Thu, 31 Aug 2023, 17:05 Alejandro Colomar via Gcc, 
wrote:

> Hi!
>
> I've been confused for some time with a compilation error that
> pointed to a slightly-off location.  I wasn't seeing that I used
> a temporary variable in a constant expression.  The code could be
> reduced to:
>
> $ cat const.c
> int
> main(void)
> {
> int x = 42;
>
> _Static_assert(0 || 7 > x, "");
> }
> $ cc -Wall -Wextra const.c
> const.c: In function ‘main’:
> const.c:6:26: error: expression in static assertion is not constant
> 6 | _Static_assert(0 || 7 > x, "");
>   |~~^~~~
>
>
> I think the error report should instead point to this location:
>
>
> 6 | _Static_assert(0 || 7 > x, "");
>   |~^
>

Please use bugzilla for bug reports.