Upcoming removal of legacy ranges and conversion to wide_ints.

2023-04-25 Thread Aldy Hernandez via Gcc

After GCC 13 is released we will remove legacy range support from the
compiler, and convert iranges to wide_ints.  I want to give everyone
a heads up, to help understand what's involved and what the end result is.

Legacy ranges are basically the value_range type (int_range<1>) where
the internal representation has anti-ranges and allows symbolics, etc.
It is a holdout from the old VRP, and was left in place because it was
pervasive and too risky to remove late in the GCC 13 cycle.

There will be lots of patches (about 30), mostly because legacy
touches a lot, and also because it was necessary to rip out things in
order to double check the work.  Furthermore, having small pieces
makes it easier to bisect any possible regressions.

I will give a bird's eye view of what follows with details in the
patches themselves.

First the good news.  VRP improves by 13.22%, jump threading by 11.6%
and overall compilation improves by 1.5%.  Andrew has some
improvements to the cache that should provide significant additional
speedups.

Here are the main parts of the work:

1. Converting users of the old API to the new irange API.

   There are a few holdouts, most notably the middle end warnings,
   some of which have strong dependencies on VR_ANTI_RANGE and the old
   API.  I have provided a transitional function (get_legacy_range)
   which translates an irange to whatever min/max/kind concoction they
   rely on.  I have no plans for converting these passes, as I don't
   understand the code well enough to fix it.  Naive attempts broke
   the tests, even though conceptually the changes were correct.  At
   least the damage will be limited to users of one function:
   get_legacy_range().

   The IPA passes also use the old API, but I have plans (and patches)
   for revamping all of the IPA ranges.  Details below.

2. Repurposing int_range<1> to its obvious meaning (a range with two
   endpoints).  This will reduce the memory footprint for small
   ranges.

3. Overhauling the vrange storage mechanism used for global ranges
   (SSA_NAME_RANGE_INFO) so it can be shared with the ranger cache.

   The reason is that the ranger cache used a tree range
   allocator, which didn't exactly scale to the eventual conversion to
   wide ints.  It also allows us to use one lean and efficient
   allocator for everything instead of two incomplete implementations.

   The storage remains slim (no trees, plus a trailing_wide_int like
   implementation).  The storage is also quite fast, since without
   legacy or trees, we can copy host integers arrays back and forth
   between the storage and the wide_ints.

4. Conversion of trees to wide ints in irange.

5. Various performance optimizations now possible because we have
   moved away from both legacy and trees.

A few notes.

The value_range typedef has been renamed to int_range<2>, since
int_range<1> now means two endpoints which can't represent the inverse
of a range (say not-zero) for anything but a handful of ranges.  The
(future) plan is to mechanically convert the definitions to
int_range<2> and finally rename Value_Range (the type agnostic range
class) to value_range for use in passes that must work in a variety of
different types (floats, integers, etc).
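
To make the int_range<1> versus int_range<2> point concrete, here is a
rough sketch in C.  It is a toy model only, not GCC's irange API: the
names toy_range, toy_nonzero and toy_contains are made up for this
illustration.  It shows that "not-zero" needs two sub-ranges for a
signed type but only one for an unsigned type, which is one of the
handful of cases a single pair can represent.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model (hypothetical, not GCC's irange): a range is a sorted list
   of at most MAX_PAIRS disjoint [lo, hi] sub-ranges.  */
#define MAX_PAIRS 2

struct toy_range
{
  int num_pairs;
  long lo[MAX_PAIRS];
  long hi[MAX_PAIRS];
};

/* Build the set of all values in [type_min, type_max] except zero.  */
struct toy_range
toy_nonzero (long type_min, long type_max)
{
  assert (type_min <= 0 && 0 <= type_max);
  struct toy_range r = { 0, { 0, 0 }, { 0, 0 } };
  if (type_min < 0)
    {
      /* Everything below zero.  */
      r.lo[r.num_pairs] = type_min;
      r.hi[r.num_pairs] = -1;
      r.num_pairs++;
    }
  if (type_max > 0)
    {
      /* Everything above zero.  */
      r.lo[r.num_pairs] = 1;
      r.hi[r.num_pairs] = type_max;
      r.num_pairs++;
    }
  return r;
}

/* Return true if V lies in one of the sub-ranges of R.  */
bool
toy_contains (const struct toy_range *r, long v)
{
  for (int i = 0; i < r->num_pairs; i++)
    if (r->lo[i] <= v && v <= r->hi[i])
      return true;
  return false;
}
```

With these helpers, toy_nonzero (-128, 127) yields the two pairs
[-128, -1] and [1, 127], while toy_nonzero (0, 255) yields the single
pair [1, 255].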

IPA has been rolling their own ranges forever.  They use a combination
of the legacy API along with handcrafted pairs of wide_ints (ipa_vr).
The passes must be divorced from their legacy dependency, and cleaned up a
bit.  IPA is also very tied to integers and pointers, and I see no
reason why we can't keep track of float arguments, etc.

I am sitting on a lot of additional patches to do 90% of the
conversion, but need to consult with the IPA experts on various issues
before proceeding (for instance, the lifetime of various structures).
Among these patches are generic vrange LTO streaming functions and
vrange hashing.  I think it's time we make vrange a first class
citizen for LTO/hashing and a few other things folks were doing in an
ad-hoc manner.  This should alleviate the maintenance burden on the
IPA maintainers going forward.  Note that the IPA work is a
follow-up, and only after careful consultation with the relevant
maintainers.

In addition to the IPA/LTO work described above, future work this
cycle will include converting irange to the value/mask mechanism we
use elsewhere in the compiler (CCP, etc).  With wide ints in place, this
should be relatively straightforward, plus it will give us additional
optimization opportunities in the ranger ecosystem.
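
For readers unfamiliar with the value/mask idea, the following is a
minimal sketch in C, assuming the CCP convention that a set bit in the
mask means "unknown" and a clear bit means the bit equals the one in
value.  The names toy_bits, toy_bits_from_range and toy_bits_allows are
hypothetical and do not come from GCC; the sketch only shows how known
bits can be derived from an unsigned range.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy known-bits pair (hypothetical names).  A set bit in MASK means
   the bit is unknown; where MASK is clear, the bit equals VALUE's.  */
struct toy_bits
{
  uint64_t value;
  uint64_t mask;
};

/* Derive known bits from the unsigned range [lo, hi].  Every bit above
   the highest bit in which LO and HI differ is shared by all values in
   the range, so those bits are known.  */
struct toy_bits
toy_bits_from_range (uint64_t lo, uint64_t hi)
{
  assert (lo <= hi);
  uint64_t diff = lo ^ hi;
  /* Smear the highest set bit of DIFF into all lower positions.  */
  diff |= diff >> 1;
  diff |= diff >> 2;
  diff |= diff >> 4;
  diff |= diff >> 8;
  diff |= diff >> 16;
  diff |= diff >> 32;
  struct toy_bits b = { lo & ~diff, diff };
  return b;
}

/* Return true if X is compatible with the known bits in B.  */
bool
toy_bits_allows (struct toy_bits b, uint64_t x)
{
  return (x & ~b.mask) == b.value;
}
```

For the range [16, 23] this gives value 16 and mask 7: the low three
bits are unknown and the remaining bits are known to match 16.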

Comments welcome.

Aldy and Andrew



zero length array example does not compile

2023-04-25 Thread Jonny Grant
Hello

https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html

I wondered what 'this_length' refers to in that example; it doesn't compile.

<source>: In function 'main':
<source>:13:34: error: 'this_length' undeclared (first use in this function)
   13 |   malloc (sizeof (struct line) + this_length);
      |                                  ^~~~~~~~~~~


https://godbolt.org/z/PWEcWsrKv

Is it probably the size of the struct? So that would be 4 bytes for me, as it is
just the int. That doesn't seem very useful. Maybe I am missing something.

Kind regards
Jonny


Re: zero length array example does not compile

2023-04-25 Thread Jonathan Wakely via Gcc
On Tue, 25 Apr 2023 at 13:13, Jonny Grant wrote:
>
> Hello
>
> https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
>
> I wondered what 'this_length' refers to in that example; it doesn't compile.

It's not supposed to be a complete program.

>
> <source>: In function 'main':
> <source>:13:34: error: 'this_length' undeclared (first use in this function)
>    13 |   malloc (sizeof (struct line) + this_length);
>       |                                  ^~~~~~~~~~~
>
>
> https://godbolt.org/z/PWEcWsrKv
>
> Is it probably the size of the struct? So that would be 4 bytes for me, as it is
> just the int. That doesn't seem very useful. Maybe I am missing something.

Yes, you are. Look at how it's used: malloc is called to allocate
sizeof(struct line) + this_length bytes. Why would it be the size of
the struct?

It's the number of bytes that the zero-length contents array can hold.
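
As a sketch of how the manual's fragment fits into compilable code, the
version below wraps it in a helper.  The struct and the malloc call are
taken from the manual's example; the function name make_line and its
parameters are assumptions made for this illustration, and the code
relies on the GNU C zero-length array extension.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* The struct from the GCC manual: a header followed by a zero-length
   array (a GNU C extension) that marks where the payload starts.  */
struct line
{
  int length;
  char contents[0];
};

/* Hypothetical helper: allocate a line able to hold THIS_LENGTH bytes
   and copy that many bytes from TEXT into it.  Returns NULL if the
   allocation fails; the caller frees the result with free ().  */
struct line *
make_line (const char *text, int this_length)
{
  assert (text != NULL && this_length >= 0);
  struct line *thisline = (struct line *)
    malloc (sizeof (struct line) + this_length);
  if (thisline == NULL)
    return NULL;
  thisline->length = this_length;
  memcpy (thisline->contents, text, (size_t) this_length);
  return thisline;
}
```

Here this_length is simply whatever number of bytes the caller wants
contents to hold; a call such as make_line ("hello", 5) allocates the
header plus five bytes of payload.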


Re: zero length array example does not compile

2023-04-25 Thread Jonathan Wakely via Gcc
On Tue, 25 Apr 2023 at 13:17, Jonathan Wakely wrote:
>
> On Tue, 25 Apr 2023 at 13:13, Jonny Grant wrote:
> >
> > Hello
> >
> > https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
> >
> > I wondered what 'this_length' refers to in that example; it doesn't compile.
>
> It's not supposed to be a complete program.
>
> >
> > <source>: In function 'main':
> > <source>:13:34: error: 'this_length' undeclared (first use in this function)
> >    13 |   malloc (sizeof (struct line) + this_length);
> >       |                                  ^~~~~~~~~~~
> >
> >
> > https://godbolt.org/z/PWEcWsrKv
> >
> > Is it probably the size of the struct? So that would be 4 bytes for me, as it
> > is just the int. That doesn't seem very useful. Maybe I am missing
> > something.
>
> Yes, you are. Look at how it's used: malloc is called to allocate
> sizeof(struct line) + this_length bytes. Why would it be the size of
> the struct?
>
> It's the number of bytes that the zero-length contents array can hold.

Maybe this change would help:

--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -1705,6 +1705,9 @@ struct line *thisline = (struct line *)
 thisline->length = this_length;
 @end smallexample
 
+In this example, @code{thisline->contents} is an array of @code{char} that
+can hold up to @code{thisline->length} bytes.
+
 Although the size of a zero-length array is zero, an array member of
 this kind may increase the size of the enclosing type as a result of tail
 padding.  The offset of a zero-length array member from the beginning


Re: a small C (naive) program faster with clang than with gcc

2023-04-25 Thread Andy via Gcc
I see it in godbolt
GCC compiles to:
movsx eax, BYTE PTR [rdi+2]
cmp al, 9
ja .L42
Clang:
movzx edx, byte ptr [rdi + 2]
cmp edx, 9
ja .LBB0_40


GCC sign-extends the byte; Clang zero-extends it.
A cmp on a 32-bit register is apparently faster than one on an 8-bit register.
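
To illustrate the difference between the two loads, here is a small
sketch in C.  It does not reproduce the code in naive0.c; the function
names are hypothetical.  It only models what a sign-extending load
(movsx) and a zero-extending load (movzx) produce for the same byte,
and shows that an unsigned comparison of the low 8 bits gives the same
answer either way.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Load one byte and widen it with sign extension, as movsx does.  */
int32_t
load_sign_extended (const void *p)
{
  assert (p != NULL);
  int8_t b;
  memcpy (&b, p, 1);
  return b;
}

/* Load one byte and widen it with zero extension, as movzx does.  */
uint32_t
load_zero_extended (const void *p)
{
  assert (p != NULL);
  uint8_t b;
  memcpy (&b, p, 1);
  return b;
}

/* Unsigned "above 9" test on the low 8 bits only, in the spirit of
   "cmp al, 9" followed by "ja".  */
int
above_9_low_byte (uint32_t widened)
{
  return (uint8_t) widened > 9;
}
```

For the byte 0xF0 the two loads return -16 and 240 respectively, yet
above_9_low_byte agrees on both because it only inspects the low byte.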

On Mon, 24 Apr 2023 at 17:34, Basile Starynkevitch wrote:
>
> Hello all,
>
>
> Consider the naive program (GPLv3+) to solve the cryptaddition
>
> `NEUF` + `DEUX` = `ONZE`
>
> on https://github.com/bstarynk/misc-basile/blob/master/CryptArithm/neuf%2Bdeux%3Donze/naive0.c
>   (commit 0d1bd0e)
>
>
> On Linux/x86-64 that source code compiled with gcc-12 -O3 is twice as
> slow as with clang -O3
>
> (Debian/Sid or Ubuntu/22/10)
>
> Feel free to add it to some testsuite!
>
>
> Thanks
>
>
> --
> Basile Starynkevitch
> (only mine opinions / les opinions sont miennes uniquement)
> 92340 Bourg-la-Reine, France
> web page: starynkevitch.net/Basile/ & refpersys.org


Re: zero length array example does not compile

2023-04-25 Thread Jonny Grant



On 25/04/2023 13:22, Jonathan Wakely wrote:
> On Tue, 25 Apr 2023 at 13:17, Jonathan Wakely wrote:
>>
>> On Tue, 25 Apr 2023 at 13:13, Jonny Grant wrote:
>>>
>>> Hello
>>>
>>> https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
>>>
>>> I wondered what 'this_length' refers to in that example; it doesn't compile.
>>
>> It's not supposed to be a complete program.
>>
>>>
>>> <source>: In function 'main':
>>> <source>:13:34: error: 'this_length' undeclared (first use in this function)
>>>    13 |   malloc (sizeof (struct line) + this_length);
>>>       |                                  ^~~~~~~~~~~
>>>
>>>
>>> https://godbolt.org/z/PWEcWsrKv
>>>
>>> Is it probably the size of the struct? So that would be 4 bytes for me, as it
>>> is just the int. That doesn't seem very useful. Maybe I am missing
>>> something.
>>
>> Yes, you are. Look at how it's used: malloc is called to allocate
>> sizeof(struct line) + this_length bytes. Why would it be the size of
>> the struct?
>>
>> It's the number of bytes that the zero-length contents array can hold.
> 
> Maybe this change would help:
> 
> --- a/gcc/doc/extend.texi
> +++ b/gcc/doc/extend.texi
> @@ -1705,6 +1705,9 @@ struct line *thisline = (struct line *)
> thisline->length = this_length;
> @end smallexample
> 
> +In this example, @code{thisline->contents} is an array of @code{char} that
> +can hold up to @code{thisline->length} bytes.
> +
> Although the size of a zero-length array is zero, an array member of
> this kind may increase the size of the enclosing type as a result of tail
> padding.  The offset of a zero-length array member from the beginning

That looks like an improvement.
It doesn't need to be a complete program, but it feels like a complete example is better.

Adding this to the example would help:
size_t this_length = 10; /* line has capacity for 10 char */



Re: zero length array example does not compile

2023-04-25 Thread Jonathan Wakely via Gcc
On Tue, 25 Apr 2023 at 20:21, Jonny Grant wrote:
>
>
>
> On 25/04/2023 13:22, Jonathan Wakely wrote:
> > On Tue, 25 Apr 2023 at 13:17, Jonathan Wakely wrote:
> >>
> >> On Tue, 25 Apr 2023 at 13:13, Jonny Grant wrote:
> >>>
> >>> Hello
> >>>
> >>> https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
> >>>
> >>> I wondered what 'this_length' refers to in that example; it doesn't compile.
> >>
> >> It's not supposed to be a complete program.
> >>
> >>>
> >>> <source>: In function 'main':
> >>> <source>:13:34: error: 'this_length' undeclared (first use in this function)
> >>>    13 |   malloc (sizeof (struct line) + this_length);
> >>>       |                                  ^~~~~~~~~~~
> >>>
> >>>
> >>> https://godbolt.org/z/PWEcWsrKv
> >>>
> >>> Is it probably the size of the struct? So that would be 4 bytes for me, as it
> >>> is just the int. That doesn't seem very useful. Maybe I am missing
> >>> something.
> >>
> >> Yes, you are. Look at how it's used: malloc is called to allocate
> >> sizeof(struct line) + this_length bytes. Why would it be the size of
> >> the struct?
> >>
> >> It's the number of bytes that the zero-length contents array can hold.
> >
> > Maybe this change would help:
> >
> > --- a/gcc/doc/extend.texi
> > +++ b/gcc/doc/extend.texi
> > @@ -1705,6 +1705,9 @@ struct line *thisline = (struct line *)
> > thisline->length = this_length;
> > @end smallexample
> >
> > +In this example, @code{thisline->contents} is an array of @code{char} that
> > +can hold up to @code{thisline->length} bytes.
> > +
> > Although the size of a zero-length array is zero, an array member of
> > this kind may increase the size of the enclosing type as a result of tail
> > padding.  The offset of a zero-length array member from the beginning
>
> That looks like an improvement.
> It doesn't need to be a complete program, but it feels like a complete example is
> better.
>
> Adding this to the example would help:
> size_t this_length = 10; /* line has capacity for 10 char */

That seems to prompt more questions though. Why 10 not another number?
Why size_t not the same type as the line.length member? If you have a
hardcoded 10 why not just use an array of 10 char in the struct?

So I'm not convinced your change improves it at all. The specific
value and the specific type are irrelevant when what's needed is just
some number. It isn't actually declared in the example because it's
not actually relevant to the thing being demonstrated.


Re: a small C (naive) program faster with clang than with gcc

2023-04-25 Thread LIU Hao via Gcc

On 2023/4/26 00:01, Andy via Gcc wrote:

> I see it in godbolt
> GCC compiles to:
> movsx eax, BYTE PTR [rdi+2]
> cmp al, 9
> ja .L42
> Clang:
> movzx edx, byte ptr [rdi + 2]
> cmp edx, 9
> ja .LBB0_40
>
>
> GCC sign-extends the byte; Clang zero-extends it.
> A cmp on a 32-bit register is apparently faster than one on an 8-bit register.


As for extension, it seems to make a difference only if the result is ever written back to memory.
And for comparison, it makes no difference at all whether the operand is 32-bit or 8-bit, except
when the operand is an 8-bit ?H register. [1]



[1] https://uops.info/table.html


--
Best regards,
LIU Hao





Re: a small C (naive) program faster with clang than with gcc

2023-04-25 Thread Gabriel Paubert
On Tue, Apr 25, 2023 at 06:01:22PM +0200, Andy via Gcc wrote:
> I see it in godbolt
> GCC compiles to:
> movsx eax, BYTE PTR [rdi+2]
> cmp al, 9
> ja .L42
> Clang:
> movzx edx, byte ptr [rdi + 2]
> cmp edx, 9
> ja .LBB0_40
> 
> 
> GCC sign-extends the byte; Clang zero-extends it.
> A cmp on a 32-bit register is apparently faster than one on an 8-bit register.

What happens if you compile with -funsigned-char?

There may also be some alignment issue, after all cmp al,9 is 2 bytes
while cmp edx,9 is 6.

Gabriel

> 
> On Mon, 24 Apr 2023 at 17:34, Basile Starynkevitch wrote:
> >
> > Hello all,
> >
> >
> > Consider the naive program (GPLv3+) to solve the cryptaddition
> >
> > `NEUF` + `DEUX` = `ONZE`
> >
> > on https://github.com/bstarynk/misc-basile/blob/master/CryptArithm/neuf%2Bdeux%3Donze/naive0.c
> >   (commit 0d1bd0e)
> >
> >
> > On Linux/x86-64 that source code compiled with gcc-12 -O3 is twice as
> > slow as with clang -O3
> >
> > (Debian/Sid or Ubuntu/22/10)
> >
> > Feel free to add it to some testsuite!
> >
> >
> > Thanks
> >
> >
> > --
> > Basile Starynkevitch
> > (only mine opinions / les opinions sont miennes uniquement)
> > 92340 Bourg-la-Reine, France
> > web page: starynkevitch.net/Basile/ & refpersys.org