Re: Import license issue

2020-09-21 Thread Andrew Stubbs

Ping.

On 14/09/2020 17:56, Andrew Stubbs wrote:
> Hi All,
>
> I need to update include/hsa.h to access some newer APIs. The existing
> file was created by copying from the user manual, thus side-stepping
> licensing issues, but the updated user manual omits some important
> details from the APIs I need (mostly the contents of structs and value
> of enums). Of course, I can go see those details in the source, but
> that's not the same thing.
>
> So, what I would like to do is import the header files I need into the
> GCC sources; there's precedent for importing (unmodified) copyright
> files for libffi etc., AFAICT, but of course the license needs to be
> acceptable.
>
> The relevant files are here:
>
> https://github.com/RadeonOpenCompute/ROCR-Runtime/blob/master/src/inc/hsa.h
> https://github.com/RadeonOpenCompute/ROCR-Runtime/blob/master/src/inc/hsa_ext_amd.h
> https://github.com/RadeonOpenCompute/ROCR-Runtime/blob/master/src/inc/hsa_ext_image.h
>
> When I previously enquired about this on IRC I was advised that the
> Illinois license would be unacceptable because it contains an
> attribution clause that would require all binary distributors to credit
> AMD in their documentation, which seems like a reasonable position. I've
> requested that AMD provide a copy of these specific files with a more
> acceptable license, and I may yet be successful, but it's not that simple.
>
> The problem is that GCC already has this exact same license in
> libsanitizer/LICENSE.TXT so, again reasonably, AMD want to know why that
> licence is acceptable and their license is not.
>
> Looking at the files myself, there appears to be some kind of dual
> license thing going on, and the word "Illinois" doesn't actually appear
> in any libsanitizer source file (many of which contain an Apache license
> header). Does this mean that the Illinois license is not actually active
> here? Or is it that it is active and binary distributors really should
> be obeying this attribution clause already?
>
> Can anybody help me untangle this, please?
>
> Are the files acceptable, and if not, how is this different from the
> other cases?
>
> Thanks very much
>
> Andrew




Re: LTO slows down calculix by more than 10% on aarch64

2020-09-21 Thread Prathamesh Kulkarni via Gcc
On Fri, 4 Sep 2020 at 17:08, Alexander Monakov  wrote:
>
> > I obtained perf stat results for following benchmark runs:
> >
> > -O2:
> >
> >   7856832.692380  task-clock (msec)   #    1.000 CPUs utilized
> >             3758  context-switches    #    0.000 K/sec
> >               40  cpu-migrations      #    0.000 K/sec
> >            40847  page-faults         #    0.005 K/sec
> >    7856782413676  cycles              #    1.000 GHz
> >    6034510093417  instructions        #    0.77  insn per cycle
> >     363937274287  branches            #   46.321 M/sec
> >      48557110132  branch-misses       #   13.34% of all branches
>
> (ouch, 2+ hours per run is a lot, collecting a profile over a minute should be
> enough for this kind of code)
>
> > -O2 with orthonl inlined:
> >
> >   8319643.114380  task-clock (msec)   #    1.000 CPUs utilized
> >             4285  context-switches    #    0.001 K/sec
> >               28  cpu-migrations      #    0.000 K/sec
> >            40843  page-faults         #    0.005 K/sec
> >    8319591038295  cycles              #    1.000 GHz
> >    6276338800377  instructions        #    0.75  insn per cycle
> >     467400726106  branches            #   56.180 M/sec
> >      45986364011  branch-misses       #    9.84% of all branches
>
> So +100e9 branches, but +240e9 instructions and +480e9 cycles, probably
> implying that extra instructions are appearing in this loop nest, but not
> in the innermost loop. As a reminder for others, the innermost loop has
> only 3 iterations.
>
> > -O2 with orthonl inlined and PRE disabled (this removes the extra branches):
> >
> >   8207331.088040  task-clock (msec)   #    1.000 CPUs utilized
> >             2266  context-switches    #    0.000 K/sec
> >               32  cpu-migrations      #    0.000 K/sec
> >            40846  page-faults         #    0.005 K/sec
> >    8207292032467  cycles              #    1.000 GHz
> >    6035724436440  instructions        #    0.74  insn per cycle
> >     364415440156  branches            #   44.401 M/sec
> >      53138327276  branch-misses       #   14.58% of all branches
>
> This seems to match baseline in terms of instruction count, but without PRE
> the loop nest may be carrying some dependencies over memory. I would simply
> check the assembly for the entire 6-level loop nest in question, I hope it's
> not very complicated (though Fortran array addressing...).
>
> > -O2 with orthonl inlined and hoisting disabled:
> >
> >   7797265.206850  task-clock (msec)   #    1.000 CPUs utilized
> >             3139  context-switches    #    0.000 K/sec
> >               20  cpu-migrations      #    0.000 K/sec
> >            40846  page-faults         #    0.005 K/sec
> >    7797221351467  cycles              #    1.000 GHz
> >    6187348757324  instructions        #    0.79  insn per cycle
> >     461840800061  branches            #   59.231 M/sec
> >      26920311761  branch-misses       #    5.83% of all branches
>
> There's a 20e9 reduction in branch misses and a 500e9 reduction in cycle
> count. I don't think the former fully covers the latter (there's also a
> 90e9 reduction in insn count).
>
> Given that the inner loop iterates only 3 times, my main suggestion is to
> consider how the profile for the entire loop nest looks like (it's 6 loops
> deep, each iterating only 3 times).
>
> > Perf profiles for
> > -O2 -fno-code-hoisting and inlined orthonl:
> > https://people.linaro.org/~prathamesh.kulkarni/perf_O2_inline.data
> >
> >       3196866 │ 1f04:   ldur   d1, [x1, #-248]
> >  216348301800 │         add    w0, w0, #0x1
> >        985098 │         add    x2, x2, #0x18
> >  216215999206 │         add    x1, x1, #0x48
> >  215630376504 │         fmul   d1, d5, d1
> >  863829148015 │         fmul   d1, d1, d6
> >  864228353526 │         fmul   d0, d1, d0
> >  864568163014 │         fmadd  d2, d0, d16, d2
> >               │         cmp    w0, #0x4
> >  216125427594 │       ↓ b.eq   1f34
> >      15010377 │         ldur   d0, [x2, #-8]
> >  143753737468 │       ↑ b      1f04
> >
> > -O2 with inlined orthonl:
> > https://people.linaro.org/~prathamesh.kulkarni/perf_O2_inline.data
> >
> >  359871503840 │ 1ef8:   ldur   d15, [x1, #-248]
> >  144055883055 │         add    w0, w0, #0x1
> >   72262104254 │         add    x2, x2, #0x18
> >  143991169721 │         add    x1, x1

Re: desired behavior or missing warning?

2020-09-21 Thread Florian Weimer via Gcc
* Ulrich Drepper via Gcc:

> I found myself with code similar to this:
>
> struct base {
>   virtual void cb() = 0;
> };
>
> struct deriv final : public base {
>   void cb() final override { }
> };
>
>
> The question is about the second use of 'final'.  Because the entire
> class is declared final, should the individual function's annotation be
> flagged with a warning?  I personally think it should because it might
> distract from the final of the class itself.

It is not always redundant.  This is not expected to compile:

struct base {
  void cb();
};

struct deriv final : public base {
  void cb() final { }
};

I don't know why the standard requires this check for a virtual function
definition.  Knowing that would help to decide whether the new warning
makes sense.

Thanks,
Florian
-- 
Red Hat GmbH, https://de.redhat.com/ , Registered seat: Grasbrunn,
Commercial register: Amtsgericht Muenchen, HRB 153243,
Managing Directors: Charles Cachera, Brian Klemm, Laurie Krebs, Michael O'Neill



Re: Expose 'array_length()' macro in or

2020-09-21 Thread Alejandro Colomar via Gcc

[[
CC += libc-co...@sourceware.org
CC += gcc@gcc.gnu.org
CC += libstd...@gcc.gnu.org
]]

Hi Florian,

On 2020-09-21 10:38, Florian Weimer wrote:
> * Alejandro Colomar via Libc-alpha:
>
>> I'd like to propose exposing the macro 'array_length()' as defined in
>> 'include/array_length.h' to the user.
>
> It would need a good C++ port, probably one for C++98 and another one
> for C++14 or later.

For C++, I use the following definition:


#include 
#include 
#include 


#define is_array__(a)   (std::is_array <__typeof__(a)>::value)

#define must_be_array(arr)                                      \
        static_assert(is_array__(arr), "Must be an array !")


#define array_length(arr)       (                               \
{                                                               \
        must_be_array(arr);                                     \
        __arraycount((arr));                                    \
}                                                               \
)


This solves the problem about G++ not having
__builtin_types_compatible_p().

However, there are a few problems:

1) This doesn't work for VLAs (GNU extension).
   I couldn't find a way to do it.  Maybe I should file a bug in GCC.

2) Also, this requires C++11; I don't know how to do it for older C++.
   Again, support from the compiler would be great.

3) The macro can't be used in the same places as the C version,
   because of the `({})`.
   The `0 * sizeof(struct{...})` trick doesn't work in C++ due to:
error: types may not be defined in 'sizeof' expressions

>
>> Libbsd provides '__arraycount()' in  and some BSDs provide
>> 'nitems()' in , so any of those 2 headers may be a good
>> place to do it.
>
> In this case, I would prefer nitems in , given that there
> is precedent for it.   seems to be a bit drastic for a new
> macro with such a common name; it would create widespread build
> breakage.

Ok.  I guess you would use 'nitems()' name, right?

>
> Maybe also ask on the libc-coord list.

Ok.  Added CCs.

>
> Thanks,
> Florian
>

Thanks,

Alex


Re: Expose 'array_length()' macro in or

2020-09-21 Thread Florian Weimer via Gcc
* Alejandro Colomar:

> [[
> CC += libc-co...@sourceware.org
> CC += gcc@gcc.gnu.org
> CC += libstd...@gcc.gnu.org
> ]]
>
> Hi Florian,
>
> On 2020-09-21 10:38, Florian Weimer wrote:
>> * Alejandro Colomar via Libc-alpha:
>>
>>> I'd like to propose exposing the macro 'array_length()' as defined in
>>> 'include/array_length.h' to the user.
>>
>> It would need a good C++ port, probably one for C++98 and another one
>> for C++14 or later.
>
> For C++, I use the following definition:
>
>
>   #include 
>   #include 
>   #include 
>
>
>   #define is_array__(a)   (std::is_array <__typeof__(a)>::value)

Should be decltype.

> However, there are a few problems:
>
> 1) This doesn't work for VLAs (GNU extension).
>    I couldn't find a way to do it.  Maybe I should file a bug in GCC.

I do not think VLA support is critical.  C++ programmers will be used to
limited support in utility functions.

> 2) Also, this requires C++11; I don't know how to do it for older C++.
>    Again, support from the compiler would be great.

I think limited C++98 support is possible using a function template,
where the array length N is a template parameter.  To enable use in
constant expressions, you can return a type of char[N], and the macro
wrapper should then apply sizeof to the function result.
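
A minimal sketch of the C++98 approach described above. The names
`array_length_helper` and `ARRAY_LENGTH` are hypothetical, chosen for
illustration only; they are not existing glibc or libstdc++ identifiers.
The helper is declared but never defined, which is sufficient because
`sizeof` does not evaluate its operand.

```cpp
#include <cassert>
#include <cstddef>

// Declared but never defined: sizeof does not evaluate its operand, so only
// the return type matters. The function returns a reference to char[N],
// whose size in bytes equals the element count N.
template <typename T, std::size_t N>
char (&array_length_helper(T (&)[N]))[N];

// Hypothetical wrapper macro; the result is an integral constant expression.
#define ARRAY_LENGTH(arr) (sizeof(array_length_helper(arr)))

static int values[7];

// An array bound at namespace scope must be a constant expression, so this
// declaration only compiles because ARRAY_LENGTH(values) is one.
static char mirror[ARRAY_LENGTH(values)];

// Runtime check of the same facts.
inline void check_array_length()
{
    assert(ARRAY_LENGTH(values) == 7);
    assert(sizeof(mirror) == 7);
}
```

Passing a pointer instead of an array fails template argument deduction,
so misuse is rejected at compile time.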

> 3) The macro can't be used in the same places as the C version,
>    because of the `({})`.
>    The `0 * sizeof(struct{...})` trick doesn't work in C++ due to:
>       error: types may not be defined in 'sizeof' expressions

For C++11, you can use a constexpr function instead of a macro.

array_length should not be a macro in current C++ modes, so that we
retain compatibility if a future C++ standard adds array_length (or
nitems) on its own.  This is not a concern for legacy C++98 mode.

>> Maybe also ask on the libc-coord list.
>
> Ok.  Added CCs.

libc-coord is not hosted on sourceware:

  

The discussion here veered off into C++ territory anyway.

Thanks,
Florian
-- 
Red Hat GmbH, https://de.redhat.com/ , Registered seat: Grasbrunn,
Commercial register: Amtsgericht Muenchen, HRB 153243,
Managing Directors: Charles Cachera, Brian Klemm, Laurie Krebs, Michael O'Neill



Re: Import license issue

2020-09-21 Thread Richard Biener via Gcc
On Mon, Sep 21, 2020 at 10:55 AM Andrew Stubbs  wrote:
>
> Ping.

Sorry, but you won't get any help resolving license issues from the
mailing list.
Instead you should eventually ask the SC to "resolve" this issue with the FSF.

Richard.

> On 14/09/2020 17:56, Andrew Stubbs wrote:
> > Hi All,
> >
> > I need to update include/hsa.h to access some newer APIs. The existing
> > file was created by copying from the user manual, thus side-stepping
> > licensing issues, but the updated user manual omits some important
> > details from the APIs I need (mostly the contents of structs and value
> > of enums). Of course, I can go see those details in the source, but
> > that's not the same thing.
> >
> > So, what I would like to do is import the header files I need into the
> > GCC sources; there's precedent for importing (unmodified) copyright
> > files for libffi etc., AFAICT, but of course the license needs to be
> > acceptable.
> >
> > The relevant files are here:
> >
> > https://github.com/RadeonOpenCompute/ROCR-Runtime/blob/master/src/inc/hsa.h
> > https://github.com/RadeonOpenCompute/ROCR-Runtime/blob/master/src/inc/hsa_ext_amd.h
> >
> > https://github.com/RadeonOpenCompute/ROCR-Runtime/blob/master/src/inc/hsa_ext_image.h
> >
> >
> > When I previously enquired about this on IRC I was advised that the
> > Illinois license would be unacceptable because it contains an
> > attribution clause that would require all binary distributors to credit
> > AMD in their documentation, which seems like a reasonable position. I've
> > requested that AMD provide a copy of these specific files with a more
> > acceptable license, and I may yet be successful, but it's not that simple.
> >
> > The problem is that GCC already has this exact same license in
> > libsanitizer/LICENSE.TXT so, again reasonably, AMD want to know why that
> > licence is acceptable and their license is not.
> >
> > Looking at the files myself, there appears to be some kind of dual
> > license thing going on, and the word "Illinois" doesn't actually appear
> > in any libsanitizer source file (many of which contain an Apache license
> > header). Does this mean that the Illinois license is not actually active
> > here? Or is it that it is active and binary distributors really should
> > be obeying this attribution clause already?
> >
> > Can anybody help me untangle this, please?
> >
> > Are the files acceptable, and if not, how is this different from the
> > other cases?
> >
> > Thanks very much
> >
> > Andrew
>


Re: Import license issue

2020-09-21 Thread Andrew Stubbs

On 21/09/2020 12:31, Richard Biener wrote:
> On Mon, Sep 21, 2020 at 10:55 AM Andrew Stubbs  wrote:
> >
> > Ping.
>
> Sorry, but you won't get any help resolving license issues from the
> mailing list.
> Instead you should eventually ask the SC to "resolve" this issue with the FSF.


Agreed, I don't really expect legal advice, but I am hoping somebody on 
the list has some historical details that might help me.


Thanks

Andrew


Re: LTO slows down calculix by more than 10% on aarch64

2020-09-21 Thread Prathamesh Kulkarni via Gcc
On Mon, 21 Sep 2020 at 15:19, Prathamesh Kulkarni
 wrote:
>
> On Fri, 4 Sep 2020 at 17:08, Alexander Monakov  wrote:
> >
> > > I obtained perf stat results for following benchmark runs:
> > >
> > > -O2:
> > >
> > >   7856832.692380  task-clock (msec)   #    1.000 CPUs utilized
> > >             3758  context-switches    #    0.000 K/sec
> > >               40  cpu-migrations      #    0.000 K/sec
> > >            40847  page-faults         #    0.005 K/sec
> > >    7856782413676  cycles              #    1.000 GHz
> > >    6034510093417  instructions        #    0.77  insn per cycle
> > >     363937274287  branches            #   46.321 M/sec
> > >      48557110132  branch-misses       #   13.34% of all branches
> >
> > (ouch, 2+ hours per run is a lot, collecting a profile over a minute
> > should be enough for this kind of code)
> >
> > > -O2 with orthonl inlined:
> > >
> > >   8319643.114380  task-clock (msec)   #    1.000 CPUs utilized
> > >             4285  context-switches    #    0.001 K/sec
> > >               28  cpu-migrations      #    0.000 K/sec
> > >            40843  page-faults         #    0.005 K/sec
> > >    8319591038295  cycles              #    1.000 GHz
> > >    6276338800377  instructions        #    0.75  insn per cycle
> > >     467400726106  branches            #   56.180 M/sec
> > >      45986364011  branch-misses       #    9.84% of all branches
> >
> > So +100e9 branches, but +240e9 instructions and +480e9 cycles, probably
> > implying that extra instructions are appearing in this loop nest, but not
> > in the innermost loop. As a reminder for others, the innermost loop has
> > only 3 iterations.
> >
> > > -O2 with orthonl inlined and PRE disabled (this removes the extra
> > > branches):
> > >
> > >   8207331.088040  task-clock (msec)   #    1.000 CPUs utilized
> > >             2266  context-switches    #    0.000 K/sec
> > >               32  cpu-migrations      #    0.000 K/sec
> > >            40846  page-faults         #    0.005 K/sec
> > >    8207292032467  cycles              #    1.000 GHz
> > >    6035724436440  instructions        #    0.74  insn per cycle
> > >     364415440156  branches            #   44.401 M/sec
> > >      53138327276  branch-misses       #   14.58% of all branches
> >
> > This seems to match baseline in terms of instruction count, but without PRE
> > the loop nest may be carrying some dependencies over memory. I would simply
> > check the assembly for the entire 6-level loop nest in question, I hope it's
> > not very complicated (though Fortran array addressing...).
> >
> > > -O2 with orthonl inlined and hoisting disabled:
> > >
> > >   7797265.206850  task-clock (msec)   #    1.000 CPUs utilized
> > >             3139  context-switches    #    0.000 K/sec
> > >               20  cpu-migrations      #    0.000 K/sec
> > >            40846  page-faults         #    0.005 K/sec
> > >    7797221351467  cycles              #    1.000 GHz
> > >    6187348757324  instructions        #    0.79  insn per cycle
> > >     461840800061  branches            #   59.231 M/sec
> > >      26920311761  branch-misses       #    5.83% of all branches
> >
> > There's a 20e9 reduction in branch misses and a 500e9 reduction in cycle
> > count. I don't think the former fully covers the latter (there's also a
> > 90e9 reduction in insn count).
> >
> > Given that the inner loop iterates only 3 times, my main suggestion is to
> > consider how the profile for the entire loop nest looks like (it's 6 loops
> > deep, each iterating only 3 times).
> >
> > > Perf profiles for
> > > -O2 -fno-code-hoisting and inlined orthonl:
> > > https://people.linaro.org/~prathamesh.kulkarni/perf_O2_inline.data
> > >
> > >       3196866 │ 1f04:   ldur   d1, [x1, #-248]
> > >  216348301800 │         add    w0, w0, #0x1
> > >        985098 │         add    x2, x2, #0x18
> > >  216215999206 │         add    x1, x1, #0x48
> > >  215630376504 │         fmul   d1, d5, d1
> > >  863829148015 │         fmul   d1, d1, d6
> > >  864228353526 │         fmul   d0, d1, d0
> > >  864568163014 │         fmadd  d2, d0, d16, d2
> > >               │         cmp    w0, #0x4
> > >  216125427594 │       ↓ b.eq   1f34
> > >      15010377 │         ldur   d0, [x2, #-8]
> > >  143753737468 │       ↑ b      1f04
> > >
> > > -O2 with

Re: Expose 'array_length()' macro in or

2020-09-21 Thread Alejandro Colomar via Gcc

[[ CC += libc-coord at lists.openwall.com ]]

On 2020-09-21 12:33, Florian Weimer wrote:
> * Alejandro Colomar:
>
>> [[
>> CC += libc-coord at sourceware.org
>> CC += gcc at gcc.gnu.org
>> CC += libstdc++ at gcc.gnu.org
>> ]]
>>
>> Hi Florian,
>>
>> On 2020-09-21 10:38, Florian Weimer wrote:
>>> * Alejandro Colomar via Libc-alpha:
>>>
>>>> I'd like to propose exposing the macro 'array_length()' as defined in
>>>> 'include/array_length.h' to the user.
>>>
>>> It would need a good C++ port, probably one for C++98 and another one
>>> for C++14 or later.
>>
>> For C++, I use the following definition:
>>
>>
>>  #include 
>>  #include 
>>  #include 
>>
>>
>>  #define is_array__(a)   (std::is_array <__typeof__(a)>::value)
>
> Should be decltype.

Thanks.

>
>> However, there are a few problems:
>>
>> 1) This doesn't work for VLAs (GNU extension).
>>    I couldn't find a way to do it.  Maybe I should file a bug in GCC.
>
> I do not think VLA support is critical.  C++ programmers will be used to
> limited support in utility functions.
>
>> 2) Also, this requires C++11; I don't know how to do it for older C++.
>>    Again, support from the compiler would be great.
>
> I think limited C++98 support is possible using a function template,
> where the array length N is a template parameter.  To enable use in
> constant expressions, you can return a type of char[N], and the macro
> wrapper should then apply sizeof to the function result.

Sorry, I don't know much C++, and I don't know how to do this.

>
>> 3) The macro can't be used in the same places as the C version,
>>    because of the `({})`.
>>    The `0 * sizeof(struct{...})` trick doesn't work in C++ due to:
>>       error: types may not be defined in 'sizeof' expressions
>
> For C++11, you can use a constexpr function instead of a macro.
>
> array_length should not be a macro in current C++ modes, so that we
> retain compatibility if a future C++ standard adds array_length (or
> nitems) on its own.  This is not a concern for legacy C++98 mode.

See above.

>
>>> Maybe also ask on the libc-coord list.
>>
>> Ok.  Added CCs.
>
> libc-coord is not hosted on sourceware:
>
>
>
> The discussion here veered off into C++ territory anyway.

I added the correct list now.

>
> Thanks,
> Florian



Thanks,

Alex.


Re: Expose 'array_length()' macro in or

2020-09-21 Thread Jonathan Wakely via Gcc

On 21/09/20 12:33 +0200, Florian Weimer via Libstdc++ wrote:

> * Alejandro Colomar:
>
>> [[
>> CC += libc-co...@sourceware.org
>> CC += gcc@gcc.gnu.org
>> CC += libstd...@gcc.gnu.org
>> ]]
>>
>> Hi Florian,
>>
>> On 2020-09-21 10:38, Florian Weimer wrote:
>>> * Alejandro Colomar via Libc-alpha:
>>>
>>>> I'd like to propose exposing the macro 'array_length()' as defined in
>>>> 'include/array_length.h' to the user.
>>>
>>> It would need a good C++ port, probably one for C++98 and another one
>>> for C++14 or later.
>>
>> For C++, I use the following definition:
>>
>>
>>   #include 
>>   #include 
>>   #include 
>>
>>
>>   #define is_array__(a)   (std::is_array <__typeof__(a)>::value)
>
> Should be decltype.


And it's wrong for references to arrays, it should be
is_array::type>::value.
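
The expression above lost its template arguments in the archive. As an
illustration only (not necessarily the exact expression intended), one way
to make the check work for references to arrays, assuming C++11 and
`std::remove_reference`, is sketched below. The macro name `IS_ARRAY_EXPR`
and the variables are hypothetical.

```cpp
#include <cassert>
#include <type_traits>

static int arr[5];
static int (&ref)[5] = arr;   // reference to an array
static int *ptr = arr;        // plain pointer, must be rejected

// decltype of a plain array variable is the array type itself.
static_assert(std::is_array<decltype(arr)>::value, "arr is an array");

// decltype of a reference variable keeps the reference, so is_array is false.
static_assert(!std::is_array<decltype(ref)>::value,
              "a reference type is not an array type");

// Hypothetical helper: strip the reference first, then test for an array.
#define IS_ARRAY_EXPR(a) \
    (std::is_array<std::remove_reference<decltype(a)>::type>::value)

static_assert(IS_ARRAY_EXPR(ref), "referenced type is an array");
static_assert(!IS_ARRAY_EXPR(ptr), "a pointer is not an array");

// Runtime check of the same facts.
inline void check_is_array()
{
    assert(IS_ARRAY_EXPR(arr));
    assert(IS_ARRAY_EXPR(ref));
    assert(!IS_ARRAY_EXPR(ptr));
}
```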


>> However, there are a few problems:
>>
>> 1) This doesn't work for VLAs (GNU extension).
>>    I couldn't find a way to do it.  Maybe I should file a bug in GCC.
>
> I do not think VLA support is critical.  C++ programmers will be used to
> limited support in utility functions.
>
>> 2) Also, this requires C++11; I don't know how to do it for older C++.
>>    Again, support from the compiler would be great.
>
> I think limited C++98 support is possible using a function template,
> where the array length N is a template parameter.  To enable use in
> constant expressions, you can return a type of char[N], and the macro
> wrapper should then apply sizeof to the function result.


Right, it's trivial to write in any version of C++:

template
#if __cplusplus >= 201103L
constexpr
#endif
inline std::size_t
array_length(const T(&)[N])
#if __cplusplus >= 201103L
noexcept
#endif
{ return N; }


>> 3) The macro can't be used in the same places as the C version,
>>    because of the `({})`.
>>    The `0 * sizeof(struct{...})` trick doesn't work in C++ due to:
>>       error: types may not be defined in 'sizeof' expressions
>
> For C++11, you can use a constexpr function instead of a macro.
>
> array_length should not be a macro in current C++ modes, so that we
> retain compatibility if a future C++ standard adds array_length (or
> nitems) on its own.  This is not a concern for legacy C++98 mode.


A macro is 100% unacceptable for C++.

This function already exists anyway, see std::size:
https://en.cppreference.com/w/cpp/iterator/size
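
A short sketch of how `std::size` covers the array case, assuming a C++17
compiler. The array `primes` and the function `count_primes` are made up
for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>   // std::size (C++17)

static const int primes[] = {2, 3, 5, 7, 11};

// std::size is constexpr, so the result is usable in constant expressions.
static_assert(std::size(primes) == 5, "five elements");

// Returns the element count of the array above.
inline std::size_t count_primes()
{
    return std::size(primes);
}

// Runtime check of the same fact.
inline void check_count_primes()
{
    assert(count_primes() == 5);
}
```

Calling `std::size` on a pointer does not compile, since no overload
accepts it.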





Re: [libc-coord] Re: Expose 'array_length()' macro in or

2020-09-21 Thread enh via Gcc
Why would C++ programmers need this given
https://en.cppreference.com/w/cpp/iterator/size ?

On Mon, Sep 21, 2020, 05:54 Alejandro Colomar 
wrote:

> [[ CC += libc-coord at lists.openwall.com ]]
>
> On 2020-09-21 12:33, Florian Weimer wrote:
> > * Alejandro Colomar:
> >
> >> [[
> >> CC += libc-coord at sourceware.org
> >> CC += gcc at gcc.gnu.org
> >> CC += libstdc++ at gcc.gnu.org
> >> ]]
> >>
> >> Hi Florian,
> >>
> >> On 2020-09-21 10:38, Florian Weimer wrote:
> >>> * Alejandro Colomar via Libc-alpha:
> >>>
> >>>> I'd like to propose exposing the macro 'array_length()' as defined in
> >>>> 'include/array_length.h' to the user.
> >>>
> >>> It would need a good C++ port, probably one for C++98 and another one
> >>> for C++14 or later.
> >>
> >> For C++, I use the following definition:
> >>
> >>
> >>  #include 
> >>  #include 
> >>  #include 
> >>
> >>
> >>  #define is_array__(a)   (std::is_array <__typeof__(a)>::value)
> >
> > Should be decltype.
>
> Thanks.
>
> >
> >> However, there are a few problems:
> >>
> >> 1) This doesn't work for VLAs (GNU extension).
> >> I couldn't find a way to do it.  Maybe I should file a bug in GCC.
> >
> > I do not think VLA support is critical.  C++ programmers will be used to
> > limited support in utility functions.
> >
> >> 2) Also, this requires C++11; I don't know how to do it for older C++.
> >> Again, support from the compiler would be great.
> >
> > I think limited C++98 support is possible using a function template,
> > where the array length N is a template parameter.  To enable use in
> > constant expressions, you can return a type of char[N], and the macro
> > wrapper should then apply sizeof to the function result.
>
> Sorry, I don't know much C++, and I don't know how to do this.
>
> >
> >> 3) The macro can't be used in the same places as the C version,
> >> because of the `({})`.
> >> The `0 * sizeof(struct{...})` trick doesn't work in C++ due to:
> >>  error: types may not be defined in 'sizeof' expressions
> >
> > For C++11, you can use a constexpr function instead of a macro.
> >
> > array_length should not be a macro in current C++ modes, so that we
> > retain compatibility if a future C++ standard adds array_length (or
> > nitems) on its own.  This is not a concern for legacy C++98 mode.
>
> See above.
>
> >
> >>> Maybe also ask on the libc-coord list.
> >>
> >> Ok.  Added CCs.
> >
> > libc-coord is not hosted on sourceware:
> >
> >
> >
> > The discussion here veered off into C++ territory anyway.
>
> I added the correct list now.
>
> >
> > Thanks,
> > Florian
> >
>
> Thanks,
>
> Alex.
>


Re: Expose 'array_length()' macro in

2020-09-21 Thread Alejandro Colomar via Gcc
I have developed this draft code, the C++ part being based on what you 
wrote.


I am a C programmer, and my C++ is very basic, and I tend to write 
C-compatible code when I need C++, so I can't really write the C++ part.


I tested the code with all C versions (--std= {c89, c99, c11, c18, 
c2x}), and it worked for all of them (correctly returning 18 in all of 
them), and if I uncomment the part of the pointer, it has a nice error 
message.  I used `-Wall -Wextra -Werror -pedantic -Wno-vla 
-Wno-sizeof-pointer-div`.


However, the C++ part needs some work to be able to compile.

Would you mind finishing it?


Thanks,

Alex
--

#if defined(__cplusplus)
# include 

# if __cplusplus >= 201703L
#  include 
#  define array_length(arr)     (std:size(arr))
# else

#  if __cplusplus >= 201103L
constexpr
#  endif
inline std::size_t
array_length(const T(&array)[N])
#  if __cplusplus >= 201103L
noexcept
#  endif
{
        return  N;
}
# endif

# if __cplusplus >= 202002L
#  define array_slength(arr)    (std:ssize(arr))
# else

#  if __cplusplus >= 201103L
constexpr
#  endif
inline std::ptrdiff_t
array_slength(const T(&array)[N])
#  if __cplusplus >= 201103L
noexcept
#  endif
{
        return  N;
}
# endif


#else /* !defined(__cplusplus) */
#include 

# define __is_same_type(a, b)                                           \
        __builtin_types_compatible_p(__typeof__(a), __typeof__(b))
# define __is_array(arr)        (!__is_same_type((arr), &(arr)[0]))

# if __STDC_VERSION__ >= 201112L
#  define __must_be(e, msg)     (                                       \
        0 * (int)sizeof(                                                \
                struct {                                                \
                        _Static_assert((e), msg);                       \
                        char ISO_C_forbids_a_struct_with_no_members__;  \
                }                                                       \
        )                                                               \
)
# else
#  define __must_be(e, msg)     (                                       \
        0 * (int)sizeof(                                                \
                struct {                                                \
                        int : (-!(e));                                  \
                        char ISO_C_forbids_a_struct_with_no_members__;  \
                }                                                       \
        )                                                               \
)
# endif

# define __must_be_array(arr)   __must_be(__is_array(arr), "Must be an array!")


# define __array_length(arr)    (sizeof(arr) / sizeof((arr)[0]))
# define array_length(arr)      (__array_length(arr) + __must_be_array(arr))
# define array_slength(arr)     ((ptrdiff_t)array_length(arr))
#endif


int main(void)
{
        int a[5];
        const int x = 6;
        int b[x];
#if __cplusplus >= 201103L
        constexpr
#endif
        int y = 7;
        int c[y];
        int *p;
        (void)p;

        return  array_slength(a) + array_slength(b) +
                array_length(c) /*+
                array_length(p)*/;
}





Re: Expose 'array_length()' macro in

2020-09-21 Thread Jonathan Wakely via Gcc

On 21/09/20 23:52 +0200, Alejandro Colomar via Libstdc++ wrote:
> I have developed this draft code, the C++ part being based on what you
> wrote.
>
> I am a C programmer, and my C++ is very basic, and I tend to write
> C-compatible code when I need C++, so I can't really write the C++
> part.
>
> I tested the code with all C versions (--std= {c89, c99, c11, c18,
> c2x}), and it worked for all of them (correctly returning 18 in all of
> them), and if I uncomment the part of the pointer, it has a nice error
> message.  I used `-Wall -Wextra -Werror -pedantic -Wno-vla
> -Wno-sizeof-pointer-div`.
>
> However, the C++ part needs some work to be able to compile.
>
> Would you mind finishing it?
>
> Thanks,
>
> Alex
> --
>
> #if defined(__cplusplus)
> # include 
>
> # if __cplusplus >= 201703L
> #  include 

That should be  not .

> #  define array_length(arr)     (std:size(arr))

C++ programmers will not accept a macro for this.

> # else
>

You're missing "template" here.

> #  if __cplusplus >= 201103L
> constexpr
> #  endif
> inline std::size_t
> array_length(const T(&array)[N])

Remove the name of the unused parameter.

> #  if __cplusplus >= 201103L
> noexcept
> #  endif
> {
>         return  N;
> }
> # endif
>
> # if __cplusplus >= 202002L
> #  define array_slength(arr)    (std:ssize(arr))
> # else
>
> #  if __cplusplus >= 201103L
> constexpr
> #  endif
> inline std::ptrdiff_t
> array_slength(const T(&array)[N])
> #  if __cplusplus >= 201103L
> noexcept
> #  endif
> {
>         return  N;
> }
> # endif
>
>
> #else /* !defined(__cplusplus) */
> #include 
>
> # define __is_same_type(a, b)                                           \
>         __builtin_types_compatible_p(__typeof__(a), __typeof__(b))
> # define __is_array(arr)        (!__is_same_type((arr), &(arr)[0]))
>
> # if __STDC_VERSION__ >= 201112L
> #  define __must_be(e, msg)     (                                       \
>         0 * (int)sizeof(                                                \
>                 struct {                                                \
>                         _Static_assert((e), msg);                       \
>                         char ISO_C_forbids_a_struct_with_no_members__;  \
>                 }                                                       \
>         )                                                               \
> )
> # else
> #  define __must_be(e, msg)     (                                       \
>         0 * (int)sizeof(                                                \
>                 struct {                                                \
>                         int : (-!(e));                                  \
>                         char ISO_C_forbids_a_struct_with_no_members__;  \
>                 }                                                       \
>         )                                                               \
> )
> # endif
>
> # define __must_be_array(arr)   __must_be(__is_array(arr), "Must be an array!")
>
>
> # define __array_length(arr)    (sizeof(arr) / sizeof((arr)[0]))
> # define array_length(arr)      (__array_length(arr) + __must_be_array(arr))
> # define array_slength(arr)     ((ptrdiff_t)array_length(arr))
> #endif
>
>
> int main(void)
> {
>         int a[5];
>         const int x = 6;
>         int b[x];
> #if __cplusplus >= 201103L
>         constexpr
> #endif
>         int y = 7;
>         int c[y];
>         int *p;
>         (void)p;
>
>         return  array_slength(a) + array_slength(b) +
>                 array_length(c) /*+
>                 array_length(p)*/;
> }







Re: Expose 'array_length()' macro in

2020-09-21 Thread Ville Voutilainen via Gcc
On Tue, 22 Sep 2020 at 01:07, Jonathan Wakely via Libstdc++
 wrote:
> >#  define array_length(arr)(std:size(arr))
>
> C++ programmers will not accept a macro for this.

..in other words, the C++17 version of it needs to be an inline function
that returns std::size of an array, not a macro. All C++ versions need to
be functions, and there should not be any #defines in any of the C++ code.

Why should this be array_length and not __array_length? This is a
vendor extension, so it should use
a name that is reserved for such?
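
One way to read the suggestion above, as a sketch rather than a proposed
patch: a function template for every language mode, with `#if` used only
to select the definition and no `#define` anywhere. The namespace `sketch`
and the array `sample` are assumptions for the example; which name the
function should finally carry is left open, as in the message.

```cpp
#include <cassert>
#include <cstddef>
#if __cplusplus >= 201703L
# include <iterator>   // std::size
#endif

namespace sketch {

#if __cplusplus >= 201703L
// C++17 and later: forward to the standard facility.
template <typename T, std::size_t N>
constexpr std::size_t array_length(const T (&arr)[N]) noexcept
{
    return std::size(arr);
}
#elif __cplusplus >= 201103L
// C++11/14: same result, computed from the deduced bound.
template <typename T, std::size_t N>
constexpr std::size_t array_length(const T (&)[N]) noexcept
{
    return N;
}
#else
// C++98: no constexpr or noexcept available.
template <typename T, std::size_t N>
inline std::size_t array_length(const T (&)[N])
{
    return N;
}
#endif

} // namespace sketch

static int sample[5];

#if __cplusplus >= 201103L
// In C++11 and later the call is usable in a constant expression.
static_assert(sketch::array_length(sample) == 5, "five elements");
#endif

// Runtime check that works in every language mode.
inline void check_sketch_array_length()
{
    assert(sketch::array_length(sample) == 5);
}
```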


Re: LTO slows down calculix by more than 10% on aarch64

2020-09-21 Thread Prathamesh Kulkarni via Gcc
On Mon, 21 Sep 2020 at 18:14, Prathamesh Kulkarni
 wrote:
>
> On Mon, 21 Sep 2020 at 15:19, Prathamesh Kulkarni
>  wrote:
> >
> > On Fri, 4 Sep 2020 at 17:08, Alexander Monakov  wrote:
> > >
> > > > I obtained perf stat results for following benchmark runs:
> > > >
> > > > -O2:
> > > >
> > > >   7856832.692380  task-clock (msec)   #    1.000 CPUs utilized
> > > >             3758  context-switches    #    0.000 K/sec
> > > >               40  cpu-migrations      #    0.000 K/sec
> > > >            40847  page-faults         #    0.005 K/sec
> > > >    7856782413676  cycles              #    1.000 GHz
> > > >    6034510093417  instructions        #    0.77  insn per cycle
> > > >     363937274287  branches            #   46.321 M/sec
> > > >      48557110132  branch-misses       #   13.34% of all branches
> > >
> > > (ouch, 2+ hours per run is a lot, collecting a profile over a minute
> > > should be enough for this kind of code)
> > >
> > > > -O2 with orthonl inlined:
> > > >
> > > >   8319643.114380  task-clock (msec)   #    1.000 CPUs utilized
> > > >             4285  context-switches    #    0.001 K/sec
> > > >               28  cpu-migrations      #    0.000 K/sec
> > > >            40843  page-faults         #    0.005 K/sec
> > > >    8319591038295  cycles              #    1.000 GHz
> > > >    6276338800377  instructions        #    0.75  insn per cycle
> > > >     467400726106  branches            #   56.180 M/sec
> > > >      45986364011  branch-misses       #    9.84% of all branches
> > >
> > > So +100e9 branches, but +240e9 instructions and +480e9 cycles, probably
> > > implying that extra instructions are appearing in this loop nest, but
> > > not in the innermost loop. As a reminder for others, the innermost loop
> > > has only 3 iterations.
> > >
> > > > -O2 with orthonl inlined and PRE disabled (this removes the extra
> > > > branches):
> > > >
> > > >   8207331.088040  task-clock (msec)   #    1.000 CPUs utilized
> > > >             2266  context-switches    #    0.000 K/sec
> > > >               32  cpu-migrations      #    0.000 K/sec
> > > >            40846  page-faults         #    0.005 K/sec
> > > >    8207292032467  cycles              #    1.000 GHz
> > > >    6035724436440  instructions        #    0.74  insn per cycle
> > > >     364415440156  branches            #   44.401 M/sec
> > > >      53138327276  branch-misses       #   14.58% of all branches
> > >
> > > This seems to match baseline in terms of instruction count, but without
> > > PRE the loop nest may be carrying some dependencies over memory. I would
> > > simply check the assembly for the entire 6-level loop nest in question,
> > > I hope it's not very complicated (though Fortran array addressing...).
> > >
> > > > -O2 with orthonl inlined and hoisting disabled:
> > > >
> > > >   7797265.206850  task-clock (msec)   #    1.000 CPUs utilized
> > > >             3139  context-switches    #    0.000 K/sec
> > > >               20  cpu-migrations      #    0.000 K/sec
> > > >            40846  page-faults         #    0.005 K/sec
> > > >    7797221351467  cycles              #    1.000 GHz
> > > >    6187348757324  instructions        #    0.79  insn per cycle
> > > >     461840800061  branches            #   59.231 M/sec
> > > >      26920311761  branch-misses       #    5.83% of all branches
> > >
> > > There's a 20e9 reduction in branch misses and a 500e9 reduction in cycle
> > > count. I don't think the former fully covers the latter (there's also a
> > > 90e9 reduction in insn count).
> > >
> > > Given that the inner loop iterates only 3 times, my main suggestion is to
> > > consider how the profile for the entire loop nest looks like (it's 6
> > > loops deep, each iterating only 3 times).
> > >
> > > > Perf profiles for
> > > > -O2 -fno-code-hoisting and inlined orthonl:
> > > > https://people.linaro.org/~prathamesh.kulkarni/perf_O2_inline.data
> > > >
> > > >       3196866 │ 1f04:   ldur   d1, [x1, #-248]
> > > >  216348301800 │         add    w0, w0, #0x1
> > > >        985098 │         add    x2, x2, #0x18
> > > >  216215999206 │         add    x1, x1, #0x48
> > > >  215630376504 │         fmul   d1, d5, d1
> > > >  863829148015 │         fmul