Re: Pro64-based GPLed compiler

2005-06-29 Thread Marc
Vladimir Makarov wrote:

> Marc Gonzalez-Sigler wrote:
> 
>> I've taken PathScale's source tree (they've removed the IA-64 code
>> generator, and added an x86/AMD64 code generator), and tweaked the
>> Makefiles.
>>
>> I thought some of you might want to take a look at the compiler.
>>
>> http://www-rocq.inria.fr/~gonzalez/vrac/open64-alchemy-src.tar.bz2
>
> This reference doesn't work.   The directory vrac looks empty.

Fixed. I'll never understand how AFS ACLs work ;-(




gcc plugin on MacOS failure

2021-07-22 Thread Marc
I15gimple_opt_pass in ccHhkWiv.o
  "__ZTV8opt_pass", referenced from:
  __ZN8opt_passD2Ev in ccHhkWiv.o
  NOTE: a missing vtable usually means the first non-inline virtual
member function has no definition.
  "_g", referenced from:
  __ZN12_GLOBAL__N_18afl_passC1Ebj in ccHhkWiv.o
 (maybe you meant:
__ZN9__gnu_cxx13new_allocatorISt10_List_nodeINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcE8allocateEmPKv,
__ZN9__gnu_cxx13new_allocatorISt10_List_nodeINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcE9constructIS7_JRKS7_EEEvPT_DpOT0_
,
__ZN9__gnu_cxx13new_allocatorISt10_List_nodeINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcE10deallocateEPS8_m
, __ZN9__gnu_cxx17__is_null_pointerIKcEEbPT_ ,
__ZNSt7__cxx1110_List_baseINS_12basic_stringIcSt11char_traitsIcESaIcEEESaIS5_EE11_M_get_nodeEv
,
__ZN9__gnu_cxx16__aligned_membufINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIc6_M_ptrEv
,
__ZSt19__iterator_categoryIN9__gnu_cxx17__normal_iteratorIPcNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEENSt15iterator_traitsIT_E17iterator_categoryERKSB_
,
__ZNKSt7__cxx1110_List_baseINS_12basic_stringIcSt11char_traitsIcESaIcEEESaIS5_EE11_M_get_sizeEv
,
__ZNSt7__cxx1110_List_baseINS_12basic_stringIcSt11char_traitsIcESaIcEEESaIS5_EE21_M_get_Node_allocatorEv
,
__ZN9__gnu_cxx13new_allocatorISt10_List_nodeINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEC2Ev
, __ZN9__gnu_cxx14__alloc_traitsISaIcEcE17_S_select_on_copyERKS1_ ,
__ZNK9__gnu_cxx17__normal_iteratorIPcNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIc4baseEv
,
__ZSt11__remove_ifIN9__gnu_cxx17__normal_iteratorIPcNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcENS0_5__ops10_Iter_predIPFiiT_SF_SF_T0_
, __ZNK3vecIP8edge_def5va_gc8vl_embedE6lengthEv ,
__ZN9__gnu_cxx11char_traitsIcE2eqERKcS3_ ,
__ZN9__gnu_cxx16__aligned_membufINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIc7_M_addrEv
,
__ZN9__gnu_cxxneIPcNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEbRKNS_17__normal_iteratorIT_T0_EESD_
,
__ZN9__gnu_cxx5__ops10_Iter_predIPFiiEEclINS_17__normal_iteratorIPcNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEbT_
, __Z15vec_safe_lengthIP8edge_def5va_gcEjPK3vecIT_T0_8vl_embedE ,
__ZSt9remove_ifIN9__gnu_cxx17__normal_iteratorIPcNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEPFiiEET_SC_SC_T0_
, __ZN9__gnu_cxx5__ops11__pred_iterIPFiiEEENS0_10_Iter_predIT_EES5_ ,
__ZN9__gnu_cxx17__normal_iteratorIPKcNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcC1IPcEERKNS0_IT_NS_11__enable_ifIXsrSt10__are_sameISC_SB_E7__valueES8_E6__typeEEE
,
__ZN9__gnu_cxx13new_allocatorISt10_List_nodeINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcE7destroyIS7_EEvPT_
, __ZN3vecIP8edge_def5va_gc8vl_embedEixEj ,
__ZNK9__gnu_cxx17__normal_iteratorIPcNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcdeEv
,
__ZSt9__find_ifIN9__gnu_cxx17__normal_iteratorIPcNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcENS0_5__ops10_Iter_predIPFiiT_SF_SF_T0_
, __ZN9__gnu_cxx11char_traitsIcE6lengthEPKc ,
__ZN9__gnu_cxxmiIPcNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcENS_17__normal_iteratorIT_T0_E15difference_typeERKSB_SE_
,
__ZN9__gnu_cxx13new_allocatorISt10_List_nodeINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcED2Ev
,
__ZN9__gnu_cxxeqIPcNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEbRKNS_17__normal_iteratorIT_T0_EESD_
,
__ZN9__gnu_cxx17__normal_iteratorIPcNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcppEv
, __ZN9__gnu_cxx5__ops10_Iter_predIPFiiEEC1ES3_ ,
__ZNK9__gnu_cxx13new_allocatorISt10_List_nodeINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcE11_M_max_sizeEv
,
__ZSt9__find_ifIN9__gnu_cxx17__normal_iteratorIPcNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcENS0_5__ops10_Iter_predIPFiiT_SF_SF_T0_St26random_access_iterator_tag
)
  "_global_options", referenced from:
  __ZN12_GLOBAL__N_18afl_pass21get_afl_prev_loc_declEv in ccHhkWiv.o
  "_global_trees", referenced from:
  __ZN12_GLOBAL__N_18afl_pass18get_afl_trace_declEv in ccHhkWiv.o
  __ZN12_GLOBAL__N_18afl_pass21get_afl_prev_loc_declEv in ccHhkWiv.o
  "_integer_types", referenced from:
  __ZN12_GLOBAL__N_18afl_pass21get_afl_area_ptr_declEv in ccHhkWiv.o
  "_plugin_default_version_check", referenced from:
  _plugin_init in ccHhkWiv.o
  "_register_callback", referenced from:
  _plugin_init in ccHhkWiv.o
  "_sizetype_tab", referenced from:
  __ZN12_GLOBAL__N_18afl_pass7executeEP8function in ccHhkWiv.o
  "_xrealloc", referenced from:
  __ZN7va_heap7reserveIP9tree_nodeEEvRP3vecIT_S_8vl_embedEjb in
ccHhkWiv.o
ld: symbol(s) not found for architecture x86_64
collect2: error: ld returned 1 exit status

When I then look at what might be supplying "_plugin_default_version_check",
I only find
/usr/local/opt/gcc@11/libexec/gcc/x86_64-apple-darwin20/11.1.0/f951
which is a program, not a library.

Does anyone know how this can be fixed?
Thank you!

Regards,
Marc

-- 
Marc Heuse
PGP: AF3D 1D4C D810 F0BB 977D  3807 C7EE D0A0 6BE9 F573


Re: gcc plugin on MacOS failure

2021-07-23 Thread Marc
Thank you so far; this got me (unsurprisingly) one step further, but
the unresolved-symbol error has now moved to the library loading
stage:

~/afl++ $ g++-11 -Wl,-flat_namespace -Wl,-undefined,dynamic_lookup -g
-fPIC -std=c++11
-I/usr/local/Cellar/gcc/11.1.0_1/lib/gcc/11/gcc/x86_64-apple-darwin20/11.1.0/plugin/include
-I/usr/local/Cellar/gcc/11.1.0_1/lib/gcc/11/gcc/x86_64-apple-darwin20/11.1.0/plugin
-I/usr/local//Cellar/gmp/6.2.1/include -shared
instrumentation/afl-gcc-pass.so.cc -o afl-gcc-pass.so

=> compiles because the linker does not bail on functions it cannot
resolve at link time.

~/afl++ $ ./afl-gcc-fast -o test-instr test-instr.c
afl-cc ++3.15a by Michal Zalewski, Laszlo Szekeres, Marc Heuse - mode:
GCC_PLUGIN-DEFAULT
error: unable to load plugin './afl-gcc-pass.so':
'dlopen(./afl-gcc-pass.so, 9): Symbol not found:
__ZN8opt_pass14set_pass_paramEjb
  Referenced from: ./afl-gcc-pass.so
  Expected in: flat namespace
 in ./afl-gcc-pass.so'

Looking for a library that might supply this symbol does not turn up
any:
~/afl++ $ egrep -ral __ZN8opt_pass14set_pass_paramEjb /usr/local/
/usr/local//var/homebrew/linked/gcc/libexec/gcc/x86_64-apple-darwin20/11.1.0/f951
/usr/local//opt/gcc@11/libexec/gcc/x86_64-apple-darwin20/11.1.0/f951
/usr/local//opt/gfortran/libexec/gcc/x86_64-apple-darwin20/11.1.0/f951
/usr/local//opt/gcc/libexec/gcc/x86_64-apple-darwin20/11.1.0/f951
/usr/local//Cellar/gcc/11.1.0_1/libexec/gcc/x86_64-apple-darwin20/11.1.0/f951

(On the other hand, it is the same on Linux: I cannot find a library
that actually supplies that function there either.)

Thank you!

Regards,
Marc


On 22.07.21 22:16, Iain Sandoe wrote:
> 
> 
>> On 22 Jul 2021, at 20:41, Andrew Pinski via Gcc  wrote:
>>
>> On Thu, Jul 22, 2021 at 7:37 AM Marc  wrote:
>>>
> 
>>> I have a gcc plugin (for afl++,
>>> https://github.com/AFLplusplus/AFLplusplus) that works fine when
>>> compiled on Linux but when compiled on MacOS (brew install gcc) it fails:
>>>
>>> ~/afl++ $ g++-11 -g -fPIC -std=c++11
>>> -I/usr/local/Cellar/gcc/11.1.0_1/lib/gcc/11/gcc/x86_64-apple-darwin20/11.1.0/plugin/include
>>> -I/usr/local/Cellar/gcc/11.1.0_1/lib/gcc/11/gcc/x86_64-apple-darwin20/11.1.0/plugin
>>> -I/usr/local//Cellar/gmp/6.2.1/include -shared
>>> instrumentation/afl-gcc-pass.so.cc -o afl-gcc-pass.so
>>
>> A few things: you are not building the plugin with the correct options
>> for Darwin.
>> Basically you need to allow undefined references
> 
> -Wl,-undefined,dynamic_lookup
> 
> but if you expect those to bind to the main exe (e.g. cc1plus) at runtime, 
> then you will need to build that with dynamic export. (-export_dynamic)
> 
> These things will *not* transfer to arm64 macOS and they will probably 
> produce build warnings from newer linkers.
> 
> ===
> 
> I suspect that we will need to find a different recipe for that case 
> (possibly using the main exe as a "link library" on the plugin link line, I 
> guess).
> 
>> and then also use
>> dylib as the extension.
> 
> That’s a convention for shared libs but it won’t stop a plugin working (in 
> fact things like python use .so on macOS)
> 
> for plugin modules (e.g. Frameworks), even omitting the extension completely
> has been done.
> 
> (so this is not the source of the problem)
> 
>> A few other things too.  I always forget the exact options to use on
>> Darwin really.  GNU libtool can help with that.
> 
> perhaps, but I am not sure it's maintained aggressively .. so make sure to 
> check what you find is up to date.
> 
> cheers,
> Iain.
> 

-- 
Marc Heuse
www.mh-sec.de

PGP: AF3D 1D4C D810 F0BB 977D  3807 C7EE D0A0 6BE9 F573


Re: changing "configure" to default to "gcc -g -O2 -fwrapv ..."

2006-12-29 Thread Marc Espie
In article <[EMAIL PROTECTED]> you write:
>I don't think doing either is a good idea.  Authors of the affected
>programs should adjust their makefiles instead - after all, the much more
>often reported problems are with -fstrict-aliasing, and this one also doesn't
>get any special treatment by autoconf.  Even though -fno-strict-aliasing
>-fwrapv would be a valid, more forgiving default.  Also as ever, -O2 is
>what gets the most testing, so you are more likely to run into compiler
>bugs with -fwrapv.

As a data point: in the OpenBSD project, we have disabled
-fstrict-aliasing by default. The documentation for gcc local to our
systems duly notes this departure from the canonical release. We expect
to keep it that way more or less forever.

If/when we update to a version where -fwrapv becomes an issue, we'll
probably do the same with it.

Specifically, because we value reliability over speed and strict
standard conformance...


Re: changing "configure" to default to "gcc -g -O2 -fwrapv ..."

2006-12-29 Thread Marc Espie
On Fri, Dec 29, 2006 at 06:46:09PM -0500, Richard Kenner wrote:
> > Specifically, because we value reliability over speed and strict
> > standard conformance...

> Seems to me that programs that strictly meet the standard of the language
> they are written in would be more reliable than programs that are written
> in some ill-defined language.

C was a portable assembler for years before it got standardized and
optimizing compilers took over. There are still some major parts of the
network stack where you don't want to look, and that defy
-fstrict-aliasing.

A lot of C programmers don't really understand aliasing rules. If this
wasn't deemed to be a problem, no-one would have even thought of adding
code to gcc so that it can warn about some aliasing violations. ;-)

If you feel like fixing this code, be my guest.


Re: We're out of tree codes; now what?

2007-03-23 Thread Marc Espie
In article <[EMAIL PROTECTED]> you write:
>On 19 Mar 2007 19:12:35 -0500, Gabriel Dos Reis <[EMAIL PROTECTED]> wrote:
>> similar justifications for yet another small% of slowdown have been
>> given routinely for over 5 years now.  small% build up; and when they
>> build up, they don't need to be convincing ;-)
>
>But what is the solution? We can complain about performance all we
>want (and we all love to do this), but without a plan to fix it we're
>just wasting effort. Shall we reject every patch that causes a slow
>down? Hold up releases if they are slower than their predecessors?
>Stop work on extensions, optimizations, and bug fixes until we get our
>compile-time performance back to some predetermined level?

Simple sociology.

Working on new optimizations = sexy.
Trimming down excess weight = unsexy.

GCC being vastly a volunteer project, it's much easier to find people
who want to work on their pet project, and implement a recent
optimization they found in a nice paper (that will gain 0.5% in some
awkward case), than to try to track down slowdowns and reverse them.

I remember back when people started converting gcc from RTL to ssa;
I was truly excited. Finally doing the right thing, compile times
are going to get back down where they belong.

And then disappointment, as the ssa stuff just got added on top of the
RTL stuff, and the RTL stuff that was supposed to vanish takes forever
to go away...

Parts of GCC are over-engineered. I used to be able to read the 
__attribute__ stuff, then it got refactored, and the new code looks like 
it's going to be about 3 or 4 times slower than it was.

At some point, it's going to be really attractive to start again from
scratch, without all the backends/frontend complexities and interactions
that make cleaning up stuff harder and harder...

Also, I have the feeling that quite a few of gcc sponsors are in it for
the publicity mostly (oh look, we're nice people giving money to gcc),
and new optimization passes that get 0.02% out of SPEC are better bang
for their money.

Kudos go to the people who actually manage to reverse some of the
excesses of the new passes.


Re: We're out of tree codes; now what?

2007-03-23 Thread Marc Espie
In article <[EMAIL PROTECTED]> you write:
>On Mar 20, 2007, at 11:23 PM, Alexandre Oliva wrote:
>> As for configure scripts...  autoconf -j is long overdue ;-)

>Is that the option to compile autoconf stuff into fast running  
>efficient code?  :-)

>But seriously, I think we need to press autoconf into generating 100x  
>faster code 90% of the time.  Maybe prebundling answers for the  
>common targets...

Doesn't win all that much.

Over in OpenBSD, we tried to speed up the build of various software by
using a site cache for most of the system stuff.  Makes some configure
scripts go marginally faster, but you won't gain back more than about 5%.

Due to autoconf's design, there are lots of things you actually cannot
cache, because they interact in weird ways, and are used in strange
ways.
If you want to speed up autoconf, it needs to have some smarts about
what's going on. Not a huge list of generated variables, but what those
variables *mean*, semantically, and which value is linked to what
configuration option. You would then be able to avoid recomputing a lot
of things (and you would also probably be able to compress information
a lot... when I look at the list of configured stuff, there are so many
duplicates it's scary).  Autoconf also needs an actual database of
tests, so that people don't reinvent a square wheel all the time.

This would also solve the second autoconf plague: it's not only slow as
molasses, but it also `auto-detects' stuff when you don't want it to,
which leads to hard to reproduce builds, unless you start with an empty
machine every time (which has its own performance issue).

Even if it has lots of shortcomings still (large pile of C++ code to
compile first), I believe a replacement like cmake shows a lot of
promise there...

In my opinion, after spending years *fighting* configure issues in
making programs compile correctly under OpenBSD, I believe the actual
database of tests is the only thing worth saving in autoconf.
I don't know what the actual `good design' would be, but I'm convinced
using m4 as a lisp interpreter to generate shell scripts is a really bad
idea.


Re: Integer overflow in operator new

2007-04-08 Thread Marc Espie
In article <[EMAIL PROTECTED]> you write:
>On Fri, Apr 06, 2007 at 06:51:24PM -0500, Gabriel Dos Reis wrote:
>> David Daney <[EMAIL PROTECTED]> writes:
>> 
>> | One could argue that issuing some type of diagnostic (either at
>> | compile time or run time) would be helpful for people that don't
>> | remember to write correct code 100% of the time.
>> 
>> I raised this very issue a long time ago; a long-term GCC contributor
>> vocally opposed checking the possible overflow.  I hope something will
>> happen this time.
>
>I don't like slowing programs down, but I like security holes even less.
>
>If a check were to be implemented, the right thing to do would be to throw
>bad_alloc (for the default new) or return 0 (for the nothrow new).  If a
>process's virtual memory limit is 400M, we throw bad_alloc if we do new
>int[200], so we might as well do it if we do new int[20].
>There would be no reason to use a different reporting mechanism.
>
>There might be rare cases where the penalty for this check could have
>an impact, like for pool allocators that are otherwise very cheap.
>If so, there could be a flag to suppress the check.
>


Considering the issue is only for new [], I'd assume anyone who wants
less correct but faster behavior would simply handle the computation
themselves, deal with the overflow manually, and then override whatever
operator/class they need to make things work.


Re: Integer overflow in operator new

2007-04-08 Thread Marc Espie
In article <[EMAIL PROTECTED]> you write:
>The assert should not overflow.  I suggest
>
>#include <stdint.h>
>#include <assert.h>
>assert( n < SIZE_MAX / sizeof(int) );
>
>which requires two pieces of information that the programmer
>otherwise wouldn't need, SIZE_MAX and sizeof(type).
>
>Asking programmers to write extra code for rare events, has
>not been very successful.  It would be better if the compiler
>incorporated this check into operator new, though throwing
>an exception rather than asserting.  The compiler should be
>able to eliminate many of the conditionals.

The compiler and its runtime should be correct, and programmers should be 
able to depend on it.

When you read the documentation for new or calloc, there is no
mention of integer overflow, it is not expected that the programmer
has to know about that.

Adding an extra test in user code makes even less sense than checking
that pointers are not null before calling free...
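
For illustration, the check under discussion amounts to something like this
(an untested sketch; the helper name is made up, and the real check belongs
in the compiler-emitted allocation code, not in user source):

  #include <cstddef>
  #include <new>

  template <typename T>
  T *checked_array_storage (std::size_t n)
  {
      /* If n * sizeof(T) would overflow size_t, the allocation must
         fail instead of silently returning a too-small block.  */
      if (n > static_cast<std::size_t>(-1) / sizeof (T))
          throw std::bad_alloc ();
      return static_cast<T *> (::operator new (n * sizeof (T)));
  }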


Re: GCC 4.1 Projects

2005-02-28 Thread Marc Espie
In article <[EMAIL PROTECTED]> you write:

>People do break Ada bootstrap because they don't configure and test Ada;
>they don't configure Ada because they complained about Ada build
>machinery being non-standard; delaying Ada build machinery changes will
>only make things worse for Ada bootstrap statistics.

Keep in mind that ada testing *needs a working ada compiler* in the
first place.

If you don't have one, this can be rather painful. You need access to a
platform with a working ada compiler in order to build a cross-compiler,
and you may have to figure out all kinds of not so nice stuff about
cross-compilation.

Been there, done that for i386-OpenBSD. Probably going to do it for
sparc64 at some point in time, as well, but this is not exactly my cup
of tea...


Re: Extension compatibility policy

2005-02-28 Thread Marc Espie
On Mon, Feb 28, 2005 at 09:24:20AM -0500, Robert Dewar wrote:
> Not quite, Marc is suggesting that -pedantic be the default if I read
> the above statement correctly.

Yep.

Except it's probably too late for that, and there is stuff in -pedantic
that is downright obnoxious (because every C compiler I know does it)
and err... really pedantic, as opposed to actual warnings that help
find out about obscure extensions.

In my opinion, this is just a case of a very bad design decision that
was taken years ago.

It took me years to grow a firm opinion about it too.

The basic issue I'm talking about is failure modes for software. There
are a few interesting error categories.
- stuff that is an error but that the program can recover from.
- stuff that is not really an error, but that the program can't find out
about.

The first class is interesting because it's a class we should not recover
from, ever, for programming tools: it causes all sorts of grief all the
time later down the line. Why? Because it's an ERROR, so it's not
actually specified formally, and recovering from it gracefully muddles
the semantics. Because some errors will be recovered, and some will not.
And this might change from release to release, allowing half-broken
software to grow and develop.

For instance, the extern/static inline stuff in gcc falls under that
heading, in my book.  GCC was not designed with any hard checks that the
inlined functions were also linked in when building the final executable,
and so people like Torvalds complained when a later version of gcc
no longer inlined the function and could not find it in a library.
At the time the complaint came up, I came down on the side of the GCC
developers, that the extra feature was misused by linux developers...
Now, I'm not so sure. I think that there was a design step missed along
the guidelines of not allowing erroneous software to build.

The second class is interesting because it comes up all the time with
-Wall -Werror.   All the `variable not initialized' stuff (that one is
obvious). To a lesser extent, all the `comparison is always true due to
limited range of data type'.  Those warnings actually occur all the time
in portable code, and are very hard to get rid of (and it's probably not
a good idea to clean them all up). This makes -Wall -Werror much less
useful than it could be.

Forgive me if I'm reinventing the wheel (partly), but more and more, it
seems to me that there's a category of warnings missing: the stuff that
the compiler is sure about, and that cannot come from portability issues.
Say, the -Wsurething warnings.  If we could find reasonable semantics
for these (along with a -Wunreasonable-extension switch), then maybe we
would have something that -Werror could use.

As far as practical experience goes, I've spent enough time dealing
with OpenBSD kernel compilation (which does -Wall -Werror, btw) and with
cleaning up various old C sources (which invariably starts with a
combination of warning switches, and then continues by reading pages of
inappropriate warnings to find the right ones) to be fairly certain
these kinds of diagnostics could be useful...

Oh yes, and the change from the old preprocessor to the new and improved
cpplib took quite a long time to recover from too... You wouldn't believe
how many people misuse token pasting all the time. But I put in the effort
because I think that's a good change: it takes unambiguously wrong code
out in the backyard and shoots it dead.


Re: Questions about trampolines

2005-03-14 Thread Marc Espie
In article <[EMAIL PROTECTED]> you write:
>Well as I said above, trampolines or an equivalent are currently critically
>needed by some front ends (and of course by anyone using the (very useful IMO)
>extension of nested functions in C).

This is your opinion, but I've yet to find an actual piece of code in a
real project that uses that extension.

On the other hand, non-executable stack is a feature used in the real
world. It's in quite common use, actually...

Think about that.


Re: Questions about trampolines

2005-03-14 Thread Marc Espie
On Mon, Mar 14, 2005 at 01:25:34PM +, Joseph S. Myers wrote:
> On Mon, 14 Mar 2005, Robert Dewar wrote:
> 
> > I have certainly seen it used, but you may well be right that it is
> > seldom used. It is certainly reasonable to consider removing this
> > extension from C and C++. Anyone using that feature? Or know anyone
> > who is.

> Nested functions are used in the glibc dynamic linker.  I'm not sure why, 
> and they may be inline nested functions whose addresses are never taken.

> The extension is not present in GNU C++, only in GNU C.

Well, Andreas Schwab seems to think this is no longer the case.

I don't want to dive into the glibc mess, thank god, but if the dynamic
linker is implemented like the dynamic linkers I know, it means any binary
using a dynamic linker that uses trampolines will lose any kind of stack
protection on some badly designed architectures, like, say, i386...


Re: Questions about trampolines

2005-03-14 Thread Marc Espie
The thing I did for OpenBSD 3.7 is patch the gcc-3.3.x we use:

 -   On OpenBSD, by default, trampoline code generation is disabled in gcc
 3.3.5.  Code requiring trampolines will not compile without
 -ftrampolines.  The warning flag -Wtrampolines can be used to locate
 trampoline instances if trampoline generation is re-enabled.


That way, you still have trampolines available in C if you need them, but you
don't risk compiling dangerous code that disables executable stack protection
by mistake.

It's probably quite trivial to write a similar patch for gcc-current,
assuming you guys think it's the right design decision.

After applying that patch, we recompiled the whole system, all of X, and the
3000 packages of third party sources.

-ftrampolines was needed exactly 0 times.


Re: Merging calls to `abort'

2005-03-29 Thread Marc Espie
In article <[EMAIL PROTECTED]> you write:
>GCC's primary purpose is to be the compiler for the GNU system.  It is
>used for many other purposes too, and it is good for GCC to serve more
>purposes, but they're not as important for the GNU Project, even
>though they are all important for some users.

I'm a wee little bit fed up with that argument.
Having the compiler BE the compiler for the GNU system is cool.

But if you still want to have a thriving user community that willingly
contribute to it, just giving lip service to other purposes of GCC
is not always enough, Richard.

Specifically, there are a bunch of people who use GCC as a system
compiler on non-GNU systems, who are currently willing to contribute
code, but would jump ship if the tide turns too much in the direction
of a `GNU system'.

Remember emacs vs. xemacs ?
Remember GCC vs. egcs ?

Sorry for making a pest of myself and trolling heavily (well, not that
heavily), but dismissing technical arguments on political grounds
doesn't quite cut it for me (even if you half acknowledge the existence
of other people, mostly to dismiss them off-hand).

And I'm sure there are OTHER people in my own little minority who are
very interested in the slant of your arguments.


Re: Merging calls to `abort'

2005-03-29 Thread Marc Espie
On Tue, Mar 29, 2005 at 09:27:32AM -0800, Joe Buck wrote:
> Or are you just way behind in your reading?
Way behind.

I've read the discussion, I've seen nothing looking like my argument,
so I posted my reply.


Re: GCC 4.1: Buildable on GHz machines only?

2005-05-02 Thread Marc Espie
In article <[EMAIL PROTECTED]> you write:
>The alternative of course is to do only crossbuilds.  Is it reasonable
>to say that, for platforms where a bootstrap is no longer feasible, a
>successful crossbuild is an acceptable test procedure to use instead?

No.

I've been playing enough with crossbuilds to know that a crossbuild will
show you bugs that do not exist in native builds, and VICE-VERSA.

Building a full system natively, compiler-included, is still one of the
best stress-test for an operating system.

This mindset, that because the compiler is too slow it's acceptable
to do cross-builds, is killing older systems.  Very quickly, you end up
with fast, robust systems that are heavily tested through the build of
lots of software, and with slow, untested systems that never see a
build, and are only tested casually by people running a handful of
specialized applications on them.

I'm speaking from experience: you wouldn't believe how many bugs we
tracked and fixed in OpenBSD on fringe platforms (arm, sparc64) simply
because we do native builds and see stuff people doing cross-builds
don't see.

This is not even the first time I talk about this on this list.
Except for embedded systems where memory and disk space don't make it
practical to compile anything natively, having a compiler so slow that
it makes it impossible to compile stuff natively kills old platforms.

Do you know why GCC4 is deprecated on sparc-openbsd ? It's simply
because no-one so far has been able to dedicate the CPU time to track
down the few bugs that prevented us from switching to gcc 3.x from 2.95.

That's right, I said CPU-time. It takes too long to bootstrap the compiler,
it takes too long to rebuild the whole system. And thus, it rots.



Re: GCC 4.1: Buildable on GHz machines only?

2005-05-02 Thread Marc Espie
How about replacing that piece of junk that is called libtool with
something else ?

Preferably something that works.

Between its really poor quoting capabilities, and the fact that
half the tests are done at configure time, and half the tests are done
at run-time, libtool is really poor engineering.   It's really atrocious
when you see operating system tests all over the place *in the libtool
script* and not in the configure process in the first place.

Heck, last time I even tried to figure out some specs for libtool
options from  the script, I nearly went mad.

It won't be any use for GCC, but I ought to tell you that the OpenBSD
folks are seriously considering replacing libtool entirely with a
home-made perl script that would ONLY handle libtool stuff on OpenBSD
and nowhere else.  Between the fact that the description is too
low-level (very hard to move libraries around, or -L stuff that pops up
in the wrong order all the time and gets you to link with the wrong
version of the library), and that some of the assertions it makes are
downright bogus (hardcoding -R even when it's not needed, or being real
nosy about the C compiler in the wrong way and assuming that the default
set of libraries without options will be the same one as the set with
-fpic), it's getting to the point where it would be a real gain to just
reverse-engineer its features and rewrite it from scratch.


Re: Sine and Cosine Accuracy

2005-05-29 Thread Marc Espie
Sorry for chiming in after all this time, but I can't let this pass.

Scott, where on earth did you pick up your trig books ?

The mathematical functions sine and cosine are defined everywhere.
There is absolutely no identity involving them which doesn't apply all
over the real axis, or the complex plane.

It's also true for other trigonometric functions, like tangent, with the
obvious caveat that tangent goes to infinity when x -> pi/2 (or any
congruent number, periodically).

The infinite series for sine and cosine even converge all over the
complex plane, since n! >> x^n for a given x, with n big enough (okay,
the actual mathematical argument is a bit more complex, but that's the
idea, n! goes to infinity a heck of a lot faster than x^n).
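
In symbols, for those who want the one-line proof (a standard ratio-test
sketch, not spelled out in the original mail):

  \sin x = \sum_{n=0}^{\infty} (-1)^n \frac{x^{2n+1}}{(2n+1)!},
  \qquad
  \left| \frac{a_{n+1}}{a_n} \right| = \frac{|x|^2}{(2n+2)(2n+3)}
  \;\to\; 0 \quad (n \to \infty),

so the series converges absolutely for every real or complex x.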

I'm thinking you're confusing that stuff with either of two things:
- since the trig functions are periodic, the inverse functions are
obviously ambiguous, and you need some external input to resolve the
ambiguity. This makes for arbitrary definitions, and lots of fun in
gluing the complex plane back together, and there's no way to avoid
that, since it's the whole basis for the very useful theory of
holomorphic functions and complex integration.  And the math library
usually has an atan2 function to take care of the ambiguity.
- most software implementations of trig functions use approximation
polynomials, usually a variation on Chebyshev polynomials, which
converge much faster than the complete series, but MUST be restricted
to a very small range, since they don't even converge to the right
value outside this range.

Now, the fact is that floating point arithmetic can be real tricky, and
it's often necessary to (gasp) rework the equations and think to get
some correct digits out of ill-applied trigonometric functions.

But I haven't seen it that often in text books outside of specialized
applied maths...


Re: GCC and Floating-Point

2005-05-29 Thread Marc Espie
In article <[EMAIL PROTECTED]> you write:
>  http://csdl.computer.org/dl/mags/co/2005/05/r5091.pdf
>  "An Open Question to Developers of Numerical Software", by
>  W. Kahan and D. Zuras

Doesn't look publicly accessible from my machine...


Re: Sine and Cosine Accuracy

2005-05-29 Thread Marc Espie
On Sun, May 29, 2005 at 08:59:00PM +0200, Georg Bauhaus wrote:
> Marc Espie wrote:
> >Sorry for chiming in after all this time, but I can't let this pass.
> >
> >Scott, where on earth did you pick up your trig books ?
> 
> Sorry, too, but why on earth do modern time mathematics scholars
> think that sine and cosine are bound to have to do with an equally
> modern notion of real numbers that clearly exceed what a circle
> has to offer? What is a plain unit circle of a circumference that
> exceeds 2π?
> How can a real mathematical circle of the normal kind have
> more than 360 non-fractional sections?
> By "real circle" I mean a thing that is not obfuscated by the useful
> but strange ways in which things are redefined by mathematicians;
> cf. Halmos for some humor.

Err, because it all makes sense ? Because there is no reason to do stuff
from 0 to 360 instead of -180 to 180 ?

> And yes, I know that all the other stuff mentioned in this thread
> explains very well that there exist useful definitions of sine for real
> numbers outside "(co)sine related ranges", and that these definitions
> are frequently used. Still, at what longitude does your your trip around
> the world start in Paris, at 2°20' or at 362°20', if you tell the story
> to a seaman? Cutting a pizza at 2.0^90. Huh?!

At 0.0. Did you know that, before Greenwich, the meridian for the
origin of longitude went through Paris ? Your idea would make some
sense if you talked about a latitude (well, even though the notion of
the north pole is not THAT easy to define, and neither is the earth round).

Heck, I can plot trajectories on a sphere that do not follow great circles,
and that extend over 360 degrees in longitude.  I don't see why I should be
restricted from doing that.

> Have a look into e.g. "Mathematics for the Million" by Lancelot
> Hogben for an impression of how astounding works of architecture
> have been done without those weird ways of extending angle related
> computations into arbitrarily inflated numbers of which no one knows
> how to distinguish one from the other in sine (what you have dared to call
> "obvious", when it is just one useful convention. Apparently some
> applications derive from different conventions if I understand Scott's
> remarks correctly).

There are some arbitrary convenient definitions in modern mathematics.
The angle units have been chosen so that differentiation of sine/cosine is
simple.  The definition of sine/cosine extends naturally to the whole
real axis, which gives meaning to mechanics, rotation speeds, complex functions
and everything that's been done in mathematics over the last four centuries
or so.

You can decide to restrict this stuff to plain old 2D geometry, and this would
be fine for teaching in elementary school, but this makes absolutely 
no sense with respect to any kind of modern mathematics.

Maybe playing with modern mathematical notions for years has obfuscated
my mind? Or maybe I just find those definitions to be really obvious and
intuitive.   Actually, I would find arbitrary boundaries to be unintuitive.

There is absolutely nothing magical wrt trigonometric functions, if I
compare them to any other kind of floating point arithmetic: as soon as
you try to map `real' numbers into approximations, you have to be VERY wary
if you don't want to lose all precision.  There's nothing special, nor
conventional about sine and cosine.

Again, if you want ARBITRARY conventions, then look at inverse trig functions,
or at logarithms. There you will find arbitrary discontinuities 
that can't be avoided.


bug or not ? ada loop in gcc-4.1-20050528

2005-05-29 Thread Marc Espie
I've got my build on OpenBSD-i386 stuck in a loop compiling
stage2/xgcc -Bstage2/ -B/usr/local/i386-unknown-openbsd3.7/bin/ -c -O2 -g 
-fomit-frame-pointer  -gnatpg -gnata -I- -I. -Iada 
-I/spare/ports/lang/gcc/4.1/w-gcc-4.1-20050528/gcc-4.1-20050528/gcc/ada 
/spare/ports/lang/gcc/4.1/w-gcc-4.1-20050528/gcc-4.1-20050528/gcc/ada/ada.ads 
-o ada/ada.o

I'm using an ada compiler bootstrapped from 3.3.6...

My top says:
31002 espie 84   10   26M 2712K run  -  107:26 96.14% gnat1

so I have little hope this will end.

Does this ring a bell ? I assume some other people may have already run
into that issue on less uncommon platforms; otherwise, I'll investigate...


Re: Sine and Cosine Accuracy

2005-05-30 Thread Marc Espie
On Sun, May 29, 2005 at 05:52:11PM -0400, Scott Robert Ladd wrote:
> (I expect Gabriel dos Rios to respond with something pithy here; please
> don't disappoint me!)

Funny, I don't expect any message from that signature.

Gabriel dos Reis, on the other hand, may have something to say...


Re: Will Apple still support GCC development?

2005-06-06 Thread Marc Espie
In article <[EMAIL PROTECTED]> you write:
>Samuel Smythe wrote:
>> It is well-known that Apple has been a significant provider of GCC
>> enhancements. But it is also probably now well-known that they have
>> opted to drop the PPC architecture in favor of an x86-based
>> architecture. Will Apple continue to contribute to the PPC-related
>> componentry of GCC, or will such contributions be phased out as the
>> transition is made to the x86-based systems? In turn, will Apple be
>> providing more x86-related contributions to GCC?
>
>A better question might be: Has Intel provided Apple with an OS X
>version of their compiler? If so (and I think it very likely), Apple may
>have little incentive for supporting GCC, given how well Intel's
>compilers perform.

Oh sure, and Intel has an Obj-C++ compiler up their sleeve... right.

Speculations, speculations. Wait and see...


Re: signed is undefined and has been since 1992 (in GCC)

2005-07-14 Thread Marc Espie
In article <[EMAIL PROTECTED]> you write:
>Both OpenSSL and Apache programmers did this, in carefully reviewed
>code which was written in response to a security report.  They simply
>didn't know that there is a potential problem.  The reason for this
>gap in knowledge isn't quite clear to me.

Well, it's reasonably clear to me.

I've been reviewing code for the OpenBSD project, it's incredible the
number of errors you can find in code which is supposed to
- have been written by competent programmers;
- have been reviewed by tens of people.

Quite simply, formal code reviews in free software don't work. The `many
eyes' paradigm is a fallacy. Ten people can look at the same code and
fail to notice a problem if they don't look for the right thing.

A lot of people don't even think about overflows when they look at
arithmetic, there are a lot of integer overflows out there.

I still routinely find off-by-one accesses in buffers, some of them
quite obvious. The only reason I see them is that my malloc can put
allocations on page boundaries, and thus the program barfs here, and not
on other machines.
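
A typical specimen, for illustration (an untested sketch; the function
name is made up):

  #include <cstdlib>
  #include <cstring>

  /* Classic off-by-one: with most mallocs the stray byte lands in
     slack space and nothing visible happens; a malloc that ends
     allocations exactly on a page boundary turns it into a crash.  */
  char *buggy_dup (const char *s)
  {
      std::size_t len = std::strlen (s);
      char *p = static_cast<char *> (std::malloc (len)); /* no room for '\0' */
      if (p != nullptr)
          std::memcpy (p, s, len + 1);   /* writes len + 1 bytes into len */
      return p;
  }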

A lot of people don't know about the peculiarities of C signed
arithmetic.

A lot of `portable' code that uses C arithmetic buries such
peculiarities under tons of macros and typedefs such that it is really
hard to figure out what's going on even if you understand the issues.
From past experience, both Apache and OpenSSL are very bad in that
regard.

Bottom-line is, if it passes tests on major architectures and major
OSes, it's very unlikely that someone will notice something is amiss,
and that the same someone will have the knowledge to fix it. If it
passes all practical tests, but is incorrect, from a language point of
view, it is even more unlikely.


Re: Warning C vs C++

2005-09-24 Thread Marc Espie
In article <[EMAIL PROTECTED]> you write:
>On Saturday 17 September 2005 17:45, you wrote:
>> That's a real misunderstanding. There are many warnings that are very
>> specialized, and if -Wall really turned on all warnings, it would be
>> essentially useless. The idea behind -Wall is that it represents a
>> comprehensive set of warnings that most/many programmers can live
>> with. To turn on all warnings would be the usability faux pas.
>Ok, sure. This option is also used by many developers to see all possible 
>problems in their code. And btw, signed/unsigned isn't a minor problem. 
>Majority of code giving such warning is exploitable (in the black-hackish 
>terms). 
>I am developer myself, but just using gcc, hence my user's opinion.

Typical black-hat attitude.  Slapping band-aids on problems instead of
writing correct code. Guessing at the compiler's behavior instead of
reading the specs and writing robust portable code.
That's the big reason there are lots of security holes all over the
place. People keep guessing and learning by trial and error.



Re: [RFC] add push/pop pragma to control the scope of "using"

2020-01-15 Thread Marc Glisse

On Wed, 15 Jan 2020, 马江 wrote:


Hello,
 After some googling, I find there is no way to control the scope of
"using" for the moment.  This seems strange, as we definitely need this
feature, especially when writing inline member functions in c++
headers.

 Currently I am trying to build a simple class in a c++ header file
as follows:

#include <string>
using namespace std;
class mytest
{
 string test_name;
 int test_val;
public:
 inline string & get_name () {return test_name;}
};


Why is mytest in the global namespace?


 As an experienced C coder, I know that inline functions must be put
into headers or else users could only rely on LTO. And I know that to
use "using" in a header file is a bad idea, as it might silently change
the meaning of other code. However, after I put all my inline functions
into the header file, I found I must write many "std::string" instead
of "string", which is a total torture.
 Can we add something like "#pragma push_using" (just like #pragma
pop_macro)? I believe it's feasible and probably not hard to
implement.


We try to avoid extensions in gcc; you may want to propose this to the C++ 
standard committee first. However, you should first check whether modules 
(C++20) affect the issue.
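
For what it's worth, a class-scope alias avoids both the long names and the
header-wide using-directive (an untested sketch):

  #include <string>

  class mytest
  {
      using string = std::string;   // visible only inside the class body
      string test_name;
      int test_val;
  public:
      inline string & get_name () { return test_name; }
  };

Users still see the returned type as std::string; the alias is purely local
shorthand.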


--
Marc Glisse


Re: How to get the data dependency of GIMPLE variables?

2020-06-14 Thread Marc Glisse

On Mon, 15 Jun 2020, Shuai Wang via Gcc wrote:


I am trying to analyze the following gimple statements, where the data
dependency of _23 is a tree whose leaf nodes are three constant values
{13, 4, 14}.

Could anyone shed some light on how such a backward traversal can be
implemented? Given _22 used in the last assignment, I have no idea how
to trace back to its definition in the fourth statement... Thank you
very much!


SSA_NAME_DEF_STMT


_13 = 13;
_14 = _13 + 4;
_15 = 14;
_22 = (unsigned long) _15;
_23 = _22 + _14;
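
Concretely, starting from _22 above, a backward walk could look like this
rough sketch of pass-internal code (the function name is made up, and this
assumes the usual GIMPLE headers are available):

  /* Recurse from an SSA name to the constants feeding it.  */
  static void
  walk_defs (tree name)
  {
    if (TREE_CODE (name) != SSA_NAME)
      return;
    gimple *def = SSA_NAME_DEF_STMT (name); /* e.g. _22 = (unsigned long) _15 */
    if (!is_gimple_assign (def))
      return;
    /* Operand 0 of an assignment is the lhs; 1.. are the rhs operands.  */
    for (unsigned i = 1; i < gimple_num_ops (def); i++)
      {
        tree op = gimple_op (def, i);
        if (op == NULL_TREE)
          continue;
        if (TREE_CODE (op) == INTEGER_CST)
          ;  /* a leaf: one of the constants 13, 4, 14 */
        else
          walk_defs (op);
      }
  }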


--
Marc Glisse


Re: How to get the data dependency of GIMPLE variables?

2020-06-15 Thread Marc Glisse

On Mon, 15 Jun 2020, Shuai Wang via Gcc wrote:


Dear Marc,

Thank you very much! Just another quick question.. Can I iterate the
operands of a GIMPLE statement, like how I iterate an LLVM instruction in
the following way?

   Instruction* instr;
   for (size_t i = 0; i < instr->getNumOperands(); i++) {
     Value *op = instr->getOperand(i);
   }
   }

Sorry for such naive questions.. I actually searched the documents and
GIMPLE pretty print for a while but couldn't find such a way of accessing
arbitrary numbers of operands...


https://gcc.gnu.org/onlinedocs/gccint/GIMPLE_005fASSIGN.html
or for lower level
https://gcc.gnu.org/onlinedocs/gccint/Logical-Operators.html#Operand-vector-allocation

But really you need to look at the code of gcc. Search for places that use 
SSA_NAME_DEF_STMT and see what they do with the result.
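
For a GIMPLE_ASSIGN, the typed accessors are usually more convenient than
raw operand indexing; the closest analogue of your LLVM loop is roughly
(a sketch, assuming `stmt` is in scope):

  if (is_gimple_assign (stmt))
    {
      tree_code code = gimple_assign_rhs_code (stmt);  /* e.g. PLUS_EXPR */
      tree lhs  = gimple_assign_lhs (stmt);
      tree rhs1 = gimple_assign_rhs1 (stmt);
      tree rhs2 = gimple_assign_rhs2 (stmt);           /* NULL_TREE if unary */
    }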


--
Marc Glisse


Re: Local optimization options

2020-07-05 Thread Marc Glisse

On Sun, 5 Jul 2020, Thomas König wrote:




On 04.07.2020 at 19:11, Richard Biener  wrote:

On July 4, 2020 11:30:05 AM GMT+02:00, "Thomas König"  wrote:


What could be a preferred way to achieve that? Could optimization
options like -ffast-math be applied to blocks instead of functions?
Could we set flags on the TREE codes to allow certain optinizations?
Other things?


The middle end can handle those things on function granularity only.

Richard.


OK, so that will not work (or not without a disproportionate
amount of effort).  Would it be possible to set something like a
TREE_FAST_MATH flag on TREEs? An operation could then be
optimized according to these rules iff both operands
had that flag, and would also have it then.


In order to support various semantics on floating point operations, I was 
planning to replace some trees with internal functions, with an extra 
operand to specify various behaviors (rounding, exceptions, etc.), although 
at least in the beginning I was thinking of only using those functions in 
safe mode, to avoid perf regressions.


https://gcc.gnu.org/pipermail/gcc-patches/2019-August/527040.html

This may never happen now, but it sounds similar to setting flags like 
TREE_FAST_MATH as you are suggesting. I was going with functions for 
more flexibility, and to avoid all the existing assumptions about trees. 
While I guess for fast-math, the worst the assumptions could do is clear 
the flag, which would make us optimize less than possible; not so bad.


--
Marc Glisse


Re: [RFC] Add new flag to specify output constraint in match.pd

2020-08-23 Thread Marc Glisse

On Fri, 21 Aug 2020, Feng Xue OS via Gcc wrote:


 There is a match-folding issue derived from pr94234.  A piece of code like:

 int foo (int n)
 {
int t1 = 8 * n;
int t2 = 8 * (n - 1);

return t1 - t2;
 }

It can be perfectly caught by the rule "(A * C) +- (B * C) -> (A +- B) * C", and
be folded to the constant "8". But this folding will fail if both t1 and t2 have
multiple uses, as in the following code.

 int foo (int n)
 {
int t1 = 8 * n;
int t2 = 8 * (n - 1);

use_fn (t1, t2);
return t1 - t2;
 }

Given an expression with non-single-use operands, folding it will introduce
duplicated computation in most situations, and is deemed to be unprofitable.
But it is always beneficial if the final result is a constant or an existing
SSA value.

And the rule is:
 (simplify
  (plusminus (mult:cs@3 @0 @1) (mult:cs@4 @0 @2))
  (if ((!ANY_INTEGRAL_TYPE_P (type)
        || TYPE_OVERFLOW_WRAPS (type)
        || (INTEGRAL_TYPE_P (type)
            && tree_expr_nonzero_p (@0)
            && expr_not_equal_to (@0, wi::minus_one (TYPE_PRECISION (type)))))
       /* If @1 +- @2 is constant require a hard single-use on either
          original operand (but not on both).  */
       && (single_use (@3) || single_use (@4)))   <- controls whether it matches or not
   (mult (plusminus @1 @2) @0)))

The current matcher only provides a way to check something before folding,
but no mechanism to affect the decision after folding. If it had one, for the
above case, we could let it go ahead when we find the result is a constant.


:s already has a counter-measure where it still folds if the output is at 
most one operation. So this transformation has a counter-counter-measure 
of checking single_use explicitly. And now we want a counter^3-measure...



Like the way input operands are described using flags, we could also add
a new flag to specify this kind of constraint on the output: that we expect
it to be a simple gimple value.

Proposed syntax is

 (opcode:v{ condition } )

The char "v" stands for gimple value; if another char is more descriptive,
it is preferred. "condition", enclosed by { }, is an optional c-syntax condition
expression. If present, only when "condition" is met will the matcher check
whether the folding result is a gimple value using
gimple_simplified_result_is_gimple_val ().

Since there is no SSA concept in GENERIC, this is only for GIMPLE-match,
not GENERIC-match.

With this syntax, the rule is changed to

#Form 1:
 (simplify
  (plusminus (mult:cs@3 @0 @1) (mult:cs@4 @0 @2))
  (if (!ANY_INTEGRAL_TYPE_P (type)
       || TYPE_OVERFLOW_WRAPS (type)
       || (INTEGRAL_TYPE_P (type)
           && tree_expr_nonzero_p (@0)
           && expr_not_equal_to (@0, wi::minus_one (TYPE_PRECISION (type)))))
   (if (!single_use (@3) && !single_use (@4))
    (mult:v (plusminus @1 @2) @0)
    (mult (plusminus @1 @2) @0))))


That seems to match what you can do with '!' now (that's very recent).


#Form 2:
 (simplify
  (plusminus (mult:cs@3 @0 @1) (mult:cs@4 @0 @2))
  (if (!ANY_INTEGRAL_TYPE_P (type)
       || TYPE_OVERFLOW_WRAPS (type)
       || (INTEGRAL_TYPE_P (type)
           && tree_expr_nonzero_p (@0)
           && expr_not_equal_to (@0, wi::minus_one (TYPE_PRECISION (type)))))
   (mult:v{ !single_use (@3) && !single_use (@4) } (plusminus @1 @2) @0)))


Indeed, something more flexible than '!' would be nice, but I am not so 
sure about this version. If we are going to allow inserting code after 
resimplification and before validation, maybe we should go even further 
and let people insert arbitrary code there...


--
Marc Glisse


Re: [RFC] Add new flag to specify output constraint in match.pd

2020-09-02 Thread Marc Glisse

On Wed, 2 Sep 2020, Richard Biener via Gcc wrote:


On Mon, Aug 24, 2020 at 8:20 AM Feng Xue OS via Gcc  wrote:



  There is a match-folding issue derived from pr94234.  A piece of code like:

  int foo (int n)
  {
 int t1 = 8 * n;
 int t2 = 8 * (n - 1);

 return t1 - t2;
  }

 It can be perfectly caught by the rule "(A * C) +- (B * C) -> (A +- B) * C", and
 be folded to the constant "8". But this folding will fail if both t1 and t2 have
 multiple uses, as in the following code.

  int foo (int n)
  {
 int t1 = 8 * n;
 int t2 = 8 * (n - 1);

 use_fn (t1, t2);
 return t1 - t2;
  }

 Given an expression with non-single-use operands, folding it will introduce
 duplicated computation in most situations, and is deemed to be unprofitable.
 But it is always beneficial if the final result is a constant or an existing
 SSA value.

 And the rule is:
  (simplify
   (plusminus (mult:cs@3 @0 @1) (mult:cs@4 @0 @2))
   (if ((!ANY_INTEGRAL_TYPE_P (type)
         || TYPE_OVERFLOW_WRAPS (type)
         || (INTEGRAL_TYPE_P (type)
             && tree_expr_nonzero_p (@0)
             && expr_not_equal_to (@0, wi::minus_one (TYPE_PRECISION (type)))))
        /* If @1 +- @2 is constant require a hard single-use on either
           original operand (but not on both).  */
        && (single_use (@3) || single_use (@4)))   <- controls whether it matches or not
    (mult (plusminus @1 @2) @0)))

 The current matcher only provides a way to check something before folding,
 but no mechanism to affect the decision after folding. If it had one, for the
 above case, we could let it go ahead when we find the result is a constant.


:s already has a counter-measure where it still folds if the output is at
most one operation. So this transformation has a counter-counter-measure
of checking single_use explicitly. And now we want a counter^3-measure...


Counter-measures are a key factor in matching cost.  ":s" seems to be somewhat
coarse-grained, and here we do need more control over it.

But ideally, we could decouple these counter-measures from the definitions of
match rules, and let the gimple matcher reach a more reasonable match-or-not
decision based on these counters. Anyway, that is another story.


 Like the way input operands are described using flags, we could also add
 a new flag to specify this kind of constraint on the output: that we expect
 it to be a simple gimple value.

 Proposed syntax is

  (opcode:v{ condition } )

 The char "v" stands for gimple value; if another char is more descriptive,
 it is preferred. "condition", enclosed by { }, is an optional c-syntax condition
 expression. If present, only when "condition" is met will the matcher check
 whether the folding result is a gimple value using
 gimple_simplified_result_is_gimple_val ().

 Since there is no SSA concept in GENERIC, this is only for GIMPLE-match,
 not GENERIC-match.

 With this syntax, the rule is changed to

 #Form 1:
  (simplify
   (plusminus (mult:cs@3 @0 @1) (mult:cs@4 @0 @2))
   (if (!ANY_INTEGRAL_TYPE_P (type)
        || TYPE_OVERFLOW_WRAPS (type)
        || (INTEGRAL_TYPE_P (type)
            && tree_expr_nonzero_p (@0)
            && expr_not_equal_to (@0, wi::minus_one (TYPE_PRECISION (type)))))
    (if (!single_use (@3) && !single_use (@4))
     (mult:v (plusminus @1 @2) @0)
     (mult (plusminus @1 @2) @0))))


That seems to match what you can do with '!' now (that's very recent).


It's also what :s does but a slight bit more "local".  When any operand is
marked :s and it has more than a single-use we only allow simplifications
that do not require insertion of extra stmts.  So basically the above pattern
doesn't behave any differently than if you omit your :v.  Only if you'd
place :v on an inner expression there would be a difference.  Correlating
the inner expression we'd not want to insert new expressions for with
a specific :s (or multiple ones) would be a more natural extension of what
:s provides.

Thus, for the above case (Form 1), you do not need :v at all and :s works.


Let's consider that multiplication is expensive. We have code like 
5*X-3*X, which can be simplified to 2*X. However, if both 5*X and 3*X have 
other uses, that would increase the number of multiplications. :s would 
not block a simplification to 2*X, which is a single stmt. So the existing 
transformation has extra explicit checks for single_use. And those extra 
checks block the transformation even for 5*X-4*X -> X which does not 
increase the number of multiplications. Which is where '!' (or :v here) 
comes in.


Or we could decide that the extra multiplication is not that bad if it 
saves an addition, simplifies the expression, possibly gains more insn 
parallelism, etc, in which case we could just drop the existing hard 
single_use check...
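
To make the trade-off concrete (a sketch with made-up code; `sink` stands in
for the extra uses):

  int f (int x, int *sink)
  {
    int a = 5 * x;
    int b = 3 * x;
    sink[0] = a;      /* both a and b have other uses, so folding   */
    sink[1] = b;      /* a - b to 2*x would add a third multiply    */
    return a - b;
  }

  int g (int x, int *sink)
  {
    int a = 5 * x;
    int b = 4 * x;
    sink[0] = a;
    sink[1] = b;
    return a - b;     /* folds to x: no new multiply, always a win  */
  }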


--
Marc Glisse


Re: A couple GIMPLE questions

2020-09-05 Thread Marc Glisse

On Sat, 5 Sep 2020, Gary Oblock via Gcc wrote:


First off, one of the questions is just me being curious, but the
second is quite serious. Note, this is GIMPLE coming
into my optimization and not something I've modified.

Here's the C code:

type_t *
do_comp( type_t *data, size_t len)
{
 type_t *res;
 type_t *x = min_of_x( data, len);
 type_t *y = max_of_y( data, len);

 res = y;
 if ( x < y ) res = 0;
 return res;
}

And here's the resulting GIMPLE:

;; Function do_comp.constprop (do_comp.constprop.0, funcdef_no=5, 
decl_uid=4392, cgraph_uid=3, symbol_order=68) (executed once)

do_comp.constprop (struct type_t * data)
{
 struct type_t * res;
 struct type_t * x;
 struct type_t * y;
 size_t len;

  [local count: 1073741824]:

  [local count: 1073741824]:
 x_2 = min_of_x (data_1(D), 1);
 y_3 = max_of_y (data_1(D), 1);
 if (x_2 < y_3)
   goto ; [29.00%]
 else
   goto ; [71.00%]

  [local count: 311385128]:

  [local count: 1073741824]:
 # res_4 = PHI 
 return res_4;

}

The silly question first. In the "if" stmt, how does GCC
get those probabilities, which it shows as 29.00% and
71.00%? I believe they should both be 50.00%.


See the profile_estimate pass dump. One branch makes the function return 
NULL, which makes gcc guess that it may be a bit less likely than the 
other. Those are heuristics, which are tuned to help on average, but of 
course they are sometimes wrong.



The serious question is what is going on with this phi?
   res_4 = PHI 

This makes zero sense practicality-wise to me, and how is
it supposed to be recognized and used? Note, I really do
need to transform the "0B" into something else for my
structure reorganization optimization.


That's not a question? Are you asking why PHIs exist at all? They are the 
standard way to represent merging in SSA representations. You can iterate 
on the PHIs of a basic block, etc.
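
A rough sketch of pass code doing that (illustrative only; the function
name is made up):

  static void
  scan_phis (basic_block bb)
  {
    for (gphi_iterator gsi = gsi_start_phis (bb); !gsi_end_p (gsi);
         gsi_next (&gsi))
      {
        gphi *phi = gsi.phi ();
        for (unsigned i = 0; i < gimple_phi_num_args (phi); i++)
          {
            tree arg = gimple_phi_arg_def (phi, i);  /* y_3 or the 0B here */
            edge e = gimple_phi_arg_edge (phi, i);   /* which predecessor  */
            /* ... rewrite arg as needed for the optimization ... */
          }
      }
  }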



CONFIDENTIALITY NOTICE: This e-mail message, including any attachments, is for 
the sole use of the intended recipient(s) and contains information that is 
confidential and proprietary to Ampere Computing or its subsidiaries. It is to 
be used solely for the purpose of furthering the parties' business 
relationship. Any unauthorized review, copying, or distribution of this email 
(or any attachments thereto) is strictly prohibited. If you are not the 
intended recipient, please contact the sender immediately and permanently 
delete the original and any copies of this email and any attachments thereto.


Could you please get rid of this when posting on public mailing lists?

--
Marc Glisse


Re: Installing a generated header file

2020-11-12 Thread Marc Glisse

On Thu, 12 Nov 2020, Bill Schmidt via Gcc wrote:

Hi!  I'm working on a project where it's desirable to generate a 
target-specific header file while building GCC, and install it with the 
rest of the target-specific headers (i.e., in 
lib/gcc//11.0.0/include).  Today it appears that only those 
headers listed in "extra_headers" in config.gcc will be placed there, 
and those are assumed to be found in gcc/config/.  In my case, 
the header file will end up in my build directory instead.


Questions:

* Has anyone tried something like this before?  I didn't find anything.
* If so, can you please point me to an example?
* Otherwise, I'd be interested in advice about providing new infrastructure to 
support
 this.  I'm a relative noob with respect to the configury code, and I'm sure my
 initial instincts will be wrong. :)


Does the i386 mm_malloc.h file match your scenario?

--
Marc Glisse


Re: Reassociation and trapping operations

2020-11-24 Thread Marc Glisse

On Wed, 25 Nov 2020, Ilya Leoshkevich via Gcc wrote:


I have a C floating point comparison (a <= b && a >= b), which
test_for_singularity turns into (a <= b && a == b) and vectorizer turns
into ((a <= b) & (a == b)).  So far so good.

eliminate_redundant_comparison, however, turns it into just (a == b).
I don't think this is correct, because (a <= b) traps and (a == b)
doesn't.



Hello,

let me just mention the old
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53805
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53806

There has been some debate about the exact meaning of -ftrapping-math, but 
don't let that stop you.


--
Marc Glisse


Re: The conditions when convert from double to float is permitted?

2020-12-10 Thread Marc Glisse

On Thu, 10 Dec 2020, Xionghu Luo via Gcc wrote:


I have a maybe silly question about whether there is any *standard*
or *options* (like -ffast-math) for GCC that allow double to float
demotion optimization?  For example,

1) from PR22326:

#include <math.h>

float foo(float f, float x, float y) {
return (fabs(f)*x+y);
}

The fabs will return a double result, but it could actually be demoted to
float since the function returns float in the end.


With fp-contract, this is (float)fma((double)f,(double)x,(double)y). This 
could almost be transformed into fmaf(f,x,y), except that the double 
rounding may not be strictly equivalent. Still, that seems like it would 
be no problem with -funsafe-math-optimizations, just like turning 
(float)((double)x*(double)y) into x*y, as long as it is a single operation 
with casts on all inputs and output. Whether there are cases that can be 
optimized without -funsafe-math-optimizations is harder to tell.
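
To make the single-operation shape concrete (my example, not from the PR):

float
mul_demotable (float x, float y)
{
  /* A single operation with casts on all inputs and on the output: with
     -funsafe-math-optimizations this can become a plain float multiply.  */
  return (float) ((double) x * (double) y);
}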


--
Marc Glisse


Re: Integer division on x86 -m32

2020-12-10 Thread Marc Glisse

On Thu, 10 Dec 2020, Lucas de Almeida via Gcc wrote:


when performing (int64_t) foo / (int32_t) bar in gcc under x86, a call to
__divdi3 is always output, even though it seems the use of the idiv
instruction could be faster.


IIRC, idiv requires that the quotient fit in 32 bits, while your C code 
doesn't. (1LL << 60) / 3 would cause an error with idiv.


It would be possible to use idiv in some cases, if the compiler can prove 
that variables are in the right range, but that's not so easy. You can use 
inline asm to force the use of idiv if you know it is safe for your case, 
the most common being modular arithmetic: if you know that uint32_t a, b, 
c, d are smaller than m (and m!=0), you can compute a*b+c+d in uint64_t, 
then use div to compute that modulo m.
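
A sketch of that last trick (x86-specific, my code; it relies on all inputs
being below m, so the quotient fits in 32 bits):

#include <stdint.h>

/* Compute (a*b + c + d) % m with one 64-by-32 divide.  Assumes a, b, c, d
   are all < m and m != 0, so n < m*m and the quotient fits in 32 bits.  */
static inline uint32_t
mod_mul_add (uint32_t a, uint32_t b, uint32_t c, uint32_t d, uint32_t m)
{
  uint64_t n = (uint64_t) a * b + c + d;
  uint32_t q, r;
  __asm__ ("divl %4"                       /* divides edx:eax by m */
           : "=a" (q), "=d" (r)
           : "a" ((uint32_t) n), "d" ((uint32_t) (n >> 32)), "rm" (m));
  return r;
}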


--
Marc Glisse


Re: What is the type of vector signed + vector unsigned?

2020-12-29 Thread Marc Glisse

On Tue, 29 Dec 2020, Richard Sandiford via Gcc wrote:


Any thoughts on what f should return in the following testcase, given the
usual GNU behaviour of treating signed >> as arithmetic shift right?

   typedef int vs4 __attribute__((vector_size(16)));
   typedef unsigned int vu4 __attribute__((vector_size(16)));
   int
   f (void)
   {
 vs4 x = { -1, -1, -1, -1 };
 vu4 y = { 0, 0, 0, 0 };
 return ((x + y) >> 1)[0];
   }

The C frontend takes the type of x+y from the first operand, so x+y
is signed and f returns -1.


Symmetry is an important property of addition in C/C++.


The C++ frontend applies similar rules to x+y as it would to scalars,
with unsigned T having a higher rank than signed T, so x+y is unsigned
and f returns 0x7fffffff.


That looks like the most natural choice.


FWIW, Clang treats x+y as signed, so f returns -1 for both C and C++.


I think clang follows gcc and uses the type of the first operand.

--
Marc Glisse


Re: bug in DSE?

2021-02-12 Thread Marc Glisse

On Fri, 12 Feb 2021, Andrew MacLeod via Gcc wrote:

I don't want to immediately open a PR, so I'll just ask about 
testsuite/gcc.dg/pr83609.c.


the compilation string  is
  -O2 -fno-tree-forwprop -fno-tree-ccp -fno-tree-fre -fno-tree-pre 
-fno-code-hoisting


Which passes as is.

if I however add -fno-tree-vrp as well, then it looks like dead store 
elimination maybe does something wrong...


with EVRP running, we translate function foo() from


complex float foo ()
{
  complex float c;
  complex float * c.0_1;
  complex float _4;

<bb 2>:
  c.0_1 = &c;
  MEM[(long long unsigned int *)c.0_1] = 1311768467463790320;
  _4 = c;


Isn't that a clear violation of strict aliasing?

--
Marc Glisse


Re: Possible issue with ARC gcc 4.8

2015-07-05 Thread Marc Glisse

On Mon, 6 Jul 2015, Vineet Gupta wrote:


It is the C language standard that says that shifts like this invoke
undefined behavior.


Right, but the compiler is a program nevertheless and it knows what to do when 
it
sees 1 << 62
It's not like there is an uninitialized variable or something which will provide
unexpected behaviour.
More importantly, the question is can ports define a specific behaviour for such
cases and whether that would be sufficient to guarantee the semantics.

The point being ARC ISA provides a neat feature where core only considers lower 
5
bits of bitpos operands. Thus we can make such behaviour not only deterministic 
in
the context of ARC, but also optimal, eliding the need for doing specific
masking/clamping to 5 bits.


IMO, writing a << (b & 31) instead of a << b has only advantages. It 
documents the behavior you are expecting. It makes the code 
standard-conformant and portable. And the back-ends can provide patterns 
for exactly this so they generate a single insn (the same as for a << b).
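
For instance (sketch, assuming 32-bit unsigned):

unsigned
shl_masked (unsigned a, unsigned b)
{
  /* Well-defined for any b; x86 (and ARC, as described above) masks the
     shift count in hardware, so the & typically costs nothing.  */
  return a << (b & 31);
}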


When I see x << 1024, 0 is the only value that makes sense to me, and I'd 
much rather get undefined behavior (detected by sanitizers) than silently 
get 'x' back.


--
Marc Glisse


Re: [RFH] Move some flag_unsafe_math_optimizations using simplify and match

2015-08-11 Thread Marc Glisse

On Fri, 7 Aug 2015, Hurugalawadi, Naveen wrote:


Please find attached the patch "simplify-1.patch" that moves some
"flag_unsafe_math_optimizations" from fold-const.c to simplify and match.


Some random comments (not a review).

First, patches go to gcc-patc...@gcc.gnu.org.


 /* fold_builtin_logarithm */
 (if (flag_unsafe_math_optimizations)


Please indent everything below by one space.


+
+/* Simplify sqrt(x) * sqrt(x) -> x.  */
+(simplify
+ (mult:c (SQRT @0) (SQRT @0))


(mult (SQRT@1 @0) @1)


+ (if (!HONOR_SNANS (element_mode (type)))


You don't need element_mode here, HONOR_SNANS (type) should do the right
thing.


+  @0))
+
+/* Simplify root(x) * root(y) -> root(x*y).  */
+/* FIXME : cbrt ICE's with AArch64.  */
+(for root (SQRT CBRT)


Indent below.


+(simplify
+ (mult:c (root @0) (root @1))


No need to commute, it yields the same pattern. On the other hand, you
may want root:s since if the roots are going to be computed anyway, a
multiplication is cheaper than computing yet another root (I didn't
check what the existing code does).
(this applies to several other patterns)


+  (root (mult @0 @1))))
+
+/* Simplify expN(x) * expN(y) -> expN(x+y). */
+(for exps (EXP EXP2)
+/* FIXME : exp2 ICE's with AArch64.  */
+(simplify
+ (mult:c (exps @0) (exps @1))
+  (exps (plus @0 @1))))


I am wondering if we should handle mixed operations (say
expf(x)*exp2(y)), for this pattern and others, but that's not a
prerequisite.


+
+/* Simplify pow(x,y) * pow(x,z) -> pow(x,y+z). */
+(simplify
+ (mult:c (POW @0 @1) (POW @0 @2))
+  (POW @0 (plus @1 @2)))
+
+/* Simplify pow(x,y) * pow(z,y) -> pow(x*z,y). */
+(simplify
+ (mult:c (POW @0 @1) (POW @2 @1))
+  (POW (mult @0 @2) @1))
+
+/* Simplify tan(x) * cos(x) -> sin(x). */
+(simplify
+ (mult:c (TAN @0) (COS @0))
+  (SIN @0))


Since this will only trigger for the same version of cos and tan (say cosl 
with tanl or cosf with tanf), I am wondering if we get smaller code with a 
linear 'for' or with a quadratic 'for' which shares the same tail (I 
assume the above is quadratic, I did not check). This may depend on 
Richard's latest patches.



+
+/* Simplify x * pow(x,c) -> pow(x,c+1). */
+(simplify
+ (mult:c @0 (POW @0 @1))
+ (if (TREE_CODE (@1) == REAL_CST
+  && !TREE_OVERFLOW (@1))
+  (POW @0 (plus @1 { build_one_cst (type); }))))
+
+/* Simplify sin(x) / cos(x) -> tan(x). */
+(simplify
+ (rdiv (SIN @0) (COS @0))
+  (TAN @0))
+
+/* Simplify cos(x) / sin(x) -> 1 / tan(x). */
+(simplify
+ (rdiv (COS @0) (SIN @0))
+  (rdiv { build_one_cst (type); } (TAN @0)))
+
+/* Simplify sin(x) / tan(x) -> cos(x). */
+(simplify
+ (rdiv (SIN @0) (TAN @0))
+ (if (! HONOR_NANS (@0)
+  && ! HONOR_INFINITIES (element_mode (@0)))
+  (COS @0)))
+
+/* Simplify tan(x) / sin(x) -> 1.0 / cos(x). */
+(simplify
+ (rdiv (TAN @0) (SIN @0))
+ (if (! HONOR_NANS (@0)
+  && ! HONOR_INFINITIES (element_mode (@0)))
+  (rdiv { build_one_cst (type); } (COS @0))))
+
+/* Simplify pow(x,c) / x -> pow(x,c-1). */
+(simplify
+ (rdiv (POW @0 @1) @0)
+ (if (TREE_CODE (@1) == REAL_CST
+  && !TREE_OVERFLOW (@1))
+  (POW @0 (minus @1 { build_one_cst (type); }))))
+
+/* Simplify a/root(b/c) into a*root(c/b).  */
+/* FIXME : cbrt ICE's with AArch64.  */
+(for root (SQRT CBRT)
+(simplify
+ (rdiv @0 (root (rdiv @1 @2)))
+  (mult @0 (root (rdiv @2 @1)))))
+
+/* Simplify x / expN(y) into x*expN(-y). */
+/* FIXME : exp2 ICE's with AArch64.  */
+(for exps (EXP EXP2)
+(simplify
+ (rdiv @0 (exps @1))
+  (mult @0 (exps (negate @1)))))
+
+/* Simplify x / pow (y,z) -> x * pow(y,-z). */
+(simplify
+ (rdiv @0 (POW @1 @2))
+  (mult @0 (POW @1 (negate @2))))
+
  /* Special case, optimize logN(expN(x)) = x.  */
  (for logs (LOG LOG2 LOG10)
   exps (EXP EXP2 EXP10)


--
Marc Glisse


Re: Replacing malloc with alloca.

2015-09-14 Thread Marc Glisse

On Sun, 13 Sep 2015, Ajit Kumar Agarwal wrote:


The replacement of malloc with alloca can be done based on the following analysis.

If the lifetime of an object does not stretch beyond its immediate scope,
the malloc can be replaced with alloca.
This increases performance to a great extent.

Inlining helps a great deal here: after inlining it is easier to see that the
lifetime of an object doesn't stretch beyond its immediate scope,
and the opportunities for replacing malloc with alloca can be identified.

I am wondering in what phase of our optimization pipeline malloc is replaced
with alloca, and what analysis is done for the transformation.
Does this greatly increase the performance of benchmarks? Is the analysis
done through escape analysis?

If yes, then what data structure is used for the abstract execution
interpretation?


Did you try it? I don't think gcc ever replaces malloc with alloca. The 
only optimization we do with malloc/free is removing it when it is 
obviously unused. There are several PRs open about possible optimizations 
(19831 for instance).


I posted a WIP patch a couple years ago to replace some malloc+free with 
local arrays (fixed length) but never had time to finish it.

https://gcc.gnu.org/ml/gcc-patches/2013-11/msg03108.html
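
To illustrate the kind of rewrite that patch aimed for (my sketch, not
compiler output; it is only valid if the pointer does not escape):

#include <stdlib.h>

void consume (char *p);  /* assumed not to stash the pointer */

void
before (void)
{
  char *p = (char *) malloc (32);
  if (!p)
    return;
  consume (p);
  free (p);   /* freed on every path, small constant size...  */
}

void
after (void)
{
  char p[32]; /* ...so the allocation can move to the stack */
  consume (p);
}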

--
Marc Glisse


Re: Multiprecision Arithmetic Builtins

2015-09-21 Thread Marc Glisse

On Mon, 21 Sep 2015, Florian Weimer wrote:


On 09/21/2015 08:09 AM, Oleg Endo wrote:

Hi all,

I was thinking of adding some SH specific builtin functions for the
addc, subc and negc instructions.

Are there any plans to add clang's target independent multiprecision
arithmetic builtins (http://clang.llvm.org/docs/LanguageExtensions.html)
to GCC?


Do you mean these?

<https://gcc.gnu.org/onlinedocs/gcc/Integer-Overflow-Builtins.html>

Is there something else that is missing?


http://clang.llvm.org/docs/LanguageExtensions.html#multiprecision-arithmetic-builtins

Those that take a carryin argument.

--
Marc Glisse


Re: avoiding recursive calls of calloc due to optimization

2015-09-21 Thread Marc Glisse

On Mon, 21 Sep 2015, Daniel Gutson wrote:


This is derived from https://gcc.gnu.org/ml/gcc-help/2015-03/msg00091.html

Currently, gcc provides an optimization that transforms a call to
malloc and a call to memset into a call to calloc.
This is fine except when it takes place within the calloc() function
implementation itself, causing a recursive call.
Two alternatives have been proposed: -fno-malloc-builtin and disable
optimizations in calloc().
I think the former is suboptimal since it affects all the code just
because of the implementation of one function (calloc()),
whereas the latter is suboptimal too since it disables the
optimizations in the whole function (calloc too).
I think of two alternatives: either make -fno-calloc-builtin to
disable the optimization, or make the optimization aware of the
function context where it is operating and prevent it to do the
transformation if the function is calloc().

Please help me to find the best alternative so we can implement it.


You may want to read this PR for more context

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56888#c27
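
For reference, the problematic shape inside a C library looks like this
(sketch; a real implementation would also check n * sz for overflow):

#include <stddef.h>
#include <string.h>

extern void *malloc (size_t);

void *
calloc (size_t n, size_t sz)
{
  void *p = malloc (n * sz);
  if (p)
    memset (p, 0, n * sz);  /* gcc may fold malloc+memset back into calloc */
  return p;
}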

--
Marc Glisse


Re: complex support when using -std=c++11

2015-11-12 Thread Marc Glisse

On Thu, 12 Nov 2015, D Haley wrote:

I am currently trying to understand an issue to do with complex number 
support in gcc.


Consider the following code:

#include <complex.h>
int main()
{
   float _Complex  a = _Complex_I;

}

Attempting to compile this with  these commands is fine:
$ g++ tmp.cpp -std=gnu++11
$ g++ tmp.cpp

Clang is also fine:
$ clang tmp.cpp -std=c++11


Not here, I am getting the same error with clang (or "use of undeclared 
identifier '_Complex_I'" with libc++). This probably depends more on your 
libc.



Attempting to compile with c++11 is not:
$ g++ tmp.cpp -std=c++11
In file included from /usr/include/c++/5/complex.h:36:0,
from tmp.cpp:2:
tmp.cpp: In function ‘int main()’:
tmp.cpp:5:29: error: unable to find numeric literal operator ‘operator""iF’
float _Complex  a = _Complex_I;
^
tmp.cpp:5:29: note: use -std=gnu++11 or -fext-numeric-literals to enable more 
built-in suffixes


I'm using debian testing's gcc:
$ gcc --version
gcc (Debian 5.2.1-17) 5.2.1 20150911
...


I discussed this on #gcc, and it was suggested (or I misunderstood) that this 
is intentional, and the library should not support c-type C++ primitives - 
however I can find no deprecation notice for this, nor does it appear that 
the c++11 standard (as far as I can see from a quick skim) has changed the 
behaviour in this regard.


Is this intended behaviour, or is this a bug? This behaviour was noticed when 
troubleshooting compilation behaviours in mathgl.


https://groups.google.com/forum/?_escaped_fragment_=topic/mathgl/cl4uYygPmOU#!topic/mathgl/cl4uYygPmOU


C++11, for some unknown reason, decided to hijack the C header complex.h 
and make it equivalent to the C++ header complex. The fact that you are 
still getting _Complex_I defined is already a gcc extension, as is 
providing _Complex in C++.


The C++ standard introduced User Defined Literals, which prevents the 
compiler from recognizing extra suffixes like iF in standard mode (why are 
so many people using c++11 and not gnu++11?).


Our support for complex.h in C++11 in gcc is kind of best-effort. In this 
case, I can think of a couple ways we could improve this


* _Complex_I is defined as (__extension__ 1.0iF). Maybe __extension__ 
could imply -fext-numeric-literals?


* glibc could define _Complex_I some other way, or libstdc++ could 
redefine it to some other safer form (for some reason __builtin_complex is 
currently C-only).


--
Marc Glisse


Re: GCC 5.4 Status report (2015-12-04)

2015-12-04 Thread Marc Glisse

On Fri, 4 Dec 2015, NightStrike wrote:


Will there be another 4.9 release, too?  I'm really hoping that branch
can stay open a bit, since I can't upgrade to the new std::string
implementation yet.


Uh? The new ABI in libstdc++ is supposed to be optional, you can still use 
the old std::string in gcc-5, can't you?
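
For reference, the choice is made per translation unit with a libstdc++
macro:

// Request the old (COW) std::string and std::list even when compiling
// with gcc-5; must be defined before including any libstdc++ header.
#define _GLIBCXX_USE_CXX11_ABI 0
#include <string>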


--
Marc Glisse


RE: GCC Front-End Questions

2015-12-08 Thread Marc Glisse

On Tue, 8 Dec 2015, Jodi A. Miller wrote:

One algebraic simplification we are seeing is particularly interesting.  
Given the following code snippet intended to check for buffer overflow, 
which is actually undefined behavior in C++, we expected to maybe see 
the if check optimized away entirely.




char buffer[100];
int length;  //value received through argument or command line
.
.
if (buffer + length < buffer)
{
    cout << "Overflow" << endl;
}


Instead, our assembly code showed that the conditional was changed to 
length < 0, which is not what was intended at all.  Again, this showed 
up in the first IR file generated with g++ so we are thinking it 
happened in the compiler front-end, which is surprising.  Any thoughts 
on this?  In addition, when the above conditional expression is not used 
as part of an if check (e.g., assigned to a Boolean), it is not 
simplified.


Those optimizations during parsing exist mostly for historical reasons, 
and we are slowly moving away from them. You can look for any function 
call including "fold" in its name in the front-end. They work on 
expressions and mostly consist of matching patterns (described in 
fold-const.c and match.pd), like p + n < p in this case.
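
A standard-conforming way to write the intended check (my sketch) avoids
forming the out-of-bounds pointer altogether:

#include <stddef.h>

/* Compare against the remaining room instead of forming an out-of-bounds
   pointer, which is undefined behavior.  */
int
would_overflow (size_t buffer_size, int length)
{
  return length < 0 || (size_t) length > buffer_size;
}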


--
Marc Glisse


Re: Strange C++ function pointer test

2015-12-31 Thread Marc Glisse

On Thu, 31 Dec 2015, Dominik Vogt wrote:


This snippet is from the Plumhall 2014 xvs test suite:

 #if CXX03 || CXX11 || CXX14
 static float (*p1_)(float) = abs;
 ...
 checkthat(__LINE__, p1_ != 0);
 #endif

(With the testsuite specific macros doing the obvious).  abs() is
declared as:

 int abs(int j)

Am I missing some odd C++ feature or is that part of the test just
plain wrong?  I don't know where to look in the C++ standard; is
this supposed to compile (with or without a warning?) or generate
an error or is it just undefined?

 error: invalid conversion from ‘int (*)(int) throw ()’ to ‘float (*)(float)’ 
[-fpermissive]

(Of course even with -fpermissive this won't work because (at
least on my platform) ints are passed in different registers than
floats.)


There are other overloads of 'abs' declared in math.h / cmath (only in 
namespace std in the second case, and there are bugs (or standard issues) 
about having them in the global namespace for the first one).


--
Marc Glisse


Re: Strange C++ function pointer test

2015-12-31 Thread Marc Glisse

On Thu, 31 Dec 2015, Jonathan Wakely wrote:


There are other overloads of 'abs' declared in math.h / cmath (only in
namespace std in the second case, and there are bugs (or standard issues)
about having them in the global namespace for the first one).


That's not quite accurate, C++11 was altered slightly to reflect reality.

<cmath> is required to declare std::abs and it's unspecified whether
it also declares it as ::abs.

<math.h> is required to declare ::abs and it's unspecified whether it
also declares it as std::abs.


$ cat a.cc
#include <cmath>
int main(){
  abs(3.5);
}

$ g++-snapshot a.cc -c -Wall -W
a.cc: In function 'int main()':
a.cc:3:10: error: 'abs' was not declared in this scope
   abs(3.5);
  ^

That's what I called "bug" in my message (there are a few bugzilla PRs for 
this). It would probably work on Solaris.


And I seem to remember there are at least 2 open LWG issues on the topic, 
one saying that the C++11 change didn't go far enough to match reality, 
since it still documents C headers differently from the C standard, and 
one saying that all overloads of abs should be declared as soon as one is 
(yes, they contradict each other).


--
Marc Glisse


Re: Strange C++ function pointer test

2015-12-31 Thread Marc Glisse

On Thu, 31 Dec 2015, Dominik Vogt wrote:


The minimal failing program is

-- abs.C --
#include <stdlib.h>
static float (*p1_)(float) = abs;
-- abs.C --


This is allowed to fail. If you include math.h (in addition or instead of 
stdlib.h), it has to work (gcc bug if it doesn't).


See also
http://www.open-std.org/jtc1/sc22/wg21/docs/lwg-active.html#2294

--
Marc Glisse


Re: getting bugzilla access for my account

2016-01-02 Thread Marc Glisse

On Sat, 2 Jan 2016, Mike Frysinger wrote:


seeing as how i have commit access to the gcc tree, could i have
my bugzilla privs extended as well ?  atm i only have normal ones
which means i only get to edit my own bugs ... can't dupe/update
other ones people have filed.  couldn't seem to find docs for how
to request this, so spamming this list.

my account on gcc.gnu.org/bugzilla is "vap...@gentoo.org".


Permissions are automatic for @gcc addresses, you should create a new 
account with that one (you can make it follow the old account, etc).


--
Marc Glisse


Re: RFC: Update Intel386, x86-64 and IA MCU psABIs for passing/returning empty struct

2016-02-20 Thread Marc Glisse

On Sat, 20 Feb 2016, H.J. Lu wrote:


On Fri, Feb 19, 2016 at 1:07 PM, Richard Smith  wrote:

On Fri, Feb 19, 2016 at 5:35 AM, Michael Matz  wrote:

Hi,

On Thu, 18 Feb 2016, Richard Smith wrote:


An empty type is a type where it and all of its subobjects
(recursively) are of class, structure, union, or array type.  No
memory slot nor register should be used to pass or return an object
of empty type.


The trivially copyable is gone again.  Why is it not necessary?


The C++ ABI doesn't defer to the C psABI for types that aren't
trivially-copyable. See
http://mentorembedded.github.io/cxx-abi/abi.html#normal-call


Hmm, yes, but we don't want to define something for only C and C++, but
language independend (so far as possible).  And given only the above
language I think this type:

struct S {
  S() {something();}
};

would be an empty type, and that's not what we want.


Yes it is. Did you mean to give S a copy constructor, copy assignment
operator, or destructor instead?


"Trivially copyable"
is a reasonably common abstraction (if in doubt we could even define it in
the ABI), and captures the idea that we need well (namely that a bit-copy
is enough).


In this case:

struct dummy0
{
};

struct dummy
{
 dummy0 d[20];

 dummy0 * foo (int i);
};

dummy0 *
dummy::foo (int i)
{
 return &d[i];
}

dummy0 *
bar (dummy d, int i)
{
 return d.foo (i);
}

dummy shouldn't be passed as empty type.


Why not?

We need to have a clear definition for what kinds of member functions 
are allowed in an empty type.


--
Marc Glisse


Re: Subtyping support in GCC?

2016-03-23 Thread Marc Glisse

On Wed, 23 Mar 2016, Jason Chagas wrote:


The ARM compiler (armcc) provides a subtyping ($Sub/$Super)
mechanism useful as a patching technique (see links below for
details). Can someone tell me if GCC has similar support? If so, where
can I learn more about it?

FYI, before posting this question here, I researched the web
extensively on this topic. There seems to be some GNU support for
subtyping in C++.  But I had no luck finding any information
specifically for 'C'.

Thanks,

Jason

How to use $Super$$ and $Sub$$ for patching data?:
http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.faqs/ka15416.html

Using $Super$$ and $Sub$$ to patch symbol definitions:
http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0474c/Chdefdce.html


(the best list would have been gcc-h...@gcc.gnu.org)

GNU ld has an option --wrap=symbol. Does that roughly match your need?
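
For reference, the scheme looks like this (sketch; "compute" is a made-up
symbol name): calls to compute are redirected to __wrap_compute, and the
original stays reachable as __real_compute.

/* Link with: gcc main.o patch.o -Wl,--wrap=compute  */
int __real_compute (int x);   /* resolves to the original definition */

int
__wrap_compute (int x)        /* every call to compute() lands here */
{
  if (x < 0)
    return 0;                 /* patched behavior */
  return __real_compute (x);
}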

--
Marc Glisse


Re: Constexpr in intrinsics?

2016-03-27 Thread Marc Glisse

On Sun, 27 Mar 2016, Allan Sandfeld Jensen wrote:


Would it be possible to add constexpr to the intrinsics headers?

For instance _mm_set_XX and _mm_setzero intrinsics.


Already suggested here:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65197

A patch would be welcome (I started doing it at some point, I don't 
remember if it was functional, the patch is attached).



Ideally it could also be added to all intrinsics that can be evaluated at compile
time, but it is harder to tell which those are.

Does gcc have a C extension we can use to set constexpr?


What for?

--
Marc Glisse

Index: gcc/config/i386/avx2intrin.h
===
--- gcc/config/i386/avx2intrin.h(revision 223886)
+++ gcc/config/i386/avx2intrin.h(working copy)
@@ -93,41 +93,45 @@ _mm256_packus_epi32 (__m256i __A, __m256
   return (__m256i)__builtin_ia32_packusdw256 ((__v8si)__A, (__v8si)__B);
 }
 
 extern __inline __m256i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm256_packus_epi16 (__m256i __A, __m256i __B)
 {
   return (__m256i)__builtin_ia32_packuswb256 ((__v16hi)__A, (__v16hi)__B);
 }
 
+__GCC_X86_CONSTEXPR11
 extern __inline __m256i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm256_add_epi8 (__m256i __A, __m256i __B)
 {
   return (__m256i) ((__v32qu)__A + (__v32qu)__B);
 }
 
+__GCC_X86_CONSTEXPR11
 extern __inline __m256i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm256_add_epi16 (__m256i __A, __m256i __B)
 {
   return (__m256i) ((__v16hu)__A + (__v16hu)__B);
 }
 
+__GCC_X86_CONSTEXPR11
 extern __inline __m256i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm256_add_epi32 (__m256i __A, __m256i __B)
 {
   return (__m256i) ((__v8su)__A + (__v8su)__B);
 }
 
+__GCC_X86_CONSTEXPR11
 extern __inline __m256i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm256_add_epi64 (__m256i __A, __m256i __B)
 {
   return (__m256i) ((__v4du)__A + (__v4du)__B);
 }
 
 extern __inline __m256i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm256_adds_epi8 (__m256i __A, __m256i __B)
@@ -167,20 +171,21 @@ _mm256_alignr_epi8 (__m256i __A, __m256i
 }
 #else
 /* In that case (__N*8) will be in vreg, and insn will not be matched. */
 /* Use define instead */
 #define _mm256_alignr_epi8(A, B, N)   \
   ((__m256i) __builtin_ia32_palignr256 ((__v4di)(__m256i)(A), \
(__v4di)(__m256i)(B),  \
(int)(N) * 8))
 #endif
 
+__GCC_X86_CONSTEXPR11
 extern __inline __m256i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm256_and_si256 (__m256i __A, __m256i __B)
 {
   return (__m256i) ((__v4du)__A & (__v4du)__B);
 }
 
 extern __inline __m256i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm256_andnot_si256 (__m256i __A, __m256i __B)
@@ -219,69 +224,77 @@ _mm256_blend_epi16 (__m256i __X, __m256i
   return (__m256i) __builtin_ia32_pblendw256 ((__v16hi)__X,
  (__v16hi)__Y,
   __M);
 }
 #else
 #define _mm256_blend_epi16(X, Y, M)\
   ((__m256i) __builtin_ia32_pblendw256 ((__v16hi)(__m256i)(X), \
(__v16hi)(__m256i)(Y), (int)(M)))
 #endif
 
+__GCC_X86_CONSTEXPR11
 extern __inline __m256i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm256_cmpeq_epi8 (__m256i __A, __m256i __B)
 {
   return (__m256i) ((__v32qi)__A == (__v32qi)__B);
 }
 
+__GCC_X86_CONSTEXPR11
 extern __inline __m256i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm256_cmpeq_epi16 (__m256i __A, __m256i __B)
 {
   return (__m256i) ((__v16hi)__A == (__v16hi)__B);
 }
 
+__GCC_X86_CONSTEXPR11
 extern __inline __m256i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm256_cmpeq_epi32 (__m256i __A, __m256i __B)
 {
   return (__m256i) ((__v8si)__A == (__v8si)__B);
 }
 
+__GCC_X86_CONSTEXPR11
 extern __inline __m256i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm256_cmpeq_epi64 (__m256i __A, __m256i __B)
 {
   return (__m256i) ((__v4di)__A == (__v4di)__B);
 }
 
+__GCC_X86_CONSTEXPR11
 extern __inline __m256i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm256_cmpgt_epi8 (__m256i __A, __m256i __B)
 {
   return (__m256i) ((__v32qi)__A > (__v32qi)__B);
 }
 
+__GCC_X86_CONSTEXPR11
 extern __inline __m256i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm256_cmpgt_epi16 (__m256i __A, __m256i __B)
 {
   return (__m256i) ((__v16hi)__A > (__v16hi)__B);
 }
 
+__GCC_X86_CONSTEXPR11
 extern __inline __m256i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm256_cmpgt_epi32 (__m256i __A, __m256i __B)
 {
   

Re: Constexpr in intrinsics?

2016-03-28 Thread Marc Glisse

On Mon, 28 Mar 2016, Allan Sandfeld Jensen wrote:


On Sunday 27 March 2016, Marc Glisse wrote:

On Sun, 27 Mar 2016, Allan Sandfeld Jensen wrote:

Would it be possible to add constexpr to the intrinsics headers?

For instance _mm_set_XX and _mm_setzero intrinsics.


Already suggested here:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65197

A patch would be welcome (I started doing it at some point, I don't
remember if it was functional, the patch is attached).


That looks very similar to the patch I experimented with, and that at least
works for using them in C++11 constexpr functions.


Ideally it could also be added to all intrinsics that can be evaluated at
compile time, but it is harder to tell which those are.

Does gcc have a C extension we can use to set constexpr?


What for?


To have similar functionality in C. For instance to explicitly allow those
functions to be evaluated at compile time, and values with similar attributes
be optimized completely out.


Those intrinsics that are implemented without builtins can already be 
evaluated at compile time.


#include 

__m128d f(){
  __m128d a=_mm_set_pd(1,2);
  __m128d b=_mm_setr_pd(4,3);
  return _mm_add_pd(a, b);
}

The generated asm is just

movapd  .LC0(%rip), %xmm0
ret

For the more esoteric intrinsics, what is missing is not in the parser, it 
is a folder that understands the behavior of each particular intrinsic.



And of course to avoid preprocessor noise in
shared C/C++ headers like these.


--
Marc Glisse


Re: Updating the GCC 6 release notes

2016-05-03 Thread Marc Glisse

On Tue, 3 May 2016, Damian Rouson wrote:


Could someone please tell me how to edit or submit edits for the GCC 6 release 
notes at https://gcc.gnu.org/gcc-6/changes.html?  Specifically, the listed Fortran 
improvements are missing several significant items.   I signed the copyright 
assignment in case that helps.


https://gcc.gnu.org/about.html#cvs

You can send a diff to gcc-patc...@gcc.gnu.org to propose a patch 
(possibly Cc: the fortran mailing-list if your patch is related), same as 
code changes.


--
Marc Glisse


Re: Implicit conversion to a generic vector type

2016-05-25 Thread Marc Glisse

On Thu, 26 May 2016, martin krastev wrote:


Hello,

I've been scratching my head over an implicit conversion issue,
depicted in the following code:


typedef __attribute__ ((vector_size(4 * sizeof(int)))) int generic_int32x4;

struct Foo {
   Foo() {
   }
   Foo(const generic_int32x4& src) {
   }
   operator generic_int32x4() const {
   return (generic_int32x4){ 42 };
   }
};

struct Bar {
   Bar() {
   }
   Bar(const int src) {
   }
   operator int() const {
   return 42;
   }
};

int main(int, char**) {

   const Bar b = Bar() + Bar();
   const generic_int32x4 v = (generic_int32x4){ 42 } + (generic_int32x4){ 42 };
   const Foo e = generic_int32x4(Foo()) + generic_int32x4(Foo());
   const Foo f = Foo() + Foo();
   const Foo g = (generic_int32x4){ 42 } + Foo();
   const Foo h = Foo() + (generic_int32x4){ 42 };
   return 0;
}

In the above, the initialization expression for local 'b' compiles as
expected, and so do the expressions for locals 'v' and 'e'. The
initializations of locals 'f', 'g' and 'h', though, fail to compile
(under g++-6.1.1, likewise under 5.x and 4.x) with:

$ g++-6 xxx.cpp
xxx.cpp: In function ‘int main(int, char**)’:
xxx.cpp:28:22: error: no match for ‘operator+’ (operand types are
‘Foo’ and ‘Foo’)
 const Foo f = Foo() + Foo();
   ~~^~~
xxx.cpp:29:40: error: no match for ‘operator+’ (operand types are
‘generic_int32x4 {aka __vector(4) int}’ and ‘Foo’)
 const Foo g = (generic_int32x4){ 42 } + Foo();
~~~^~~
xxx.cpp:30:22: error: no match for ‘operator+’ (operand types are
‘Foo’ and ‘generic_int32x4 {aka __vector(4) int}’)
 const Foo h = Foo() + (generic_int32x4){ 42 };
   ~~^

Apparently there is some implicit conversion rule that stops g++ from
doing the expected implicit conversions, but I can't figure out which
rule that is. The fact clang handles the code without an issue does
not help either. Any help will be appreciated.


https://gcc.gnu.org/bugzilla/show_bug.cgi?id=57572

--
Marc Glisse


Re: Implicit conversion to a generic vector type

2016-05-26 Thread Marc Glisse

On Thu, 26 May 2016, martin krastev wrote:


Thank you for the reply. So it's a known g++ issue with a candidate
patch. Looking at the patch, I was wondering, what precludes the
generic vector types from being proper arithmetic types?


In some cases vectors act like arithmetic types (operator+, etc), and in 
others they don't (conversions in general). We have scalarish_type_p for 
things that are scalars or vectors, we could add arithmeticish_type_p ;-)


(I think the name arithmetic comes directly from the standard, so we don't 
want to change its meaning)


--
Marc Glisse


Re: Implicit conversion to a generic vector type

2016-05-27 Thread Marc Glisse

On Fri, 27 May 2016, martin krastev wrote:


A new arithmeticish type would take more effort, I understand. Marc,
are there plans to incorporate your patch, perhaps in an extended
form, in a release any time soon?


There is no plan either way. When someone is motivated enough (I am not, 
currently), they will submit a patch to gcc-patc...@gcc.gnu.org, which 
will be reviewed. Note that a patch needs to include testcases (see the 
files in gcc/testsuite/g++.dg for examples). If you are interested, you 
could give it a try...


--
Marc Glisse


Re: An issue with GCC 6.1.0's make install?

2016-06-04 Thread Marc Glisse

On Sat, 4 Jun 2016, Ethin Probst wrote:


Yesterday I managed to successfully build GCC and all of the
accompanying languages that it supports by default (Ada, C, C++,
Fortran, Go, Java, Objective-C, Objective-C++, and Link-time
Optimization (LTO)). I did not build JIT support because I have not
heard if it is stable or not.
Anyways, seeing as I didn't (and still do not) want to wait another 12
hours for that to build, I compressed it into a .tar.bz2 archive,


Did you use "make -j 8" (where 8 is roughly how many CPUs you have in your 
server)? 12 hours seems excessive.



copied it over to another server, decompressed it, and here's when the


Did you copy it to exactly the same path as on the original server, 
preserving time stamps, and do both servers have identical systems?



problems start. Keep in mind that I did ensure that all files were
compressed and extracted.
When I go into my build subdirectory build tree, and type "make
install -s", it installs gnat, gcc (and g++), gfortran, gccgo, and
gcj, but it errors out (and, subsequently, bales out) and says the
following:
Making install in tools
make[3]: *** [install-recursive] Error 1
make[2]: *** [install-recursive] Error 1
make[1]: *** [install-target-libjava] Error 2
make: *** [install] Error 2
And then:
$ gcj
gcj: error: libgcj.spec: No such file or directory


A more common approach would be to run "make install DESTDIR=/some/where", 
tar that directory, copy this archive to other servers, and untar it in 
the right location. That's roughly what linux distributions do.



I'm considering the test suite, but until it installs, I'm not sure if
executing the test suite would be very wise at this point. To get it
to say that no input file was specified, I have to manually run the
following commands:
$ cd x86_64-pc-linux-gnu/libjava
$ cp libgcj.spec /usr/bin


That seems like a strange location for this file.


Has the transportation of the source code caused the build tree to be
messed up? I know that it works perfectly fine on my other server.
Running make install without the -s command line parameter yields
nothing. Have I done something wrong?


"nothing" is not very helpful... Surely it gave some error message.

--
Marc Glisse


Re: [RFC][Draft patch] Introduce IntegerSanitizer in GCC.

2016-07-04 Thread Marc Glisse

On Mon, 4 Jul 2016, Maxim Ostapenko wrote:


Is community interested in such a tool?


On the one hand, it is clearly useful since you found bugs thanks to it.

On the other hand:

1) I hope we never reach the situation caused by Microsoft's infamous
warning C4146 (which is even an error if you enable "secure" mode),
where projects writing perfectly legal bignum code keep getting
misguided reports by users who see those warnings.

2) This kind of encourages people to keep using unsigned types for 
non-negative integers, whereas they would be better reserved to bignum and 
bitfields (sadly, the standards make it hard to avoid unsigned types...).


--
Marc Glisse


Vector unaligned load/store x86 intrinsics

2016-08-25 Thread Marc Glisse

Hello,

I was considering changing the implementation of _mm_loadu_pd in x86's 
emmintrin.h to avoid a builtin. Here are 3 versions:


typedef double __m128d __attribute__ ((__vector_size__ (16), __may_alias__));
typedef double __m128d_u __attribute__ ((__vector_size__ (16), __may_alias__, 
aligned(1)));

__m128d f (double const *__P)
{
  return __builtin_ia32_loadupd (__P);
}

__m128d g (double const *__P)
{
  return *(__m128d_u*)(__P);
}

__m128d h (double const *__P)
{
  __m128d __r;
  __builtin_memcpy (&__r, __P, 16);
  return __r;
}


f is what we have currently. f and g generate the same code. h also 
generates the same code except at -O0 where it is slightly longer.


(note that I haven't regtested either version yet)

1) I don't have any strong preference between g and h, is there a reason 
to pick one over the other? I may have a slight preference for g, which 
expands to


  __m128d _3;
  _3 = MEM[(__m128d_u * {ref-all})__P_2(D)];

while h yields

  __int128 unsigned _3;
  _3 = MEM[(char * {ref-all})__P_2(D)];
  _4 = VIEW_CONVERT_EXPR<__m128d>(_3);


2) Reading Intel's doc for movupd, it says: "If alignment checking is 
enabled (CR0.AM = 1, RFLAGS.AC = 1, and CPL = 3), an alignment-check 
exception (#AC) may or may not be generated (depending on processor 
implementation) when the operand is not aligned on an 8-byte boundary." 
Since we generate movupd for memcpy even when the alignment is presumably 
only 1 byte, I assume that this alignment-check stuff is not supported by 
gcc?


--
Marc Glisse


Re: Vector unaligned load/store x86 intrinsics

2016-08-26 Thread Marc Glisse

On Fri, 26 Aug 2016, Richard Biener wrote:


On Thu, Aug 25, 2016 at 9:40 PM, Marc Glisse  wrote:

Hello,

I was considering changing the implementation of _mm_loadu_pd in x86's
emmintrin.h to avoid a builtin. Here are 3 versions:

typedef double __m128d __attribute__ ((__vector_size__ (16),
__may_alias__));
typedef double __m128d_u __attribute__ ((__vector_size__ (16),
__may_alias__, aligned(1)));

__m128d f (double const *__P)
{
  return __builtin_ia32_loadupd (__P);
}

__m128d g (double const *__P)
{
  return *(__m128d_u*)(__P);
}

__m128d h (double const *__P)
{
  __m128d __r;
  __builtin_memcpy (&__r, __P, 16);
  return __r;
}


f is what we have currently. f and g generate the same code. h also
generates the same code except at -O0 where it is slightly longer.

(note that I haven't regtested either version yet)

1) I don't have any strong preference between g and h, is there a reason to
pick one over the other? I may have a slight preference for g, which expands
to

  __m128d _3;
  _3 = MEM[(__m128d_u * {ref-all})__P_2(D)];

while h yields

  __int128 unsigned _3;
  _3 = MEM[(char * {ref-all})__P_2(D)];
  _4 = VIEW_CONVERT_EXPR<__m128d>(_3);


I prefer 'g' which is just more natural.


Ok, thanks.

Note that the C language requires that __P be aligned to alignof 
(double)  (not sure what the Intel intrinsic specs say here), and thus 
it doesn't allow arbitrary misalignment.  This means that you could use 
a slightly better aligned type with aligned(alignof(double)).


I had thought about it, but since we already generate movupd with 
aligned(1), it didn't really seem worth the trouble for this prototype.
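
A sketch of that intermediate variant (my naming, reusing the __m128d
typedef from above):

/* 8-byte alignment: what C already guarantees for a double const *,
   unlike the fully unaligned variant.  */
typedef double __m128d_u8 __attribute__ ((__vector_size__ (16),
                                          __may_alias__,
                                          aligned (__alignof__ (double))));

__m128d g8 (double const *__P)
{
  return *(__m128d_u8 *) __P;
}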


Or to be conforming the parameter should not be double const * but a 
double type variant with alignment 1 ...


Yeah, those intrinsics have issues:

__m128i _mm_loadu_si128 (__m128i const* mem_addr)
"mem_addr does not need to be aligned on any particular boundary."

that doesn't really make sense.

I may try to experiment with your suggestion, see if it breaks anything. 
Gcc seems happy to ignore those alignment differences when casting 
function pointers, so it should be fine.



2) Reading Intel's doc for movupd, it says: "If alignment checking is
enabled (CR0.AM = 1, RFLAGS.AC = 1, and CPL = 3), an alignment-check
exception (#AC) may or may not be generated (depending on processor
implementation) when the operand is not aligned on an 8-byte boundary."
Since we generate movupd for memcpy even when the alignment is presumably
only 1 byte, I assume that this alignment-check stuff is not supported by
gcc?


Huh, never heard of this.  Does this mean that mov_u_XX do alignment-check
exceptions?  I believe this would break almost all code (glibc memcpy, GCC
generated code, etc).  Thus it would require kernel support, emulating
the unaligned ops to still work (but record them somehow).


Elsewhere ( 
https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=_mm_loadu_pd&expand=3106,3115,3106,3124,3106&techs=SSE2 
) Intel doesn't mention this at all, it just says: "mem_addr does not need 
to be aligned on any particular boundary." So it might be a provision in 
the spec that was added just in case, but never implemented...


--
Marc Glisse


Re: Is this FE bug or am I missing something?

2016-09-12 Thread Marc Glisse

On Sun, 11 Sep 2016, Igor Shevlyakov wrote:


Small sample below fails (at least on 6.1) for multiple targets. The
difference between the two functions starts at the very first tree pass...


You are missing -fsanitize=undefined (and #include ).

Please use the mailing list gcc-h...@gcc.gnu.org next time.

--
Marc Glisse


Re: Is this FE bug or am I missing something?

2016-09-13 Thread Marc Glisse

On Mon, 12 Sep 2016, Igor Shevlyakov wrote:


Well, my concern is not what happens with overflow (which in second
case -fsanitize=undefined will address), but rather consistency of
that 2 cases.

p[x+1] generates RTL which leads to better generated code at the
expense of leading to overflow, while p[1+x] never overflows but leads
to worse code.
It would be beneficial to make the behaviour consistent between those 2 cases.


True. Your example with undefined behavior confused me as to what your 
point was.


For

int* f1(int* p, int x) { return &p[x + 1]; }
int* f2(int* p, int x) { return &p[1 + x]; }

we get in the gimple dump

  _1 = (sizetype) x;
  _2 = _1 + 1;
vs
  _1 = x + 1;
  _2 = (long unsigned int) _1;

The second one is a better starting point (it has more information about 
potential overflow), but the first one has the advantage that all numbers 
have the same size, which saves an instruction in the end


movslq  %esi, %rsi
leaq4(%rdi,%rsi,4), %rax
vs
addl$1, %esi
movslq  %esi, %rsi
leaq(%rdi,%rsi,4), %rax

We regularly discuss the potential benefits of a pass that would try to 
uniformize integer sizes...


In the mean time, I agree that gimplifying x+1 and 1+x differently makes 
little sense, you could file a PR about that.


--
Marc Glisse


Re: how to check if target supports andnot instruction ?

2016-10-12 Thread Marc Glisse

On Wed, 12 Oct 2016, Prathamesh Kulkarni wrote:


I was having a look at PR71636 and added the following pattern to match.pd:
x & ((1U << b) - 1) -> x & ~(~0U << b)
However the transform is useful only if the target supports "andnot"
instruction.


rth was selling the transformation as a canonicalization, which is 
beneficial when there is an andnot instruction, and neutral otherwise, so 
it could be done always.
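
For instance (sketch; the actual instruction selection depends on -march):

unsigned
mask_low_bits (unsigned x, unsigned b)
{
  /* Canonical form: on targets with an and-not instruction (x86 BMI's
     andn, for example) the ~ folds into the and.  */
  return x & ~(~0U << b);
}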



As pointed out by Marc in PR for -march=core2, lhs generates worse
code than rhs,
so we shouldn't do the transform if target doesn't support andnot insn.
(perhaps we could do the reverse transform for target not supporting andnot?)


Rereading my comment in the PR, I pointed out that instead of being 
neutral, the transformation was very slightly detrimental in one case (one 
extra mov) because of a RA issue. That doesn't mean we should avoid the 
transformation, just that we should fix the RA issue (by the way, if you 
have time to file a separate PR for the RA issue, that would be great, 
otherwise I'll try to do it at some point...).


However it seems andnot isn't a standard pattern name, so I am not sure
how to check whether the target supports an andnot insn?


--
Marc Glisse


Re: how to check if target supports andnot instruction ?

2016-10-13 Thread Marc Glisse

On Thu, 13 Oct 2016, Prathamesh Kulkarni wrote:


On 12 October 2016 at 14:43, Richard Biener  wrote:

On Wed, 12 Oct 2016, Marc Glisse wrote:


On Wed, 12 Oct 2016, Prathamesh Kulkarni wrote:


I was having a look at PR71636 and added the following pattern to match.pd:
x & ((1U << b) - 1) -> x & ~(~0U << b)
However the transform is useful only if the target supports "andnot"
instruction.


rth was selling the transformation as a canonicalization, which is beneficial
when there is an andnot instruction, and neutral otherwise, so it could be
done always.


Well, it's three instructions to three instructions and a more expensive
constant(?).  ~0U might not be available as immediate for the shift
instruction and 1U << b might be available as a bit-set instruction ...
(vs. the andnot).


True, I hadn't thought of bit-set.


So yes, we might decide to canonicalize to andnot (and decide that
three binary to two binary and one unary op is "better").

So no excuse to explore the target specific .pd fragment idea ... :/

Hi,
I have attached patch that adds the transform.
Does that look OK ?


Why bit_not of build_zero_cst instead of build_all_ones_cst, as suggested 
in the PR? If we only do the transformation when the (1U << b) - 1 value 
flows into the bit_and, then we probably want to require that it has a 
single use (maybe even the shift).



I am not sure how to write test-cases for it though.
For the test-case:
unsigned f(unsigned x, unsigned b)
{
 unsigned t1 = 1U << b;
 unsigned t2 = t1 - 1;
 unsigned t3 = x & t2;
 return t3;
}

forwprop dump shows:
Applying pattern match.pd:523, gimple-match.c:47419
gimple_simplified to _6 = 4294967295 << b_1(D);
_8 = ~_6;
t3_5 = x_4(D) & _8;

I could scan for "_6 = 4294967295 << b_1(D);"  however I suppose
~0 would depend on width of int and not always be 4294967295 ?
Or should I scan for "_6 = 4294967295 << b_1(D);"
and add /* { dg-require-effective int32 } */  to the test-case ?


You could check that you have ~, or that you don't have " 1 << ".
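
Putting that together, a sketch of such a testcase (the forwprop instance
number may need adjusting):

/* { dg-do compile } */
/* { dg-options "-O2 -fdump-tree-forwprop1" } */

unsigned f (unsigned x, unsigned b)
{
  unsigned t1 = 1U << b;
  unsigned t2 = t1 - 1;
  return x & t2;
}

/* Width-independent: just check the shift of 1 is gone.  */
/* { dg-final { scan-tree-dump-not " 1 << " "forwprop1" } } */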

--
Marc Glisse


Re: GCC 6.2.0 : What does the undocumented -r option ?

2016-11-07 Thread Marc Glisse

On Mon, 7 Nov 2016, Emmanuel Charpentier wrote:


The Sage project (http://www.sagemath.org) has recently hit an
interesting snag : its developers using Debian testing began to
encounter difficulties compiling the flint package
(http://groups.google.co.uk/group/flint-devel) with gcc 6.2.0.

One of us found (see
https://groups.google.com/d/msg/sage-devel/TduebNoZuBE/sEULolL0BQAJ)
that this was bound to a conflict between the -pie
option (now default) and an undocumented -r option.

We would like to know what is this -r option, what it does and why it
is undocumented.


(the mailing list you are looking for is gcc-h...@gcc.gnu.org)

As can be seen in the first message of the conversation you link to
"/usr/bin/ld: -r and -pie may not be used together"

The option -r is passed to ld, so you have to look for it in ld's manual 
where it is clearly documented.


(that hardening stuff is such a pain...)

--
Marc Glisse


Re: Need some help with a possible bug

2014-04-23 Thread Marc Glisse

(should have been gcc-h...@gcc.gnu.org, please send any follow-ups there)

On Wed, 23 Apr 2014, George R Goffe wrote:


I'm trying to build the latest gcc


Do you really need gcj? If not, please disable java.

and am getting a message from the 
process "collect2: error: ld returned 1 exit status" for this library 
/usr/lsd/Linux/lib/libgmp.so. Here's the full msg: 
"/usr/lsd/Linux/lib/libgmp.so: could not read symbols: File in wrong 
format"


You are doing a multilib build (--disable-multilib if you don't want 
that), so it tries to build both a 64 bit and a 32 bit versions of 
libjavamath.so, both of which want to link to GMP. So you need both 
versions of GMP installed as well.


I thought the configure script in classpath would detect your missing 32 
bit GMP and disable use of GMP in that case, but apparently not... You may 
want to file a PR in bugzilla about that if there isn't one already. But 
you'll need to provide more info there: your configure command line, the 
file config.log in the 32 bit version of classpath, etc.


--
Marc Glisse


Re: RTL representation of i386 shrdl instruction is incorrect?

2014-06-05 Thread Marc Glisse

On Thu, 5 Jun 2014, Niranjan Hasabnis wrote:


Thanks for your reply. I looked into some of the details of how that
particular RTL template is used. It seems to me that the particular
RTL template is used only when shifting 64-bit data type on a 32-bit
machine. This is the underlying assumption encoded in i386.c file
which generates that particular RTL only when instruction mode is
DImode. If that is the case, then it won't matter whether one uses
arithmetic shift or logical shift to right-shift the lower 4 bytes of an 8-byte
value. In other words, the mapping between RTL template and shrdl
is incorrect, but the underlying assumption in i386.c guards the bug.


This is still a bug, please file a PR. The use of (match_dup 0) apparently 
prevents combine from matching the insn (that's just a guess from my notes 
in PR 55583, I don't have access to my gcc machine right now to check), 
but that doesn't mean we shouldn't fix things.


--
Marc Glisse


Re: What is "fnspec function type attribute"?

2014-06-06 Thread Marc Glisse

On Fri, 6 Jun 2014, FX wrote:


In fortran/trans-decl.c, we have a comment above the code building function 
decls, saying:


   The SPEC parameter specifies the function argument and return type
   specification according to the fnspec function type attribute.  */


I was away from GCC development for some time, so this is news to me. The 
syntax is not immediately clear, and neither a Google search nor a grep of the 
trunk’s numerous .texi files reveals any information. I’m creating new decls;
what am I to do with it?


You can look at the 2 functions in gimple.c that use gimple_call_fnspec, 
and refer to tree-core.h for the meaning of EAF_*, etc. A string like 
"2x." means:
'2': the first letter is about the return, here we are returning the 
second argument

'x': the first argument is ignored
'.': not saying anything about the second argument.
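
For illustration, attaching such a string when building a decl looks roughly
like this (GCC-internal, from memory; treat it as a sketch):

/* Attach "2x." to a function type: the return value is the second
   argument, the first argument is ignored, and nothing is claimed
   about the second argument's contents.  */
tree attrs = tree_cons (get_identifier ("fn spec"),
                        build_tree_list (NULL_TREE,
                                         build_string (3, "2x.")),
                        TYPE_ATTRIBUTES (fntype));
fntype = build_type_attribute_variant (fntype, attrs);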

--
Marc Glisse


Re: Comparison of GCC-4.9 and LLVM-3.4 performance on SPECInt2000 for x86-64 and ARM

2014-06-25 Thread Marc Glisse

On Wed, 25 Jun 2014, Vladimir Makarov wrote:

Maybe.  But in this case LLVM did a right thing.  The variable addressing was 
through a restrict pointer.


Ah, gcc implements (on purpose?) a weak version of restrict, where it only 
considers that 2 restrict pointers don't alias, whereas all other 
compilers assume that restrict pointers don't alias other non-derived 
pointers (see several PRs in bugzilla). I believe Richard recently added 
code that would make implementing the strong version of restrict easier. 
Maybe that's what is missing here?


--
Marc Glisse


Re: combination of read/write and earlyclobber constraint modifier

2014-07-01 Thread Marc Glisse

On Tue, 1 Jul 2014, Jeff Law wrote:


On 07/01/14 13:27, Tom de Vries wrote:

Vladimir,

There are a few patterns which use both the read/write constraint
modifier (+) and the earlyclobber constraint modifier (&):
...
$ grep -c 'match_operand.*+.*&' gcc/config/*/* | grep -v :0
gcc/config/aarch64/aarch64-simd.md:1
gcc/config/arc/arc.md:1
gcc/config/arm/ldmstm.md:30
gcc/config/rs6000/spe.md:8
...

F.i., this one in gcc/config/aarch64/aarch64-simd.md:
...
(define_insn "vec_pack_trunc_<mode>"
  [(set (match_operand:<VNARROWQ2> 0 "register_operand" "+&w")
(vec_concat:<VNARROWQ2>
  (truncate:<VNARROWQ> (match_operand:VQN 1 "register_operand"
"w"))
  (truncate:<VNARROWQ> (match_operand:VQN 2 "register_operand"
"w"))))]
...

The documentation (
https://gcc.gnu.org/onlinedocs/gccint/Modifiers.html#Modifiers ) states:
...
‘&’ does not obviate the need to write ‘=’.
...
which seems to state that '&' implies '='.

An earlyclobber operand is defined as 'modified before the instruction
is finished using the input operands'. AFAIU that would indeed exclude
the possibility that the earlyclobber operand is an input/output operand
itself, but perhaps I misunderstand.

So my question is: is the combination of '&' and '+' supported ? If so,
what is the exact semantics ? If not, should we warn or give an error ?
I don't think we can define any reasonable semantics for &+.  My 
recommendation would be for this to be considered a hard error.


Uh? The doc explicitly says "An input operand can be tied to an 
earlyclobber operand" and goes on to explain why that is useful. It avoids 
using the same register for other input when they are identical.


--
Marc Glisse


Re: combination of read/write and earlyclobber constraint modifier

2014-07-01 Thread Marc Glisse

On Tue, 1 Jul 2014, Tom de Vries wrote:


On 01-07-14 21:58, Marc Glisse wrote:

So my question is: is the combination of '&' and '+' supported ? If so,
what is the exact semantics ? If not, should we warn or give an error ?

I don't think we can define any reasonable semantics for &+.  My
recommendation would be for this to be considered a hard error.


Uh? The doc explicitly says "An input operand can be tied to an 
earlyclobber
operand" and goes on to explain why that is useful. It avoids using the 
same

register for other input when they are identical.


Hi Marc,

That part of the doc refers to the mulsi3 insn for ARM as example:
...
;; Use `&' and then `0' to prevent the operands 0 and 1 being the same
(define_insn "*arm_mulsi3"
 [(set (match_operand:SI  0 "s_register_operand" "=&r,&r")
   (mult:SI (match_operand:SI 2 "s_register_operand" "r,r")
(match_operand:SI 1 "s_register_operand" "%0,r")))]
 "TARGET_32BIT && !arm_arch6"
 "mul%?\\t%0, %2, %1"
 [(set_attr "type" "mul")
  (set_attr "predicable" "yes")]
)
...

Note that there's no combination of & and + here.


I think it could have used (match_dup 0) instead of operand 1, if there 
had been only the first alternative. And then the constraint would have 
been +&.


AFAIU, the 'tie' established here is from input operand 1 to an earlyclobber 
output operand 0 using the '0' matching constraint.


Having said that, I don't understand the comment, AFAIU it should be: 'Use 
'0' to make sure operands 0 and 1 are the same, and use '&' to make sure 
operands 0 and 2 are not the same.'


Well, yeah, the comment doesn't seem completely in sync with the code.

In the first example you gave, looking at the pattern (no match_dup, 
setting the full register), it seems that it may have wanted "=&" instead 
of "+&".


(by the way, in the same aarch64-simd.md file, I noticed some 
define_expand with constraints, that looks strange)


--
Marc Glisse


Re: combination of read/write and earlyclobber constraint modifier

2014-07-02 Thread Marc Glisse

On Wed, 2 Jul 2014, Tom de Vries wrote:


On 02-07-14 08:23, Marc Glisse wrote:
I think it could have used (match_dup 0) instead of operand 1, if there 
had been only the first alternative. And then the constraint would have 
been +&.


isn't that explicitly listed as unsupported here ( 
https://gcc.gnu.org/onlinedocs/gccint/RTL-Template.html#index-match_005fdup-3244 
):

...
Note that match_dup should not be used to tell the compiler that a particular 
register is being used for two operands (example: add that adds one register 
to another; the second register is both an input operand and the output 
operand). Use a matching constraint (see Simple Constraints) for those. 
match_dup is for the cases where one operand is used in two places in the 
template, such as an instruction that computes both a quotient and a 
remainder, where the opcode takes two input operands but the RTL template has 
to refer to each of those twice; once for the quotient pattern and once for 
the remainder pattern.

...
?


Well, looking for instance at x86_shrd... Ok, I didn't know it wasn't 
supported (though I did suggest using match_operand and "0" at some 
point).


Still, the meaning of +&, in inline asm for instance, seems relatively 
clear, no?


--
Marc Glisse


Re: combination of read/write and earlyclobber constraint modifier

2014-07-02 Thread Marc Glisse

On Wed, 2 Jul 2014, Tom de Vries wrote:


On 02-07-14 09:02, Marc Glisse wrote:
Still, the meaning of +&, in inline asm for instance, seems relatively 
clear, no?


I can't find any testsuite examples using this construct.

Furthermore, I'd expect the same semantics and restrictions for constraints 
in rtl templates and inline asm.


So I'm not sure what you mean.


Coming back to your original question:

An earlyclobber operand is defined as 'modified before the instruction is 
finished using the input operands'. AFAIU that would indeed exclude the 
possibility that the earlyclobber operand is an input/output operand it 
self, but perhaps I misunderstand.


So my question is: is the combination of '&' and '+' supported ? If so, 
what is the exact semantics ? If not, should we warn or give an error ?


An earlyclobber operand X prevents *other* input operands from using the 
same register, but that does not include X itself (if it is using +) or 
operands explicitly using a matching constraint for X. At least that's how 
I understand it.
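
For instance, in GNU C inline asm (a hypothetical snippet, not from the 
thread; ARM-style syntax assumed for concreteness):

static inline int add_distinct (int x, int y)
{
  /* "+&r" ties x to one register used for both input and output; the
     '&' only forbids the *other* input, y, from sharing that register.  */
  asm ("add %0, %0, %1" : "+&r" (x) : "r" (y));
  return x;
}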


--
Marc Glisse


Re: GCC version bikeshedding

2014-08-06 Thread Marc Glisse

On Wed, 6 Aug 2014, Jakub Jelinek wrote:


- libstdc++ ABI changes


It seems unlikely to be in the next release; it is too late in the cycle. 
Chances to break the ABI don't come often, and rushing one at the end of 
stage1 would be wasting a good opportunity.


--
Marc Glisse


Re: GCC version bikeshedding

2014-08-06 Thread Marc Glisse

On Wed, 6 Aug 2014, Richard Biener wrote:


It's an ABI change for all modes (but not a SONAME change because the
old and new definitions will both be present in the .so).


Ugh.  That's going to be a nightmare to support.


Yes. And IMO a waste of effort compared to a clean .so.7 break, but 
well...



 Is there a configure
switch to change the default ABI used?  That is, on a legacy system
can I upgrade to 5.0 and get code that interoperates fine with code
built with 4.8?  (including ABI boundaries using the affected classes?
I suspect APIs with std::string passing are _very_ common, not
sure about std::list)

What's the failure mode the user will see when linking against a
4.8 compiled library with a std::string interface using 5.0?


In good cases, a linker error about a missing symbol (different mangling). 
In less good cases, a warning at compile-time about using a class marked 
with abi_tag in a class not marked with it. In worse cases (passing 
through void* for instance), a runtime crash.
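
As a hedged illustration, using the dual-ABI scheme as it eventually 
shipped (the exact symbols below are an assumption, not from this thread):

#include <string>
void foo (const std::string &);
// old ABI: std::string is std::basic_string<char, ...>
//          and mangles to _Z3fooRKSs
// new ABI: std::string is std::__cxx11::basic_string<char, ...>
//          and mangles to _Z3fooRKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE
// A 4.8-built library exports the first symbol; a 5.x-built caller
// looks for the second, hence the link error.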



And how do libraries with such an API avoid silently changing their
ABI dependent on the compiler used to compile them?  That is,
I suppose those need to change their SONAME dependent on
the compiler version used?!


Yes, just like a move to .so.7 would entail.

--
Marc Glisse


Re: GCC version bikeshedding

2014-08-06 Thread Marc Glisse

On Wed, 6 Aug 2014, Jakub Jelinek wrote:


On Wed, Aug 06, 2014 at 12:31:57PM +0200, Richard Biener wrote:

Ok, so the problematical case is

struct X { std::string s; };
void foo (X&);


Yeah.


then.  OTOH I remember that then mangling of X changes as well?


Only if you add abi_tag attribute to X.


Note that -Wabi-tag can tell you where it is needed.

struct __attribute__((abi_tag("marc"))) X {};
struct Y { X x; };

a.cc:2:8: warning: 'Y' does not have the "marc" abi tag that 'X' (used in 
the type of 'Y::x') has [-Wabi-tag]

 struct Y { X x; };
        ^
a.cc:2:14: note: 'Y::x' declared here
 struct Y { X x; };
              ^
a.cc:1:41: note: 'X' declared here
 struct __attribute__((abi_tag("marc"))) X {};
                                         ^


I hope the libstdc++ folks will add some macro which will
include the right abi_tag attribute for the std::list/std::string
cases, so you'd in the end just add
#ifndef _GLIBCXX_ABI_TAG_SOMETHING
#define _GLIBCXX_ABI_TAG_SOMETHING
#endif
...
struct X _GLIBCXX_ABI_TAG_SOMETHING { std::string s; };
void foo (X&);
or similar.


So we only need to patch every project out there...



A clean .so.7 break would be a significantly worse nightmare.  We've been
there many years ago, e.g. 3.2/3.3 vs. 3.4; there were significantly
fewer C++ plugins etc. in packages and still it was unsolvable.
With the abi_tag stuff, you have the option to make stuff interoperable
when mixing compilers, either with no effort at all or some limited
effort.  With .so.7, you have no option; nothing will be interoperable.


I disagree that it is worse, but you have more experience, I guess we
will see the results in a few years...

--
Marc Glisse


Re: Where does GCC pick passes for different opt. levels

2014-08-11 Thread Marc Glisse

On Mon, 11 Aug 2014, Steve Ellcey  wrote:


I have a basic question about optimization selection in GCC.  There used to
be some code in GCC (passes.c?) that would set various optimize pass flags
depending on if the 'optimize' flag was > 0, > 1, or > 2; later I think
there may have been a table.


There is still a table (default_options_table) in opts.c, with entries 
that look like:

{ OPT_LEVELS_2_PLUS, OPT_ftree_vrp, NULL, 1 },

i.e. enable -ftree-vrp (value 1) at -O2 and above.



This code seems gone now and I can't figure
out how GCC is selecting what optimization passes to run at what optimization
levels (-O1 vs. -O2 vs. -O3).  How is this handled in the top-of-tree GCC code?

I see passes.def but there doesn't seem to be anything in there to tie
specific passes to specific optimization levels.  Likewise in common.opt
I see flags for various optimization passes but nothing to tie them to
-O1 or -O2, etc.

I'm probably missing something obvious, but a pointer would be much
appreciated.


--
Marc Glisse


Re: Conditional negation elimination in tree-ssa-phiopt.c

2014-08-12 Thread Marc Glisse

On Mon, 11 Aug 2014, Kyrill Tkachov wrote:


The aarch64 target has a conditional negation instruction
CSNEG Rd, Rs1, Rs2, cond

with semantics Rd = if cond then Rs1 else -Rs2.

This, however, doesn't end up getting matched for code such as:
int
foo2 (unsigned a, unsigned b)
{
 int r = 0;
 r = a & b;
 if (a & b)
   return -r;
 return r;
}


Note that in this particular case, we should just return -(a&b) like llvm 
does.
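
That is, a sketch of the source the above could reduce to:

int
foo2 (unsigned a, unsigned b)
{
  return -(a & b);
}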


--
Marc Glisse


Re: gcc parallel make check

2014-09-03 Thread Marc Glisse

On Wed, 3 Sep 2014, VandeVondele  Joost wrote:


I've noticed that

make -j -k check-fortran

results in a serialized checking, while

make -j32 -k check-fortran

goes parallel. Somehow the explicit 'N' in -jN seems to be needed for the check 
target, while the other targets seem to do just fine. Is that a feature, or 
should I file a PR for that... ?


https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53155

--
Marc Glisse


Re: Fwd: Building gcc-4.9 on OpenBSD

2014-09-17 Thread Marc Glisse

On Wed, 17 Sep 2014, Ian Grant wrote:


And is there any way to disable the Intel library?


--disable-libcilkrts (same as the other libs)
If it explicitly doesn't support your system, I am a bit surprised it 
isn't disabled automatically, that seems like a bug.
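
For example (hedged; the source-directory path here is hypothetical):

  ../gcc-4.9.2/configure --disable-libcilkrts [...other options...]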


Please don't call it "the Intel library", that doesn't mean anything.

--
Marc Glisse


Re: Fwd: Building gcc-4.9 on OpenBSD

2014-09-17 Thread Marc Glisse

On Wed, 17 Sep 2014, Ian Grant wrote:


On Wed, Sep 17, 2014 at 1:36 PM, Marc Glisse  wrote:

On Wed, 17 Sep 2014, Ian Grant wrote:


And is there any way to disable the Intel library?



--disable-libcilkrts (same as the other libs)
If it explicitly doesn't support your system, I am a bit surprised it isn't
disabled automatically, that seems like a bug.


Not necessarily a bug, but it would have been good if the --help
option had mentioned it. I looked, really. Perhaps I missed it though.
So many options for disabling one thing or another ...


https://gcc.gnu.org/install/configure.html
lists a number of others but not this one; maybe it should be added.


Please don't call it "the Intel library", that doesn't mean anything.


Doesn't it? How did you know what 'it' was then? Or is that a stupid
question? This identity concept is much slipperier than it seems at
first, isn't it?


You included error messages...


How about my question about the size of the binaries? Is that 60+MB
what other systems show?


I still see <20M here, but I don't know if there are reasons for what you 
are seeing. Are you maybe using different options? (debug information, 
optimization, lto, etc)


--
Marc Glisse


Re: How to identify the type of the object being created using the new operator?

2014-10-06 Thread Marc Glisse

On Mon, 6 Oct 2014, Swati Rathi wrote:


Statement : A *a = new B;

gets translated in GIMPLE as
1. void * D.2805;
2. struct A * a;
3. D.2805 = operator new (20);
4. a = D.2805;

A is the base class and B is the derived class.
In statement 3, new operator is creating an object of derived class B.
By analyzing the RHS of the assignment statement 3, how can we identify the 
type (in this case B) of the object being created?


I strongly doubt you can. It is calling B's constructor that will turn 
this memory region into a B; operator new is the same as malloc, it 
only returns raw memory.


(If A and B don't have the same size, the argument 20 can be a hint)

--
Marc Glisse


Re: volatile access optimization (C++ / x86_64)

2014-12-26 Thread Marc Glisse

On Fri, 26 Dec 2014, Matt Godbolt wrote:


I'm investigating ways to have single-threaded writers write to memory
areas which are then (very infrequently) read from another thread for
monitoring purposes. Things like "number of units of work done".

I initially modeled this with relaxed atomic operations. This
generates a "lock xadd" style instruction, as I can't convey that
there are no other writers.

As best I can tell, there's no memory order I can use to explain my
usage characteristics. Giving up on the atomics, I tried volatiles.
These are less than ideal as their power is less expressive, but in my
instance I am not trying to fight the ISA's reordering; just prevent
the compiler from eliding updates to my shared metrics.

GCC's code generation uses a "load; add; store" for volatiles, instead
of a single "add 1, [metric]".


https://gcc.gnu.org/bugzilla/show_bug.cgi?id=50677
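
A reduced illustration of the report (hedged sketch; the exact assembly 
depends on target and options, x86-64 assumed here):

volatile int metric;      /* the shared counter */

void tick (void)
{
  metric = metric + 1;    /* or metric++ */
}

/* GCC emits the split sequence
     movl  metric(%rip), %eax
     addl  $1, %eax
     movl  %eax, metric(%rip)
   rather than the single read-modify-write
     addl  $1, metric(%rip)  */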

--
Marc Glisse


Re: C++ Standard Question

2015-01-22 Thread Marc Glisse

On Thu, 22 Jan 2015, Joel Sherrill wrote:


I think this is a glibc issue but since this method is defined in the C++
standards, I thought there were plenty of language lawyers here. :)


s/glibc/libstdc++/ and they have their own ML.





That's deprecated, isn't it?


  class strstreambuf : public basic_streambuf<char, char_traits<char> >
  ISSUE > int pcount() const;   <= ISSUE

My reading of the C++03 and draft C++14 says that the int pcount() method
in this class is not const. glibc has it const in the glibc shipped with
Fedora 20
and CentOS 6.

This is a simple test case:

   #include <strstream>

   int main() {
   int (std::strstreambuf::*dummy)() = &std::strstreambuf::pcount;
/*-- pcount is conformant --*/
   return 0;
   }

What's the consensus?


The exact signature of member functions is not mandated by the standard, 
implementations are allowed to make the function const if that works (or 
provide both a const and a non-const version). Your code is not guaranteed 
to work. Lambdas usually provide a fine workaround.
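
For instance (a hedged sketch of that workaround, C++11):

#include <strstream>

// Compiles whether the implementation declares pcount() const or not,
// unlike taking &std::strstreambuf::pcount with a hard-coded signature.
auto pcount_of = [](std::strstreambuf &sb) { return sb.pcount(); };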


--
Marc Glisse


Re: unfused fma question

2015-02-23 Thread Marc Glisse

On Mon, 23 Feb 2015, Jeff Law wrote:


On 02/23/15 11:38, Joseph Myers wrote:


(I wonder if convert_mult_to_fma is something that should move to
match-and-simplify infrastructure.)

Yea, it probably should.


Currently, it happens in a pass that is quite late. If it moves to 
match-and-simplify, I am afraid it might inhibit some other optimizations 
(we can turn plus+mult to fma but not the reverse), unless we use some way 
to inhibit some patterns until a certain pass (possibly a simple "if", if 
that's not too costly). Such "time-restricted" patterns might be useful 
for other purposes: don't introduce complicated vector/complex operations 
after the corresponding lowering passes, do narrowing until a certain 
point but then prefer fast integer sizes, etc (I haven't thought about 
those particular examples, they are only an illustration).


--
Marc Glisse


Re: A bug (?) with inline functions at O0: undefined reference

2015-03-06 Thread Marc Glisse

On Fri, 6 Mar 2015, Ilya Verbin wrote:


I've discovered a strange behaviour on trunk gcc; here is the reproducer:

inline int foo ()
{
 return 0;
}

int main ()
{
 return foo ();
}

$ gcc main.c
/tmp/ccD1LeXo.o: In function `main':
main.c:(.text+0xa): undefined reference to `foo'
collect2: error: ld returned 1 exit status

Is this a bug?  If yes, is it known?
GCC 4.8.3 works fine though.


Not a bug, that's what inline means in C99 and later.
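
A hedged sketch of the usual fixes under C99 semantics:

inline int foo (void)          /* a C99 "inline definition": by itself it
                                  emits no external definition */
{
  return 0;
}
extern inline int foo (void);  /* adding 'extern' in exactly one translation
                                  unit makes that TU emit the external
                                  definition; 'static inline' is the other
                                  common fix */

int main (void)
{
  return foo ();
}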

--
Marc Glisse


RE: PR65416, alloca on xtensa

2015-03-13 Thread Marc Gauthier
augustine.sterl...@gmail.com wrote:
> On Fri, Mar 13, 2015 at 7:54 AM, Max Filippov  wrote:
[...]
> > 2. alloca seems to make an additional 16-bytes padding to each stack
> >   allocation: alloca(1) results in moving sp down by 32 bytes,
> >   alloca(17)
> >   moves it by 48 bytes, etc. This padding looks unnecessary to me:
> >   either
> >   this space is not used (previous register frame is not spilled), or
> >   alloca
> >   exception handler will take care about reloading or moving spilled
> >   registers
> >   to a new location. In both cases after movsp this space is just
> >   wasted.
> >   Do you know why this padding may be needed?
> 
> Answering this question definitively requires some time with the ABI
> manual, which I don't have. You may be right, but I would check what
> XCC does in this case. It is far better tested.

Other than the required 16-byte stack alignment, there's nothing in
the ABI that requires these extra 16 bytes.  Perhaps there was a bad
implementation of the alloca exception handler at some point a long
time ago that prompted the extra 16 bytes?

Today XCC doesn't add the extra 16 bytes.  alloca(n) with n in a2
comes out as this:

   0x6490 <+12>:   movi.n  a8, -16
   0x6492 <+14>:   addi.n  a3, a2, 15
   0x6494 <+16>:   and     a3, a3, a8
   0x6497 <+19>:   sub     a3, a1, a3
   0x649a <+22>:   movsp   a1, a3

which just rounds up to 16 bytes.

-Marc


Re: Named parameters

2015-03-16 Thread Marc Glisse

On Mon, 16 Mar 2015, David Brown wrote:


In a discussion on comp.lang.c, the subject of "named parameters" (or
"designated parameters") has come up again.  This is a feature that some
of us feel would be very useful in C (and in C++).  I think it would be
possible to include it in the language without leading to any conflicts
with existing code - it is therefore something that could be made as a
gcc extension, with a hope of adding it to the standards for a later C
standards revision.

I wanted to ask opinions on the mailing list as to the feasibility of
the idea - there is little point in my cluttering up bugzilla with an
enhancement request if the gcc developers can spot obvious flaws in the
idea.


Filing a report in bugzilla would be quite useless: language extensions 
are now almost automatically rejected unless they come with a proposal 
that has already been favorably seen by the standardization committee.


On the other hand, implementing the feature (in your own fork) is almost a 
requirement if you intend to propose this for standardization. And it 
should not be too hard.



Basically, the idea is this:

int foo(int a, int b, int c);

void bar(void) {
    foo(1, 2, 3);                // Normal call
    foo(.a = 1, .b = 2, .c = 3); // Same as foo(1, 2, 3)
    foo(.c = 3, .b = 2, .a = 1); // Same as foo(1, 2, 3)
}


struct foo_args {
  int a, b, c;
};
void foo(struct foo_args);
#define foo(...) foo((struct foo_args){__VA_ARGS__})
void g(){
  foo(1,2,3);
  foo(.c=3,.b=2);
}

In C++ you could almost get away without the macro, calling f({1,2,3}), 
but f({.c=3}) currently gives "sorry, unimplemented". Maybe you would like 
to work on that?



If only the first variant is allowed (with the named parameters in the
order declared in the prototype), then this would not affect code
generation at all - the designators could only be used for static error
checking.

If the second variant is allowed, then the parameters could be re-ordered.


The aim of this is to make it easier and safer to call functions with a
large number of parameters.  The syntax is chosen to match that of
designated initialisers - that should be clearer to the programmer, and
hopefully also make implementation easier.

If there is more than one declaration of the function, then the
designators used should follow the most recent in-scope declaration.


An error may be safer; you would at least want a warning.


This feature could be particularly useful when combined with default
arguments in C++, as it would allow the programmer to override later
default arguments without specifying all earlier arguments.


C++ is always more complicated (so many features can interact in strange 
ways), I suggest you start with C.



At the moment, I am not asking for an implementation, or even /how/ it
might be implemented (perhaps a MELT plugin?) - I would merely like
opinions on whether it would be a useful and practical enhancement.


This is not such a good list for that; comp.lang.c is better suited. This 
will be a good list if you have technical issues implementing the feature.


--
Marc Glisse


Re: -Wno-c++11-extensions addition

2015-03-25 Thread Marc Glisse

On Wed, 25 Mar 2015, Jack Howarth wrote:


On Wed, Mar 25, 2015 at 12:41 PM, Jonathan Wakely  wrote:

On 25 March 2015 at 16:16, Jack Howarth wrote:

Does anyone remember which FSF gcc release first added the
-Wno-c++11-extensions option for g++? I know it exists in 4.6.3


Are you sure? It doesn't exist for 4.6.4 or anything later.

Are you thinking of -Wc++0x-compat ?


On x86_64 Fedora 15...

$ /usr/bin/g++ --version
g++ (GCC) 4.6.3 20120306 (Red Hat 4.6.3-2)
Copyright (C) 2011 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
$ /usr/bin/g++ -Wno-c++11-extensions hello.cc
$

So gcc 4.6.3 appears to at least tolerate that warning without
claiming that it is unknown.


https://gcc.gnu.org/wiki/FAQ#The_warning_.22unrecognized_command-line_option.22_is_not_given_for_-Wno-foo

--
Marc Glisse


Re: [i386] Scalar DImode instructions on XMM registers

2015-04-24 Thread Marc Glisse

On Fri, 24 Apr 2015, Uros Bizjak wrote:


Please try to generate paradoxical subreg (V2DImode subreg of V1DImode
pseudo). IIRC, there is some functionality in the compiler that is
able to tell if the highpart of the paradoxical register is zeroed.


Those are not currently legal (I tried to change that)
https://gcc.gnu.org/ml/gcc-patches/2013-03/msg00745.html
https://gcc.gnu.org/ml/gcc-patches/2014-06/msg00769.html

In this case, a subreg:V2DI of DImode should work.

--
Marc Glisse


Re: Can support TRUNC_DIV_EXPR, TRUNC_MOD_EXPR in GCC vectorization/scalar evolution -- and/or linearization?

2018-10-12 Thread Marc Glisse

On Fri, 12 Oct 2018, Thomas Schwinge wrote:


Hmm, and without any OpenACC/OpenMP etc., actually the same problem is
also present when running the following code through the vectorizer:

   for (int tmp = 0; tmp < N_J * N_I; ++tmp)
 {
   int j = tmp / N_I;
   int i = tmp % N_I;
   a[j][i] = 0;
 }

... whereas the following variant (obviously) does vectorize:

   int a[NJ * NI];

   for (int tmp = 0; tmp < N_J * N_I; ++tmp)
 a[tmp] = 0;


I had a quick look at the difference, and a[j][i] remains in this form 
throughout optimization. If I write instead *((*(a+j))+i) = 0; I get


  j_10 = tmp_17 / 1025;
  i_11 = tmp_17 % 1025;
  _1 = (long unsigned int) j_10;
  _2 = _1 * 1025;
  _3 = (sizetype) i_11;
  _4 = _2 + _3;

or for a power of 2

  j_10 = tmp_17 >> 10;
  i_11 = tmp_17 & 1023;
  _1 = (long unsigned int) j_10;
  _2 = _1 * 1024;
  _3 = (sizetype) i_11;
  _4 = _2 + _3;

and in both cases we fail to notice that _4 = (sizetype) tmp_17; (at least 
I think that's true).


So there are missing match.pd transformations in addition to whatever 
scev/ivdep/other work is needed.


--
Marc Glisse


Re: "match.pd" (was: Can support TRUNC_DIV_EXPR, TRUNC_MOD_EXPR in GCC vectorization/scalar evolution -- and/or linearization?)

2018-11-04 Thread Marc Glisse

(resent because of mail issues on my end)

On Mon, 22 Oct 2018, Thomas Schwinge wrote:


I had a quick look at the difference, and a[j][i] remains in this form
throughout optimization. If I write instead *((*(a+j))+i) = 0; I get

   j_10 = tmp_17 / 1025;
   i_11 = tmp_17 % 1025;
   _1 = (long unsigned int) j_10;
   _2 = _1 * 1025;
   _3 = (sizetype) i_11;
   _4 = _2 + _3;

or for a power of 2

   j_10 = tmp_17 >> 10;
   i_11 = tmp_17 & 1023;
   _1 = (long unsigned int) j_10;
   _2 = _1 * 1024;
   _3 = (sizetype) i_11;
   _4 = _2 + _3;

and in both cases we fail to notice that _4 = (sizetype) tmp_17; (at least
I think that's true).

So there are missing match.pd transformations in addition to whatever
scev/ivdep/other work is needed.


With a very simplistic "match.pd" rule (not yet any special cases
checking etc.):

diff --git gcc/match.pd gcc/match.pd
index b36d7ccb5dc3..4c23116308da 100644
--- gcc/match.pd
+++ gcc/match.pd
@@ -5126,3 +5126,28 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
{ wide_int_to_tree (sizetype, off); })
  { swap_p ? @0 : @2; }))
{ rhs_tree; })
+
+/* Given:
+
+   j = in / N_I
+   i = in % N_I
+
+   ..., fold:
+
+   out = j * N_I + i
+
+   ..., into:
+
+   out = in
+*/
+
+/* As long as only considering N_I being INTEGER_CST (which are always second
+   argument?), probably don't need ":c" variants?  */
+
+(simplify
+ (plus:c
+  (mult:c
+   (trunc_div @0 INTEGER_CST@1)
+   INTEGER_CST@1)
+  (trunc_mod @0 INTEGER_CST@1))
+ (convert @0))


You should only specify INTEGER_CST@1 on the first occurrence; the others 
can be just @1, i.e. (trunc_mod @0 @1). (you may be interested in @@1 at 
some point, but that gets tricky)


..., the original code:

   int f1(int in)
   {
 int j = in / N_I;
 int i = in % N_I;

 int out = j * N_I + i;

 return out;
   }

... gets simplified from ("div-mod-0.c.027t.objsz1"):

   f1 (int in)
   {
 int out;
 int i;
 int j;
 int _1;
 int _6;

  :
 gimple_assign 
 gimple_assign 
 gimple_assign 
 gimple_assign 
 gimple_assign 
 gimple_return <_6>

   }

... to ("div-mod-0.c.028t.ccp1"):

   f1 (int in)
   {
 int out;
 int i;
 int j;
 int _1;

  :
 gimple_assign 
 gimple_assign 
 gimple_assign 
 gimple_return 

   }

(The three dead "gimple_assign"s get eliminated later on.)

So, that works.

However, it doesn't work yet for the original construct that I'd ran
into, which looks like this:

   [...]
   int i;
   int j;
   [...]
   signed int .offset.5_2;
   [...]
   unsigned int .offset.7_23;
   unsigned int .iter.0_24;
   unsigned int _25;
   unsigned int _26;
   [...]
   unsigned int .iter.0_32;
   [...]

:
   # gimple_phi <.offset.5_2, .offset.5_21(8), .offset.5_30(9)>
   gimple_assign 
   gimple_assign 
   gimple_assign 
   gimple_assign 
   gimple_assign 
   gimple_assign 
   gimple_assign 
   [...]

Resolving the "a[j][i] = 123" we'll need to look into later.

As Marc noted above, with that changed into "*(*(a + j) + i) = 123", we
get:

   [...]
   int i;
   int j;
   long unsigned int _1;
   long unsigned int _2;
   sizetype _3;
   sizetype _4;
   sizetype _5;
   int * _6;
   [...]
   signed int .offset.5_8;
   [...]
   unsigned int .offset.7_29;
   unsigned int .iter.0_30;
   unsigned int _31;
   unsigned int _32;
   [...]

:
   # gimple_phi <.offset.5_8, .offset.5_27(8), .offset.5_36(9)>
   gimple_assign 
   gimple_assign 
   gimple_assign 
   gimple_assign 
   gimple_assign 
   gimple_assign 
   gimple_assign 
   gimple_assign 
   gimple_assign 
   gimple_assign 
   gimple_assign 
   gimple_assign 
   gimple_assign 
   [...]

Here, unless I'm confused, "_4" is supposed to be equal to ".iter.0_30",
but "match.pd" doesn't agree yet.  Note the many "nop_expr"s here, which
I have not yet figured out how to handle, I suppose?  I tried some things
but couldn't get it to work.  Apparently the existing instances of
"(match (nop_convert @0)" and "Basic strip-useless-type-conversions /
strip_nops" rule also don't handle these; should they?  Or, are in fact
here the types mixed up too much?


"(match (nop_convert @0)" defines a shortcut so some transformations can
use nop_convert to detect some specific conversions, but it doesn't do
anything by itself. "Basic strip-useless-type-conversions" strips
conversions that are *useless*, essentially from a type to the same
type. If you want to handle true conversions, you need to do that
explicitly, see the many transformations that use convert? convert1?
convert2? and specify for which particular conversions the
transformation is valid.  Finding out the right conditions to detect
these conversions is often the most painful part of writing a match.pd
transformation.


I hope to get some time again soon to continue looking into this, but if
anybody got any ideas, I'm all ears.


--
Marc Glisse


Re: [RFC] -Weverything

2019-01-22 Thread Marc Glisse

On Tue, 22 Jan 2019, Thomas Koenig wrote:


Hi,

What would people think about a -Weverything option which turns on
every warning there is?

I think that could be quite useful in some circumstances, especially
to find potential bugs with warnings that people, for some reason
or other, found too noisy for -Wextra.

The name could be something else, of course. In the best GNU tradition,
-Wkitchen-sink could be another option :-)


https://gcc.gnu.org/bugzilla/show_bug.cgi?id=31573 and duplicates already 
list quite a few arguments. Basically, it could be useful for debugging 
gcc or to discover warnings, but gcc devs fear that users will actually 
use it for real.


--
Marc Glisse


Re: [RFC] -Weverything

2019-01-23 Thread Marc Glisse

On Wed, 23 Jan 2019, Jakub Jelinek wrote:


We have that, gcc -Q --help=warning
Of course, for warnings which do require arguments (numerical, or
enumeration/string), one still needs to pick up his choices of those
arguments; no idea what -Weverything would do here, while some warnings
have different levels where a higher (or lower) level is a superset of
another level, what numbers would you pick for e.g. warnings where the
argument is bytes?


For most of them, there is a value that maximizes the number of warnings, 
so the same superset argument applies. -Wframe-larger-than=0 so it shows 
the estimated frame size on every function, -Walloca-larger-than=0 so it 
is equivalent to -Walloca, etc.


--
Marc Glisse


named address space problem

2019-02-13 Thread Marc Poulhies
Hi !

While porting a GCC 4.9 private port to GCC 7, I've encountered an issue with 
named address space support.

I have defined the following target macros and hooks:

#define K1_ADDR_SPACE_UNCACHED 1
#define K1_ADDR_SPACE_CONVERT 2

 - TARGET_ADDR_SPACE_LEGITIMATE_ADDRESS_P (returns false for the CONVERT 
   address space, the regular legitimate-address hook for the others)
 - TARGET_ADDR_SPACE_LEGITIMIZE_ADDRESS (raises an error if using the CONVERT 
   address space, or calls the regular legitimize_address hook)
 - TARGET_ADDR_SPACE_SUBSET_P (always true)
 - TARGET_ADDR_SPACE_CONVERT (emits a warning if not to/from the CONVERT 
   address space, and always returns the first operand)

#define REGISTER_TARGET_PRAGMAS() do {   \
   c_register_addr_space ("__uncached", K1_ADDR_SPACE_UNCACHED); \
   c_register_addr_space ("__convert", K1_ADDR_SPACE_CONVERT); \
 } while (0)

The usage is very basic and is used to drive the insn selection to use 
cached/uncached variants for load/store.
Pointers are declared with `__uncached` to use uncached variants and 
`__convert` is used when converting pointers to/from this uncached space.
It works as expected on GCC 4.9.

On our current port on GCC 7 (using latest gcc-7-branch branch), we have an 
issue with simple code:

```
typedef struct {
 unsigned long count;
} foo_t;  

unsigned long foobar(foo_t *cond, int bar)
{
  if (bar == 1 ) {
  }
  __uncached foo_t *ucond = cond;
  return ucond->count;
}
```

Raises the following error:

```
<source>: In function 'foobar':
<source>:9:3: error: unknown type name '__uncached'
   __uncached foo_t *ucond = cond;
   ^~~~~~~~~~
<source>:9:20: error: expected '=', ',', ';', 'asm' or '__attribute__' before '*' token
   __uncached foo_t *ucond = cond;
                    ^
<source>:10:10: error: 'ucond' undeclared (first use in this function); did you mean 'cond'?
   return ucond->count;
          ^~~~~
          cond
<source>:10:10: note: each undeclared identifier is reported only once for each 
function it appears in
Compiler returned: 1
```

The following changes make the code compile as expected:

 - moving the variable declaration at the beginning of the block
 - opening a block before the declaration and closing it after the return stmt.
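
E.g. the first workaround, restated on the reproducer above (sketch):

```
unsigned long foobar(foo_t *cond, int bar)
{
  __uncached foo_t *ucond = cond;   /* declaration moved first: accepted */
  if (bar == 1) {
  }
  return ucond->count;
}
```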

I could not find a matching PR in bugzilla.
Do you know of any issue with this ? Maybe this has been fixed in later 
versions.

Thanks,
Marc



Re: On-Demand range technology [2/5] - Major Components : How it works

2019-06-04 Thread Marc Glisse

On Tue, 4 Jun 2019, Martin Sebor wrote:


On 5/31/19 9:40 AM, Andrew MacLeod wrote:

On 5/29/19 7:15 AM, Richard Biener wrote:
On Tue, May 28, 2019 at 4:17 PM Andrew MacLeod  
wrote:

On 5/27/19 9:02 AM, Richard Biener wrote:
On Fri, May 24, 2019 at 5:50 PM Andrew MacLeod  
wrote:
The above suggests that iff this is done at all it is not in GORI 
because

those are not conditional stmts or ranges from feeding those.  The
machinery doing the use-def walking from stmt context also cannot
come along these so I have the suspicion that Ranger cannot handle
telling us that for the stmt following above, for example

    if (_5 != 0)

that _5 is not zero?

Can you clarify?

So there are 2 aspects to this. The range-ops code for DIV_EXPR, if 
asked for the range of op2, would return ~[0,0] for _5. 
But you are also correct in that the walk backwards would not find 
this.


This is similar functionality to how null_derefs are currently handled,
and in fact could probably be done simultaneously using the same code
base.   I didn't bring null derefs up, but this is a good time :-)

There is a separate class used by the gori-cache which tracks the 
non-nullness property at the block level. It has a single API: 
non_null_deref_p (name, bb), which determines whether there is a 
dereference in any BB for NAME, which indicates whether the range has an 
implicit ~[0,0] range in that basic block or not.

So when we then have

   _1 = *_2; // after this _2 is non-NULL
   _3 = _1 + 1; // _3 is non-NULL
   _4 = *_3;
...

when a on-demand user asks whether _3 is non-NULL at the
point of _4 = *_3 we don't have this information?  Since the
per-BB caching will only say _1 is non-NULL after the BB.
I'm also not sure whether _3 ever gets non-NULL during
non-NULL processing of the block since walking immediate uses
doesn't really help here?

presumably _3 is globally non-null due to the definition being (pointer 
+ x)  ... ie, _3 has a global range of ~[0,0] ?

No, _3 is ~[0, 0] because it is derived from _1 which is ~[0, 0] and
you cannot arrive at NULL by pointer arithmetic from a non-NULL pointer.


I'm confused.

_1 was loaded from _2 (thus asserting _2 is non-NULL).  but we have no idea 
what the range of _1 is, so  how do you assert _1 is [~0,0] ?
The only way I see to determine _3 is non-NULL  is through the _4 = *_3 
statement.


In the first two statements from the above (where _1 is a pointer):

 _1 = *_2;
 _3 = _1 + 1;

_1 must be non-null because C/C++ define pointer addition only for
non-null pointers, and therefore so must _3.


(int*)0+0 is well-defined, so this uses the fact that 1 is non-null. This 
is all well done in extract_range_from_binary_expr already, although it 
seems to miss the (dangerous) optimization NULL + unknown == NULL.


Just in case, a quote:

"When an expression J that has integral type is added to or subtracted
from an expression P of pointer type, the result has the type of P.
(4.1) — If P evaluates to a null pointer value and J evaluates to 0, the
result is a null pointer value.
(4.2) — Otherwise, if P points to element x[i] of an array object x with
n elements, 80 the expressions P + J and J + P (where J has the value j)
point to the (possibly-hypothetical) element x[i + j] if 0 ≤ i + j ≤ n
and the expression P - J points to the (possibly-hypothetical) element
x[i − j] if 0 ≤ i − j ≤ n.
(4.3) — Otherwise, the behavior is undefined"


Or does the middle-end allow arithmetic on null pointers?


When people use -fno-delete-null-pointer-checks because their (embedded) 
platform has important stuff at address 0, they also want to be able to do 
arithmetic there.
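
As a hedged illustration of the inference under discussion (and of why it 
must be disabled under -fno-delete-null-pointer-checks):

int f (int *p)
{
  int *q = p + 1;   /* if p were null, p + 1 would be undefined (J != 0) */
  return q == 0;    /* so a compiler may fold this to 0 */
}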


--
Marc Glisse


Re: Testsuite not passing and problem with xgcc executable

2019-06-08 Thread Marc Glisse

On Sat, 8 Jun 2019, Jonathan Wakely wrote:


You can see which tests failed by looking in the .log files in the
testsuite directories,


There are .sum files for a quick summary.


or by running the contrib/test_summary script.


There is also contrib/compare_tests, although running it globally has been 
failing for a long time now, and running it for individual .sum files 
fails for jit and libphobos. Other scripts in contrib/ may be relevant.


--
Marc Glisse

