Re: Pro64-based GPLed compiler
Vladimir Makarov wrote:
> Marc Gonzalez-Sigler wrote:
>> I've taken PathScale's source tree (they've removed the IA-64 code
>> generator, and added an x86/AMD64 code generator), and tweaked the
>> Makefiles.
>>
>> I thought some of you might want to take a look at the compiler.
>>
>> http://www-rocq.inria.fr/~gonzalez/vrac/open64-alchemy-src.tar.bz2
>
> This reference doesn't work. The directory vrac looks empty.

Fixed. I'll never understand how AFS ACLs work ;-(
gcc plugin on MacOS failure
  I15gimple_opt_pass in ccHhkWiv.o
  "__ZTV8opt_pass", referenced from:
      __ZN8opt_passD2Ev in ccHhkWiv.o
  NOTE: a missing vtable usually means the first non-inline virtual member
  function has no definition.
  "_g", referenced from:
      __ZN12_GLOBAL__N_18afl_passC1Ebj in ccHhkWiv.o
     (maybe you meant: <several dozen truncated libstdc++ and
      vec<edge_def, va_gc> template instantiation symbols>)
  "_global_options", referenced from:
      __ZN12_GLOBAL__N_18afl_pass21get_afl_prev_loc_declEv in ccHhkWiv.o
  "_global_trees", referenced from:
      __ZN12_GLOBAL__N_18afl_pass18get_afl_trace_declEv in ccHhkWiv.o
      __ZN12_GLOBAL__N_18afl_pass21get_afl_prev_loc_declEv in ccHhkWiv.o
  "_integer_types", referenced from:
      __ZN12_GLOBAL__N_18afl_pass21get_afl_area_ptr_declEv in ccHhkWiv.o
  "_plugin_default_version_check", referenced from:
      _plugin_init in ccHhkWiv.o
  "_register_callback", referenced from:
      _plugin_init in ccHhkWiv.o
  "_sizetype_tab", referenced from:
      __ZN12_GLOBAL__N_18afl_pass7executeEP8function in ccHhkWiv.o
  "_xrealloc", referenced from:
      __ZN7va_heap7reserveIP9tree_nodeEEvRP3vecIT_S_8vl_embedEjb in ccHhkWiv.o
ld: symbol(s) not found for architecture x86_64
collect2: error: ld returned 1 exit status

When I then look at who might be supplying "_plugin_default_version_check",
I only find

/usr/local/opt/gcc@11/libexec/gcc/x86_64-apple-darwin20/11.1.0/f951

which is a program and not a lib. Does anyone know how this can be fixed?

Thank you!

Regards,
Marc

--
Marc Heuse
PGP: AF3D 1D4C D810 F0BB 977D 3807 C7EE D0A0 6BE9 F573
Re: gcc plugin on MacOS failure
Thank you so far, this got me (unsurprisingly) one step further, but then
the external function resolution error moves to the library loading stage:

~/afl++ $ g++-11 -Wl,-flat_namespace -Wl,-undefined,dynamic_lookup -g -fPIC -std=c++11 -I/usr/local/Cellar/gcc/11.1.0_1/lib/gcc/11/gcc/x86_64-apple-darwin20/11.1.0/plugin/include -I/usr/local/Cellar/gcc/11.1.0_1/lib/gcc/11/gcc/x86_64-apple-darwin20/11.1.0/plugin -I/usr/local//Cellar/gmp/6.2.1/include -shared instrumentation/afl-gcc-pass.so.cc -o afl-gcc-pass.so

=> compiles, because the linker does not bail on functions it cannot
resolve at link time.

~/afl++ $ ./afl-gcc-fast -o test-instr test-instr.c
afl-cc ++3.15a by Michal Zalewski, Laszlo Szekeres, Marc Heuse - mode: GCC_PLUGIN-DEFAULT
error: unable to load plugin './afl-gcc-pass.so': 'dlopen(./afl-gcc-pass.so, 9): Symbol not found: __ZN8opt_pass14set_pass_paramEjb
  Referenced from: ./afl-gcc-pass.so
  Expected in: flat namespace
 in ./afl-gcc-pass.so'

Looking for which library might supply this call does not show any library:

~/afl++ $ egrep -ral __ZN8opt_pass14set_pass_paramEjb /usr/local/
/usr/local//var/homebrew/linked/gcc/libexec/gcc/x86_64-apple-darwin20/11.1.0/f951
/usr/local//opt/gcc@11/libexec/gcc/x86_64-apple-darwin20/11.1.0/f951
/usr/local//opt/gfortran/libexec/gcc/x86_64-apple-darwin20/11.1.0/f951
/usr/local//opt/gcc/libexec/gcc/x86_64-apple-darwin20/11.1.0/f951
/usr/local//Cellar/gcc/11.1.0_1/libexec/gcc/x86_64-apple-darwin20/11.1.0/f951

(On the other hand, it is the same on Linux: I cannot find a library that
actually supplies that function.)

Thank you!

Regards,
Marc

On 22.07.21 22:16, Iain Sandoe wrote:
>
>> On 22 Jul 2021, at 20:41, Andrew Pinski via Gcc wrote:
>>
>> On Thu, Jul 22, 2021 at 7:37 AM Marc wrote:
>>>
>>> I have a gcc plugin (for afl++,
>>> https://github.com/AFLplusplus/AFLplusplus) that works fine when
>>> compiled on Linux but when compiled on MacOS (brew install gcc) it fails:
>>>
>>> ~/afl++ $ g++-11 -g -fPIC -std=c++11
>>> -I/usr/local/Cellar/gcc/11.1.0_1/lib/gcc/11/gcc/x86_64-apple-darwin20/11.1.0/plugin/include
>>> -I/usr/local/Cellar/gcc/11.1.0_1/lib/gcc/11/gcc/x86_64-apple-darwin20/11.1.0/plugin
>>> -I/usr/local//Cellar/gmp/6.2.1/include -shared
>>> instrumentation/afl-gcc-pass.so.cc -o afl-gcc-pass.so
>>
>> A few things: you are not building the plugin with the correct options
>> for Darwin. Basically you need to allow undefined references
>
> -Wl,-undefined,dynamic_lookup
>
> but if you expect those to bind to the main exe (e.g. cc1plus) at runtime,
> then you will need to build that with dynamic export (-export_dynamic).
>
> These things will *not* transfer to arm64 macOS and they will probably
> produce build warnings from newer linkers.
>
> ===
>
> I suspect that we will need to find a different recipe for that case
> (possibly using the main exe as a "link library" on the plugin link line,
> I guess).
>
>> and then also use dylib as the extension.
>
> That's a convention for shared libs, but it won't stop a plugin working
> (in fact things like python use .so on macOS).
>
> For plugin modules (e.g. Frameworks), even omitting the extension
> completely has been done.
>
> (So this is not the source of the problem.)
>
>> A few other things too. I always forget the exact options to use on
>> Darwin, really. GNU libtool can help with that.
>
> Perhaps, but I am not sure it's maintained aggressively, so make sure
> what you find is up to date.
>
> cheers,
> Iain.

> --
> Marc Heuse
> www.mh-sec.de
> PGP: AF3D 1D4C D810 F0BB 977D 3807 C7EE D0A0 6BE9 F573
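For context, the undefined symbols are expected at plugin link time: they
are defined by the compiler executable itself (cc1plus) and only get bound
when the plugin is dlopen()ed, which is why the link step must be told to
leave them undefined. A minimal, hypothetical sketch of a plugin entry
point using the very symbols in question (assuming the standard
gcc-plugin.h API; this is not the actual afl-gcc-pass.so.cc code):

    // Hypothetical minimal GCC plugin skeleton (sketch only).
    // register_callback and plugin_default_version_check live in the
    // compiler executable, not in any library, so they are undefined
    // when this shared object is linked.
    #include "gcc-plugin.h"
    #include "plugin-version.h"

    int plugin_is_GPL_compatible;   // required license marker

    static void finish_cb (void *gcc_data, void *user_data)
    {
      // a real plugin would register passes; this one only proves it loaded
      (void) gcc_data; (void) user_data;
    }

    int plugin_init (struct plugin_name_args *info,
                     struct plugin_gcc_version *version)
    {
      if (!plugin_default_version_check (version, &gcc_version))
        return 1;                   // built against a different GCC
      register_callback (info->base_name, PLUGIN_FINISH, finish_cb, NULL);
      return 0;
    }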
Re: changing "configure" to default to "gcc -g -O2 -fwrapv ..."
In article <[EMAIL PROTECTED]> you write:
> I don't think doing any of both is a good idea. Authors of the affected
> programs should adjust their makefiles instead - after all, the much more
> often reported problems are with -fstrict-aliasing, and this one also
> doesn't get any special treatment by autoconf. Even though
> -fno-strict-aliasing -fwrapv would be a valid, more forgiving default.
> Also, as ever, -O2 is what gets the most testing, so you are more likely
> to run into compiler bugs with -fwrapv.

As a data point, in the OpenBSD project we have disabled -fstrict-aliasing
by default. The documentation to gcc local to our systems duly notes this
departure from the canonical release. We expect to keep it that way about
forever.

If/when we update to a version where -fwrapv becomes an issue, we'll
probably do the same with it. Specifically, because we value reliability
over speed and strict standard conformance...
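A hypothetical illustration of the kind of code -fwrapv is about: with
signed overflow undefined, the compiler may delete the after-the-fact
overflow check; with -fwrapv, the wraparound is defined and the test
behaves as written.

    #include <limits.h>

    // Common idiom in pre-existing code: detect overflow after the fact.
    // Without -fwrapv, signed overflow is undefined behaviour, so GCC may
    // assume "x + 100 > x" always holds and drop the test entirely.
    // With -fwrapv, the addition wraps and the check works as intended.
    int add_checked(int x)
    {
        int y = x + 100;
        if (y < x)          // overflow check that relies on wrapping
            return INT_MAX; // saturate instead of wrapping
        return y;
    }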
Re: changing "configure" to default to "gcc -g -O2 -fwrapv ..."
On Fri, Dec 29, 2006 at 06:46:09PM -0500, Richard Kenner wrote:
>> Specifically, because we value reliability over speed and strict
>> standard conformance...
>
> Seems to me that programs that strictly meet the standard of the language
> they are written in would be more reliable than programs that are written
> in some ill-defined language.

C was a portable assembler for years before it got standardized and
optimizing compilers took over. There are still some major parts of the
network stack where you don't want to look, and that defy
-fstrict-aliasing.

A lot of C programmers don't really understand aliasing rules. If this
wasn't deemed to be a problem, no-one would have even thought of adding
code to gcc so that it can warn about some aliasing violations. ;-)

If you feel like fixing this code, be my guest.
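For readers unfamiliar with the rules, a small hypothetical example of the
kind of type punning that violates strict aliasing (the names peek_len and
header are made up for illustration):

    #include <stdint.h>
    #include <string.h>

    // Classic network-stack style punning: reinterpret a byte buffer as a
    // struct and read through the new type. This violates the aliasing
    // rules; with -fstrict-aliasing the compiler may reorder or cache
    // accesses on the assumption that different types don't overlap.
    struct header { uint32_t len; uint32_t type; };

    uint32_t peek_len(char *buf)
    {
        struct header *h = (struct header *) buf; // undefined behaviour:
                                                  // buf holds raw bytes,
                                                  // not a header object
        return h->len;
    }

    // A well-defined alternative: copy the bytes out with memcpy.
    uint32_t peek_len_safe(const char *buf)
    {
        uint32_t len;
        memcpy(&len, buf, sizeof len);
        return len;
    }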
Re: We're out of tree codes; now what?
In article <[EMAIL PROTECTED]> you write:
> On 19 Mar 2007 19:12:35 -0500, Gabriel Dos Reis <[EMAIL PROTECTED]> wrote:
>> similar justifications for yet another small% of slowdown have been
>> given routinely for over 5 years now. small% build up; and when they
>> build up, they don't not to be convincing ;-)
>
> But what is the solution? We can complain about performance all we
> want (and we all love to do this), but without a plan to fix it we're
> just wasting effort. Shall we reject every patch that causes a slow
> down? Hold up releases if they are slower than their predecessors?
> Stop work on extensions, optimizations, and bug fixes until we get our
> compile-time performance back to some predetermined level?

Simple sociology. Working on new optimizations = sexy. Trimming down
excess weight = unsexy.

GCC being vastly a volunteer project, it's much easier to find people who
want to work on their pet project and implement a recent optimization they
found in a nice paper (that will gain 0.5% in some awkward case) than to
try to track down slowdowns and to reverse them.

I remember back when people started converting gcc from RTL to SSA; I was
truly excited. Finally doing the right thing, compile times are going to
get back down where they belong. And then disappointment, as the SSA stuff
just got added on top of the RTL stuff, and the RTL stuff that was
supposed to vanish takes forever to go away...

Parts of GCC are over-engineered. I used to be able to read the
__attribute__ stuff; then it got refactored, and the new code looks like
it's going to be about 3 or 4 times slower than it was.

At some point, it's going to be really attractive to start again from
scratch, without all the backend/frontend complexities and interactions
that make cleaning up stuff harder and harder...

Also, I have the feeling that quite a few of gcc's sponsors are in it
mostly for the publicity (oh look, we're nice people giving money to gcc),
and new optimization passes that get 0.02% out of SPEC are better bang for
their money.

Kudos go to the people who actually manage to reverse some of the excesses
of the new passes.
Re: We're out of tree codes; now what?
In article <[EMAIL PROTECTED]> you write:
> On Mar 20, 2007, at 11:23 PM, Alexandre Oliva wrote:
>> As for configure scripts... autoconf -j is long overdue ;-)
> Is that the option to compile autoconf stuff into fast running
> efficient code? :-)
> But seriously, I think we need to press autoconf into generating 100x
> faster code 90% of the time. Maybe prebundling answers for the
> common targets...

Doesn't win all that much. Over in OpenBSD, we tried to speed up the build
of various software by using a site cache for most of the system stuff. It
makes some configure scripts go marginally faster, but you won't gain back
more than about 5%. Due to autoconf's design, there are lots of things you
actually cannot cache, because they interact in weird ways and are used in
strange ways.

If you want to speed up autoconf, it needs to have some smarts about
what's going on. Not a huge list of generated variables, but what those
variables *mean*, semantically, and which value is linked to what
configuration option. You would then be able to avoid recomputing a lot of
things (and you would also probably be able to compress information a
lot... when I look at the list of configured stuff, there are so many
duplicates it's scary).

Autoconf also needs an actual database of tests, so that people don't
reinvent a square wheel all the time. This would also solve the second
autoconf plague: it's not only slow as molasses, but it also
`auto-detects' stuff when you don't want it to, which leads to
hard-to-reproduce builds, unless you start with an empty machine every
time (which has its own performance issues).

Even if it still has lots of shortcomings (a large pile of C++ code to
compile first), I believe a replacement like cmake shows a lot of promise
there...

In my opinion, after spending years *fighting* configure issues to make
programs compile correctly under OpenBSD, I believe the actual database of
tests is the only thing worth saving in autoconf. I don't know what the
actual `good design' would be, but I'm convinced using m4 as a lisp
interpreter to generate shell scripts is a really bad idea.
Re: Integer overflow in operator new
In article <[EMAIL PROTECTED]> you write:
> On Fri, Apr 06, 2007 at 06:51:24PM -0500, Gabriel Dos Reis wrote:
>> David Daney <[EMAIL PROTECTED]> writes:
>>
>> | One could argue that issuing some type of diagnostic (either at
>> | compile time or run time) would be helpful for people that don't
>> | remember to write correct code 100% of the time.
>>
>> I raised this very issue a long time ago; a long-term GCC contributor
>> vocally opposed checking the possible overflow. I hope something will
>> happen this time.
>
> I don't like slowing programs down, but I like security holes even less.
>
> If a check were to be implemented, the right thing to do would be to
> throw bad_alloc (for the default new) or return 0 (for the nothrow new).
> If a process's virtual memory limit is 400M, we throw bad_alloc if we do
> new int[200], so we might as well do it if we do new int[20].
> There would be no reason to use a different reporting mechanism.
>
> There might be rare cases where the penalty for this check could have
> an impact, like for pool allocators that are otherwise very cheap.
> If so, there could be a flag to suppress the check.

Considering the issue is only for new [], I'd assume anyone who wants less
correct, but faster, behavior would simply handle the computation
themselves, and deal with the overflow manually, then override whatever
operator/class they need to make things work.
Re: Integer overflow in operator new
In article <[EMAIL PROTECTED]> you write:
> The assert should not overflow. I suggest
>
> #include <stdint.h>
> #include <assert.h>
> assert( n < SIZE_MAX / sizeof(int) );
>
> which requires two pieces of information that the programmer
> otherwise wouldn't need, SIZE_MAX and sizeof(type).
>
> Asking programmers to write extra code for rare events has
> not been very successful. It would be better if the compiler
> incorporated this check into operator new, though throwing
> an exception rather than asserting. The compiler should be
> able to eliminate many of the conditionals.

The compiler and its runtime should be correct, and programmers should be
able to depend on them. When you read the documentation for new or calloc,
there is no mention of integer overflow; it is not expected that the
programmer has to know about that.

Adding an extra test in user code makes even less sense than checking that
pointers are not null before calling free...
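A small hypothetical illustration of the overflow being discussed: the
byte count for new T[n] can wrap before the allocator ever sees it, so an
unchecked new int[n] with a huge n may return a block far smaller than the
caller expects. The helper alloc_ints below is made up for illustration
and performs the check the thread argues the compiler/runtime should do:

    #include <cstddef>
    #include <cstdint>
    #include <new>

    // If n > SIZE_MAX / sizeof(int), the size computation n * sizeof(int)
    // wraps around, so guard before allocating.
    int *alloc_ints(std::size_t n)
    {
        if (n > SIZE_MAX / sizeof(int))
            throw std::bad_alloc();
        return new int[n];
    }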
Re: GCC 4.1 Projects
In article <[EMAIL PROTECTED]> you write:
> People do break Ada bootstrap because they don't configure and test Ada,
> they don't configure Ada because they complained about Ada build
> machinery being non standard, delaying Ada build machinery changes will
> only make things worse for Ada bootstrap statistics.

Keep in mind that ada testing *needs a working ada compiler* in the first
place. If you don't have one, this can be rather painful. You need access
to a platform with a working ada compiler in order to build a
cross-compiler, and you may have to figure out all kinds of not-so-nice
stuff about cross-compilation.

Been there, done that for i386-OpenBSD. Probably going to do it for
sparc64 at some point in time as well, but this is not exactly my cup of
tea...
Re: Extension compatibility policy
On Mon, Feb 28, 2005 at 09:24:20AM -0500, Robert Dewar wrote:
> Not quite, Marc is suggesting that -pedantic be the default if I read
> the above statement correctly.

Yep. Except it's probably too late for that, and there is stuff in
-pedantic that is downright obnoxious (because every C compiler I know
does it) and, err, really pedantic, as opposed to actual warnings that
help find out about obscure extensions.

In my opinion, this is just a case of a very bad design decision that was
taken years ago. It took me years to grow a firm opinion about it too.

The basic issue I'm talking about is failure modes for software. There are
a few interesting error categories:

- stuff that is an error, but that the program can recover from;
- stuff that is not really an error, but that the program can't find out
  about.

The first class is interesting because it's a class we should not recover
from, ever, for programming tools: it causes all sorts of grief further
down the line. Why? Because it's an ERROR, so it's not actually specified
formally, and recovering from it gracefully muddles the semantics: some
errors will be recovered, and some will not. And this might change from
release to release, allowing half-broken software to grow and develop.

For instance, the extern/static inline stuff in gcc falls under that
heading, in my book. GCC was not designed to put any hard checks that the
inlined functions were also linked in when building the final executable,
and so people like Torvalds complained when a later version of gcc no
longer inlined the function and could not find it in a library. At the
time the complaint came up, I came down on the side of the GCC developers:
the extra feature was misused by linux developers... Now, I'm not so sure.
I think there was a design step missed, along the guidelines of not
allowing erroneous software to build.

The second class is interesting because it comes up all the time with
-Wall -Werror. All the `variable not initialized' stuff (that one is
obvious). To a lesser extent, all the `comparison is always true due to
limited range of data type' warnings. Those warnings actually occur all
the time in portable code, and are very hard to get rid of (and it's
probably not a good idea to clean them all up). This makes -Wall -Werror
much less useful than it could be.

Forgive me if I'm reinventing the wheel (partly), but more and more, it
seems to me that there's a category of warnings missing: the stuff that
the compiler is sure about, and that cannot come from portability issues.
Say, the -Wsurething warnings. If we could find reasonable semantics for
these (along with a -Wunreasonable-extension switch), then maybe we would
have something that -Werror could use.

As far as practical experience goes, I've spent enough time dealing with
OpenBSD kernel compilation (which does -Wall -Werror, btw) and with
cleaning up various old C sources (which invariably starts with a
combination of warning switches, and then continues by reading pages of
inappropriate warnings to find the right ones) to be fairly certain this
kind of diagnostic could be useful...

Oh yes, and the change from the old preprocessor to the new and improved
cpplib took quite a long time to recover from too... You wouldn't believe
how many people misuse token pasting all the time. But I put in the effort
because I think that's a good change: it takes unambiguously wrong code
out in the backyard and shoots it dead.
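Going back to the second category above, a hypothetical example of the
kind of warning that fires in perfectly portable code under -Wall -Werror
(the function name is made up for illustration):

    #include <limits.h>

    // Whether 'char' is signed or unsigned differs between platforms, so
    // portable code has to keep both halves of this check. On any given
    // platform one half is trivially true, and GCC warns "comparison is
    // always true due to limited range of data type".
    int fits_in_signed_char(char c)
    {
        return c >= SCHAR_MIN && c <= SCHAR_MAX;
    }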
Re: Questions about trampolines
In article <[EMAIL PROTECTED]> you write:
> Well as I said above, trampolines or an equivalent are currently
> critically needed by some front ends (and of course by anyone using the
> (very useful IMO) extension of nested functions in C).

This is your opinion, but I've yet to find an actual piece of code in a
real project that uses that extension.

On the other hand, non-executable stack is a feature used in the real
world. Quite in common use, actually... Think about that.
Re: Questions about trampolines
On Mon, Mar 14, 2005 at 01:25:34PM +0000, Joseph S. Myers wrote:
> On Mon, 14 Mar 2005, Robert Dewar wrote:
>
>> I have certainly seen it used, but you may well be right that it is
>> seldom used. It is certainly reasonable to consider removing this
>> extension from C and C++. Anyone using that feature? Or know anyone
>> who is.
>
> Nested functions are used in the glibc dynamic linker. I'm not sure why,
> and they may be inline nested functions whose addresses are never taken.
> The extension is not present in GNU C++, only in GNU C.

Well, Andreas Schwab seems to think this is no longer the case. I don't
want to dive into the glibc mess, thank god, but if the dynamic linker is
implemented like the dynamic linkers I know, it means any binary using a
dynamic linker that uses trampolines will lose any kind of stack
protection on some badly designed architectures, like, say, i386...
Re: Questions about trampolines
The thing I did for OpenBSD 3.7 is patch the gcc-3.3.x we use:

- On OpenBSD, by default, trampoline code generation is disabled in gcc
  3.3.5. Code requiring trampolines will not compile without
  -ftrampolines. The warning flag -Wtrampolines can be used to locate
  trampoline instances if trampoline generation is re-enabled.

That way, you still have trampolines available in C if you need them, but
you don't risk compiling dangerous code that disables executable stack
protection by mistake. It's probably quite trivial to write a similar
patch for gcc-current, assuming you guys think it's the right design
decision.

After enabling that patch, we recompiled the whole system, all of X, and
the 3000 packages of third party sources. -ftrampolines was needed exactly
0 times.
Re: Merging calls to `abort'
In article <[EMAIL PROTECTED]> you write:
> GCC's primary purpose is to be the compiler for the GNU system. It is
> used for many other purposes too, and it is good for GCC to serve more
> purposes, but they're not as important for the GNU Project, even
> though they are all important for some users.

I'm a wee little bit fed up with that argument. Having the compiler BE the
compiler for the GNU system is cool. But if you still want to have a
thriving user community that willingly contributes to it, just giving lip
service to other purposes of GCC is not always enough, Richard.

Specifically, there are a bunch of people who use GCC as a system compiler
on non-GNU systems, who are currently willing to contribute code, but who
would jump ship if the tide turns in the direction of a `GNU system' too
much.

Remember emacs vs. xemacs? Remember GCC vs. egcs?

Sorry for making a pest of myself and trolling heavily (well, not that
heavily), but dismissing technical arguments on political grounds doesn't
quite cut it for me (even if you half acknowledge the existence of other
people, mostly to dismiss them off-hand). And I'm sure there are OTHER
people in my own little minority who are very interested in the slant of
your arguments.
Re: Merging calls to `abort'
On Tue, Mar 29, 2005 at 09:27:32AM -0800, Joe Buck wrote:
> Or are you just way behind in your reading?

Way behind. I've read the discussion, I've seen nothing looking like my
argument, so I posted my reply.
Re: GCC 4.1: Buildable on GHz machines only?
In article <[EMAIL PROTECTED]> you write:
> The alternative of course is to do only crossbuilds. Is it reasonable
> to say that, for platforms where a bootstrap is no longer feasible, a
> successful crossbuild is an acceptable test procedure to use instead?

No. I've been playing enough with crossbuilds to know that a crossbuild
will show you bugs that do not exist in native builds, and VICE-VERSA.
Building a full system natively, compiler included, is still one of the
best stress-tests for an operating system.

This mindset, that because the compiler is too slow it's acceptable to do
cross-builds, is killing older systems. Very quickly, you end up with
fast, robust systems that are heavily tested through the build of lots of
software, and with slow, untested systems that never see a build and are
only tested casually by people running a handful of specialized
applications on them.

I'm speaking from experience: you wouldn't believe how many bugs we
tracked and fixed in OpenBSD on fringe platforms (arm, sparc64) simply
because we do native builds and see stuff people doing cross-builds don't
see. This is not even the first time I talk about this on this list.

Except for embedded systems, where memory and disk space don't make it
practical to compile anything natively, having a compiler so slow that it
makes it impossible to compile stuff natively kills old platforms.

Do you know why GCC4 is deprecated on sparc-openbsd? It's simply because
no-one so far has been able to dedicate the CPU time to track down the few
bugs that prevented us from switching to gcc 3.x from 2.95. That's right,
I said CPU time. It takes too long to bootstrap the compiler, it takes too
long to rebuild the whole system. And thus, it rots.
Re: GCC 4.1: Buildable on GHz machines only?
How about replacing that piece of junk called libtool with something else?
Preferably something that works.

Between its really poor quoting capabilities, and the fact that half the
tests are done at configure time and half the tests are done at run time,
libtool is really poor engineering. It's really atrocious when you see
operating system tests all over the place *in the libtool script* and not
in the configure process in the first place. Heck, last time I even tried
to figure out some specs for libtool options from the script, I nearly
went mad.

It won't be any use for GCC, but I ought to tell you that the OpenBSD
folks are seriously considering replacing libtool entirely with a
home-made perl script that would ONLY handle libtool stuff on OpenBSD and
nowhere else. Between the fact that the description is too low-level (it
is very hard to move libraries around, and -L stuff pops up in the wrong
order all the time and gets you to link with the wrong version of the
library), and that some of the assumptions it makes are downright bogus
(hardcoding -R even when it's not needed, or being really nosy about the C
compiler in the wrong way and assuming that the default set of libraries
without options will be the same one as the set with -fpic), it's getting
to the point where it would be a real gain to just reverse-engineer its
features and rewrite it from scratch.
Re: Sine and Cosine Accuracy
Sorry for chiming in after all this time, but I can't let this pass.

Scott, where on earth did you pick up your trig books? The mathematical
functions sine and cosine are defined everywhere. There is absolutely no
identity involving them which doesn't apply all over the real line, or the
complex plane. The same is true for other trigonometric functions, like
tangent, with the obvious caveat that tangent goes to infinity when
x -> pi/2 (or any congruent number, periodically).

The infinite series for sine and cosine even converge over the whole
complex plane, since n! >> x^n for a given x with n big enough (okay, the
actual mathematical argument is a bit more involved, but that's the idea:
n! goes to infinity a heck of a lot faster than x^n).

I'm thinking you're confusing this with either of two things:

- Since the trig functions are periodic, the inverse functions are
  obviously ambiguous, and you need some external input to resolve the
  ambiguity. This makes for arbitrary definitions, and lots of fun in
  gluing the complex plane back together, and there's no way to avoid
  that, since it's the whole basis for the very useful theory of
  holomorphic functions and complex integration. The math library usually
  has an atan2 function to take care of the ambiguity.

- Most software implementations of trig functions use approximation
  polynomials, usually a variation on Chebyshev polynomials, which
  converge much faster than the complete series, but MUST be restricted
  to a very small range, since they don't even converge to the right
  value outside this range.

Now, the fact is that floating point arithmetic can be really tricky, and
it's often necessary to (gasp) rework the equations and think in order to
get some correct digits out of ill-applied trigonometric functions. But I
haven't seen that often in textbooks outside of specialized applied
maths...
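For completeness, the series argument sketched above (standard textbook
material, spelled out here):

    \sin z = \sum_{n \ge 0} (-1)^n z^{2n+1} / (2n+1)!
    \cos z = \sum_{n \ge 0} (-1)^n z^{2n}   / (2n)!

The ratio of successive terms is |z|^2 / ((2n+2)(2n+3)) for the sine
series (and |z|^2 / ((2n+1)(2n+2)) for cosine), which tends to 0 for every
fixed z, so both series converge absolutely on the whole complex plane.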
Re: GCC and Floating-Point
In article <[EMAIL PROTECTED]> you write:
> http://csdl.computer.org/dl/mags/co/2005/05/r5091.pdf
> "An Open Question to Developers of Numerical Software", by
> W. Kahan and D. Zuras

Doesn't look publicly accessible from my machine...
Re: Sine and Cosine Accuracy
On Sun, May 29, 2005 at 08:59:00PM +0200, Georg Bauhaus wrote:
> Marc Espie wrote:
>> Sorry for chiming in after all this time, but I can't let this pass.
>>
>> Scott, where on earth did you pick up your trig books?
>
> Sorry, too, but why on earth do modern-time mathematics scholars think
> that sine and cosine are bound to have to do with an equally modern
> notion of real numbers that clearly exceed what a circle has to offer?
> What is a plain unit circle of a circumference that exceeds 2*pi?
> How can a real mathematical circle of the normal kind have more than 360
> non-fractional sections?
> By "real circle" I mean a thing that is not obfuscated by the useful
> but strange ways in which things are redefined by mathematicians;
> cf. Halmos for some humor.

Err, because it all makes sense? Because there is no reason to do stuff
from 0 to 360 instead of -180 to 180?

> And yes, I know that all the other stuff mentioned in this thread
> explains very well that there exist useful definitions of sine for real
> numbers outside "(co)sine related ranges", and that these definitions
> are frequently used. Still, at what longitude does your trip around the
> world start in Paris, at 2°20' or at 362°20', if you tell the story to a
> seaman? Cutting a pizza at 2.0^90.

Huh?! At 0.0. Did you know that, before Greenwich, the meridian used as
the origin of longitude went through Paris? Your idea would make some
sense if you talked about a latitude (even though the notion of north pole
is not THAT easy to define, and neither is the earth round). Heck, I can
plot trajectories on a sphere that do not follow great circles and that
extend over 360 degrees in longitude. I don't see why I should be
restricted from doing that.

> Have a look into e.g. "Mathematics for the Million" by Lancelot Hogben
> for an impression of how astounding works of architecture have been done
> without those weird ways of extending angle-related computations into
> arbitrarily inflated numbers of which no one knows how to distinguish
> one from the other in sine (what you have dared to call "obvious", when
> it is just one useful convention. Apparently some applications derive
> from different conventions if I understand Scott's remarks correctly).

There are some arbitrary, convenient definitions in modern mathematics.
The angle unit has been chosen so that differentiation of sine/cosine is
obvious. The definition of sine/cosine extends naturally to the whole real
axis, which gives a meaning to mechanics, rotation speeds, complex
functions and everything that's been done in mathematics over the last
four centuries or so.

You can decide to restrict this stuff to plain old 2D geometry, and this
would be fine for teaching in elementary school, but it makes absolutely
no sense with respect to any kind of modern mathematics. Maybe playing
with modern mathematical notions for years has obfuscated my mind? Or
maybe I just find those definitions to be really obvious and intuitive.
Actually, I would find arbitrary boundaries to be unintuitive.

There is absolutely nothing magical about trigonometric functions, if I
compare them to any other kind of floating point arithmetic: as soon as
you try to map `real' numbers into approximations, you have to be VERY
wary if you don't want to lose all precision. There's nothing special,
nor conventional, about sine and cosine. Again, if you want ARBITRARY
conventions, then look at inverse trig functions, or at logarithms. There
you will find arbitrary discontinuities that can't be avoided.
bug or not? ada loop in gcc-4.1-20050528
I've got my build on OpenBSD-i386 stuck in a loop compiling:

stage2/xgcc -Bstage2/ -B/usr/local/i386-unknown-openbsd3.7/bin/ -c -O2 -g
-fomit-frame-pointer -gnatpg -gnata -I- -I. -Iada
-I/spare/ports/lang/gcc/4.1/w-gcc-4.1-20050528/gcc-4.1-20050528/gcc/ada
/spare/ports/lang/gcc/4.1/w-gcc-4.1-20050528/gcc-4.1-20050528/gcc/ada/ada.ads
-o ada/ada.o

I'm using an ada compiler bootstrapped from 3.3.6... My top says:

31002 espie 84 10 26M 2712K run - 107:26 96.14% gnat1

so I have little hope this will end.

Does this ring a bell? I assume some other people may have already run
into that issue on less uncommon platforms; otherwise, I'll investigate...
Re: Sine and Cosine Accuracy
On Sun, May 29, 2005 at 05:52:11PM -0400, Scott Robert Ladd wrote:
> (I expect Gabriel dos Rios to respond with something pithy here; please
> don't disappoint me!)

Funny, I don't expect any message from that signature. Gabriel dos Reis,
on the other hand, may have something to say...
Re: Will Apple still support GCC development?
In article <[EMAIL PROTECTED]> you write:
> Samuel Smythe wrote:
>> It is well-known that Apple has been a significant provider of GCC
>> enhancements. But it is also probably now well-known that they have
>> opted to drop the PPC architecture in favor of an x86-based
>> architecture. Will Apple continue to contribute to the PPC-related
>> componentry of GCC, or will such contributions be phased out as the
>> transition is made to the x86-based systems? In turn, will Apple be
>> providing more x86-related contributions to GCC?
>
> A better question might be: Has Intel provided Apple with an OS X
> version of their compiler? If so (and I think it very likely), Apple may
> have little incentive for supporting GCC, given how well Intel's
> compilers perform.

Oh sure, and Intel has an Obj-C++ compiler up their sleeve... right.

Speculations, speculations. Wait and see...
Re: signed is undefined and has been since 1992 (in GCC)
In article <[EMAIL PROTECTED]> you write:
> Both OpenSSL and Apache programmers did this, in carefully reviewed
> code which was written in response to a security report. They simply
> didn't know that there is a potential problem. The reason for this
> gap in knowledge isn't quite clear to me.

Well, it's reasonably clear to me. I've been reviewing code for the
OpenBSD project; it's incredible the number of errors you can find in code
which is supposed to:

- have been written by competent programmers;
- have been reviewed by tens of people.

Quite simply, formal code reviews in free software don't work. The `many
eyes' paradigm is a fallacy. Ten people can look at the same code and fail
to notice a problem if they don't look for the right thing.

A lot of people don't even think about overflows when they look at
arithmetic; there are a lot of integer overflows out there. I still
routinely find off-by-one accesses in buffers, some of them quite obvious.
The only reason I see them is because my malloc can put allocations on
page boundaries, and thus the program barfs there, and not on other
machines.

A lot of people don't know about the peculiarities of C signed arithmetic.
A lot of `portable' code that uses C arithmetic buries such peculiarities
under tons of macros and typedefs, such that it is really hard to figure
out what's going on even if you understand the issues. From past
experience, both Apache and OpenSSL are very bad in that regard.

Bottom line is, if it passes tests on major architectures and major OSes,
it's very unlikely that someone will notice something is amiss, and that
the same someone will have the knowledge to fix it. If it passes all
practical tests, but is incorrect from a language point of view, it is
even more unlikely.
Re: Warning C vs C++
In article <[EMAIL PROTECTED]> you write:
> On Saturday 17 September 2005 17:45, you wrote:
>> That's a real misunderstanding. There are many warnings that are very
>> specialized, and if -Wall really turned on all warnings, it would be
>> essentially useless. The idea behind -Wall is that it represents a
>> comprehensive set of warnings that most/many programmers can live
>> with. To turn on all warnings would be the usability faux pas.
>
> Ok, sure. This option is also used by many developers to see all possible
> problems in their code. And btw, signed/unsigned isn't a minor problem.
> Majority of code giving such warning is exploitable (in the black-hackish
> terms).
> I am developer myself, but just using gcc, hence my user's opinion.

Typical black-hat attitude. Band-aid problems instead of writing correct
code. Guessing at the compiler's behavior instead of reading the specs and
writing robust, portable code.

That's the big reason there are lots of security holes all over the place.
People keep guessing and learning by trial and error.
Re: [RFC] add push/pop pragma to control the scope of "using"
On Wed, 15 Jan 2020, 马江 wrote:

> Hello,
>
> After some googling, I find there is no way to control the scope of
> "using" for the moment. This seems strange, as we definitely need this
> feature, especially when writing inline member functions in C++ headers.
> Currently I am trying to build a simple class in a C++ header file as
> follows:
>
>   #include <string>
>   using namespace std;
>
>   class mytest {
>     string test_name;
>     int test_val;
>   public:
>     inline string & get_name () { return test_name; }
>   };

Why is mytest in the global namespace?

> As an experienced C coder, I know that inline functions must be put into
> headers or else users could only rely on LTO. And I know that using
> "using" in a header file is a bad idea, as it might silently change the
> meaning of other code. However, after I put all my inline functions into
> the header file, I found I must write many "std::string" instead of
> "string", which is a torture. Can we add something like "#pragma
> push_using" (just like #pragma pop_macro)? I believe it's feasible and
> probably not hard to implement.

We try to avoid extensions in gcc; you may want to propose this to the C++
standard committee first. However, you should first check whether modules
(C++20) affect the issue.

--
Marc Glisse
Re: How to get the data dependency of GIMPLE variables?
On Mon, 15 Jun 2020, Shuai Wang via Gcc wrote:

> I am trying to analyze the following gimple statements, where the data
> dependency of _23 is a tree whose leaf nodes are three constant values
> {13, 4, 14}:
>
>   _13 = 13;
>   _14 = _13 + 4;
>   _15 = 14;
>   _22 = (unsigned long) _15;
>   _23 = _22 + _14;
>
> Could anyone shed some light on how such a backward traversal can be
> implemented? Given _22 used in the last assignment, I have no idea of how
> to trace back to its definition in the fourth statement... Thank you very
> much!

SSA_NAME_DEF_STMT

--
Marc Glisse
Re: How to get the data dependency of GIMPLE variables?
On Mon, 15 Jun 2020, Shuai Wang via Gcc wrote:

> Dear Marc,
>
> Thank you very much! Just another quick question.. Can I iterate over the
> operands of a GIMPLE statement, like how I iterate over an LLVM
> instruction in the following way?
>
>   Instruction* instr;
>   for (size_t i = 0; i < instr->getNumOperands(); i++) {
>     instr->getOperand(i))
>   }
>
> Sorry for such naive questions.. I actually searched the documents and
> GIMPLE pretty print for a while but couldn't find such a way of accessing
> arbitrary numbers of operands...

https://gcc.gnu.org/onlinedocs/gccint/GIMPLE_005fASSIGN.html
or for lower level
https://gcc.gnu.org/onlinedocs/gccint/Logical-Operators.html#Operand-vector-allocation

But really you need to look at the code of gcc. Search for places that use
SSA_NAME_DEF_STMT and see what they do with the result.

--
Marc Glisse
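To make that pointer concrete, here is a rough, untested sketch of the
kind of backward walk being discussed, using the documented GIMPLE
accessors. The function name walk_defs and the include list are
illustrative assumptions, not existing GCC code:

    #include "gcc-plugin.h"
    #include "tree.h"
    #include "gimple.h"
    #include "tree-pretty-print.h"

    /* Recursively dump the tree of definitions feeding SSA name VAR.
       Sketch only: it assumes assignments; real code would also handle
       PHIs, calls, default definitions, etc.  */
    static void
    walk_defs (tree var, int depth)
    {
      if (TREE_CODE (var) != SSA_NAME)
        {
          /* Leaf: a constant or a declaration.  */
          debug_generic_expr (var);
          return;
        }

      gimple *def = SSA_NAME_DEF_STMT (var);
      if (gassign *assign = dyn_cast <gassign *> (def))
        {
          /* Operand 0 is the lhs; 1.. are the rhs operands.  */
          for (unsigned i = 1; i < gimple_num_ops (assign); i++)
            {
              tree op = gimple_op (assign, i);
              if (op)
                walk_defs (op, depth + 1);
            }
        }
    }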
Re: Local optimization options
On Sun, 5 Jul 2020, Thomas König wrote:

> Am 04.07.2020 um 19:11 schrieb Richard Biener:
>> On July 4, 2020 11:30:05 AM GMT+02:00, "Thomas König" wrote:
>>> What could be a preferred way to achieve that? Could optimization
>>> options like -ffast-math be applied to blocks instead of functions?
>>> Could we set flags on the TREE codes to allow certain optimizations?
>>> Other things?
>>
>> The middle end can handle those things on function granularity only.
>>
>> Richard.
>
> OK, so that will not work (or not without a disproportionate amount of
> effort).
>
> Would it be possible to set something like a TREE_FAST_MATH flag on
> TREEs? An operation could then be optimized according to these rules iff
> both operands had that flag, and would also have it then.

In order to support various semantics on floating point operations, I was
planning to replace some trees with internal functions, with an extra
operand to specify various behaviors (rounding, exceptions, etc). Although
at least in the beginning, I was thinking of only using those functions in
safe mode, to avoid perf regressions.

https://gcc.gnu.org/pipermail/gcc-patches/2019-August/527040.html

This may never happen now, but it sounds similar to the TREE_FAST_MATH
flag you are suggesting. I was going with functions for more flexibility,
and to avoid all the existing assumptions about trees. While I guess for
fast-math, the worst the assumptions could do is clear the flag, which
would make us optimize less than possible, not so bad.

--
Marc Glisse
Re: [RFC] Add new flag to specify output constraint in match.pd
On Fri, 21 Aug 2020, Feng Xue OS via Gcc wrote:

> There is a match-folding issue derived from pr94234. A piece of code like:
>
>   int foo (int n)
>   {
>      int t1 = 8 * n;
>      int t2 = 8 * (n - 1);
>      return t1 - t2;
>   }
>
> It can be perfectly caught by the rule "(A * C) +- (B * C) -> (A +- B) * C",
> and be folded to constant "8". But this folding will fail if both v1 and
> v2 have multiple uses, as in the following code.
>
>   int foo (int n)
>   {
>      int t1 = 8 * n;
>      int t2 = 8 * (n - 1);
>      use_fn (t1, t2);
>      return t1 - t2;
>   }
>
> Given an expression with non-single-use operands, folding it will
> introduce duplicated computation in most situations, and is deemed to be
> unprofitable. But it is always beneficial if the final result is a
> constant or an existing SSA value. And the rule is:
>
>   (simplify
>    (plusminus (mult:cs@3 @0 @1) (mult:cs@4 @0 @2))
>    (if ((!ANY_INTEGRAL_TYPE_P (type)
>          || TYPE_OVERFLOW_WRAPS (type)
>          || (INTEGRAL_TYPE_P (type)
>              && tree_expr_nonzero_p (@0)
>              && expr_not_equal_to (@0, wi::minus_one (TYPE_PRECISION (type)
>         /* If @1 +- @2 is constant require a hard single-use on either
>            original operand (but not on both). */
>         && (single_use (@3) || single_use (@4)))   <- control whether match or not
>     (mult (plusminus @1 @2) @0)))
>
> Current matcher only provides a way to check something before folding,
> but no mechanism to affect the decision after folding. If it had one, for
> the above case, we could let it go when we find the result is a constant.

:s already has a counter-measure where it still folds if the output is at
most one operation. So this transformation has a counter-counter-measure
of checking single_use explicitly. And now we want a counter^3-measure...

> Like the way to describe input operands using flags, we could also add a
> new flag to specify this kind of constraint on output: that we expect it
> to be a simple gimple value.
>
> Proposed syntax is
>
>   (opcode:v{ condition } )
>
> The char "v" stands for gimple value; if more descriptive, another char
> is preferred. "condition" enclosed by { } is an optional c-syntax
> condition expression. If present, only when "condition" is met will the
> matcher check whether the folding result is a gimple value, using
> gimple_simplified_result_is_gimple_val ().
>
> Since there is no SSA concept in GENERIC, this is only for GIMPLE-match,
> not GENERIC-match.
>
> With this syntax, the rule is changed to
>
> #Form 1:
>
>   (simplify
>    (plusminus (mult:cs@3 @0 @1) (mult:cs@4 @0 @2))
>    (if ((!ANY_INTEGRAL_TYPE_P (type)
>          || TYPE_OVERFLOW_WRAPS (type)
>          || (INTEGRAL_TYPE_P (type)
>              && tree_expr_nonzero_p (@0)
>              && expr_not_equal_to (@0, wi::minus_one (TYPE_PRECISION (type))
>     (if (!single_use (@3) && !single_use (@4))
>      (mult:v (plusminus @1 @2) @0)))
>    (mult (plusminus @1 @2) @0)

That seems to match what you can do with '!' now (that's very recent).

> #Form 2:
>
>   (simplify
>    (plusminus (mult:cs@3 @0 @1) (mult:cs@4 @0 @2))
>    (if ((!ANY_INTEGRAL_TYPE_P (type)
>          || TYPE_OVERFLOW_WRAPS (type)
>          || (INTEGRAL_TYPE_P (type)
>              && tree_expr_nonzero_p (@0)
>              && expr_not_equal_to (@0, wi::minus_one (TYPE_PRECISION (type))
>     (mult:v{ !single_use (@3) && !single_use (@4) } (plusminus @1 @2) @0

Indeed, something more flexible than '!' would be nice, but I am not so
sure about this version. If we are going to allow inserting code after
resimplification and before validation, maybe we should go even further
and let people insert arbitrary code there...

--
Marc Glisse
Re: [RFC] Add new flag to specify output constraint in match.pd
On Wed, 2 Sep 2020, Richard Biener via Gcc wrote:

On Mon, Aug 24, 2020 at 8:20 AM Feng Xue OS via Gcc wrote:

There is a match-folding issue derived from pr94234. A piece of code like:

  int foo (int n)
  {
     int t1 = 8 * n;
     int t2 = 8 * (n - 1);
     return t1 - t2;
  }

It can be perfectly caught by the rule "(A * C) +- (B * C) -> (A +- B) * C",
and be folded to constant "8". But this folding will fail if both v1 and
v2 have multiple uses, as the following code.

  int foo (int n)
  {
     int t1 = 8 * n;
     int t2 = 8 * (n - 1);
     use_fn (t1, t2);
     return t1 - t2;
  }

Given an expression with non-single-use operands, folding it will
introduce duplicated computation in most situations, and is deemed to be
unprofitable. But it is always beneficial if final result is a constant or
existing SSA value. And the rule is:

  (simplify
   (plusminus (mult:cs@3 @0 @1) (mult:cs@4 @0 @2))
   (if ((!ANY_INTEGRAL_TYPE_P (type)
         || TYPE_OVERFLOW_WRAPS (type)
         || (INTEGRAL_TYPE_P (type)
             && tree_expr_nonzero_p (@0)
             && expr_not_equal_to (@0, wi::minus_one (TYPE_PRECISION (type)
        /* If @1 +- @2 is constant require a hard single-use on either
           original operand (but not on both). */
        && (single_use (@3) || single_use (@4)))   <- control whether match or not
    (mult (plusminus @1 @2) @0)))

Current matcher only provides a way to check something before folding, but
no mechanism to affect decision after folding. If has, for the above case,
we can let it go when we find result is a constant.

:s already has a counter-measure where it still folds if the output is at
most one operation. So this transformation has a counter-counter-measure
of checking single_use explicitly. And now we want a counter^3-measure...

Counter-measure is key factor to matching-cost. ":s" seems to be somewhat
coarse-grained. And here we do need more control over it. But ideally, we
could decouple these counter-measures from definitions of match-rule, and
let gimple-matcher get a more reasonable match-or-not decision based on
these counters. Anyway, it is another story.

Like the way to describe input operand using flags, we could also add a
new flag to specify this kind of constraint on output that we expect it is
a simple gimple value.

Proposed syntax is

  (opcode:v{ condition } )

The char "v" stands for gimple value, if more descriptive, other char is
preferred. "condition" enclosed by { } is an optional c-syntax condition
expression. If present, only when "condition" is met, matcher will check
whether folding result is a gimple value using
gimple_simplified_result_is_gimple_val ().

Since there is no SSA concept in GENERIC, this is only for GIMPLE-match,
not GENERIC-match.

With this syntax, the rule is changed to

#Form 1:

  (simplify
   (plusminus (mult:cs@3 @0 @1) (mult:cs@4 @0 @2))
   (if ((!ANY_INTEGRAL_TYPE_P (type)
         || TYPE_OVERFLOW_WRAPS (type)
         || (INTEGRAL_TYPE_P (type)
             && tree_expr_nonzero_p (@0)
             && expr_not_equal_to (@0, wi::minus_one (TYPE_PRECISION (type))
    (if (!single_use (@3) && !single_use (@4))
     (mult:v (plusminus @1 @2) @0)))
   (mult (plusminus @1 @2) @0)

That seems to match what you can do with '!' now (that's very recent).

It's also what :s does, but a slight bit more "local". When any operand is
marked :s and it has more than a single use, we only allow simplifications
that do not require insertion of extra stmts. So basically the above
pattern doesn't behave any different than if you omit your :v. Only if
you'd place :v on an inner expression would there be a difference.

Correlating the inner expression we'd not want to insert new expressions
for with a specific :s (or multiple ones) would be a more natural
extension of what :s provides. Thus, for the above case (Form 1), you do
not need :v at all and :s works.

Let's consider that multiplication is expensive. We have code like
5*X-3*X, which can be simplified to 2*X. However, if both 5*X and 3*X have
other uses, that would increase the number of multiplications. :s would
not block a simplification to 2*X, which is a single stmt. So the existing
transformation has extra explicit checks for single_use. And those extra
checks block the transformation even for 5*X-4*X -> X, which does not
increase the number of multiplications. Which is where '!' (or :v here)
comes in.

Or we could decide that the extra multiplication is not that bad if it
saves an addition, simplifies the expression, possibly gains more insn
parallelism, etc., in which case we could just drop the existing hard
single_use check...

--
Marc Glisse
Re: A couple GIMPLE questions
On Sat, 5 Sep 2020, Gary Oblock via Gcc wrote:

> First off, one of the questions is just me being curious, but the second
> is quite serious. Note, this is GIMPLE coming into my optimization and
> not something I've modified.
>
> Here's the C code:
>
>   type_t *
>   do_comp( type_t *data, size_t len)
>   {
>     type_t *res;
>     type_t *x = min_of_x( data, len);
>     type_t *y = max_of_y( data, len);
>
>     res = y;
>     if ( x < y ) res = 0;
>
>     return res;
>   }
>
> And here's the resulting GIMPLE:
>
>   ;; Function do_comp.constprop (do_comp.constprop.0, funcdef_no=5,
>   ;; decl_uid=4392, cgraph_uid=3, symbol_order=68) (executed once)
>
>   do_comp.constprop (struct type_t * data)
>   {
>     struct type_t * res;
>     struct type_t * x;
>     struct type_t * y;
>     size_t len;
>
>     [local count: 1073741824]:
>     [local count: 1073741824]:
>     x_2 = min_of_x (data_1(D), 1);
>     y_3 = max_of_y (data_1(D), 1);
>     if (x_2 < y_3)
>       goto ; [29.00%]
>     else
>       goto ; [71.00%]
>
>     [local count: 311385128]:
>
>     [local count: 1073741824]:
>     # res_4 = PHI
>     return res_4;
>   }
>
> The silly question first. In the "if" stmt, how does GCC get those
> probabilities, which it shows as 29.00% and 71.00%? I believe they should
> both be 50.00%.

See the profile_estimate pass dump. One branch makes the function return
NULL, which makes gcc guess that it may be a bit less likely than the
other. Those are heuristics, which are tuned to help on average, but of
course they are sometimes wrong.

> The serious question is: what is going on with this phi?
>
>   res_4 = PHI
>
> This makes zero sense practicality-wise to me, and how is it supposed to
> be recognized and used? Note, I really do need to transform the "0B" into
> something else for my structure reorganization optimization.

That's not a question? Are you asking why PHIs exist at all? They are the
standard way to represent merging in SSA representations. You can iterate
on the PHIs of a basic block, etc.

> CONFIDENTIALITY NOTICE: This e-mail message, including any attachments,
> is for the sole use of the intended recipient(s) and contains information
> that is confidential and proprietary to Ampere Computing or its
> subsidiaries. It is to be used solely for the purpose of furthering the
> parties' business relationship. Any unauthorized review, copying, or
> distribution of this email (or any attachments thereto) is strictly
> prohibited. If you are not the intended recipient, please contact the
> sender immediately and permanently delete the original and any copies of
> this email and any attachments thereto.

Could you please get rid of this when posting on public mailing lists?

--
Marc Glisse
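As a concrete illustration of iterating over the PHIs of a basic block, a
rough, hypothetical sketch using the GCC-internal iterators (the function
name dump_phis is made up and the include list is abbreviated):

    #include "gcc-plugin.h"
    #include "tree.h"
    #include "basic-block.h"
    #include "gimple.h"
    #include "gimple-iterator.h"
    #include "tree-pretty-print.h"

    /* Walk every PHI node in basic block BB and dump each argument
       together with the predecessor edge it comes from.  Sketch only.  */
    static void
    dump_phis (basic_block bb)
    {
      for (gphi_iterator gsi = gsi_start_phis (bb); !gsi_end_p (gsi);
           gsi_next (&gsi))
        {
          gphi *phi = gsi.phi ();
          for (unsigned i = 0; i < gimple_phi_num_args (phi); i++)
            {
              tree arg = gimple_phi_arg_def (phi, i);   // value, e.g. y_3 or 0B
              edge e = gimple_phi_arg_edge (phi, i);    // incoming edge
              fprintf (stderr, "arg %u from bb %d: ", i, e->src->index);
              debug_generic_expr (arg);
            }
        }
    }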
Re: Installing a generated header file
On Thu, 12 Nov 2020, Bill Schmidt via Gcc wrote:

> Hi!
>
> I'm working on a project where it's desirable to generate a
> target-specific header file while building GCC, and install it with the
> rest of the target-specific headers (i.e., in lib/gcc//11.0.0/include).
>
> Today it appears that only those headers listed in "extra_headers" in
> config.gcc will be placed there, and those are assumed to be found in
> gcc/config/. In my case, the header file will end up in my build
> directory instead.
>
> Questions:
> * Has anyone tried something like this before? I didn't find anything.
> * If so, can you please point me to an example?
> * Otherwise, I'd be interested in advice about providing new
>   infrastructure to support this. I'm a relative noob with respect to
>   the configury code, and I'm sure my initial instincts will be wrong. :)

Does the i386 mm_malloc.h file match your scenario?

--
Marc Glisse
Re: Reassociation and trapping operations
On Wed, 25 Nov 2020, Ilya Leoshkevich via Gcc wrote:

> I have a C floating point comparison (a <= b && a >= b), which
> test_for_singularity turns into (a <= b && a == b) and vectorizer turns
> into ((a <= b) & (a == b)). So far so good.
>
> eliminate_redundant_comparison, however, turns it into just (a == b). I
> don't think this is correct, because (a <= b) traps and (a == b) doesn't.

Hello,

let me just mention the old

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53805
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53806

There has been some debate about the exact meaning of -ftrapping-math, but
don't let that stop you.

--
Marc Glisse
Re: The conditions when convert from double to float is permitted?
On Thu, 10 Dec 2020, Xionghu Luo via Gcc wrote:

> I have a maybe silly question about whether there is any *standard* or
> *options* (like -ffast-math) for GCC that allow double-to-float demotion
> optimization? For example,
>
> 1) from PR22326:
>
>   #include <math.h>
>
>   float foo(float f, float x, float y)
>   {
>     return (fabs(f)*x+y);
>   }
>
> The fabs will return a double result, but it could actually be demoted to
> float, since the function returns float in the end.

With fp-contract, this is (float)fma((double)f,(double)x,(double)y). This
could almost be transformed into fmaf(f,x,y), except that the double
rounding may not be strictly equivalent. Still, that seems like it would
be no problem with -funsafe-math-optimizations, just like turning
(float)((double)x*(double)y) into x*y, as long as it is a single operation
with casts on all inputs and output.

Whether there are cases that can be optimized without
-funsafe-math-optimizations is harder to tell.

--
Marc Glisse
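Spelled out as code, a hypothetical illustration of the two forms being
compared (the names foo_contracted and foo_demoted are made up; this is
not actual GCC output):

    #include <math.h>

    // What fp-contract conceptually produces for fabs(f)*x + y in a
    // float-returning function: one fused operation in double, rounded
    // once to float at the end.
    float foo_contracted(float f, float x, float y)
    {
        return (float) fma((double) fabsf(f), (double) x, (double) y);
    }

    // The candidate demotion discussed above: do everything in float.
    // Not strictly equivalent (different rounding), which is why it would
    // hide behind -funsafe-math-optimizations.
    float foo_demoted(float f, float x, float y)
    {
        return fmaf(fabsf(f), x, y);
    }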
Re: Integer division on x86 -m32
On Thu, 10 Dec 2020, Lucas de Almeida via Gcc wrote:

> when performing (int64_t) foo / (int32_t) bar in gcc under x86, a call to
> __divdi3 is always output, even though it seems the use of the idiv
> instruction could be faster.

IIRC, idiv requires that the quotient fit in 32 bits, while your C code
doesn't. (1LL << 60) / 3 would cause an error with idiv.

It would be possible to use idiv in some cases, if the compiler can prove
that variables are in the right range, but that's not so easy. You can use
inline asm to force the use of idiv if you know it is safe for your case,
the most common being modular arithmetic: if you know that uint32_t a, b,
c, d are smaller than m (and m != 0), you can compute a*b+c+d in uint64_t,
then use div to compute that modulo m.

--
Marc Glisse
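A sketch of that last idiom, as a hypothetical helper (32-bit x86 AT&T
inline asm; the caller must guarantee a, b, c, d < m and m != 0, otherwise
the div instruction faults):

    #include <stdint.h>

    // Compute (a*b + c + d) mod m with a single 64-by-32-bit div.
    // Because a, b, c, d < m, the 64-bit sum is below m * 2^32, so the
    // quotient fits in 32 bits and div does not fault.
    static inline uint32_t mulsum_mod(uint32_t a, uint32_t b, uint32_t c,
                                      uint32_t d, uint32_t m)
    {
        uint64_t t = (uint64_t) a * b + c + d;
        uint32_t lo = (uint32_t) t;
        uint32_t hi = (uint32_t) (t >> 32);
        uint32_t q, r;
        __asm__("divl %4"
                : "=a"(q), "=d"(r)        // quotient in eax, remainder in edx
                : "a"(lo), "d"(hi), "rm"(m)
                : "cc");
        return r;
    }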
Re: What is the type of vector signed + vector unsigned?
On Tue, 29 Dec 2020, Richard Sandiford via Gcc wrote: Any thoughts on what f should return in the following testcase, given the usual GNU behaviour of treating signed >> as arithmetic shift right? typedef int vs4 __attribute__((vector_size(16))); typedef unsigned int vu4 __attribute__((vector_size(16))); int f (void) { vs4 x = { -1, -1, -1, -1 }; vu4 y = { 0, 0, 0, 0 }; return ((x + y) >> 1)[0]; } The C frontend takes the type of x+y from the first operand, so x+y is signed and f returns -1. Symmetry is an important property of addition in C/C++. The C++ frontend applies similar rules to x+y as it would to scalars, with unsigned T having a higher rank than signed T, so x+y is unsigned and f returns 0x7fff. That looks like the most natural choice. FWIW, Clang treats x+y as signed, so f returns -1 for both C and C++. I think clang follows gcc and uses the type of the first operand. -- Marc Glisse
Re: bug in DSE?
On Fri, 12 Feb 2021, Andrew MacLeod via Gcc wrote: I dont't want to immediately open a PR, so I'll just ask about testsuite/gcc.dg/pr83609.c. the compilation string is -O2 -fno-tree-forwprop -fno-tree-ccp -fno-tree-fre -fno-tree-pre -fno-code-hoisting Which passes as is. if I however add -fno-tree-vrp as well, then it looks like dead store maybe does something wong... with EVRP running, we translate function foo() from complex float foo () { complex float c; complex float * c.0_1; complex float _4; : c.0_1 = &c; MEM[(long long unsigned int *)c.0_1] = 1311768467463790320; _4 = c; Isn't that a clear violation of strict aliasing? -- Marc Glisse
Re: Possible issue with ARC gcc 4.8
On Mon, 6 Jul 2015, Vineet Gupta wrote: It is the C language standard that says that shifts like this invoke undefined behavior. Right, but the compiler is a program nevertheless and it knows what to do when it sees 1 << 62 It's not like there is an uninitialized variable or something which will provide unexpected behaviour. More importantly, the question is can ports define a specific behaviour for such cases and whether that would be sufficient to guarantee the semantics. The point being ARC ISA provides a neat feature where core only considers lower 5 bits of bitpos operands. Thus we can make such behaviour not only deterministic in the context of ARC, but also optimal, eliding the need for doing specific masking/clamping to 5 bits. IMO, writing a << (b & 31) instead of a << b has only advantages. It documents the behavior you are expecting. It makes the code standard-conformant and portable. And the back-ends can provide patterns for exactly this so they generate a single insn (the same as for a << b). When I see x << 1024, 0 is the only value that makes sense to me, and I'd much rather get undefined behavior (detected by sanitizers) than silently get 'x' back. -- Marc Glisse
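The recommended style, as a concrete example: on targets whose shift instructions already ignore the upper bits of the count (x86, ARC, ...), the back-end pattern still collapses this to a single shift instruction.

unsigned lshift_mod32(unsigned a, unsigned b)
{
  return a << (b & 31);   /* documents and enforces the mod-32 semantics */
}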
Re: [RFH] Move some flag_unsafe_math_optimizations using simplify and match
On Fri, 7 Aug 2015, Hurugalawadi, Naveen wrote: Please find attached the patch "simplify-1.patch" that moves some "flag_unsafe_math_optimizations" from fold-const.c to simplify and match. Some random comments (not a review). First, patches go to gcc-patc...@gcc.gnu.org. /* fold_builtin_logarithm */ (if (flag_unsafe_math_optimizations) Please indent everything below by one space. + +/* Simplify sqrt(x) * sqrt(x) -> x. */ +(simplify + (mult:c (SQRT @0) (SQRT @0)) (mult (SQRT@1 @0) @1) + (if (!HONOR_SNANS (element_mode (type))) You don't need element_mode here, HONOR_SNANS (type) should do the right thing. + @0)) + +/* Simplify root(x) * root(y) -> root(x*y). */ +/* FIXME : cbrt ICE's with AArch64. */ +(for root (SQRT CBRT) Indent below. +(simplify + (mult:c (root @0) (root @1)) No need to commute, it yields the same pattern. On the other hand, you may want root:s since if the roots are going to be computed anyway, a multiplication is cheaper than computing yet another root (I didn't check what the existing code does). (this applies to several other patterns) + (root (mult @0 @1 + +/* Simplify expN(x) * expN(y) -> expN(x+y). */ +(for exps (EXP EXP2) +/* FIXME : exp2 ICE's with AArch64. */ +(simplify + (mult:c (exps @0) (exps @1)) + (exps (plus @0 @1 I am wondering if we should handle mixed operations (say expf(x)*exp2(y)), for this pattern and others, but that's not a prerequisite. + +/* Simplify pow(x,y) * pow(x,z) -> pow(x,y+z). */ +(simplify + (mult:c (POW @0 @1) (POW @0 @2)) + (POW @0 (plus @1 @2))) + +/* Simplify pow(x,y) * pow(z,y) -> pow(x*z,y). */ +(simplify + (mult:c (POW @0 @1) (POW @2 @1)) + (POW (mult @0 @2) @1)) + +/* Simplify tan(x) * cos(x) -> sin(x). */ +(simplify + (mult:c (TAN @0) (COS @0)) + (SIN @0)) Since this will only trigger for the same version of cos and tan (say cosl with tanl or cosf with tanf), I am wondering if we get smaller code with a linear 'for' or with a quadratic 'for' which shares the same tail (I assume the above is quadratic, I did not check). This may depend on Richard's latest patches. + +/* Simplify x * pow(x,c) -> pow(x,c+1). */ +(simplify + (mult:c @0 (POW @0 @1)) + (if (TREE_CODE (@1) == REAL_CST + && !TREE_OVERFLOW (@1)) + (POW @0 (plus @1 { build_one_cst (type); } + +/* Simplify sin(x) / cos(x) -> tan(x). */ +(simplify + (rdiv (SIN @0) (COS @0)) + (TAN @0)) + +/* Simplify cos(x) / sin(x) -> 1 / tan(x). */ +(simplify + (rdiv (COS @0) (SIN @0)) + (rdiv { build_one_cst (type); } (TAN @0))) + +/* Simplify sin(x) / tan(x) -> cos(x). */ +(simplify + (rdiv (SIN @0) (TAN @0)) + (if (! HONOR_NANS (@0) + && ! HONOR_INFINITIES (element_mode (@0))) + (cos @0))) + +/* Simplify tan(x) / sin(x) -> 1.0 / cos(x). */ +(simplify + (rdiv (TAN @0) (SIN @0)) + (if (! HONOR_NANS (@0) + && ! HONOR_INFINITIES (element_mode (@0))) + (rdiv { build_one_cst (type); } (COS @0 + +/* Simplify pow(x,c) / x -> pow(x,c-1). */ +(simplify + (rdiv (POW @0 @1) @0) + (if (TREE_CODE (@1) == REAL_CST + && !TREE_OVERFLOW (@1)) + (POW @0 (minus @1 { build_one_cst (type); } + +/* Simplify a/root(b/c) into a*root(c/b). */ +/* FIXME : cbrt ICE's with AArch64. */ +(for root (SQRT CBRT) +(simplify + (rdiv @0 (root (rdiv @1 @2))) + (mult @0 (root (rdiv @2 @1) + +/* Simplify x / expN(y) into x*expN(-y). */ +/* FIXME : exp2 ICE's with AArch64. */ +(for exps (EXP EXP2) +(simplify + (rdiv @0 (exps @1)) + (mult @0 (exps (negate @1) + +/* Simplify x / pow (y,z) -> x * pow(y,-z). */ +(simplify + (rdiv @0 (POW @1 @2)) + (mult @0 (POW @1 (negate @2 + /* Special case, optimize logN(expN(x)) = x. 
*/ (for logs (LOG LOG2 LOG10) exps (EXP EXP2 EXP10) -- Marc Glisse
Re: Replacing malloc with alloca.
On Sun, 13 Sep 2015, Ajit Kumar Agarwal wrote: The replacement of malloc with alloca can be done on the following analysis. If the lifetime of an object does not stretch beyond the immediate scope. In such cases the malloc can be replaced with alloca. This increases the performance to a great extent. Inlining helps to a great extent the scope of lifetime of an object doesn't stretch the immediate scope of an object. And the scope of replacing malloc with alloca can be identified. I am wondering what phases of our optimization pipeline the malloc is replaced with alloca and what analysis is done to transform The malloc with alloca. This greatly increases the performance of the benchmarks? Is the analysis done through Escape Analysis? If yes, then what data structure is used for the abstract execution interpretation? Did you try it? I don't think gcc ever replaces malloc with alloca. The only optimization we do with malloc/free is removing it when it is obviously unused. There are several PRs open about possible optimizations (19831 for instance). I posted a WIP patch a couple years ago to replace some malloc+free with local arrays (fixed length) but never had time to finish it. https://gcc.gnu.org/ml/gcc-patches/2013-11/msg03108.html -- Marc Glisse
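For concreteness, a sketch of the kind of rewrite that WIP patch aimed at (hypothetical function names): it requires the allocation to be small, of a size known at compile time, freed on every path, and not stashed anywhere that outlives the call.

#include <stdlib.h>

void use(int *p);              /* hypothetical consumer */

void before(void)
{
  int *p = malloc(4 * sizeof *p);
  if (!p) return;
  use(p);
  free(p);
}

void after(void)               /* what the transformation would produce */
{
  int p[4];
  use(p);
}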
Re: Multiprecision Arithmetic Builtins
On Mon, 21 Sep 2015, Florian Weimer wrote: On 09/21/2015 08:09 AM, Oleg Endo wrote: Hi all, I was thinking of adding some SH specific builtin functions for the addc, subc and negc instructions. Are there any plans to add clang's target independent multiprecision arithmetic builtins (http://clang.llvm.org/docs/LanguageExtensions.html) to GCC? Do you mean these? <https://gcc.gnu.org/onlinedocs/gcc/Integer-Overflow-Builtins.html> Is there something else that is missing? http://clang.llvm.org/docs/LanguageExtensions.html#multiprecision-arithmetic-builtins Those that take a carryin argument. -- Marc Glisse
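A sketch of what "taking a carryin argument" means in practice: with the GCC overflow builtins linked above, the carry-in has to be emulated with a second add and an OR of the two carry flags (hypothetical helper).

#include <stdbool.h>

unsigned long addc(unsigned long x, unsigned long y,
                   unsigned long carry_in, unsigned long *carry_out)
{
  unsigned long s1, s2;
  bool c1 = __builtin_add_overflow(x, y, &s1);
  bool c2 = __builtin_add_overflow(s1, carry_in, &s2);
  *carry_out = c1 | c2;
  return s2;
}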
Re: avoiding recursive calls of calloc due to optimization
On Mon, 21 Sep 2015, Daniel Gutson wrote: This is derived from https://gcc.gnu.org/ml/gcc-help/2015-03/msg00091.html Currently, gcc provides an optimization that transforms a call to malloc and a call to memset into a call to calloc. This is fine except when it takes place within the calloc() function implementation itself, causing a recursive call. Two alternatives have been proposed: -fno-malloc-builtin and disable optimizations in calloc(). I think the former is suboptimal since it affects all the code just because of the implementation of one function (calloc()), whereas the latter is suboptimal too since it disables the optimizations in the whole function (calloc too). I think of two alternatives: either make -fno-calloc-builtin to disable the optimization, or make the optimization aware of the function context where it is operating and prevent it to do the transformation if the function is calloc(). Please help me to find the best alternative so we can implent it. You may want to read this PR for more context https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56888#c27 -- Marc Glisse
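A minimal illustration of the recursion (overflow check omitted on purpose): compiled with -O2 and the malloc/memset builtins enabled, GCC may fold the body back into a call to calloc.

#include <stdlib.h>
#include <string.h>

void *calloc(size_t nmemb, size_t size)   /* the libc's own definition */
{
  size_t bytes = nmemb * size;
  void *p = malloc(bytes);
  if (p)
    memset(p, 0, bytes);                  /* folded into calloc() -> recursion */
  return p;
}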
Re: complex support when using -std=c++11
On Thu, 12 Nov 2015, D Haley wrote: I am currently trying to understand an issue to do with complex number support in gcc. Consider the following code: #include int main() { float _Complex a = _Complex_I; } Attempting to compile this with these commands is fine: $ g++ tmp.cpp -std=gnu++11 $ g++ tmp.cpp Clang is also fine: $ clang tmp.cpp -std=c++11 Not here, I am getting the same error with clang (or "use of undeclared identifier '_Complex_I'" with libc++). This probably depends more on your libc. Attempting to compile with c++11 is not: $ g++ tmp.cpp -std=c++11 In file included from /usr/include/c++/5/complex.h:36:0, from tmp.cpp:2: tmp.cpp: In function ‘int main()’: tmp.cpp:5:29: error: unable to find numeric literal operator ‘operator""iF’ float _Complex a = _Complex_I; ^ tmp.cpp:5:29: note: use -std=gnu++11 or -fext-numeric-literals to enable more built-in suffixes I'm using debian testing's gcc: $ gcc --version gcc (Debian 5.2.1-17) 5.2.1 20150911 ... I discussed this on #gcc, and it was suggested (or I misunderstood) that this is intentional, and the library should not support c-type C++ primitives - however I can find no deprecation notice for this, nor does it appear that the c++11 standard (as far as I can see from a quick skim) has changed the behaviour in this regard. Is this intended behaviour, or is this a bug? This behaviour was noticed when troubleshooting compilation behaviours in mathgl. https://groups.google.com/forum/?_escaped_fragment_=topic/mathgl/cl4uYygPmOU#!topic/mathgl/cl4uYygPmOU C++11, for some unknown reason, decided to hijack the C header complex.h and make it equivalent to the C++ header complex. The fact that you are still getting _Complex_I defined is already a gcc extension, as is providing _Complex in C++. The C++ standard introduced User Defined Literals, which prevents the compiler from recognizing extra suffixes like iF in standard mode (why are so many people using c++11 and not gnu++11?). Our support for complex.h in C++11 in gcc is kind of best-effort. In this case, I can think of a couple ways we could improve this * _Complex_I is defined as (__extension__ 1.0iF). Maybe __extension__ could imply -fext-numeric-literals? * glibc could define _Complex_I some other way, or libstdc++ could redefine it to some other safer form (for some reason __builtin_complex is currently C-only). -- Marc Glisse
Re: GCC 5.4 Status report (2015-12-04)
On Fri, 4 Dec 2015, NightStrike wrote: Will there be another 4.9 release, too? I'm really hoping that branch can stay open a bit, since I can't upgrade to the new std::string implementation yet. Uh? The new ABI in libstdc++ is supposed to be optional, you can still use the old std::string in gcc-5, can't you? -- Marc Glisse
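In GCC 5 the choice is made per translation unit with the _GLIBCXX_USE_CXX11_ABI macro, typically just -D_GLIBCXX_USE_CXX11_ABI=0 on the command line; a minimal sketch:

#include <string>

// Compiled with -D_GLIBCXX_USE_CXX11_ABI=0 this uses the old
// reference-counted std::string and can be linked with code built by
// GCC 4.x; without the macro it uses the new (SSO) ABI.
std::string greet() { return "hello"; }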
RE: GCC Front-End Questions
On Tue, 8 Dec 2015, Jodi A. Miller wrote: One algebraic simplification we are seeing is particularly interesting. Given the following code snippet intended to check for buffer overflow, which is actually undefined behavior in C++, we expected to maybe see the if check optimized away entirely. char buffer[100]; int length; //value received through argument or command line . . If (buffer + length < buffer) { cout << "Overflow" << endl; } Instead, our assembly code showed that the conditional was changed to length < 0, which is not what was intended at all. Again, this showed up in the first IR file generated with g++ so we are thinking it happened in the compiler front-end, which is surprising. Any thoughts on this? In addition, when the above conditional expression is not used as part of an if check (e.g., assigned to a Boolean), it is not simplified. Those optimizations during parsing exist mostly for historical reasons, and we are slowly moving away from them. You can look for any function call including "fold" in its name in the front-end. They work on expressions and mostly consist of matching patterns (described in fold-const.c and match.pd), like p + n < p in this case. -- Marc Glisse
Re: Strange C++ function pointer test
On Thu, 31 Dec 2015, Dominik Vogt wrote: This snippet ist from the Plumhall 2014 xvs test suite: #if CXX03 || CXX11 || CXX14 static float (*p1_)(float) = abs; ... checkthat(__LINE__, p1_ != 0); #endif (With the testsuite specific macros doing the obvious). abs() is declared as: int abs(int j) Am I missing some odd C++ feature or is that part of the test just plain wrong? I don't know where to look in the C++ standard; is this supposed to compile (with or without a warning?) or generate an error or is it just undefined? error: invalid conversion from ‘int (*)(int) throw ()’ to ‘float (*)(float)’ [-fpermissive] (Of course even with -fpermissive this won't work because (at least on my platform) ints are passed in different registers than floats.) There are other overloads of 'abs' declared in math.h / cmath (only in namespace std in the second case, and there are bugs (or standard issues) about having them in the global namespace for the first one). -- Marc Glisse
Re: Strange C++ function pointer test
On Thu, 31 Dec 2015, Jonathan Wakely wrote: There are other overloads of 'abs' declared in math.h / cmath (only in namespace std in the second case, and there are bugs (or standard issues) about having them in the global namespace for the first one). That's not quite accurate, C++11 was altered slightly to reflect reality. is required to declare std::abs and it's unspecified whether it also declares it as ::abs. is required to declare ::abs and it's unspecified whether it also declares it as std::abs. $ cat a.cc #include int main(){ abs(3.5); } $ g++-snapshot a.cc -c -Wall -W a.cc: In function 'int main()': a.cc:3:10: error: 'abs' was not declared in this scope abs(3.5); ^ That's what I called "bug" in my message (there are a few bugzilla PRs for this). It would probably work on Solaris. And I seem to remember there are at least 2 open LWG issues on the topic, one saying that the C++11 change didn't go far enough to match reality, since it still documents C headers differently from the C standard, and one saying that all overloads of abs should be declared as soon as one is (yes, they contradict each other). -- Marc Glisse
Re: Strange C++ function pointer test
On Thu, 31 Dec 2015, Dominik Vogt wrote: The minimal failing program is -- abs.C -- #include static float (*p1_)(float) = abs; -- abs.C -- This is allowed to fail. If you include math.h (in addition or instead of stdlib.h), it has to work (gcc bug if it doesn't). See also http://www.open-std.org/jtc1/sc22/wg21/docs/lwg-active.html#2294 -- Marc Glisse
Re: getting bugzilla access for my account
On Sat, 2 Jan 2016, Mike Frysinger wrote: seeing as how i have commit access to the gcc tree, could i have my bugzilla privs extended as well ? atm i only have normal ones which means i only get to edit my own bugs ... can't dupe/update other ones people have filed. couldn't seem to find docs for how to request this, so spamming this list. my account on gcc.gnu.org/bugzilla is "vap...@gentoo.org". Permissions are automatic for @gcc addresses, you should create a new account with that one (you can make it follow the old account, etc). -- Marc Glisse
Re: RFC: Update Intel386, x86-64 and IA MCU psABIs for passing/returning empty struct
On Sat, 20 Feb 2016, H.J. Lu wrote: On Fri, Feb 19, 2016 at 1:07 PM, Richard Smith wrote: On Fri, Feb 19, 2016 at 5:35 AM, Michael Matz wrote: Hi, On Thu, 18 Feb 2016, Richard Smith wrote: An empty type is a type where it and all of its subobjects (recursively) are of class, structure, union, or array type. No memory slot nor register should be used to pass or return an object of empty type. The trivially copyable is gone again. Why is it not necessary? The C++ ABI doesn't defer to the C psABI for types that aren't trivially-copyable. See http://mentorembedded.github.io/cxx-abi/abi.html#normal-call Hmm, yes, but we don't want to define something for only C and C++, but language independend (so far as possible). And given only the above language I think this type: struct S { S() {something();} }; would be an empty type, and that's not what we want. Yes it is. Did you mean to give S a copy constructor, copy assignment operator, or destructor instead? "Trivially copyable" is a reasonably common abstraction (if in doubt we could even define it in the ABI), and captures the idea that we need well (namely that a bit-copy is enough). In this case: struct dummy0 { }; struct dummy { dummy0 d[20]; dummy0 * foo (int i); }; dummy0 * dummy::foo (int i) { return &d[i]; } dummy0 * bar (dummy d, int i) { return d.foo (i); } dummy shouldn't be passed as empty type. Why not? We need to have a clear definition for what kinds of member functions are allowed in an empty type. -- Marc Glisse
Re: Subtyping support in GCC?
On Wed, 23 Mar 2016, Jason Chagas wrote: The the ARM compiler (armcc) provides a subtyping ($Sub/$Super) mechanism useful as a patching technique (see links below for details). Can someone tell me if GCC has similar support? If so, where can I learn more about it? FYI, before posting this question here, I researched the web extensivelly on this topic. There seems to be some GNU support for subtyping in C++. But I had no luck finding any information specifically for 'C'. Thanks, Jason How to use $Super$$ and $Sub$$ for patching data?: http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.faqs/ka15416.html Using $Super$$ and $Sub$$ to patch symbol definitions: http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0474c/Chdefdce.html (the best list would have been gcc-h...@gcc.gnu.org) GNU ld has an option --wrap=symbol. Does that roughly match your need? -- Marc Glisse
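A small example of the --wrap mechanism, roughly matching the patching use case of $Sub$$/$Super$$ (link with -Wl,--wrap=malloc; any symbol name works the same way):

#include <stddef.h>

void *__real_malloc(size_t n);     /* resolves to the original malloc */

void *__wrap_malloc(size_t n)      /* every call to malloc lands here */
{
  /* patch / instrument here */
  return __real_malloc(n);
}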
Re: Constexpr in intrinsics?
On Sun, 27 Mar 2016, Allan Sandfeld Jensen wrote: Would it be possible to add constexpr to the intrinsics headers? For instance _mm_set_XX and _mm_setzero intrinsics. Already suggested here: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65197 A patch would be welcome (I started doing it at some point, I don't remember if it was functional, the patch is attached). Ideally it could also be added all intrinsics that can be evaluated at compile time, but it is harder to tell which those are. Does gcc have a C extension we can use to set constexpr? What for? -- Marc GlisseIndex: gcc/config/i386/avx2intrin.h === --- gcc/config/i386/avx2intrin.h(revision 223886) +++ gcc/config/i386/avx2intrin.h(working copy) @@ -93,41 +93,45 @@ _mm256_packus_epi32 (__m256i __A, __m256 return (__m256i)__builtin_ia32_packusdw256 ((__v8si)__A, (__v8si)__B); } extern __inline __m256i __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) _mm256_packus_epi16 (__m256i __A, __m256i __B) { return (__m256i)__builtin_ia32_packuswb256 ((__v16hi)__A, (__v16hi)__B); } +__GCC_X86_CONSTEXPR11 extern __inline __m256i __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) _mm256_add_epi8 (__m256i __A, __m256i __B) { return (__m256i) ((__v32qu)__A + (__v32qu)__B); } +__GCC_X86_CONSTEXPR11 extern __inline __m256i __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) _mm256_add_epi16 (__m256i __A, __m256i __B) { return (__m256i) ((__v16hu)__A + (__v16hu)__B); } +__GCC_X86_CONSTEXPR11 extern __inline __m256i __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) _mm256_add_epi32 (__m256i __A, __m256i __B) { return (__m256i) ((__v8su)__A + (__v8su)__B); } +__GCC_X86_CONSTEXPR11 extern __inline __m256i __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) _mm256_add_epi64 (__m256i __A, __m256i __B) { return (__m256i) ((__v4du)__A + (__v4du)__B); } extern __inline __m256i __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) _mm256_adds_epi8 (__m256i __A, __m256i __B) @@ -167,20 +171,21 @@ _mm256_alignr_epi8 (__m256i __A, __m256i } #else /* In that case (__N*8) will be in vreg, and insn will not be matched. 
*/ /* Use define instead */ #define _mm256_alignr_epi8(A, B, N) \ ((__m256i) __builtin_ia32_palignr256 ((__v4di)(__m256i)(A), \ (__v4di)(__m256i)(B), \ (int)(N) * 8)) #endif +__GCC_X86_CONSTEXPR11 extern __inline __m256i __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) _mm256_and_si256 (__m256i __A, __m256i __B) { return (__m256i) ((__v4du)__A & (__v4du)__B); } extern __inline __m256i __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) _mm256_andnot_si256 (__m256i __A, __m256i __B) @@ -219,69 +224,77 @@ _mm256_blend_epi16 (__m256i __X, __m256i return (__m256i) __builtin_ia32_pblendw256 ((__v16hi)__X, (__v16hi)__Y, __M); } #else #define _mm256_blend_epi16(X, Y, M)\ ((__m256i) __builtin_ia32_pblendw256 ((__v16hi)(__m256i)(X), \ (__v16hi)(__m256i)(Y), (int)(M))) #endif +__GCC_X86_CONSTEXPR11 extern __inline __m256i __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) _mm256_cmpeq_epi8 (__m256i __A, __m256i __B) { return (__m256i) ((__v32qi)__A == (__v32qi)__B); } +__GCC_X86_CONSTEXPR11 extern __inline __m256i __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) _mm256_cmpeq_epi16 (__m256i __A, __m256i __B) { return (__m256i) ((__v16hi)__A == (__v16hi)__B); } +__GCC_X86_CONSTEXPR11 extern __inline __m256i __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) _mm256_cmpeq_epi32 (__m256i __A, __m256i __B) { return (__m256i) ((__v8si)__A == (__v8si)__B); } +__GCC_X86_CONSTEXPR11 extern __inline __m256i __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) _mm256_cmpeq_epi64 (__m256i __A, __m256i __B) { return (__m256i) ((__v4di)__A == (__v4di)__B); } +__GCC_X86_CONSTEXPR11 extern __inline __m256i __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) _mm256_cmpgt_epi8 (__m256i __A, __m256i __B) { return (__m256i) ((__v32qi)__A > (__v32qi)__B); } +__GCC_X86_CONSTEXPR11 extern __inline __m256i __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) _mm256_cmpgt_epi16 (__m256i __A, __m256i __B) { return (__m256i) ((__v16hi)__A > (__v16hi)__B); } +__GCC_X86_CONSTEXPR11 extern __inline __m256i __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) _mm256_cmpgt_epi32 (__m256i __A, __m256i __B) {
Re: Constexpr in intrinsics?
On Mon, 28 Mar 2016, Allan Sandfeld Jensen wrote: On Sunday 27 March 2016, Marc Glisse wrote: On Sun, 27 Mar 2016, Allan Sandfeld Jensen wrote: Would it be possible to add constexpr to the intrinsics headers? For instance _mm_set_XX and _mm_setzero intrinsics. Already suggested here: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65197 A patch would be welcome (I started doing it at some point, I don't remember if it was functional, the patch is attached). That looks very similar to the patch I experimented with, and that at least works for using them in C++11 constexpr functions. Ideally it could also be added all intrinsics that can be evaluated at compile time, but it is harder to tell which those are. Does gcc have a C extension we can use to set constexpr? What for? To have similar functionality in C. For instance to explicitly allow those functions to be evaluated at compile time, and values with similar attributes be optimized completely out. Those intrinsics that are implemented without builtins can already be evaluated at compile time. #include __m128d f(){ __m128d a=_mm_set_pd(1,2); __m128d b=_mm_setr_pd(4,3); return _mm_add_pd(a, b); } The generated asm is just movapd .LC0(%rip), %xmm0 ret For the more esoteric intrinsics, what is missing is not in the parser, it is a folder that understands the behavior of each particular intrinsic. And of course avoid using precompiler noise, in shared C/C++ headers like these are. -- Marc Glisse
Re: Updating the GCC 6 release notes
On Tue, 3 May 2016, Damian Rouson wrote: Could someone please tell me how to edit or submit edits for the GCC 6 release notes at https://gcc.gnu.org/gcc-6/changes.html? Specially, the listed Fortran improvements are missing several significant items. I signed the copyright assignment in case hat helps. https://gcc.gnu.org/about.html#cvs You can send a diff to gcc-patc...@gcc.gnu.org to propose a patch (possibly Cc: the fortran mailing-list if your patch is related), same as code changes. -- Marc Glisse
Re: Implicit conversion to a generic vector type
On Thu, 26 May 2016, martin krastev wrote: Hello, I've been scratching my head over an implicit conversion issue, depicted in the following code: typedef __attribute__ ((vector_size(4 * sizeof(int int generic_int32x4; struct Foo { Foo() { } Foo(const generic_int32x4& src) { } operator generic_int32x4() const { return (generic_int32x4){ 42 }; } }; struct Bar { Bar() { } Bar(const int src) { } operator int() const { return 42; } }; int main(int, char**) { const Bar b = Bar() + Bar(); const generic_int32x4 v = (generic_int32x4){ 42 } + (generic_int32x4){ 42 }; const Foo e = generic_int32x4(Foo()) + generic_int32x4(Foo()); const Foo f = Foo() + Foo(); const Foo g = (generic_int32x4){ 42 } + Foo(); const Foo h = Foo() + (generic_int32x4){ 42 }; return 0; } In the above, the initialization expression for local 'b' compiles as expected, and so do the expressions for locals 'v' and 'e'. The initializations of locals 'f', 'g' and 'h', though, fail to compile (under g++-6.1.1, likewise under 5.x and 4.x) with: $ g++-6 xxx.cpp xxx.cpp: In function ‘int main(int, char**)’: xxx.cpp:28:22: error: no match for ‘operator+’ (operand types are ‘Foo’ and ‘Foo’) const Foo f = Foo() + Foo(); ~~^~~ xxx.cpp:29:40: error: no match for ‘operator+’ (operand types are ‘generic_int32x4 {aka __vector(4) int}’ and ‘Foo’) const Foo g = (generic_int32x4){ 42 } + Foo(); ~~~^~~ xxx.cpp:30:22: error: no match for ‘operator+’ (operand types are ‘Foo’ and ‘generic_int32x4 {aka __vector(4) int}’) const Foo h = Foo() + (generic_int32x4){ 42 }; ~~^ Apparently there is some implicit conversion rule that stops g++ from doing the expected implicit conversions, but I can't figure out which rule that is. The fact clang handles the code without an issue does not help either. Any help will be appreciated. https://gcc.gnu.org/bugzilla/show_bug.cgi?id=57572 -- Marc Glisse
Re: Implicit conversion to a generic vector type
On Thu, 26 May 2016, martin krastev wrote: Thank you for the reply. So it's a known g++ issue with a candidate patch. Looking at the patch, I was wondering, what precludes the generic vector types form being proper arithmetic types? In some cases vectors act like arithmetic types (operator+, etc), and in others they don't (conversions in general). We have scalarish_type_p for things that are scalars or vectors, we could add arithmeticish_type_p ;-) (I think the name arithmetic comes directly from the standard, so we don't want to change its meaning) -- Marc Glisse
Re: Implicit conversion to a generic vector type
On Fri, 27 May 2016, martin krastev wrote: A new arithmeticish type would take more effort, I understand. Marc, are there plans to incorporate your patch, perhaps in an extended form, in a release any time soon? There is no plan either way. When someone is motivated enough (I am not, currently), they will submit a patch to gcc-patc...@gcc.gnu.org, which will be reviewed. Note that a patch needs to include testcases (see the files in gcc/testsuite/g++.dg for examples). If you are interested, you could give it a try... -- Marc Glisse
Re: An issue with GCC 6.1.0's make install?
On Sat, 4 Jun 2016, Ethin Probst wrote: Yesterday I managed to successfully build GCC and all of the accompanying languages that it supports by default (Ada, C, C++, Fortran, Go, Java, Objective-C, Objective-C++, and Link-time Optimization (LTO)). I did not build JIT support because I have not herd if it is stable or not. Anyways, seeing as I didn't (and still do not) want to wait another 12 hours for that to build, I compressed it into a .tar.bz2 archive, Did you use "make -j 8" (where 8 is roughly how many CPUs you have in your server)? 12 hours seems excessive. copied it over to another server, decompressed it, and here's when the Did you copy it to exactly the same path as on the original server, preserving time stamps, and do both servers have identical systems? problems start. Keep in mind that I did ensure that all files were compressed and extracted. When I go into my build subdirectory build tree, and type "make install -s", it installs gnat, gcc (and g++), gfortran, gccgo, and gcj, but it errors out (and, subsequently, bales out) and says the following: Making install in tools make[3]: *** [install-recursive] Error 1 make[2]: *** [install-recursive] Error 1 make[1]: *** [install-target-libjava] Error 2 make: *** [install] Error 2 And then: $ gcj gcj: error: libgcj.spec: No such file or directory A more common approach would be to run "make install DESTDIR=/some/where", tar that directory, copy this archive to other servers, and untar it in the right location. That's roughly what linux distributions do. I'm considering the test suite, but until it installs, I'm not sure if executing the test suite would be very wise at this point. To get it to say that no input file was specified, I have to manually run the following commands: $ cd x86_64-pc-linux-gnu/libjava $ cp libgcj.spec /usr/bin That seems like a strange location for this file. Has the transportation of the source code caused the build tree to be messed up? I know that it works perfectly fine on my other server. Running make install without the -s command line parameter yields nothing. Have I done something wrong? "nothing" is not very helpful... Surely it gave some error message. -- Marc Glisse
Re: [RFC][Draft patch] Introduce IntegerSanitizer in GCC.
On Mon, 4 Jul 2016, Maxim Ostapenko wrote: Is community interested in such a tool? On the one hand, it is clearly useful since you found bugs thanks to it. On the other hand: 1) I hope we never reach the situation caused by Microsoft's infamous warning C4146 (which is even an error if you enable "secure" mode), where projects writing perfectly legal bignum code keep getting misguided reports by users who see those warnings. 2) This kind of encourages people to keep using unsigned types for non-negative integers, whereas they would be better reserved to bignum and bitfields (sadly, the standards make it hard to avoid unsigned types...). -- Marc Glisse
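An example of point 1: this is well-defined C that bignum libraries rely on, yet it is exactly what an integer sanitizer keyed on unsigned wraparound would report.

#include <stdint.h>

uint32_t add_limb(uint32_t a, uint32_t b, uint32_t *carry)
{
  uint32_t s = a + b;       /* may wrap around: well defined for unsigned */
  *carry = s < a;           /* the wraparound is how the carry is detected */
  return s;
}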
Vector unaligned load/store x86 intrinsics
Hello, I was considering changing the implementation of _mm_loadu_pd in x86's emmintrin.h to avoid a builtin. Here are 3 versions: typedef double __m128d __attribute__ ((__vector_size__ (16), __may_alias__)); typedef double __m128d_u __attribute__ ((__vector_size__ (16), __may_alias__, aligned(1))); __m128d f (double const *__P) { return __builtin_ia32_loadupd (__P); } __m128d g (double const *__P) { return *(__m128d_u*)(__P); } __m128d h (double const *__P) { __m128d __r; __builtin_memcpy (&__r, __P, 16); return __r; } f is what we have currently. f and g generate the same code. h also generates the same code except at -O0 where it is slightly longer. (note that I haven't regtested either version yet) 1) I don't have any strong preference between g and h, is there a reason to pick one over the other? I may have a slight preference for g, which expands to __m128d _3; _3 = MEM[(__m128d_u * {ref-all})__P_2(D)]; while h yields __int128 unsigned _3; _3 = MEM[(char * {ref-all})__P_2(D)]; _4 = VIEW_CONVERT_EXPR(_3); 2) Reading Intel's doc for movupd, it says: "If alignment checking is enabled (CR0.AM = 1, RFLAGS.AC = 1, and CPL = 3), an alignment-check exception (#AC) may or may not be generated (depending on processor implementation) when the operand is not aligned on an 8-byte boundary." Since we generate movupd for memcpy even when the alignment is presumably only 1 byte, I assume that this alignment-check stuff is not supported by gcc? -- Marc Glisse
Re: Vector unaligned load/store x86 intrinsics
On Fri, 26 Aug 2016, Richard Biener wrote: On Thu, Aug 25, 2016 at 9:40 PM, Marc Glisse wrote: Hello, I was considering changing the implementation of _mm_loadu_pd in x86's emmintrin.h to avoid a builtin. Here are 3 versions: typedef double __m128d __attribute__ ((__vector_size__ (16), __may_alias__)); typedef double __m128d_u __attribute__ ((__vector_size__ (16), __may_alias__, aligned(1))); __m128d f (double const *__P) { return __builtin_ia32_loadupd (__P); } __m128d g (double const *__P) { return *(__m128d_u*)(__P); } __m128d h (double const *__P) { __m128d __r; __builtin_memcpy (&__r, __P, 16); return __r; } f is what we have currently. f and g generate the same code. h also generates the same code except at -O0 where it is slightly longer. (note that I haven't regtested either version yet) 1) I don't have any strong preference between g and h, is there a reason to pick one over the other? I may have a slight preference for g, which expands to __m128d _3; _3 = MEM[(__m128d_u * {ref-all})__P_2(D)]; while h yields __int128 unsigned _3; _3 = MEM[(char * {ref-all})__P_2(D)]; _4 = VIEW_CONVERT_EXPR(_3); I prefer 'g' which is just more natural. Ok, thanks. Note that the C language requires that __P be aligned to alignof (double) (not sure what the Intel intrinsic specs say here), and thus it doesn't allow arbitrary misalignment. This means that you could use a slightly better aligned type with aligned(alignof(double)). I had thought about it, but since we already generate movupd with aligned(1), it didn't really seem worth the trouble for this prototype. Or to be conforming the parameter should not be double const * but a double type variant with alignment 1 ... Yeah, those intrinsics have issues: __m128i _mm_loadu_si128 (__m128i const* mem_addr) "mem_addr does not need to be aligned on any particular boundary." that doesn't really make sense. I may try to experiment with your suggestion, see if it breaks anything. Gcc seems happy to ignore those alignment differences when casting function pointers, so it should be fine. 2) Reading Intel's doc for movupd, it says: "If alignment checking is enabled (CR0.AM = 1, RFLAGS.AC = 1, and CPL = 3), an alignment-check exception (#AC) may or may not be generated (depending on processor implementation) when the operand is not aligned on an 8-byte boundary." Since we generate movupd for memcpy even when the alignment is presumably only 1 byte, I assume that this alignment-check stuff is not supported by gcc? Huh, never heard of this. Does this mean that mov_u_XX do alignment-check exceptions? I believe this would break almost all code (glibc memcpy, GCC generated code, etc). Thus it would require kernel support, emulating the unaligned ops to still work (but record them somehow). Elsewhere ( https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=_mm_loadu_pd&expand=3106,3115,3106,3124,3106&techs=SSE2 ) Intel doesn't mention this at all, it just says: "mem_addr does not need to be aligned on any particular boundary." So it might be a provision in the spec that was added just in case, but never implemented... -- Marc Glisse
Re: Is this FE bug or am I missing something?
On Sun, 11 Sep 2016, Igor Shevlyakov wrote: Small sample below fails (at least on 6.1) for multiple targets. The difference between the two functions starts at the very first tree pass... You are missing -fsanitize=undefined (and #include ). Please use the mailing list gcc-h...@gcc.gnu.org next time. -- Marc Glisse
Re: Is this FE bug or am I missing something?
On Mon, 12 Sep 2016, Igor Shevlyakov wrote: Well, my concern is not what happens with overflow (which in second case -fsanitize=undefined will address), but rather consistency of that 2 cases. p[x+1] generates RTL which leads to better generated code at the expense of leading to overflow, while p[1+x] never overflows but leads to worse code. It would be beneficial to make the behaviour consistent between those 2 cases. True. Your example with undefined behavior confused me as to what your point was. For int* f1(int* p, int x) { return &p[x + 1]; } int* f2(int* p, int x) { return &p[1 + x]; } we get in the gimple dump _1 = (sizetype) x; _2 = _1 + 1; vs _1 = x + 1; _2 = (long unsigned int) _1; The second one is a better starting point (it has more information about potential overflow), but the first one has the advantage that all numbers have the same size, which saves an instruction in the end movslq %esi, %rsi leaq4(%rdi,%rsi,4), %rax vs addl$1, %esi movslq %esi, %rsi leaq(%rdi,%rsi,4), %rax We regularly discuss the potential benefits of a pass that would try to uniformize integer sizes... In the mean time, I agree that gimplifying x+1 and 1+x differently makes little sense, you could file a PR about that. -- Marc Glisse
Re: how to check if target supports andnot instruction ?
On Wed, 12 Oct 2016, Prathamesh Kulkarni wrote: I was having a look at PR71636 and added the following pattern to match.pd: x & ((1U << b) - 1) -> x & ~(~0U << b) However the transform is useful only if the target supports "andnot" instruction. rth was selling the transformation as a canonicalization, which is beneficial when there is an andnot instruction, and neutral otherwise, so it could be done always. As pointed out by Marc in PR for -march=core2, lhs generates worse code than rhs, so we shouldn't do the transform if target doesn't support andnot insn. (perhaps we could do the reverse transform for target not supporting andnot?) Rereading my comment in the PR, I pointed out that instead of being neutral, the transformation was very slightly detrimental in one case (one extra mov) because of a RA issue. That doesn't mean we should avoid the transformation, just that we should fix the RA issue (by the way, if you have time to file a separate PR for the RA issue, that would be great, otherwise I'll try to do it at some point...). However it seems andnot isn't a standard pattern name, so am not sure how to check if target supports andnot insn ? -- Marc Glisse
Re: how to check if target supports andnot instruction ?
On Thu, 13 Oct 2016, Prathamesh Kulkarni wrote: On 12 October 2016 at 14:43, Richard Biener wrote: On Wed, 12 Oct 2016, Marc Glisse wrote: On Wed, 12 Oct 2016, Prathamesh Kulkarni wrote: I was having a look at PR71636 and added the following pattern to match.pd: x & ((1U << b) - 1) -> x & ~(~0U << b) However the transform is useful only if the target supports "andnot" instruction. rth was selling the transformation as a canonicalization, which is beneficial when there is an andnot instruction, and neutral otherwise, so it could be done always. Well, its three instructions to three instructions and a more expensive constant(?). ~0U might not be available as immediate for the shift instruction and 1U << b might be available as a bit-set instruction ... (vs. the andnot). True, I hadn't thought of bit-set. So yes, we might decide to canonicalize to andnot (and decide that three binary to two binary and one unary op is "better"). So no excuse to explore the target specific .pd fragment idea ... :/ Hi, I have attached patch that adds the transform. Does that look OK ? Why bit_not of build_zero_cst instead of build_all_ones_cst, as suggested in the PR? If we only do the transformation when (1<bit_and, then we probably want to require that it has a single use (maybe even the shift). I am not sure how to write test-cases for it though. For the test-case: unsigned f(unsigned x, unsigned b) { unsigned t1 = 1U << b; unsigned t2 = t1 - 1; unsigned t3 = x & t2; return t3; } forwprop dump shows: Applying pattern match.pd:523, gimple-match.c:47419 gimple_simplified to _6 = 4294967295 << b_1(D); _8 = ~_6; t3_5 = x_4(D) & _8; I could scan for "_6 = 4294967295 << b_1(D);" however I suppose ~0 would depend on width of int and not always be 4294967295 ? Or should I scan for "_6 = 4294967295 << b_1(D);" and add /* { dg-require-effective int32 } */ to the test-case ? You could check that you have ~, or that you don't have " 1 << ". -- Marc Glisse
Re: GCC 6.2.0 : What does the undocumented -r option ?
On Mon, 7 Nov 2016, Emmanuel Charpentier wrote: The Sage project (http://www.sagemath.org) has recently hit an interesting snag: its developers using Debian testing began to encounter difficulties compiling the flint package (http://groups.google.co.uk/group/flint-devel) with gcc 6.2.0. One of us found (see https://groups.google.com/d/msg/sage-devel/TduebNoZuBE/sEULolL0BQAJ) that this was bound to a conflict between the -pie option (now default) and an undocumented -r option. We would like to know what this -r option is, what it does and why it is undocumented. (the mailing list you are looking for is gcc-h...@gcc.gnu.org) As can be seen in the first message of the conversation you link to "/usr/bin/ld: -r and -pie may not be used together" The option -r is passed to ld, so you have to look for it in ld's manual, where it is clearly documented. (that hardening stuff is such a pain...) -- Marc Glisse

Re: Need some help with a possible bug
(should have been gcc-h...@gcc.gnu.org, please send any follow-ups there) On Wed, 23 Apr 2014, George R Goffe wrote: I'm trying to build the latest gcc Do you really need gcj? If not, please disable java. and am getting a message from the process "collect2: error: ld returned 1 exit status" for this library /usr/lsd/Linux/lib/libgmp.so. Here's the full msg: "/usr/lsd/Linux/lib/libgmp.so: could not read symbols: File in wrong format" You are doing a multilib build (--disable-multilib if you don't want that), so it tries to build both a 64 bit and a 32 bit versions of libjavamath.so, both of which want to link to GMP. So you need both versions of GMP installed as well. I thought the configure script in classpath would detect your missing 32 bit GMP and disable use of GMP in that case, but apparently not... You may want to file a PR in bugzilla about that if there isn't one already. But you'll need to provide more info there: your configure command line, the file config.log in the 32 bit version of classpath, etc. -- Marc Glisse
Re: RTL representation of i386 shrdl instruction is incorrect?
On Thu, 5 Jun 2014, Niranjan Hasabnis wrote: Thanks for your reply. I looked into some of the details of how that particular RTL template is used. It seems to me that the particular RTL template is used only when shifting 64-bit data type on a 32-bit machine. This is the underlying assumption encoded in i386.c file which generates that particular RTL only when instruction mode is DImode. If that is the case, then it won't matter whether one uses arithmetic shift or logical shift to right shift lower 4-bytes of a 8-byte value. In other words, the mapping between RTL template and shrdl is incorrect, but the underlying assumption in i386.c guards the bug. This is still a bug, please file a PR. The use of (match_dup 0) apparently prevents combine from matching the insn (that's just a guess from my notes in PR 55583, I don't have access to my gcc machine right now to check), but that doesn't mean we shouldn't fix things. -- Marc Glisse
Re: What is "fnspec function type attribute"?
On Fri, 6 Jun 2014, FX wrote: In fortran/trans-decl.c, we have a comment above the code building function decls, saying: The SPEC parameter specifies the function argument and return type specification according to the fnspec function type attribute. */ I was away from GCC development for some time, so this is news to me. The syntax is not immediately clear, and neither a Google search nor a grep of the trunk’s numerous .texi files reveals any information. I’m creating new decls, what I am to do with it? You can look at the 2 functions in gimple.c that use gimple_call_fnspec, and refer to tree-core.h for the meaning of EAF_*, etc. A string like "2x." means: '2': the first letter is about the return, here we are returning the second argument 'x': the first argument is ignored '.': not saying anything about the second argument. -- Marc Glisse
Re: Comparison of GCC-4.9 and LLVM-3.4 performance on SPECInt2000 for x86-64 and ARM
On Wed, 25 Jun 2014, Vladimir Makarov wrote: Maybe. But in this case LLVM did a right thing. The variable addressing was through a restrict pointer. Ah, gcc implements (on purpose?) a weak version of restrict, where it only considers that 2 restrict pointers don't alias, whereas all other compilers assume that restrict pointers don't alias other non-derived pointers (see several PRs in bugzilla). I believe Richard recently added code that would make implementing the strong version of restrict easier. Maybe that's what is missing here? -- Marc Glisse
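A small example of the difference (sketch): under the strong interpretation the store through q cannot touch *p because p is restrict-qualified, so the final load can be folded to 1; with the weaker rule *p is reloaded, since only two restrict pointers are disambiguated against each other.

int f(int *restrict p, int *q)
{
  *p = 1;
  *q = 2;
  return *p;   /* strong restrict: constant 1; weak restrict: reload */
}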
Re: combination of read/write and earlyclobber constraint modifier
On Tue, 1 Jul 2014, Jeff Law wrote: On 07/01/14 13:27, Tom de Vries wrote: Vladimir, There are a few patterns which use both the read/write constraint modifier (+) and the earlyclobber constraint modifier (&): ... $ grep -c 'match_operand.*+.*&' gcc/config/*/* | grep -v :0 gcc/config/aarch64/aarch64-simd.md:1 gcc/config/arc/arc.md:1 gcc/config/arm/ldmstm.md:30 gcc/config/rs6000/spe.md:8 ... F.i., this one in gcc/config/aarch64/aarch64-simd.md: ... (define_insn "vec_pack_trunc_" [(set (match_operand: 0 "register_operand" "+&w") (vec_concat: (truncate: (match_operand:VQN 1 "register_operand" "w")) (truncate: (match_operand:VQN 2 "register_operand" "w"] ... The documentation ( https://gcc.gnu.org/onlinedocs/gccint/Modifiers.html#Modifiers ) states: ... '‘&’ does not obviate the need to write ‘=’. ... which seems to state that '&' implies '='. An earlyclobber operand is defined as 'modified before the instruction is finished using the input operands'. AFAIU that would indeed exclude the possibility that the earlyclobber operand is an input/output operand it self, but perhaps I misunderstand. So my question is: is the combination of '&' and '+' supported ? If so, what is the exact semantics ? If not, should we warn or give an error ? I don't think we can define any reasonable semantics for &+. My recommendation would be for this to be considered a hard error. Uh? The doc explicitly says "An input operand can be tied to an earlyclobber operand" and goes on to explain why that is useful. It avoids using the same register for other input when they are identical. -- Marc Glisse
Re: combination of read/write and earlyclobber constraint modifier
On Tue, 1 Jul 2014, Tom de Vries wrote: On 01-07-14 21:58, Marc Glisse wrote: So my question is: is the combination of '&' and '+' supported ? If so, what is the exact semantics ? If not, should we warn or give an error ? I don't think we can define any reasonable semantics for &+. My recommendation would be for this to be considered a hard error. Uh? The doc explicitly says "An input operand can be tied to an earlyclobber operand" and goes on to explain why that is useful. It avoids using the same register for other input when they are identical. Hi Marc, That part of the doc refers to the mulsi3 insn for ARM as example: ... ;; Use `&' and then `0' to prevent the operands 0 and 1 being the same (define_insn "*arm_mulsi3" [(set (match_operand:SI 0 "s_register_operand" "=&r,&r") (mult:SI (match_operand:SI 2 "s_register_operand" "r,r") (match_operand:SI 1 "s_register_operand" "%0,r")))] "TARGET_32BIT && !arm_arch6" "mul%?\\t%0, %2, %1" [(set_attr "type" "mul") (set_attr "predicable" "yes")] ) ... Note that there's no combination of & and + here. I think it could have used (match_dup 0) instead of operand 1, if there had been only the first alternative. And then the constraint would have been +&. AFAIU, the 'tie' established here is from input operand 1 to an earlyclobber output operand 0 using the '0' matching constraint. Having said that, I don't understand the comment, AFAIU it should be: 'Use '0' to make sure operands 0 and 1 are the same, and use '&' to make sure operands 0 and 2 are not the same. Well, yeah, the comment doesn't seem completely in sync with the code. In the first example you gave, looking at the pattern (no match_dup, setting the full register), it seems that it may have wanted "=&" instead of "+&". (by the way, in the same aarch64-simd.md file, I noticed some define_expand with constraints, that looks strange) -- Marc Glisse
Re: combination of read/write and earlyclobber constraint modifier
On Wed, 2 Jul 2014, Tom de Vries wrote: On 02-07-14 08:23, Marc Glisse wrote: I think it could have used (match_dup 0) instead of operand 1, if there had been only the first alternative. And then the constraint would have been +&. isn't that explicitly listed as unsupported here ( https://gcc.gnu.org/onlinedocs/gccint/RTL-Template.html#index-match_005fdup-3244 ): ... Note that match_dup should not be used to tell the compiler that a particular register is being used for two operands (example: add that adds one register to another; the second register is both an input operand and the output operand). Use a matching constraint (see Simple Constraints) for those. match_dup is for the cases where one operand is used in two places in the template, such as an instruction that computes both a quotient and a remainder, where the opcode takes two input operands but the RTL template has to refer to each of those twice; once for the quotient pattern and once for the remainder pattern. ... ? Well, looking for instance at x86_shrd... Ok, I didn't know it wasn't supported (though I did suggest using match_operand and "0" at some point). Still, the meaning of +&, in inline asm for instance, seems relatively clear, no? -- Marc Glisse
Re: combination of read/write and earlyclobber constraint modifier
On Wed, 2 Jul 2014, Tom de Vries wrote: On 02-07-14 09:02, Marc Glisse wrote: Still, the meaning of +&, in inline asm for instance, seems relatively clear, no? I can't find any testsuite examples using this construct. Furthermore, I'd expect the same semantics and restrictions for constraints in rtl templates and inline asm. So I'm not sure what you mean. Coming back to your original question: An earlyclobber operand is defined as 'modified before the instruction is finished using the input operands'. AFAIU that would indeed exclude the possibility that the earlyclobber operand is an input/output operand it self, but perhaps I misunderstand. So my question is: is the combination of '&' and '+' supported ? If so, what is the exact semantics ? If not, should we warn or give an error ? An earlyclobber operand X prevents *other* input operands from using the same register, but that does not include X itself (if it is using +) or operands explicitly using a matching constraint for X. At least that's how I understand it. -- Marc Glisse
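A hypothetical aarch64 inline-asm example of that reading of '+&': acc is written before the second read of x, so x must not share acc's register, while acc itself naturally keeps using its own.

long acc_twice(long acc, long x)
{
  __asm__ ("add %0, %0, %1\n\t"   /* writes %0 before ...          */
           "add %0, %0, %1"       /* ... the last read of %1       */
           : "+&r" (acc)
           : "r" (x));
  return acc;
}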
Re: GCC version bikeshedding
On Wed, 6 Aug 2014, Jakub Jelinek wrote: - libstdc++ ABI changes It seems unlikely to be in the next release, it is too late in the cycle. Chances to break the ABI don't come often, and rushing one at the end of stage1 would be wasting a good opportunity. -- Marc Glisse
Re: GCC version bikeshedding
On Wed, 6 Aug 2014, Richard Biener wrote: It's an ABI change for all modes (but not a SONAME change because the old and new definitions will both be present in the .so). Ugh. That's going to be a nightmare to support. Yes. And IMO a waste of effort compared to a clean .so.7 break, but well... Is there a configure switch to change the default ABI used? That is, on a legacy system can I upgrate to 5.0 and get code that interoperates fine with code built with 4.8? (including ABI boundaries using the affected classes? I suspect APIs with std::string passing are _very_ common, not sure about std::list) What's the failure mode the user will see when linking against a 4.8 compiled library with a std::string interface using 5.0? In good cases, a linker error about a missing symbol (different mangling). In less good cases, a warning at compile-time about using a class marked with abi_tag in a class not marked with it. In worse cases (passing through void* for instance), a runtime crash. And how do libraries with such an API avoid silently changing their ABI dependent on the compiler used to compile them? That is, I suppose those need to change their SONAME dependent on the compiler version used?! Yes, just like a move to .so.7 would entail. -- Marc Glisse
Re: GCC version bikeshedding
On Wed, 6 Aug 2014, Jakub Jelinek wrote: On Wed, Aug 06, 2014 at 12:31:57PM +0200, Richard Biener wrote: Ok, so the problematical case is struct X { std::string s; }; void foo (X&); Yeah. then. OTOH I remember that then mangling of X changes as well? Only if you add abi_tag attribute to X. Note that -Wabi-tag can tell you where it is needed. struct __attribute__((abi_tag("marc"))) X {}; struct Y { X x; }; a.cc:2:8: warning: 'Y' does not have the "marc" abi tag that 'X' (used in the type of 'Y::x') has [-Wabi-tag] struct Y { X x; }; ^ a.cc:2:14: note: 'Y::x' declared here struct Y { X x; }; ^ a.cc:1:41: note: 'X' declared here struct __attribute__((abi_tag("marc"))) X {}; ^ I hope the libstdc++ folks will add some macro which will include the right abi_tag attribute for the std::list/std::string cases, so you'd in the end just add #ifndef _GLIBCXX_ABI_TAG_SOMETHING #define _GLIBCXX_ABI_TAG_SOMETHING #endif ... struct X _GLIBCXX_ABI_TAG_SOMETHING { std::string s; }; void foo (X&); or similar. So we only need to patch every project out there... A clean .so.7 break would be significantly worse nightmare. We've been there many years ago, e.g. 3.2/3.3 vs. 3.4, there has been significantly fewer C++ plugins etc. in packages and it still it was unsolvable. With the abi_tag stuff, you have the option to make stuff interoperable when mixing compiler, either with no effort at all, or some limited effort. With .so.7, you have no option, nothing will be interoperable. I disagree that it is worse, but you have more experience, I guess we will see the results in a few years... -- Marc Glisse
Re: Where does GCC pick passes for different opt. levels
On Mon, 11 Aug 2014, Steve Ellcey wrote: I have a basic question about optimization selection in GCC. There used to be some code in GCC (passes.c?) that would set various optimize pass flags depending on if the 'optimize' flag was > 0, > 1, or > 2; later I think there may have been a table. There is still a table in opts.c, with entries that look like: { OPT_LEVELS_2_PLUS, OPT_ftree_vrp, NULL, 1 }, This code seems gone now and I can't figure out how GCC is selecting what optimization passes to run at what optimization levels (-O1 vs. -O2 vs. -O3). How is this handled in the top-of-tree GCC code? I see passes.def but there doesn't seem to be anything in there to tie specific passes to specific optimization levels. Likewise in common.opt I see flags for various optimization passes but nothing to tie them to -O1 or -O2, etc. I'm probably missing something obvious, but a pointer would be much appreciated. -- Marc Glisse
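The table in question is default_options_table in gcc/opts.c; a trimmed sketch of its shape (the entries shown here are illustrative, the exact set changes between releases):

static const struct default_options default_options_table[] =
  {
    { OPT_LEVELS_1_PLUS, OPT_ftree_ccp, NULL, 1 },
    { OPT_LEVELS_2_PLUS, OPT_ftree_vrp, NULL, 1 },
    { OPT_LEVELS_3_PLUS, OPT_ftree_loop_vectorize, NULL, 1 },
    /* ... */
    { OPT_LEVELS_NONE, 0, NULL, 0 }
  };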
Re: Conditional negation elimination in tree-ssa-phiopt.c
On Mon, 11 Aug 2014, Kyrill Tkachov wrote: The aarch64 target has a conditional negation instruction CSNEG Rd, Rs1, Rs2, cond with semantics Rd = if cond then Rs1 else -Rs2. This, however doesn't get end up getting matched for code such as: int foo2 (unsigned a, unsigned b) { int r = 0; r = a & b; if (a & b) return -r; return r; } Note that in this particular case, we should just return -(a&b) like llvm does. -- Marc Glisse
Re: gcc parallel make check
On Wed, 3 Sep 2014, VandeVondele Joost wrote: I've noticed that make -j -k check-fortran results in a serialized checking, while make -j32 -k check-fortran goes parallel. Somehow the explicit 'N' in -jN seems to be needed for the check target, while the other targets seem to do just fine. Is that a feature, or should I file a PR for that... ? https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53155 -- Marc Glisse
Re: Fwd: Building gcc-4.9 on OpenBSD
On Wed, 17 Sep 2014, Ian Grant wrote: And is there any way to disable the Intel library? --disable-libcilkrts (same as the other libs) If it explicitly doesn't support your system, I am a bit surprised it isn't disabled automatically, that seems like a bug. Please don't call it "the Intel library", that doesn't mean anything. -- Marc Glisse
Re: Fwd: Building gcc-4.9 on OpenBSD
On Wed, 17 Sep 2014, Ian Grant wrote: On Wed, Sep 17, 2014 at 1:36 PM, Marc Glisse wrote: On Wed, 17 Sep 2014, Ian Grant wrote: And is there any way to disable the Intel library? --disable-libcilkrts (same as the other libs) If it explicitly doesn't support your system, I am a bit surprised it isn't disabled automatically, that seems like a bug. Not necessarily a bug, but it would have been good if the --help option had mentioned it. I looked, really. Perhaps I missed it though. So many options for disabling one thing or another https://gcc.gnu.org/install/configure.html lists a number of others but not this one, maybe it should be added. Please don't call it "the Intel library", that doesn't mean anything. Doesn't it? How did you know what 'it' was then? Or is that a stupid question? This identity concept is much slipperier than it seems at first, isn't it? You included error messages... How about my question about the size of the binaries? Is that 60+MB what other systems show? I still see <20M here, but I don't know if there are reasons for what you are seeing. Are you maybe using different options? (debug information, optimization, lto, etc) -- Marc Glisse
Re: How to identify the type of the object being created using the new operator?
On Mon, 6 Oct 2014, Swati Rathi wrote: Statement : A *a = new B; gets translated in GIMPLE as 1. void * D.2805; 2. struct A * a; 3. D.2805 = operator new (20); 4. a = D.2805; A is the base class and B is the derived class. In statement 3, the new operator is creating an object of derived class B. By analyzing the RHS of the assignment statement 3, how can we identify the type (in this case B) of the object being created? I strongly doubt you can. It is calling B's constructor that will turn this memory region into a B; operator new is the same as malloc: it only returns raw memory. (If A and B don't have the same size, the argument 20 can be a hint) -- Marc Glisse
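A small self-contained sketch of the point (class names here are illustrative): operator new only hands back untyped storage, and it is the separately emitted constructor call that makes that storage a B:

    #include <new>

    struct A { virtual ~A () {} long pad[2]; };
    struct B : A { long more; };

    A *make_b ()
    {
      void *raw = operator new (sizeof (B));  // raw memory, no type yet
      B *b = new (raw) B;                     // B::B runs here: now it is a B
      return b;                               // only this call reveals the type
    }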
Re: volatile access optimization (C++ / x86_64)
On Fri, 26 Dec 2014, Matt Godbolt wrote: I'm investigating ways to have single-threaded writers write to memory areas which are then (very infrequently) read from another thread for monitoring purposes. Things like "number of units of work done". I initially modeled this with relaxed atomic operations. This generates a "lock xadd" style instruction, as I can't convey that there are no other writers. As best I can tell, there's no memory order I can use to explain my usage characteristics. Giving up on the atomics, I tried volatiles. These are less than ideal as their power is less expressive, but in my instance I am not trying to fight the ISA's reordering; just prevent the compiler from eliding updates to my shared metrics. GCC's code generation uses a "load; add; store" for volatiles, instead of a single "add 1, [metric]". https://gcc.gnu.org/bugzilla/show_bug.cgi?id=50677 -- Marc Glisse
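A minimal sketch of the two variants being compared (x86-64 is assumed for the instruction names in the comments):

    #include <atomic>

    std::atomic<long> counter;   // one writer thread, rare readers elsewhere
    volatile long vcounter;

    void bump ()
    {
      // Even with relaxed ordering this is a locked read-modify-write
      // (e.g. lock xadd / lock add); no memory order can say "single writer".
      counter.fetch_add (1, std::memory_order_relaxed);

      // The volatile version drops the lock prefix, but per PR 50677 GCC
      // emits a separate load, add and store rather than one add-to-memory.
      vcounter = vcounter + 1;
    }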
Re: C++ Standard Question
On Thu, 22 Jan 2015, Joel Sherrill wrote: I think this is a glibc issue but since this method is defined in the C++ standards, I thought there were plenty of language lawyers here. :) s/glibc/libstdc++/ and they have their own ML. That's deprecated, isn't it? class strstreambuf : public basic_streambuf<char> { ... int pcount() const; <= ISSUE ... }; My reading of the C++03 and draft C++14 says that the int pcount() method in this class is not const. glibc has it const in the glibc shipped with Fedora 20 and CentOS 6. This is a simple test case: #include <strstream> int main() { int (std::strstreambuf::*dummy)() = &std::strstreambuf::pcount; /*-- pcount is conformant --*/ return 0; } What's the consensus? The exact signature of member functions is not mandated by the standard; implementations are allowed to make the function const if that works (or provide both a const and a non-const version). Your code is not guaranteed to work. Lambdas usually provide a fine workaround. -- Marc Glisse
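A minimal sketch of the lambda workaround Marc mentions; it avoids spelling out the member function's exact signature, which the standard leaves unspecified:

    #include <strstream>   // deprecated, but it is the class under discussion

    int main ()
    {
      std::strstreambuf buf;
      // Taking &std::strstreambuf::pcount hard-codes a signature that an
      // implementation is free not to have; calling through a lambda does not.
      auto pcount = [&buf] () { return buf.pcount (); };
      return pcount ();
    }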
Re: unfused fma question
On Mon, 23 Feb 2015, Jeff Law wrote: On 02/23/15 11:38, Joseph Myers wrote: (I wonder if convert_mult_to_fma is something that should move to match-and-simplify infrastructure.) Yea, it probably should. Currently, it happens in a pass that is quite late. If it moves to match-and-simplify, I am afraid it might inhibit some other optimizations (we can turn plus+mult to fma but not the reverse), unless we use some way to inhibit some patterns until a certain pass (possibly a simple "if", if that's not too costly). Such "time-restricted" patterns might be useful for other purposes: don't introduce complicated vector/complex operations after the corresponding lowering passes, do narrowing until a certain point but then prefer fast integer sizes, etc (I haven't thought about those particular examples, they are only an illustration). -- Marc Glisse
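A small C example of the shape convert_mult_to_fma looks for; whether it actually becomes a fused multiply-add depends on -ffp-contract and on the target having an FMA instruction:

    double muladd (double a, double b, double c)
    {
      /* Fusing this late means earlier passes still see the separate mult
         and plus; fusing it in match.pd too early would hide the plus+mult
         form from folders that only know how to rewrite plus+mult, which is
         the concern raised above.  */
      return a * b + c;
    }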
Re: A bug (?) with inline functions at O0: undefined reference
On Fri, 6 Mar 2015, Ilya Verbin wrote: I've discovered a strange behaviour on trunk gcc, here is the reproducer: inline int foo () { return 0; } int main () { return foo (); } $ gcc main.c /tmp/ccD1LeXo.o: In function `main': main.c:(.text+0xa): undefined reference to `foo' collect2: error: ld returned 1 exit status Is this a bug? If yes, is it known? GCC 4.8.3 works fine though. Not a bug, that's what inline means in C99 and later. -- Marc Glisse
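Two standard ways to make that reproducer link under C99/C11 inline semantics, sketched in a single translation unit:

    /* Option 1: give the function internal linkage.  */
    static inline int foo (void) { return 0; }

    /* Option 2: keep plain 'inline' but add a file-scope declaration without
       'inline', so this translation unit also emits the external definition.  */
    inline int bar (void) { return 1; }
    extern int bar (void);

    int main (void) { return foo () + bar (); }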
RE: PR65416, alloca on xtensa
augustine.sterl...@gmail.com wrote: > On Fri, Mar 13, 2015 at 7:54 AM, Max Filippov wrote: [...] > > 2. alloca seems to make an additional 16-bytes padding to each stack > > allocation: alloca(1) results in moving sp down by 32 bytes, > > alloca(17) > > moves it by 48 bytes, etc. This padding looks unnecessary to me: > > either > > this space is not used (previous register frame is not spilled), or > > alloca > > exception handler will take care about reloading or moving spilled > > registers > > to a new location. In both cases after movsp this space is just > > wasted. > > Do you know why this padding may be needed? > > Answering this question definitively requires some time with the ABI > manual, which I don't have. You may be right, but I would check what > XCC does in this case. It is far better tested. Other than the required 16-byte stack alignment, there's nothing in the ABI that requires these extra 16 bytes. Perhaps there was a bad implementation of the alloca exception handler at some point a long time ago that prompted the extra 16 bytes? Today XCC doesn't add the extra 16 bytes. alloca(n) with n in a2 comes out as this: 0x6490 <+12>:movi.n a8, -16 0x6492 <+14>:addi.n a3, a2, 15 0x6494 <+16>:and a3, a3, a8 0x6497 <+19>:sub a3, a1, a3 0x649a <+22>:movsp a1, a3 which just rounds up to 16 bytes. -Marc
Re: Named parameters
On Mon, 16 Mar 2015, David Brown wrote: In a discussion on comp.lang.c, the subject of "named parameters" (or "designated parameters") has come up again. This is a feature that some of us feel would be very useful in C (and in C++). I think it would be possible to include it in the language without leading to any conflicts with existing code - it is therefore something that could be made as a gcc extension, with a hope of adding it to the standards for a later C standards revision. I wanted to ask opinions on the mailing list as to the feasibility of the idea - there is little point in my cluttering up bugzilla with an enhancement request if the gcc developers can spot obvious flaws in the idea. Filing a report in bugzilla would be quite useless: language extensions are now almost automatically rejected unless they come with a proposal that has already been favorably seen by the standardization committee. On the other hand, implementing the feature (in your own fork) is almost a requirement if you intend to propose this for standardization. And it should not be too hard. Basically, the idea is this: int foo(int a, int b, int c); void bar(void) { foo(1, 2, 3); // Normal call foo(.a = 1, .b = 2, .c = 3) // Same as foo(1, 2, 3) foo(.c = 3, .b = 2, .a = 1) // Same as foo(1, 2, 3) } struct foo_args { int a, b, c; }; void foo(struct foo_args); #define foo(...) foo((struct foo_args){__VA_ARGS__}) void g(){ foo(1,2,3); foo(.c=3,.b=2); } In C++ you could almost get away without the macro, calling f({1,2,3}), but f({.c=3}) currently gives "sorry, unimplemented". Maybe you would like to work on that? If only the first variant is allowed (with the named parameters in the order declared in the prototype), then this would not affect code generation at all - the designators could only be used for static error checking. If the second variant is allowed, then the parameters could be re-ordered. The aim of this is to make it easier and safer to call functions with a large number of parameters. The syntax is chosen to match that of designated initialisers - that should be clearer to the programmer, and hopefully also make implementation easier. If there is more than one declaration of the function, then the designators used should follow the most recent in-scope declaration. An error may be safer, you would at least want a warning. This feature could be particularly useful when combined with default arguments in C++, as it would allow the programmer to override later default arguments without specifying all earlier arguments. C++ is always more complicated (so many features can interact in strange ways), I suggest you start with C. At the moment, I am not asking for an implementation, or even /how/ it might be implemented (perhaps a MELT plugin?) - I would merely like opinions on whether it would be a useful and practical enhancement. This is not such a good list for that, comp.lang.c is better suited. This will be a good list if you have technical issues implementing the feature. -- Marc Glisse
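A compilable version of the workaround Marc sketches (the struct and function names here are mine); it relies on C99 designated initializers in a compound literal, so unnamed members default to zero:

    struct foo_args { int a, b, c; };

    static int foo_impl (struct foo_args args)
    {
      return args.a + 10 * args.b + 100 * args.c;
    }

    /* The macro turns foo(...) into a call taking a compound literal, so the
       caller can use positional or designated (named, reorderable) arguments.  */
    #define foo(...) foo_impl ((struct foo_args){ __VA_ARGS__ })

    int main (void)
    {
      int x = foo (1, 2, 3);         /* positional, like a normal call */
      int y = foo (.c = 3, .b = 2);  /* named and out of order; .a is 0 */
      return x + y;
    }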
Re: -Wno-c++11-extensions addition
On Wed, 25 Mar 2015, Jack Howarth wrote: On Wed, Mar 25, 2015 at 12:41 PM, Jonathan Wakely wrote: On 25 March 2015 at 16:16, Jack Howarth wrote: Does anyone remember which FSF gcc release first added the -Wno-c++11-extensions option for g++? I know it exists in 4.6.3 Are you sure? It doesn't exist for 4.6.4 or anything later. Are you thinking of -Wc++0x-compat ? On x86_64 Fedora 15... $ /usr/bin/g++ --version g++ (GCC) 4.6.3 20120306 (Red Hat 4.6.3-2) Copyright (C) 2011 Free Software Foundation, Inc. This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. $ /usr/bin/g++ -Wno-c++11-extensions hello.cc $ So gcc 4.6.3 appears to at least tolerate that warning without claiming that it is unknown. https://gcc.gnu.org/wiki/FAQ#The_warning_.22unrecognized_command-line_option.22_is_not_given_for_-Wno-foo -- Marc Glisse
Re: [i386] Scalar DImode instructions on XMM registers
On Fri, 24 Apr 2015, Uros Bizjak wrote: Please try to generate paradoxical subreg (V2DImode subreg of V1DImode pseudo). IIRC, there is some functionality in the compiler that is able to tell if the highpart of the paradoxical register is zeroed. Those are not currently legal (I tried to change that) https://gcc.gnu.org/ml/gcc-patches/2013-03/msg00745.html https://gcc.gnu.org/ml/gcc-patches/2014-06/msg00769.html In this case, a subreg:V2DI of DImode should work. -- Marc Glisse
Re: Can support TRUNC_DIV_EXPR, TRUNC_MOD_EXPR in GCC vectorization/scalar evolution -- and/or linearization?
On Fri, 12 Oct 2018, Thomas Schwinge wrote: Hmm, and without any OpenACC/OpenMP etc., actually the same problem is also present when running the following code through the vectorizer: for (int tmp = 0; tmp < N_J * N_I; ++tmp) { int j = tmp / N_I; int i = tmp % N_I; a[j][i] = 0; } ... whereas the following variant (obviously) does vectorize: int a[NJ * NI]; for (int tmp = 0; tmp < N_J * N_I; ++tmp) a[tmp] = 0; I had a quick look at the difference, and a[j][i] remains in this form throughout optimization. If I write instead *((*(a+j))+i) = 0; I get j_10 = tmp_17 / 1025; i_11 = tmp_17 % 1025; _1 = (long unsigned int) j_10; _2 = _1 * 1025; _3 = (sizetype) i_11; _4 = _2 + _3; or for a power of 2 j_10 = tmp_17 >> 10; i_11 = tmp_17 & 1023; _1 = (long unsigned int) j_10; _2 = _1 * 1024; _3 = (sizetype) i_11; _4 = _2 + _3; and in both cases we fail to notice that _4 = (sizetype) tmp_17; (at least I think that's true). So there are missing match.pd transformations in addition to whatever scev/ivdep/other work is needed. -- Marc Glisse
Re: "match.pd" (was: Can support TRUNC_DIV_EXPR, TRUNC_MOD_EXPR in GCC vectorization/scalar evolution -- and/or linearization?)
(resent because of mail issues on my end) On Mon, 22 Oct 2018, Thomas Schwinge wrote: I had a quick look at the difference, and a[j][i] remains in this form throughout optimization. If I write instead *((*(a+j))+i) = 0; I get j_10 = tmp_17 / 1025; i_11 = tmp_17 % 1025; _1 = (long unsigned int) j_10; _2 = _1 * 1025; _3 = (sizetype) i_11; _4 = _2 + _3; or for a power of 2 j_10 = tmp_17 >> 10; i_11 = tmp_17 & 1023; _1 = (long unsigned int) j_10; _2 = _1 * 1024; _3 = (sizetype) i_11; _4 = _2 + _3; and in both cases we fail to notice that _4 = (sizetype) tmp_17; (at least I think that's true). So there are missing match.pd transformations in addition to whatever scev/ivdep/other work is needed. With a very simplistic "match.pd" rule (not yet any special cases checking etc.): diff --git gcc/match.pd gcc/match.pd index b36d7ccb5dc3..4c23116308da 100644 --- gcc/match.pd +++ gcc/match.pd @@ -5126,3 +5126,28 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) { wide_int_to_tree (sizetype, off); }) { swap_p ? @0 : @2; })) { rhs_tree; }) + +/* Given: + + j = in / N_I + i = in % N_I + + ..., fold: + + out = j * N_I + i + + ..., into: + + out = in +*/ + +/* As long as only considering N_I being INTEGER_CST (which are always second + argument?), probably don't need ":c" variants? */ + +(simplify + (plus:c + (mult:c + (trunc_div @0 INTEGER_CST@1) + INTEGER_CST@1) + (trunc_mod @0 INTEGER_CST@1)) + (convert @0)) You should only specify INTEGER_CST@1 on the first occurence, the others can be just @1. (you may be interested in @@1 at some point, but that gets tricky) ..., the original code: int f1(int in) { int j = in / N_I; int i = in % N_I; int out = j * N_I + i; return out; } ... gets simplified from ("div-mod-0.c.027t.objsz1"): f1 (int in) { int out; int i; int j; int _1; int _6; : gimple_assign gimple_assign gimple_assign gimple_assign gimple_assign gimple_return <_6> } ... to ("div-mod-0.c.028t.ccp1"): f1 (int in) { int out; int i; int j; int _1; : gimple_assign gimple_assign gimple_assign gimple_return } (The three dead "gimple_assign"s get eliminated later on.) So, that works. However, it doesn't work yet for the original construct that I'd ran into, which looks like this: [...] int i; int j; [...] signed int .offset.5_2; [...] unsigned int .offset.7_23; unsigned int .iter.0_24; unsigned int _25; unsigned int _26; [...] unsigned int .iter.0_32; [...] : # gimple_phi <.offset.5_2, .offset.5_21(8), .offset.5_30(9)> gimple_assign gimple_assign gimple_assign gimple_assign gimple_assign gimple_assign gimple_assign [...] Resolving the "a[j][i] = 123" we'll need to look into later. As Marc noted above, with that changed into "*(*(a + j) + i) = 123", we get: [...] int i; int j; long unsigned int _1; long unsigned int _2; sizetype _3; sizetype _4; sizetype _5; int * _6; [...] signed int .offset.5_8; [...] unsigned int .offset.7_29; unsigned int .iter.0_30; unsigned int _31; unsigned int _32; [...] : # gimple_phi <.offset.5_8, .offset.5_27(8), .offset.5_36(9)> gimple_assign gimple_assign gimple_assign gimple_assign gimple_assign gimple_assign gimple_assign gimple_assign gimple_assign gimple_assign gimple_assign gimple_assign gimple_assign [...] Here, unless I'm confused, "_4" is supposed to be equal to ".iter.0_30", but "match.pd" doesn't agree yet. Note the many "nop_expr"s here, which I have not yet figured out how to handle, I suppose? I tried some things but couldn't get it to work. 
Apparently the existing instances of "(match (nop_convert @0)" and "Basic strip-useless-type-conversions / strip_nops" rule also don't handle these; should they? Or, are in fact here the types mixed up too much? "(match (nop_convert @0)" defines a shortcut so some transformations can use nop_convert to detect some specific conversions, but it doesn't do anything by itself. "Basic strip-useless-type-conversions" strips conversions that are *useless*, essentially from a type to the same type. If you want to handle true conversions, you need to do that explicitly, see the many transformations that use convert? convert1? convert2? and specify for which particular conversions the transformation is valid. Finding out the right conditions to detect these conversions is often the most painful part of writing a match.pd transformation. I hope to get some time again soon to continue looking into this, but if anybody got any ideas, I'm all ears. -- Marc Glisse
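To make the conversion problem concrete, here is the same computation written in C with the casts the dump shows; the result here still equals (size_t) tmp, but a match.pd rule has to justify exactly that across the intermediate conversions, which is the painful part described above:

    #include <stddef.h>

    size_t flatten_index (int tmp)
    {
      int j = tmp / 1025;            /* trunc_div */
      int i = tmp % 1025;            /* trunc_mod */
      /* The widening casts below are the nop_exprs in the dump; proving that
         they are harmless is the condition a real match.pd pattern needs.  */
      return (size_t) j * 1025 + (size_t) i;
    }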
Re: [RFC] -Weverything
On Tue, 22 Jan 2019, Thomas Koenig wrote: Hi, What would people think about a -Weverything option which turns on every warning there is? I think that could be quite useful in some circumstances, especially to find potential bugs with warnings that people, for some reason or other, found too noisy for -Wextra. The name could be something else, of course. In the best GNU tradition, -Wkitchen-sink could be another option :-) https://gcc.gnu.org/bugzilla/show_bug.cgi?id=31573 and duplicates already list quite a few arguments. Basically, it could be useful for debugging gcc or to discover warnings, but gcc devs fear that users will actually use it for real. -- Marc Glisse
Re: [RFC] -Weverything
On Wed, 23 Jan 2019, Jakub Jelinek wrote: We have that, gcc -Q --help=warning Of course, for warnings which do require arguments (numerical, or enumeration/string), one still needs to pick up his choices of those arguments; no idea what -Weverything would do here, while some warnings have different levels where a higher (or lower) level is a superset of another level, what numbers would you pick for e.g. warnings where the argument is bytes? For most of them, there is a value that maximizes the number of warnings, so the same superset argument applies. -Wframe-larger-than=0 so it shows the estimated frame size on every function, -Walloca-larger-than=0 so it is equivalent to -Walloca, etc. -- Marc Glisse
named address space problem
Hi ! While porting a GCC 4.9 private port to GCC 7, I've encountered an issue with named address space support. I have defined the following target macros: #define K1_ADDR_SPACE_UNCACHED 1 #define K1_ADDR_SPACE_CONVERT 2 TARGET_ADDR_SPACE_LEGITIMATE_ADDRESS_P (returns false for CONVERT, regular legitimate hook for other as) TARGET_ADDR_SPACE_LEGITIMIZE_ADDRESS (raises an error if using CONVERT as or calls regular legitimize_address hook) TARGET_ADDR_SPACE_SUBSET_P (always true) TARGET_ADDR_SPACE_CONVERT (emits a warning if not to/from CONVERT as and always returns first operand) #define REGISTER_TARGET_PRAGMAS() do { \ c_register_addr_space ("__uncached", K1_ADDR_SPACE_UNCACHED); \ c_register_addr_space ("__convert", K1_ADDR_SPACE_CONVERT); \ } while (0) The usage is very basic and is used to drive the insn selection to use cached/uncached variants for load/store. Pointers are declared with `__uncached` to use uncached variants and `__convert` is used when converting pointers to/from this uncached space. It works as expected on GCC 4.9. On our current port on GCC 7 (using latest gcc-7-branch branch), we have an issue with simple code: ``` typedef struct { unsigned long count; } foo_t; unsigned long foobar(foo_t *cond, int bar) { if (bar == 1 ) { } __uncached foo_t *ucond = cond; return ucond->count; } ``` Raises the following error: ``` : In function 'foobar': :9:3: error: unknown type name '__uncached' __uncached foo_t *ucond = cond; ^~ :9:20: error: expected '=', ',', ';', 'asm' or '__attribute__' before '*' token __uncached foo_t *ucond = cond; ^ :10:10: error: 'ucond' undeclared (first use in this function); did you mean 'cond'? return ucond->count; ^ cond :10:10: note: each undeclared identifier is reported only once for each function it appears in Compiler returned: 1 ``` The following changes make the code compile as expected: - moving the variable declaration at the beginning of the block - opening a block before the declaration and closing it after the return stmt. I could not find a matching PR in bugzilla. Do you know of any issue with this ? Maybe this has been fixed in later versions. Thanks, Marc
Re: On-Demand range technology [2/5] - Major Components : How it works
On Tue, 4 Jun 2019, Martin Sebor wrote: On 5/31/19 9:40 AM, Andrew MacLeod wrote: On 5/29/19 7:15 AM, Richard Biener wrote: On Tue, May 28, 2019 at 4:17 PM Andrew MacLeod wrote: On 5/27/19 9:02 AM, Richard Biener wrote: On Fri, May 24, 2019 at 5:50 PM Andrew MacLeod wrote: The above suggests that iff this is done at all it is not in GORI because those are not conditional stmts or ranges from feeding those. The machinery doing the use-def walking from stmt context also cannot come along these so I have the suspicion that Ranger cannot handle telling us that for the stmt following above, for example if (_5 != 0) that _5 is not zero? Can you clarify? So there are 2 aspects to this. the range-ops code for DIV_EXPR, if asked for the range of op2 () would return ~[0,0] for _5. But you are also correct in that the walk backwards would not find this. This is similar functionality to how null_derefs are currently handled, and in fact could probably be done simultaneously using the same code base. I didn't bring null derefs up, but this is a good time :-) There is a separate class used by the gori-cache which tracks the non-nullness property at the block level. It has a single API: non_null_deref_p (name, bb) which determines whether the is a dereference in any BB for NAME, which indicates whether the range has an implicit ~[0,0] range in that basic block or not. So when we then have _1 = *_2; // after this _2 is non-NULL _3 = _1 + 1; // _3 is non-NULL _4 = *_3; ... when a on-demand user asks whether _3 is non-NULL at the point of _4 = *_3 we don't have this information? Since the per-BB caching will only say _1 is non-NULL after the BB. I'm also not sure whether _3 ever gets non-NULL during non-NULL processing of the block since walking immediate uses doesn't really help here? presumably _3 is globally non-null due to the definition being (pointer + x) ... ie, _3 has a global range o f ~[0,0] ? No, _3 is ~[0, 0] because it is derived from _1 which is ~[0, 0] and you cannot arrive at NULL by pointer arithmetic from a non-NULL pointer. I'm confused. _1 was loaded from _2 (thus asserting _2 is non-NULL). but we have no idea what the range of _1 is, so how do you assert _1 is [~0,0] ? The only way I see to determine _3 is non-NULL is through the _4 = *_3 statement. In the first two statements from the above (where _1 is a pointer): _1 = *_2; _3 = _1 + 1; _1 must be non-null because C/C++ define pointer addition only for non-null pointers, and therefore so must _3. (int*)0+0 is well-defined, so this uses the fact that 1 is non-null. This is all well done in extract_range_from_binary_expr already, although it seems to miss the (dangerous) optimization NULL + unknown == NULL. Just in case, a quote: "When an expression J that has integral type is added to or subtracted from an expression P of pointer type, the result has the type of P. (4.1) — If P evaluates to a null pointer value and J evaluates to 0, the result is a null pointer value. (4.2) — Otherwise, if P points to element x[i] of an array object x with n elements, 80 the expressions P + J and J + P (where J has the value j) point to the (possibly-hypothetical) element x[i + j] if 0 ≤ i + j ≤ n and the expression P - J points to the (possibly-hypothetical) element x[i − j] if 0 ≤ i − j ≤ n. (4.3) — Otherwise, the behavior is undefined" Or does the middle-end allow arithmetic on null pointers? 
When people use -fno-delete-null-pointer-checks because their (embedded) platform has important stuff at address 0, they also want to be able to do arithmetic there. -- Marc Glisse
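A sketch of the embedded pattern Marc is referring to; with -fno-delete-null-pointer-checks the compiler must not treat the arithmetic and dereference below as undefined just because the base address is 0 (the register block is hypothetical):

    /* Hypothetical memory-mapped registers starting at address 0.  */
    #define HW_REGS ((volatile unsigned int *) 0)

    unsigned int read_status (void)
    {
      /* Pointer arithmetic on a literal null pointer: address 0 + 12 bytes.  */
      return HW_REGS[3];
    }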
Re: Testsuite not passing and problem with xgcc executable
On Sat, 8 Jun 2019, Jonathan Wakely wrote: You can see which tests failed by looking in the .log files in the testsuite directories, There are .sum files for a quick summary. or by running the contrib/test_summary script. There is also contrib/compare_tests, although running it globally has been failing for a long time now, and running it for individual .sum files fails for jit and libphobos. Other scripts in contrib/ may be relevant. -- Marc Glisse