Re: SH runtime switchable atomics - proposed design
On Wed, 2016-01-20 at 20:22 -0500, Rich Felker wrote:
> On Thu, Jan 21, 2016 at 08:08:18AM +0900, Oleg Endo wrote:
> > Do you have plans to add other atomic operations (like
> > arithmetic)?
>
> No, at least not in musl. From musl's perspective cas is the main one
> that's used anyway. But even in general I don't think there's a
> significant advantage to doing 'direct' arithmetic ops without a cas
> loop even when you can (with llsc, gusa, or imask model). With gusa
> and imask the only time you benefit from not implementing them in
> terms of cas is on the _highly_ unlucky/unlikely occasion where an
> interrupt occurs between the old-value read before cas and the cas.
> For llsc there's more potential advantage because actual smp
> contention is possible, but sh4a is probably not a very interesting
> target anymore.

Things like atomic increments are combinable, so you can make them
scale better. You can do combining too with CAS, but if you're using
the CAS to implement an increment, the program would have to issue
another CAS, so you're not gaining as much.
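
To make the trade-off above concrete, here is a minimal sketch in C of an increment built from a cas loop next to a "direct" atomic add. It is an illustration only and is not code from musl or GCC: the function names are hypothetical, and it assumes a compiler that provides GCC's __sync builtins.

```c
#include <assert.h>

/* Hypothetical helper, not taken from musl or GCC: fetch-and-add built
 * from a cas loop. Success is detected with old == expected, the
 * convention of __sync_val_compare_and_swap. */
unsigned fetch_add_via_cas(unsigned *p, unsigned delta)
{
    unsigned expected, old;
    do {
        /* old-value read before the cas */
        expected = *(volatile unsigned *)p;
        old = __sync_val_compare_and_swap(p, expected, expected + delta);
    } while (old != expected);  /* *p changed in between: retry */
    return old;
}

/* The same operation as one "direct" atomic read-modify-write,
 * using GCC's __sync_fetch_and_add builtin. */
unsigned fetch_add_direct(unsigned *p, unsigned delta)
{
    return __sync_fetch_and_add(p, delta);
}
```

Both functions return the value seen before the addition. In the cas-based version a failed attempt has to issue a further cas, whereas the direct form is a single operation. Whether that difference matters in practice depends on the target and on how much contention there is.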
Re: SH runtime switchable atomics - proposed design
On Wed, 2016-01-20 at 20:22 -0500, Rich Felker wrote:
> On Thu, Jan 21, 2016 at 08:08:18AM +0900, Oleg Endo wrote:
> > On Tue, 2016-01-19 at 15:28 -0500, Rich Felker wrote:
> > > I've been working on the new version of runtime-selected SH atomics
> > > for musl, and I think what I've got might be appropriate for GCC's
> > > generated atomics too. I know Oleg was not very excited about doing
> > > this on the gcc side from a cost/benefit perspective
> >
> > I am just not keen on making this the default atomic model for SH.
> > If you have a system built around this atomic model and want to add it
> > to GCC, please send in patches. Just a few comments below...
>
> OK, thanks for clarifying. I don't have a patch yet but I might do one
> later. Sato-san's work on adding direct cas.l support showed me how
> this part of the gcc code seems to work, so it shouldn't be too hard
> to hook it up, but there are ABI design considerations still if we
> decide to go this way.
>
> > > Inputs:
> > > - R0: Memory address to operate on
> > > - R1: Address of implementation function, loaded from a global
> > > - R2: Comparison value
> > > - R3: Value to set on success
> > >
> > > Outputs:
> > > - R3: Old value read, ==R2 iff cas succeeded.
> > >
> > > Preserved: R0, R2.
> > >
> > > Clobbered: R1, PR, T.
> >
> > The T bit is obviously the result of the cas operation. So you could
> > use it as an output directly instead of the implicit R3 == R2
> > condition.
>
> I didn't want to impose a requirement that all backends leave the
> result in the T bit. At the C source level, I think most software uses
> old==expected as the test for success; this is the API
> __sync_val_compare_and_swap provides, and what people used to x86
> would naturally do anyway.
>
> > > This call (performed from __asm__ for musl, but gcc would do it as SH
> > > "SFUNC") is highly compact/convenient for inlining because it avoids
> > > clobbering any of the argument registers that are likely to already be
> > > in use by the caller, and it preserves the important values that are
> > > likely to be reused after the cas operation.
> > >
> > > For J2 and future J4, the function pointer just points to:
> > >
> > >   rts
> > >    cas.l r2,r3,@r0
> > >
> > > and the only costs vs an inline cas.l are loading the address of the
> > > function (done in the caller; involves GOT access) and clobbering R1
> > > and PR.
> > >
> > > This is still a draft design and the version in musl is subject to
> > > change at any time since it's not a public API/ABI, but I think it
> > > could turn into something useful to have on the gcc side with a
> > > -matomic-model=libfunc option or similar. Other ABI considerations for
> > > gcc use would be where to store the function pointer and how to
> > > initialize it. To be reasonably efficient with FDPIC the caller needs
> > > to be responsible for loading the function pointer (and it needs to
> > > always point to code, not a function descriptor) so that the callee
> > > does not need a GOT pointer passed in.
> >
> > Obviously the ABI has been constructed around the J-core's cas.l
> > instruction.
>
> Yes, but that was a choice I made after a first draft that was no more
> optimal for the other backends and less optimal for J-core. And the
> only real choices that were based on the instruction's properties were
> using r0 for the address input and swapping the old value into r3
> rather than producing it in a different register. Other than these
> minor details ABI was guided more by avoiding clobbers/reloads of
> potentially valuable data in the caller.
>
> One possible change I just thought of: with one extra instruction in
> the J-core version we could have the result come out in r1 and
> preserve r3. Similar changes to the other versions are probably easy.
>
> > Do you have plans to add other atomic operations (like
> > arithmetic)?
>
> No, at least not in musl. From musl's perspective cas is the main one
> that's used anyway. But even in general I don't think there's a
> significant advantage to doing 'direct' arithmetic ops without a cas
> loop even when you can (with llsc, gusa, or imask model). With gusa
> and imask the only time you benefit from not implementing them in
> terms of cas is on the _highly_ unlucky/unlikely occasion where an
> interrupt occurs between the old-value read before cas and the cas.
> For llsc there's more potential advantage because actual smp
> contention is possible, but sh4a is probably not a very interesting
> target anymore.
>
> > If not, then I'd suggest to name the atomic model
> > "libfunc-musl-cas".
>
> I'm not sure how the "musl" naming here makes sense unless you're
> thinking of having it just call into musl's definitions, which is
> certainly a possible design but not what I had in mind. I was thinking
> of adapting the design to gcc and prov
Re: Source Code for Profile Guided Code Positioning
On 20/01/16 10:08, Sriraman Tallam wrote:
> On Fri, Jan 15, 2016 at 9:51 AM, Yury Gribov wrote:
> > On 01/15/2016 08:44 PM, vivek pandya wrote:
> > > Thanks Yury for
> > > https://gcc.gnu.org/ml/gcc-patches/2011-09/msg01440.html this link.
> > > It implements procedure reordering as a linker plugin.
> > > I have some questions:
> > > 1) Can you point me to some documentation for "how to write a plugin
> > > for linkers"? I have not seen doc for structs with 'ld_' prefix
> > > (i.e. defined in plugin-api.h).
> > > 2) There is one more algorithm for Basic Block ordering with
> > > execution frequency count in the PH paper. Is there any
> > > implementation available for it?
> >
> > Quite frankly - I don't know (I've only learned about Google
> > implementation recently). I've added Sriram to maybe comment.
>
> Sorry for the late response.
>
> The google/gcc_4_9 branch has the source of the function reordering
> linker plugin. It is available in the function_reordering_plugin
> directory under the top level gcc directory. The function reordering
> plugin constructs a callgraph and uses profile information to do a
> Pettis Hansen style function reordering. This plugin does not do
> basic block re-ordering.
>
> There is no documentation as such that I am aware of to write a
> linker plugin. Here is a very brief overview. The linker calls the
> plugin's "onload" function when registering the plugin and the plugin
> in turn can register two call-backs with the linker, "claim_file_hook"
> and the "all_symbols_read_hook". "claim_file_hook" is called for
> each object file that the linker processes and the
> "all_symbols_read_hook" is called after all the symbols have been
> read by the linker. These are just two different interesting points
> in the course of a link. The plugin can also get handles to linker
> functions like "get_input_section_name" which it can use to process
> sections given their handle.
>
> You can also check the gold linker tests for simpler plugin examples.
>
> HTH,
> Thanks
> Sri

Very interesting, thank you!

Actually, I see the -freorder-functions switch in GCC, which allows
performing pretty much the same optimizations as in the linker, right?
Just wondering whether you managed to get performance improvements
over -freorder-functions with your plugin.

Thanks,
-Maxim

> > -Y
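
The callback flow in Sri's overview (onload registers two hooks, claim_file runs once per object file, all_symbols_read runs once at the end) can be sketched with a toy model in C. This is an illustration under loud assumptions: every toy_* and plugin_* name is a hypothetical stand-in, the "linker" is a small loop, and the real interface in plugin-api.h differs in detail.

```c
#include <assert.h>
#include <stddef.h>

/* Every toy_* name below is a hypothetical stand-in for illustration.
 * The real interface lives in plugin-api.h and differs in detail. */
typedef void (*toy_claim_file_handler)(const char *object_name);
typedef void (*toy_all_symbols_read_handler)(void);

/* Registration entry points the toy linker hands to onload. */
struct toy_linker_api {
    void (*register_claim_file)(toy_claim_file_handler handler);
    void (*register_all_symbols_read)(toy_all_symbols_read_handler handler);
};

/* --- toy linker state: the hooks a plugin has registered --- */
static toy_claim_file_handler claim_file_hook;
static toy_all_symbols_read_handler all_symbols_read_hook;

static void toy_register_claim_file(toy_claim_file_handler handler)
{
    claim_file_hook = handler;
}

static void toy_register_all_symbols_read(toy_all_symbols_read_handler handler)
{
    all_symbols_read_hook = handler;
}

/* --- plugin side --- */
int files_seen;                         /* bumped once per object file */
int files_seen_when_symbols_read = -1;  /* snapshot taken in the final hook */

static void plugin_claim_file(const char *object_name)
{
    (void)object_name;  /* a real plugin would inspect the file here */
    files_seen++;
}

static void plugin_all_symbols_read(void)
{
    files_seen_when_symbols_read = files_seen;
}

/* Called once when the toy linker loads the plugin. */
static void plugin_onload(const struct toy_linker_api *api)
{
    api->register_claim_file(plugin_claim_file);
    api->register_all_symbols_read(plugin_all_symbols_read);
}

/* --- toy linker driver: onload, then per-file hook, then final hook --- */
void toy_link(const char *const *objects, size_t count)
{
    struct toy_linker_api api = {
        toy_register_claim_file,
        toy_register_all_symbols_read,
    };
    size_t i;

    plugin_onload(&api);
    for (i = 0; i < count; i++)
        if (claim_file_hook)
            claim_file_hook(objects[i]);
    if (all_symbols_read_hook)
        all_symbols_read_hook();
}
```

Calling toy_link with three object names runs the per-file hook three times and then the final hook once. A function reordering plugin would, under this model, collect its per-file data in the first hook and compute the final layout in the second.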
Need forms for contribution towards gcc
Hi, I am a C/C++ programmer with 10 years of professional programming experience. I have been using gcc since version 2. I am not a compiler expert but would like to start contributing to gcc for which I need forms which require submission from gcc side. Please send me those and I will fill and send them back. Hoping to help gcc become even better. -- Respect, Shiv Shankar Dayal
Status of GCC 6 on x86_64 (Debian)
Matthias Klose did an archive rebuild of Ubuntu with GCC 6 recently [1].
He wanted to file bugs in Debian and so I offered to do a build of
Debian unstable. This was done with GCC 6 20160117 as packaged for
Debian by Matthias.

I built 11,500 Debian packages. I ran into the following compiler
errors:

 * ICE in fold_convert_loc, at fold-const.c:2366 (PR69379): new, fix available
 * ICE in dwarf2out_finish, at dwarf2out.c:27175 (PR69393): new, outstanding
 * ICE in get_initial_def_for_reduction, at tree-vect-loop.c:4196 (PR69166): reported by Matthias, outstanding
 * ICE in assign_temp, at function.c:961 (PR69241): known, outstanding
 * Wrong code generation at -O2 and higher (PR69320): known, fixed

Afterwards, I took all packages that compiled successfully with GCC 6
and compiled them with dpkg-buildflags set to -O3 (note that not every
package uses flags from dpkg-buildflags but many do). I ran into the
following compiler errors:

 * ICE in vect_get_vec_def_for_operand, at tree-vect-stmts.c:1379 (PR69328): known, fixed
 * ICE in rewrite_use_nonlinear_expr, at tree-ssa-loop-ivopts.c:7086 (PR68021): known, outstanding
 * ICE in maybe_legitimize_operand, at optabs.c:6888 (PR69421): new, outstanding

This was done on x86_64. I'd also like to do an archive build on
arm64.

I'm impressed by the speed with which the GCC developers have been
looking at these bugs!

In terms of build failures, I reported 520 bugs to Debian. Most of
them were new GCC errors or warnings (some packages use -Werror and
many -Werror=format-security).

Here are some of the most frequent errors seen:

 * 77: error: narrowing conversion of '128' from 'int' to 'char' inside { } [-Wnarrowing]
 * 73: error: no match for
       or
       error: no matching function for call to
 * 65: error: cannot convert 'bool' to '...' in return
       error: cannot convert 'std::basic_istream' to 'bool' in initialization
       error: could not convert 'false' from 'bool' to '...'
 * 36: error: statement is indented as if it were guarded by... [-Werror=misleading-indentation]
 * 16: error: 'xxx' defined but not used [-Werror=unused-const-variable]
 * 16: error: reference to '...' is ambiguous
 * 13: test suite failures (segfaults and similar); not clear if the package or if GCC is at fault.
 * 12: error: enumerator value for '...' is not an integer constant expression [-Wpedantic]
 * 12: error: call of overloaded ... is ambiguous
 * 9: 'xxx' needed for in-class initialization of static data member
 * 9: error: '...' was not declared in this scope
 * 8: error: unknown type name
 * 8: error: unable to find string literal operator
 * 8: cmake failure (cmake configure scripts fail)
 * 6: error: macro "..." passed 2 arguments, but takes just 1
 * 3: error: invalid suffix on literal; C++11 requires a space between literal and string macro [-Werror=literal-suffix]
 * 3: error: template argument X is invalid

[1] https://gcc.gnu.org/ml/gcc/2016-01/msg00101.html

-- 
Martin Michlmayr
Linux for HPE Helion, Hewlett Packard Enterprise
Re: Status of GCC 6 on x86_64 (Debian)
On 22.01.2016 06:09, Martin Michlmayr wrote:
> In terms of build failures, I reported 520 bugs to Debian. Most of
> them were new GCC errors or warnings (some packages use -Werror and
> many -Werror=format-security).
>
> Here are some of the most frequent errors seen:
[...]

Martin tagged these issues; https://wiki.debian.org/GCC6 has links with
these bug searches.