Re: [testsuite] What's the expected behaviour of dg-require-effective-target shared?
On Fri, 21 Jun 2019 at 16:28, Iain Sandoe wrote:
>
> Hi Christophe,
>
> we’ve been looking at some cases where Darwin tests fail or pass unexpectedly
> depending on options. It came as a surprise to see it failing a test for
> shared support (since it’s always supported shared libs).
>
> -
>
> It’s a long time ago, but in r216117 you added this to target-supports.
>
> # Return 1 if -shared is supported, as in no warnings or errors
> # emitted, 0 otherwise.
>
> proc check_effective_target_shared { } {
>     # Note that M68K has a multilib that supports -fpic but not
>     # -fPIC, so we need to check both.  We test with a program that
>     # requires GOT references.
>     return [check_no_compiler_messages shared executable {
>         extern int foo (void); extern int bar;
>         int baz (void) { return foo () + bar; }
>     } "-shared -fpic"]
> }
>
> The thing is that this is testing two things:
>
> 1) if the target consumes -shared -fpic without warning
>
> 2) assuming that those cause a shared lib to be made it also tests that
> the target will allow a link of that to complete with undefined symbols.
>
> So Darwin *does* support “-shared -fpic” and is very happy to make
> shared libraries. However, it doesn’t (by default) allow undefined symbols
> in the link.
>
> So my question is really about the intent of the test:
>
> if the intent is to see if the target supports shared libs, then we should
> arrange for Darwin to pass - either by hardwiring it (since all Darwin
> versions do support shared) or by adding suitable options to suppress
> the error.
>
> if the intent is to check that the target supports linking a shared lib with
> undefined external symbols, then perhaps we need a different test for the
> “just supports shared libs”

===

The patch was posted in
https://gcc.gnu.org/ml/gcc-patches/2014-10/msg00596.html
so it's really a matter of testing whether shared libs are supported.

> (note, also the comment doesn’t match what’s actually done, but that’s prob
> a cut & pasto).

Indeed looks like I cut & pasted too much from check_effective_target_fpic

Thanks,

Christophe

> thanks
> Iain
Re: [PATCH] Add .gnu.lto_.meta section.
On Fri, Jun 21, 2019 at 4:01 PM Martin Liška wrote:
>
> On 6/21/19 2:57 PM, Jan Hubicka wrote:
> > This looks like good step (and please stream it in host independent
> > way). I suppose all these issues can be done one-by-one.
>
> So there's a working patch for that. However one will see following errors
> when using an older compiler or older LTO bytecode:
>
> $ gcc main9.o -flto
> lto1: fatal error: bytecode stream in file ‘main9.o’ generated with LTO
> version -25480.4493 instead of the expected 9.0
>
> $ gcc main.o
> lto1: internal compiler error: compressed stream: data error

This is because of your change to bitfields or because with the old
scheme the header with the version is compressed (is it?). I'd simply
avoid any layout changes in the version check range.

> To be honest, I would prefer the new .gnu.lto_.meta section.
> Richi why is that so ugly?

Because it's a change in the wrong direction and doesn't solve the
issue we already have (cannot determine if a section is compressed or
not). ELF section overhead is quite big if you have lots of small
functions.

Richard.

>
> Martin
Re: [PATCH] let hash-based containers work with non-trivial types (PR 90923)
On Fri, Jun 21, 2019 at 7:17 PM Martin Sebor wrote:
>
> On 6/21/19 6:06 AM, Richard Biener wrote:
> > On Wed, Jun 19, 2019 at 5:15 AM Martin Sebor wrote:
> >>
> >> Bug 90923 shows that even though GCC hash-table based containers
> >> like hash_map can be instantiated on types with user-defined ctors
> >> and dtors they invoke the dtors of such types without invoking
> >> the corresponding ctors.
> >>
> >> It was thanks to this bug that I spent a day debugging "interesting"
> >> miscompilations during GCC bootstrap (in fairness, it was that and
> >> bug 90904 about auto_vec copy assignment/construction also being
> >> hosed even for POD types).
> >>
> >> The attached patch corrects the hash_map and hash_set templates
> >> to invoke the ctors of the elements they insert and makes them
> >> (hopefully) safe to use with non-trivial user-defined types.
> >
> > Hum. I am worried about the difference of assignment vs. construction
> > in ::put()
> >
> > +  bool ins = hash_entry::is_empty (*e);
> > +  if (ins)
> > +    {
> > +      e->m_key = k;
> > +      new ((void *) &e->m_value) Value (v);
> > +    }
> > +  else
> > +    e->m_value = v;
> >
> > why not invoke the dtor on the old value and then the ctor again?
>
> It wouldn't work for self-assignment:
>
>    Value &y = m.get_or_insert (key);
>    m.put (key, y);
>
> The usual rule of thumb for writes into containers is to use
> construction when creating a new element and assignment when
> replacing the value of an existing element.
>
> Which reminds me that the hash containers, despite being copy-
> constructible (at least for POD types, they aren't for user-
> defined types), also aren't safe for assignment even for PODs.
> I opened bug 90959 for this.  Until the assignment is fixed
> I made it inaccessible in the patch (I have fixed the copy
> ctor to DTRT for non-PODs).
>
> > How is an empty hash_entry constructed?
>
> In hash_table::find_slot_with_hash simply by finding an empty
> slot and returning a pointer to it.  The memory for the slot
> is marked "empty" by calling the Traits::mark_empty() function.
>
> The full type of hash_map is actually
>
>    hash_map<Key, Value,
>             simple_hashmap_traits<default_hash_traits<Key>, Value>>
>
> and simple_hashmap_traits delegates it to default_hash_traits
> whose mark_empty() just clears the void*, leaving the Value
> part uninitialized.  That makes sense because we don't want
> to call ctors for empty entries.  I think the questions one
> might ask if one were to extend the design are: a) what class
> should invoke the ctor/assignment and b) should it do it
> directly or via the traits?
>
> > ::remove() doesn't
> > seem to invoke the dtor either, instead it relies on the
> > traits::remove function?
>
> Yes.  There is no Traits::construct or assign or copy.  We
> could add them but I'm not sure I see to what end (there could
> be use cases, I just don't know enough about these classes to
> think of any).
>
> Attached is an updated patch with the additional minor fixes
> mentioned above.
>
> Martin
>
> PS I think we would get much better results by adopting
> the properly designed and tested standard library containers
> than by spending time trying to improve the design of these
> legacy classes.  For simple uses that don't need to integrate
> with the GC machinery the standard containers should be fine
> (plus, it'd provide us with greater motivation to improve
> them and the code GCC emits for their uses).  Unfortunately,
> to be able to use the hash-based containers we would need to
> upgrade to C++ 11.  Isn't it time yet?

I don't think so.
I'm also not sure if C++11 on its own is desirable or if it should be C++14 or later at that point. SLES 12 has GCC 4.8 as host compiler (but also GCC 7+ optionally), SLES 15 has GCC 7. SLES 11 already struggles currently (GCC 4.3) but I'd no longer consider that important enough. Note any such change to libstdc++ containers should be complete and come with both code-size and compile-time and memory-usage measurements (both of GCC and other apps of course). Richard.
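The construct-versus-assign rule discussed above can be illustrated with a standalone sketch (not GCC's actual hash_map code; the slot layout and names here are stand-ins for a hash table slot and its Traits::is_empty check):

```cpp
#include <new>

template <typename Value>
struct slot
{
  alignas (Value) unsigned char storage[sizeof (Value)];
  bool occupied = false;

  Value &value () { return *reinterpret_cast<Value *> (storage); }

  void put (const Value &v)
  {
    if (!occupied)
      {
        // Fresh slot: the memory is raw, so run the copy ctor in place.
        new (static_cast<void *> (storage)) Value (v);
        occupied = true;
      }
    else
      // Existing entry: plain assignment.  A dtor-then-ctor sequence
      // here would read a destroyed object when v aliases the slot,
      // as in m.put (key, m.get_or_insert (key)).
      value () = v;
  }
};
```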
Re: [PATCH] Add .gnu.lto_.meta section.
On 6/24/19 2:02 PM, Richard Biener wrote:
> On Fri, Jun 21, 2019 at 4:01 PM Martin Liška wrote:
>>
>> On 6/21/19 2:57 PM, Jan Hubicka wrote:
>>> This looks like good step (and please stream it in host independent
>>> way). I suppose all these issues can be done one-by-one.
>>
>> So there's a working patch for that. However one will see following errors
>> when using an older compiler or older LTO bytecode:
>>
>> $ gcc main9.o -flto
>> lto1: fatal error: bytecode stream in file ‘main9.o’ generated with LTO
>> version -25480.4493 instead of the expected 9.0
>>
>> $ gcc main.o
>> lto1: internal compiler error: compressed stream: data error
>
> This is because of your change to bitfields or because with the old
> scheme the header with the
> version is compressed (is it?).

Because currently also the header is compressed.

> I'd simply avoid any layout changes
> in the version check range.

Well, then we have to find out how to distinguish between compression
algorithms.

>
>> To be honest, I would prefer the new .gnu.lto_.meta section.
>> Richi why is that so ugly?
>
> Because it's a change in the wrong direction and doesn't solve the
> issue we already
> have (cannot determine if a section is compressed or not).

That's not true, the .gnu.lto_.meta section will be always uncompressed
and we can also backport changes to older compiler that can read it and
print a proper error message about LTO bytecode version mismatch.

> ELF section overhead
> is quite big if you have lots of small functions.

My patch is actually shrinking space as I'm suggesting to add _one_
extra ELF section and remove the section header from all other LTO
sections. That will save space for all function sections.

Martin

>
> Richard.
>
>>
>> Martin
Re: [PATCH] Add .gnu.lto_.meta section.
On Mon, Jun 24, 2019 at 2:12 PM Martin Liška wrote:
>
> On 6/24/19 2:02 PM, Richard Biener wrote:
> > On Fri, Jun 21, 2019 at 4:01 PM Martin Liška wrote:
> >>
> >> On 6/21/19 2:57 PM, Jan Hubicka wrote:
> >>> This looks like good step (and please stream it in host independent
> >>> way). I suppose all these issues can be done one-by-one.
> >>
> >> So there's a working patch for that. However one will see following errors
> >> when using an older compiler or older LTO bytecode:
> >>
> >> $ gcc main9.o -flto
> >> lto1: fatal error: bytecode stream in file ‘main9.o’ generated with LTO
> >> version -25480.4493 instead of the expected 9.0
> >>
> >> $ gcc main.o
> >> lto1: internal compiler error: compressed stream: data error
> >
> > This is because of your change to bitfields or because with the old
> > scheme the header with the
> > version is compressed (is it?).
>
> Because currently also the header is compressed.

That was it, yeah :/ Stupid decisions in the past.

I guess we have to bite the bullet and do this kind of incompatible
change, accepting the odd error message above.

> > I'd simply avoid any layout changes
> > in the version check range.
>
> Well, then we have to find out how to distinguish between compression
> algorithms.
>
> >
> >> To be honest, I would prefer the new .gnu.lto_.meta section.
> >> Richi why is that so ugly?
> >
> > Because it's a change in the wrong direction and doesn't solve the
> > issue we already
> > have (cannot determine if a section is compressed or not).
>
> That's not true, the .gnu.lto_.meta section will be always uncompressed and
> we can
> also backport changes to older compiler that can read it and print a proper
> error
> message about LTO bytecode version mismatch.

We can always backport changes, yes, but I don't see why we have to.

> > ELF section overhead
> > is quite big if you have lots of small functions.
>
> My patch is actually shrinking space as I'm suggesting to add _one_ extra ELF
> section
> and remove the section header from all other LTO sections. That will save
> space
> for all function sections.

But we want the header there to at least say if the section is
compressed or not. The fact that we have so many ELF sections means we
have the redundant version info everywhere.

We should have a single .gnu.lto_ section (and also get rid of those
__gnu_lto_v1 and __gnu_lto_slim COMMON symbols - checking for
existence of a symbol is more expensive compared to existence
of a section).

Richard.

> Martin
>
> >
> > Richard.
> >
> >>
> >> Martin
>
Parallelize GCC with Threads -- First Evaluation
Hi,

Parallelize GCC with Threads -- First Evaluation

Hi everyone,

I am attaching the first evaluation report here publicly for gathering
feedback. The file is in markdown format and it can easily be converted
to PDF for better visualization.

I am also open to suggestions and ideas in order to improve the current
project :-)

My branch can be seen here:
https://gitlab.com/flusp/gcc/tree/giulianob_parallel

Giuliano

# Parallelize GCC with Threads: First Evaluation Report

## Tasks accomplished until the First Evaluation:

These are the tasks which I accomplished until now:

- Refactor `cgraph_node::expand()` and `expand_all_functions`, splitting IPA, GIMPLE and RTL passes
- Some documentation about the global states of the GIMPLE passes.

This accomplishes all items planned for the first evaluation. Here I explain some technical challenges I was put through and architectural decisions that I've made.

## Chapter 1 -- Splitting `node::expand`

This chapter discusses the technical issues found when splitting `node::expand` into `node::expand_ipa`, `node::expand`, and `node::expand_rtl`.

The first step was to split `all_passes` into `all_passes` and `all_rtl_passes`. This was straightforward: I had to modify the `passes.def` file, add a new pass list there, and call the constructor in `passes.c`. Unfortunately I could not reuse `pass_rest_of_compilation`, as suggested by Richi, therefore I've created a new pass.

The next step was to split `node::expand` into `node::expand_ipa`, `node::expand` and `node::expand_rtl`. These names follow this distinction:

- `expand_ipa`: passes related to interprocedural analysis (IPA).
- `expand`: intraprocedural GIMPLE passes (`all_passes`).
- `expand_rtl`: intraprocedural RTL passes (`all_rtl_passes`).

This was somewhat straightforward, since I just had to store the `location_t input_location` in the `struct function`. Unfortunately, executing `node::expand_ipa` on every function before `node::expand`, and `node::expand_rtl` on every function after `node::expand`, was way harder, since I had to modify lots of things, such as:

1. The `default_obstack`: There is a pass called `pass_fixup_cfg` which is called in both `pass_local_optimization_passes` and `all_passes`. This pass uses a global shared object called `default_obstack`, which is modified every time it runs into a new function. The solution was to create a local obstack for each function, storing it in the `struct function`. Unfortunately, this broke several tests related to IPA, which made me define a new compiler state called `EXPAND_IPA`, then select the default obstack when the compiler is in this state, or the local obstack when it is in `EXPAND` state, fixing the issue.

2. Adjust the cfun stack to push the cfun before executing the passes, and pop it when it finishes. I've done this for `node::expand_ipa`, `node::expand`, and `node::expand_rtl`.

3. Set the `gimple_register_cfg_hooks` when `node::expand` is called.

4. Store the `input_location` in the `struct function`.

This allowed me to refactor the loop in `expand_all_functions`:

```
FOR_EACH_CGRAPH_NODE(node)
  {
    node->expand_ipa();
    node->expand();
    node->expand_rtl();
  }
```

into:

```
FOR_EACH_CGRAPH_NODE(node)
  node->expand_ipa();
FOR_EACH_CGRAPH_NODE(node)
  node->expand();
FOR_EACH_CGRAPH_NODE(node)
  node->expand_rtl();
```

enabling me to proceed with the parallelization.

## Chapter 2 -- Parallelization Decisions

After the work done in Chapter 1, I started working on how I will parallelize the compiler.
Richi suggested a pipelined architecture: split the passes into threads and pass the cfun across them. When one thread finishes applying its pass to one cfun, it sends the cfun to the next thread, which holds the next pass, and so on until all passes are done. This has the advantage of not requiring a per-pass analysis of global states, but it has the disadvantage of being slower (just imagine some passes being slower than all the others).

What I propose is to compile functions in parallel. Since at this point we are executing intraprocedural optimizations, this is theoretically possible. The issue is that this is way harder to implement, since it requires looking into the global states of every local pass.

With this in mind, I created a `worker_set` structure that implements a producer-consumer threadsafe queue: one thread (the master) inserts all cfuns into a queue, and the other threads (the workers) are responsible for removing the cfuns from the queue and processing them. When a thread finishes its work, it will automatically remove the next function from the queue until it finds a NULL function, which indicates that there is no more work in the queue. Regarding implementation details, I am using pthreads and unix semaphores.

After this structure was finished, I m
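The producer-consumer queue described in the report could look roughly like this, using pthreads and POSIX semaphores as stated there (a sketch only; the field and function names are assumptions, not the actual worker_set code in the branch):

```cpp
#include <pthread.h>
#include <semaphore.h>
#include <queue>

struct function;  // stand-in forward declaration for GCC's struct function

struct worker_set
{
  pthread_mutex_t lock;
  sem_t avail;                     // counts queued cfuns and sentinels
  std::queue<function *> q;

  worker_set ()
  {
    pthread_mutex_init (&lock, NULL);
    sem_init (&avail, 0, 0);
  }

  // Master side: enqueue one cfun; a NULL entry tells one worker to stop.
  void push (function *fn)
  {
    pthread_mutex_lock (&lock);
    q.push (fn);
    pthread_mutex_unlock (&lock);
    sem_post (&avail);
  }

  // Worker side: block until something is queued, then take it.
  function *pop ()
  {
    sem_wait (&avail);
    pthread_mutex_lock (&lock);
    function *fn = q.front ();
    q.pop ();
    pthread_mutex_unlock (&lock);
    return fn;                     // NULL means no more work
  }
};
```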
Re: [PATCH] Add .gnu.lto_.meta section.
On 6/24/19 2:44 PM, Richard Biener wrote: > On Mon, Jun 24, 2019 at 2:12 PM Martin Liška wrote: >> >> On 6/24/19 2:02 PM, Richard Biener wrote: >>> On Fri, Jun 21, 2019 at 4:01 PM Martin Liška wrote: On 6/21/19 2:57 PM, Jan Hubicka wrote: > This looks like good step (and please stream it in host independent > way). I suppose all these issues can be done one-by-one. So there's a working patch for that. However one will see following errors when using an older compiler or older LTO bytecode: $ gcc main9.o -flto lto1: fatal error: bytecode stream in file ‘main9.o’ generated with LTO version -25480.4493 instead of the expected 9.0 $ gcc main.o lto1: internal compiler error: compressed stream: data error >>> >>> This is because of your change to bitfields or because with the old >>> scheme the header with the >>> version is compressed (is it?). >> >> Because currently also the header is compressed. > > That was it, yeah :/ Stupid decisions in the past. > > I guess we have to bite the bullet and do this kind of incompatible > change, accepting > the odd error message above. > >>> I'd simply avoid any layout changes >>> in the version check range. >> >> Well, then we have to find out how to distinguish between compression >> algorithms. >> >>> To be honest, I would prefer the new .gnu.lto_.meta section. Richi why is that so ugly? >>> >>> Because it's a change in the wrong direction and doesn't solve the >>> issue we already >>> have (cannot determine if a section is compressed or not). >> >> That's not true, the .gnu.lto_.meta section will be always uncompressed and >> we can >> also backport changes to older compiler that can read it and print a proper >> error >> message about LTO bytecode version mismatch. > > We can always backport changes, yes, but I don't see why we have to. I'm fine with the backward compatibility break. But we should also consider lto-plugin.c that is parsing following 2 sections: 91 #define LTO_SECTION_PREFIX ".gnu.lto_.symtab" 92 #define LTO_SECTION_PREFIX_LEN (sizeof (LTO_SECTION_PREFIX) - 1) 93 #define OFFLOAD_SECTION ".gnu.offload_lto_.opts" 94 #define OFFLOAD_SECTION_LEN (sizeof (OFFLOAD_SECTION) - 1) > >>> ELF section overhead >>> is quite big if you have lots of small functions. >> >> My patch is actually shrinking space as I'm suggesting to add _one_ extra >> ELF section >> and remove the section header from all other LTO sections. That will save >> space >> for all function sections. > > But we want the header there to at least say if the section is > compressed or not. > The fact that we have so many ELF section means we have the redundant version > info everywhere. > > We should have a single .gnu.lto_ section (and also get rid of those > __gnu_lto_v1 and __gnu_lto_slim COMMON symbols - checking for > existence of a symbol is more expensive compared to existence > of a section). I like removal of the 2 aforementioned sections. To be honest I would recommend to add a new .gnu.lto_.meta section. We can use it instead of __gnu_lto_v1 and we can have a flag there instead of __gnu_lto_slim. As a second step, I'm willing to concatenate all LTO_section_function_body, LTO_section_static_initializer sections into a single one. That will require an index that will have to be created. I can discuss that with Honza as he suggested using something smarter than function names. Thoughts? Martin > > Richard. > >> Martin >> >>> >>> Richard. >>> Martin >>
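For concreteness, the always-uncompressed metadata section being discussed could carry something like the following (a hypothetical sketch of the idea only, not a committed format; all field names and widths here are assumptions):

```c
#include <stdint.h>

/* Hypothetical layout for an uncompressed .gnu.lto_.meta section.
   The version fields can be checked before anything is decompressed,
   the "slim" flag would replace the __gnu_lto_slim symbol, and the
   compression field records which algorithm the payload uses.  */
struct lto_meta_section
{
  uint32_t magic;          /* identifies GCC LTO bytecode */
  uint16_t major_version;  /* readable by old and new compilers alike */
  uint16_t minor_version;
  uint8_t slim;            /* 1 for objects without fat native code */
  uint8_t compression;     /* 0 = none, 1 = zlib, ... */
};
```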
Re: [PATCH] Add .gnu.lto_.meta section.
> On 24 Jun 2019, at 14:31, Martin Liška wrote: > > On 6/24/19 2:44 PM, Richard Biener wrote: >> On Mon, Jun 24, 2019 at 2:12 PM Martin Liška wrote: >>> >>> On 6/24/19 2:02 PM, Richard Biener wrote: On Fri, Jun 21, 2019 at 4:01 PM Martin Liška wrote: > > On 6/21/19 2:57 PM, Jan Hubicka wrote: >> This looks like good step (and please stream it in host independent >> way). I suppose all these issues can be done one-by-one. > > So there's a working patch for that. However one will see following errors > when using an older compiler or older LTO bytecode: > > $ gcc main9.o -flto > lto1: fatal error: bytecode stream in file ‘main9.o’ generated with LTO > version -25480.4493 instead of the expected 9.0 > > $ gcc main.o > lto1: internal compiler error: compressed stream: data error This is because of your change to bitfields or because with the old scheme the header with the version is compressed (is it?). >>> >>> Because currently also the header is compressed. >> >> That was it, yeah :/ Stupid decisions in the past. >> >> I guess we have to bite the bullet and do this kind of incompatible >> change, accepting >> the odd error message above. >> I'd simply avoid any layout changes in the version check range. >>> >>> Well, then we have to find out how to distinguish between compression >>> algorithms. >>> > To be honest, I would prefer the new .gnu.lto_.meta section. > Richi why is that so ugly? Because it's a change in the wrong direction and doesn't solve the issue we already have (cannot determine if a section is compressed or not). >>> >>> That's not true, the .gnu.lto_.meta section will be always uncompressed and >>> we can >>> also backport changes to older compiler that can read it and print a proper >>> error >>> message about LTO bytecode version mismatch. >> >> We can always backport changes, yes, but I don't see why we have to. > > I'm fine with the backward compatibility break. But we should also consider > lto-plugin.c > that is parsing following 2 sections: > >91 #define LTO_SECTION_PREFIX ".gnu.lto_.symtab" >92 #define LTO_SECTION_PREFIX_LEN (sizeof (LTO_SECTION_PREFIX) - 1) >93 #define OFFLOAD_SECTION ".gnu.offload_lto_.opts" >94 #define OFFLOAD_SECTION_LEN (sizeof (OFFLOAD_SECTION) - 1) > >> ELF section overhead is quite big if you have lots of small functions. >>> >>> My patch is actually shrinking space as I'm suggesting to add _one_ extra >>> ELF section >>> and remove the section header from all other LTO sections. That will save >>> space >>> for all function sections. >> >> But we want the header there to at least say if the section is >> compressed or not. >> The fact that we have so many ELF section means we have the redundant version >> info everywhere. >> >> We should have a single .gnu.lto_ section (and also get rid of those >> __gnu_lto_v1 and __gnu_lto_slim COMMON symbols - checking for >> existence of a symbol is more expensive compared to existence >> of a section). > > I like removal of the 2 aforementioned sections. To be honest I would > recommend to > add a new .gnu.lto_.meta section. We can use it instead of __gnu_lto_v1 and > we can > have a flag there instead of __gnu_lto_slim. As a second step, I'm willing to > concatenate all > > LTO_section_function_body, > LTO_section_static_initializer > > sections into a single one. That will require an index that will have to be > created. I can discuss > that with Honza as he suggested using something smarter than function names. 
I already implemented a scheme (using three sections: INDEX, NAMES, PAYLOAD) for Mach-O - since it doesn’t have unlimited section count - it works - and hardly rocket science ;) - if one were to import the tabular portion of that at the start of a section and then the variable portion as a trailer … it could all be a single section. iain > Martin > >> >> Richard. >> >>> Martin >>> Richard. > > Martin
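A per-function index of the kind described (INDEX entries pointing into a NAMES blob and a single PAYLOAD) could be as small as one record per function; the sketch below is an illustration with invented field names, not the actual Mach-O scheme:

```c
#include <stdint.h>

/* One entry of a hypothetical function index for a concatenated LTO
   payload section.  */
struct lto_index_entry
{
  uint32_t name_offset;   /* offset of the symbol name in the NAMES blob */
  uint32_t body_offset;   /* offset of the function body in the PAYLOAD */
  uint32_t body_size;     /* bytes of (possibly compressed) bytecode */
};
```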
Re: [PATCH] let hash-based containers work with non-trivial types (PR 90923)
On 6/24/19 6:11 AM, Richard Biener wrote: On Fri, Jun 21, 2019 at 7:17 PM Martin Sebor wrote: On 6/21/19 6:06 AM, Richard Biener wrote: On Wed, Jun 19, 2019 at 5:15 AM Martin Sebor wrote: Bug 90923 shows that even though GCC hash-table based containers like hash_map can be instantiated on types with user-defined ctors and dtors they invoke the dtors of such types without invoking the corresponding ctors. It was thanks to this bug that I spent a day debugging "interesting" miscompilations during GCC bootstrap (in fairness, it was that and bug 90904 about auto_vec copy assignment/construction also being hosed even for POD types). The attached patch corrects the hash_map and hash_set templates to invoke the ctors of the elements they insert and makes them (hopefully) safe to use with non-trivial user-defined types. Hum. I am worried about the difference of assignment vs. construction in ::put() + bool ins = hash_entry::is_empty (*e); + if (ins) + { + e->m_key = k; + new ((void *) &e->m_value) Value (v); + } + else + e->m_value = v; why not invoke the dtor on the old value and then the ctor again? It wouldn't work for self-assignment: Value &y = m.get_or_insert (key); m.put (key, y); The usual rule of thumb for writes into containers is to use construction when creating a new element and assignment when replacing the value of an existing element. Which reminds me that the hash containers, despite being copy- constructible (at least for POD types, they aren't for user- defined types), also aren't safe for assignment even for PODs. I opened bug 90959 for this. Until the assignment is fixed I made it inaccessibe in the patch (I have fixed the copy ctor to DTRT for non-PODs). How is an empty hash_entry constructed? In hash_table::find_slot_with_hash simply by finding an empty slot and returning a pointer to it. The memory for the slot is marked "empty" by calling the Traits::mark_empty() function. The full type of hash_map is actually hash_map, Value> and simple_hashmap_traits delegates it to default_hash_traits whose mark_empty() just clears the void*, leaving the Value part uninitialized. That makes sense because we don't want to call ctors for empty entries. I think the questions one might ask if one were to extend the design are: a) what class should invoke the ctor/assignment and b) should it do it directly or via the traits? ::remove() doesn't seem to invoke the dtor either, instead it relies on the traits::remove function? Yes. There is no Traits::construct or assign or copy. We could add them but I'm not sure I see to what end (there could be use cases, I just don't know enough about these classes to think of any). Attached is an updated patch with the additional minor fixes mentioned above. Martin PS I think we would get much better results by adopting the properly designed and tested standard library containers than by spending time trying to improve the design of these legacy classes. For simple uses that don't need to integrate with the GC machinery the standard containers should be fine (plus, it'd provide us with greater motivation to improve them and the code GCC emits for their uses). Unfortunately, to be able to use the hash-based containers we would need to upgrade to C++ 11. Isn't it time yet? I don't think so. I'm also not sure if C++11 on its own is desirable or if it should be C++14 or later at that point. SLES 12 has GCC 4.8 as host compiler (but also GCC 7+ optionally), SLES 15 has GCC 7. 
SLES 11 already struggles currently (GCC 4.3) but I'd no longer consider that important enough. Note any such change to libstdc++ containers should be complete and come with both code-size and compile-time and memory-usage measurements (both of GCC and other apps of course). Can I go ahead and commit the patch? Martin
RE: gcc: -ftest-coverage and -auxbase
> From: Martin Liška
> Sent: Thursday, June 20, 2019 9:49 AM
> I see. What about the following patch:
>
> $ ./gcc/xgcc -Bgcc /tmp/main.c --coverage -fprofile-note=/tmp/main.gcno
> $ ls -l /tmp/main.gcno
> -rw-r--r-- 1 marxin users 428 Jun 20 15:48 /tmp/main.gcno

Thanks. That did the trick.

Next, I will be looking at modifying GCC to do one or both of two things --

. put the coverage variables into a section named on the command line
  instead of .data. The .data section would remain the default.

. when requested on the command line, initialize the instrumentation
  variables at run time instead of at compile time. Compile time would
  remain the default.

I haven't looked into the implementation of either, but the former feels
easy. The latter is much more useful for us, but I expect it is also
significantly harder. My thinking is that I will implement the former and
will look into how hard the latter is. Depending on the outcome / other
demands on my time, I might implement it as well.

Background (or why do I want this):

Our use is embedded. We disallow .data. Specialized data, for example,
read-only data or data that goes into a different named section, is
allowed. (Part of why we disallow .data is that we support warm boot, not
just cold boot.)

If neither was implemented, then for files currently without .data, we
could alter the linker script. This would mean that files with .data
would be uninstrumentable. Not ideal.

If the former was implemented, then any file which is not time critical
nor containing code called by the instrumentation code would be
instrumentable.

If the latter was implemented, then that would facilitate clearing the
data on demand. Since one of the goals is not just to collect coverage
data but to also learn what coverage is provided by each test or
collection of tests, that is greatly facilitated by being able to clear
the accumulated results on demand. A cold boot would take 20-30 minutes
and you might need to 'set up' the tests...
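To make the two proposals concrete, this is roughly their effect written by hand at the C level (an illustration only; the compiler would emit this itself, and the section name ".gcov_ctrs" is invented for the example):

```c
#include <string.h>

/* First proposal: coverage counters placed in a named section
   instead of .data.  */
static unsigned long long counters[8]
  __attribute__ ((section (".gcov_ctrs")));

/* Second proposal: (re)initialize the instrumentation data at run
   time, e.g. to clear accumulated results between test runs without
   a 20-30 minute cold boot.  */
void
coverage_reset (void)
{
  memset (counters, 0, sizeof counters);
}
```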
Re: [PATCH] let hash-based containers work with non-trivial types (PR 90923)
On Mon, Jun 24, 2019 at 4:35 PM Martin Sebor wrote: > > On 6/24/19 6:11 AM, Richard Biener wrote: > > On Fri, Jun 21, 2019 at 7:17 PM Martin Sebor wrote: > >> > >> On 6/21/19 6:06 AM, Richard Biener wrote: > >>> On Wed, Jun 19, 2019 at 5:15 AM Martin Sebor wrote: > > Bug 90923 shows that even though GCC hash-table based containers > like hash_map can be instantiated on types with user-defined ctors > and dtors they invoke the dtors of such types without invoking > the corresponding ctors. > > It was thanks to this bug that I spent a day debugging "interesting" > miscompilations during GCC bootstrap (in fairness, it was that and > bug 90904 about auto_vec copy assignment/construction also being > hosed even for POD types). > > The attached patch corrects the hash_map and hash_set templates > to invoke the ctors of the elements they insert and makes them > (hopefully) safe to use with non-trivial user-defined types. > >>> > >>> Hum. I am worried about the difference of assignment vs. construction > >>> in ::put() > >>> > >>> + bool ins = hash_entry::is_empty (*e); > >>> + if (ins) > >>> + { > >>> + e->m_key = k; > >>> + new ((void *) &e->m_value) Value (v); > >>> + } > >>> + else > >>> + e->m_value = v; > >>> > >>> why not invoke the dtor on the old value and then the ctor again? > >> > >> It wouldn't work for self-assignment: > >> > >> Value &y = m.get_or_insert (key); > >> m.put (key, y); > >> > >> The usual rule of thumb for writes into containers is to use > >> construction when creating a new element and assignment when > >> replacing the value of an existing element. > >> > >> Which reminds me that the hash containers, despite being copy- > >> constructible (at least for POD types, they aren't for user- > >> defined types), also aren't safe for assignment even for PODs. > >> I opened bug 90959 for this. Until the assignment is fixed > >> I made it inaccessibe in the patch (I have fixed the copy > >> ctor to DTRT for non-PODs). > >> > >>> How is an empty hash_entry constructed? > >> > >> In hash_table::find_slot_with_hash simply by finding an empty > >> slot and returning a pointer to it. The memory for the slot > >> is marked "empty" by calling the Traits::mark_empty() function. > >> > >> The full type of hash_map is actually > >> > >> hash_map >> Value, > >> simple_hashmap_traits, > >>Value> > >> > >> and simple_hashmap_traits delegates it to default_hash_traits > >> whose mark_empty() just clears the void*, leaving the Value > >> part uninitialized. That makes sense because we don't want > >> to call ctors for empty entries. I think the questions one > >> might ask if one were to extend the design are: a) what class > >> should invoke the ctor/assignment and b) should it do it > >> directly or via the traits? > >> > >>> ::remove() doesn't > >>> seem to invoke the dtor either, instead it relies on the > >>> traits::remove function? > >> > >> Yes. There is no Traits::construct or assign or copy. We > >> could add them but I'm not sure I see to what end (there could > >> be use cases, I just don't know enough about these classes to > >> think of any). > >> > >> Attached is an updated patch with the additional minor fixes > >> mentioned above. > >> > >> Martin > >> > >> PS I think we would get much better results by adopting > >> the properly designed and tested standard library containers > >> than by spending time trying to improve the design of these > >> legacy classes. 
For simple uses that don't need to integrate > >> with the GC machinery the standard containers should be fine > >> (plus, it'd provide us with greater motivation to improve > >> them and the code GCC emits for their uses). Unfortunately, > >> to be able to use the hash-based containers we would need to > >> upgrade to C++ 11. Isn't it time yet? > > > > I don't think so. I'm also not sure if C++11 on its own is desirable > > or if it should be C++14 or later at that point. SLES 12 has GCC 4.8 > > as host compiler (but also GCC 7+ optionally), SLES 15 has GCC 7. > > SLES 11 already struggles currently (GCC 4.3) but I'd no longer > > consider that important enough. > > > > Note any such change to libstdc++ containers should be complete > > and come with both code-size and compile-time and memory-usage > > measurements (both of GCC and other apps of course). > > Can I go ahead and commit the patch? I think we need to document the requirements on Value classes better. @@ -177,7 +185,10 @@ public: INSERT); bool ins = Traits::is_empty (*e); if (ins) - e->m_key = k; + { + e->m_key = k; + new ((void *)&e->m_value) Value (); + } this now requires a default constructor and I always forget about differences between the different form of init
GSoC Project: Make C/C++ not automatically promote memory_order_consume to memory_order_acquire
Hello everyone,

We are working on the project to stop memory_order_consume from being
automatically promoted to memory_order_acquire. In implementing
memory_order_consume, the problem comes when tracing dependencies at the
C/C++ source level. We are first targeting tracing the dependencies only
for pointer types at the C source level. To mark the pointers at the C
source level, we thought of introducing a new qualifier, _Dependent_ptr,
for pointer type variables only. We believe adding this qualifier will
allow us to trace dependencies and not break any of the pointer
optimizations.

For a start, we have implemented the _Dependent_ptr qualifier in the GCC
C front-end. The patch is given in this link:
https://github.com/AKG001/gcc/commit/fb4187bc3872a50880159232cf336f0a03505fa8
and given below also.

We have also prepared some testcases for checking whether _Dependent_ptr
parses correctly or not. A simple test case is present on this
https://github.com/AKG001/gcc/commit/193a040df649d773fab5cb5e2698e9729dfd2b97
Some litmus test cases from the document
(http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2017/p0190r4.pdf)
are present in this commit
https://github.com/AKG001/gcc/commit/b4c3c55b11597bde8ea5b297343ab349caedb6a4
For litmus test cases, the file name convention is: _.c
The full set of testcases is present in this repository
https://github.com/AKG001/test/. The .original files are also present
for each test case. Command used to generate the .original file:
./cc1 -fdump-tree-original .c

We request the community to give us their valuable feedback. Also, let
us know if something is wrong or can be done in a better way.

Thanks,
Akshat

---

commit fb4187bc3872a50880159232cf336f0a03505fa8
Author: Akshat Garg
Date:   Sat Jun 8 15:06:23 2019 +0530

    Add _Dependent_ptr qualifier

diff --git a/gcc/c-family/c-common.c b/gcc/c-family/c-common.c
index 4057be3..6d37851 100644
--- a/gcc/c-family/c-common.c
+++ b/gcc/c-family/c-common.c
@@ -345,6 +345,7 @@ const struct c_common_resword c_common_reswords[] =
   { "_Alignas",         RID_ALIGNAS,   D_CONLY },
   { "_Alignof",         RID_ALIGNOF,   D_CONLY },
   { "_Atomic",          RID_ATOMIC,    D_CONLY },
+  { "_Dependent_ptr",   RID_DEPENDENT_PTR, 0 },
   { "_Bool",            RID_BOOL,      D_CONLY },
   { "_Complex",         RID_COMPLEX,   0 },
   { "_Imaginary",       RID_IMAGINARY, D_CONLY },
@@ -7845,6 +7846,7 @@ keyword_is_type_qualifier (enum rid keyword)
     case RID_VOLATILE:
     case RID_RESTRICT:
     case RID_ATOMIC:
+    case RID_DEPENDENT_PTR:
       return true;
     default:
       return false;
diff --git a/gcc/c-family/c-common.h b/gcc/c-family/c-common.h
index 1cf2cae..a4c7e03 100644
--- a/gcc/c-family/c-common.h
+++ b/gcc/c-family/c-common.h
@@ -68,7 +68,7 @@ enum rid
   RID_UNSIGNED, RID_LONG,    RID_CONST, RID_EXTERN,
   RID_REGISTER, RID_TYPEDEF, RID_SHORT, RID_INLINE,
   RID_VOLATILE, RID_SIGNED,  RID_AUTO,  RID_RESTRICT,
-  RID_NORETURN, RID_ATOMIC,
+  RID_NORETURN, RID_ATOMIC, RID_DEPENDENT_PTR,

   /* C extensions */
   RID_COMPLEX, RID_THREAD, RID_SAT,
diff --git a/gcc/c-family/c-format.c b/gcc/c-family/c-format.c
index b10599f..3ac3a21 100644
--- a/gcc/c-family/c-format.c
+++ b/gcc/c-family/c-format.c
@@ -3189,6 +3189,7 @@ check_format_types (const substring_loc &fmt_loc,
           && (TYPE_READONLY (cur_type)
               || TYPE_VOLATILE (cur_type)
               || TYPE_ATOMIC (cur_type)
+              || TYPE_DEPENDENT_PTR (cur_type)
               || TYPE_RESTRICT (cur_type)))
         warning (OPT_Wformat_, "extra type qualifiers in format "
                  "argument (argument %d)",
diff --git a/gcc/c-family/c-pretty-print.c b/gcc/c-family/c-pretty-print.c
index 3e25624..e034a8f 100644
--- a/gcc/c-family/c-pretty-print.c
+++ b/gcc/c-family/c-pretty-print.c
@@ -181,6 +181,8 @@ pp_c_cv_qualifiers (c_pretty_printer *pp, int qualifiers, bool func_type)

   if (qualifiers & TYPE_QUAL_ATOMIC)
     pp_c_ws_string (pp, "_Atomic");
+  if (qualifiers & TYPE_QUAL_DEPENDENT_PTR)
+    pp_c_ws_string (pp, "_Dependent_ptr");
   if (qualifiers & TYPE_QUAL_CONST)
     pp_c_ws_string (pp, func_type ? "__attribute__((const))" : "const");
   if (qualifiers & TYPE_QUAL_VOLATILE)
diff --git a/gcc/c/c-aux-info.c b/gcc/c/c-aux-info.c
index 96bb2e2..6632447 100644
--- a/gcc/c/c-aux-info.c
+++ b/gcc/c/c-aux-info.c
@@ -284,6 +284,8 @@ gen_type (const char *ret_val, tree t, formals_style style)
     case POINTER_TYPE:
       if (TYPE_ATOMIC (t))
         ret_val = concat ("_Atomic ", ret_val, NULL);
+      if (TYPE_DEPENDENT_PTR (t))
+        ret_val = concat ("_Dependent_ptr ", ret_val, NULL);
       if (TYPE_READONLY (t))
         ret_val = concat ("const ", ret_val, NULL);
       if (TYPE_VOLATILE (t))
@@ -427,6 +429,8 @@ gen_type (const char *ret_val, tree t, formals_style style)
     }
   if (TYPE_ATOMIC (t))
     ret_val = concat ("_Atomic ", ret_val, NULL);
+  if (TYPE_DEPENDENT_PTR (t))
+    ret_val = concat ("_Dependent_ptr ", ret_
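For context, the intended usage of the proposed qualifier, modeled on the message-passing litmus tests in P0190R4, looks roughly like the following (a sketch; the exact binding and placement rules for the qualifier are part of what the patch under review defines, and the code only compiles with the patched front-end):

```c
#include <stdatomic.h>

struct msg { int data; };
static _Atomic (struct msg *) head;

extern void consume_data (int);

void
reader (void)
{
  /* The qualifier marks p as carrying a dependency, so the compiler
     must not break the data dependency from the consume load to the
     dereference; on ARM/PowerPC no acquire barrier is then needed.  */
  struct msg *_Dependent_ptr p
    = atomic_load_explicit (&head, memory_order_consume);
  if (p)
    consume_data (p->data);
}
```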
Re: [PATCH] Add .gnu.lto_.meta section.
On Mon, Jun 24, 2019 at 3:31 PM Martin Liška wrote: > > On 6/24/19 2:44 PM, Richard Biener wrote: > > On Mon, Jun 24, 2019 at 2:12 PM Martin Liška wrote: > >> > >> On 6/24/19 2:02 PM, Richard Biener wrote: > >>> On Fri, Jun 21, 2019 at 4:01 PM Martin Liška wrote: > > On 6/21/19 2:57 PM, Jan Hubicka wrote: > > This looks like good step (and please stream it in host independent > > way). I suppose all these issues can be done one-by-one. > > So there's a working patch for that. However one will see following > errors > when using an older compiler or older LTO bytecode: > > $ gcc main9.o -flto > lto1: fatal error: bytecode stream in file ‘main9.o’ generated with LTO > version -25480.4493 instead of the expected 9.0 > > $ gcc main.o > lto1: internal compiler error: compressed stream: data error > >>> > >>> This is because of your change to bitfields or because with the old > >>> scheme the header with the > >>> version is compressed (is it?). > >> > >> Because currently also the header is compressed. > > > > That was it, yeah :/ Stupid decisions in the past. > > > > I guess we have to bite the bullet and do this kind of incompatible > > change, accepting > > the odd error message above. > > > >>> I'd simply avoid any layout changes > >>> in the version check range. > >> > >> Well, then we have to find out how to distinguish between compression > >> algorithms. > >> > >>> > To be honest, I would prefer the new .gnu.lto_.meta section. > Richi why is that so ugly? > >>> > >>> Because it's a change in the wrong direction and doesn't solve the > >>> issue we already > >>> have (cannot determine if a section is compressed or not). > >> > >> That's not true, the .gnu.lto_.meta section will be always uncompressed > >> and we can > >> also backport changes to older compiler that can read it and print a > >> proper error > >> message about LTO bytecode version mismatch. > > > > We can always backport changes, yes, but I don't see why we have to. > > I'm fine with the backward compatibility break. But we should also consider > lto-plugin.c > that is parsing following 2 sections: > > 91 #define LTO_SECTION_PREFIX ".gnu.lto_.symtab" > 92 #define LTO_SECTION_PREFIX_LEN (sizeof (LTO_SECTION_PREFIX) - 1) > 93 #define OFFLOAD_SECTION ".gnu.offload_lto_.opts" > 94 #define OFFLOAD_SECTION_LEN (sizeof (OFFLOAD_SECTION) - 1) Yeah, I know. And BFD and gold hard-coded those __gnu_lto_{v1,slim} symbols... > > > >>> ELF section overhead > >>> is quite big if you have lots of small functions. > >> > >> My patch is actually shrinking space as I'm suggesting to add _one_ extra > >> ELF section > >> and remove the section header from all other LTO sections. That will save > >> space > >> for all function sections. > > > > But we want the header there to at least say if the section is > > compressed or not. > > The fact that we have so many ELF section means we have the redundant > > version > > info everywhere. > > > > We should have a single .gnu.lto_ section (and also get rid of those > > __gnu_lto_v1 and __gnu_lto_slim COMMON symbols - checking for > > existence of a symbol is more expensive compared to existence > > of a section). > > I like removal of the 2 aforementioned sections. To be honest I would > recommend to > add a new .gnu.lto_.meta section. Why .meta? Why not just .gnu.lto_? > We can use it instead of __gnu_lto_v1 and we can > have a flag there instead of __gnu_lto_slim. 
As a second step, I'm willing to > concatenate all > > LTO_section_function_body, > LTO_section_static_initializer > > sections into a single one. That will require an index that will have to be > created. I can discuss > that with Honza as he suggested using something smarter than function names. I think the index belongs to symtab? Let's properly do it if we want to change it. Removing of __gnu_lto_v1/slim is going to be the most intrusive change btw. and orthogonal to the section changes. Richard. > > Thoughts? > Martin > > > > > Richard. > > > >> Martin > >> > >>> > >>> Richard. > >>> > > Martin > >> >
Re: Parallelize GCC with Threads -- First Evaluation
On 2019-06-24 8:59 a.m., Giuliano Belinassi wrote:
> Hi,
>
> Parallelize GCC with Threads -- First Evaluation
>
> Hi everyone,
>
> I am attaching the first evaluation report here publicly for gathering
> feedback. The file is in markdown format and it can easily be converted
> to PDF for better visualization.
>
> I am also open to suggestions and ideas in order to improve the current
> project :-)
>
> My branch can be seen here:
> https://gitlab.com/flusp/gcc/tree/giulianob_parallel
>
> Giuliano

Giuliano,

Three things. First, your original proposal was just for
expand_all_functions, so I don't know if it's extended out now, but there
are other parts in my research, so the title was a little confusing.

I'm assuming this is outside the scope of the current project, but does
your all_rtl_passes function help out with architecture-specific tuning
flags? My research indicates that's one area of shared state.

In addition, for memory (this may be really hard to do), can you have a
signaler that tells each phase what data to pass on, thereby notifying
what needs to be passed on to the next pass? So if expand_all_functions
needs to pass x, the signaler will notify the pass and just swap the
values into the pass locklessly if possible, or just kill it off if not.
This would mean writing a GENERIC-to-RTL final-passes signaler, which may
take too long considering the scope of the project.

Again, that's just off the top of my head, so it may be a really bad
idea,

Nick

P.S. Good luck though.
Re: [GSoC'19, libgomp work-stealing] Task parallelism runtime
Hi,

I'm not very familiar with the gomp plugin system. However, looking at
'GOMP_PLUGIN_target_task_completion', it seems like tasks have to go in
and out of the runtime. In that case, is it right that the tasks have to
know which queue they came from? I think I'll have to add the id of the
corresponding queue of each task to the gomp_task structure.

Ray Kim
Re: Re: [GSoC'19, libgomp work-stealing] Task parallelism runtime
On Tue, Jun 25, 2019 at 04:55:17AM +0900, 김규래 wrote:
> I'm not very familiar with the gomp plugin system.
> However, looking at 'GOMP_PLUGIN_target_task_completion', it seems like
> tasks have to go in and out of the runtime.
> In that case, is it right that the tasks have to know which queue they
> came from?
> I think I'll have to add the id of the corresponding queue of each task
> to the gomp_task structure.

While libgomp has a plugin system, the only supported plugins are those
in the tree, i.e. libgomp/plugin/plugin-{hsa,nvptx}.c and
liboffloadmic/plugin/*. The nvptx plugin doesn't have async support ATM,
so it is just hsa and xeonphi offloading that can call it when an
asynchronous target region execution is over.

No matter in which task queue the task is, gomp_target_task_completion
needs to ensure that if something already waits on it (taskwait,
taskgroup end, barrier, dependency wait), it is awakened.

And, like for other parts of task.c, there needs to be a design for which
lock is used to protect any code that needs to be guarded. The current
code as you know uses team->task_lock as a big lock. I think with the
separate workqueues + work stealing you need a per-implicit-task lock
plus the per-team lock (team->task_lock); design the locking such that
there is no ABBA deadlock possibility and that you use the team task lock
only when really necessary (not sure, but perhaps one example where I
really don't see much way to avoid the per-team lock is task
dependencies, the hash table for that etc.).

Jakub
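A minimal sketch of the queue-local locking this suggests, where only one queue lock is ever held at a time so queue locks cannot form a cycle, and shared team state would take team->task_lock instead (placeholder types and names, not actual libgomp code):

```c
#include <pthread.h>

/* Placeholder task type; the real gomp_task is far richer.  */
struct gomp_task
{
  struct gomp_task *next;
};

/* Per-implicit-task work queue with its own lock.  Anything touching
   genuinely shared team state (e.g. the dependency hash table) would
   take the per-team team->task_lock instead, and never while a queue
   lock is held, so no ABBA ordering can arise.  */
struct task_queue
{
  pthread_mutex_t lock;
  struct gomp_task *head;
};

struct gomp_task *
steal_task (struct task_queue *victim)
{
  pthread_mutex_lock (&victim->lock);
  struct gomp_task *t = victim->head;
  if (t)
    victim->head = t->next;
  pthread_mutex_unlock (&victim->lock);
  return t;
}
```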
Re: [PATCH] let hash-based containers work with non-trivial types (PR 90923)
On 6/24/19 11:42 AM, Richard Biener wrote: On Mon, Jun 24, 2019 at 4:35 PM Martin Sebor wrote: On 6/24/19 6:11 AM, Richard Biener wrote: On Fri, Jun 21, 2019 at 7:17 PM Martin Sebor wrote: On 6/21/19 6:06 AM, Richard Biener wrote: On Wed, Jun 19, 2019 at 5:15 AM Martin Sebor wrote: Bug 90923 shows that even though GCC hash-table based containers like hash_map can be instantiated on types with user-defined ctors and dtors they invoke the dtors of such types without invoking the corresponding ctors. It was thanks to this bug that I spent a day debugging "interesting" miscompilations during GCC bootstrap (in fairness, it was that and bug 90904 about auto_vec copy assignment/construction also being hosed even for POD types). The attached patch corrects the hash_map and hash_set templates to invoke the ctors of the elements they insert and makes them (hopefully) safe to use with non-trivial user-defined types. Hum. I am worried about the difference of assignment vs. construction in ::put() + bool ins = hash_entry::is_empty (*e); + if (ins) + { + e->m_key = k; + new ((void *) &e->m_value) Value (v); + } + else + e->m_value = v; why not invoke the dtor on the old value and then the ctor again? It wouldn't work for self-assignment: Value &y = m.get_or_insert (key); m.put (key, y); The usual rule of thumb for writes into containers is to use construction when creating a new element and assignment when replacing the value of an existing element. Which reminds me that the hash containers, despite being copy- constructible (at least for POD types, they aren't for user- defined types), also aren't safe for assignment even for PODs. I opened bug 90959 for this. Until the assignment is fixed I made it inaccessibe in the patch (I have fixed the copy ctor to DTRT for non-PODs). How is an empty hash_entry constructed? In hash_table::find_slot_with_hash simply by finding an empty slot and returning a pointer to it. The memory for the slot is marked "empty" by calling the Traits::mark_empty() function. The full type of hash_map is actually hash_map, Value> and simple_hashmap_traits delegates it to default_hash_traits whose mark_empty() just clears the void*, leaving the Value part uninitialized. That makes sense because we don't want to call ctors for empty entries. I think the questions one might ask if one were to extend the design are: a) what class should invoke the ctor/assignment and b) should it do it directly or via the traits? ::remove() doesn't seem to invoke the dtor either, instead it relies on the traits::remove function? Yes. There is no Traits::construct or assign or copy. We could add them but I'm not sure I see to what end (there could be use cases, I just don't know enough about these classes to think of any). Attached is an updated patch with the additional minor fixes mentioned above. Martin PS I think we would get much better results by adopting the properly designed and tested standard library containers than by spending time trying to improve the design of these legacy classes. For simple uses that don't need to integrate with the GC machinery the standard containers should be fine (plus, it'd provide us with greater motivation to improve them and the code GCC emits for their uses). Unfortunately, to be able to use the hash-based containers we would need to upgrade to C++ 11. Isn't it time yet? I don't think so. I'm also not sure if C++11 on its own is desirable or if it should be C++14 or later at that point. 
SLES 12 has GCC 4.8 as host compiler (but also GCC 7+ optionally), SLES 15 has GCC 7. SLES 11 already struggles currently (GCC 4.3) but I'd no longer consider that important enough. Note any such change to libstdc++ containers should be complete and come with both code-size and compile-time and memory-usage measurements (both of GCC and other apps of course). Can I go ahead and commit the patch? I think we need to document the requirements on Value classes better. @@ -177,7 +185,10 @@ public: INSERT); bool ins = Traits::is_empty (*e); if (ins) - e->m_key = k; + { + e->m_key = k; + new ((void *)&e->m_value) Value (); + } this now requires a default constructor and I always forget about differences between the different form of initializations -- for a POD, does this zero the entry? Yes, the /value initialization/ above (via Value ()) zeroes out PODs, and it doesn't need a ctor. The placement new syntax should not require any changes to existing code. Does this updated comment for hash_map make it clearer? /* KeyId must be a trivial (POD) type. Value may be non-trivial (non-POD). Inserted elements are value-initialized either to zero for POD types or by invoking their default ctor. Removed elements are destroyed by inv
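The value-initialization behavior relied on above can be checked in isolation: with the trailing parentheses, placement new value-initializes, which zeroes a POD without needing a ctor (a standalone example, not GCC code):

```cpp
#include <new>
#include <cassert>

struct pod { int i; char c; };   // trivial: no user-defined ctor

int
main ()
{
  alignas (pod) unsigned char buf[sizeof (pod)];
  // "pod ()" value-initializes: a POD is zeroed, a non-POD would have
  // its default ctor run.  Plain "new (buf) pod" would leave the
  // members indeterminate.
  pod *p = new (static_cast<void *> (buf)) pod ();
  assert (p->i == 0 && p->c == 0);
  return 0;
}
```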