Re: GCC documentation: porting to Sphinx
On 6/23/21 6:00 PM, Joseph Myers wrote:
> On Wed, 23 Jun 2021, Martin Liška wrote:
>
>> @Joseph: Can you share your thoughts about the used Makefile
>> integration? What do you suggest for 2) (note that explicit listing
>> of all .rst files would be crazy)?
>
> You can write dependencies on e.g. doc/gcc/*.rst (which might be more
> files than actually are relevant in some cases, if the directory
> includes some common files shared by some but not all manuals, but
> should be conservatively safe if you list appropriate directories
> there), rather than needing to name all the individual files. Doing
> things with makefile dependencies seems better than relying on what
> sphinx-build does when rerun unnecessarily (if sphinx-build avoids
> rebuilding in some cases where the makefiles think a rebuild is
> needed, that's fine as an optimization).

All right. I've just done that and it was easier than I expected. Now the dependencies are properly followed.

> It looks like this makefile integration loses some of the srcinfo /
> srcman support. That support should stay (be updated for the use of
> Sphinx) so that release tarballs (as generated by
> maintainer-scripts/gcc_release, which uses
> --enable-generated-files-in-srcdir) continue to include man pages /
> info files (and make sure that, if those files are present in the
> source directory, then building and installing GCC does install them
> even when sphinx-build is absent at build/install time).

Oh, and I've just recovered this one as well.

Pushed changes to the me/sphinx-v2 branch and I'm waiting for more feedback. In the meantime, I'm going to prepare further integration of other manuals and targets (PDF, HTML).

Martin
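As a concrete illustration of the wildcard-dependency approach Joseph describes, here is a minimal GNU Make sketch; the target, variable, and directory names are hypothetical placeholders for illustration, not the actual rules on the branch:

  # Hypothetical sketch: rebuild the HTML manual whenever any .rst
  # source under doc/gcc changes.  GNU make expands the wildcard in
  # the prerequisite list, so new .rst files are picked up without
  # naming each one individually.
  doc/gcc/html/index.html: $(srcdir)/doc/gcc/*.rst
          $(SPHINX_BUILD) -b html $(srcdir)/doc/gcc doc/gcc/html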
daily report on extending static analyzer project [GSoC]
CURRENT STATUS :

analyzer is now splitting nodes even at call sites which don't have a cgraph_edge. But as the call and return nodes are now not connected, the part of the function after such calls becomes unreachable, making it impossible to analyse properly.

AIM for today :

- try to create an intra-procedural link between the calling and returning snodes
- find the place where the exploded nodes and edges are being formed
- figure out the program point where the exploded graph would know about the function calls

—

PROGRESS :

- I initially tried to connect the calling and returning snodes with an intraprocedural sedge, but it looks like only nodes which have a cgraph_edge or a CFG edge are connected in the supergraph. I tried a few ways to connect them, but in the end decided I would be better off leaving them like this and connecting them during the creation of the exploded graph itself.

- As the exploded graph is created during the building and processing of the worklist, "build_initial_worklist ()" and "process_worklist ()" should be the interesting areas to analyse, especially the processing part.

- "build_initial_worklist ()" is just creating enodes for functions that can be called explicitly ( possible entry points ), so I guess the better place to investigate is the "process_worklist ()" function.

—

STATUS AT THE END OF THE DAY :-

- try to create an intra-procedural link between the calling and returning snodes ( Abandoned )
- find the place where the exploded nodes and edges are being formed ( Done )
- figure out the program point where the exploded graph knows about the function call ( Pending )

Thank you
- Ankur
Re: replacing the backwards threader and more
On 6/21/2021 8:40 AM, Aldy Hernandez wrote:
> On 6/9/21 2:09 PM, Richard Biener wrote:
>> On Wed, Jun 9, 2021 at 1:50 PM Aldy Hernandez via Gcc wrote:
>>> Hi Jeff. Hi folks.
>>>
>>> What started as a foray into severing the old (forward) threader's
>>> dependency on evrp turned into a rewrite of the backwards threader
>>> code. I'd like to discuss the possibility of replacing the current
>>> backwards threader with a new one that gets far more threads and
>>> can potentially subsume all threaders in the future.
>>>
>>> I won't include code here, as it would just detract from the
>>> high-level discussion. But if it helps, I could post what I have,
>>> which just needs some cleanups and porting to the latest trunk
>>> changes Andrew has made.
>>>
>>> Currently the backwards threader works by traversing DEF chains
>>> through PHIs, leading to possible paths that start in a constant.
>>> When such a path is found, it is checked to see if it is
>>> profitable, and if so, the constant path is threaded. The current
>>> implementation is rather limited, since backwards paths must end in
>>> a constant. For example, the backwards threader can't get any of
>>> the tests in gcc.dg/tree-ssa/ssa-thread-14.c:
>>>
>>>   if (a && b)
>>>     foo ();
>>>   if (!b && c)
>>>     bar ();
>>>
>>> etc.
>>>
>>> After my refactoring patches to the threading code, it is now
>>> possible to drop in an alternate implementation that shares the
>>> profitability code (is this path profitable?), the jump registry,
>>> and the actual jump threading code. I have leveraged this to write
>>> a ranger-based threader that gets every single thread the current
>>> code gets, plus 90-130% more.
>>>
>>> Here are the details from the branch, which should be very similar
>>> to trunk. I'm presenting the branch numbers because they contain
>>> Andrew's upcoming relational query, which significantly juices up
>>> the results.
>>>
>>> New threader:
>>>   ethread:65043             (+3.06%)
>>>   dom:32450                 (-13.3%)
>>>   backwards threader:72482  (+89.6%)
>>>   vrp:40532                 (-30.7%)
>>>   Total threaded: 210507    (+6.70%)
>>>
>>> This means that the new code gets 89.6% more jump threading
>>> opportunities than the code I want to replace. In doing so, it
>>> reduces the amount of DOM threading opportunities by 13.3%, and by
>>> 30.7% from the VRP jump threader. The total improvement across the
>>> jump threading opportunities in the compiler is 6.70%.
>>>
>>> However, these are pessimistic numbers...
>>>
>>> I have noticed that some of the threading opportunities that DOM
>>> and VRP now get are not because they're smarter, but because
>>> they're picking up opportunities that the new code exposes. I
>>> experimented with running an iterative threader, and then seeing
>>> what VRP and DOM could actually get. This is too expensive to do in
>>> real life, but it at least shows what the effect of the new code is
>>> on DOM/VRP's abilities:
>>>
>>> Iterative threader:
>>>   ethread:65043           (+3.06%)
>>>   dom:31170               (-16.7%)
>>>   thread:86717            (+127%)
>>>   vrp:33851               (-42.2%)
>>>   Total threaded: 216781  (+9.90%)
>>>
>>> This means that the new code not only gets 127% more cases, but it
>>> reduces the DOM and VRP opportunities considerably (16.7% and 42.2%
>>> respectively). The end result is that we have the possibility of
>>> getting almost 10% more jump threading opportunities in the entire
>>> compilation run.
>>
>> Yeah, DOM once was iterating ...
>>
>> You probably have noticed that we have very many (way too many)
>> 'thread' passes, often in close succession with each other or DOM or
>> VRP. So in the above numbers I wonder if you can break down the
>> numbers individually for the actual passes (in their order)?
>
> As promised.
> *** LEGACY:
>   ethread42:61152   30.1369%   (61152 threads for 30.1% of total)
>   thread117:29646   14.6101%
>   vrp118:62088      30.5982%
>   thread132:2232    1.09997%
>   dom133:31116      15.3346%
>   thread197:1950    0.960998%
>   dom198:10661      5.25395%
>   thread200:587     0.289285%
>   vrp201:3482       1.716%
>   Total:            202914
>
> The above is from current trunk with my patches applied, defaulting
> to legacy mode. It follows the pass number nomenclature in the
> *.statistics files.
>
> New threader code (this is what I envision current trunk to look like
> with my patchset):
>
> *** RANGER:
>   ethread42:64389   30.2242%
>   thread117:49449   23.2114%
>   vrp118:46118      21.6478%
>   thread132:8153    3.82702%
>   dom133:27168      12.7527%
>   thread197:5542    2.60141%
>   dom198:8191       3.84485%
>   thread200:1038    0.487237%
>   vrp201:2990       1.40351%
>   Total:            213038

So this makes me think we should focus on dropping thread197, thread200, & vrp201, and I'd probably focus on vrp201 first, since we know we want to get rid of it anyway and that may change the data for thread???. Then I'd be looking at thread200 and thread197, in that order. I suspect that at least some of the cases in thread200 and vrp201 are exposed by dom198.

Jeff
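To make the missed opportunities concrete, here is a hypothetical compilable expansion of the ssa-thread-14.c-style snippet Aldy quotes above; only the two conditions come from the original, and the declarations are added so the example stands alone:

  extern void foo (void);
  extern void bar (void);

  void
  example (int a, int b, int c)
  {
    /* On the path where foo () is called, 'b' is known to be nonzero,
       so '!b && c' must be false, and the second test can be threaded
       along that path.  A threader that requires paths ending in a
       constant cannot see this, but one that reasons about ranges
       along each path can.  */
    if (a && b)
      foo ();
    if (!b && c)
      bar ();
  }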
Re: daily report on extending static analyzer project [GSoC]
On Thu, 2021-06-24 at 19:59 +0530, Ankur Saini wrote:
> CURRENT STATUS :
>
> analyzer is now splitting nodes even at call sites which don't have
> a cgraph_edge. But as the call and return nodes are now not
> connected, the part of the function after such calls becomes
> unreachable, making it impossible to analyse properly.
>
> AIM for today :
>
> - try to create an intra-procedural link between the calling and
> returning snodes
> - find the place where the exploded nodes and edges are being formed
> - figure out the program point where the exploded graph would know
> about the function calls
>
> —
>
> PROGRESS :
>
> - I initially tried to connect the calling and returning snodes with
> an intraprocedural sedge, but it looks like only nodes which have a
> cgraph_edge or a CFG edge are connected in the supergraph. I tried a
> few ways to connect them, but in the end decided I would be better
> off leaving them like this and connecting them during the creation
> of the exploded graph itself.
>
> - As the exploded graph is created during the building and
> processing of the worklist, "build_initial_worklist ()" and
> "process_worklist ()" should be the interesting areas to analyse,
> especially the processing part.
>
> - "build_initial_worklist ()" is just creating enodes for functions
> that can be called explicitly ( possible entry points ), so I guess
> the better place to investigate is the "process_worklist ()"
> function.

Yes. Have a look at exploded_graph::process_node (which is called by process_worklist). The eedges for calls with supergraph edges are created there, in the "case PK_AFTER_SUPERNODE:", which looks at the outgoing superedges from that supernode and calls node->on_edge on them, creating an exploded node/exploded edge for each outgoing superedge. So you'll need to make some changes there, I think.

>
> —
>
> STATUS AT THE END OF THE DAY :-
>
> - try to create an intra-procedural link between the calling and
> returning snodes ( Abandoned )

You may find the above useful if you're going to do it based on the code I mentioned above.

> - find the place where the exploded nodes and edges are being formed
> ( Done )
> - figure out the program point where the exploded graph knows about
> the function call ( Pending )

Thanks for the update. Hope the above is helpful.
Dave
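For orientation, here is a rough sketch of the shape of the code Dave is pointing at; the real version is exploded_graph::process_node in the analyzer, and the iteration details and the on_edge signature below are simplified placeholders rather than the actual GCC API:

  case PK_AFTER_SUPERNODE:
    {
      /* Consider each outgoing superedge of this supernode;
         node->on_edge builds the successor program point and state,
         creating an exploded node and exploded edge per traversable
         superedge.  */
      for (superedge *succ : snode->m_succs)  /* simplified iteration */
        node->on_edge (succ);                 /* simplified signature */
      /* Calls without a cgraph_edge have no outgoing superedge here,
         which is why the code after them is currently unreachable.  */
    }
    break;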
RE: [EXTERNAL] Re: State of AutoFDO in GCC
Hi Andy,

I'm trying to revive autofdo testing. One of the issues I'm running into with my setup is that PEBS doesn't work with perf record even though PEBS is enabled. I'm running Ubuntu 20.04 in a Hyper-V virtual machine; the processor is Icelake (GenuineIntel-6-7E). I did the following:

1. Enabled pmu, lbr, and pebs in my Hyper-V virtual machine as described in https://docs.microsoft.com/en-us/windows-server/virtualization/hyper-v/manage/performance-monitoring-hardware

2. Verified that pmu, lbr, and pebs are enabled in the vm by running:

  erozen@erozen-Virtual-Machine:~/objdir/gcc$ dmesg | egrep -i 'pmu'
  [0.266474] Performance Events: PEBS fmt4+, Icelake events, 32-deep LBR, full-width counters, Intel PMU driver.

3. Ran:

  erozen@erozen-Virtual-Machine:~/objdir/gcc$ perf record -e cpu/event=0xc4,umask=0x20/pu -b -m8 true -v
  Error:
  The sys_perf_event_open() syscall returned with 22 (Invalid argument) for event (cpu/event=0xc4,umask=0x20/pu).
  /bin/dmesg | grep -i perf may provide additional information.

Omitting /p works fine:

  erozen@erozen-Virtual-Machine:~/objdir/gcc$ perf record -e cpu/event=0xc4,umask=0x20/u -b -m8 true -v
  [ perf record: Woken up 0 times to write data ]
  [ perf record: Captured and wrote 0.007 MB perf.data (11 samples) ]

Is there a way to get PEBS working with perf record in a vm? I would appreciate any pointers on how to investigate this. The version of perf I'm using is 5.8.18.

Thanks,

Eugene

-----Original Message-----
From: Andi Kleen
Sent: Friday, April 30, 2021 2:46 PM
To: Eugene Rozenfeld via Gcc
Cc: Xinliang David Li; Richard Biener; Eugene Rozenfeld; Jan Hubicka
Subject: Re: [EXTERNAL] Re: State of AutoFDO in GCC

Eugene Rozenfeld via Gcc writes:

> Is the format produced by create_gcov and expected by GCC under
> -fauto-profile documented somewhere? How is it different from .gcda
> used in FDO, e.g., as described here:
> http://src.gnu-darwin.org/src/contrib/gcc/gcov-io.h.html ?

I believe it's very similar.

> I would prefer that AutoFDO is not removed from GCC and it would be
> helpful if create_gcov were restored in google/autofdo. I checked out
> a revision before the recent merge and tried it on a simple example
> and it seems to work.
> I'm also interested in contributing improvements for AutoFDO so will
> try to investigate
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71672 and
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81379

That would be great.

-Andi
gcc-9-20210624 is now available
Snapshot gcc-9-20210624 is now available on
  https://gcc.gnu.org/pub/gcc/snapshots/9-20210624/
and on various mirrors, see http://gcc.gnu.org/mirrors.html for details.

This snapshot has been generated from the GCC 9 git branch with the following options:
  git://gcc.gnu.org/git/gcc.git branch releases/gcc-9 revision 9b997caa72498bc3a14a064648b721fe0f11945e

You'll find:

  gcc-9-20210624.tar.xz    Complete GCC

    SHA256=eeb8581533b18381da806203b6cde8c114b87a918a47da8ea8053cbbc3548925
    SHA1=74ba9599eb5bf5ef8e7c353d209cd2519c9f5ebd

Diffs from 9-20210617 are available in the diffs/ subdirectory.

When a particular snapshot is ready for public consumption the LATEST-9 link is updated and a message is sent to the gcc list. Please do not use a snapshot before it has been announced that way.
__fp16 is ambiguous error in C++
#include <math.h>

__fp16 foo (__fp16 a, __fp16 b)
{
  return a + std::exp(b);
}

compiler options:
=================
riscv64-unknown-linux-gnu-g++ foo.c -march=rv64gc_zfh -mabi=lp64

error:
======
foo.c: In function '__fp16 foo(__fp16, __fp16)':
foo.c:6:23: error: call of overloaded 'exp(__fp16&)' is ambiguous
    6 |   return a + std::exp(b);
      |                       ^
In file included from $INSTALL/sysroot/usr/include/features.h:465,
                 from $INSTALL/riscv64-unknown-linux-gnu/include/c++/10.2.0/riscv64-unknown-linux-gnu/bits/os_defines.h:39,
                 from $INSTALL/riscv64-unknown-linux-gnu/include/c++/10.2.0/riscv64-unknown-linux-gnu/bits/c++config.h:518,
                 from $INSTALL/riscv64-unknown-linux-gnu/include/c++/10.2.0/cmath:41,
                 from $INSTALL/riscv64-unknown-linux-gnu/include/c++/10.2.0/math.h:36,
                 from foo.c:2:
$INSTALL/sysroot/usr/include/bits/mathcalls.h:95:1: note: candidate: 'double exp(double)'
   95 | __MATHCALL_VEC (exp,, (Mdouble __x));
      | ^~
In file included from $INSTALL/riscv64-unknown-linux-gnu/include/c++/10.2.0/math.h:36,
                 from foo.c:2:
$INSTALL/riscv64-unknown-linux-gnu/include/c++/10.2.0/cmath:222:3: note: candidate: 'constexpr float std::exp(float)'
  222 |   exp(float __x)
      |   ^~~
$INSTALL/riscv64-unknown-linux-gnu/include/c++/10.2.0/cmath:226:3: note: candidate: 'constexpr long double std::exp(long double)'
  226 |   exp(long double __x)
      |   ^~~

I think there is no __fp16 prototype in the math library of glibc. I could cast '__fp16' to 'float' or 'double' to fix this issue by modifying the code, but that is not transparent to developers :( Is there any other method to fix this? Maybe there is some C++ compiler option for this?

—
Jojo
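For reference, a minimal sketch of the cast-based workaround Jojo mentions, assuming the float overload is the intended one; the explicit conversions are the only change to the original function, and it needs the same riscv64 g++ options on a zfh-enabled target:

  #include <math.h>

  __fp16 foo (__fp16 a, __fp16 b)
  {
    /* Promote the __fp16 argument to float so the call unambiguously
       resolves to 'constexpr float std::exp(float)', then narrow the
       result back to __fp16.  */
    return a + (__fp16) std::exp ((float) b);
  }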