[PATCH] [RFC] Prime path coverage in gcc/gcov

Jørgen Kvalsvik Tue, 04 Jun 2024 12:47:54 -0700

This patch adds prime path coverage to gcc/gcov. It is a bit rough in a few
places, but I think all the main components are there and ready for some
feedback while I keep working on the details. First a quick introduction to
path coverage, before I explain a bit on the pieces of the patch and on what's
missing.


PRIME PATHS

Path coverage is recording the paths taken through the program. Here is a
simple example:

if (cond1)  BB 1
  then1 ()  BB 2
else
  else1 ()  BB 3

if (cond2)  BB 4
  then2 ()  BB 5
else
  else2 ()  BB 6

_           BB 7

To cover all paths you must run {then1 then2}, {then1 else2}, {else1 then1},
{else1 else2}. This is in contrast with line/statement coverage where it is
sufficient to execute then2, and it does not matter if it was reached through
then1 or else1.

1 2 4 5 7
1 2 4 6 7
1 3 4 5 7
1 3 4 6 7

This gets more complicated with loops, because 0, 1, 2, ..., N iterations are
all different paths. There are different ways of addressing this, a promising
one being prime paths. A prime path is a simple path (a path with no repeated
vertices except for the first/last in a cycle) that does not appear as a subpath
of any other simple path. Prime paths seem to strike a decent balance between
number of tests, path growth, and loop coverage. Of course, the number of paths
still grows very fast with program complexity - for example, this program has
14 prime paths:

  while (a)
    {
      if (b)
        return;
      while (c--)
        a++;
    }

--

ALGORITHM

Since the numbers of paths grows so fast, we need a good algorithm. The naive
approach of generating all paths and discarding redundancies (see
reference_prime_paths in the diff) simply doesn't complete for even pretty
simple functions with a few ten thousand paths (granted, the implementation is
also poor, but only serves as a reference). Fazli & Afsharchi in their paper
"Time and Space-Efficient Compositional Method for Prime and Test Paths
Generation from describe a neat algorithm which drastically improves on this
and brings complexity down to something managable. This patch implements that
algorithm with a few minor tweaks.

The algorithm first finds the strongly connected components (SCC) of the graph
and creates a new graph where the vertices are the SCCs of the CFG. Within
these vertices different paths are found - regular prime paths, paths that
start in the SCCs entries, and paths that end in the SCCs exits. These per-SCC
paths are combined with paths through the CFG which greatly reduces of paths
needed to be evaluated just to be thrown away.

Using this algorithm we can generate the prime paths for somewhat complicated
functions in a reasonable time. This is the prime_paths function. Please note
that some paths don't benefit from this at all. We need to find the prime paths
within a SCC, so if a single SCC is very large the function degenerates to the
naive implementation. Improving on this is an exercise for the future.

--

OVERALL ARCHITECTURE

Like the other coverages in gcc, this operates on the CFG in the profiling
phase, just after branch and condition coverage, in phases:

1. All prime paths are generated, counted, and enumerated
2. The paths are evaluted and counter instructions and accumulators are
   emitted
3. gcov reads the CFG and computes the prime paths
4. gcov gives its report

Simply writing out all the paths in the .gcno file is not really
practical, the files would be too big. Additionally, there are limits to the
practicality of measuring (and reporting) on millions of paths, so for most
programs where coverage is feasible, computing paths should be plenty fast. As
a result, path coverage really only adds 1 bit to the counter, rounded up to
nearest 64, so 64 paths takes up 8 bytes, 65 paths take up 16 bytes.

Recording paths is really just massaging large bitsets. Per function,
ceil(paths/64) buckets (uint64_t) are allocated. Paths are sorted, so the first
path maps to the lowest bit, the second path to the second lowest bit, and so
on. On taking an edge and entering a basic block, the a few bitmasks are
applied to unset the bits corresponding to the paths outside the block, and to
set the bits of the paths that start in that block. Finally, the right buckets
are masked and written to the global accumulators for the paths that end in the
block. Full coverage is achieved when all bits are set.

--

IMPLEMENTATION

In order to remove non-prime paths (subpaths) I use a non-clever suffix tree,
by inserting all subpaths into a trie. Fazli & Afsharchi do not discuss how
duplicates or subpaths are removed, and using the trie turned out to work
really well. The same prime_paths function is used both in gcc and in gcov
which meant adding some more objects in Makefile.in.

As for speed, I would say that it is acceptable (but see missing pieces on
knobs). It is a problem that is combinatorial in its very nature, so if you
enable this feature you can reasonably expect it taking a while. My main
benchmark tree.c generates approx 2m paths across the 20 functions or so in it
(where most functions have less than 1500 paths, and 2 around a million
each). Finding the paths takes 3.5-4s, but the instrumentation phase takes
approx. 9 minutes and generates a 50M binary. Not bad for a 1429 line source
file.

There are some selftests which deconstruct the algorithm, so it can be easily
referenced with the Fazli & Afsharchi. I hope that including them both help to
catch regression, clarify the assumptions, and help understanding the algorithm
by breaking up the phases.

The gcov.cc stuff is not in great shape, and reporting is still a bit
primitive. It has served very well as a starting point, but it is probably not
worth reviewing too much for style and neatness currently. Here is a snapshot
of what it currently looks like - I am very much open for ideas on how to
improve it.

$ gcc -fpath-coverage --coverage bs.c -o bs
$ gcov -et bs

paths covered 0 of 17
(edges) path  0 not covered: bs.c:6 bs.c:6(true) bs.c:11(true) bs.c:12
(edges) path  1 not covered: bs.c:6 bs.c:6(true) bs.c:11(false) bs.c:13(true) 
bs.c:14
(edges) path  2 not covered: bs.c:6 bs.c:6(true) bs.c:11(false) bs.c:13(false) 
bs.c:16 (unknown)
(edges) path  3 not covered: bs.c:6 bs.c:6(false) bs.c:18 (unknown)
(edges) path  4 not covered: bs.c:11(true) bs.c:12 bs.c:6(true) bs.c:11
(edges) path  5 not covered: bs.c:11(true) bs.c:12 bs.c:6(false) bs.c:18 
(unknown)
(edges) path  6 not covered: bs.c:11(false) bs.c:13(true) bs.c:14 bs.c:6(true) 
bs.c:11
(edges) path  7 not covered: bs.c:11(false) bs.c:13(true) bs.c:14 bs.c:6(false) 
bs.c:18 (unknown)
(edges) path  8 not covered: bs.c:12 bs.c:6(true) bs.c:11(true) bs.c:12
(edges) path  9 not covered: bs.c:12 bs.c:6(true) bs.c:11(false) bs.c:13(true) 
bs.c:14
(edges) path 10 not covered: bs.c:12 bs.c:6(true) bs.c:11(false) bs.c:13(false) 
bs.c:16 (unknown)
(edges) path 11 not covered: bs.c:13(true) bs.c:14 bs.c:6(true) bs.c:11(true) 
bs.c:12
(edges) path 12 not covered: bs.c:13(true) bs.c:14 bs.c:6(true) bs.c:11(false) 
bs.c:13
(edges) path 13 not covered: bs.c:14 bs.c:6(true) bs.c:11(false) bs.c:13(true) 
bs.c:14
(edges) path 14 not covered: bs.c:14 bs.c:6(true) bs.c:11(false) bs.c:13(false) 
bs.c:16 (unknown)
(edges) path 15 not covered: bs.c:6(true) bs.c:11(true) bs.c:12 bs.c:6
(edges) path 16 not covered: bs.c:6(true) bs.c:11(false) bs.c:13(true) bs.c:14 
bs.c:6
    #####:    1:int binary_search(int a[], int len, int from, int to, int key)
        -:    2:{
    #####:    3:    int low = from;
    #####:    4:    int high = to - 1;
        -:    5:
    #####:    6:    while (low <= high)
        -:    7:    {
    #####:    8:        int mid = (low + high) >> 1;
    #####:    9:        long midVal = a[mid];
        -:   10:
    #####:   11:        if (midVal < key)
    #####:   12:            low = mid + 1;
    #####:   13:        else if (midVal > key)
    #####:   14:            high = mid - 1;
        -:   15:        else
    #####:   16:            return mid; // key found
        -:   17:    }
    #####:   18:    return -1;
        -:   19:}

--

MISSING PIECES AND OPEN QUESTIONS

Tuning. There is no tuning available right now, but that won't work in
practice. For example, building tree
https://gitlab.com/OldManProgrammer/unix-tree with path-coverage can work for
tree.c, but really doesn't finish list.c. I plan to add a few knobs for when to
give up coverage, such as maybe size-of-graph, complexity estimation
(e.g. longest chain of vertices in an SCC), or maybe even a timeout (e.g. give
up after consdering 250k paths). None of this is done now. I am open to ideas
on what to limit on and what the flags should be.

Suffix trees. Implementing a trie was a massive simplification, and I think
maybe a better suffix tree implementation could speed things up even
more. Ukkonnen's algorithm came to mind, but I found it too complicated to be
worth it while working out the rest of the program.

Porting off STL. I might port the instrumentation code off the C++ Standard
Library. I am familiar with them and it made it very easy to prototype the
instrumentation function in prime-paths.cc, but seeing it now it should work
well with hash-map.h and friends.

Reporting. The gcov stuff needs some more polish.

Types. An idea is to return something like the trie, rather than first rendering
it to a vec<vec<int>>. While it does save a bit of work and might even open up
for a few things, it does put a slightly larger burden on the headers.

--

NOTES AND QUESTIONS

The compile times are brutal. Obviously finding prime paths could be faster,
but right now the real bottleneck is actually instrumentation. I have tracked
it down the worst offender, this line in path-coverage.cc:

   gassign *ga4 = gimple_build_assign (unshare_expr (counter), tmp3);

It does not seem to be the unshare_expr (), but rather actually writing to the
counter. Granted, for some functions we have to emit a million of these, but it
brings compile times up in 9+ minutes. If I remove the write to the counter, or
write to some other SSA, gcc finishes in under 40s.  I don't know what is going
on here, why is it so much slower to write to the counter?
---
 gcc/Makefile.in                        |    6 +-
 gcc/builtins.cc                        |    2 +-
 gcc/collect2.cc                        |    5 +-
 gcc/common.opt                         |    4 +
 gcc/gcc.cc                             |    4 +-
 gcc/gcov-counter.def                   |    3 +
 gcc/gcov-io.h                          |    3 +
 gcc/gcov.cc                            |  228 ++-
 gcc/ipa-inline.cc                      |    2 +-
 gcc/passes.cc                          |    4 +-
 gcc/path-coverage.cc                   |  301 ++++
 gcc/prime-paths.cc                     | 2015 ++++++++++++++++++++++++
 gcc/profile.cc                         |   11 +-
 gcc/selftest-run-tests.cc              |    1 +
 gcc/selftest.h                         |    1 +
 gcc/testsuite/gcc.misc-tests/gcov-25.c |  311 ++++
 gcc/testsuite/lib/gcov.exp             |   95 +-
 gcc/tree-profile.cc                    |   11 +-
 18 files changed, 2990 insertions(+), 17 deletions(-)
 create mode 100644 gcc/path-coverage.cc
 create mode 100644 gcc/prime-paths.cc
 create mode 100644 gcc/testsuite/gcc.misc-tests/gcov-25.c

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index a74761b7ab3..8ed6dcdea01 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1618,6 +1618,8 @@ OBJS = \
        opts-global.o \
        ordered-hash-map-tests.o \
        passes.o \
+       prime-paths.o \
+       path-coverage.o \
        plugin.o \
        pointer-query.o \
        postreload-gcse.o \
@@ -2867,6 +2869,8 @@ GTFILES = $(CPPLIB_H) $(srcdir)/input.h 
$(srcdir)/coretypes.h \
   $(srcdir)/tree-scalar-evolution.cc \
   $(srcdir)/tree-ssa-operands.h \
   $(srcdir)/tree-profile.cc $(srcdir)/tree-nested.cc \
+  $(srcdir)/path-coverage.cc \
+  $(srcdir)/prime-paths.cc \
   $(srcdir)/omp-offload.h \
   $(srcdir)/omp-general.cc \
   $(srcdir)/omp-low.cc \
@@ -3243,7 +3247,7 @@ s-version: build/genversion$(build_exeext)
 # gcov.o needs $(ZLIBINC) added to the include flags.
 CFLAGS-gcov.o += $(ZLIBINC)
 
-GCOV_OBJS = gcov.o json.o
+GCOV_OBJS = gcov.o json.o graphds.o prime-paths.o bitmap.o sbitmap.o
 gcov$(exeext): $(GCOV_OBJS) $(LIBDEPS)
        +$(LINKER) $(ALL_LINKERFLAGS) $(LDFLAGS) $(GCOV_OBJS) \
                hash-table.o ggc-none.o $(LIBS) $(ZLIB) -o $@
diff --git a/gcc/builtins.cc b/gcc/builtins.cc
index f8d94c4b435..75a19ccdefe 100644
--- a/gcc/builtins.cc
+++ b/gcc/builtins.cc
@@ -6329,7 +6329,7 @@ expand_builtin_fork_or_exec (tree fn, tree exp, rtx 
target, int ignore)
   tree call;
 
   /* If we are not profiling, just call the function.  */
-  if (!profile_arc_flag && !condition_coverage_flag)
+  if (!profile_arc_flag && !condition_coverage_flag && !path_coverage_flag)
     return NULL_RTX;
 
   /* Otherwise call the wrapper.  This should be equivalent for the rest of
diff --git a/gcc/collect2.cc b/gcc/collect2.cc
index 902014a9cc1..8be949d117a 100644
--- a/gcc/collect2.cc
+++ b/gcc/collect2.cc
@@ -1035,9 +1035,9 @@ main (int argc, char **argv)
       lto_mode = LTO_MODE_LTO;
   }
 
-  /* -fno-profile-arcs -fno-condition-coverage -fno-test-coverage
+  /* -fno-profile-arcs -fno-condition-coverage -fno-path-coverage 
-fno-test-coverage
      -fno-branch-probabilities -fno-exceptions -w -fno-whole-program */
-  num_c_args += 7;
+  num_c_args += 8;
 
   c_argv = XCNEWVEC (char *, num_c_args);
   c_ptr = CONST_CAST2 (const char **, char **, c_argv);
@@ -1234,6 +1234,7 @@ main (int argc, char **argv)
   obstack_free (&temporary_obstack, temporary_firstobj);
   *c_ptr++ = "-fno-profile-arcs";
   *c_ptr++ = "-fno-condition-coverage";
+  *c_ptr++ = "-fno-path-coverage";
   *c_ptr++ = "-fno-test-coverage";
   *c_ptr++ = "-fno-branch-probabilities";
   *c_ptr++ = "-fno-exceptions";
diff --git a/gcc/common.opt b/gcc/common.opt
index ad348844775..2a8e3b1d712 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -2385,6 +2385,10 @@ foptimize-sibling-calls
 Common Var(flag_optimize_sibling_calls) Optimization
 Optimize sibling and tail recursive calls.
 
+fpath-coverage
+Common Var(path_coverage_flag)
+Insert path profiling code.
+
 fpartial-inlining
 Common Var(flag_partial_inlining) Optimization
 Perform partial inlining.
diff --git a/gcc/gcc.cc b/gcc/gcc.cc
index 728332b8153..f4c4e60de17 100644
--- a/gcc/gcc.cc
+++ b/gcc/gcc.cc
@@ -1165,7 +1165,7 @@ proper position among the other output files.  */
        %:include(libgomp.spec)%(link_gomp)}\
     %{fgnu-tm:%:include(libitm.spec)%(link_itm)}\
     " STACK_SPLIT_SPEC "\
-    %{fprofile-arcs|fcondition-coverage|fprofile-generate*|coverage:-lgcov} " 
SANITIZER_SPEC " \
+    
%{fprofile-arcs|fcondition-coverage|fpath-coverage|fprofile-generate*|coverage:-lgcov}
 " SANITIZER_SPEC " \
     %{!nostdlib:%{!r:%{!nodefaultlibs:%(link_ssp) %(link_gcc_c_sequence)}}}\
     %{!nostdlib:%{!r:%{!nostartfiles:%E}}} %{T*}  \n%(post_link) }}}}}}"
 #endif
@@ -1288,7 +1288,7 @@ static const char *cc1_options =
  %{!fsyntax-only:%{S:%W{o*}%{!o*:-o %w%b.s}}}\
  %{fsyntax-only:-o %j} %{-param*}\
  %{coverage:-fprofile-arcs -ftest-coverage}\
- %{fprofile-arcs|fcondition-coverage|fprofile-generate*|coverage:\
+ 
%{fprofile-arcs|fcondition-coverage|fpath-coverage|fprofile-generate*|coverage:\
    %{!fprofile-update=single:\
      %{pthread:-fprofile-update=prefer-atomic}}}";
 
diff --git a/gcc/gcov-counter.def b/gcc/gcov-counter.def
index 45d4b3eb0c8..65404135c81 100644
--- a/gcc/gcov-counter.def
+++ b/gcc/gcov-counter.def
@@ -52,3 +52,6 @@ DEF_GCOV_COUNTER(GCOV_TIME_PROFILER, "time_profiler", 
_time_profile)
 
 /* Conditions.  The counter is interpreted as a bit-set.  */
 DEF_GCOV_COUNTER(GCOV_COUNTER_CONDS, "conditions", _ior)
+
+/* Paths.  The counter is interpreted as a bit-set.  */
+DEF_GCOV_COUNTER(GCOV_COUNTER_PATHS, "ior", _ior)
diff --git a/gcc/gcov-io.h b/gcc/gcov-io.h
index 5dc467c92b1..623ca56899a 100644
--- a/gcc/gcov-io.h
+++ b/gcc/gcov-io.h
@@ -264,6 +264,9 @@ typedef uint64_t gcov_type_unsigned;
 #define GCOV_TAG_CONDS            ((gcov_unsigned_t)0x01470000)
 #define GCOV_TAG_CONDS_LENGTH(NUM) ((NUM) * 2 * GCOV_WORD_SIZE)
 #define GCOV_TAG_CONDS_NUM(LENGTH) (((LENGTH) / GCOV_WORD_SIZE) / 2)
+#define GCOV_TAG_PATHS            ((gcov_unsigned_t)0x01490000)
+#define GCOV_TAG_PATHS_LENGTH(NUM) ((NUM) * GCOV_WORD_SIZE)
+#define GCOV_TAG_PATHS_NUM(LENGTH) (((LENGTH) / GCOV_WORD_SIZE))
 #define GCOV_TAG_LINES          ((gcov_unsigned_t)0x01450000)
 #define GCOV_TAG_COUNTER_BASE   ((gcov_unsigned_t)0x01a10000)
 #define GCOV_TAG_COUNTER_LENGTH(NUM) ((NUM) * 2 * GCOV_WORD_SIZE)
diff --git a/gcc/gcov.cc b/gcc/gcov.cc
index b800c9bc939..ffda8b2adc8 100644
--- a/gcc/gcov.cc
+++ b/gcc/gcov.cc
@@ -47,6 +47,7 @@ along with Gcov; see the file COPYING3.  If not see
 #include "pretty-print.h"
 #include "json.h"
 #include "hwint.h"
+#include "graphds.h"
 
 #include <zlib.h>
 #include <getopt.h>
@@ -83,6 +84,7 @@ class function_info;
 class block_info;
 class source_info;
 class condition_info;
+class path_info;
 
 /* Describes an arc between two basic blocks.  */
 
@@ -128,6 +130,22 @@ struct arc_info
   struct arc_info *pred_next;
 };
 
+class path_info
+{
+public:
+  path_info () : paths (), covered () {}
+
+  vector<vector<unsigned>> paths;
+  vector<unsigned> covered;
+
+  unsigned covered_paths () const {
+    unsigned cnt = 0;
+    for (unsigned v : covered)
+      cnt += popcount_hwi (v);
+    return cnt;
+  }
+};
+
 /* Describes which locations (lines and files) are associated with
    a basic block.  */
 
@@ -315,6 +333,7 @@ public:
   unsigned blocks_executed;
 
   vector<condition_info*> conditions;
+  path_info paths;
 
   /* Raw arc coverage counts.  */
   vector<gcov_type> counts;
@@ -399,6 +418,9 @@ struct coverage_info
   int calls_executed;
 
   char *name;
+
+  unsigned paths;
+  unsigned paths_covered;
 };
 
 /* Describes a file mentioned in the block graph.  Contains an array
@@ -601,6 +623,13 @@ static bool flag_conditions = 0;
 /* Show unconditional branches too.  */
 static int flag_unconditional = 0;
 
+/* Output path coverage.  */
+static bool flag_prime_paths = false;
+
+static bool flag_prime_paths_basic = false;
+static bool flag_prime_paths_edges = false;
+static bool flag_prime_paths_source = false;
+
 /* Output a gcov file if this is true.  This is on by default, and can
    be turned off by the -n option.  */
 
@@ -702,9 +731,11 @@ static unsigned find_source (const char *);
 static void read_graph_file (void);
 static int read_count_file (void);
 static void solve_flow_graph (function_info *);
+static void find_prime_paths (function_info *fn);
 static void find_exception_blocks (function_info *);
 static void add_branch_counts (coverage_info *, const arc_info *);
 static void add_condition_counts (coverage_info *, const block_info *);
+static void add_path_counts (coverage_info *, const function_info *);
 static void add_line_counts (coverage_info *, function_info *);
 static void executed_summary (unsigned, unsigned);
 static void function_summary (const coverage_info *);
@@ -980,6 +1011,7 @@ print_usage (int error_p)
                                     rather than percentages\n");
   fnotice (file, "  -g, --conditions                Include modified 
condition/decision\n\
                                     coverage in output\n");
+  fnotice (file, "  -e, --prime-paths               Show prime path 
coverage\n");
   fnotice (file, "  -d, --display-progress          Display progress 
information\n");
   fnotice (file, "  -D, --debug                            Display debugging 
dumps\n");
   fnotice (file, "  -f, --function-summaries        Output summaries for each 
function\n");
@@ -1034,6 +1066,12 @@ static const struct option options[] =
   { "branch-probabilities", no_argument,       NULL, 'b' },
   { "branch-counts",        no_argument,       NULL, 'c' },
   { "conditions",          no_argument,       NULL, 'g' },
+  /* The easier implementatio is --prime-paths-{report} */
+  { "prime-paths",         no_argument,       NULL, 'e' },
+  { "prime-paths-edges",    no_argument,       NULL, 900 },
+  { "prime-paths-basic",    no_argument,       NULL, 901 },
+  { "prime-paths-source",   no_argument,       NULL, 902 },
+  { "prime-paths-all",     no_argument,       NULL, 903 },
   { "json-format",         no_argument,       NULL, 'j' },
   { "human-readable",      no_argument,       NULL, 'H' },
   { "no-output",            no_argument,       NULL, 'n' },
@@ -1062,7 +1100,7 @@ process_args (int argc, char **argv)
 {
   int opt;
 
-  const char *opts = "abcdDfghHijklmno:pqrs:tuvwx";
+  const char *opts = "abcdDefghHijklmno:pqrs:tuvwx";
   while ((opt = getopt_long (argc, argv, opts, options, NULL)) != -1)
     {
       switch (opt)
@@ -1076,6 +1114,25 @@ process_args (int argc, char **argv)
        case 'c':
          flag_counts = 1;
          break;
+       case 'e':
+       case 900:
+         flag_prime_paths = true;
+         flag_prime_paths_edges = true;
+         break;
+       case 901:
+         flag_prime_paths = true;
+         flag_prime_paths_basic = true;
+         break;
+       case 902:
+         flag_prime_paths = true;
+         flag_prime_paths_source = true;
+         break;
+       case 903:
+         flag_prime_paths = true;
+         flag_prime_paths_edges = true;
+         flag_prime_paths_basic = true;
+         flag_prime_paths_source = true;
+         break;
        case 'f':
          flag_function_summary = 1;
          break;
@@ -1552,6 +1609,9 @@ process_all_functions (void)
          solve_flow_graph (fn);
          if (fn->has_catch)
            find_exception_blocks (fn);
+
+         /* if path-cover */
+         find_prime_paths (fn);
        }
       else
        {
@@ -2085,6 +2145,14 @@ read_graph_file (void)
              info->n_terms = gcov_read_unsigned ();
              fn->conditions[i] = info;
            }
+    }
+      else if (fn && tag == GCOV_TAG_PATHS)
+       {
+         unsigned npaths = gcov_read_unsigned ();
+         size_t nchunks = (npaths + ((sizeof (gcov_type) * 8) - 1))
+           / (sizeof (gcov_type) * 8);
+
+         fn->paths.covered.assign (nchunks, 0);
        }
       else if (fn && tag == GCOV_TAG_LINES)
        {
@@ -2241,6 +2309,16 @@ read_count_file (void)
            for (ix = 0; ix != fn->counts.size (); ix++)
              fn->counts[ix] += gcov_read_counter ();
        }
+      else if (tag == GCOV_TAG_FOR_COUNTER (GCOV_COUNTER_PATHS) && fn)
+       {
+         length = abs (read_length);
+         if (length != GCOV_TAG_COUNTER_LENGTH (fn->paths.covered.size ()))
+           goto mismatch;
+
+         if (read_length > 0)
+           for (ix = 0; ix != fn->paths.covered.size (); ix++)
+             fn->paths.covered[ix] = gcov_read_counter ();
+       }
       if (read_length < 0)
        read_length = 0;
       gcov_sync (base, read_length);
@@ -2524,6 +2602,37 @@ solve_flow_graph (function_info *fn)
       }
 }
 
+/* Find the prime paths of the function from the CFG and add to FN.  */
+static void
+find_prime_paths (function_info *fn)
+{
+  if (!flag_prime_paths)
+    return;
+
+  struct graph *cfg = new_graph (fn->blocks.size ());
+  for (block_info& block : fn->blocks)
+    {
+      cfg->vertices[block.id].data = &block;
+      for (arc_info *arc = block.succ; arc; arc = arc->succ_next)
+       add_edge (cfg, arc->src->id, arc->dst->id)->data = arc;
+    }
+
+  vec<vec<int>> prime_paths (struct graph*);
+  vec<vec<int>> paths = prime_paths (cfg);
+  for (vec<int>& path : paths)
+    {
+      if (path[path.length () - 1] == EXIT_BLOCK)
+       path.pop ();
+      if (path[0] == ENTRY_BLOCK)
+       path.ordered_remove (0);
+      fn->paths.paths.emplace_back (path.begin (), path.end ());
+    }
+
+  for (vec<int>& path : paths)
+    path.release ();
+  paths.release ();
+}
+
 /* Mark all the blocks only reachable via an incoming catch.  */
 
 static void
@@ -2584,6 +2693,13 @@ add_condition_counts (coverage_info *coverage, const 
block_info *block)
   coverage->conditions_covered += block->conditions.popcount ();
 }
 
+static void
+add_path_counts (coverage_info *coverage, const function_info *fn)
+{
+  coverage->paths += fn->paths.paths.size ();
+  coverage->paths_covered = fn->paths.covered_paths ();
+}
+
 /* Format COUNT, if flag_human_readable_numbers is set, return it human
    readable format.  */
 
@@ -2987,6 +3103,8 @@ accumulate_line_counts (source_info *src)
     {
       function_info *fn = *it;
 
+      add_path_counts (&src->coverage, fn);
+
       if (fn->src != src->index || !fn->is_group)
        continue;
 
@@ -3118,6 +3236,112 @@ output_branch_count (FILE *gcov_file, int ix, const 
arc_info *arc)
   return 1;
 }
 
+static void
+print_source_line (FILE *f, const vector<const char *> &source_lines,
+                  unsigned line);
+
+static int
+output_path_count (FILE *gcov_file, const function_info *fn,
+                  const vector<const char*>& lines)
+{
+  if (!flag_prime_paths)
+    return 0;
+
+  /* This not-executed nonsense would be better if allocated() wrote 0 instead
+     of 1. Why doesn't it? */
+  if (fn->blocks_executed > 0 && fn->paths.covered_paths () > 0)
+    fnotice (gcov_file, "paths covered %d of %d\n",
+           (int)fn->paths.covered_paths (),
+           (int)fn->paths.paths.size ());
+  else
+    fnotice (gcov_file, "paths covered 0 of %d\n",
+           (int)fn->paths.paths.size ());
+
+  for (size_t i = 0; i != fn->paths.paths.size (); ++i)
+    {
+      const auto& path = fn->paths.paths.at (i);
+      size_t bucket = i / (sizeof (gcov_type) * 8);
+      size_t bit = i % (sizeof (gcov_type) * 8);
+      bool iscovered = fn->paths.covered.at (bucket) & (1ULL << bit);
+      if (iscovered) continue;
+
+      if (flag_prime_paths_basic)
+       {
+         /* TODO: rename arg/flag basic-blocks */
+         fprintf (gcov_file, "(basic) path %2zu not covered:", i);
+         for (unsigned v : path)
+           fprintf (gcov_file, " %d", v);
+         fprintf (gcov_file, "\n");
+       }
+
+      if (flag_prime_paths_edges)
+       {
+         fprintf (gcov_file, "(edges) path %2zu not covered:", i);
+         for (size_t k = 1; k != path.size (); ++k)
+           {
+             auto& block = fn->blocks[path[k-1]];
+             gcc_assert (block.id == path[k-1]);
+             arc_info *arc = nullptr;
+             for (arc = block.succ; arc; arc = arc->succ_next)
+               {
+                 if (arc->src->id != path[k-1])
+                   printf ("ERROR: src is %u, not %u\n", arc->src->id, 
path[k-1]);
+                 if (arc->dst->id == path[k])
+                   break;
+               }
+             gcc_assert (arc);
+
+             if (block.locations.empty ())
+               fprintf (gcov_file, " (unknown)");
+             else
+               {
+               for (auto& loc : block.locations)
+                 fprintf (gcov_file, " %s:%u", 
sources[loc.source_file_idx].name, loc.lines.back ());
+               }
+             if (arc->true_value)
+               fprintf (gcov_file, "(true)");
+             else if (arc->false_value)
+               fprintf (gcov_file, "(false)");
+           }
+         auto& block = fn->blocks[path.back ()];
+         if (block.locations.empty ())
+           fprintf (gcov_file, " (unknown)");
+         else
+           {
+             for (auto& loc : block.locations)
+               fprintf (gcov_file, " %s:%u", 
sources[loc.source_file_idx].name, loc.lines.back ());
+           }
+         fprintf (gcov_file, "\n");
+       }
+
+      if (!flag_prime_paths_source)
+       continue;
+
+      /* Print the lines of the path, or (empty) if there are no lines
+        associated with the block.  No lines is common for scope closings,
+        e.g. past the for loop before scope exit.  */
+      for (unsigned v : path)
+       {
+         auto& loc = fn->blocks[v].locations;
+         if (loc.empty ())
+           fprintf (gcov_file, "BB %d: (empty)\n", v);
+         else
+           {
+             unsigned locn = 0;
+             for (auto& locs : fn->blocks[v].locations)
+               {
+                 for (unsigned line : locs.lines)
+                   fprintf (gcov_file, "BB %d: (%d) %d", v, locn, line),
+                           print_source_line (gcov_file, lines, line);
+                 locn++;
+               }
+           }
+       }
+      fputc ('\n', gcov_file);
+    }
+  return 1;
+}
+
 static const char *
 read_line (FILE *file)
 {
@@ -3413,6 +3637,7 @@ output_lines (FILE *gcov_file, const source_info *src)
            {
              function_info *fn = (*fns)[0];
              output_function_details (gcov_file, fn);
+             output_path_count (gcov_file, fn, source_lines);
            }
        }
 
@@ -3449,6 +3674,7 @@ output_lines (FILE *gcov_file, const source_info *src)
              fprintf (gcov_file, "%s:\n", fn_name.c_str ());
 
              output_function_details (gcov_file, fn);
+             output_path_count (gcov_file, fn, source_lines);
 
              /* Print all lines covered by the function.  */
              for (unsigned i = 0; i < lines.size (); i++)
diff --git a/gcc/ipa-inline.cc b/gcc/ipa-inline.cc
index e52757510ce..40fbca53c95 100644
--- a/gcc/ipa-inline.cc
+++ b/gcc/ipa-inline.cc
@@ -689,7 +689,7 @@ can_early_inline_edge_p (struct cgraph_edge *e)
     }
   gcc_assert (gimple_in_ssa_p (DECL_STRUCT_FUNCTION (e->caller->decl))
              && gimple_in_ssa_p (DECL_STRUCT_FUNCTION (callee->decl)));
-  if ((profile_arc_flag || condition_coverage_flag)
+  if ((profile_arc_flag || condition_coverage_flag || path_coverage_flag)
       && ((lookup_attribute ("no_profile_instrument_function",
                            DECL_ATTRIBUTES (caller->decl)) == NULL_TREE)
          != (lookup_attribute ("no_profile_instrument_function",
diff --git a/gcc/passes.cc b/gcc/passes.cc
index d73f8ba97b6..010a98d9bca 100644
--- a/gcc/passes.cc
+++ b/gcc/passes.cc
@@ -352,8 +352,8 @@ finish_optimization_passes (void)
   gcc::dump_manager *dumps = m_ctxt->get_dumps ();
 
   timevar_push (TV_DUMP);
-  if (profile_arc_flag || condition_coverage_flag || flag_test_coverage
-      || flag_branch_probabilities)
+  if (profile_arc_flag || condition_coverage_flag || path_coverage_flag
+      || flag_test_coverage || flag_branch_probabilities)
     {
       dumps->dump_start (pass_profile_1->static_pass_number, NULL);
       end_branch_prob ();
diff --git a/gcc/path-coverage.cc b/gcc/path-coverage.cc
new file mode 100644
index 00000000000..2155ac94dc7
--- /dev/null
+++ b/gcc/path-coverage.cc
@@ -0,0 +1,301 @@
+/* Calculate prime path coverage.
+   Copyright (C) 2024-2024 Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>.  */
+
+#define INCLUDE_SET
+#define INCLUDE_MAP
+#define INCLUDE_VECTOR
+#define INCLUDE_STRING
+
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "target.h"
+#include "function.h"
+#include "basic-block.h"
+#include "tree.h"
+#include "cfg.h"
+#include "cfghooks.h"
+#include "cfgloop.h"
+#include "gimple.h"
+#include "gimple-iterator.h"
+#include "gimplify.h"
+#include "fold-const.h"
+#include "coverage.h"
+#include "ssa.h"
+#include "bitmap.h"
+#include "vec.h"
+#include "sbitmap.h"
+#include "graphds.h"
+#include "hash-map.h"
+#include "cgraph.h"
+#include "selftest.h"
+
+/* Build a struct graph equivalent to the CFG for the function FN.  The caller
+   must free the returned graph with free_graph.  The data field of every
+   vertex and edge point to the basic blocks and edges in the CFG.  */
+struct graph*
+cfg_as_graph (struct function* fn)
+{
+    struct graph *g = new_graph (n_basic_blocks_for_fn (fn));
+    basic_block entry = ENTRY_BLOCK_PTR_FOR_FN (fn);
+    basic_block exit = EXIT_BLOCK_PTR_FOR_FN (fn);
+
+    g->vertices[entry->index].data = entry;
+    g->vertices[exit->index].data = exit;
+    basic_block top = single_succ (entry);
+
+    add_edge (g, entry->index, top->index)->data = single_succ_edge (entry);
+
+    basic_block bb;
+    FOR_EACH_BB_FN (bb, fn)
+      {
+       g->vertices[bb->index].data = bb;
+       for (edge e : bb->succs)
+         add_edge (g, e->src->index, e->dest->index)->data = e;
+      }
+    return g;
+}
+
+vec<vec<int>> prime_paths (struct graph*);
+
+/* Find the prime paths for the function FN with the ENTRY and EXIT blocks
+   removed.  */
+static vec<vec<int>>
+find_prime_paths (struct function *fn)
+{
+  auto *cfg = cfg_as_graph (fn);
+  auto paths = prime_paths (cfg);
+
+  for (auto& path : paths)
+    {
+      if (path.last () == EXIT_BLOCK)
+       path.pop ();
+      if (path[0] == ENTRY_BLOCK)
+       path.ordered_remove (0);
+    }
+  for (size_t i = 0; i != paths.length (); i++)
+    if (paths[i].is_empty ())
+      paths.ordered_remove (i--);
+
+  return paths;
+}
+
+/* Return the edge between SRC and DST.  */
+static edge
+edge_between (struct function *fn, int src, int dst)
+{
+  basic_block bbsrc = BASIC_BLOCK_FOR_FN (fn, src);
+  basic_block bbdst = BASIC_BLOCK_FOR_FN (fn, dst);
+  for (edge e : bbsrc->succs)
+    if (e->dest == bbdst)
+      return e;
+  gcc_checking_assert (false);
+  return NULL;
+}
+
+/* Instrument for prime path coverage.  */
+void
+find_paths (struct function *fn)
+{
+  vec<vec<int>> paths = find_prime_paths (fn);
+  tree gcov_type_node = get_gcov_type ();
+  const size_t counterbits = TYPE_PRECISION (gcov_type_node);
+  const size_t ncounters = (paths.length () + (counterbits - 1)) / counterbits;
+  gcc_assert (sizeof (uint64_t) * 8 == counterbits);
+
+  /// /* TODO: check error */
+  coverage_counter_alloc (GCOV_COUNTER_PATHS, ncounters);
+
+  gcov_position_t offset {};
+  offset = gcov_write_tag (GCOV_TAG_PATHS);
+  gcov_write_unsigned (paths.length ());
+  gcov_write_length (offset);
+
+  fprintf (stderr, "instrumenting %s\n", function_name (fn));
+  std::map<basic_block, std::map<size_t, uint64_t>> flushes;
+  std::map<basic_block, std::map<size_t, uint64_t>> resets;
+  std::map<edge, std::map<size_t, uint64_t>> ands;
+  std::map<basic_block, std::map<size_t, uint64_t>> ors;
+
+  /* Now that we have all paths we must figure out what bitmasks must be
+     applied on the edges.  We also record in which basic block the path starts
+     (e.g.  accumulator resets) and ends (accumulator flushes).  */
+  for (size_t pathno = 0; pathno != paths.length (); ++pathno)
+    {
+      const vec<int>& path = paths[pathno];
+      size_t bucket = pathno / counterbits;
+      size_t bit = uint64_t (1) << (pathno % counterbits);
+
+      for (unsigned i = 1; i != path.length (); ++i)
+       {
+         edge e = edge_between (fn, path[i-1], path[i]);
+         ands[e][bucket] |= bit;
+       }
+
+      basic_block first = BASIC_BLOCK_FOR_FN (fn, path[0]);
+      basic_block last = BASIC_BLOCK_FOR_FN (fn, path[path.length () - 1]);
+
+      ors[first][bucket] |= bit;
+      resets[first][bucket] |= bit;
+      flushes[last][bucket] |= bit;
+    }
+
+  basic_block bb;
+  std::map<basic_block, std::map<size_t /* bucket */, tree>> SSAs;
+  std::map<basic_block, std::map<size_t, gphi*>> gphis;
+  std::map<basic_block, std::map<size_t, tree>> xtmps;
+  std::map<basic_block, std::map<tree, tree>> scheduled_ior;
+  std::set<size_t> needsphi;
+
+  /* Build a plan for the instrumentation.  We only want to emit instructions
+     and build phis for buckets that will actually be touched.  It constructs
+     and pushes SSAs which may be constructed slightly out of order.  */
+  FOR_EACH_BB_FN (bb, fn)
+    {
+      auto& xssa = SSAs[bb];
+      auto& phis = gphis[bb];
+      auto& tmps = xtmps[bb];
+      needsphi.clear ();
+
+      /* Record, in bucket resolution, the paths that go through bb.  Every
+        bucket with a live path in it must have a phi, but don't create it
+        yet.  Adding a phi arg requires non-local (to the edge) information -
+        a bucket may be untouched on one edge and updated from another, in
+        which case the untouched edge must also be added to the phi args.  */
+      for (edge e : bb->preds)
+       {
+         std::map<size_t, uint64_t> &andv = ands[e];
+         for (auto kv : andv)
+           needsphi.insert (kv.first);
+       }
+
+      /* Create phi nodes for all the buckets touched by at least one incoming
+        edge.  */
+      for (edge e : bb->preds)
+       {
+         for (auto bucket : needsphi)
+           {
+             gphi* phi = phis[bucket];
+             if (!phi)
+               {
+                 tree xphi = make_ssa_name (gcov_type_node);
+                 phi = create_phi_node (xphi, bb);
+                 tmps[bucket] = xphi;
+                 phis[bucket] = phi;
+               }
+
+             tree xprev = build_zero_cst (gcov_type_node);
+             add_phi_arg (phi, xprev, e, UNKNOWN_LOCATION);
+           }
+       }
+
+      /* Then, make a pass over the paths that start in bb.  Buckets will these
+        paths will be bitwise-or'd with a bitmask of the paths that start
+        there, so that when all masks are applied the just-reset paths' bits
+        are set.  This replaces the SSA that the successors depend on.  */
+      for (auto kv : resets[bb])
+       {
+         size_t bucket = kv.first;
+         uint64_t bitmask = kv.second;
+         if (tmps.find (bucket) == tmps.end ())
+           {
+             /* No previous dependency set - just assign the bitmask.  */
+             tmps[bucket] = build_int_cst (gcov_type_node, bitmask);
+           }
+         else
+           {
+             /* Otherwise, schedule a bitwise-or and pass the new SSA
+                downstream.  */
+             tree lhs = make_ssa_name (gcov_type_node);
+             scheduled_ior[bb][lhs] = tmps[bucket];
+             tmps[bucket] = lhs;
+           }
+       }
+
+      for (auto kv : tmps)
+       xssa[kv.first] = kv.second;
+    }
+
+  FOR_EACH_BB_FN (bb, fn)
+    {
+      std::map<size_t, gphi*> &phis = gphis[bb];
+
+      /* Build the bitwise-and operations and tie the result to the pre-created
+        SSAs.  The bitmask unsets all bits except those paths bb is a part of. 
 */
+      for (edge e : bb->preds)
+       {
+         basic_block prev = e->src;
+         for (auto kv : ands[e])
+           {
+             size_t bucket = kv.first;
+             tree prevssa = SSAs.at (prev).at (bucket);
+             tree lhs = make_ssa_name (gcov_type_node);
+             tree cst = build_int_cst (gcov_type_node, kv.second);
+             gassign *ga = gimple_build_assign (lhs, BIT_AND_EXPR, prevssa, 
cst);
+             gsi_insert_on_edge (e, ga);
+
+             gphi* phi = phis[bucket];
+             gcc_assert (phi);
+             add_phi_arg (phi, lhs, e, UNKNOWN_LOCATION);
+           }
+       }
+
+      /* Build the bitwise-or operations which sets the bits corresponding to
+        the paths bb is the first node in.  */
+      gimple_stmt_iterator gsi = gsi_after_labels (bb);
+      auto& tmps = xtmps[bb];
+      for (auto kv : resets[bb])
+       {
+         size_t bucket = kv.first;
+         uint64_t bitmask = kv.second;
+         tree lhs = tmps[bucket];
+
+         if (scheduled_ior[bb].find (lhs) == scheduled_ior[bb].end ())
+           continue;
+
+         tree cst = build_int_cst (gcov_type_node, bitmask);
+         tree src = scheduled_ior[bb].at (lhs);
+         gassign *ga = gimple_build_assign (lhs, BIT_IOR_EXPR, cst, src);
+         gsi_insert_before (&gsi, ga, GSI_SAME_STMT);
+       }
+
+      /* Finally, flush to the global counters.  */
+      for (auto kv : flushes[bb])
+       {
+         tree cst = build_int_cst (gcov_type_node, kv.second);
+         tree tmp1 = make_temp_ssa_name (gcov_type_node, NULL, "flush.tmp1");
+         tree tmp2 = make_temp_ssa_name (gcov_type_node, NULL, "flush.tmp2");
+         tree tmp3 = make_temp_ssa_name (gcov_type_node, NULL, "flush.tmp3");
+
+         gcc_assert (SSAs[bb].at (kv.first));
+
+         tree counter = tree_coverage_counter_ref (GCOV_COUNTER_PATHS, 
kv.first);
+         gassign *ga1 = gimple_build_assign (tmp1, counter);
+         gassign *ga2 = gimple_build_assign (tmp2, BIT_AND_EXPR, SSAs[bb]. 
at(kv.first), cst);
+         gassign *ga3 = gimple_build_assign (tmp3, BIT_IOR_EXPR, tmp1, tmp2);
+         gassign *ga4 = gimple_build_assign (unshare_expr (counter), tmp3);
+         gsi_insert_before (&gsi, ga1, GSI_SAME_STMT);
+         gsi_insert_before (&gsi, ga2, GSI_SAME_STMT);
+         gsi_insert_before (&gsi, ga3, GSI_SAME_STMT);
+         gsi_insert_before (&gsi, ga4, GSI_SAME_STMT);
+       }
+    }
+
+  release_vec_vec (paths);
+}
diff --git a/gcc/prime-paths.cc b/gcc/prime-paths.cc
new file mode 100644
index 00000000000..d8d79c0a747
--- /dev/null
+++ b/gcc/prime-paths.cc
@@ -0,0 +1,2015 @@
+#define INCLUDE_ALGORITHM
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "obstack.h"
+#include "sbitmap.h"
+#include "vec.h"
+#include "graphds.h"
+#include "selftest.h"
+
+namespace
+{
+
+/* A trivial key/value pair for a short linear map type.  */
+struct xpair
+{
+  int key;
+  unsigned val;
+};
+
+/* A node in a trie, optimized for mid-sized alphabets possibly larger than
+   256 but not much more.  Finding the prime paths ends up creating a large
+   amount of these nodes so space and access costs matters a lot.
+
+   The node does not explicitly store its own key (CFG vertex ID/basic block
+   index), nor does it store pointers to its successors. Rather, it stores the
+   key+offset pairs for its successors the root trie object, and in a sense
+   behaves like near pointers.  This makes the trie vertices small and
+   reloctable, and removes the need for pointer chasing when releasing the
+   trie.
+
+   The union of near/far is essentially a short-vector optimization, switching
+   to a heap-allocated vector when necessary. This happens relatively rarely
+   (usually maxes out at 1-2%), and the vertices that have more than 2
+   sucessors also tend to have more than 4. The root vertex tends to use the
+   dynamic vector because the subpaths are recorded as the successors of the
+   root.
+
+   Conceptually, this is a small map from vertex-id -> index and the API is
+   modelled as such.  The insert and search functions are unrolled by hand when
+   using the small vector.  This has a noticable performance impact on insert
+   in particular, and is not too complex since we know we are limited to 2
+   elements.
+
+   Vertices are tagged with endofpath and inserted.  If endofpath is set, the
+   path from the root to this vertex is a complete path.  If inserted is set
+   then the vertex is a part of proper path (one given to insert) and not
+   generated as a suffix.  For example:
+
+   insert ([2 4 6])
+   insert ([9 7 2 4 6])
+   insert ([2 4 6 8])
+
+   The inserted flags for [2 4 6] are not cleared, because otherwise [2 4 6 8]
+   would be dropped when only following inserted vertices.  The endofpath flag
+   in [2 4 6] is cleared when the suffixes of [9 7 2 4 6] are inserted.
+
+   The node will be inserted into a vec, and should be trivial.  Instances
+   should be value-initialized to zero-intialized state.  */
+struct trie_node
+{
+  unsigned length () const
+  { return !heaped ? len : far.length (); }
+
+  const xpair *begin () const
+  { return !heaped ? near : far.begin (); }
+
+  const xpair *end () const
+  { return !heaped ? (near + len) : far.end(); }
+
+  /* Get the ith successor.  This is used for traversal and not lookup, and
+     should only be used by the iterator.  */
+  const xpair &at (unsigned i) const
+  { return !heaped ? near[i] : far[i]; }
+
+  const xpair *get (int key) const;
+  void put (int key, unsigned val);
+
+  unsigned near_lower_bound (int key) const;
+
+  /* Experimentally I found that using a union with 2 elements in the near
+     array to be faster than 4 or without the union (very slightly).  A lot of
+     trie vertices will be created, and vast majority of vertices will have 1
+     or 2 successors (straight line or if-then), and the cost of size and
+     copying adds up.  */
+  union
+  {
+    xpair near[2];
+    vec<xpair> far;
+  };
+  unsigned len : 8;
+  unsigned endofpath : 1;
+  unsigned inserted : 1;
+  unsigned heaped : 1;
+};
+
+/* Compare LHS.key < RHS.key, for use with vec.lower_bound.  */
+bool
+xpair_less (const xpair& lhs, const xpair& rhs)
+{
+  return lhs.key < rhs.key;
+}
+
+/* Compare LHS.key to RHS.key, for use with vec.bsearch.  */
+int
+xpair_cmp (const void *lhs, const void *rhs)
+{
+  return ((const xpair*)lhs)->key - ((const xpair*)rhs)->key;
+}
+
+/* Get a pointer to the element at KEY if it exists, otherwise NULL.  */
+const xpair*
+trie_node::get (int key) const
+{
+  if (!heaped)
+    {
+      if (len == 0) return NULL;
+      if (len >= 1 && key == near[0].key) return near + 0;
+      if (len >= 2 && key == near[1].key) return near + 1;
+      return NULL;
+    }
+  else
+    {
+      xpair kv;
+      kv.key = key;
+      // TODO: only switch to bsearch on a large threshold (> 32?)
+      return const_cast <vec<xpair>&> (far).bsearch (&kv, xpair_cmp);
+    }
+}
+
+/* Put ("emplace") VAL at KEY, extending the paths that pass through this
+   vertex.  This function assumes that KEY is not already a successor, and does
+   not perform this check.  get () should be called and checked for NULL
+   putting with this function.  Put maintains the order of the successors.  */
+void
+trie_node::put (int key, unsigned val)
+{
+  xpair kv;
+  kv.key = key;
+  kv.val = val;
+  if (!heaped)
+    {
+      const unsigned i = near_lower_bound (key);
+      if (len < 2)
+       {
+         near[1] = near[0];
+         near[i] = kv;
+         len += 1;
+       }
+      else
+       {
+         vec<xpair> xs {};
+         xs.reserve (13);
+         xs.quick_grow (3);
+         gcc_checking_assert (i <= 2);
+         if (i == 0)
+           {
+             xs[0] = kv;
+             xs[1] = near[0];
+             xs[2] = near[1];
+           }
+         else if (i == 1)
+           {
+             xs[0] = near[0];
+             xs[1] = kv;
+             xs[2] = near[1];
+           }
+         else
+           {
+             xs[0] = near[0];
+             xs[1] = near[1];
+             xs[2] = kv;
+           }
+
+         far = xs;
+         heaped = 1;
+       }
+    }
+  else
+    {
+      const unsigned i = far.lower_bound (kv, xpair_less);
+      far.safe_insert (i, kv);
+    }
+}
+
+/* Get the index to the last element that compares less than KEY, similar to
+   vec.lower_bound.  This assumes the near vector is active, and is for
+   internal use.  */
+unsigned
+trie_node::near_lower_bound (int key) const
+{
+  gcc_checking_assert (!heaped);
+  if (len == 0) return 0;
+  if (len >= 1 && key < near[0].key) return 0;
+  if (len >= 2 && key < near[1].key) return 1;
+  return len;
+}
+
+/* The trie is a major workhorse for this algorithm.  It has two key
+   properties - set-like subpath elimination and sorted output.
+
+   Many evaluated paths will be non-prime, that is, a sequence of vertices that
+   is also fully embedded in a longer sequence of vertices.  For example the
+   path [3 4 5 8] is a subpath of both [2 3 4 5 8] and [3 4 5 8 10].  The
+   insert_with_suffix function maintains this property so that inserting a
+   subpath into the trie is effectively a no-op, and inserting a superpath will
+   effectively remove (unmark) the subpath.  Sometimes it can be guaranteed
+   that no redundant (subpaths) will be generated, in which case the insert
+   function can be used.  The insert function is really only set insert, only
+   becoming a no-op for identical paths, which will be a lot faster.
+
+   Paths can be extracted with an iterator, which will output paths in
+   lexicographically sorted order.  This is an important property because the
+   index of a path in the sorted set will be used by the coverage to record
+   when a path is taken and completed.  The iterator has different behavior
+   than the standard C++ iterators, and to avoid mixups the interface is
+   deliberately different.  The iterator has a (large) stack which is not cheap
+   to copy, and if the stack is shallow copied it would mean iterator copies
+   have non-local effects.  */
+struct trie
+{
+  struct iter;
+  trie ();
+
+  bool insert (const vec<int>&);
+  bool insert (const array_slice<const int>);
+  bool hard_insert (const array_slice<const int>);
+  bool insert_with_suffix (const array_slice<const int>);
+  bool insert_suffix (const array_slice<const int>);
+
+  iter paths (vec<int>&) const;
+  iter paths (vec<int>&, int from) const;
+
+  vec<vec<int>> paths () const;
+  bool contains (array_slice<const int>) const;
+
+  void release ();
+  vec<trie_node> vertices;
+
+  /* An iterator for the paths of the trie.  The iterator yields all paths in
+     lexicographical order.  The iterator will be invalidated on any insertion
+     into the trie.  The iterator should not be constructed directly, but
+     through the paths functions on the trie.  It is essentially an explicit
+     stack depth-first traversal.
+
+     The iter fills a user-provided buffer which should only be read as a when
+     the iter is active.  Whenever next returns true the buffer is filled with
+     the current path.  Uses will generally look like this:
+
+     vec<int> path {};
+     auto iter = trie.paths (path);
+     while (iter.next ())
+       use_path (path);
+*/
+  struct iter
+  {
+    iter (vec<int>&, const vec<trie_node>&);
+    iter (int first, vec<int>& path, const vec<trie_node> &vertices);
+    ~iter ()
+    { stack.release (); }
+
+    bool next ();
+    bool next (int);
+    bool next (bool);
+
+
+    /* This is the analog of the stack frame when implementing a recursive
+       depth-first path traversal and collecting paths to the leafs:
+
+       for (auto successor : vertex[ix])
+        {
+          path.push (successor.value);
+          collect (successor.ix, successor.begin, successor.end, path)
+          path.pop ();
+        }
+
+       Using size_t + 2x unsigned helped make the frame more compact and faster
+       than pointers.  */
+    struct frame
+    {
+      /* The index of this frame's vertex, so that vertices[ix].  */
+      size_t ix;
+      /* The index of the current active successor of vertices[ix].  */
+      unsigned itr;
+      /* The end of vertices[ix] successors.  When itr == end, vertex[ix] is
+        exhausted.  */
+      unsigned end;
+    };
+
+    /* User provided buffer to fill with the paths.  */
+    vec<int> &path;
+    /* Direct reference to the trie vertices vector.  */
+    const vec<trie_node> &vertices;
+    /* The call stack.  */
+    vec<frame> stack;
+    /* Yield flag.  If this is true then next () is permitted to and return a
+       new value.  If this is false, a value has already been yielded and next
+       must first reset the state before building the next value.  */
+    bool yield = true;
+
+    /* Delete the copy constructor and assignment as they are wrong as they
+       would be entangled with the original buffer and cause bad bugs.  They
+       are not at all necessary for the iterator to work well.  */
+    iter (const iter&) = delete;
+    iter& operator = (const iter&) = delete;
+  };
+};
+
+/* Construct an iterator filling BUFFER.  */
+trie::iter
+trie::paths (vec<int> &buffer) const
+{
+  buffer.truncate (0);
+  return iter (buffer, vertices);
+}
+
+/* Construct an iterator filling BUFFER for paths starting at FROM.  */
+trie::iter
+trie::paths (vec<int>& buffer, int from) const
+{
+  buffer.truncate (0);
+  return iter (from, buffer, vertices);
+}
+
+/* Default construct a new path set.  */
+trie::trie () : vertices (vec<trie_node> {})
+{
+  vertices.safe_push (trie_node {});
+  vertices[0].inserted = true;
+}
+
+/* Release the memory used by this trie.  */
+void
+trie::release ()
+{
+  for (trie_node &v : vertices)
+    if (v.heaped)
+      v.far.release ();
+  vertices.release ();
+}
+
+/* Insert PATH into the trie.  */
+bool
+trie::insert (const vec<int>& path)
+{
+  return insert (array_slice <const int> (path));
+}
+
+/* Insert the PATH into the trie.  Duplicate entries will not be entered twice.
+   If PATH is a subpath of another path this will not be detected or if there
+   is a path previously inserted that is a subpath of PATH then this redundancy
+   will not be eliminated.  For that behavior, call insert_with_suffix.  */
+bool
+trie::insert (array_slice<const int> path)
+{
+  size_t index = 0;
+  for (size_t i = 0; i != path.size (); ++i)
+    {
+      trie_node &current = vertices[index];
+      const auto *xp = current.get (path[i]);
+      if (xp)
+       {
+         index = xp->val;
+         current.inserted = true;
+       }
+      else
+       {
+         current.endofpath = false;
+         current.inserted = true;
+         unsigned ix = vertices.length ();
+         vertices.safe_grow_cleared (ix + path.size () - i);
+
+         for (size_t k = i; k != path.size (); ++k)
+           {
+             vertices[index].put (path[k], ix);
+             vertices[ix].inserted = true;
+             index = ix;
+             ix += 1;
+           }
+
+         vertices[index].endofpath = true;
+         return true;
+       }
+    }
+
+  return false;
+}
+
+/* hard_insert is like insert, except it does not overwrite any endofpath
+   flags, and records the endofpath flag even when a superpath of PATH has been
+   inserted previously.  This effectively disables subpath elimination.  */
+bool
+trie::hard_insert (array_slice<const int> path)
+{
+  size_t index = 0;
+  for (size_t i = 0; i != path.size (); ++i)
+    {
+      trie_node &current = vertices[index];
+      const auto *xp = current.get (path[i]);
+      if (xp)
+       {
+         index = xp->val;
+         current.inserted = true;
+       }
+      else
+       {
+         current.inserted = true;
+         unsigned ix = vertices.length ();
+         vertices.safe_grow_cleared (ix + path.size () - i);
+
+         for (size_t k = i; k != path.size (); ++k)
+           {
+             vertices[index].put (path[k], ix);
+             vertices[ix].inserted = true;
+             index = ix;
+             ix += 1;
+           }
+
+         vertices[index].endofpath = true;
+         return true;
+       }
+    }
+
+  vertices[index].endofpath = true;
+  return false;
+}
+
+/* Insert a suffixes of PATH.  This is identical to insert except that it is
+   assumed that PATH is a subpath, and that the inserted path should clear the
+   inserted and endofpath flags.  This function should only be called by
+   insert_with_suffix.  */
+bool
+trie::insert_suffix (array_slice<const int> path)
+{
+  size_t index = 0;
+  for (size_t i = 0; i != path.size (); ++i)
+    {
+      trie_node &current = vertices[index];
+      const auto *xp = current.get (path[i]);
+      if (xp)
+       {
+         index = xp->val;
+         vertices[index].endofpath = false;
+       }
+      else
+       {
+         current.endofpath = false;
+         unsigned ix = vertices.length ();
+         vertices.safe_grow_cleared (ix + path.size () - i);
+
+         for (size_t k = i; k != path.size (); ++k)
+           {
+             vertices[index].put (path[k], ix);
+             index = ix;
+             ix += 1;
+           }
+
+         return true;
+       }
+    }
+
+  return false;
+}
+
+/* Insert PATH and all its subpaths into the trie.  This function implements
+   the redundancy property of the trie - if an inserted path is either a
+   subpath or superpath of some other path then only the longest will keep its
+   inserted flag.  */
+bool
+trie::insert_with_suffix (array_slice<const int> path)
+{
+  bool inserted = insert (path);
+  path = array_slice<const int> (path.begin () + 1, path.size () - 1);
+  while (inserted && !path.empty ())
+    {
+      inserted = insert_suffix (path);
+      path = array_slice<const int> (path.begin () + 1, path.size () - 1);
+    }
+  return inserted;
+}
+
+vec<vec<int>>
+trie::paths () const
+{
+  vec<int> path {};
+  vec<vec<int>> all {};
+  auto iter = paths (path);
+  while (iter.next ())
+    all.safe_push (path.copy ());
+  return all;
+}
+
+/* Check if the trie contains PATH.  */
+bool
+trie::contains (array_slice<const int> path) const
+{
+  size_t index = 0;
+  for (int id : path)
+    {
+      const trie_node &current = vertices[index];
+      if (!current.inserted)
+       return false;
+      const auto *xp = current.get (id);
+      if (!xp)
+       return false;
+      index = xp->val;
+    }
+  return vertices[index].inserted && vertices[index].endofpath;
+}
+
+/* Create an iterator over VERTICES with the caller-provided buffer PATH.  */
+trie::iter::iter (vec<int>& path, const vec<trie_node>& vertices)
+     : path (path), vertices (vertices), stack (vec<frame> {})
+{
+  gcc_checking_assert (!vertices.is_empty ());
+  stack.reserve (13);
+  frame f;
+  f.ix = 0;
+  f.itr = 0;
+  f.end = vertices[0].length ();
+  stack.quick_push (f);
+}
+
+/* Create an iterator over VERTICES with the caller-provided buffer PATH for
+   all paths and subpaths (suffixes) starting in FROM.  Note that FROM will not
+   be in the output buffer PATH, mainly because non-rooted paths are used when
+   splicing with paths that end in FROM.  */
+trie::iter::iter (int from, vec<int>& path, const vec<trie_node>& vertices)
+     : path (path), vertices (vertices), stack (vec<frame> {})
+{
+  gcc_checking_assert (!vertices.is_empty ());
+  stack.reserve (13);
+
+  auto *xp = vertices[0].get (from);
+  if (!xp)
+    {
+      /* No paths start with FROM, so construct an iterator where next ()
+        always returns false.  */
+      frame f;
+      f.ix = 0;
+      f.itr = 0;
+      f.end = 0;
+      stack.safe_push (f);
+      return;
+    }
+
+  frame f;
+  f.ix = xp->val;
+  f.itr = 0;
+  f.end = vertices[f.ix].length ();
+  stack.safe_push (f);
+}
+
+/* Find the next complete prime path in the trie and write it to the caller's
+   buffer.  Returns true if a path is written and false if the iterator is
+   exhausted, in which case no path is written and the contents of the buffer
+   is garbage.  */
+bool
+trie::iter::next ()
+{
+  while (true)
+    {
+      frame& top = stack.last ();
+      const trie_node &vertex = vertices[top.ix];
+
+      if (vertex.endofpath && yield
+         && (top.itr != top.end || vertex.length () == 0))
+       {
+         yield = false;
+         return true;
+       }
+
+      yield = true;
+
+      if (top.itr != top.end)
+       {
+         const xpair succ = vertex.at (top.itr);
+         const trie_node &next = vertices[succ.val];
+         top.itr++;
+
+         if (!next.inserted)
+           continue;
+
+         frame f {};
+         f.ix = succ.val;
+         f.itr = 0;
+         f.end = next.length ();
+         path.safe_push (succ.key);
+         stack.safe_push (f);
+       }
+      else
+       {
+         stack.pop ();
+         if (stack.is_empty ())
+           return false;
+         path.pop ();
+       }
+    }
+}
+
+/* Find the next path in the trie that would continue (but does not
+   include) LIMIT.  If the trie contains the paths [2 4 6 8 9] [2 4 6 8 10] and
+   [2 4 5 8], iter.next (8) would yield [2 4 6] and [2 4 5].  Returns true if a
+   path is written and false if the iterator is exhausted, in which case no
+   path is written and the contents of the buffer is garbage.  */
+bool
+trie::iter::next (int limit)
+{
+  while (true)
+    {
+      frame& top = stack.last ();
+      const trie_node &vertex = vertices[top.ix];
+
+      if (yield && top.itr != top.end)
+       {
+         const xpair succ = vertex.at (top.itr);
+         const trie_node &next = vertices[succ.val];
+         const int key = succ.key;
+         const int val = succ.val;
+         top.itr++;
+
+         if (!next.inserted)
+           continue;
+
+         if (key == limit)
+           {
+             if (path.is_empty ())
+               continue;
+             yield = false;
+             return true;
+           }
+
+         frame f {};
+         f.ix = val;
+         f.itr = 0;
+         f.end = next.length ();
+         path.safe_push (key);
+         stack.safe_push (f);
+       }
+      else
+       {
+         yield = true;
+         stack.pop ();
+         if (stack.is_empty ())
+           return false;
+         path.pop ();
+       }
+    }
+}
+
+/* Find the next path in among all paths including subpaths/suffixes.  This is
+   mainly useful in combination with trie.paths (from) for finding the paths
+   that go through some vertex.  */
+bool
+trie::iter::next (bool)
+{
+  while (true)
+    {
+      frame& top = stack.last ();
+      const trie_node &vertex = vertices[top.ix];
+
+      if (yield && vertex.length () == 0)
+       {
+         yield = false;
+         return true;
+       }
+
+      yield = true;
+
+      if (top.itr != top.end)
+       {
+         const xpair succ = vertex.at (top.itr);
+         const trie_node &next = vertices[succ.val];
+         top.itr++;
+
+         frame f {};
+         f.ix = succ.val;
+         f.itr = 0;
+         f.end = next.length ();
+         path.safe_push (succ.key);
+         stack.safe_push (f);
+       }
+      else
+       {
+         stack.pop ();
+         if (stack.is_empty ())
+           return false;
+         path.pop ();
+       }
+    }
+}
+
+/* Return the index of NEEDLE in HAYSTACK, or (size_t)-1 if not found.  */
+template <typename T>
+size_t
+index_of (T needle, const vec <T>& haystack)
+{
+  size_t len = haystack.length ();
+  for (size_t i = 0; i != len; ++i)
+    if (haystack[i] == needle)
+      return i;
+  return (size_t)-1;
+}
+
+/* Check if there is an edge in GRAPH from SRC to DST.  */
+bool
+edge_p (const struct graph *graph, int src, int dest)
+{
+  for (struct graph_edge *e = graph->vertices[src].succ; e; e = e->succ_next)
+    if (e->dest == dest)
+      return true;
+  return false;
+}
+
+/* Check if PATH is a cycle starting (and ending) with V.  */
+bool
+cycle_p (const vec<int>& path, int v)
+{
+  return path[0] == v && path[path.length ()-1] == v;
+}
+
+/* Duplicates must be removed, but there are usually just a few (the product of
+   entries and exits into a loop - multiple exits (break, returns) are common
+   but multiple entries are rarer (usually just goto)). It happens when two
+   prime paths diverge somehere after the slice [Ven..Vex]  */
+void
+scc_entry_exit_paths (const vec<vec<int>>& prime_paths, int entry, int exit,
+                     trie &trie)
+{
+  if (entry == exit)
+    {
+      trie.hard_insert (array_slice <const int> (&entry, 1));
+      return;
+    }
+
+  for (const vec<int>& path : prime_paths)
+    {
+      const size_t Ven = index_of (entry, path);
+      const size_t Vex = index_of (exit, path);
+
+      if (Ven == (size_t)-1 || Vex == (size_t)-1 || Vex <= Ven)
+       continue;
+
+      const size_t len = (Vex + 1) - Ven;
+      array_slice <const int> p (path.begin () + Ven, len);
+      trie.hard_insert (p);
+    }
+}
+
+/* Find the SCC exit paths, the incomplete simple paths that starts in a
+   non-entry vertex in the SCC, and ends in EXIT, and is not a cycle.
+   INTERNAL_PP are the internal prime paths for the SCC with EXIT as an exit
+   vertex.
+
+   Fazli claims the path must not be a subpath of another exit path in the SCC,
+   but this is only half true: see 004b.  Subpaths must survive if they end in
+   a different exit vertex than the superpath.
+*/
+void
+scc_exit_paths (const vec<vec<int>> &prime_paths, int exit, trie &out)
+{
+  trie trie;
+  for (const vec<int> &path : prime_paths)
+    {
+      const size_t Vex = index_of (exit, path);
+      if (Vex == (size_t)-1 || cycle_p (path, exit))
+       continue;
+      array_slice <const int> p (path.begin (), Vex + 1);
+      trie.insert_with_suffix (p);
+    }
+
+  /* TODO: splice () */
+  auto_vec<int> path {};
+  auto iter = trie.paths (path);
+  while (iter.next ())
+    out.hard_insert (path);
+}
+
+/* Find the SCC entry paths, the incomplete simple paths that start in the
+   entry vertex ENTRY and is not a cycle.  INTERNAL_PP are the internal prime
+   paths for the SCC with ENTRY as an entry vertex.  The paths are inserted
+   into TRIE.  */
+void
+scc_entry_paths (const vec<vec<int>> &internal_pp, int entry, trie &trie)
+{
+  for (const vec<int> &path : internal_pp)
+    {
+      const size_t Ven = index_of (entry, path);
+      if (Ven == (size_t)-1 || cycle_p (path, entry))
+       continue;
+      array_slice <const int> p (path.begin () + Ven, path.length () - Ven);
+      // TODO: does this need insert_with_suffix?
+      // TODO: does this need a !exit_p (p.back ()) check?
+      trie.insert (p);
+    }
+}
+
+/* Worker for cfg_complete_prime_paths.  ITR is the current id for the current
+   path.  END is end of the path so that when ITR == END, the walk is
+   completed.  CFG is the graph being analyzed.  SBMVEC is the SCC indexed
+   vector of bitmaps where nth bit is set if the nth vertex is in that SCC.
+   PATH is the vertices that make up this walk so far.  TRIE is the output trie
+   where paths are inserted.  SCC_ENEX_PATHS are the entry-exit paths found by
+   the scc_entry_exit_paths function.  */
+void
+cfg_complete_prime_paths1 (const int *itr, const int *end,
+                          const struct graph *cfg,
+                          const sbitmap *sbmvec,
+                          const vec<vec<vec<int>>> &scc_enex_paths,
+                          vec<int> &path, trie &trie)
+{
+  if (itr == end)
+    {
+      trie.insert_with_suffix (path);
+      return;
+    }
+
+  const auto pathlen = path.length ();
+  for (const vec<int> &enex : scc_enex_paths[*itr])
+    {
+      if (!edge_p (cfg, path.last (), enex[0]))
+       continue;
+
+      path.safe_splice (enex);
+      cfg_complete_prime_paths1 (itr + 1, end, cfg, sbmvec,
+                                scc_enex_paths, path, trie);
+      path.truncate (pathlen);
+    }
+}
+
+/* Find the complete prime paths of the CFG, the prime paths that start in the
+   entry vertex and end in the exit vertex.  SMBVEC is an sbitmap vector where
+   SBMVEC[i] is the ith SCC, and the nth in SBMVEC[i] is set if vertices[n] is
+   in SCC[i].  */
+trie
+cfg_complete_prime_paths (const struct graph *cfg,
+                         const sbitmap *sbmvec,
+                         const vec<trie> &scc_entry_exit_paths,
+                         const trie &ccfg_prime_paths)
+{
+  trie trie;
+  auto_vec<int, 16> path {};
+  auto_vec<int, 16> cfgpp {};
+  auto_vec<vec<vec<int>>, 8> scc_enex (scc_entry_exit_paths.length ());
+
+  for (size_t i = 0; i != scc_entry_exit_paths.length (); ++i)
+    {
+      scc_enex.quick_push (vec<vec<int>> {});
+      auto iter = scc_entry_exit_paths[i].paths (path);
+      while (iter.next ())
+       scc_enex[i].safe_push (path.copy ());
+    }
+  path.truncate (0);
+
+  auto iter = ccfg_prime_paths.paths (cfgpp);
+  while (iter.next ())
+    {
+      for (const vec<int> &enex : scc_enex[cfgpp[0]])
+       {
+         path.truncate (0);
+         path.safe_splice (enex);
+         cfg_complete_prime_paths1 (cfgpp.begin () + 1, cfgpp.end (), cfg,
+                                    sbmvec, scc_enex, path, trie);
+       }
+    }
+
+  for (vec<vec<int>> &v : scc_enex)
+    release_vec_vec (v);
+  return trie;
+}
+
+/* Find the SCC exit prime paths, the prime paths that start from a strongly
+   connected component and end in the end vertex. SCCS is a vector where
+   SCCS[i] = SCC (vertex_i) so that if vertex[2].component == 5 then SCCS[2] ==
+   5.  SCC_EXIT_PATHS is the output of scc_exit_paths ().  COMPLETE_PRIME_PATHS
+   is the output of cfg_complete_prime_paths ().  */
+trie
+scc_exit_prime_paths (const vec<int> &sccs,
+                     const trie &scc_exit_paths,
+                     const trie &complete_prime_paths)
+{
+  trie trie;
+  auto_vec<int, 8> path {};
+  auto_vec<int, 8> r {};
+  auto_vec<int, 8> q {};
+
+  auto exiter = scc_exit_paths.paths (q);
+  while (exiter.next ())
+    {
+      const int Vex = q.last ();
+      auto iter = complete_prime_paths.paths (r, Vex);
+      while (iter.next (true))
+       {
+         /* There could be multiple Vex in the SCC. Even if
+            scc_exit_paths did not kill the subpaths, this trie probably
+            would. It can be assumed that all vertices in q are in the same
+            SCC.
+
+            This is an important note, as the Fazli and Afsharchi paper does
+            not properly capture this subtlety.
+
+            Could take the populated vec OR the cfg and use helper function.  
*/
+         auto p0 = Vex;
+         auto p1 = r[0];
+
+         if (sccs[p0] == sccs[p1])
+           continue;
+
+         path.truncate (0);
+         path.reserve (q.length () + r.length ());
+         path.splice (q);
+         path.splice (r);
+         /* This can probably insert without subpath elimination because:
+            1. Conflicts are *really* rare (see patmatch in tree.c), but they
+               do happen.
+            2. The output of this function is "filtered" through another trie
+               anyway so the redundant paths generated here will be
+               eliminated in the consumers at a very low extra cost.  */
+         trie.insert (path);
+       }
+    }
+
+  return trie;
+}
+
+/* Check if PATH in CFG enters the VERTEX's SCC through VERTEX.  */
+bool
+enters_through_p (const struct graph *cfg, const vec<int> &path, int vertex)
+{
+  gcc_checking_assert (!path.is_empty ());
+  const int last = path.address()[path.length ()-1];
+  if (cfg->vertices[last].component == cfg->vertices[vertex].component)
+    return false;
+  return edge_p (cfg, last, vertex);
+};
+
+/* Worker for scc_entry_prime_paths.  */
+void
+scc_entry_prime_paths1 (const struct graph *cfg, const trie
+                       &scc_entry_paths, const trie &prime_paths,
+                       trie &trie, vec<int> &buffer)
+{
+  auto_vec<int, 8> p {};
+  auto_vec<int, 8> q {};
+  auto itr = scc_entry_paths.paths (q);
+  while (itr.next ())
+    {
+      int Ven = q[0];
+      /* TODO: This might benefit from a reversed trie lookup.  */
+      auto xitr = prime_paths.paths (p);
+      while (xitr.next (Ven))
+       {
+         if (!enters_through_p (cfg, p, Ven))
+           continue;
+
+         buffer.truncate (0);
+         buffer.reserve (p.length () + q.length ());
+         buffer.splice (p);
+         buffer.splice (q);
+         trie.insert_with_suffix (buffer);
+       }
+    }
+}
+
+/* Find the entry prime paths - the prime paths that start in the root and end
+   in a strongly connected component.  */
+void
+scc_entry_prime_paths (const struct graph *cfg,
+                      const trie &scc_entry_paths,
+                      const trie &complete_prime_paths,
+                      const trie &exit_prime_paths,
+                      trie &trie)
+{
+  auto_vec<int> buffer {};
+  scc_entry_prime_paths1 (cfg, scc_entry_paths, complete_prime_paths, trie, 
buffer);
+  scc_entry_prime_paths1 (cfg, scc_entry_paths, exit_prime_paths, trie, 
buffer);
+}
+
+/* Build a new control flow graph from the strongly connected components, so
+   that every node in the CCFG is a strongly connected component in the
+   original CFG.  NSSC is the number of vertices in the new graph, and the
+   return value of graphds_ssc.  */
+struct graph*
+build_ccfg (struct graph *cfg, int nscc)
+{
+  struct graph *ccfg = new_graph (nscc);
+  for (int i = 0; i != cfg->n_vertices; ++i)
+    {
+      struct vertex v = cfg->vertices[i];
+      for (struct graph_edge *e = v.succ; e; e = e->succ_next)
+       {
+         int src = v.component;
+         int dest = cfg->vertices[e->dest].component;
+         if (src != dest && !edge_p (ccfg, src, dest))
+           add_edge (ccfg, src, dest);
+       }
+    }
+
+  return ccfg;
+}
+
+/* Create a new graph object from CFG where the edges between strongly
+   connected components removed.  */
+struct graph*
+disconnect_sccs (struct graph *cfg)
+{
+  struct graph *ccfg = new_graph (cfg->n_vertices);
+  const struct vertex *vertices = cfg->vertices;
+  for (int i = 0; i != cfg->n_vertices; ++i)
+    {
+      ccfg->vertices[i].data = &cfg->vertices[i];
+      for (struct graph_edge *e = vertices[i].succ; e; e = e->succ_next)
+       if (vertices[e->src].component == vertices[e->dest].component)
+         add_edge (ccfg, e->src, e->dest)->data = e;
+    }
+  return ccfg;
+}
+
+/* Check if vertex i is the entry vertex of a strongly connected component.  It
+   is an entry vertex if either: there are no predecessors (i.e. the root
+   vertex is always an entry vertex) or if a predecessor belongs to a
+   different SCC.  */
+static bool
+scc_entry_vertex_p (struct graph *cfg, size_t i)
+{
+  if (!cfg->vertices[i].pred)
+    return true;
+  const int scc = cfg->vertices[i].component;
+  for (struct graph_edge *e = cfg->vertices[i].pred; e; e = e->pred_next)
+    if (cfg->vertices[e->src].component != scc)
+      return true;
+  return false;
+}
+
+/* Check if vertex i is the exit vertex of a strongly connected component.  It
+   is an exit vertex if either: there are no successors (i.e. the sink is
+   always an exit vertex) or if a successor belongs to a different SCC.  */
+static bool
+scc_exit_vertex_p (struct graph *cfg, size_t i)
+{
+  if (!cfg->vertices[i].succ)
+    return true;
+  const int scc = cfg->vertices[i].component;
+  for (struct graph_edge *e = cfg->vertices[i].succ; e; e = e->succ_next)
+    if (cfg->vertices[e->dest].component != scc)
+      return true;
+  return false;
+}
+
+/* Find all the simple paths in CFG starting at NODE and insert into TRIE.
+   This is a DFS where the search stops when entering a vertex already in SEEN.
+   PATH is path taken from the from the root to NODE.  */
+void
+simple_paths1 (const struct graph *cfg, int node, sbitmap seen, vec<int> &path,
+              trie &trie)
+{
+  if (!bitmap_set_bit (seen, node))
+    {
+      if (path[0] == node)
+       path.quick_push (node);
+      trie.insert (path);
+      if (path[0] == node)
+       path.pop ();
+      return;
+    }
+  path.quick_push (node);
+
+  struct graph_edge *succs = cfg->vertices[node].succ;
+  if (!succs)
+    {
+      trie.insert (path);
+      bitmap_clear_bit (seen, node);
+      path.pop ();
+      return;
+    }
+
+  for (struct graph_edge *e = succs; e; e = e->succ_next)
+    simple_paths1 (cfg, e->dest, seen, path, trie);
+
+  bitmap_clear_bit (seen, node);
+  path.pop ();
+}
+
+/* Find all the simple paths in CFG starting at ROOT and insert into TRIE.  A
+   simple path is a sequence of vertices without any duplicated vertices (i.e.
+   no loops).  SEEN should be an sbitmap of CFG->n_vertices size.  PATH and
+   SEEN will be cleared upon function entry and provide buffer reuse.  */
+void
+simple_paths (struct graph *cfg, int root, sbitmap seen, vec<int> &path,
+             trie &trie)
+{
+  bitmap_clear (seen);
+  path.reserve (cfg->n_vertices);
+  path.truncate (0);
+  simple_paths1 (cfg, root, seen, path, trie);
+}
+
+/* Find all the simple paths in CFG starting at ROOT.  */
+trie
+simple_paths (struct graph *cfg, int root = 0)
+{
+  trie trie;
+  auto_sbitmap seen (cfg->n_vertices);
+  auto_vec<int> path {};
+  simple_paths (cfg, root, seen, path, trie);
+  return trie;
+}
+
+/* This struct is just a collection of reusable buffers which drastically
+   reduces allocation pressure.  This has a big impact on the performance of
+   the function.  As a bonus it can handle tidying up as well.  */
+struct pp_buffers
+{
+  pp_buffers (struct graph *cfg) : seen (cfg->n_vertices) {}
+  auto_sbitmap seen;
+  auto_vec<int, 8> path;
+};
+
+} // namespace
+
+/* Fazli & Afsarchi compositional prime paths generation.  */
+vec<vec<int>>
+prime_paths (struct graph *cfg)
+{
+    vec<vec<int>> paths {};
+    const int nscc = graphds_scc (cfg, NULL);
+    struct graph *disconnected = disconnect_sccs (cfg);
+    struct graph *ccfg = build_ccfg (cfg, nscc);
+
+    pp_buffers ctx (cfg);
+
+    /* Make a mapping vertex -> SCC that vertex is a part of.  This could
+       already be looked up directly on the cfg->vertices, but that interface
+       is slightly more clunky when passed down to helpers.
+       TODO: array slice? */
+    auto_vec<int, 8> scc_map (cfg->n_vertices);
+    scc_map.safe_grow (cfg->n_vertices);
+    /* Store an SCC-ID -> vertices mapping to quickly find the vertices that
+       make up a strongly connected component.  This is the inverse of 
scc_map.  */
+    sbitmap *sbmvec = sbitmap_vector_alloc (ccfg->n_vertices, cfg->n_vertices);
+    bitmap_vector_clear (sbmvec, ccfg->n_vertices);
+
+    for (int i = 0; i != cfg->n_vertices; ++i)
+      {
+        int scc_id = cfg->vertices[i].component;
+        scc_map[i] = scc_id;
+        bitmap_set_bit (sbmvec[scc_id], i);
+      }
+
+    trie prime_paths;
+    vec<vec<vec<int>>> scc_internal_pp {};
+    scc_internal_pp.safe_grow (nscc);
+    for (int i = 0; i != nscc; ++i)
+      {
+       unsigned V;
+       trie internal_pp;
+       sbitmap_iterator bi;
+       EXECUTE_IF_SET_IN_BITMAP (sbmvec[i], 0, V, bi)
+         simple_paths (disconnected, V, ctx.seen, ctx.path, internal_pp);
+
+       scc_internal_pp[i] = internal_pp.paths ();
+       for (const auto &pp : scc_internal_pp[i])
+         prime_paths.insert_with_suffix (pp);
+       internal_pp.release ();
+      }
+
+    auto_vec<trie, 8> scc_enex_paths (nscc);
+    scc_enex_paths.safe_grow_cleared (nscc);
+    trie scc_en_paths;
+    trie scc_ex_paths;
+
+    for (int i = 0; i != ccfg->n_vertices; ++i)
+      {
+       unsigned int Ven, Vex;
+       sbitmap_iterator bien, biex;
+       EXECUTE_IF_SET_IN_BITMAP (sbmvec[i], 0, Ven, bien)
+         {
+           if (!scc_entry_vertex_p (cfg, Ven))
+             continue;
+
+           EXECUTE_IF_SET_IN_BITMAP (sbmvec[i], 0, Vex, biex)
+             {
+               if (!scc_exit_vertex_p (cfg, Vex))
+                 continue;
+               scc_entry_exit_paths (scc_internal_pp[i], Ven, Vex,
+                                     scc_enex_paths[i]);
+             }
+         }
+      }
+
+    for (int i = 0; i != cfg->n_vertices; ++i)
+      {
+       const int scc = scc_map[i];
+       if (scc_entry_vertex_p (cfg, i))
+         scc_entry_paths (scc_internal_pp[scc], i, scc_en_paths);
+
+       if (scc_exit_vertex_p (cfg, i))
+         scc_exit_paths (scc_internal_pp[scc], i, scc_ex_paths);
+      }
+    /* TODO: pass this trie directly, dont go via vec-vec */
+    trie ccfg_prime_paths = simple_paths (ccfg, ccfg->n_vertices - 1);
+    trie complete_prime_paths = cfg_complete_prime_paths (cfg, sbmvec,
+                                                         scc_enex_paths,
+                                                         ccfg_prime_paths);
+    trie exit_prime_paths = scc_exit_prime_paths (scc_map, scc_ex_paths,
+                                                 complete_prime_paths);
+    scc_entry_prime_paths (cfg, scc_en_paths, complete_prime_paths,
+                          exit_prime_paths, prime_paths);
+
+
+    // TODO: trie.merge
+    auto_vec<int> p {};
+    auto epp = exit_prime_paths.paths (p);
+    auto cpp = complete_prime_paths.paths (p);
+
+    while (epp.next ())
+      prime_paths.insert_with_suffix (p);
+    while (cpp.next ())
+      prime_paths.insert_with_suffix (p);
+
+    paths = prime_paths.paths ();
+
+    for (trie &p : scc_enex_paths)
+      p.release ();
+    scc_en_paths.release ();
+    ccfg_prime_paths.release ();
+    scc_ex_paths.release ();
+    prime_paths.release ();
+    release_vec_vec (scc_internal_pp);
+    sbitmap_vector_free (sbmvec);
+    free_graph (ccfg);
+    free_graph (disconnected);
+    return paths;
+}
+
+#if CHECKING_P
+
+static int
+lexicographical_cmp (const void *vlhs, const void *vrhs)
+{
+    const vec<int> *lhs = (const vec<int>*)vlhs;
+    const vec<int> *rhs = (const vec<int>*)vrhs;
+
+    const int *first1 = lhs->begin ();
+    const int *first2 = rhs->begin ();
+    const int *last1 = lhs->end ();
+    const int *last2 = rhs->end ();
+
+    for (; (first1 != last1) && (first2 != last2); ++first1, ++first2)
+    {
+       if (*first1 < *first2)
+           return -1;
+       if (*first2 < *first1)
+           return 1;
+    }
+
+    if (first1 == last1 && first2 != last2)
+       return -1;
+    if (first1 == last1 && first2 == last2)
+       return 0;
+    if (first1 != last1 && first2 == last2)
+       return 1;
+
+    gcc_assert (false);
+}
+
+static void
+prime_paths1 (struct graph *graph, int entry, sbitmap visited,
+             vec<int> &path, vec<vec<int>> &paths)
+{
+  if (!bitmap_set_bit (visited, entry))
+    {
+      vec<int> *last = paths.safe_push (path.copy ());
+      if (entry == path[0])
+       last->safe_push (entry);
+      return;
+    }
+  path.safe_push (entry);
+  struct vertex *vertices = graph->vertices;
+
+  if (!vertices[entry].succ)
+    {
+      paths.safe_push (path.copy ());
+      bitmap_clear_bit (visited, entry);
+      path.pop ();
+      return;
+    }
+
+  for (struct graph_edge *e = vertices[entry].succ; e; e = e->succ_next)
+    prime_paths1 (graph, e->dest, visited, path, paths);
+
+  bitmap_clear_bit (visited, entry);
+  path.pop ();
+}
+
+static bool
+subsequence_p (array_slice<const int> needle, array_slice<const int> haystack)
+{
+  /* Only pick true subsequences; equal sequences are NOT considered 
subsequences */
+  if (needle.size () > haystack.size ())
+    return false;
+
+  size_t fst = 0;
+  for (; fst != haystack.size () - needle.size (); ++fst)
+    if (haystack[fst] == needle[0])
+      break;
+
+  if (fst + needle.size () > haystack.size ())
+    return false;
+
+  for (size_t i = 0; i != needle.size (); ++i)
+    if (needle[i] != haystack[fst+i])
+      return false;
+
+  return true;
+}
+
+static bool
+subsequence_p (const vec<int> &needle, const vec<int> &haystack)
+{
+  return subsequence_p (array_slice <const int> (needle),
+                       array_slice <const int> (haystack));
+}
+
+static bool
+equal_p (array_slice<const int> lhs, array_slice<const int> rhs)
+{
+  if (lhs.size () != rhs.size ())
+    return false;
+
+  size_t length = lhs.size();
+  for (size_t i = 0; i != length; ++i)
+    if (lhs[i] != rhs[i])
+      return false;
+  return true;
+}
+
+static void
+remove_duplicates (vec<vec<int>> &paths)
+{
+  paths.qsort (lexicographical_cmp);
+  auto cmp = [](const vec<int> &lhs, const vec<int> &rhs)
+    {
+      return equal_p (array_slice<const int> (lhs),
+                     array_slice<const int> (rhs));
+    };
+  auto it = std::unique (paths.begin (), paths.end (), cmp);
+  paths.truncate (it - paths.begin ());
+}
+
+static vec<vec<int>>
+reference_prime_paths (struct graph *g)
+{
+  auto_vec<int> path {};
+  vec<vec<int>> paths {};
+  auto_sbitmap visited (g->n_vertices);
+  bitmap_clear (visited);
+
+  for (int i = 0; i != g->n_vertices; i++)
+    {
+      gcc_assert (bitmap_empty_p (visited));
+      prime_paths1 (g, i, visited, path, paths);
+    }
+
+  for (size_t i = 0; i != paths.length (); ++i)
+    {
+      vec<int> &current = paths[i];
+      bool subseq = false;
+      for (size_t k = 0; k != paths.length (); ++k)
+       {
+         if (k != i)
+           subseq = subseq || subsequence_p (current, paths[k]);
+       }
+
+      if (subseq)
+       paths.unordered_remove (i--);
+    }
+
+  remove_duplicates (paths);
+  return paths;
+}
+
+vec<vec<int>>
+naive_prime_paths (struct graph *g)
+{
+  return reference_prime_paths (g);
+}
+
+namespace selftest
+{
+
+static bool
+any_equal_p (const array_slice<const int> &needle,
+            const vec<vec<int>> &haystack)
+{
+  for (const vec<int> &x : haystack)
+    if (equal_p (needle, array_slice <const int> (x)))
+      return true;
+  return false;
+}
+
+#define SLICE(a) array_slice <const int> ((a), (sizeof (a) / sizeof (a)[0]))
+
+static size_t
+count (const trie &trie)
+{
+  size_t n = 0;
+  auto_vec<int> path {};
+  auto iter = trie.paths (path);
+  while (iter.next ())
+    n += 1;
+  return n;
+}
+
+static vec<vec<int>>
+simple_paths (struct graph *cfg, trie &trie, int root = 0)
+{
+  auto_sbitmap seen (cfg->n_vertices);
+  auto_vec<int> path;
+  simple_paths (cfg, root, seen, path, trie);
+  return trie.paths ();
+}
+
+/* Create a CFG that roughly corresponds to this program:
+
+int binary_search(int a[], int len, int from, int to, int key)
+{
+    int low = from;
+    int high = to - 1;
+
+    while (low <= high)
+    {
+       int mid = (low + high) >> 1;
+       long midVal = a[mid];
+
+       if (midVal < key)
+           low = mid + 1;
+       else if (midVal > key)
+           high = mid - 1;
+       else
+           return mid; // key found
+    }
+    return -1;
+}
+
+*/
+static struct graph*
+binary_search_cfg ()
+{
+    struct graph *g = new_graph (11);
+    add_edge (g, 0, 1);
+    add_edge (g, 1, 2);
+    add_edge (g, 2, 3);
+    add_edge (g, 2, 4);
+    add_edge (g, 3, 10);
+    add_edge (g, 4, 5);
+    add_edge (g, 4, 6);
+    add_edge (g, 5, 7);
+    add_edge (g, 6, 8);
+    add_edge (g, 6, 9);
+    add_edge (g, 7, 2);
+    add_edge (g, 8, 10);
+    add_edge (g, 9, 7);
+    graphds_scc (g, NULL);
+    return g;
+}
+
+/* Test a full run of the algorithm against a known graph (binary-search).  */
+static void
+test_prime_paths ()
+{
+    struct graph *g = binary_search_cfg ();
+    vec<vec<int>> paths = prime_paths (g);
+    const int p01[] = { 0, 1, 2, 3, 10 };
+    const int p02[] = { 0, 1, 2, 4, 6, 8, 10 };
+    const int p03[] = { 5, 7, 2, 4, 6, 9 };
+    const int p04[] = { 4, 6, 9, 7, 2, 4 };
+    const int p05[] = { 2, 4, 6, 9, 7, 2 };
+    const int p06[] = { 6, 9, 7, 2, 4, 6 };
+    const int p07[] = { 9, 7, 2, 4, 6, 9 };
+    const int p08[] = { 7, 2, 4, 6, 9, 7 };
+    const int p09[] = { 6, 9, 7, 2, 4, 5 };
+    const int p10[] = { 4, 5, 7, 2, 4 };
+    const int p11[] = { 2, 4, 5, 7, 2 };
+    const int p12[] = { 5, 7, 2, 4, 5 };
+    const int p13[] = { 7, 2, 4, 5, 7 };
+    const int p14[] = { 4, 6, 9, 7, 2, 3, 10 };
+    const int p15[] = { 5, 7, 2, 4, 6, 8, 10 };
+    const int p16[] = { 9, 7, 2, 4, 6, 8, 10 };
+    const int p17[] = { 4, 5, 7, 2, 3, 10 };
+    const int p18[] = { 0, 1, 2, 4, 6, 9, 7 };
+    const int p19[] = { 0, 1, 2, 4, 5, 7 };
+
+    ASSERT_EQ (paths.length (), 19);
+    ASSERT_TRUE (any_equal_p (SLICE (p01), paths));
+    ASSERT_TRUE (any_equal_p (SLICE (p02), paths));
+    ASSERT_TRUE (any_equal_p (SLICE (p03), paths));
+    ASSERT_TRUE (any_equal_p (SLICE (p04), paths));
+    ASSERT_TRUE (any_equal_p (SLICE (p05), paths));
+    ASSERT_TRUE (any_equal_p (SLICE (p06), paths));
+    ASSERT_TRUE (any_equal_p (SLICE (p07), paths));
+    ASSERT_TRUE (any_equal_p (SLICE (p08), paths));
+    ASSERT_TRUE (any_equal_p (SLICE (p09), paths));
+    ASSERT_TRUE (any_equal_p (SLICE (p10), paths));
+    ASSERT_TRUE (any_equal_p (SLICE (p11), paths));
+    ASSERT_TRUE (any_equal_p (SLICE (p12), paths));
+    ASSERT_TRUE (any_equal_p (SLICE (p13), paths));
+    ASSERT_TRUE (any_equal_p (SLICE (p14), paths));
+    ASSERT_TRUE (any_equal_p (SLICE (p15), paths));
+    ASSERT_TRUE (any_equal_p (SLICE (p16), paths));
+    ASSERT_TRUE (any_equal_p (SLICE (p17), paths));
+    ASSERT_TRUE (any_equal_p (SLICE (p18), paths));
+    ASSERT_TRUE (any_equal_p (SLICE (p19), paths));
+
+    free_graph (g);
+    release_vec_vec (paths);
+}
+
+/* The strongly connected component graph for binary_search looks like
+    this, using the vertex numbers from the original graph:
+
+    START
+      |
+      1
+      |
+      2 (SCC)
+     / \
+    3   8
+     \ /
+     END
+
+  The components gets renumbered by graphds_scc, so the ccfg looks like
+  this.  The actual numbers don't matter as long as the structure of the
+  graph is preserved, and this test is now sensitive to the numbering set
+  by graphds_scc.  It does not have to be - if that function should reverse
+  the numbering this test must be updated.
+
+      5
+      |
+      4
+      |
+      3 (SCC)
+     / \
+    2   1
+     \ /
+      0
+*/
+static void
+test_build_ccfg ()
+{
+  struct graph *cfg = binary_search_cfg ();
+  const int nscc = graphds_scc (cfg, NULL);
+  struct graph *ccfg = build_ccfg (cfg, nscc);
+  ASSERT_EQ (6, nscc);
+
+  ASSERT_TRUE (edge_p (ccfg, 5, 4));
+  ASSERT_TRUE (edge_p (ccfg, 4, 3));
+  ASSERT_TRUE (edge_p (ccfg, 3, 2));
+  ASSERT_TRUE (edge_p (ccfg, 3, 1));
+  ASSERT_TRUE (edge_p (ccfg, 2, 0));
+  ASSERT_TRUE (edge_p (ccfg, 1, 0));
+
+  free_graph (cfg);
+  free_graph (ccfg);
+}
+
+/* This test checks some basic assumptions on finding the strongly connected
+   components and disconnecting the graph by removing all edges between SCCs.
+   Creating a single auxillary graph simplifies the bookkeeping.  */
+static void
+test_split_components ()
+{
+  struct graph *cfg = binary_search_cfg ();
+  int nscc = graphds_scc (cfg, NULL);
+  struct graph *ccfg = disconnect_sccs (cfg);
+
+  vec<vec<int>> entries {};
+  vec<vec<int>> exits {};
+  entries.safe_grow_cleared (nscc);
+  exits.safe_grow_cleared (nscc);
+
+  for (int i = 0; i != cfg->n_vertices; ++i)
+    {
+      if (scc_entry_vertex_p (cfg, i))
+       entries[cfg->vertices[i].component].safe_push (i);
+      if (scc_exit_vertex_p (cfg, i))
+       exits[cfg->vertices[i].component].safe_push (i);
+    }
+
+  const int p10[] = { 10 };
+  const int p08[] = { 8 };
+  const int p03[] = { 3 };
+  const int p02[] = { 2 };
+  const int p01[] = { 1 };
+  const int p00[] = { 0 };
+  const int p26[] = { 2, 6 };
+
+  ASSERT_EQ (entries.length (), 6);
+  ASSERT_TRUE (any_equal_p (SLICE (p10), entries));
+  ASSERT_TRUE (any_equal_p (SLICE (p08), entries));
+  ASSERT_TRUE (any_equal_p (SLICE (p03), entries));
+  ASSERT_TRUE (any_equal_p (SLICE (p02), entries));
+  ASSERT_TRUE (any_equal_p (SLICE (p01), entries));
+  ASSERT_TRUE (any_equal_p (SLICE (p00), entries));
+
+  ASSERT_EQ (exits.length (), 6);
+  ASSERT_TRUE (any_equal_p (SLICE (p10), exits));
+  ASSERT_TRUE (any_equal_p (SLICE (p08), exits));
+  ASSERT_TRUE (any_equal_p (SLICE (p03), exits));
+  ASSERT_TRUE (any_equal_p (SLICE (p26), exits));
+  ASSERT_TRUE (any_equal_p (SLICE (p01), exits));
+  ASSERT_TRUE (any_equal_p (SLICE (p00), exits));
+
+  /* The result of disconnect_sccs () disconnects the graph into its strongly
+     connected components. The subgraphs are either singletons (a single
+     vertex with no edges) or graphs with cycles.  The SCC internal prime
+     paths can be found by running a DFS from every SCC vertex, terminating
+     on a duplicated vertex.  This may create some redundant paths still,
+     which must be filtered out.
+
+     Singletons can either be detected and skipped (requires counting the
+     components) or filtered after.  For this test case they are skipped
+     because other graph inconsistencies are easier to detect.  */
+
+  /* Count and check singleton components.  */
+  vec<int> scc_size {};
+  scc_size.safe_grow_cleared (nscc);
+  for (int i = 0; i != cfg->n_vertices; ++i)
+    scc_size[cfg->vertices[i].component]++;
+  ASSERT_EQ (nscc, 6);
+  ASSERT_EQ (scc_size[0], 1);
+  ASSERT_EQ (scc_size[1], 1);
+  ASSERT_EQ (scc_size[2], 1);
+  ASSERT_EQ (scc_size[3], 6);
+  ASSERT_EQ (scc_size[4], 1);
+  ASSERT_EQ (scc_size[5], 1);
+
+  /* Manually unroll the loop finding the simple paths starting at the
+     vertices in the SCCs.  In this case there is only the one SCC.  */
+  trie ccfg_paths;
+  simple_paths (ccfg, ccfg_paths, 2);
+  simple_paths (ccfg, ccfg_paths, 4);
+  simple_paths (ccfg, ccfg_paths, 5);
+  simple_paths (ccfg, ccfg_paths, 6);
+  simple_paths (ccfg, ccfg_paths, 7);
+  simple_paths (ccfg, ccfg_paths, 9);
+  /* then in+out of trie */
+  vec<vec<int>> xscc_internal_pp = ccfg_paths.paths ();
+  trie scc_internal_pp;
+  for (auto &p : xscc_internal_pp)
+    scc_internal_pp.insert_with_suffix (p);
+
+  const int pp01[] = { 5, 7, 2, 4, 6, 9 };
+  const int pp02[] = { 4, 5, 7, 2, 4 };
+  const int pp03[] = { 4, 6, 9, 7, 2, 4 };
+  const int pp04[] = { 2, 4, 5, 7, 2 };
+  const int pp05[] = { 2, 4, 6, 9, 7, 2 };
+  const int pp06[] = { 5, 7, 2, 4, 5 };
+  const int pp07[] = { 6, 9, 7, 2, 4, 6 };
+  const int pp08[] = { 7, 2, 4, 5, 7 };
+  const int pp09[] = { 9, 7, 2, 4, 6, 9 };
+  const int pp10[] = { 7, 2, 4, 6, 9, 7 };
+  const int pp11[] = { 6, 9, 7, 2, 4, 5 };
+
+  ASSERT_EQ (count (scc_internal_pp), 11);
+  ASSERT_TRUE (scc_internal_pp.contains (pp01));
+  ASSERT_TRUE (scc_internal_pp.contains (pp02));
+  ASSERT_TRUE (scc_internal_pp.contains (pp03));
+  ASSERT_TRUE (scc_internal_pp.contains (pp04));
+  ASSERT_TRUE (scc_internal_pp.contains (pp05));
+  ASSERT_TRUE (scc_internal_pp.contains (pp06));
+  ASSERT_TRUE (scc_internal_pp.contains (pp07));
+  ASSERT_TRUE (scc_internal_pp.contains (pp08));
+  ASSERT_TRUE (scc_internal_pp.contains (pp09));
+  ASSERT_TRUE (scc_internal_pp.contains (pp10));
+  ASSERT_TRUE (scc_internal_pp.contains (pp11));
+
+  free_graph (ccfg);
+  free_graph (cfg);
+}
+
+/* The remaining tests deconstructs the algorithm and runs only a single phase
+   with good inputs at that point.  This makes it easier to pinpoint where
+   things go wrong, and helps show in steps how the algorithm works and the
+   phases relate.
+
+   The phases and their inputs and outputs are bazed on Fazli & Afsarchi.  */
+
+static void
+test_scc_internal_prime_paths ()
+{
+  /* This graph has only the SCC subgraph.  The result of running prime-paths
+     on it should be the SCC internal prime paths of the full graph.  */
+  struct graph *scc = new_graph (11);
+  add_edge (scc, 0, 2);
+  add_edge (scc, 2, 4);
+  add_edge (scc, 4, 5);
+  add_edge (scc, 4, 6);
+  add_edge (scc, 5, 7);
+  add_edge (scc, 6, 9);
+  add_edge (scc, 9, 7);
+  add_edge (scc, 7, 2);
+
+  vec<vec<int>> paths = prime_paths (scc);
+  const int p01[] = { 5, 7, 2, 4, 6, 9 };
+  const int p02[] = { 4, 6, 9, 7, 2, 4 };
+  const int p03[] = { 2, 4, 6, 9, 7, 2 };
+  const int p04[] = { 6, 9, 7, 2, 4, 6 };
+  const int p05[] = { 9, 7, 2, 4, 6, 9 };
+  const int p06[] = { 7, 2, 4, 6, 9, 7 };
+  const int p07[] = { 6, 9, 7, 2, 4, 5 };
+  const int p08[] = { 4, 5, 7, 2, 4 };
+  const int p09[] = { 2, 4, 5, 7, 2 };
+  const int p10[] = { 5, 7, 2, 4, 5 };
+  const int p11[] = { 7, 2, 4, 5, 7 };
+
+  ASSERT_TRUE (any_equal_p (SLICE (p01), paths));
+  ASSERT_TRUE (any_equal_p (SLICE (p02), paths));
+  ASSERT_TRUE (any_equal_p (SLICE (p03), paths));
+  ASSERT_TRUE (any_equal_p (SLICE (p04), paths));
+  ASSERT_TRUE (any_equal_p (SLICE (p05), paths));
+  ASSERT_TRUE (any_equal_p (SLICE (p06), paths));
+  ASSERT_TRUE (any_equal_p (SLICE (p07), paths));
+  ASSERT_TRUE (any_equal_p (SLICE (p08), paths));
+  ASSERT_TRUE (any_equal_p (SLICE (p09), paths));
+  ASSERT_TRUE (any_equal_p (SLICE (p10), paths));
+  ASSERT_TRUE (any_equal_p (SLICE (p11), paths));
+
+  release_vec_vec (paths);
+  free_graph (scc);
+}
+
+/* Test the entry/exit path helpers for the strongly connected component in
+   binary_search.  The SCC has one entry (2, the loop header) and two exits (2,
+   6, the loop exit and return).  */
+static void
+test_scc_entry_exit_paths ()
+{
+  struct graph *scc = new_graph (11);
+  add_edge (scc, 2, 4);
+  add_edge (scc, 4, 5);
+  add_edge (scc, 4, 6);
+  add_edge (scc, 5, 7);
+  add_edge (scc, 6, 9);
+  add_edge (scc, 9, 7);
+  add_edge (scc, 7, 2);
+
+  trie scc_internal_trie;
+  simple_paths (scc, scc_internal_trie, 2);
+  simple_paths (scc, scc_internal_trie, 4);
+  simple_paths (scc, scc_internal_trie, 5);
+  simple_paths (scc, scc_internal_trie, 6);
+  simple_paths (scc, scc_internal_trie, 7);
+  simple_paths (scc, scc_internal_trie, 9);
+  vec<vec<int>> scc_prime_paths = scc_internal_trie.paths ();
+
+  trie entry_exits {};
+  scc_entry_exit_paths (scc_prime_paths, 2, 2, entry_exits);
+  scc_entry_exit_paths (scc_prime_paths, 2, 6, entry_exits);
+
+  const int p01[] = { 2 };
+  const int p02[] = { 2, 4, 6 };
+
+  ASSERT_EQ (count (entry_exits), 2);
+  ASSERT_TRUE (entry_exits.contains (p01));
+  ASSERT_TRUE (entry_exits.contains (p02));
+
+  trie exits;
+  scc_exit_paths (scc_prime_paths, 2, exits);
+  scc_exit_paths (scc_prime_paths, 6, exits);
+
+  const int p03[] = { 4, 6, 9, 7, 2 };
+  const int p04[] = { 5, 7, 2, 4, 6 };
+  const int p05[] = { 9, 7, 2, 4, 6 };
+  const int p06[] = { 4, 5, 7, 2 };
+
+  ASSERT_EQ (count (exits), 4);
+  ASSERT_TRUE (exits.contains (p03));
+  ASSERT_TRUE (exits.contains (p04));
+  ASSERT_TRUE (exits.contains (p05));
+  ASSERT_TRUE (exits.contains (p06));
+
+  trie entries;
+  scc_entry_paths (scc_prime_paths, 2, entries);
+
+  const int p07[] = { 2, 4, 6, 9, 7 };
+  const int p08[] = { 2, 4, 5, 7 };
+  ASSERT_EQ (count (entries), 2);
+  ASSERT_TRUE (entries.contains (p07));
+  ASSERT_TRUE (entries.contains (p08));
+
+  free_graph (scc);
+  release_vec_vec (scc_prime_paths);
+  exits.release ();
+  entry_exits.release ();
+}
+
+static void
+test_complete_prime_paths ()
+{
+  const int ccfgpp0[] = { 0, 1, 2, 3, 5 };
+  const int ccfgpp1[] = { 0, 1, 2, 4, 5 };
+  trie ccfg_prime_paths {};
+  ccfg_prime_paths.insert (ccfgpp0);
+  ccfg_prime_paths.insert (ccfgpp1);
+
+  const int scc0[] = { 2 };
+  const int scc1[] = { 2, 4, 6 };
+
+  struct graph *cfg = binary_search_cfg ();
+  const int ccfg_single[] = { 0, 1, 3, 8, 10 };
+
+  vec<trie> ccfg_paths {};
+  ccfg_paths.safe_grow_cleared (6);
+  ccfg_paths[0].insert (array_slice <const int> (ccfg_single + 0, 1));
+  ccfg_paths[1].insert (array_slice <const int> (ccfg_single + 1, 1));
+  ccfg_paths[3].insert (array_slice <const int> (ccfg_single + 2, 1));
+  ccfg_paths[4].insert (array_slice <const int> (ccfg_single + 3, 1));
+  ccfg_paths[5].insert (array_slice <const int> (ccfg_single + 4, 1));
+
+  /* The paths for the SCC would need to be updated in ccfg pre pass.  This
+     feels like a clumsy interface and should maybe be disconnected from the
+     struct graph.  */
+  ccfg_paths[2].hard_insert (scc0);
+  ccfg_paths[2].hard_insert (scc1);
+
+  sbitmap *sbmvec = sbitmap_vector_alloc (6, cfg->n_vertices);
+  bitmap_vector_clear (sbmvec, 6);
+  bitmap_set_bit (sbmvec[0], 0);
+  bitmap_set_bit (sbmvec[1], 1);
+  bitmap_set_bit (sbmvec[3], 3);
+  bitmap_set_bit (sbmvec[4], 8);
+  bitmap_set_bit (sbmvec[5], 10);
+
+  bitmap_set_bit (sbmvec[2], 2);
+  bitmap_set_bit (sbmvec[2], 4);
+  bitmap_set_bit (sbmvec[2], 6);
+
+  trie cpp = cfg_complete_prime_paths (cfg, sbmvec, ccfg_paths,
+                                          ccfg_prime_paths);
+
+  const int p01[] = { 0, 1, 2, 3, 10 };
+  const int p02[] = { 0, 1, 2, 4, 6, 8, 10 };
+
+  ASSERT_EQ (count (cpp), 2);
+  ASSERT_TRUE (cpp.contains(p01));
+  ASSERT_TRUE (cpp.contains(p02));
+
+  free_graph (cfg);
+  ccfg_paths.release ();
+  cpp.release ();
+}
+
+static vec<int>
+binary_search_scc_map ()
+{
+  vec<int> sccs {};
+  sccs.safe_grow (11);
+  sccs[0] = 0;
+  sccs[1] = 1;
+  sccs[2] = 2;
+  sccs[3] = 3;
+  sccs[4] = 2;
+  sccs[5] = 2;
+  sccs[6] = 2;
+  sccs[7] = 2;
+  sccs[8] = 4;
+  sccs[9] = 2;
+  sccs[10] = 5;
+  return sccs;
+}
+
+static void
+test_exit_prime_paths ()
+{
+  const int cpp01[] = { 0, 1, 2, 3, 10 };
+  const int cpp02[] = { 0, 1, 2, 4, 6, 8, 10 };
+  trie complete_prime_paths {};
+  complete_prime_paths.insert_with_suffix (cpp01);
+  complete_prime_paths.insert_with_suffix (cpp02);
+
+  const int ep01[] = { 4, 6, 9, 7, 2 };
+  const int ep02[] = { 5, 7, 2, 4, 6 };
+  const int ep03[] = { 9, 7, 2, 4, 6 };
+  const int ep04[] = { 4, 5, 7, 2 };
+  trie exit_paths;
+  exit_paths.insert (ep01);
+  exit_paths.insert (ep02);
+  exit_paths.insert (ep03);
+  exit_paths.insert (ep04);
+
+  vec<int> sccs = binary_search_scc_map ();
+  trie epp = scc_exit_prime_paths (sccs, exit_paths, complete_prime_paths);
+
+  const int pp01[] = { 4, 6, 9, 7, 2, 3, 10 };
+  const int pp02[] = { 5, 7, 2, 4, 6, 8, 10 };
+  const int pp03[] = { 9, 7, 2, 4, 6, 8, 10 };
+  const int pp04[] = { 4, 5, 7, 2, 3, 10 };
+
+  ASSERT_EQ (count (epp), 4);
+  ASSERT_TRUE (epp.contains (pp01));
+  ASSERT_TRUE (epp.contains (pp02));
+  ASSERT_TRUE (epp.contains (pp03));
+  ASSERT_TRUE (epp.contains (pp04));
+
+  sccs.release ();
+  epp.release ();
+  exit_paths.release ();
+}
+
+static void
+test_entry_prime_paths ()
+{
+  struct graph *cfg = binary_search_cfg ();
+
+  static int sccep01[] = { 2, 4, 6, 9, 7 };
+  static int sccep02[] = { 2, 4, 5, 7 };
+  trie scc_entry_paths;
+  scc_entry_paths.insert (sccep01);
+  scc_entry_paths.insert (sccep02);
+
+  const int cpp01[] = { 0, 1, 2, 3, 10 };
+  const int cpp02[] = { 0, 1, 2, 4, 6, 8, 10 };
+  trie complete_prime_paths {};
+  complete_prime_paths.insert (cpp01);
+  complete_prime_paths.insert (cpp02);
+
+  const int ep01[] = { 4, 6, 9, 7, 2, 3, 10 };
+  const int ep02[] = { 5, 7, 2, 4, 6, 8, 10 };
+  const int ep03[] = { 9, 7, 2, 4, 6, 8, 10 };
+  const int ep04[] = { 4, 5, 7, 2, 3, 10 };
+  trie exit_prime_paths {};
+  exit_prime_paths.insert (ep01);
+  exit_prime_paths.insert (ep02);
+  exit_prime_paths.insert (ep03);
+  exit_prime_paths.insert (ep04);
+
+  vec<int> sccs = binary_search_scc_map ();
+
+  trie epp;
+  scc_entry_prime_paths (cfg, scc_entry_paths, complete_prime_paths,
+                        exit_prime_paths, epp);
+
+  /* The 0 (start node) does not show up in the Fazli & Afschardi paper and
+     kinda, but has no real impact on the result.  The prime-paths functions
+     do not care about these vertices, but the path-coverage instrumentation
+     filters out the ENTRY/EXIT blocks from all the paths.  */
+  const int pp01[] = { 0, 1, 2, 4, 6, 9, 7 };
+  const int pp02[] = { 0, 1, 2, 4, 5, 7 };
+  ASSERT_EQ (count (epp), 2);
+  ASSERT_TRUE (epp.contains (pp01));
+  ASSERT_TRUE (epp.contains (pp02));
+}
+
+static void
+test_prime_path_algorithm_equivalence ()
+{
+  struct graph *cfg = binary_search_cfg ();
+  auto fast = prime_paths (cfg);
+  auto slow = reference_prime_paths (cfg);
+
+  ASSERT_EQ (fast.length (), slow.length ());
+  for (const vec<int>& path : fast)
+    ASSERT_TRUE (any_equal_p (path, slow));
+  for (const vec<int>& path : slow)
+    ASSERT_TRUE (any_equal_p (path, fast));
+
+  free_graph (cfg);
+}
+
+/* A straight-line graph with one vertex should yield a single path of length 1
+   with just the non-exit non-entry node in it.  */
+void
+test_singleton_path ()
+{
+  struct graph *cfg = new_graph (3);
+  add_edge (cfg, 0, 2);
+  add_edge (cfg, 2, 1);
+
+  auto best = prime_paths (cfg);
+  auto slow = reference_prime_paths (cfg);
+
+  ASSERT_EQ (best.length (), 1);
+  ASSERT_EQ (best[0].length (), 3);
+  ASSERT_EQ (best[0][0], 0);
+  ASSERT_EQ (best[0][1], 2);
+  ASSERT_EQ (best[0][2], 1);
+
+  ASSERT_EQ (slow.length (), 1);
+  ASSERT_EQ (slow[0].length (), 3);
+  ASSERT_EQ (slow[0][0], 0);
+  ASSERT_EQ (slow[0][1], 2);
+  ASSERT_EQ (slow[0][2], 1);
+
+  free_graph (cfg);
+}
+
+void
+path_coverage_cc_tests ()
+{
+  test_prime_paths ();
+  test_build_ccfg ();
+  test_split_components ();
+  test_scc_internal_prime_paths ();
+  test_scc_entry_exit_paths ();
+  test_complete_prime_paths ();
+  test_exit_prime_paths ();
+  test_entry_prime_paths ();
+  test_prime_path_algorithm_equivalence ();
+  test_singleton_path ();
+}
+
+} // namespace selftest
+
+#endif /* #if CHECKING_P */
diff --git a/gcc/profile.cc b/gcc/profile.cc
index 2b90e6cc510..5d85b99d2de 100644
--- a/gcc/profile.cc
+++ b/gcc/profile.cc
@@ -1456,6 +1456,11 @@ branch_prob (bool thunk)
                    flag_bits |= GCOV_ARC_FAKE;
                  if (e->flags & EDGE_FALLTHRU)
                    flag_bits |= GCOV_ARC_FALLTHROUGH;
+                 if (e->flags & EDGE_TRUE_VALUE)
+                   flag_bits |= GCOV_ARC_TRUE;
+                 if (e->flags & EDGE_FALSE_VALUE)
+                   flag_bits |= GCOV_ARC_FALSE;
+
                  /* On trees we don't have fallthru flags, but we can
                     recompute them from CFG shape.  */
                  if (e->flags & (EDGE_TRUE_VALUE | EDGE_FALSE_VALUE)
@@ -1541,7 +1546,7 @@ branch_prob (bool thunk)
 
   remove_fake_edges ();
 
-  if (condition_coverage_flag || profile_arc_flag)
+  if (condition_coverage_flag || path_coverage_flag || profile_arc_flag)
       gimple_init_gcov_profiler ();
 
   if (condition_coverage_flag)
@@ -1593,6 +1598,10 @@ branch_prob (bool thunk)
        instrument_values (values);
     }
 
+  void find_paths(struct function*);
+  if (path_coverage_flag)
+    find_paths(cfun);
+
   free_aux_for_edges ();
 
   values.release ();
diff --git a/gcc/selftest-run-tests.cc b/gcc/selftest-run-tests.cc
index 1c99de1e6c2..b64d008bfa5 100644
--- a/gcc/selftest-run-tests.cc
+++ b/gcc/selftest-run-tests.cc
@@ -103,6 +103,7 @@ selftest::run_tests ()
   tree_cfg_cc_tests ();
   tree_diagnostic_path_cc_tests ();
   attribs_cc_tests ();
+  path_coverage_cc_tests ();
 
   /* This one relies on most of the above.  */
   function_tests_cc_tests ();
diff --git a/gcc/selftest.h b/gcc/selftest.h
index 3bddaf1c322..28091d6d852 100644
--- a/gcc/selftest.h
+++ b/gcc/selftest.h
@@ -246,6 +246,7 @@ extern void json_cc_tests ();
 extern void optinfo_emit_json_cc_tests ();
 extern void opts_cc_tests ();
 extern void ordered_hash_map_tests_cc_tests ();
+extern void path_coverage_cc_tests ();
 extern void predict_cc_tests ();
 extern void pretty_print_cc_tests ();
 extern void range_tests ();
diff --git a/gcc/testsuite/gcc.misc-tests/gcov-25.c 
b/gcc/testsuite/gcc.misc-tests/gcov-25.c
new file mode 100644
index 00000000000..1abf9ab72d2
--- /dev/null
+++ b/gcc/testsuite/gcc.misc-tests/gcov-25.c
@@ -0,0 +1,311 @@
+/* { dg-options "--coverage -fpath-coverage" } */
+/* { dg-do run { target native } } */
+
+void
+pathcov001a ()
+{
+  /* Empty functions should not be problematic.  */
+}
+
+/* Straight line, which should have only one path.  */
+/* BEGIN paths
+    summary: 1/1
+*/
+void
+pathcov001b ()
+/* END */
+{
+  int a = 0;
+  ++a;
+}
+
+/* Same as b, but not executed.  */
+/* BEGIN paths
+    summary: 0/1
+    expect: :33
+*/
+void
+pathcov001c ()
+/* END */
+{
+  int a = 0;
+  ++a;
+}
+
+/* 002 is a simple baseline test, with no complicated control flow and no
+   loops, run with different combinations of inputs that tests the paths in
+   isolation.  */
+/* BEGIN paths
+    summary: 0/2
+    expect: :48(true) :49 :52
+    expect: :48(false) :51 :52
+*/
+void
+pathcov002a (int a)
+/* END */
+{
+  int v = 0;
+  if (a)
+    v++;
+  else
+    v--;
+}
+
+/* BEGIN paths
+    summary: 1/2
+    expect: :63(false) :66 :67
+*/
+void
+pathcov002c (int a)
+/* END */
+{
+  int v = 0;
+  if (a)
+    v++;
+  else
+    v--;
+}
+
+/* BEGIN paths
+    summary: 1/2
+    expect: :78(true) :79 :82
+*/
+void
+pathcov002b (int a)
+/* END */
+{
+  int v = 0;
+  if (a)
+    v++;
+  else
+    v--;
+}
+
+/* Identical to 002*, but run for both inputs.  This should achieve full
+   coverage.
+
+   BEGIN paths
+    summary: 2/2
+*/
+void
+pathcov002d (int a)
+/* END */
+{
+  int v = 0;
+  if (a)
+    v++;
+  else
+    v--;
+}
+
+/* Test individual control flow structures in isolation.  */
+
+/* BEGIN paths
+    summary: 0/2
+    expect: :112(true) :113 :114
+    expect: :112(false) :114
+*/
+void
+pathcov003a (int a)
+/* END */
+{
+  if (a)
+    a++;
+}
+
+/* BEGIN paths
+    summary: 0/2
+    expect: :125(true) :126 :129
+    expect: :125(false) :128 :129
+*/
+void
+pathcov003b (int a)
+/* END */
+{
+  if (a)
+    a++;
+  else
+    a--;
+}
+
+/* BEGIN paths
+    summary: 0/3
+    expect: :141(true) :142 :147
+    expect: :141(false) :143(true) :144 :147
+    expect: :141(false) :143(false) :146 :147
+*/
+void
+pathcov003c (int a)
+/* END */
+{
+  if (a > 10)
+    a++;
+  else if (a > 20)
+    a--;
+  else
+    a += 2;
+}
+
+/* BEGIN paths
+    summary: 0/5
+    expect: :162 :162(true) :163
+    expect: :162 :162(false) :164
+    expect: :163 :162(true) :163
+    expect: :163 :162(false) :164
+    expect: :162(true) :163 :162
+*/
+void
+pathcov003d (int a)
+/* END */
+{
+  int x = 0;
+  for (int i = 0; i < a; ++i)
+    ++x;
+}
+
+/* BEGIN paths
+    summary: 0/5
+    expect: :180 :180(true) :181
+    expect: :180 :180(false) :182
+    expect: :181 :180(true) :181
+    expect: :181 :180(false) :182
+    expect: :180(true) :181 :180
+*/
+void
+pathcov003e (int a)
+/* END */
+{
+  int x = 0;
+  int i = 0;
+  while (i++ < a)
+    x++;
+}
+
+/* BEGIN paths
+    summary: 0/2
+    expect: :194 :197(false) :198
+    expect: :197(true) :197
+*/
+void
+pathcov003f (int a)
+/* END */
+{
+  int x = 0;
+  int i = 0;
+  do {
+      x++;
+  } while (i++ < a);
+}
+
+/* BEGIN paths
+    summary: 0/5
+    expect: :213 :216(true) :220
+    expect: :213 :216(false) :222
+    expect: :216(true) :220 :216
+    expect: :220 :216(true) :220
+    expect: :220 :216(false) :222
+*/
+void
+pathcov003g (int a)
+/* END */
+{
+  int i = 0;
+  int x = 0;
+
+top:
+  if (i < a)
+    {
+      x++;
+      i++;
+      goto top;
+    }
+}
+
+void
+pathcov004a (int a, int b, int c, int d)
+{
+  if (a)
+    {}
+  else
+    {
+      while (b && c)
+       {
+         if (d)
+           break;
+       }
+    }
+}
+
+/* BEGIN paths
+    summary: 0/14
+*/
+void
+pathcov004b (int a, int b, int c)
+/* END */
+{
+  while (a)
+    {
+      if (b)
+       return;
+      while (c--)
+       a++;
+    }
+}
+
+void
+pathcov004c (int a, int b, int c)
+/* END */
+{
+  if (a)
+    goto loop;
+
+  while (b > 0)
+    {
+      b--;
+loop:
+      while (c--)
+       a++;
+    }
+}
+
+void
+pathcov004d (int a, int b, int c)
+/* END */
+{
+  while (a-- > 0) c++;
+  while (b-- > 0) c++;
+}
+
+void *alloc (int) { return 0; }
+void *getcwd (void *, int) { return 0; }
+void release (void *) {}
+
+void *gnu_getcwd()
+{
+  int size = 100;
+  void *buffer = alloc (size);
+
+  while (1) {
+    void *value = getcwd (buffer, size);
+    if (value != 0) return buffer;
+    size *= 2;
+    release (buffer);
+    buffer = (void *) alloc (size);
+  }
+}
+
+int
+main ()
+{
+  pathcov001a ();
+  pathcov001b ();
+  /* not called: pathcov001c (); */
+
+  /* not called: pathcov002a (); */
+  pathcov002b (0);
+  pathcov002c (1);
+  pathcov002d (0);
+  pathcov002d (1);
+}
+
+/* { dg-final { run-gcov prime-paths { --prime-paths gcov-25.c } } } */
diff --git a/gcc/testsuite/lib/gcov.exp b/gcc/testsuite/lib/gcov.exp
index dd47d66d1b2..7a70e58720a 100644
--- a/gcc/testsuite/lib/gcov.exp
+++ b/gcc/testsuite/lib/gcov.exp
@@ -495,6 +495,89 @@ proc verify-calls { testname testcase file } {
     return $failed
 }
 
+proc verify-prime-paths { testname testcase file } {
+    set failed 0
+    set fd [open $file r]
+
+    set expected_n -1
+    set expected_m -1
+    set recording 0
+    set expected ""
+
+    while { [gets $fd line] >= 0 } {
+        regexp "^\[^:\]+: *(\[0-9\]+):" "$line" all lineno
+        set prefix "$testname line $lineno"
+
+        if {[regexp "BEGIN *paths" $line]} {
+            set recording 1
+            set expected ""
+            set expected_n -1
+            set expected_m -1
+            set seen ""
+            continue
+        }
+        
+        if { $recording != 1 } {
+            continue
+        }
+
+        if [regexp {summary: *(\d+)/(\d+)} $line _ n m] {
+            set expected_n $n
+            set expected_m $m
+        }
+
+        if [regexp "expect: *(.*)" $line all ln] {
+            set cases ""
+            set ln [regsub -all {\s+} $ln " "]
+            foreach case [split  $ln " "] {
+                lappend cases $case
+            }
+            lappend expected $cases
+        }
+
+        if [regexp "END" $line] {
+            if {$recording != 1} {
+                incr failed
+                fail "unexpected END at line $lineno, missing BEGIN"
+
+                # Abort the test if there is a mismatch, to avoid creating
+                # unecessary errors.  At this point the test itself is broken.
+                break
+            }
+            set recording 0
+
+            if {[llength $expected] > 0} {
+                incr failed
+                fail "expected: '$expected'"
+            }
+        }
+
+        if [regexp {paths covered (\d+) of (\d+)} $line _ n m] {
+            if { $n ne $expected_n || $m ne $expected_m } {
+                incr failed
+                fail "expected $expected_n/$expected_m covered paths, was 
$n/$m"
+            }
+        }
+
+        if [regexp {path *\d+ not covered: (.*)} $line _ path] {
+            set pathl ""
+            foreach ln [split $path " "] {
+                if [regexp {.*(:.*) *} $ln _ key] {
+                    lappend pathl $key
+                }
+            }
+            set i [lsearch $expected $pathl]
+            set expected [lreplace $expected $i $i]
+        }
+    }
+    #incr failed
+
+    # if recording: fail () (unterminated)
+
+    close $fd
+    return $failed
+}
+
 proc gcov-pytest-format-line { args } {
     global subdir
 
@@ -568,6 +651,7 @@ proc run-gcov { args } {
     set gcov_verify_calls 0
     set gcov_verify_branches 0
     set gcov_verify_conditions 0
+    set gcov_verify_prime_paths 0
     set gcov_verify_lines 1
     set gcov_verify_intermediate 0
     set gcov_remove_gcda 0
@@ -580,6 +664,8 @@ proc run-gcov { args } {
          set gcov_verify_branches 1
        } elseif { $a == "conditions" } {
          set gcov_verify_conditions 1
+       } elseif { $a == "prime-paths" } {
+         set gcov_verify_prime_paths 1
        } elseif { $a == "intermediate" } {
          set gcov_verify_intermediate 1
          set gcov_verify_calls 0
@@ -659,6 +745,11 @@ proc run-gcov { args } {
     } else {
        set cdfailed 0
     }
+    if { $gcov_verify_prime_paths } {
+       set ppfailed [verify-prime-paths $testname $testcase $testcase.gcov]
+    } else {
+       set ppfailed 0
+    }
     if { $gcov_verify_calls } {
        set cfailed [verify-calls $testname $testcase $testcase.gcov]
     } else {
@@ -673,12 +764,12 @@ proc run-gcov { args } {
 
     # Report whether the gcov test passed or failed.  If there were
     # multiple failures then the message is a summary.
-    set tfailed [expr $lfailed + $bfailed + $cdfailed + $cfailed + $ifailed]
+    set tfailed [expr $lfailed + $bfailed + $cdfailed + $ppfailed + $cfailed + 
$ifailed]
     if { $xfailed } {
        setup_xfail "*-*-*"
     }
     if { $tfailed > 0 } {
-       fail "$testname gcov: $lfailed failures in line counts, $bfailed in 
branch percentages, $cdfailed in condition/decision, $cfailed in return 
percentages, $ifailed in intermediate format"
+       fail "$testname gcov: $lfailed failures in line counts, $bfailed in 
branch percentages, $cdfailed in condition/decision, $ppfailed in prime-paths, 
$cfailed in return percentages, $ifailed in intermediate format"
        if { $xfailed } {
            clean-gcov $testcase
        }
diff --git a/gcc/tree-profile.cc b/gcc/tree-profile.cc
index b87c121790c..4594d26f00c 100644
--- a/gcc/tree-profile.cc
+++ b/gcc/tree-profile.cc
@@ -1908,7 +1908,7 @@ tree_profiling (void)
          thunk = true;
          /* When generate profile, expand thunk to gimple so it can be
             instrumented same way as other functions.  */
-         if (profile_arc_flag || condition_coverage_flag)
+         if (profile_arc_flag || condition_coverage_flag || path_coverage_flag)
            expand_thunk (node, false, true);
          /* Read cgraph profile but keep function as thunk at profile-use
             time.  */
@@ -1953,7 +1953,8 @@ tree_profiling (void)
   release_profile_file_filtering ();
 
   /* Drop pure/const flags from instrumented functions.  */
-  if (profile_arc_flag || condition_coverage_flag || flag_test_coverage)
+  if (profile_arc_flag || condition_coverage_flag || path_coverage_flag
+      || flag_test_coverage)
     FOR_EACH_DEFINED_FUNCTION (node)
       {
        if (!gimple_has_body_p (node->decl)
@@ -1985,7 +1986,8 @@ tree_profiling (void)
 
       push_cfun (DECL_STRUCT_FUNCTION (node->decl));
 
-      if (profile_arc_flag || condition_coverage_flag || flag_test_coverage)
+      if (profile_arc_flag || condition_coverage_flag || path_coverage_flag
+         || flag_test_coverage)
        FOR_EACH_BB_FN (bb, cfun)
          {
            gimple_stmt_iterator gsi;
@@ -2070,7 +2072,8 @@ pass_ipa_tree_profile::gate (function *)
      disabled.  */
   return (!in_lto_p && !flag_auto_profile
          && (flag_branch_probabilities || flag_test_coverage
-             || profile_arc_flag || condition_coverage_flag));
+             || profile_arc_flag || condition_coverage_flag
+             || path_coverage_flag));
 }
 
 } // anon namespace
-- 
2.39.2

[PATCH] [RFC] Prime path coverage in gcc/gcov

Reply via email to