> ALGORITHM
> 
> Since the number of paths grows so fast, we need a good
> algorithm. The naive approach of generating all paths and discarding
> redundancies (see reference_prime_paths in the diff) simply doesn't
> complete for even fairly simple functions with a few tens of
> thousands of paths (granted, the implementation is also poor, but it
> only serves as a reference). Fazli & Afsharchi in their paper "Time
> and Space-Efficient Compositional Method for Prime and Test Paths
> Generation" describe a neat algorithm which drastically improves on
> this for most programs, and brings complexity down to something
> manageable. This patch implements that algorithm with a few minor
> tweaks.
> 
> The algorithm first finds the strongly connected components (SCCs) of
> the CFG and creates a new graph whose vertices are the SCCs. Within
> each SCC, different kinds of paths are found - regular prime paths,
> paths that start in the SCC's entries, and paths that end in the
> SCC's exits. These per-SCC paths are then combined with paths through
> the CFG, which greatly reduces the number of paths that must be
> evaluated only to be thrown away.
> 
> Using this algorithm we can find the prime paths for somewhat
> complicated functions in reasonable time. Please note that some
> programs don't benefit from this at all: we need to find the prime
> paths within each SCC, so if a single SCC is very large the function
> degenerates to the naive implementation. This can probably be much
> improved on, but that is an exercise for later.
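For readers unfamiliar with the term: a prime path is a maximal simple path, i.e. a simple path that does not appear as a proper subpath of any other simple path (simple cycles, where the path ends where it started, count too). As a rough illustration only - all names here are made up and this is not the patch's code - the naive approach amounts to something like:

```cpp
#include <vector>
#include <algorithm>

// Naive prime-path enumeration on a tiny digraph (adjacency list):
// generate every simple path (no repeated vertex, except that a path
// may end where it started), then keep only the maximal ones.
using path = std::vector<int>;

static void
extend (const std::vector<std::vector<int>> &adj, path &p,
	std::vector<path> &out)
{
  out.push_back (p);
  for (int succ : adj[p.back ()])
    {
      bool seen = std::find (p.begin (), p.end (), succ) != p.end ();
      if (seen && succ != p.front ())
	continue; /* would repeat a vertex without closing a cycle  */
      p.push_back (succ);
      if (!seen)
	extend (adj, p, out);
      else
	out.push_back (p); /* simple cycle back to the start vertex  */
      p.pop_back ();
    }
}

/* True if NEEDLE occurs as a (contiguous) subpath of HAY.  */
static bool
subpath_p (const path &needle, const path &hay)
{
  if (needle.size () > hay.size ())
    return false;
  return std::search (hay.begin (), hay.end (),
		      needle.begin (), needle.end ()) != hay.end ();
}

std::vector<path>
prime_paths (const std::vector<std::vector<int>> &adj)
{
  std::vector<path> all;
  for (int v = 0; v < (int) adj.size (); v++)
    {
      path p {v};
      extend (adj, p, all);
    }
  /* Keep only paths that are not subpaths of another path.  */
  std::vector<path> prime;
  for (const path &p : all)
    {
      bool maximal = true;
      for (const path &q : all)
	if (&p != &q && subpath_p (p, q))
	  {
	    maximal = false;
	    break;
	  }
      if (maximal)
	prime.push_back (p);
    }
  return prime;
}
```

For the graph 0->1, 1->{2,3}, 2->1 this yields five prime paths ([0,1,2], [0,1,3], [1,2,1], [2,1,2], [2,1,3]); the quadratic filtering over an exponential number of simple paths is exactly what the SCC-based algorithm avoids.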

Interesting. I was only aware of the old paper by Ball and Larus:
https://ieeexplore.ieee.org/abstract/document/566449
> 
> --
> 
> OVERALL ARCHITECTURE
> 
> Like the other coverage types in gcc, this operates on the CFG in the
> profiling phase, just after branch and condition coverage.  It works in
> phases:
> 
> 1. All prime paths are generated, counted, and enumerated from the CFG
> 2. The paths are evaluated and counter instructions and accumulators are
>    emitted
> 3. gcov reads the CFG and computes the prime paths (same as step 1)
> 4. gcov prints a report
> 
> Simply writing out all the paths in the .gcno file is not really
> viable; the files would be too big. Additionally, there are limits to
> the practicality of measuring (and reporting on) millions of paths, so
> for most programs where coverage is feasible, computing paths should be
> plenty fast. As a result, path coverage really only adds 1 bit per path
> to the counters, rounded up to the nearest 64 ("bucket"), so 64 paths
> take up 8 bytes, 65 paths take up 16 bytes.
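To make the size arithmetic concrete, the counter storage per function comes out as follows (a minimal sketch; the function name is mine, not from the patch):

```cpp
#include <cstdint>

// Bytes of counter storage for a function with NPATHS prime paths:
// one bit per path, rounded up to whole 64-bit buckets.
uint64_t
path_counter_bytes (uint64_t npaths)
{
  uint64_t buckets = (npaths + 63) / 64; /* ceil (npaths / 64)  */
  return buckets * sizeof (uint64_t);
}
```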

path coverage can also be used to determine correlated branches (where
the outcome of the first if predetermines the probability of the second
if), which can be used for optimization: if such paths are detected,
tail duplication will likely help propagate some extra invariants.

For this we also need to know the actual frequencies of the paths, not
only a bit saying whether a path was or was not taken.
> 
> Recording paths is really just massaging large bitsets. Per function,
> ceil(paths/64 or 32) buckets (gcov_type) are allocated. Paths are
> sorted, so the first path maps to the lowest bit, the second path to the
> second lowest bit, and so on. On taking an edge and entering a basic
> block, a few bitmasks are applied to unset the bits corresponding to the
> paths outside the block and set the bits of the paths that start in that
> block. Finally, the right buckets are masked and written to the global
> accumulators for the paths that end in the block. Full coverage is
> achieved when all bits are set.
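In other words, per bucket the update on entering a basic block is roughly the following (an illustrative single-bucket sketch; the names and mask parameters are mine, not the patch's):

```cpp
#include <cstdint>

// Per-bucket update on entering a basic block, as described above:
// kill the bits of paths that do not go through this block, turn on
// the bits of paths that start here, and flush the bits of paths
// that end here into the global accumulator.
void
update_bucket (uint64_t &local, uint64_t &global,
	       uint64_t through_mask, /* paths passing through the block  */
	       uint64_t start_mask,   /* paths starting in the block  */
	       uint64_t end_mask)     /* paths ending in the block  */
{
  local &= through_mask;	/* unset paths outside this block  */
  local |= start_mask;		/* set paths that start here  */
  global |= local & end_mask;	/* accumulate completed paths  */
}
```

Full coverage then corresponds to every bit of every global bucket being set.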
> 
> gcc does not really inform gcov of abnormal edges, so paths with

Adding abnormal edges is probably not hard, but I am not sure how
realistic it is to cover them all.  Even EH introduces edges that
cannot really be taken at runtime.

> abnormal edges are ignored. Supporting them is probably possible, but
> requires some changes to the graph gcc writes to the .gcno file.

If I recall correctly, Ball & Larus simply store counts in a hashtable,
assuming that most paths are not taken dynamically.

> +@item -e
> +@itemx --prime-paths
> +Write path coverage to the output file, and write path summary info to
> +the standard output.  This option allows you to see how many prime paths
> +were taken at least once.  For the regular output this option only
> +includes the number of paths covered.  For more fine grained information
> +on paths you can use @option{--prime-paths-lines} or
> +@option{--prime-paths-source}.  With @option{--json-format} all path
> +details are included in the output.  This requires you to compile the
> +source with @option{-fpath-coverage}.
> +
> +@item --prime-paths-lines [=@var{type}]
> +Write path coverage to the output file, and write path summary info to
> +the standard output.  This option allows you to see how many prime paths
> +were taken at least once, and gives a dense report on the covered or
> +uncovered paths and how to cover them.  This mode is useful for
> +automated reporting and progress tracking.  @var{type} may be omitted,
> +or one of:

I wonder if some short explanation of "prime path" could fit here (or
somewhere in the doc files)?  I think most users would have to google
what it is.

> +-fpath-coverage

Maybe we should stick to either "path" or "prime path" and use it in
both options?

> +/* Build a struct graph equivalent to the CFG for the function FN.  The
> +   caller must free the returned graph with free_graph.  The data field of
> +   every vertex and edge points to the basic blocks and edges in the CFG.
> +
> +   The CFG recording and gcov are not aware of abnormal edges, so they are
> +   ignored here, too.  This means some paths are lost, e.g. those that
> +   involve setjmp/longjmp.  They are still paths but would need more
> +   support from profile.cc and gcov.cc to work.  */
> +struct graph*
> +cfg_as_graph (struct function* fn)
> +{
> +  struct graph *g = new_graph (n_basic_blocks_for_fn (fn));
> +  basic_block entry = ENTRY_BLOCK_PTR_FOR_FN (fn);
> +  basic_block exit = EXIT_BLOCK_PTR_FOR_FN (fn);
> +
> +  g->vertices[entry->index].data = entry;
> +  g->vertices[exit->index].data = exit;
> +  basic_block top = single_succ (entry);
> +  add_edge (g, entry->index, top->index)->data = single_succ_edge (entry);
> +
> +  const unsigned ignore = EDGE_FAKE | EDGE_ABNORMAL | EDGE_ABNORMAL_CALL;
> +  basic_block bb;
> +  FOR_EACH_BB_FN (bb, fn)
> +    {
> +      g->vertices[bb->index].data = bb;
> +      for (edge e : bb->succs)
> +     if (!(e->flags & ignore))
> +       add_edge (g, e->src->index, e->dest->index)->data = e;
> +    }
> +  return g;
> +}

Overall it would be nice to have a base class with an API for
(multi)graphs that can be used by the CFG, the callgraph and gcov's CFG,
so we can run the same algorithms on them.

We do have quite some duplication between the graph algorithms we run on
these.  Moreover, IPA summaries would benefit from a simplified and
memory-efficient CFG, too.
> +/* Return the edge between SRC and DST.  */
> +edge
> +edge_between (struct function *fn, int src, int dst)
> +{
> +  basic_block bbsrc = BASIC_BLOCK_FOR_FN (fn, src);
> +  basic_block bbdst = BASIC_BLOCK_FOR_FN (fn, dst);
> +  for (edge e : bbsrc->succs)
> +    if (e->dest == bbdst)
> +      return e;
> +  gcc_checking_assert (false);
gcc_unreachable would perhaps be more readable (but it will stay in the
code with checking disabled).
> +  return NULL;
> +}
> +
> +/* Visitor for topsort.  */
> +void
> +topsort1 (basic_block b, vec<basic_block>& L, sbitmap visited)

Uppercase for a variable name is not really GNU coding style friendly.
Also a more descriptive name would be useful :)
> +{
> +  if (bitmap_bit_p (visited, b->index))
> +    return;
> +
> +  for (edge e : b->succs)
> +    if (!(e->flags & EDGE_DFS_BACK))
> +      topsort1 (e->dest, L, visited);

Please avoid recursion here.  Perhaps you can use post_order_compute
instead?
> +
> +/* Find the union of possible preserved by bitwise-ANDs for the bucket BUCKET
> +   of BB.  If this returns zero no paths go through BB at BUCKET.  */

I am not sure what this means. Perhaps it can be rephrased?

> +  gimple *def = SSA_NAME_DEF_STMT (local);
> +  gphi *phi = nullptr;
> +  if (is_a <gphi *> (def))
> +    phi = as_a <gphi *> (def);
gphi *phi = dyn_cast <gphi *> (def)
> +
> +/* Insert instructions update the global counter at BUCKET with the contents of
                          updating?
> +   (LOCAL & MASK).  LOCAL is the SSA counter for this bucket, MASK the bits
> +   that should be updated (that is, the paths that end in this basic block).
> +   If ATOMIC_IOR is non-null then it uses atomics.  GCOV_TYPE_NODE should be a
> +   tree of the gcov type node for creating SSA names.
> +
> +   global[BUCKET] |= (LOCAL & MASK)
> +
> +   If MASK is null, no mask is applied and it becomes:
> +
> +   global[BUCKET] |= LOCAL
> +
> +   This function should be used on every block except returns_twice blocks (see
> +   flush_on_edge) as all incoming edges can use the same flushing code.  */
> +void
> +flush_on_gsi (gimple_stmt_iterator *gsi, size_t bucket, tree local, tree mask,
> +           tree atomic_ior, tree gcov_type_node)
> +{
> +  tree relaxed = nullptr;

I wonder if NULL_TREE or nullptr is preferred these days :)

Otherwise the profiling code reads well, especially given that it does a
relatively complex job.
> diff --git a/gcc/prime-paths.cc b/gcc/prime-paths.cc
> new file mode 100644
> index 00000000000..fe6bff74dd4
> --- /dev/null
> +++ b/gcc/prime-paths.cc
> @@ -0,0 +1,2031 @@

Missing the mandatory comment about the file being part of GCC and
covered by the GPL.  Please check the other new files too.

> +/* Merge the tries T1, T2, T3, and the set of paths VECS into the largest
> +   trie.  Returns a reference to the trie merged into.  Merging tries and
> +   resolving redundant paths is the slowest step (at least in the sense
> +   that it works on the largest input), and merging into a partial result
> +   reduces the work accordingly.  For large problems this is a massive
> +   improvement, which in the worst cases (where all tries but one are
> +   empty or almost empty) speeds up 30-40%.  */
> +trie&
> +merge (trie &t1, trie &t2, trie &t3, vec<vec<vec<int>>> &vecs)

Seems you are the first to use three nested vectors, congratulations :)
> +{
> +  trie *dst = nullptr;
> +  const size_t s1 = t1.size ();
> +  const size_t s2 = t2.size ();
> +  const size_t s3 = t3.size ();
> +
> +  if (s1 >= s2 && s1 >= s3)
> +    {
> +      dst = &t1;
> +      t1.merge (t2);
> +      t1.merge (t3);
> +    }
> +  else if (s2 >= s1 && s2 >= s3)
> +    {
> +      dst = &t2;
> +      t2.merge (t1);
> +      t2.merge (t3);
> +    }
> +  else
> +    {
> +      dst = &t3;
> +      t3.merge (t1);
> +      t3.merge (t2);
> +    }
> +
> +  gcc_checking_assert (dst);
> +  for (const vec<vec<int>> &v2 : vecs)
> +    for (const vec<int> &v1 : v2)
> +      dst->insert_with_suffix (v1);
> +  return *dst;
> +}
> +
> +/* Create a CFG that roughly corresponds to this program:
> +
> +int binary_search(int a[], int len, int from, int to, int key)

Maybe std::binary_search would work too?

Honza
