dcoughlin added a comment.

This is super-exciting work!

Some high-level notes:

- The running-time numbers you report are very high. At a ~20x slowdown, the 
benefits from improved solver reasoning will have to be very, very large to 
justify the performance cost. It is worth thinking about ways to limit the 
number of solver calls. Ultimately, I suspect @zaks.anna's suggestion of using 
z3 for refutation is the right way to go -- this would invoke the expensive 
solver only after the cheaper solver has found a potentially buggy path and try 
to show that it is infeasible.

- That said, I think there are still significant optimization opportunities. It 
looks like when performing a query you create a new solver for each set of 
constraints. My understanding (and perhaps @nlopes can correct me here) is that 
these days Z3 is quite good at incremental solves so you may be leaving a lot 
of performance on the table. For example, in `getSymVal()` you create one 
solver to query for a satisfying assignment and then later create a completely 
new one to determine whether it is the only satisfying assignment. Since those 
two queries share all of their conjuncts but one it might be faster to reuse 
the first solver and add the new assertion for the second. Similarly, since 
analyzer exploration is a DFS, you could imagine building up the path condition 
incrementally. Z3 has solver APIs for pushing and popping assertions. (Again, 
@nlopes may have a better idea of whether this would pay off in practice.)

- It would be good to measure the increased peak memory usage with the Z3 
constraint solver. The analyzer is often used as part of an IDE and so it needs 
to be able to run in memory constrained environments (such as laptops). 
Further, since multiple invocations of clang are often run simultaneously, 
memory is often a more precious resource than CPU time.

- As @a.sidorin noted, there is a significant test and maintenance cost to 
keeping two constraint managers around. Since the testing matrix would double, 
anything we can do to limit the need to modify/duplicate tests would be a huge 
savings. Is it possible to use lit substitution to call the analyzer twice for 
each run line: once with the range constraint manager and once with z3 (for 
builds where z3 is requested)? This, combined with automatically adding the 
#defines that @NoQ suggested would provide a mechanism to share most tests 
between the constraint managers.

- I don't think it is reasonable to ask all analyzer users or even all clang 
bots to install Z3, so we'll have to make sure the tests degrade gracefully 
when Z3 is not available. That said, we could set up dedicated Z3 bots to make 
sure that there is continuous integration for the feature.

- Right now the CMake uses find_package and builds with Z3 if it is found to be 
installed on the building system. I think this is too aggressive. It would be 
better to have the build explicitly opt in to using Z3. Without this, a user 
could accidentally build a clang that dynamically links to their local Z3 but 
then fails to launch with a load error when distributed. Similarly, it would be 
good to be able to explicitly set the the location of the Z3 to be used at 
build time for systems with multiple Z3s installed.


https://reviews.llvm.org/D28952



_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

Reply via email to