[PATCH] D53814: Allow the analyzer to output to a SARIF file

Aaron Ballman via Phabricator via cfe-commits Mon, 29 Oct 2018 10:07:44 -0700

aaron.ballman added inline comments.


================
Comment at: Analysis/diagnostics/sarif-check.py:22
+passes = 0
+with open(testfile) as testfh:
+    lineno = 0
----------------
george.karpenkov wrote:
> Wow, this is super neat!
> Since you are quite active in LLVM community, would you think it's better to 
> have this tool in `llvm/utils` next to FileCheck? The concept is very 
> general, and I think a lot of people really feel FileCheck limitations.
The concept was pulled from test\TableGen\JSON-check.py, so it likely could be 
generalized. However, I suspect that each JSON test may want to expose 
different helper capabilities, so perhaps it won't be super general? I don't 
know enough about good Python design to know.


================
Comment at: Analysis/diagnostics/sarif-diagnostics-taint-test.c:18
+// Test the basics for sanity.
+// CHECK: sarifLog['version'].startswith("2.0.0")
+// CHECK: sarifLog['runs'][0]['tool']['fullName'] == "clang static analyzer"
----------------
george.karpenkov wrote:
> Would it make more sense to just use `diff` + json pretty-formatter to write 
> a test?
> With this test I can't even quite figure out how the output should look like.
I'm not super comfortable with that approach, but perhaps I'm thinking of 
something different than what you're proposing. The reason I went with this 
approach is because diff would be fragile (depends heavily on field ordering, 
which the JSON support library doesn't give control over) and the physical 
layout of the file isn't what needs to be tested anyway. SARIF has a fair 
amount of optional data that can be provided as well, so using a purely textual 
diff worried me that exporting additional optional data in the future would 
require extensive unrelated changes to all SARIF diffs in the test directory.

The goal for this test was to demonstrate that we can validate that the 
interesting bits of information are present in the output without worrying 
about the details.

Also, the python approach allows us to express relationships between data items 
more easily than a textual diff tool would. I've not used that here, but I 
could imagine a test where we want to check that each code location has a 
corresponding file entry in another list.


================
Comment at: StaticAnalyzer/Core/SarifDiagnostics.cpp:74
+  // Add the rest of the path components, encoding any reserved characters.
+  std::for_each(std::next(sys::path::begin(Filename)), 
sys::path::end(Filename),
+                [&Ret](StringRef Component) {
----------------
george.karpenkov wrote:
> Nitpicking style, but I don't see why for-each loop, preferably with a range 
> wrapping the iterators would not be more readable.
I tend to prefer using algorithms when the logic is simple -- it makes it more 
clear that the loop is an unimportant detail. I don't have strong opinions on 
this particular loop, however.


================
Comment at: StaticAnalyzer/Core/SarifDiagnostics.cpp:88
+                  // URI encode the part.
+                  for (char C : Component) {
+                    // RFC 3986 claims alpha, numeric, and this handful of
----------------
ZaMaZaN4iK wrote:
> I don't know about const corectness policy in LLVM. But I prefer here const 
> char C .
We typically do not put top-level `const` on locals, so I'd prefer to leave it 
off here rather than be inconsistent.


================
Comment at: StaticAnalyzer/Core/SarifDiagnostics.cpp:182
+  case PathDiagnosticPiece::Kind::Note:
+    // FIXME: What should be reported here?
+    break;
----------------
george.karpenkov wrote:
> "Note" are notes which do not have to be attached to a particular path 
> element.
Good to know!


================
Comment at: StaticAnalyzer/Core/SarifDiagnostics.cpp:243
+
+  llvm::for_each(Diags, [&](const PathDiagnostic *D) {
+    Results.push_back(createResult(*D, Files));
----------------
george.karpenkov wrote:
> I like closures, but what's wrong with just using a `for` loop here?
Same as above: clarity of exposition. This one I'd feel pretty strongly about 
keeping as an algorithm given how trivial the loop body is.


================
Comment at: StaticAnalyzer/Core/SarifDiagnostics.cpp:254
+    std::vector<const PathDiagnostic *> &Diags, FilesMade *) {
+  // FIXME: if the file already exists, do we overwrite it with a single run,
+  // or do we append a run into the file if it's a valid SARIF log?
----------------
george.karpenkov wrote:
> Usually we overwrite the file and note that on stderr in such cases.
We took the decision internally to overwrite as well, but the SARIF format 
allows for multiple runs within the same output file (so you can compare 
analysis results for the same project over time). I think this will eventually 
have to be user-controlled because I can see some users wanting to append to 
the run and others wanting to overwrite. However, these log files can become 
quite large in practice (GBs of data), so "read in the JSON and add to it" may 
be implausible, hence why I punted for now.

I'll update the comment so it's not a FIXME.


https://reviews.llvm.org/D53814



_______________________________________________
cfe-commits mailing list
[email protected]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D53814: Allow the analyzer to output to a SARIF file

Reply via email to