honggyu.kim created this revision.
honggyu.kim added reviewers: jordan_rose, krememek, zaks.anna, danielmarjamaki, babati, dcoughlin.
honggyu.kim added subscribers: cfe-commits, phillip.power, seaneveson, j.trofimovich, hk.kang, eszasip, dkrupp, o.gyorgy, xazax.hun, premalatha_mvs.
This patch brings the bug identification method from D10305 to the existing infrastructure. With this patch applied, two different bug reports can be compared with the existing CmpRuns.py.

Currently, the "issue_hash" in the plist file is just the line offset from the beginning of the function. It cannot even distinguish simple cases like the following two, which are completely different bugs.

BUG 1. garbage return value

```
  1 int main()
  2 {
  3   int a;
  4   return a;
  5 }

test.c:4:3: warning: Undefined or garbage value returned to caller
  return a;
  ^~~~~~~~
```

BUG 2. garbage assignment

```
  1 int main()
  2 {
  3   int a;
  4   int b = a;
  5   return b;
  6 }

test.c:4:3: warning: Assigned value is garbage or undefined
  int b = a;
  ^~~~~  ~
```

Moreover, the following case is regarded as a different bug when it is compared with BUG 1.

BUG 3. a single comment line is added to the BUG 1 code

```
  1 int main()
  2 {
  3   // main function
  4   int a;
  5   return a;
  6 }

test.c:5:3: warning: Undefined or garbage value returned to caller
  return a;
  ^~~~~~~~
```

The comparison result is as follows:

```
REMOVED: 'test.c:4:3, Logic error: Undefined or garbage value returned to caller'
ADDED: 'test.c:5:3, Logic error: Undefined or garbage value returned to caller'
TOTAL REPORTS: 1
TOTAL DIFFERENCES: 2
```

This patch takes the bug identification method and code from D10305 and generates the "issue_hash" from the following information:

  1. column number
  2. source line string after removing whitespace
  3. bug type (bug message)

This patch is not the final solution, but it enhances "issue_hash" so that it can distinguish such cases by generating a stronger hash value.

http://reviews.llvm.org/D12906

Files:
  lib/StaticAnalyzer/Core/PlistDiagnostics.cpp
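To make the weakness of the current scheme concrete, the arithmetic for the three examples above can be sketched as follows. This is only an illustration: `legacyIssueHash` is a hypothetical name, and the sketch assumes the "beginning of function" is the line of the opening brace (line 2 in all three listings), which is what the removed `Body->getLocStart()` computation suggests.

```cpp
#include <cassert>

// Hypothetical stand-in for the old "issue_hash":
// the issue line minus the line where the function body starts.
int legacyIssueHash(int issueLine, int bodyStartLine) {
  assert(issueLine >= bodyStartLine && "issue must lie inside the body");
  return issueLine - bodyStartLine;
}
```

Under that assumption, BUG 1 (line 4) and BUG 2 (line 4) both yield an offset of 2, so two different bugs collide, while BUG 3 (line 5) yields 3, so the same bug as BUG 1 is reported as a different one.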
Index: lib/StaticAnalyzer/Core/PlistDiagnostics.cpp
===================================================================
--- lib/StaticAnalyzer/Core/PlistDiagnostics.cpp
+++ lib/StaticAnalyzer/Core/PlistDiagnostics.cpp
@@ -22,6 +22,11 @@
 #include "llvm/ADT/DenseMap.h"
 #include "llvm/ADT/SmallVector.h"
 #include "llvm/Support/Casting.h"
+#include "llvm/Support/LineIterator.h"
+#include "clang/AST/ASTContext.h"
+#include "llvm/Support/MD5.h"
+#include <sstream>
+
 using namespace clang;
 using namespace ento;
 using namespace markup;
@@ -285,6 +290,57 @@
   }
 }
 
+static std::string GetNthLineOfFile(llvm::MemoryBuffer *Buffer, int Line) {
+  if (!Buffer)
+    return "";
+
+  llvm::line_iterator LI(*Buffer, false);
+  for (; !LI.is_at_eof() && LI.line_number() != Line; ++LI)
+    ;
+
+  return LI->str();
+}
+
+static std::string NormalizeLine(const SourceManager *SM, FullSourceLoc &L,
+                                 const Decl *D) {
+  static const std::string whitespaces = " \t\n";
+
+  const LangOptions &Opts = D->getASTContext().getLangOpts();
+  std::string str = GetNthLineOfFile(SM->getBuffer(L.getFileID(), L), L.getExpansionLineNumber());
+  unsigned col = str.find_first_not_of(whitespaces);
+
+  SourceLocation StartOfLine = SM->translateLineCol(SM->getFileID(L), L.getExpansionLineNumber(), col);
+  llvm::MemoryBuffer *Buffer = SM->getBuffer(SM->getFileID(StartOfLine), StartOfLine);
+  if (!Buffer) return {};
+
+  const char *BufferPos = SM->getCharacterData(StartOfLine);
+
+  Token Token;
+  Lexer Lexer(SM->getLocForStartOfFile(SM->getFileID(StartOfLine)), Opts,
+              Buffer->getBufferStart(), BufferPos, Buffer->getBufferEnd());
+
+  size_t nextStart = 0;
+  std::ostringstream lineBuff;
+  while (!Lexer.LexFromRawLexer(Token) && nextStart < 2) {
+    if (Token.isAtStartOfLine() && nextStart++ > 0) continue;
+    lineBuff << std::string(SM->getCharacterData(Token.getLocation()), Token.getLength());
+  }
+
+  return lineBuff.str();
+}
+
+static llvm::SmallString<32> GetHashOfContent(StringRef Content) {
+  llvm::MD5 Hash;
+  llvm::MD5::MD5Result MD5Res;
+  llvm::SmallString<32> Res;
+
+  Hash.update(Content);
+  Hash.final(MD5Res);
+  llvm::MD5::stringifyResult(MD5Res, Res);
+
+  return Res;
+}
+
 void PlistDiagnostics::FlushDiagnosticsImpl(
                                     std::vector<const PathDiagnostic *> &Diags,
                                     FilesMade *filesMade) {
@@ -420,9 +476,12 @@
       EmitString(o, declName) << '\n';
     }
 
-    // Output the bug hash for issue unique-ing. Currently, it's just an
-    // offset from the beginning of the function.
-    if (const Stmt *Body = DeclWithIssue->getBody()) {
+    // Output the bug hash for issue unique-ing.
+    // Currently, it contains the following information:
+    //   1. column number
+    //   2. source line string after removing whitespace
+    //   3. bug type
+    if (DeclWithIssue->getBody()) {
 
       // If the bug uniqueing location exists, use it for the hash.
       // For example, this ensures that two leaks reported on the same line
@@ -433,19 +492,22 @@
       if (UPDLoc.isValid()) {
         FullSourceLoc UL(SM->getExpansionLoc(UPDLoc.asLocation()),
                          *SM);
-        FullSourceLoc UFunL(SM->getExpansionLoc(
-          D->getUniqueingDecl()->getBody()->getLocStart()), *SM);
         o << " <key>issue_hash</key><string>"
-          << UL.getExpansionLineNumber() - UFunL.getExpansionLineNumber()
+          << GetHashOfContent(
+              std::to_string(UL.getExpansionColumnNumber()) + "$" +
+              ::NormalizeLine(SM, UL, DeclWithIssue) + "$" +
+              D->getBugType().str())
           << "</string>\n";
 
       // Otherwise, use the location on which the bug is reported.
       } else {
         FullSourceLoc L(SM->getExpansionLoc(D->getLocation().asLocation()),
                         *SM);
-        FullSourceLoc FunL(SM->getExpansionLoc(Body->getLocStart()), *SM);
         o << " <key>issue_hash</key><string>"
-          << L.getExpansionLineNumber() - FunL.getExpansionLineNumber()
+          << GetHashOfContent(
+              std::to_string(L.getExpansionColumnNumber()) + "$" +
+              ::NormalizeLine(SM, L, DeclWithIssue) + "$" +
+              D->getBugType().str())
           << "</string>\n";
       }
 