[clang] 30e5c7e - [analyzer] NFCi: Refactor CStringChecker: use strongly typed internal API

2020-04-09 Thread Balazs Benics via cfe-commits

Author: Balazs Benics
Date: 2020-04-09T16:06:32+02:00
New Revision: 30e5c7e82fa1c5318540feb83d54757c632e2599

URL: 
https://github.com/llvm/llvm-project/commit/30e5c7e82fa1c5318540feb83d54757c632e2599
DIFF: 
https://github.com/llvm/llvm-project/commit/30e5c7e82fa1c5318540feb83d54757c632e2599.diff

LOG: [analyzer] NFCi: Refactor CStringChecker: use strongly typed internal API

Summary:
I wanted to extend the diagnostics of the CStringChecker with taintedness.
This requires the CStringChecker to be refactored to support a more flexible
reporting mechanism.

This patch does only refactorings, such:
 - eliminates always false parameters (like WarnAboutSize)
 - reduces the number of parameters
 - makes strong types differentiating *source* and *destination* buffers
   (same with size expressions)
 - binds the argument expression and the index, making diagnostics accurate
   and easy to emit
 - removes a bunch of default parameters to make it more readable
 - remove random const char* warning message parameters, making clear where
   and what is going to be emitted

Note that:
 - CheckBufferAccess now checks *only* one buffer, this removed about 100 LOC
   code duplication
 - not every function was refactored to use the /new/ strongly typed API, since
   the CString related functions are really closely coupled monolithic beasts,
   I will refactor them separately
 - all tests are preserved and passing; only the message changed at some places.
   In my opinion, these messages are holding the same information.

I would also highlight that this refactoring caught a bug in
clang/test/Analysis/string.c:454 where the diagnostic did not reflect reality.
This catch backs my effort on simplifying this monolithic CStringChecker.

Reviewers: NoQ, baloghadamsoftware, Szelethus, rengolin, Charusso

Reviewed By: NoQ

Subscribers: whisperity, xazax.hun, szepet, rnkovacs, a.sidorin,
mikhail.ramalho, donat.nagy, dkrupp, Charusso, martong, cfe-commits

Tags: #clang

Differential Revision: https://reviews.llvm.org/D74806

Added: 


Modified: 
clang/lib/StaticAnalyzer/Checkers/CStringChecker.cpp
clang/test/Analysis/bsd-string.c
clang/test/Analysis/bstring.c
clang/test/Analysis/null-deref-ps-region.c
clang/test/Analysis/string.c

Removed: 




diff  --git a/clang/lib/StaticAnalyzer/Checkers/CStringChecker.cpp 
b/clang/lib/StaticAnalyzer/Checkers/CStringChecker.cpp
index 4558b6936e7f..40467b346f69 100644
--- a/clang/lib/StaticAnalyzer/Checkers/CStringChecker.cpp
+++ b/clang/lib/StaticAnalyzer/Checkers/CStringChecker.cpp
@@ -30,6 +30,47 @@ using namespace clang;
 using namespace ento;
 
 namespace {
+struct AnyArgExpr {
+  // FIXME: Remove constructor in C++17 to turn it into an aggregate.
+  AnyArgExpr(const Expr *Expression, unsigned ArgumentIndex)
+  : Expression{Expression}, ArgumentIndex{ArgumentIndex} {}
+  const Expr *Expression;
+  unsigned ArgumentIndex;
+};
+
+struct SourceArgExpr : AnyArgExpr {
+  using AnyArgExpr::AnyArgExpr; // FIXME: Remove using in C++17.
+};
+
+struct DestinationArgExpr : AnyArgExpr {
+  using AnyArgExpr::AnyArgExpr; // FIXME: Same.
+};
+
+struct SizeArgExpr : AnyArgExpr {
+  using AnyArgExpr::AnyArgExpr; // FIXME: Same.
+};
+
+using ErrorMessage = SmallString<128>;
+enum class AccessKind { write, read };
+
+static ErrorMessage createOutOfBoundErrorMsg(StringRef FunctionDescription,
+ AccessKind Access) {
+  ErrorMessage Message;
+  llvm::raw_svector_ostream Os(Message);
+
+  // Function classification like: Memory copy function
+  Os << toUppercase(FunctionDescription.front())
+ << &FunctionDescription.data()[1];
+
+  if (Access == AccessKind::write) {
+Os << " overflows the destination buffer";
+  } else { // read access
+Os << " accesses out-of-bound array element";
+  }
+
+  return Message;
+}
+
 enum class ConcatFnKind { none = 0, strcat = 1, strlcat = 2 };
 class CStringChecker : public Checker< eval::Call,
  check::PreStmt,
@@ -113,12 +154,9 @@ class CStringChecker : public Checker< eval::Call,
   void evalMemmove(CheckerContext &C, const CallExpr *CE) const;
   void evalBcopy(CheckerContext &C, const CallExpr *CE) const;
   void evalCopyCommon(CheckerContext &C, const CallExpr *CE,
-  ProgramStateRef state,
-  const Expr *Size,
-  const Expr *Source,
-  const Expr *Dest,
-  bool Restricted = false,
-  bool IsMempcpy = false) const;
+  ProgramStateRef state, SizeArgExpr Size,
+  DestinationArgExpr Dest, SourceArgExpr Source,
+  bool Restricted, bool IsMempcpy) const;
 
   void evalMemcmp(CheckerContext &C, const CallExpr *CE) const;
 
@@ -195,40 +233,17 @@ class CStringChecker : public Checker< eval::Call,
   

[clang] d96a47c - [analyzer] ctu-on-demand-parsing tests: replace linux -> system-linux

2020-07-13 Thread Balazs Benics via cfe-commits

Author: Balazs Benics
Date: 2020-07-13T14:29:47+02:00
New Revision: d96a47c61625f853ec42a151ae3783e30a3943f3

URL: 
https://github.com/llvm/llvm-project/commit/d96a47c61625f853ec42a151ae3783e30a3943f3
DIFF: 
https://github.com/llvm/llvm-project/commit/d96a47c61625f853ec42a151ae3783e30a3943f3.diff

LOG: [analyzer] ctu-on-demand-parsing tests: replace linux -> system-linux

Differential Revision: https://reviews.llvm.org/D83555

Added: 


Modified: 
clang/test/Analysis/ctu-on-demand-parsing.c
clang/test/Analysis/ctu-on-demand-parsing.cpp

Removed: 




diff  --git a/clang/test/Analysis/ctu-on-demand-parsing.c 
b/clang/test/Analysis/ctu-on-demand-parsing.c
index 5adce7f36963..07a72a104646 100644
--- a/clang/test/Analysis/ctu-on-demand-parsing.c
+++ b/clang/test/Analysis/ctu-on-demand-parsing.c
@@ -19,7 +19,7 @@
 // RUN:   -verify ctu-on-demand-parsing.c
 //
 // FIXME: Path handling should work on all platforms.
-// REQUIRES: linux
+// REQUIRES: system-linux
 
 void clang_analyzer_eval(int);
 

diff  --git a/clang/test/Analysis/ctu-on-demand-parsing.cpp 
b/clang/test/Analysis/ctu-on-demand-parsing.cpp
index 058269662fb3..e4e998c8f64c 100644
--- a/clang/test/Analysis/ctu-on-demand-parsing.cpp
+++ b/clang/test/Analysis/ctu-on-demand-parsing.cpp
@@ -30,7 +30,7 @@
 // CHECK: CTU loaded AST file: {{.*}}ctu-chain.cpp
 //
 // FIXME: Path handling should work on all platforms.
-// REQUIRES: linux
+// REQUIRES: system-linux
 
 #include "ctu-hdr.h"
 



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] e22cae3 - [analyzer][NFC] Add unittest for FalsePositiveRefutationBRVisitor

2020-06-29 Thread Balazs Benics via cfe-commits

Author: Balazs Benics
Date: 2020-06-29T16:54:17+02:00
New Revision: e22cae32c5c4cf8c49b674cea34c105a6cb015f9

URL: 
https://github.com/llvm/llvm-project/commit/e22cae32c5c4cf8c49b674cea34c105a6cb015f9
DIFF: 
https://github.com/llvm/llvm-project/commit/e22cae32c5c4cf8c49b674cea34c105a6cb015f9.diff

LOG: [analyzer][NFC] Add unittest for FalsePositiveRefutationBRVisitor

Adds the test infrastructure for testing the FalsePositiveRefutationBRVisitor.
It will be extended in the D78457 patch, which demonstrates and fixes a bug in
the visitor.

Differential Revision: https://reviews.llvm.org/D78704

Added: 
clang/unittests/StaticAnalyzer/FalsePositiveRefutationBRVisitorTest.cpp

Modified: 
clang/lib/StaticAnalyzer/Core/BugReporterVisitors.cpp
clang/unittests/StaticAnalyzer/CMakeLists.txt
clang/unittests/StaticAnalyzer/CheckerRegistration.h

Removed: 




diff  --git a/clang/lib/StaticAnalyzer/Core/BugReporterVisitors.cpp 
b/clang/lib/StaticAnalyzer/Core/BugReporterVisitors.cpp
index ad79f7cb9359..719edfdb51e6 100644
--- a/clang/lib/StaticAnalyzer/Core/BugReporterVisitors.cpp
+++ b/clang/lib/StaticAnalyzer/Core/BugReporterVisitors.cpp
@@ -52,6 +52,7 @@
 #include "llvm/ADT/SmallPtrSet.h"
 #include "llvm/ADT/SmallString.h"
 #include "llvm/ADT/SmallVector.h"
+#include "llvm/ADT/Statistic.h"
 #include "llvm/ADT/StringExtras.h"
 #include "llvm/ADT/StringRef.h"
 #include "llvm/Support/Casting.h"
@@ -2812,12 +2813,19 @@ UndefOrNullArgVisitor::VisitNode(const ExplodedNode *N, 
BugReporterContext &BRC,
 // Implementation of FalsePositiveRefutationBRVisitor.
 
//===--===//
 
+#define DEBUG_TYPE "FalsePositiveRefutationBRVisitor"
+STATISTIC(CrosscheckedBugReports,
+  "The # of bug reports which were checked for infeasible 
constraints");
+STATISTIC(CrosscheckInvalidatedBugReports,
+  "The # of bug reports invalidated due to infeasible constraints");
+
 FalsePositiveRefutationBRVisitor::FalsePositiveRefutationBRVisitor()
 : Constraints(ConstraintRangeTy::Factory().getEmptyMap()) {}
 
 void FalsePositiveRefutationBRVisitor::finalizeVisitor(
 BugReporterContext &BRC, const ExplodedNode *EndPathNode,
 PathSensitiveBugReport &BR) {
+  ++CrosscheckedBugReports;
   // Collect new constraints
   VisitNode(EndPathNode, BRC, BR);
 
@@ -2848,8 +2856,10 @@ void FalsePositiveRefutationBRVisitor::finalizeVisitor(
   if (!isSat.hasValue())
 return;
 
-  if (!isSat.getValue())
+  if (!isSat.getValue()) {
+++CrosscheckInvalidatedBugReports;
 BR.markInvalid("Infeasible constraints", 
EndPathNode->getLocationContext());
+  }
 }
 
 PathDiagnosticPieceRef FalsePositiveRefutationBRVisitor::VisitNode(

diff  --git a/clang/unittests/StaticAnalyzer/CMakeLists.txt 
b/clang/unittests/StaticAnalyzer/CMakeLists.txt
index 564eba879f24..0e6d8763d96d 100644
--- a/clang/unittests/StaticAnalyzer/CMakeLists.txt
+++ b/clang/unittests/StaticAnalyzer/CMakeLists.txt
@@ -7,6 +7,7 @@ add_clang_unittest(StaticAnalysisTests
   AnalyzerOptionsTest.cpp
   CallDescriptionTest.cpp
   CallEventTest.cpp
+  FalsePositiveRefutationBRVisitorTest.cpp
   ParamRegionTest.cpp
   RangeSetTest.cpp
   RegisterCustomCheckersTest.cpp

diff  --git a/clang/unittests/StaticAnalyzer/CheckerRegistration.h 
b/clang/unittests/StaticAnalyzer/CheckerRegistration.h
index 0bbed9b7784f..e02b856e0e9c 100644
--- a/clang/unittests/StaticAnalyzer/CheckerRegistration.h
+++ b/clang/unittests/StaticAnalyzer/CheckerRegistration.h
@@ -14,6 +14,7 @@
 #include "clang/StaticAnalyzer/Frontend/AnalysisConsumer.h"
 #include "clang/StaticAnalyzer/Frontend/CheckerRegistry.h"
 #include "clang/Tooling/Tooling.h"
+#include "gtest/gtest.h"
 
 namespace clang {
 namespace ento {
@@ -65,10 +66,21 @@ class TestAction : public ASTFrontendAction {
   }
 };
 
+inline SmallString<80> getCurrentTestNameAsFileName() {
+  const ::testing::TestInfo *Info =
+  ::testing::UnitTest::GetInstance()->current_test_info();
+
+  SmallString<80> FileName;
+  (Twine{Info->name()} + ".cc").toVector(FileName);
+  return FileName;
+}
+
 template 
 bool runCheckerOnCode(const std::string &Code, std::string &Diags) {
+  const SmallVectorImpl &FileName = getCurrentTestNameAsFileName();
   llvm::raw_string_ostream OS(Diags);
-  return tooling::runToolOnCode(std::make_unique>(OS), 
Code);
+  return tooling::runToolOnCode(std::make_unique>(OS), Code,
+FileName);
 }
 
 template 
@@ -77,5 +89,22 @@ bool runCheckerOnCode(const std::string &Code) {
   return runCheckerOnCode(Code, Diags);
 }
 
+template 
+bool runCheckerOnCodeWithArgs(const std::string &Code,
+  const std::vector &Args,
+  std::string &Diags) {
+  const SmallVectorImpl &FileName = getCurrentTestNameAsFileName();
+  llvm::raw_string_ostream OS(Diags);
+  return tooling::runToolOnCodeWithArgs

[clang] fe0a555 - [analyzer][NFC] Add unittest for FalsePositiveRefutationBRVisitor

2020-06-29 Thread Balazs Benics via cfe-commits

Author: Balazs Benics
Date: 2020-06-29T18:18:43+02:00
New Revision: fe0a555aa3c6f37a5b5d79942215b07587893efa

URL: 
https://github.com/llvm/llvm-project/commit/fe0a555aa3c6f37a5b5d79942215b07587893efa
DIFF: 
https://github.com/llvm/llvm-project/commit/fe0a555aa3c6f37a5b5d79942215b07587893efa.diff

LOG: [analyzer][NFC] Add unittest for FalsePositiveRefutationBRVisitor

Adds the test infrastructure for testing the FalsePositiveRefutationBRVisitor.
It will be extended in the D78457 patch, which demonstrates and fixes a bug in
the visitor.

Differential Revision: https://reviews.llvm.org/D78704

Added: 
clang/unittests/StaticAnalyzer/FalsePositiveRefutationBRVisitorTest.cpp

Modified: 
clang/unittests/StaticAnalyzer/CMakeLists.txt
clang/unittests/StaticAnalyzer/CheckerRegistration.h

Removed: 




diff  --git a/clang/unittests/StaticAnalyzer/CMakeLists.txt 
b/clang/unittests/StaticAnalyzer/CMakeLists.txt
index 564eba879f24..0e6d8763d96d 100644
--- a/clang/unittests/StaticAnalyzer/CMakeLists.txt
+++ b/clang/unittests/StaticAnalyzer/CMakeLists.txt
@@ -7,6 +7,7 @@ add_clang_unittest(StaticAnalysisTests
   AnalyzerOptionsTest.cpp
   CallDescriptionTest.cpp
   CallEventTest.cpp
+  FalsePositiveRefutationBRVisitorTest.cpp
   ParamRegionTest.cpp
   RangeSetTest.cpp
   RegisterCustomCheckersTest.cpp

diff  --git a/clang/unittests/StaticAnalyzer/CheckerRegistration.h 
b/clang/unittests/StaticAnalyzer/CheckerRegistration.h
index 0bbed9b7784f..e02b856e0e9c 100644
--- a/clang/unittests/StaticAnalyzer/CheckerRegistration.h
+++ b/clang/unittests/StaticAnalyzer/CheckerRegistration.h
@@ -14,6 +14,7 @@
 #include "clang/StaticAnalyzer/Frontend/AnalysisConsumer.h"
 #include "clang/StaticAnalyzer/Frontend/CheckerRegistry.h"
 #include "clang/Tooling/Tooling.h"
+#include "gtest/gtest.h"
 
 namespace clang {
 namespace ento {
@@ -65,10 +66,21 @@ class TestAction : public ASTFrontendAction {
   }
 };
 
+inline SmallString<80> getCurrentTestNameAsFileName() {
+  const ::testing::TestInfo *Info =
+  ::testing::UnitTest::GetInstance()->current_test_info();
+
+  SmallString<80> FileName;
+  (Twine{Info->name()} + ".cc").toVector(FileName);
+  return FileName;
+}
+
 template 
 bool runCheckerOnCode(const std::string &Code, std::string &Diags) {
+  const SmallVectorImpl &FileName = getCurrentTestNameAsFileName();
   llvm::raw_string_ostream OS(Diags);
-  return tooling::runToolOnCode(std::make_unique>(OS), 
Code);
+  return tooling::runToolOnCode(std::make_unique>(OS), Code,
+FileName);
 }
 
 template 
@@ -77,5 +89,22 @@ bool runCheckerOnCode(const std::string &Code) {
   return runCheckerOnCode(Code, Diags);
 }
 
+template 
+bool runCheckerOnCodeWithArgs(const std::string &Code,
+  const std::vector &Args,
+  std::string &Diags) {
+  const SmallVectorImpl &FileName = getCurrentTestNameAsFileName();
+  llvm::raw_string_ostream OS(Diags);
+  return tooling::runToolOnCodeWithArgs(
+  std::make_unique>(OS), Code, Args, FileName);
+}
+
+template 
+bool runCheckerOnCodeWithArgs(const std::string &Code,
+  const std::vector &Args) {
+  std::string Diags;
+  return runCheckerOnCodeWithArgs(Code, Args, Diags);
+}
+
 } // namespace ento
 } // namespace clang

diff  --git 
a/clang/unittests/StaticAnalyzer/FalsePositiveRefutationBRVisitorTest.cpp 
b/clang/unittests/StaticAnalyzer/FalsePositiveRefutationBRVisitorTest.cpp
new file mode 100644
index ..ad202f844597
--- /dev/null
+++ b/clang/unittests/StaticAnalyzer/FalsePositiveRefutationBRVisitorTest.cpp
@@ -0,0 +1,175 @@
+//===- unittests/StaticAnalyzer/FalsePositiveRefutationBRVisitorTest.cpp 
--===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM 
Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===--===//
+
+#include "CheckerRegistration.h"
+#include "Reusables.h"
+#include "clang/Frontend/CompilerInstance.h"
+#include "clang/StaticAnalyzer/Core/BugReporter/BugReporter.h"
+#include "clang/StaticAnalyzer/Core/BugReporter/BugType.h"
+#include "clang/StaticAnalyzer/Core/Checker.h"
+#include "clang/StaticAnalyzer/Core/PathSensitive/CallEvent.h"
+#include "clang/StaticAnalyzer/Core/PathSensitive/CheckerContext.h"
+#include "clang/StaticAnalyzer/Frontend/AnalysisConsumer.h"
+#include "clang/StaticAnalyzer/Frontend/CheckerRegistry.h"
+#include "llvm/Config/config.h"
+#include "gtest/gtest.h"
+
+// FIXME: Use GTEST_SKIP() instead if GTest is updated to version 1.10.0
+#ifdef LLVM_WITH_Z3
+#define SKIP_WITHOUT_Z3
+#else
+#define SKIP_WITHOUT_Z3 return
+#endif
+
+namespace clang {
+namespace ento {
+namespace {
+
+class FalsePositiveGenerator : public Checker {
+  using Self = FalsePositiveGenerat

[clang] de361df - [analyzer][Z3-refutation] Fix a refutation BugReporterVisitor bug

2020-06-29 Thread Balazs Benics via cfe-commits

Author: Balazs Benics
Date: 2020-06-29T18:51:24+02:00
New Revision: de361df3f6d0f20bf16a8deb9e6219556d028b81

URL: 
https://github.com/llvm/llvm-project/commit/de361df3f6d0f20bf16a8deb9e6219556d028b81
DIFF: 
https://github.com/llvm/llvm-project/commit/de361df3f6d0f20bf16a8deb9e6219556d028b81.diff

LOG: [analyzer][Z3-refutation] Fix a refutation BugReporterVisitor bug

FalsePositiveRefutationBRVisitor had a bug where the constraints were not
properly collected thus crosschecked with Z3.
This patch demonstratest and fixes that bug.

Bug:
The visitor wanted to collect all the constraints on a BugPath.
Since it is a visitor, it stated the visitation of the BugPath with the node
before the ErrorNode. As a final step, it visited the ErrorNode explicitly,
before it processed the collected constraints.

In principle, the ErrorNode should have visited before every other node.
Since the constraints were collected into a map, mapping each symbol to its
RangeSet, if the map already had a mapping with the symbol, then it was skipped.

This behavior was flawed if:
We already had a constraint on a symbol, but at the end in the ErrorNode we have
a tighter constraint on that. Therefore, this visitor would not utilize that
tighter constraint during the crosscheck validation.

Differential Revision: https://reviews.llvm.org/D78457

Added: 


Modified: 
clang/include/clang/StaticAnalyzer/Core/BugReporter/BugReporterVisitors.h
clang/lib/StaticAnalyzer/Core/BugReporter.cpp
clang/lib/StaticAnalyzer/Core/BugReporterVisitors.cpp
clang/unittests/StaticAnalyzer/FalsePositiveRefutationBRVisitorTest.cpp

Removed: 




diff  --git 
a/clang/include/clang/StaticAnalyzer/Core/BugReporter/BugReporterVisitors.h 
b/clang/include/clang/StaticAnalyzer/Core/BugReporter/BugReporterVisitors.h
index de0ee5de81b5..365b1ff1bfe3 100644
--- a/clang/include/clang/StaticAnalyzer/Core/BugReporter/BugReporterVisitors.h
+++ b/clang/include/clang/StaticAnalyzer/Core/BugReporter/BugReporterVisitors.h
@@ -386,6 +386,8 @@ class FalsePositiveRefutationBRVisitor final : public 
BugReporterVisitor {
 
   void finalizeVisitor(BugReporterContext &BRC, const ExplodedNode 
*EndPathNode,
PathSensitiveBugReport &BR) override;
+  void addConstraints(const ExplodedNode *N,
+  bool OverwriteConstraintsOnExistingSyms);
 };
 
 

diff  --git a/clang/lib/StaticAnalyzer/Core/BugReporter.cpp 
b/clang/lib/StaticAnalyzer/Core/BugReporter.cpp
index efae511da83b..f4fd495956aa 100644
--- a/clang/lib/StaticAnalyzer/Core/BugReporter.cpp
+++ b/clang/lib/StaticAnalyzer/Core/BugReporter.cpp
@@ -38,6 +38,7 @@
 #include "clang/StaticAnalyzer/Core/PathSensitive/ExprEngine.h"
 #include "clang/StaticAnalyzer/Core/PathSensitive/MemRegion.h"
 #include "clang/StaticAnalyzer/Core/PathSensitive/ProgramState.h"
+#include "clang/StaticAnalyzer/Core/PathSensitive/SMTConv.h"
 #include "clang/StaticAnalyzer/Core/PathSensitive/SVals.h"
 #include "clang/StaticAnalyzer/Core/PathSensitive/SymbolManager.h"
 #include "llvm/ADT/ArrayRef.h"
@@ -2783,7 +2784,7 @@ Optional 
PathDiagnosticBuilder::findValidReport(
 R->clearVisitors();
 R->addVisitor(std::make_unique());
 
-// We don't overrite the notes inserted by other visitors because the
+// We don't overwrite the notes inserted by other visitors because the
 // refutation manager does not add any new note to the path
 generateVisitorsDiagnostics(R, BugPath->ErrorNode, BRC);
   }

diff  --git a/clang/lib/StaticAnalyzer/Core/BugReporterVisitors.cpp 
b/clang/lib/StaticAnalyzer/Core/BugReporterVisitors.cpp
index ad79f7cb9359..ef4d38ff498f 100644
--- a/clang/lib/StaticAnalyzer/Core/BugReporterVisitors.cpp
+++ b/clang/lib/StaticAnalyzer/Core/BugReporterVisitors.cpp
@@ -2819,7 +2819,7 @@ void FalsePositiveRefutationBRVisitor::finalizeVisitor(
 BugReporterContext &BRC, const ExplodedNode *EndPathNode,
 PathSensitiveBugReport &BR) {
   // Collect new constraints
-  VisitNode(EndPathNode, BRC, BR);
+  addConstraints(EndPathNode, /*OverwriteConstraintsOnExistingSyms=*/true);
 
   // Create a refutation manager
   llvm::SMTSolverRef RefutationSolver = llvm::CreateZ3Solver();
@@ -2830,30 +2830,30 @@ void FalsePositiveRefutationBRVisitor::finalizeVisitor(
 const SymbolRef Sym = I.first;
 auto RangeIt = I.second.begin();
 
-llvm::SMTExprRef Constraints = SMTConv::getRangeExpr(
+llvm::SMTExprRef SMTConstraints = SMTConv::getRangeExpr(
 RefutationSolver, Ctx, Sym, RangeIt->From(), RangeIt->To(),
 /*InRange=*/true);
 while ((++RangeIt) != I.second.end()) {
-  Constraints = RefutationSolver->mkOr(
-  Constraints, SMTConv::getRangeExpr(RefutationSolver, Ctx, Sym,
- RangeIt->From(), RangeIt->To(),
- /*InRange=*/true));
+  SMTConstraints

[clang] 63d3aeb - [analyzer] Fix out-of-tree only clang build by not relaying on private header

2020-07-31 Thread Balazs Benics via cfe-commits

Author: Balazs Benics
Date: 2020-07-31T10:28:14+02:00
New Revision: 63d3aeb529a7b0fb95c2092ca38ad21c1f5cfd74

URL: 
https://github.com/llvm/llvm-project/commit/63d3aeb529a7b0fb95c2092ca38ad21c1f5cfd74
DIFF: 
https://github.com/llvm/llvm-project/commit/63d3aeb529a7b0fb95c2092ca38ad21c1f5cfd74.diff

LOG: [analyzer] Fix out-of-tree only clang build by not relaying on private 
header

It turned out that the D78704 included a private LLVM header, which is excluded
from the LLVM install target.
I'm substituting that `#include` with the public one by moving the necessary
`#define` into that. There was a discussion about this at D78704 and on the
cfe-dev mailing list.

I'm also placing a note to remind others of this pitfall.

Reviewed By: mgorny

Differential Revision: https://reviews.llvm.org/D84929

Added: 


Modified: 
clang/unittests/StaticAnalyzer/FalsePositiveRefutationBRVisitorTest.cpp
llvm/include/llvm/Config/config.h.cmake
llvm/include/llvm/Config/llvm-config.h.cmake

Removed: 




diff  --git 
a/clang/unittests/StaticAnalyzer/FalsePositiveRefutationBRVisitorTest.cpp 
b/clang/unittests/StaticAnalyzer/FalsePositiveRefutationBRVisitorTest.cpp
index 7c151c182113..e67dcacca0a9 100644
--- a/clang/unittests/StaticAnalyzer/FalsePositiveRefutationBRVisitorTest.cpp
+++ b/clang/unittests/StaticAnalyzer/FalsePositiveRefutationBRVisitorTest.cpp
@@ -16,7 +16,7 @@
 #include "clang/StaticAnalyzer/Core/PathSensitive/CheckerContext.h"
 #include "clang/StaticAnalyzer/Frontend/AnalysisConsumer.h"
 #include "clang/StaticAnalyzer/Frontend/CheckerRegistry.h"
-#include "llvm/Config/config.h"
+#include "llvm/Config/llvm-config.h"
 #include "gtest/gtest.h"
 
 // FIXME: Use GTEST_SKIP() instead if GTest is updated to version 1.10.0

diff  --git a/llvm/include/llvm/Config/config.h.cmake 
b/llvm/include/llvm/Config/config.h.cmake
index 4d76b27df6b6..b8c7e070eb34 100644
--- a/llvm/include/llvm/Config/config.h.cmake
+++ b/llvm/include/llvm/Config/config.h.cmake
@@ -1,6 +1,9 @@
 #ifndef CONFIG_H
 #define CONFIG_H
 
+// Include this header only under the llvm source tree.
+// This is a private header.
+
 /* Exported configuration */
 #include "llvm/Config/llvm-config.h"
 
@@ -335,9 +338,6 @@
 /* Whether GlobalISel rule coverage is being collected */
 #cmakedefine01 LLVM_GISEL_COV_ENABLED
 
-/* Define if we have z3 and want to build it */
-#cmakedefine LLVM_WITH_Z3 ${LLVM_WITH_Z3}
-
 /* Define to the default GlobalISel coverage file prefix */
 #cmakedefine LLVM_GISEL_COV_PREFIX "${LLVM_GISEL_COV_PREFIX}"
 

diff  --git a/llvm/include/llvm/Config/llvm-config.h.cmake 
b/llvm/include/llvm/Config/llvm-config.h.cmake
index 01892bc5b610..ee299876825e 100644
--- a/llvm/include/llvm/Config/llvm-config.h.cmake
+++ b/llvm/include/llvm/Config/llvm-config.h.cmake
@@ -79,6 +79,9 @@
  */
 #cmakedefine01 LLVM_FORCE_ENABLE_STATS
 
+/* Define if we have z3 and want to build it */
+#cmakedefine LLVM_WITH_Z3 ${LLVM_WITH_Z3}
+
 /* Define if LLVM was built with a dependency to the libtensorflow dynamic 
library */
 #cmakedefine LLVM_HAVE_TF_API
 



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] f8643a9 - [analyzer] Prefer wrapping SymbolicRegions by ElementRegions

2022-09-12 Thread Balazs Benics via cfe-commits

Author: Balazs Benics
Date: 2022-09-13T08:58:46+02:00
New Revision: f8643a9b31c4029942f67d4534c9139b45173504

URL: 
https://github.com/llvm/llvm-project/commit/f8643a9b31c4029942f67d4534c9139b45173504
DIFF: 
https://github.com/llvm/llvm-project/commit/f8643a9b31c4029942f67d4534c9139b45173504.diff

LOG: [analyzer] Prefer wrapping SymbolicRegions by ElementRegions

It turns out that in certain cases `SymbolRegions` are wrapped by
`ElementRegions`; in others, it's not. This discrepancy can cause the
analyzer not to recognize if the two regions are actually referring to
the same entity, which then can lead to unreachable paths discovered.

Consider this example:

```lang=C++
struct Node { int* ptr; };
void with_structs(Node* n1) {
  Node c = *n1; // copy
  Node* n2 = &c;
  clang_analyzer_dump(*n1); // lazy...
  clang_analyzer_dump(*n2); // lazy...
  clang_analyzer_dump(n1->ptr); // rval(n1->ptr): reg_$2}.ptr>
  clang_analyzer_dump(n2->ptr); // rval(n2->ptr): reg_$1},0 S64b,struct Node}.ptr>
  clang_analyzer_eval(n1->ptr != n2->ptr); // UNKNOWN, bad!
  (void)(*n1);
  (void)(*n2);
}
```

The copy of `n1` will insert a new binding to the store; but for doing
that it actually must create a `TypedValueRegion` which it could pass to
the `LazyCompoundVal`. Since the memregion in question is a
`SymbolicRegion` - which is untyped, it needs to first wrap it into an
`ElementRegion` basically implementing this untyped -> typed conversion
for the sake of passing it to the `LazyCompoundVal`.
So, this is why we have `Element{SymRegion{.}, 0,struct Node}` for `n1`.

The problem appears if the analyzer evaluates a read from the expression
`n1->ptr`. The same logic won't apply for `SymbolRegionValues`, since
they accept raw `SubRegions`, hence the `SymbolicRegion` won't be
wrapped into an `ElementRegion` in that case.

Later when we arrive at the equality comparison, we cannot prove that
they are equal.

For more details check the corresponding thread on discourse:
https://discourse.llvm.org/t/are-symbolicregions-really-untyped/64406

---

In this patch, I'm eagerly wrapping each `SymbolicRegion` by an
`ElementRegion`; basically canonicalizing to this form.
It seems reasonable to do so since any object can be thought of as a single
array of that object; so this should not make much of a difference.

The tests also underpin this assumption, as only a few were broken by
this change; and actually fixed a FIXME along the way.

About the second example, which does the same copy operation - but on
the heap - it will be fixed by the next patch.

Reviewed By: martong

Differential Revision: https://reviews.llvm.org/D132142

Added: 
clang/test/Analysis/trivial-copy-struct.cpp

Modified: 
clang/include/clang/StaticAnalyzer/Checkers/SValExplainer.h
clang/include/clang/StaticAnalyzer/Core/PathSensitive/MemRegion.h
clang/lib/StaticAnalyzer/Checkers/ExprInspectionChecker.cpp
clang/lib/StaticAnalyzer/Checkers/NullabilityChecker.cpp
clang/lib/StaticAnalyzer/Core/BugReporterVisitors.cpp
clang/lib/StaticAnalyzer/Core/ExprEngine.cpp
clang/lib/StaticAnalyzer/Core/MemRegion.cpp
clang/lib/StaticAnalyzer/Core/RegionStore.cpp
clang/test/Analysis/ctor.mm
clang/test/Analysis/expr-inspection.c
clang/test/Analysis/memory-model.cpp
clang/test/Analysis/ptr-arith.c
clang/test/Analysis/ptr-arith.cpp

Removed: 




diff  --git a/clang/include/clang/StaticAnalyzer/Checkers/SValExplainer.h 
b/clang/include/clang/StaticAnalyzer/Checkers/SValExplainer.h
index 4c9c8239113e3..3ae28c1dba3eb 100644
--- a/clang/include/clang/StaticAnalyzer/Checkers/SValExplainer.h
+++ b/clang/include/clang/StaticAnalyzer/Checkers/SValExplainer.h
@@ -42,6 +42,18 @@ class SValExplainer : public FullSValVisitor {
 return false;
   }
 
+  bool isThisObject(const ElementRegion *R) {
+if (const auto *Idx = R->getIndex().getAsInteger()) {
+  if (const auto *SR = R->getSuperRegion()->getAs()) {
+QualType Ty = SR->getPointeeStaticType();
+bool IsNotReinterpretCast = R->getValueType() == Ty;
+if (Idx->isZero() && IsNotReinterpretCast)
+  return isThisObject(SR);
+  }
+}
+return false;
+  }
+
 public:
   SValExplainer(ASTContext &Ctx) : ACtx(Ctx) {}
 
@@ -144,7 +156,7 @@ class SValExplainer : public FullSValVisitor {
   // Add the relevant code once it does.
 
   std::string VisitSymbolicRegion(const SymbolicRegion *R) {
-// Explain 'this' object here.
+// Explain 'this' object here - if it's not wrapped by an ElementRegion.
 // TODO: Explain CXXThisRegion itself, find a way to test it.
 if (isThisObject(R))
   return "'this' object";
@@ -174,6 +186,13 @@ class SValExplainer : public 
FullSValVisitor {
   std::string VisitElementRegion(const ElementRegion *R) {
 std::string Str;
 llvm::raw_string_ostream OS(Str);
+
+// Explain 'this' object here.
+// They are represen

[clang] afcd862 - [analyzer] LazyCompoundVals should be always bound as default bindings

2022-09-12 Thread Balazs Benics via cfe-commits

Author: Balazs Benics
Date: 2022-09-13T08:58:46+02:00
New Revision: afcd862b2e0a561bf03b1e7b83e6eec8e7143098

URL: 
https://github.com/llvm/llvm-project/commit/afcd862b2e0a561bf03b1e7b83e6eec8e7143098
DIFF: 
https://github.com/llvm/llvm-project/commit/afcd862b2e0a561bf03b1e7b83e6eec8e7143098.diff

LOG: [analyzer] LazyCompoundVals should be always bound as default bindings

`LazyCompoundVals` should only appear as `default` bindings in the
store. This fixes the second case in this patch-stack.

Depends on: D132142

Reviewed By: xazax.hun

Differential Revision: https://reviews.llvm.org/D132143

Added: 


Modified: 
clang/lib/StaticAnalyzer/Core/RegionStore.cpp
clang/test/Analysis/trivial-copy-struct.cpp

Removed: 




diff  --git a/clang/lib/StaticAnalyzer/Core/RegionStore.cpp 
b/clang/lib/StaticAnalyzer/Core/RegionStore.cpp
index 58f3b1cebee6..eeadd1579a06 100644
--- a/clang/lib/StaticAnalyzer/Core/RegionStore.cpp
+++ b/clang/lib/StaticAnalyzer/Core/RegionStore.cpp
@@ -2400,7 +2400,11 @@ RegionStoreManager::bind(RegionBindingsConstRef B, Loc 
L, SVal V) {
 
   // Clear out bindings that may overlap with this binding.
   RegionBindingsRef NewB = removeSubRegionBindings(B, cast(R));
-  return NewB.addBinding(BindingKey::Make(R, BindingKey::Direct), V);
+
+  // LazyCompoundVals should be always bound as 'default' bindings.
+  auto KeyKind = isa(V) ? BindingKey::Default
+ : BindingKey::Direct;
+  return NewB.addBinding(BindingKey::Make(R, KeyKind), V);
 }
 
 RegionBindingsRef

diff  --git a/clang/test/Analysis/trivial-copy-struct.cpp 
b/clang/test/Analysis/trivial-copy-struct.cpp
index 85efe854c38e..b9a2b1d9b201 100644
--- a/clang/test/Analysis/trivial-copy-struct.cpp
+++ b/clang/test/Analysis/trivial-copy-struct.cpp
@@ -27,10 +27,10 @@ void copy_on_heap(Node* n1) {
   clang_analyzer_dump(n2); // expected-warning-re 
{{&HeapSymRegion{conj_${{[0-9]+}}{Node *, LC{{[0-9]+}}, S{{[0-9]+}}, 
#{{[0-9]+}}
 
   clang_analyzer_dump(n1->ptr); // expected-warning-re 
{{&SymRegion{reg_${{[0-9]+}}},0 S64b,struct Node}.ptr>}}}
-  clang_analyzer_dump(n2->ptr); // expected-warning {{Unknown}} FIXME: This 
should be the same as above.
+  clang_analyzer_dump(n2->ptr); // expected-warning-re 
{{&SymRegion{reg_${{[0-9]+}}},0 S64b,struct Node}.ptr>}}}
 
   if (n1->ptr != n2->ptr)
-clang_analyzer_warnIfReached(); // expected-warning {{REACHABLE}} FIXME: 
This should not be reachable.
+clang_analyzer_warnIfReached(); // unreachable
   (void)(n1->ptr);
   (void)(n2->ptr);
 }



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] 7cddf9c - [analyzer] Dump the environment entry kind as well

2022-09-13 Thread Balazs Benics via cfe-commits

Author: Balazs Benics
Date: 2022-09-13T09:04:27+02:00
New Revision: 7cddf9cad18a65217c8ba0661fefcf78d841a16b

URL: 
https://github.com/llvm/llvm-project/commit/7cddf9cad18a65217c8ba0661fefcf78d841a16b
DIFF: 
https://github.com/llvm/llvm-project/commit/7cddf9cad18a65217c8ba0661fefcf78d841a16b.diff

LOG: [analyzer] Dump the environment entry kind as well

By this change the `exploded-graph-rewriter` will display the class kind
of the expression of the environment entry. It makes easier to decide if
the given entry corresponds to the lvalue or to the rvalue of some
expression.

It turns out the rewriter already had support for visualizing it, but
probably was never actually used?

Reviewed By: martong

Differential Revision: https://reviews.llvm.org/D132109

Added: 


Modified: 
clang/lib/StaticAnalyzer/Core/Environment.cpp
clang/test/Analysis/expr-inspection.c

Removed: 




diff  --git a/clang/lib/StaticAnalyzer/Core/Environment.cpp 
b/clang/lib/StaticAnalyzer/Core/Environment.cpp
index 719793fa224c..3d017b81762a 100644
--- a/clang/lib/StaticAnalyzer/Core/Environment.cpp
+++ b/clang/lib/StaticAnalyzer/Core/Environment.cpp
@@ -274,7 +274,8 @@ void Environment::printJson(raw_ostream &Out, const 
ASTContext &Ctx,
 
   const Stmt *S = I->first.getStmt();
   Indent(Out, InnerSpace, IsDot)
-  << "{ \"stmt_id\": " << S->getID(Ctx) << ", \"pretty\": ";
+  << "{ \"stmt_id\": " << S->getID(Ctx) << ", \"kind\": \""
+  << S->getStmtClassName() << "\", \"pretty\": ";
   S->printJson(Out, nullptr, PP, /*AddQuotes=*/true);
 
   Out << ", \"value\": ";

diff  --git a/clang/test/Analysis/expr-inspection.c 
b/clang/test/Analysis/expr-inspection.c
index 47ac0c251c1f..91c27f5b149f 100644
--- a/clang/test/Analysis/expr-inspection.c
+++ b/clang/test/Analysis/expr-inspection.c
@@ -36,7 +36,7 @@ void foo(int x) {
 // CHECK-NEXT:   ]},
 // CHECK-NEXT:   "environment": { "pointer": "{{0x[0-9a-f]+}}", "items": [
 // CHECK-NEXT: { "lctx_id": {{[0-9]+}}, "location_context": "#0 Call", 
"calling": "foo", "location": null, "items": [
-// CHECK-NEXT:   { "stmt_id": {{[0-9]+}}, "pretty": 
"clang_analyzer_printState", "value": "&code{clang_analyzer_printState}" }
+// CHECK-NEXT:   { "stmt_id": {{[0-9]+}}, "kind": "ImplicitCastExpr", 
"pretty": "clang_analyzer_printState", "value": 
"&code{clang_analyzer_printState}" }
 // CHECK-NEXT: ]}
 // CHECK-NEXT:   ]},
 // CHECK-NEXT:   "constraints": [



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] b8e1da0 - [analyzer] Initialize ShouldEmitErrorsOnInvalidConfigValue analyzer option

2022-09-14 Thread Balazs Benics via cfe-commits

Author: Balazs Benics
Date: 2022-09-14T16:45:44+02:00
New Revision: b8e1da050673470f20a75d4ecca2c0a00d1a8691

URL: 
https://github.com/llvm/llvm-project/commit/b8e1da050673470f20a75d4ecca2c0a00d1a8691
DIFF: 
https://github.com/llvm/llvm-project/commit/b8e1da050673470f20a75d4ecca2c0a00d1a8691.diff

LOG: [analyzer] Initialize ShouldEmitErrorsOnInvalidConfigValue analyzer option

Downstream users who doesn't make use of the clang cc1 frontend for
commandline argument parsing, won't benefit from the Marshalling
provided default initialization of the AnalyzerOptions entries. More
about this later.
Those analyzer option fields, as they are bitfields, cannot be default
initialized at the declaration (prior c++20), hence they are initialized
at the constructor.
The only problem is that `ShouldEmitErrorsOnInvalidConfigValue` was
forgotten.

In this patch I'm proposing to initialize that field with the rest.

Note that this value is read by
`CheckerRegistry.cpp:insertAndValidate()`.
The analyzer options are initialized by the marshalling at
`CompilerInvocation.cpp:GenerateAnalyzerArgs()` by the expansion of the
`ANALYZER_OPTION_WITH_MARSHALLING` xmacro to the appropriate default
value regardless of the constructor initialized list which I'm touching.
Due to that this only affects users using CSA as a library, without
serious effort, I believe we cannot test this.

Reviewed By: martong

Differential Revision: https://reviews.llvm.org/D133851

Added: 


Modified: 
clang/include/clang/StaticAnalyzer/Core/AnalyzerOptions.h

Removed: 




diff  --git a/clang/include/clang/StaticAnalyzer/Core/AnalyzerOptions.h 
b/clang/include/clang/StaticAnalyzer/Core/AnalyzerOptions.h
index 2baa7f2e0475..ea80761fe31e 100644
--- a/clang/include/clang/StaticAnalyzer/Core/AnalyzerOptions.h
+++ b/clang/include/clang/StaticAnalyzer/Core/AnalyzerOptions.h
@@ -289,7 +289,8 @@ class AnalyzerOptions : public 
RefCountedBase {
 ShowCheckerHelpAlpha(false), ShowCheckerHelpDeveloper(false),
 ShowCheckerOptionList(false), ShowCheckerOptionAlphaList(false),
 ShowCheckerOptionDeveloperList(false), ShowEnabledCheckerList(false),
-ShowConfigOptionsList(false), AnalyzeAll(false),
+ShowConfigOptionsList(false),
+ShouldEmitErrorsOnInvalidConfigValue(false), AnalyzeAll(false),
 AnalyzerDisplayProgress(false), eagerlyAssumeBinOpBifurcation(false),
 TrimGraph(false), visualizeExplodedGraphWithGraphViz(false),
 UnoptimizedCFG(false), PrintStats(false), NoRetryExhausted(false),



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [analyzer][solver] On SymSym RelOps, check EQClass members for contradictions (PR #71284)

2023-11-06 Thread Balazs Benics via cfe-commits

steakhal wrote:

> @DonatNagyE  The most straightforward issue that I see is that (if I 
> understand the code correctly) the intersected constraint (the value of the 
> variable `Constraint` at the end of 
> `handleEquivalentAlternativeSymOperands()`) is just discarded after checking 
> that it's not empty. It's plausible that this discarded information will be 
> re-calculated by this process in the follow-up range calculations and don't 
> have a concrete testcase where it's relevant that we "leave food on the 
> table" (I can try to find one if you'd find it helpful) but I think that it 
> may be useful to think about "preserving" this `Constraint` that we 
> calculated. (This is not a blocking issue: I can support merging the current 
> logic if you don't see a reasonably simple improvement in this area.)

I tried adding this hunk at the end of the function, but the assertion never 
fired:
```c++
  // If we learned something, save it.
  if (Constraint != OriginalConstraint) {
assert(false);
State = assign(EquivalenceClass::find(State, SymSym), Constraint);
return static_cast(State);
  }
```

> There is also the choice that you limited the runtime to `|eqclass(X)| + 
> |eqclass(Y)|` by only checking the alternative constraints where one of the 
> two representatives is the original one. I think that here it's a good 
> decision to ensure a reasonable runtime by potentially discarding some 
> information, but I feel that it'd be good to document this decision with a 
> testcase. For example, I think
> 
> ```
> void gh_62215_left_and_right(int x, int y, int z, int w) {
>   if (x != y) return; // x == y
>   if (z != w) return; // z == w
>   if (z <= x) return; // z > x
>   if (w >= y) return; // w < y
> }
> ```

Fair point. Actually, originally I wanted a full cross-product of the 
alternatives, which I refined to a half-cross-product by following Gábor's 
comments. We shall come back here and do something more refined if there is a 
practical need.
I added a FIXME comment explaining our options, along with the test case you 
proposed. Thanks.

> Finally there is the "why do we need this trickery at all?" question that was 
> also raised by @Xazax-hun. I agree with him that on a longer term, it'd be 
> good to "strengthen" the equality classes and ensure that we assign the same 
> RangeSet to each member of an EQClass (or, in other words, we assign the 
> RangeSet to an EQClass instead of an individual symbol). This would 
> potentially improve the runtime (although I'm not sure that performance is 
> especially relevant here) and handle the left-and-right testcase adequately.

I think we haven't discussed yet the approach of applying the constraint to 
every eqclass member. That would feel like defeating the purpose of eqclasses 
at all.
To me, our best option so far is to keep the representative symbols alive and 
canonicalize (replace) all sub-symbols in constraints with representatives 
whenever we merge eqclasses or add constraints.

Thanks for the reviews folks!

https://github.com/llvm/llvm-project/pull/71284
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [analyzer][solver] On SymSym RelOps, check EQClass members for contradictions (PR #71284)

2023-11-06 Thread Balazs Benics via cfe-commits

https://github.com/steakhal updated 
https://github.com/llvm/llvm-project/pull/71284

>From 92ece501b340c3a2a52b5a4614ddb70bb3e35c93 Mon Sep 17 00:00:00 2001
From: Balazs Benics 
Date: Sat, 4 Nov 2023 13:44:28 +0100
Subject: [PATCH 1/3] [analyzer][solver] On SymSym RelOps, check EQClass
 members for contradictions

The idea is that if we see a `X RELOP Y` being constrained to a RangeSet `S`,
then check the eqclasses of X and Y respectively and for `X' RELOP Y'`
SymSymExprs and try to infer their ranges.
If there is no contradiction with any of the equivalent alternatives,
then intersecting all these RangeSets should never be empty - aka. there
should be a value satisfying the constraints we have.

It costs around `|eqclass(X)| + |eqclass(y)|`.

The approach has its limitations, as demonstrated by
`gh_62215_contradicting_nested_right_equivalent`, where we would need to
apply the same logic, but on a sub-expression of a direct operand.

Before the patch, line 90, 100, and 112 would be reachable; and become
unreachable after this. Line 127 will remain still reachable, but keep
in mind that when cross-checking with Z3 (aka. Z3 refutation), then all
4 reports would be eliminated.

The idea comes from
https://github.com/llvm/llvm-project/issues/62215#issuecomment-1792878474

Fixes #62215
---
 .../Core/RangeConstraintManager.cpp   | 53 +++
 clang/test/Analysis/constraint-assignor.c | 48 +
 2 files changed, 101 insertions(+)

diff --git a/clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp 
b/clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
index 5de99384449a4c8..d631369e895d3a9 100644
--- a/clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
+++ b/clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
@@ -2067,6 +2067,12 @@ class ConstraintAssignor : public 
ConstraintAssignorBase {
 return Assignor.assign(CoS, NewConstraint);
   }
 
+  /// Check if using an equivalent operand alternative would lead to
+  /// contradiction.
+  /// If a contradiction is witnessed, returns false; otherwise returns true.
+  bool handleEquivalentAlternativeSymOperands(const SymSymExpr *SymSym,
+  RangeSet Constraint);
+
   /// Handle expressions like: a % b != 0.
   template 
   bool handleRemainderOp(const SymT *Sym, RangeSet Constraint) {
@@ -2218,11 +2224,58 @@ bool ConstraintAssignor::assignSymExprToConst(const 
SymExpr *Sym,
   return true;
 }
 
+bool ConstraintAssignor::handleEquivalentAlternativeSymOperands(
+const SymSymExpr *SymSym, RangeSet Constraint) {
+  SymbolRef LHS = SymSym->getLHS();
+  SymbolRef RHS = SymSym->getRHS();
+  EquivalenceClass LHSClass = EquivalenceClass::find(State, LHS);
+  EquivalenceClass RHSClass = EquivalenceClass::find(State, RHS);
+  SymbolSet SymbolsEqWithLHS = LHSClass.getClassMembers(State);
+  SymbolSet SymbolsEqWithRHS = RHSClass.getClassMembers(State);
+  llvm::SmallVector AlternativeSymSyms;
+
+  // Gather left alternatives.
+  for (SymbolRef AlternativeLHS : SymbolsEqWithLHS) {
+if (AlternativeLHS == LHS)
+  continue;
+AlternativeSymSyms.emplace_back(AlternativeLHS, SymSym->getOpcode(), RHS,
+SymSym->getType());
+  }
+
+  // Gather right alternatives.
+  for (SymbolRef AlternativeRHS : SymbolsEqWithRHS) {
+if (AlternativeRHS == RHS)
+  continue;
+AlternativeSymSyms.emplace_back(LHS, SymSym->getOpcode(), AlternativeRHS,
+SymSym->getType());
+  }
+
+  // Crosscheck the inferred ranges.
+  for (SymSymExpr AltSymSym : AlternativeSymSyms) {
+RangeSet AltSymSymConstrant =
+SymbolicRangeInferrer::inferRange(RangeFactory, State, &AltSymSym);
+Constraint = intersect(RangeFactory, Constraint, AltSymSymConstrant);
+
+// Check if we witnessed a contradiction with the equivalent alternative
+// operand.
+if (Constraint.isEmpty()) {
+  State = nullptr;
+  return false;
+}
+  }
+  return true;
+}
+
 bool ConstraintAssignor::assignSymSymExprToRangeSet(const SymSymExpr *Sym,
 RangeSet Constraint) {
   if (!handleRemainderOp(Sym, Constraint))
 return false;
 
+  if (const auto *SymSym = dyn_cast(Sym);
+  SymSym && !handleEquivalentAlternativeSymOperands(SymSym, Constraint)) {
+return false;
+  }
+
   std::optional ConstraintAsBool = interpreteAsBool(Constraint);
 
   if (!ConstraintAsBool)
diff --git a/clang/test/Analysis/constraint-assignor.c 
b/clang/test/Analysis/constraint-assignor.c
index 8210e576c98bd21..d5078b406e12601 100644
--- a/clang/test/Analysis/constraint-assignor.c
+++ b/clang/test/Analysis/constraint-assignor.c
@@ -82,3 +82,51 @@ void remainder_with_adjustment_of_composit_lhs(int x, int y) 
{
   clang_analyzer_eval(x + y != -1);// expected-warning{{TRUE}}
   (void)(x * y); // keep the constraints alive.
 }
+
+void gh_62215(int x, int y, int z) {
+  if (x != y) return; // x ==

[clang] [analyzer] Trust base to derived casts for dynamic types (PR #69057)

2023-11-06 Thread Balazs Benics via cfe-commits


@@ -492,11 +492,13 @@ void check_required_cast() {
 
 void check_cast_behavior(OSObject *obj) {
   OSArray *arr1 = OSDynamicCast(OSArray, obj);
-  clang_analyzer_eval(arr1 == obj); // expected-warning{{TRUE}}
-// expected-note@-1{{TRUE}}
-// expected-note@-2{{Assuming 'arr1' is 
not equal to 'obj'}}
-// expected-warning@-3{{FALSE}}
-// expected-note@-4   {{FALSE}}
+  clang_analyzer_eval(arr1 == obj); // #check_cast_behavior_1
+  // expected-warning@#check_cast_behavior_1 {{TRUE}}
+  // expected-note@#check_cast_behavior_1{{TRUE}}
+  // expected-note@#check_cast_behavior_1{{Assuming 'arr1' is equal to 'obj'}}

steakhal wrote:

I've looked at the exploded-graph before and after the patch, and the values 
and the branches are all the same. There is only one difference: in the new 
graph, after the `(OSArray *)` cast in init-expr of the decl, we have the type 
info bound: `Dynamic Types`:
```
SymRegion{reg_$0} -> OSArray (or a sub-class)
```

Later, when we load for the analyzer_eval, we can see that on the branch where 
the `safeMetaCast` succeeds, `arr1` and `obj` refers to the same lvalue, thus 
already proven to be equal.

![image](https://github.com/llvm/llvm-project/assets/6280485/2d2ee01e-b356-43fd-aa24-62b69d08b145)

BTW if I change the test into this, it would pass on both the old and new 
versions:
```c++
void check_cast_behavior(OSObject *obj) {
  OSArray *arr1 = OSDynamicCast(OSArray, obj);
  if (!arr1) return;
  // expected-note@-1 {{'arr1' is non-null}} expected-note@-1 {{Taking false 
branch}}
  // expected-note@-2 {{'arr1' is non-null}} expected-note@-2 {{Taking false 
branch}}
  clang_analyzer_eval(arr1 == obj); // expected-warning{{TRUE}} 
expected-note{{TRUE}}
  OSArray *arr2 = OSRequiredCast(OSArray, obj);
  clang_analyzer_eval(arr2 == obj); // expected-warning{{TRUE}} 
expected-note{{TRUE}}
}
```
I think this test would describe the behavior for the "cast-succeeds" branch, 
aka. the branch where `arr1` is non-null. Consequently, I believe this PR would 
not regress this.

https://github.com/llvm/llvm-project/pull/69057
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang][analyzer] Improve StdLibraryFunctionsChecker 'readlink' modeling. (PR #71373)

2023-11-06 Thread Balazs Benics via cfe-commits

https://github.com/steakhal approved this pull request.

LGTM, but please wait for @DonatNagyE to have a look.
BTW I noticed that the note messages are not tested, but I'm okay to not test 
that I think.

https://github.com/llvm/llvm-project/pull/71373
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang][Analyzer][NFC] Simplify preDefault/preFseek/preFreadFwrite of StreamChecker (PR #71394)

2023-11-06 Thread Balazs Benics via cfe-commits

https://github.com/steakhal requested changes to this pull request.

I find this alternative less idiomatic (dissimilar to how we implement checks 
in other checkers), thus I find this less readable.
I'm okay with that amount of redundancy as it was present.

https://github.com/llvm/llvm-project/pull/71394
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [analyzer][solver] On SymSym RelOps, check EQClass members for contradictions (PR #71284)

2023-11-06 Thread Balazs Benics via cfe-commits

steakhal wrote:

> > So, if I understand you correctly, at the 3rd if statement, we should 
> > canonicalize the symbol we are constraining by walking every sub-symbol and 
> > substituting it with its equivalent counterpart (if any), by basically with 
> > its eqclass' representative symbol.
> 
> I am not 100% sure if I'd go that far (doing canonicalization). Symbols carry 
> a lot of information, like provenance. Bug reporters, or maybe even some 
> custom checker state might rely on this. I am afraid that replacing/rewriting 
> like that might have unexpected consequences, nothing we cannot solve, but I 
> am not sure whether we want to solve them. What I'd suggest is more like 
> always adding all the constraints to the representative, and lazily 
> propagating those constraints to the other members of the equivalence 
> classes, only when we mention them in a constraint.
> 
> This might also be challenging for bug reporters to explain in some 
> scenarios, but at least we preserve the provenance information.
> 
> @steakhal

It wasn't actually what I had in mind. I wanted to assign constraints to 
canonicalized symexprs instead of to the symexpr the API gets from the assume 
call. And for lookups, we would again do the same thing, canonicalize and only 
then do the lookup - or let the lookup (rather infer do the canonicalization on 
the fly). So to be explicit, no other component would know what 
canonicalization happens within the range-based solver.

https://github.com/llvm/llvm-project/pull/71284
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [analyzer] Let the checkers query upper and lower bounds on symbols (PR #74141)

2023-12-05 Thread Balazs Benics via cfe-commits

steakhal wrote:

My take is that the z3-based solver is crashing all over the place. So its not 
just slower. We anyways don't have CI checks for it.
Given all these, I'd rather not put more burden to the issue tracker regarding 
this. I'd consider it if these issues wouldn't be present though, but we are 
really far from that.

https://github.com/llvm/llvm-project/pull/74141
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [analyzer] Move alpha checker EnumCastOutOfRange to optin (PR #67157)

2023-12-05 Thread Balazs Benics via cfe-commits
=?utf-8?q?Donát?= Nagy 
Message-ID:
In-Reply-To: 


steakhal wrote:

> (Force pushed the branch because I wanted to rebase it onto a recent main and 
> fix the merge conflicts. Is there a better workflow than this?)

I think git also offers conflict resolution for `git merge origin/main`. It 
should put the resolution into the merge commit.
That being said, I usually just do the rebase/resolve approach and force-push; 
only after all conversations are resolved and I don't expect anyone reviewing 
at the time. But yea, a merge commit should be the way for GitHub AFAIK.

https://github.com/llvm/llvm-project/pull/67157
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [Analyzer][NFC] Remove redundant function call (PR #75076)

2023-12-11 Thread Balazs Benics via cfe-commits

https://github.com/steakhal commented:

I would agree with @isuckatcs, and I'd be weak against this PR.
Right now I don't see the benefit of asserting this.
Consider downstream users that might use this reporting system and have their 
own trackers. (We don't at Sonar, but pretend), then they would need to remove 
one more unjust assert.
Speaking of assert, 
[here](https://discourse.llvm.org/t/rfc-error-handling-in-release-builds-aka-can-i-use-lldbassert-or-not/74738/10)
 is my statement about them for the whole llvm prohect.
We are on the cold path anyways here. It should not matter.

https://github.com/llvm/llvm-project/pull/75076
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [analyzer] Trust base to derived casts for dynamic types (PR #69057)

2023-11-17 Thread Balazs Benics via cfe-commits


@@ -492,11 +492,13 @@ void check_required_cast() {
 
 void check_cast_behavior(OSObject *obj) {
   OSArray *arr1 = OSDynamicCast(OSArray, obj);
-  clang_analyzer_eval(arr1 == obj); // expected-warning{{TRUE}}
-// expected-note@-1{{TRUE}}
-// expected-note@-2{{Assuming 'arr1' is 
not equal to 'obj'}}
-// expected-warning@-3{{FALSE}}
-// expected-note@-4   {{FALSE}}
+  clang_analyzer_eval(arr1 == obj); // #check_cast_behavior_1
+  // expected-warning@#check_cast_behavior_1 {{TRUE}}
+  // expected-note@#check_cast_behavior_1{{TRUE}}
+  // expected-note@#check_cast_behavior_1{{Assuming 'arr1' is equal to 'obj'}}

steakhal wrote:

Ping @haoNoQ

https://github.com/llvm/llvm-project/pull/69057
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [analyzer] Add std::variant checker (PR #66481)

2023-11-21 Thread Balazs Benics via cfe-commits
=?utf-8?q?G=C3=A1bor?= Spaits,Gabor Spaits ,Gabor
 Spaits ,Gabor Spaits 
Message-ID:
In-Reply-To: 


steakhal wrote:

Yes, why not..

https://github.com/llvm/llvm-project/pull/66481
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang][analyzer][NFC] Add more tests of 'StreamChecker' about 'tmpfile' (PR #70540)

2023-10-31 Thread Balazs Benics via cfe-commits

https://github.com/steakhal approved this pull request.

LGTM

https://github.com/llvm/llvm-project/pull/70540
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [analyzer] Simplify SVal for simple NonLoc->Loc casts (PR #66463)

2023-10-31 Thread Balazs Benics via cfe-commits

steakhal wrote:

This is a brief summary of my recent investigation, no direct action is 
required.

I had a quick look at the issue differences between clang-17 and llvm/main as a 
preparation for the clang-18 release in early January and noticed that because 
of this patch, we have some unexpected improvements.
FYI this FP was eliminated by Z3 refutation anyway if you enabled that, but now 
even the engine can avoid exploring that dead-code.

---

Now that we simplify more often (see the patch), we can exclude more execution 
paths.
Here is an [example](https://godbolt.org/z/x35MT33xT):

```C++
int fixedFP(int x) {
  int* z = 0;
  if ((x & 1) && ((x & 1) ^ 1)) {
return *z; // Nullptr deref FP? Now it's eliminated.
  }
  return 0;
}
```

Notice, that the FP is only eliminated in C++ mode, and still present in C mode 
for some reason.
My theory is that we use `int` as a `bool` type, thus simplification can't 
exploit that a boolean expression can only be `1` or `0`; and fails to simplify 
the expression.

The reason for having a wider impact than originally anticipated was that the 
function we modified `handleLValueBitCast()`
suggested that it should be invoked only for lvalue bitcasts, however, 
actually, it gets invoked from a few more places including ExprEngineC.cpp 
ExprEngine::VisitCast:
```C++
  case CK_IntegralToBoolean:
  case CK_IntegralToFloating:
  case CK_FloatingToIntegral:
  case CK_FloatingToBoolean:
  case CK_FloatingCast:
  case CK_FloatingRealToComplex:
  case CK_FloatingComplexToReal:
  case CK_FloatingComplexToBoolean:
  case CK_FloatingComplexCast:
  case CK_FloatingComplexToIntegralComplex:
  case CK_IntegralRealToComplex:
  case CK_IntegralComplexToReal:
  case CK_IntegralComplexToBoolean:
  case CK_IntegralComplexCast:
  case CK_IntegralComplexToFloatingComplex:
  case CK_CPointerToObjCPointerCast:
  case CK_BlockPointerToObjCPointerCast:
  case CK_AnyPointerToBlockPointerCast:
  case CK_ObjCObjectLValueCast:
  case CK_ZeroToOCLOpaqueType:
  case CK_IntToOCLSampler:
  case CK_LValueBitCast:
  case CK_FloatingToFixedPoint:
  case CK_FixedPointToFloating:
  case CK_FixedPointCast:
  case CK_FixedPointToBoolean:
  case CK_FixedPointToIntegral:
  case CK_IntegralToFixedPoint: {
state =
handleLValueBitCast(state, Ex, LCtx, T, ExTy, CastE, Bldr, Pred);
continue;
  }
```

And this could have caused this "improvement".

---

The exploded node of the `PostStmt` `((x & 1) ^ 1)` was this prior the patch:
![image](https://github.com/llvm/llvm-project/assets/6280485/9fd6c577-c92d-48bc-a348-43014f7f3d41)
And now it looks like this (notice that the expression value is now simplified 
to `0 U1b`):
![image](https://github.com/llvm/llvm-project/assets/6280485/48c4cd9a-dc36-46ac-a871-67ebc5fd52f7)

Here is how a child node looks like for C now (notice that the constraint is 
`((reg_$0) & 1) ^ 1:  { [-2147483648, -1], [1, 2147483647] }`. This is 
the same state we had before this patch for C and C++ as well.
![image](https://github.com/llvm/llvm-project/assets/6280485/4335f96b-6c94-4bb0-96db-75e1f0474307)


https://github.com/llvm/llvm-project/pull/66463
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [analyzer] Fix uninitialized base class with initializer list when ctor is not declared in the base class (#70464) (PR #70792)

2023-10-31 Thread Balazs Benics via cfe-commits

https://github.com/steakhal requested changes to this pull request.

I must admit that I didn't look at the issue too much, but this patch speaks 
for itself.
Clean, to the point, and meets our conventions. Kudos.
I only have minor remarks.

And be sure to mention `Fixes #70464` in the PR/commit message to auto-close 
the issue.

https://github.com/llvm/llvm-project/pull/70792
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [analyzer] Fix uninitialized base class with initializer list when ctor is not declared in the base class (#70464) (PR #70792)

2023-10-31 Thread Balazs Benics via cfe-commits

https://github.com/steakhal edited 
https://github.com/llvm/llvm-project/pull/70792
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [analyzer] Fix uninitialized base class with initializer list when ctor is not declared in the base class (#70464) (PR #70792)

2023-10-31 Thread Balazs Benics via cfe-commits


@@ -1222,6 +1222,15 @@ void ExprEngine::ProcessInitializer(const CFGInitializer 
CFGInit,
   PostInitializer PP(BMI, FieldLoc.getAsRegion(), stackFrame);
   evalBind(Tmp, Init, Pred, FieldLoc, InitVal, /*isInit=*/true, &PP);
 }
+  } else if (BMI->isBaseInitializer() && isa(Init)) {
+// When the base class is initialized with an initialization list, there
+// will not be a CXXConstructExpr to initialize the base region. Hence, we
+// need to make the bind for it.
+StoreManager &StoreMgr = State->getStateManager().getStoreManager();
+SVal BaseLoc = StoreMgr.evalDerivedToBase(

steakhal wrote:

```suggestion
SVal BaseLoc = getStoreManager()->evalDerivedToBase(
```

https://github.com/llvm/llvm-project/pull/70792
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [analyzer] Fix uninitialized base class with initializer list when ctor is not declared in the base class (#70464) (PR #70792)

2023-10-31 Thread Balazs Benics via cfe-commits


@@ -0,0 +1,69 @@
+// Refer issue 70464 for more details.
+//
+// When the base class does not have a declared constructor, the base
+// initializer in the constructor of the derived class should use the given
+// initializer list to finish the initialization of the base class.
+//
+// Also testing base constructor and delegate constructor to make sure this
+// change will not break these two cases when a CXXConstructExpr is available.
+
+// RUN: %clang_analyze_cc1 %s -verify 
-analyzer-checker=core,debug.ExprInspection

steakhal wrote:

I'd put this at the top, as that's the usual place for these RUN lines.

https://github.com/llvm/llvm-project/pull/70792
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [analyzer] Fix uninitialized base class with initializer list when ctor is not declared in the base class (#70464) (PR #70792)

2023-10-31 Thread Balazs Benics via cfe-commits


@@ -0,0 +1,69 @@
+// Refer issue 70464 for more details.
+//
+// When the base class does not have a declared constructor, the base
+// initializer in the constructor of the derived class should use the given
+// initializer list to finish the initialization of the base class.
+//
+// Also testing base constructor and delegate constructor to make sure this
+// change will not break these two cases when a CXXConstructExpr is available.
+
+// RUN: %clang_analyze_cc1 %s -verify 
-analyzer-checker=core,debug.ExprInspection
+
+void clang_analyzer_dump(int);
+
+namespace init_list {
+
+struct foo {
+  int foox;
+};
+
+class bar : public foo {
+public:
+  bar() : foo{42} {
+// The dereference to this->foox below should be initialized properly.
+clang_analyzer_dump(this->foox); // expected-warning{{42 S32b}}

steakhal wrote:

```suggestion
clang_analyzer_dump(this->foox); // expected-warning{{42 S32b}}
clang_analyzer_dump(foox); // expected-warning{{42 S32b}}
```
Let's just have both here to be sure.

https://github.com/llvm/llvm-project/pull/70792
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [analyzer] Fix uninitialized base class with initializer list when ctor is not declared in the base class (#70464) (PR #70792)

2023-10-31 Thread Balazs Benics via cfe-commits


@@ -1222,6 +1222,15 @@ void ExprEngine::ProcessInitializer(const CFGInitializer 
CFGInit,
   PostInitializer PP(BMI, FieldLoc.getAsRegion(), stackFrame);
   evalBind(Tmp, Init, Pred, FieldLoc, InitVal, /*isInit=*/true, &PP);
 }
+  } else if (BMI->isBaseInitializer() && isa(Init)) {
+// When the base class is initialized with an initialization list, there
+// will not be a CXXConstructExpr to initialize the base region. Hence, we
+// need to make the bind for it.
+StoreManager &StoreMgr = State->getStateManager().getStoreManager();
+SVal BaseLoc = StoreMgr.evalDerivedToBase(
+thisVal, QualType(BMI->getBaseClass(), 0), BMI->isBaseVirtual());
+SVal InitVal = State->getSVal(Init, stackFrame);
+evalBind(Tmp, Init, Pred, BaseLoc, InitVal, true);

steakhal wrote:

BTW I can see that in the neighborhood, we also pass a `PostInitializer PP` as 
the last argument of `evalBind`.
Why don't you do the same here?
To me, we should deal with the same "event"/program point here as everywhere in 
this function.

Maybe hoist `PostInitializer PP(BMI, FieldLoc.getAsRegion(), stackFrame);` and 
use `PP` everywhere.

https://github.com/llvm/llvm-project/pull/70792
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [analyzer] Fix uninitialized base class with initializer list when ctor is not declared in the base class (#70464) (PR #70792)

2023-10-31 Thread Balazs Benics via cfe-commits


@@ -0,0 +1,69 @@
+// Refer issue 70464 for more details.
+//
+// When the base class does not have a declared constructor, the base
+// initializer in the constructor of the derived class should use the given
+// initializer list to finish the initialization of the base class.
+//
+// Also testing base constructor and delegate constructor to make sure this
+// change will not break these two cases when a CXXConstructExpr is available.
+
+// RUN: %clang_analyze_cc1 %s -verify 
-analyzer-checker=core,debug.ExprInspection
+
+void clang_analyzer_dump(int);
+
+namespace init_list {
+
+struct foo {
+  int foox;
+};
+
+class bar : public foo {
+public:
+  bar() : foo{42} {
+// The dereference to this->foox below should be initialized properly.
+clang_analyzer_dump(this->foox); // expected-warning{{42 S32b}}
+  }
+};
+
+void entry() { bar test; }
+
+} // namespace init_list
+
+namespace base_ctor_call {
+
+void clang_analyzer_dump(int);
+
+struct foo {
+  int foox;
+  foo(int x) : foox(x) {}
+};
+
+class bar : public foo {
+public:
+  bar() : foo{42} {
+clang_analyzer_dump(this->foox); // expected-warning{{42 S32b}}
+  }
+};
+
+void entry() { bar test; }
+
+} // namespace base_ctor_call
+
+namespace delegate_ctor_call {
+
+void clang_analyzer_dump(int);
+
+struct foo {
+  int foox;
+};
+
+struct bar : foo {

steakhal wrote:

For inheritance-related stuff, I'd advocate for using identifiers that carry 
some sort of meaning, possibly denoting their part of the relationship, such as 
`Derived` and `Base`.

https://github.com/llvm/llvm-project/pull/70792
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [analyzer] Fix uninitialized base class with initializer list when ctor is not declared in the base class (#70464) (PR #70792)

2023-10-31 Thread Balazs Benics via cfe-commits


@@ -0,0 +1,69 @@
+// Refer issue 70464 for more details.

steakhal wrote:

Feel free to put a full link if you want.

https://github.com/llvm/llvm-project/pull/70792
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [analyzer] Fix uninitialized base class with initializer list when ctor is not declared in the base class (#70464) (PR #70792)

2023-10-31 Thread Balazs Benics via cfe-commits


@@ -0,0 +1,69 @@
+// Refer issue 70464 for more details.
+//
+// When the base class does not have a declared constructor, the base
+// initializer in the constructor of the derived class should use the given
+// initializer list to finish the initialization of the base class.
+//
+// Also testing base constructor and delegate constructor to make sure this
+// change will not break these two cases when a CXXConstructExpr is available.
+
+// RUN: %clang_analyze_cc1 %s -verify 
-analyzer-checker=core,debug.ExprInspection
+
+void clang_analyzer_dump(int);
+
+namespace init_list {
+
+struct foo {
+  int foox;
+};
+
+class bar : public foo {
+public:
+  bar() : foo{42} {
+// The dereference to this->foox below should be initialized properly.
+clang_analyzer_dump(this->foox); // expected-warning{{42 S32b}}
+  }
+};
+
+void entry() { bar test; }
+
+} // namespace init_list
+
+namespace base_ctor_call {
+
+void clang_analyzer_dump(int);

steakhal wrote:

Only define these in the global namespace, please.
That's the common pattern so far in the tests.

https://github.com/llvm/llvm-project/pull/70792
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang][analyzer] Update CallDescription of 'tmpfile' & 'fopen' in StreamChecker (PR #70540)

2023-10-31 Thread Balazs Benics via cfe-commits

https://github.com/steakhal edited 
https://github.com/llvm/llvm-project/pull/70540
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang][analyzer] Update CallDescription of 'tmpfile' & 'fopen' in StreamChecker (PR #70540)

2023-10-31 Thread Balazs Benics via cfe-commits


@@ -0,0 +1,16 @@
+// RUN: %clang_analyze_cc1 -fno-builtin 
-analyzer-checker=core,alpha.unix.Stream -verify %s
+// expected-no-diagnostics
+
+typedef struct _FILE FILE;
+
+// These functions are not standard C library functions.
+FILE *tmpfile(const char *restrict path);
+FILE *fopen(const char *restrict path);

steakhal wrote:

```suggestion
// These functions are not standard C library functions.
FILE *tmpfile(const char *restrict path); // Real 'tmpfile' should have exactly 
0 formal parameters.
FILE *fopen(const char *restrict path); // Real 'fopen' should have exactly 2 
formal parameters.
```

https://github.com/llvm/llvm-project/pull/70540
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang][analyzer] Update CallDescription of 'tmpfile' & 'fopen' in StreamChecker (PR #70540)

2023-10-31 Thread Balazs Benics via cfe-commits

https://github.com/steakhal requested changes to this pull request.

Still looks good to me.
I recommended a couple comments here and there to clarify the intent of the 
test, and to raise awareness.

I'd suggest to reword the PR title (and the commit title ofc) to something like
`[analyzer] Restrict fopen and tmpfile modeling to POSIX versions in 
StreamChecker`


I'd also suggest renaming the `clang/test/Analysis/stream-wild-function.c` file 
to something more descriptive like 
`clang/test/Analysis/non-posix-stream-functions.c`

https://github.com/llvm/llvm-project/pull/70540
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang][analyzer] Update CallDescription of 'tmpfile' & 'fopen' in StreamChecker (PR #70540)

2023-10-31 Thread Balazs Benics via cfe-commits


@@ -0,0 +1,16 @@
+// RUN: %clang_analyze_cc1 -fno-builtin 
-analyzer-checker=core,alpha.unix.Stream -verify %s
+// expected-no-diagnostics
+
+typedef struct _FILE FILE;
+
+// These functions are not standard C library functions.
+FILE *tmpfile(const char *restrict path);
+FILE *fopen(const char *restrict path);
+
+void test_fopen(void) {
+  FILE *fp = fopen("file");
+}
+
+void test_tmpfile(void) {
+  FILE *fp = tmpfile("file");
+}

steakhal wrote:

```suggestion
void non_posix_fopen(void) {
  FILE *fp = fopen("file"); // no-leak: this isn't the standard POSIX fopen, we 
don't the semantics of this call.
}

void non_posix_tmpfile(void) {
  FILE *fp = tmpfile("file"); // no-leak: this isn't the standard POSIX 
tmpfile, we don't the semantics of this call.
}
```

https://github.com/llvm/llvm-project/pull/70540
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [analyzer] Improve reports from ArrayBoundCheckerV2 (PR #70056)

2023-10-31 Thread Balazs Benics via cfe-commits
=?utf-8?q?Donát?= Nagy 
Message-ID:
In-Reply-To: 



@@ -174,9 +176,119 @@ compareValueToThreshold(ProgramStateRef State, NonLoc 
Value, NonLoc Threshold,
   return {nullptr, nullptr};
 }
 
-void ArrayBoundCheckerV2::checkLocation(SVal location, bool isLoad,
-const Stmt* LoadS,
-CheckerContext &checkerContext) const {
+static std::string getRegionName(const SubRegion *Region) {
+  std::string RegName = Region->getDescriptiveName();
+  if (!RegName.empty())
+return RegName;

steakhal wrote:

```suggestion
  if (std::string RegName = Region->getDescriptiveName(); !RegName.empty())
return RegName;
```

https://github.com/llvm/llvm-project/pull/70056
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [analyzer] Improve reports from ArrayBoundCheckerV2 (PR #70056)

2023-10-31 Thread Balazs Benics via cfe-commits
=?utf-8?q?Donát?= Nagy 
Message-ID:
In-Reply-To: 


https://github.com/steakhal edited 
https://github.com/llvm/llvm-project/pull/70056
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [analyzer] Improve reports from ArrayBoundCheckerV2 (PR #70056)

2023-10-31 Thread Balazs Benics via cfe-commits
=?utf-8?q?Donát?= Nagy 
Message-ID:
In-Reply-To: 



@@ -174,9 +176,119 @@ compareValueToThreshold(ProgramStateRef State, NonLoc 
Value, NonLoc Threshold,
   return {nullptr, nullptr};
 }
 
-void ArrayBoundCheckerV2::checkLocation(SVal location, bool isLoad,
-const Stmt* LoadS,
-CheckerContext &checkerContext) const {
+static std::string getRegionName(const SubRegion *Region) {
+  std::string RegName = Region->getDescriptiveName();
+  if (!RegName.empty())
+return RegName;
+
+  // Field regions only have descriptive names when their parent has a
+  // descriptive name; so we provide a fallback representation for them:
+  if (const auto *FR = Region->getAs()) {
+StringRef Name = FR->getDecl()->getName();
+if (!Name.empty())
+  return formatv("the field '{0}'", Name);
+return "the unnamed field";
+  }
+
+  if (isa(Region))
+return "the memory returned by 'alloca'";
+
+  if (isa(Region) &&
+  isa(Region->getMemorySpace()))
+return "the heap area";
+
+  if (isa(Region))
+return "the string literal";
+
+  return "the region";
+}
+
+static std::optional getConcreteValue(NonLoc SV) {
+  if (auto ConcreteVal = SV.getAs()) {
+const llvm::APSInt &IntVal = ConcreteVal->getValue();
+return IntVal.tryExtValue();
+  }
+  return std::nullopt;
+}
+
+static const char *ShortMsgTemplates[] = {
+"Out of bound access to memory preceding {0}",
+"Out of bound access to memory after the end of {0}",
+"Potential out of bound access to {0} with tainted offset"};
+
+static std::string getShortMsg(OOB_Kind Kind, std::string RegName) {
+  return formatv(ShortMsgTemplates[Kind], RegName);

steakhal wrote:

You could have the `ShortMsgTemplates` defined within this function, closer to 
the place being used.

https://github.com/llvm/llvm-project/pull/70056
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [analyzer] Improve reports from ArrayBoundCheckerV2 (PR #70056)

2023-10-31 Thread Balazs Benics via cfe-commits
=?utf-8?q?Donát?= Nagy 
Message-ID:
In-Reply-To: 


https://github.com/steakhal requested changes to this pull request.

Looks good. I only have minor remarks.
Consider renaming the PR `Improve reports` -> `Improve messages`, or 
`diagnostics`, to highlight that the "messages" aspect is improved, not e.g. 
improving the FP/TP rate or something like that.

https://github.com/llvm/llvm-project/pull/70056
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [analyzer] Improve reports from ArrayBoundCheckerV2 (PR #70056)

2023-10-31 Thread Balazs Benics via cfe-commits
=?utf-8?q?Donát?= Nagy 
Message-ID:
In-Reply-To: 



@@ -174,9 +176,119 @@ compareValueToThreshold(ProgramStateRef State, NonLoc 
Value, NonLoc Threshold,
   return {nullptr, nullptr};
 }
 
-void ArrayBoundCheckerV2::checkLocation(SVal location, bool isLoad,
-const Stmt* LoadS,
-CheckerContext &checkerContext) const {
+static std::string getRegionName(const SubRegion *Region) {
+  std::string RegName = Region->getDescriptiveName();
+  if (!RegName.empty())
+return RegName;
+
+  // Field regions only have descriptive names when their parent has a
+  // descriptive name; so we provide a fallback representation for them:
+  if (const auto *FR = Region->getAs()) {
+StringRef Name = FR->getDecl()->getName();
+if (!Name.empty())
+  return formatv("the field '{0}'", Name);
+return "the unnamed field";

steakhal wrote:

```suggestion
if (StringRef Name = FR->getDecl()->getName(); !Name.empty())
  return formatv("the field '{0}'", Name);
return "the unnamed field";
```

https://github.com/llvm/llvm-project/pull/70056
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [analyzer] Improve reports from ArrayBoundCheckerV2 (PR #70056)

2023-10-31 Thread Balazs Benics via cfe-commits
=?utf-8?q?Donát?= Nagy 
Message-ID:
In-Reply-To: 



@@ -174,9 +176,119 @@ compareValueToThreshold(ProgramStateRef State, NonLoc 
Value, NonLoc Threshold,
   return {nullptr, nullptr};
 }
 
-void ArrayBoundCheckerV2::checkLocation(SVal location, bool isLoad,
-const Stmt* LoadS,
-CheckerContext &checkerContext) const {
+static std::string getRegionName(const SubRegion *Region) {
+  std::string RegName = Region->getDescriptiveName();
+  if (!RegName.empty())
+return RegName;
+
+  // Field regions only have descriptive names when their parent has a
+  // descriptive name; so we provide a fallback representation for them:
+  if (const auto *FR = Region->getAs()) {
+StringRef Name = FR->getDecl()->getName();
+if (!Name.empty())
+  return formatv("the field '{0}'", Name);
+return "the unnamed field";
+  }
+
+  if (isa(Region))
+return "the memory returned by 'alloca'";
+
+  if (isa(Region) &&
+  isa(Region->getMemorySpace()))
+return "the heap area";
+
+  if (isa(Region))
+return "the string literal";
+
+  return "the region";
+}
+
+static std::optional getConcreteValue(NonLoc SV) {
+  if (auto ConcreteVal = SV.getAs()) {
+const llvm::APSInt &IntVal = ConcreteVal->getValue();
+return IntVal.tryExtValue();

steakhal wrote:

```suggestion
return ConcreteVal->getValue().tryExtValue();
```

https://github.com/llvm/llvm-project/pull/70056
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [analyzer] Improve reports from ArrayBoundCheckerV2 (PR #70056)

2023-10-31 Thread Balazs Benics via cfe-commits
=?utf-8?q?Donát?= Nagy ,
=?utf-8?q?Donát?= Nagy 
Message-ID:
In-Reply-To: 



@@ -174,9 +176,119 @@ compareValueToThreshold(ProgramStateRef State, NonLoc 
Value, NonLoc Threshold,
   return {nullptr, nullptr};
 }
 
-void ArrayBoundCheckerV2::checkLocation(SVal location, bool isLoad,
-const Stmt* LoadS,
-CheckerContext &checkerContext) const {
+static std::string getRegionName(const SubRegion *Region) {
+  std::string RegName = Region->getDescriptiveName();
+  if (!RegName.empty())
+return RegName;
+
+  // Field regions only have descriptive names when their parent has a
+  // descriptive name; so we provide a fallback representation for them:
+  if (const auto *FR = Region->getAs()) {
+StringRef Name = FR->getDecl()->getName();
+if (!Name.empty())
+  return formatv("the field '{0}'", Name);
+return "the unnamed field";
+  }
+
+  if (isa(Region))
+return "the memory returned by 'alloca'";
+
+  if (isa(Region) &&
+  isa(Region->getMemorySpace()))
+return "the heap area";
+
+  if (isa(Region))
+return "the string literal";
+
+  return "the region";
+}
+
+static std::optional getConcreteValue(NonLoc SV) {
+  if (auto ConcreteVal = SV.getAs()) {
+const llvm::APSInt &IntVal = ConcreteVal->getValue();
+return IntVal.tryExtValue();
+  }
+  return std::nullopt;
+}
+
+static const char *ShortMsgTemplates[] = {
+"Out of bound access to memory preceding {0}",
+"Out of bound access to memory after the end of {0}",
+"Potential out of bound access to {0} with tainted offset"};
+
+static std::string getShortMsg(OOB_Kind Kind, std::string RegName) {
+  return formatv(ShortMsgTemplates[Kind], RegName);

steakhal wrote:

I didnt mean local statics, simple locals to global cstring constant literals 
should be fine.

https://github.com/llvm/llvm-project/pull/70056
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [analyzer] Fix assertion failure in CXXInstanceCall::getCXXThisVal (PR #70837)

2023-10-31 Thread Balazs Benics via cfe-commits

https://github.com/steakhal created 
https://github.com/llvm/llvm-project/pull/70837

Workaround the case when the `this` pointer is actually a `NonLoc`, by 
returning `Unknown` instead.
The solution isn't ideal, as `this` should be really a `Loc`, but due to how 
casts work, I feel this is our easiest and best option.

I've considered using `SVB.evalCast(ThisVal, Base->getType(), QualType())`, but 
it doesn't work as `EvalCastVisitor::VisitNonLocSymbolVal()` only evaluates 
casts that happen from NonLoc to NonLocs.

When I tried to actually implement that case, I figured:
1) Create a SymbolicRegion from that nonloc::SymbolVal; but SymbolRegion ctor 
expects a pointer type for the symbol.
2) Okay, just have a SymbolCast, getting us the pointer type; but SymbolRegion 
expects SymbolData symbols, not generic SymExprs, as stated:

> // Because pointer arithmetic is represented by ElementRegion layers,
> // the base symbol here should not contain any arithmetic.

3) We can't use ElementRegions to perform this cast because to have an 
ElementRegion, you already have to have a SubRegion that you want to cast, but 
the point is that we don't have that.

At this point, I gave up, and just returned `Unknown` xD IMO this is still 
better than having a crash.

Fixes #69922

>From a7f64815f4986fad597b9cb2d1acce2de9ac20bf Mon Sep 17 00:00:00 2001
From: Balazs Benics 
Date: Mon, 23 Oct 2023 18:10:29 +0200
Subject: [PATCH] [analyzer] Fix assertion failure in
 CXXInstanceCall::getCXXThisVal

Workaround the case when the `this` pointer is actually a `NonLoc`, by
returning `Unknown` instead.
The solution isn't ideal, as `this` should be really a `Loc`, but due to
how casts work, I feel this is our easiest and best option.

I've considered using `SVB.evalCast(ThisVal, Base->getType(), QualType())`,
but it doesn't work as `EvalCastVisitor::VisitNonLocSymbolVal()` only
evaluates casts that happen from NonLoc to NonLocs.

When I tried to actually implement that case, I figured:
1) Create a SymbolicRegion from that nonloc::SymbolVal; but SymbolRegion
   ctor expects a pointer type for the symbol.
2) Okay, just have a SymbolCast, getting us the pointer type; but
   SymbolRegion expects SymbolData symbols, not generic SymExprs, as stated:

> // Because pointer arithmetic is represented by ElementRegion layers,
> // the base symbol here should not contain any arithmetic.

3) We can't use ElementRegions to perform this cast because to have an
   ElementRegion, you already have to have a SubRegion that you want to
   cast, but the point is that we don't have that.

At this point, I gave up, and just returned `Unknown` xD
IMO this is still better than having a crash.

Fixes #69922
---
 clang/lib/StaticAnalyzer/Core/CallEvent.cpp |  5 ++---
 clang/test/Analysis/builtin_bitcast.cpp | 21 +
 2 files changed, 23 insertions(+), 3 deletions(-)

diff --git a/clang/lib/StaticAnalyzer/Core/CallEvent.cpp 
b/clang/lib/StaticAnalyzer/Core/CallEvent.cpp
index ad5bb66c4fff3c8..20bc64dc2631cec 100644
--- a/clang/lib/StaticAnalyzer/Core/CallEvent.cpp
+++ b/clang/lib/StaticAnalyzer/Core/CallEvent.cpp
@@ -715,10 +715,9 @@ void CXXInstanceCall::getExtraInvalidatedValues(
 SVal CXXInstanceCall::getCXXThisVal() const {
   const Expr *Base = getCXXThisExpr();
   // FIXME: This doesn't handle an overloaded ->* operator.
-  if (!Base)
+  SVal ThisVal = Base ? getSVal(Base) : UnknownVal();
+  if (isa(ThisVal))
 return UnknownVal();
-
-  SVal ThisVal = getSVal(Base);
   assert(ThisVal.isUnknownOrUndef() || isa(ThisVal));
   return ThisVal;
 }
diff --git a/clang/test/Analysis/builtin_bitcast.cpp 
b/clang/test/Analysis/builtin_bitcast.cpp
index 396e7caa45f6acd..13475694d287a22 100644
--- a/clang/test/Analysis/builtin_bitcast.cpp
+++ b/clang/test/Analysis/builtin_bitcast.cpp
@@ -30,3 +30,24 @@ void test(int i) {
   clang_analyzer_dump(g4);
   // expected-warning@-1 {{&i [as 64 bit integer]}}
 }
+
+struct A {
+  int n;
+  void set(int x) {
+n = x;
+  }
+};
+using ptr_size = decltype(sizeof(void *));
+void gh_69922(ptr_size p) {
+  // expected-warning-re@+1 {{(reg_${{[0-9]+}}) & 1U}}
+  clang_analyzer_dump(__builtin_bit_cast(A*, p & 1));
+
+  __builtin_bit_cast(A*, p & 1)->set(2); // no-crash
+  // However, since the `this` pointer is expected to be a Loc, but we have
+  // NonLoc there, we simply give up and resolve it as `Unknown`.
+  // Then, inside the `set()` member function call we can't evaluate the
+  // store to the member variable `n`.
+
+  clang_analyzer_dump(__builtin_bit_cast(A*, p & 1)->n); // Ideally, this 
should print "2".
+  // expected-warning-re@-1 {{(reg_${{[0-9]+}}) & 1U}}
+}

___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [analyzer] Improve diagnostics from ArrayBoundCheckerV2 (PR #70056)

2023-10-31 Thread Balazs Benics via cfe-commits
=?utf-8?q?Donát?= Nagy ,
=?utf-8?q?Donát?= Nagy 
Message-ID:
In-Reply-To: 


https://github.com/steakhal edited 
https://github.com/llvm/llvm-project/pull/70056
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [analyzer] Improve diagnostics from ArrayBoundCheckerV2 (PR #70056)

2023-10-31 Thread Balazs Benics via cfe-commits
=?utf-8?q?Donát?= Nagy ,
=?utf-8?q?Donát?= Nagy 
Message-ID:
In-Reply-To: 



@@ -217,80 +326,71 @@ void ArrayBoundCheckerV2::checkLocation(SVal location, 
bool isLoad,
 // MallocChecker that call SValBuilder::getConjuredHeapSymbolVal()) and
 // non-symbolic regions (e.g. a field subregion of a symbolic region) in
 // unknown space.
-auto [state_precedesLowerBound, state_withinLowerBound] =
-compareValueToThreshold(state, ByteOffset,
-svalBuilder.makeZeroArrayIndex(), svalBuilder);
+auto [PrecedesLowerBound, WithinLowerBound] = compareValueToThreshold(
+State, ByteOffset, SVB.makeZeroArrayIndex(), SVB);
 
-if (state_precedesLowerBound && !state_withinLowerBound) {
+if (PrecedesLowerBound && !WithinLowerBound) {
   // We know that the index definitely precedes the lower bound.
-  reportOOB(checkerContext, state_precedesLowerBound, OOB_Precedes);
+  std::string RegName = getRegionName(Reg);
+  std::string Msg = getPrecedesMsg(RegName, ByteOffset);
+  reportOOB(C, PrecedesLowerBound, OOB_Precedes, ByteOffset, RegName, Msg);

steakhal wrote:

```suggestion
  reportOOB(C, PrecedesLowerBound, OOB_Precedes, ByteOffset, 
std::move(RegName), std::move(Msg));
```

https://github.com/llvm/llvm-project/pull/70056
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [analyzer] Improve diagnostics from ArrayBoundCheckerV2 (PR #70056)

2023-10-31 Thread Balazs Benics via cfe-commits
=?utf-8?q?Donát?= Nagy ,
=?utf-8?q?Donát?= Nagy 
Message-ID:
In-Reply-To: 


https://github.com/steakhal approved this pull request.

I only had a couple `std::move`s missing, an FYI comment, and one question 
about the diagnostics in the tests.

Even in the current state, I think it's a good baby step in improving the 
status quo. 
Thank you for pushing this forward!
---

One remark about the review workflow, I'd prefer if the conversation starter 
would resolve the conversations. Let me explain why:

Given the amount of reviewers for CSA, I'd suggest explicitly supporting 
reviewer experience.
Speaking of that, personally I find the following workflow to suite myself the 
best:
 - All comments needs a reaction, either by explict commenting on it or by 
putting there a thums-up emoji or something. This helps the author follow which 
comments were addressed downstream and is pending for submission. An explicit 
comment is expected for challenging a comment. I prefer emojies, becase they 
don't send an email to everyone subscribe - unlike for individual comments; and 
usually I also batch individual commits into a "self-review" as patch author 
when I reply for the same reason.
 - All comments needs to be reacted before requestion the next round of review 
from the person who added those comments.
 - The comment should be only marked resolved by the person who raised the 
concern. This ensures that the comment was adequately solved, thus the issue is 
no longer relevant.
 - To merge a PR, the PR should have no open conversations.

Note that I'm not raising these because I feel we have to adjust, but rather 
because I find the reviews experience on GitHub really poor in general.
Comments disappear, comments don't show up for the patch author, only if they 
look at the right pane like "Conversations" and "Files changed" - yes, not all 
comments are present on each -.- And I've been bitten by it as a patch author 
many many times, and expectations around "reacting on comments" helped to 
mitigate this pain to some degree.

Note that this is only my personal preference, and if I'm not mistaken, there 
is no legitimized consensus around the project. At least, last time I checked 
the following relevant thread around GitHub review workflows:
https://discourse.llvm.org/t/rfc-github-pr-resolve-conversation-button/73178/



https://github.com/llvm/llvm-project/pull/70056
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [analyzer] Improve diagnostics from ArrayBoundCheckerV2 (PR #70056)

2023-10-31 Thread Balazs Benics via cfe-commits
=?utf-8?q?Donát?= Nagy ,
=?utf-8?q?Donát?= Nagy 
Message-ID:
In-Reply-To: 



@@ -174,9 +176,116 @@ compareValueToThreshold(ProgramStateRef State, NonLoc 
Value, NonLoc Threshold,
   return {nullptr, nullptr};
 }
 
-void ArrayBoundCheckerV2::checkLocation(SVal location, bool isLoad,
-const Stmt* LoadS,
-CheckerContext &checkerContext) const {
+static std::string getRegionName(const SubRegion *Region) {
+  if (std::string RegName = Region->getDescriptiveName(); !RegName.empty())
+return RegName;
+
+  // Field regions only have descriptive names when their parent has a
+  // descriptive name; so we provide a fallback representation for them:
+  if (const auto *FR = Region->getAs()) {
+if (StringRef Name = FR->getDecl()->getName(); !Name.empty())
+  return formatv("the field '{0}'", Name);
+return "the unnamed field";
+  }
+
+  if (isa(Region))
+return "the memory returned by 'alloca'";
+
+  if (isa(Region) &&
+  isa(Region->getMemorySpace()))
+return "the heap area";
+
+  if (isa(Region))
+return "the string literal";
+
+  return "the region";
+}
+
+static std::optional getConcreteValue(NonLoc SV) {
+  if (auto ConcreteVal = SV.getAs()) {
+return ConcreteVal->getValue().tryExtValue();
+  }
+  return std::nullopt;
+}
+
+static std::string getShortMsg(OOB_Kind Kind, std::string RegName) {
+  static const char *ShortMsgTemplates[] = {
+  "Out of bound access to memory preceding {0}",
+  "Out of bound access to memory after the end of {0}",
+  "Potential out of bound access to {0} with tainted offset"};
+
+  return formatv(ShortMsgTemplates[Kind], RegName);
+}
+
+static std::string getPrecedesMsg(std::string RegName, NonLoc Offset) {
+  SmallString<128> Buf;
+  llvm::raw_svector_ostream Out(Buf);

steakhal wrote:

Have you considered using `std::string` and `llvm::raw_string_ostream`, given 
that you convert it to `std::string` anyways. BTW sorry for not spotting this 
at the previous round.
This applies to other similar cases as well.

https://github.com/llvm/llvm-project/pull/70056
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [analyzer] Improve diagnostics from ArrayBoundCheckerV2 (PR #70056)

2023-10-31 Thread Balazs Benics via cfe-commits
=?utf-8?q?Donát?= Nagy ,
=?utf-8?q?Donát?= Nagy 
Message-ID:
In-Reply-To: 



@@ -217,80 +326,71 @@ void ArrayBoundCheckerV2::checkLocation(SVal location, 
bool isLoad,
 // MallocChecker that call SValBuilder::getConjuredHeapSymbolVal()) and
 // non-symbolic regions (e.g. a field subregion of a symbolic region) in
 // unknown space.
-auto [state_precedesLowerBound, state_withinLowerBound] =
-compareValueToThreshold(state, ByteOffset,
-svalBuilder.makeZeroArrayIndex(), svalBuilder);
+auto [PrecedesLowerBound, WithinLowerBound] = compareValueToThreshold(
+State, ByteOffset, SVB.makeZeroArrayIndex(), SVB);
 
-if (state_precedesLowerBound && !state_withinLowerBound) {
+if (PrecedesLowerBound && !WithinLowerBound) {
   // We know that the index definitely precedes the lower bound.
-  reportOOB(checkerContext, state_precedesLowerBound, OOB_Precedes);
+  std::string RegName = getRegionName(Reg);
+  std::string Msg = getPrecedesMsg(RegName, ByteOffset);
+  reportOOB(C, PrecedesLowerBound, OOB_Precedes, ByteOffset, RegName, Msg);
   return;
 }
 
-if (state_withinLowerBound)
-  state = state_withinLowerBound;
+if (WithinLowerBound)
+  State = WithinLowerBound;
   }
 
   // CHECK UPPER BOUND
-  DefinedOrUnknownSVal Size = getDynamicExtent(state, Reg, svalBuilder);
+  DefinedOrUnknownSVal Size = getDynamicExtent(State, Reg, SVB);
   if (auto KnownSize = Size.getAs()) {
-auto [state_withinUpperBound, state_exceedsUpperBound] =
-compareValueToThreshold(state, ByteOffset, *KnownSize, svalBuilder);
+auto [WithinUpperBound, ExceedsUpperBound] =
+compareValueToThreshold(State, ByteOffset, *KnownSize, SVB);
 
-if (state_exceedsUpperBound) {
-  if (!state_withinUpperBound) {
+if (ExceedsUpperBound) {
+  if (!WithinUpperBound) {
 // We know that the index definitely exceeds the upper bound.
-reportOOB(checkerContext, state_exceedsUpperBound, OOB_Exceeds);
+std::string RegName = getRegionName(Reg);
+std::string Msg = getExceedsMsg(C.getASTContext(), RegName, ByteOffset,
+*KnownSize, Location);
+reportOOB(C, ExceedsUpperBound, OOB_Exceeds, ByteOffset, RegName, Msg);
 return;
   }
-  if (isTainted(state, ByteOffset)) {
+  if (isTainted(State, ByteOffset)) {
 // Both cases are possible, but the index is tainted, so report.
-reportOOB(checkerContext, state_exceedsUpperBound, OOB_Taint,
-  ByteOffset);
+std::string RegName = getRegionName(Reg);
+std::string Msg = getTaintMsg(RegName);
+reportOOB(C, ExceedsUpperBound, OOB_Taint, ByteOffset, RegName, Msg);

steakhal wrote:

```suggestion
reportOOB(C, ExceedsUpperBound, OOB_Taint, ByteOffset, 
std::move(RegName), std::move(Msg));
```

https://github.com/llvm/llvm-project/pull/70056
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [analyzer] Improve diagnostics from ArrayBoundCheckerV2 (PR #70056)

2023-10-31 Thread Balazs Benics via cfe-commits
=?utf-8?q?Donát?= Nagy ,
=?utf-8?q?Donát?= Nagy 
Message-ID:
In-Reply-To: 



@@ -217,80 +326,71 @@ void ArrayBoundCheckerV2::checkLocation(SVal location, 
bool isLoad,
 // MallocChecker that call SValBuilder::getConjuredHeapSymbolVal()) and
 // non-symbolic regions (e.g. a field subregion of a symbolic region) in
 // unknown space.
-auto [state_precedesLowerBound, state_withinLowerBound] =
-compareValueToThreshold(state, ByteOffset,
-svalBuilder.makeZeroArrayIndex(), svalBuilder);
+auto [PrecedesLowerBound, WithinLowerBound] = compareValueToThreshold(
+State, ByteOffset, SVB.makeZeroArrayIndex(), SVB);
 
-if (state_precedesLowerBound && !state_withinLowerBound) {
+if (PrecedesLowerBound && !WithinLowerBound) {
   // We know that the index definitely precedes the lower bound.
-  reportOOB(checkerContext, state_precedesLowerBound, OOB_Precedes);
+  std::string RegName = getRegionName(Reg);
+  std::string Msg = getPrecedesMsg(RegName, ByteOffset);
+  reportOOB(C, PrecedesLowerBound, OOB_Precedes, ByteOffset, RegName, Msg);
   return;
 }
 
-if (state_withinLowerBound)
-  state = state_withinLowerBound;
+if (WithinLowerBound)
+  State = WithinLowerBound;
   }
 
   // CHECK UPPER BOUND
-  DefinedOrUnknownSVal Size = getDynamicExtent(state, Reg, svalBuilder);
+  DefinedOrUnknownSVal Size = getDynamicExtent(State, Reg, SVB);
   if (auto KnownSize = Size.getAs()) {
-auto [state_withinUpperBound, state_exceedsUpperBound] =
-compareValueToThreshold(state, ByteOffset, *KnownSize, svalBuilder);
+auto [WithinUpperBound, ExceedsUpperBound] =
+compareValueToThreshold(State, ByteOffset, *KnownSize, SVB);
 
-if (state_exceedsUpperBound) {
-  if (!state_withinUpperBound) {
+if (ExceedsUpperBound) {
+  if (!WithinUpperBound) {
 // We know that the index definitely exceeds the upper bound.
-reportOOB(checkerContext, state_exceedsUpperBound, OOB_Exceeds);
+std::string RegName = getRegionName(Reg);
+std::string Msg = getExceedsMsg(C.getASTContext(), RegName, ByteOffset,
+*KnownSize, Location);
+reportOOB(C, ExceedsUpperBound, OOB_Exceeds, ByteOffset, RegName, Msg);

steakhal wrote:

```suggestion
reportOOB(C, ExceedsUpperBound, OOB_Exceeds, ByteOffset, 
std::move(RegName), std::move(Msg));
```

https://github.com/llvm/llvm-project/pull/70056
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [analyzer] Improve diagnostics from ArrayBoundCheckerV2 (PR #70056)

2023-10-31 Thread Balazs Benics via cfe-commits
=?utf-8?q?Donát?= Nagy ,
=?utf-8?q?Donát?= Nagy 
Message-ID:
In-Reply-To: 



@@ -0,0 +1,149 @@
+// RUN: %clang_analyze_cc1 -Wno-array-bounds -analyzer-output=text\
+// RUN: 
-analyzer-checker=core,alpha.security.ArrayBoundV2,unix.Malloc,alpha.security.taint
 -verify %s
+
+int array[10];
+
+void arrayUnderflow(void) {
+  array[-3] = 5;
+  // expected-warning@-1 {{Out of bound access to memory preceding 'array'}}
+  // expected-note@-2 {{Access of 'array' at negative byte offset -12}}
+}
+
+int scanf(const char *restrict fmt, ...);
+
+void taintedIndex(void) {
+  int index;
+  scanf("%d", &index);
+  // expected-note@-1 {{Taint originated here}}
+  // expected-note@-2 {{Taint propagated to the 2nd argument}}
+  array[index] = 5;
+  // expected-warning@-1 {{Potential out of bound access to 'array' with 
tainted offset}}
+  // expected-note@-2 {{Access of 'array' with a tainted offset that may be 
too large}}
+}
+
+void arrayOverflow(void) {
+  array[12] = 5;
+  // expected-warning@-1 {{Out of bound access to memory after the end of 
'array'}}
+  // expected-note@-2 {{Access of 'array' at index 12, while it holds only 10 
'int' elements}}
+}
+
+int scalar;
+int scalarOverflow(void) {
+  return (&scalar)[1];
+  // expected-warning@-1 {{Out of bound access to memory after the end of 
'scalar'}}
+  // expected-note@-2 {{Access of 'scalar' at index 1, while it holds only a 
single 'int' element}}
+}
+
+int oneElementArray[1];
+int oneElementArrayOverflow(void) {
+  return oneElementArray[1];
+  // expected-warning@-1 {{Out of bound access to memory after the end of 
'oneElementArray'}}
+  // expected-note@-2 {{Access of 'oneElementArray' at index 1, while it holds 
only a single 'int' element}}
+}
+
+short convertedArray(void) {
+  return ((short*)array)[47];
+  // expected-warning@-1 {{Out of bound access to memory after the end of 
'array'}}
+  // expected-note@-2 {{Access of 'array' at index 47, while it holds only 20 
'short' elements}}
+}
+
+struct vec {
+  int len;
+  double elems[64];
+} v;
+
+double arrayInStruct(void) {
+  return v.elems[64];
+  // expected-warning@-1 {{Out of bound access to memory after the end of 
'v.elems'}}
+  // expected-note@-2 {{Access of 'v.elems' at index 64, while it holds only 
64 'double' elements}}
+}
+
+double arrayInStructPtr(struct vec *pv) {
+  return pv->elems[64];
+  // expected-warning@-1 {{Out of bound access to memory after the end of the 
field 'elems'}}
+  // expected-note@-2 {{Access of the field 'elems' at index 64, while it 
holds only 64 'double' elements}}
+}
+
+struct two_bytes {
+  char lo, hi;
+};
+
+struct two_bytes convertedArray2(void) {
+  // We report this with byte offsets because the offset is not divisible by 
the element size.
+  struct two_bytes a = {0, 0};
+  char *p = (char*)&a;
+  return *((struct two_bytes*)(p + 7));
+  // expected-warning@-1 {{Out of bound access to memory after the end of 'a'}}
+  // expected-note@-2 {{Access of 'a' at byte offset 7, while it holds only 2 
bytes}}
+}
+
+int intFromString(void) {
+  // We report this with byte offsets because the extent is not divisible by 
the element size.
+  return ((const int*)"this is a string of 33 characters")[20];
+  // expected-warning@-1 {{Out of bound access to memory after the end of the 
string literal}}
+  // expected-note@-2 {{Access of the string literal at byte offset 80, while 
it holds only 34 bytes}}
+}
+
+int intFromStringDivisible(void) {
+  // However, this is reported with indices/elements, because the extent 
happens to be a multiple of 4.
+  return ((const int*)"abc")[20];
+  // expected-warning@-1 {{Out of bound access to memory after the end of the 
string literal}}
+  // expected-note@-2 {{Access of the string literal at index 20, while it 
holds only a single 'int' element}}
+}

steakhal wrote:

I couldn't wrap my head around the end of the note: `while it holds only a 
single 'int' element`
Could you elaborate on why is `int` mentioned here, and if so, why only a 
`single`?

https://github.com/llvm/llvm-project/pull/70056
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [analyzer] Improve diagnostics from ArrayBoundCheckerV2 (PR #70056)

2023-10-31 Thread Balazs Benics via cfe-commits
=?utf-8?q?Donát?= Nagy ,
=?utf-8?q?Donát?= Nagy 
Message-ID:
In-Reply-To: 



@@ -22,23 +22,25 @@
 #include "clang/StaticAnalyzer/Core/PathSensitive/DynamicExtent.h"
 #include "clang/StaticAnalyzer/Core/PathSensitive/ExprEngine.h"
 #include "llvm/ADT/SmallString.h"
+#include "llvm/Support/FormatVariadic.h"
 #include "llvm/Support/raw_ostream.h"
 #include 
 
 using namespace clang;
 using namespace ento;
 using namespace taint;
+using llvm::formatv;
 
 namespace {
+enum OOB_Kind { OOB_Precedes, OOB_Exceeds, OOB_Taint };

steakhal wrote:

FYI you can use any identifiers here because these are only exposed to this 
single TU.
The ugly prefixes are only applied to enums exposed in headers etc. where there 
is an unbounded chance of having collisions.
No action is expected.

https://github.com/llvm/llvm-project/pull/70056
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [analyzer] Improve diagnostics from ArrayBoundCheckerV2 (PR #70056)

2023-10-31 Thread Balazs Benics via cfe-commits
=?utf-8?q?Donát?= Nagy ,
=?utf-8?q?Donát?= Nagy 
Message-ID:
In-Reply-To: 


https://github.com/steakhal edited 
https://github.com/llvm/llvm-project/pull/70056
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [analyzer] Improve diagnostics from ArrayBoundCheckerV2 (PR #70056)

2023-10-31 Thread Balazs Benics via cfe-commits
=?utf-8?q?Donát?= Nagy ,
=?utf-8?q?Donát?= Nagy 
Message-ID:
In-Reply-To: 


https://github.com/steakhal edited 
https://github.com/llvm/llvm-project/pull/70056
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [analyzer] Fix uninitialized base class with initializer list when ctor is not declared in the base class (#70464) (PR #70792)

2023-10-31 Thread Balazs Benics via cfe-commits


@@ -1222,6 +1222,15 @@ void ExprEngine::ProcessInitializer(const CFGInitializer 
CFGInit,
   PostInitializer PP(BMI, FieldLoc.getAsRegion(), stackFrame);
   evalBind(Tmp, Init, Pred, FieldLoc, InitVal, /*isInit=*/true, &PP);
 }
+  } else if (BMI->isBaseInitializer() && isa(Init)) {
+// When the base class is initialized with an initialization list, there
+// will not be a CXXConstructExpr to initialize the base region. Hence, we
+// need to make the bind for it.
+StoreManager &StoreMgr = State->getStateManager().getStoreManager();
+SVal BaseLoc = StoreMgr.evalDerivedToBase(
+thisVal, QualType(BMI->getBaseClass(), 0), BMI->isBaseVirtual());
+SVal InitVal = State->getSVal(Init, stackFrame);
+evalBind(Tmp, Init, Pred, BaseLoc, InitVal, true);

steakhal wrote:

Well, me neither but I think it should be fine with the default.

https://github.com/llvm/llvm-project/pull/70792
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [analyzer] Fix uninitialized base class with initializer list when ctor is not declared in the base class (#70464) (PR #70792)

2023-10-31 Thread Balazs Benics via cfe-commits

https://github.com/steakhal approved this pull request.

Looks great!

FYI when submitting patches to GH try not to force-push to help the UI for 
following lines having comments. Otherwise, they will be marked as "outdated" 
and become hard to dig up and relate to new line locations. This was easy this 
time, as the size of the PR was minimal (as it should be).

After the PR is approved and all conversations resolved; the author can either 
manually force-push to squash the commits or use the "squash-merge" to let GH 
do it.

https://github.com/llvm/llvm-project/pull/70792
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [analyzer] Fix assertion failure in CXXInstanceCall::getCXXThisVal (PR #70837)

2023-10-31 Thread Balazs Benics via cfe-commits

https://github.com/steakhal updated 
https://github.com/llvm/llvm-project/pull/70837

>From a7f64815f4986fad597b9cb2d1acce2de9ac20bf Mon Sep 17 00:00:00 2001
From: Balazs Benics 
Date: Mon, 23 Oct 2023 18:10:29 +0200
Subject: [PATCH 1/2] [analyzer] Fix assertion failure in
 CXXInstanceCall::getCXXThisVal

Workaround the case when the `this` pointer is actually a `NonLoc`, by
returning `Unknown` instead.
The solution isn't ideal, as `this` should be really a `Loc`, but due to
how casts work, I feel this is our easiest and best option.

I've considered using `SVB.evalCast(ThisVal, Base->getType(), QualType())`,
but it doesn't work as `EvalCastVisitor::VisitNonLocSymbolVal()` only
evaluates casts that happen from NonLoc to NonLocs.

When I tried to actually implement that case, I figured:
1) Create a SymbolicRegion from that nonloc::SymbolVal; but SymbolRegion
   ctor expects a pointer type for the symbol.
2) Okay, just have a SymbolCast, getting us the pointer type; but
   SymbolRegion expects SymbolData symbols, not generic SymExprs, as stated:

> // Because pointer arithmetic is represented by ElementRegion layers,
> // the base symbol here should not contain any arithmetic.

3) We can't use ElementRegions to perform this cast because to have an
   ElementRegion, you already have to have a SubRegion that you want to
   cast, but the point is that we don't have that.

At this point, I gave up, and just returned `Unknown` xD
IMO this is still better than having a crash.

Fixes #69922
---
 clang/lib/StaticAnalyzer/Core/CallEvent.cpp |  5 ++---
 clang/test/Analysis/builtin_bitcast.cpp | 21 +
 2 files changed, 23 insertions(+), 3 deletions(-)

diff --git a/clang/lib/StaticAnalyzer/Core/CallEvent.cpp 
b/clang/lib/StaticAnalyzer/Core/CallEvent.cpp
index ad5bb66c4fff3c8..20bc64dc2631cec 100644
--- a/clang/lib/StaticAnalyzer/Core/CallEvent.cpp
+++ b/clang/lib/StaticAnalyzer/Core/CallEvent.cpp
@@ -715,10 +715,9 @@ void CXXInstanceCall::getExtraInvalidatedValues(
 SVal CXXInstanceCall::getCXXThisVal() const {
   const Expr *Base = getCXXThisExpr();
   // FIXME: This doesn't handle an overloaded ->* operator.
-  if (!Base)
+  SVal ThisVal = Base ? getSVal(Base) : UnknownVal();
+  if (isa(ThisVal))
 return UnknownVal();
-
-  SVal ThisVal = getSVal(Base);
   assert(ThisVal.isUnknownOrUndef() || isa(ThisVal));
   return ThisVal;
 }
diff --git a/clang/test/Analysis/builtin_bitcast.cpp 
b/clang/test/Analysis/builtin_bitcast.cpp
index 396e7caa45f6acd..13475694d287a22 100644
--- a/clang/test/Analysis/builtin_bitcast.cpp
+++ b/clang/test/Analysis/builtin_bitcast.cpp
@@ -30,3 +30,24 @@ void test(int i) {
   clang_analyzer_dump(g4);
   // expected-warning@-1 {{&i [as 64 bit integer]}}
 }
+
+struct A {
+  int n;
+  void set(int x) {
+n = x;
+  }
+};
+using ptr_size = decltype(sizeof(void *));
+void gh_69922(ptr_size p) {
+  // expected-warning-re@+1 {{(reg_${{[0-9]+}}) & 1U}}
+  clang_analyzer_dump(__builtin_bit_cast(A*, p & 1));
+
+  __builtin_bit_cast(A*, p & 1)->set(2); // no-crash
+  // However, since the `this` pointer is expected to be a Loc, but we have
+  // NonLoc there, we simply give up and resolve it as `Unknown`.
+  // Then, inside the `set()` member function call we can't evaluate the
+  // store to the member variable `n`.
+
+  clang_analyzer_dump(__builtin_bit_cast(A*, p & 1)->n); // Ideally, this 
should print "2".
+  // expected-warning-re@-1 {{(reg_${{[0-9]+}}) & 1U}}
+}

>From 27d2b84e14d73dcc2385f202a204993f25235de4 Mon Sep 17 00:00:00 2001
From: Balazs Benics 
Date: Tue, 31 Oct 2023 19:20:16 +0100
Subject: [PATCH 2/2] Add fixme and add the ineffective evalCast

---
 clang/lib/StaticAnalyzer/Core/CallEvent.cpp   | 9 +++--
 clang/lib/StaticAnalyzer/Core/SValBuilder.cpp | 5 +
 2 files changed, 12 insertions(+), 2 deletions(-)

diff --git a/clang/lib/StaticAnalyzer/Core/CallEvent.cpp 
b/clang/lib/StaticAnalyzer/Core/CallEvent.cpp
index 20bc64dc2631cec..76fb7481f65194b 100644
--- a/clang/lib/StaticAnalyzer/Core/CallEvent.cpp
+++ b/clang/lib/StaticAnalyzer/Core/CallEvent.cpp
@@ -716,8 +716,13 @@ SVal CXXInstanceCall::getCXXThisVal() const {
   const Expr *Base = getCXXThisExpr();
   // FIXME: This doesn't handle an overloaded ->* operator.
   SVal ThisVal = Base ? getSVal(Base) : UnknownVal();
-  if (isa(ThisVal))
-return UnknownVal();
+
+  if (isa(ThisVal)) {
+SValBuilder &SVB = getState()->getStateManager().getSValBuilder();
+QualType OriginalTy = ThisVal.getType(SVB.getContext());
+return SVB.evalCast(ThisVal, Base->getType(), OriginalTy);
+  }
+
   assert(ThisVal.isUnknownOrUndef() || isa(ThisVal));
   return ThisVal;
 }
diff --git a/clang/lib/StaticAnalyzer/Core/SValBuilder.cpp 
b/clang/lib/StaticAnalyzer/Core/SValBuilder.cpp
index d89d82626f72694..9375f39b2d71dd4 100644
--- a/clang/lib/StaticAnalyzer/Core/SValBuilder.cpp
+++ b/clang/lib/StaticAnalyzer/Core/SValBuilder.cpp
@@ -980,6 +980,11 @@ class EvalCastVisitor : public 
SValVisitor {
   re

[clang] [analyzer] Fix assertion failure in CXXInstanceCall::getCXXThisVal (PR #70837)

2023-10-31 Thread Balazs Benics via cfe-commits

steakhal wrote:

> Hmm, I wonder if we should leave a FIXME comment, but it looks good to me.

I was thinking where to put the FIXME, and as I explored that should be within 
the CastVisitor. After that, I argued, that then I should still have the 
(ineffective) `SVB.evalCast()` to actually exercise that path.
It should be a fancy way of returning `Unknown` now, but arguably for the 
better reason xD

@Xazax-hun WDYT?

https://github.com/llvm/llvm-project/pull/70837
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [analyzer] Fix assertion failure in CXXInstanceCall::getCXXThisVal (PR #70837)

2023-10-31 Thread Balazs Benics via cfe-commits

https://github.com/steakhal edited 
https://github.com/llvm/llvm-project/pull/70837
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [analyzer] Fix assertion failure in `CXXInstanceCall::getCXXThisVal` (PR #70837)

2023-10-31 Thread Balazs Benics via cfe-commits

https://github.com/steakhal edited 
https://github.com/llvm/llvm-project/pull/70837
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [analyzer] Fix assertion failure in `CXXInstanceCall::getCXXThisVal` (PR #70837)

2023-10-31 Thread Balazs Benics via cfe-commits

steakhal wrote:

> > WDYT?
> 
> I like this! I hope we do not add too much redundant work, but at least we 
> make it clear what is the plan to fix this in the future.

Please approve the PR again, so that I could merge this after I give some time 
for others to look at this.

https://github.com/llvm/llvm-project/pull/70837
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang][analyzer] Restrict 'fopen' & 'tmpfile' modeling to POSIX versions in StreamChecker (PR #70540)

2023-10-31 Thread Balazs Benics via cfe-commits

https://github.com/steakhal approved this pull request.


https://github.com/llvm/llvm-project/pull/70540
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang][analyzer]][NFC] Simplify method 'ensureStreamNonNull' of StreamChecker (PR #70927)

2023-11-01 Thread Balazs Benics via cfe-commits

https://github.com/steakhal approved this pull request.

You are right.
I'm not sure it improves the code too much, and I wonder if you have further 
ideas refactoring the checker; if so we could probably bundle up similar 
changes into this one.

https://github.com/llvm/llvm-project/pull/70927
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang][analyzer] Restrict 'fopen' & 'tmpfile' modeling to POSIX versions in StreamChecker (PR #70540)

2023-11-02 Thread Balazs Benics via cfe-commits


@@ -0,0 +1,16 @@
+// RUN: %clang_analyze_cc1 -fno-builtin 
-analyzer-checker=core,alpha.unix.Stream -verify %s
+// expected-no-diagnostics
+
+typedef struct _FILE FILE;
+
+// These functions are not standard C library functions.
+FILE *tmpfile(const char *restrict path); // Real 'tmpfile' should have 
exactly 0 formal parameters.
+FILE *fopen(const char *restrict path);   // Real 'fopen' should have exactly 
2 formal parameters.
+
+void test_fopen_non_posix(void) {
+  FILE *fp = fopen("file"); // no-leak: this isn't the standard POSIX fopen, 
we don't the semantics of this call.
+}
+
+void test_tmpfile_non_posix(void) {
+  FILE *fp = tmpfile("file"); // no-leak: this isn't the standard POSIX 
tmpfile, we don't the semantics of this call.

steakhal wrote:

I think the verb was missing. Probably the "we don't [know] the semantics..."

https://github.com/llvm/llvm-project/pull/70540
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [analyzer][NFC] Rework SVal kind representation (PR #71039)

2023-11-02 Thread Balazs Benics via cfe-commits

https://github.com/steakhal created 
https://github.com/llvm/llvm-project/pull/71039

The goal of this patch is to refine how the `SVal` base and sub-kinds are 
represented by forming one unified enum describing the possible SVals. This 
means that the `unsigned SVal::Kind` and the attached bit-packing semantics 
would be replaced by a single unified enum. This is more conventional and leads 
to a better debugging experience by default. This eases the need of using debug 
pretty-printers, or the use of runtime functions doing the printing for us like 
we do today by calling `Val.dump()` whenever we inspect the values.

Previously, the first 2 bits of the `unsigned SVal::Kind` discriminated the 
following quartet: `UndefinedVal`, `UnknownVal`, `Loc`, or `NonLoc`. The rest 
of the upper bits represented the sub-kind, where the value represented the 
index among only the `Loc`s or `NonLoc`s, effectively attaching 2 meanings of 
the upper bits depending on the base-kind. We don't need to pack these bits, as 
we have plenty even if we would use just a plan-old `unsigned char`.

Consequently, in this patch, I propose to lay out all the (non-abstract) `SVal` 
kinds into a single enum, along with some metadata (`BEGIN_Loc`, `END_Loc`, 
`BEGIN_NonLoc`, `END_NonLoc`) artificial enum values, similar how we do with 
the `MemRegions`.

Note that in the unified `SVal::Kind` enum, to differentiate 
`nonloc::ConcreteInt` from `loc::ConcreteInt`, I had to prefix them with `Loc` 
and `NonLoc` to resolve this ambiguity.
This should not surface in general, because I'm replacing the `nonloc::Kind` 
enum items with `inline constexpr` global constants to mimic the original 
behavior - and offer nicer spelling to these enum values.

Some `SVal` constructors were not marked explicit, which I now mark as such to 
follow best practices, and marked others as `/*implicit*/` to clarify the 
intent.
During refactoring, I also found at least one function not marked 
`LLVM_ATTRIBUTE_RETURNS_NONNULL`, so I did that.

The `TypeRetrievingVisitor` visitor had some accidental dead code, namely: 
`VisitNonLocConcreteInt` and `VisitLocConcreteInt`.

Previously, the `SValVisitor` expected visit handlers of 
`VisitNonLocX(nonloc::X)` and `VisitLocX(loc::X)`, where I felt 
that envoding `NonLoc` and `Loc` in the name is not necessary as the type of 
the parameter would select the right overload anyways, so I simplified the 
naming of those visit functions.

The rest of the diff is a lot of times just formatting, because `getKind()` by 
nature, frequently appears in switches, which means that the whole switch gets 
automatically reformatted. I could probably undo the formatting, but I didn't 
want to deviate from the rule unless explicitly requested.

>From 3bc43ab005aa76a43644d4d93286215b490cc8fa Mon Sep 17 00:00:00 2001
From: Balazs Benics 
Date: Thu, 2 Nov 2023 10:21:03 +0100
Subject: [PATCH] [analyzer][NFC] Rework SVal kind representation

The goal of this patch is to refine how the `SVal` base and sub-kinds
are represented by forming one unified enum describing the possible SVals.
This means that the `unsigned SVal::Kind` and the attached bit-packing
semantics would be replaced by a single unified enum.
This is more conventional and leads to a better debugging experience by default.
This eases the need of using debug pretty-printers, or the use of
runtime functions doing the printing for us like we do today by calling
`Val.dump()` whenever we inspect the values.

Previously, the first 2 bits of the `unsigned SVal::Kind` discriminated
the following quartet: `UndefinedVal`, `UnknownVal`, `Loc`, or `NonLoc`.
The rest of the upper bits represented the sub-kind, where the value
represented the index among only the `Loc`s or `NonLoc`s, effectively
attaching 2 meanings of the upper bits depending on the base-kind.
We don't need to pack these bits, as we have plenty even if we would use
just a plan-old `unsigned char`.

Consequently, in this patch, I propose to lay out all the (non-abstract)
`SVal` kinds into a single enum, along with some metadata (`BEGIN_Loc`,
`END_Loc`, `BEGIN_NonLoc`, `END_NonLoc`) artificial enum values, similar
how we do with the `MemRegions`.

Note that in the unified `SVal::Kind` enum, to differentiate
`nonloc::ConcreteInt` from `loc::ConcreteInt`, I had to prefix them with
`Loc` and `NonLoc` to resolve this ambiguity.
This should not surface in general, because I'm replacing the
`nonloc::Kind` enum items with `inline constexpr` global constants to
mimic the original behavior - and offer nicer spelling to these enum
values.

Some `SVal` constructors were not marked explicit, which I now mark as
such to follow best practices, and marked others as `/*implicit*/` to
clarify the intent.
During refactoring, I also found at least one function not marked
`LLVM_ATTRIBUTE_RETURNS_NONNULL`, so I did that.

The `TypeRetrievingVisitor` visitor had some accidental dead code,
namely: `VisitNonLocConcreteInt` and `VisitLocCon

[clang] [analyzer][NFC] Rework SVal kind representation (PR #71039)

2023-11-02 Thread Balazs Benics via cfe-commits

steakhal wrote:

This PR relates to #69835 
([comment](https://github.com/llvm/llvm-project/issues/69835#issuecomment-1775533393)).

https://github.com/llvm/llvm-project/pull/71039
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [analyzer][NFC] Rework SVal kind representation (PR #71039)

2023-11-02 Thread Balazs Benics via cfe-commits

steakhal wrote:

> But I have to point out that this patch doesn't address the fact that `const 
> void* Data` is not friendly to debuggers, especially with type information 
> encoded in another member. So even with this patch applied, someone would 
> still have to write (and maintain) a custom formatter on debugger side to 
> display `Data` correctly.

That's correct. I'll think about how to resolve that as well.

> If you refactor `const void* Data` to be a `llvm::PointerUnion`, then it will 
> be picked up automatically (`PointerUnion` is a popular type, so I've already 
> written a formatter for it.) Together with removing `BaseBits` from `Kind` 
> and making the latter non-packed, `SVal` will have a debugger-friendly layout.

I've considered this but I found the number of alternatives too large to make 
it feasible. Consider that we have 11 possible SValKinds, which would require 4 
bits to encode. Requiring all `Data` pointers to be aligned as such seems rough 
- although not impossible.

> That said, debugger-friendliness is probably not among top priorities, so I'm 
> not going to block this patch, if this is the direction you and reviewers 
> want to take.

I'd say it's one baby step in that direction, but not the end of the journey. 
How about looking at this like that?


https://github.com/llvm/llvm-project/pull/71039
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [analyzer] Fix uninitialized base class with initializer list when ctor is not declared in the base class (#70464) (PR #70792)

2023-11-02 Thread Balazs Benics via cfe-commits

steakhal wrote:

I wanted to highlight that this PR resolved a bunch of open issues, namely: 
#61919, #59493, #54533
Thank you!

So I think we should mention this in the release notes for clang-18 in some 
way. I'll keep this in mind.

https://github.com/llvm/llvm-project/pull/70792
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [analyzer] Improve diagnostics from ArrayBoundCheckerV2 (PR #70056)

2023-11-02 Thread Balazs Benics via cfe-commits
=?utf-8?q?Donát?= Nagy ,
=?utf-8?q?Donát?= Nagy 
Message-ID:
In-Reply-To: 



@@ -0,0 +1,149 @@
+// RUN: %clang_analyze_cc1 -Wno-array-bounds -analyzer-output=text\
+// RUN: 
-analyzer-checker=core,alpha.security.ArrayBoundV2,unix.Malloc,alpha.security.taint
 -verify %s
+
+int array[10];
+
+void arrayUnderflow(void) {
+  array[-3] = 5;
+  // expected-warning@-1 {{Out of bound access to memory preceding 'array'}}
+  // expected-note@-2 {{Access of 'array' at negative byte offset -12}}
+}
+
+int scanf(const char *restrict fmt, ...);
+
+void taintedIndex(void) {
+  int index;
+  scanf("%d", &index);
+  // expected-note@-1 {{Taint originated here}}
+  // expected-note@-2 {{Taint propagated to the 2nd argument}}
+  array[index] = 5;
+  // expected-warning@-1 {{Potential out of bound access to 'array' with 
tainted offset}}
+  // expected-note@-2 {{Access of 'array' with a tainted offset that may be 
too large}}
+}
+
+void arrayOverflow(void) {
+  array[12] = 5;
+  // expected-warning@-1 {{Out of bound access to memory after the end of 
'array'}}
+  // expected-note@-2 {{Access of 'array' at index 12, while it holds only 10 
'int' elements}}
+}
+
+int scalar;
+int scalarOverflow(void) {
+  return (&scalar)[1];
+  // expected-warning@-1 {{Out of bound access to memory after the end of 
'scalar'}}
+  // expected-note@-2 {{Access of 'scalar' at index 1, while it holds only a 
single 'int' element}}
+}
+
+int oneElementArray[1];
+int oneElementArrayOverflow(void) {
+  return oneElementArray[1];
+  // expected-warning@-1 {{Out of bound access to memory after the end of 
'oneElementArray'}}
+  // expected-note@-2 {{Access of 'oneElementArray' at index 1, while it holds 
only a single 'int' element}}
+}
+
+short convertedArray(void) {
+  return ((short*)array)[47];
+  // expected-warning@-1 {{Out of bound access to memory after the end of 
'array'}}
+  // expected-note@-2 {{Access of 'array' at index 47, while it holds only 20 
'short' elements}}
+}
+
+struct vec {
+  int len;
+  double elems[64];
+} v;
+
+double arrayInStruct(void) {
+  return v.elems[64];
+  // expected-warning@-1 {{Out of bound access to memory after the end of 
'v.elems'}}
+  // expected-note@-2 {{Access of 'v.elems' at index 64, while it holds only 
64 'double' elements}}
+}
+
+double arrayInStructPtr(struct vec *pv) {
+  return pv->elems[64];
+  // expected-warning@-1 {{Out of bound access to memory after the end of the 
field 'elems'}}
+  // expected-note@-2 {{Access of the field 'elems' at index 64, while it 
holds only 64 'double' elements}}
+}
+
+struct two_bytes {
+  char lo, hi;
+};
+
+struct two_bytes convertedArray2(void) {
+  // We report this with byte offsets because the offset is not divisible by 
the element size.
+  struct two_bytes a = {0, 0};
+  char *p = (char*)&a;
+  return *((struct two_bytes*)(p + 7));
+  // expected-warning@-1 {{Out of bound access to memory after the end of 'a'}}
+  // expected-note@-2 {{Access of 'a' at byte offset 7, while it holds only 2 
bytes}}
+}
+
+int intFromString(void) {
+  // We report this with byte offsets because the extent is not divisible by 
the element size.
+  return ((const int*)"this is a string of 33 characters")[20];
+  // expected-warning@-1 {{Out of bound access to memory after the end of the 
string literal}}
+  // expected-note@-2 {{Access of the string literal at byte offset 80, while 
it holds only 34 bytes}}
+}
+
+int intFromStringDivisible(void) {
+  // However, this is reported with indices/elements, because the extent 
happens to be a multiple of 4.
+  return ((const int*)"abc")[20];
+  // expected-warning@-1 {{Out of bound access to memory after the end of the 
string literal}}
+  // expected-note@-2 {{Access of the string literal at index 20, while it 
holds only a single 'int' element}}
+}

steakhal wrote:

I have no strong opinion.
Thanks for the explanation.

https://github.com/llvm/llvm-project/pull/70056
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [analyzer] Improve diagnostics from ArrayBoundCheckerV2 (PR #70056)

2023-11-02 Thread Balazs Benics via cfe-commits
=?utf-8?q?Donát?= Nagy ,
=?utf-8?q?Donát?= Nagy 
Message-ID:
In-Reply-To: 



@@ -217,80 +326,71 @@ void ArrayBoundCheckerV2::checkLocation(SVal location, 
bool isLoad,
 // MallocChecker that call SValBuilder::getConjuredHeapSymbolVal()) and
 // non-symbolic regions (e.g. a field subregion of a symbolic region) in
 // unknown space.
-auto [state_precedesLowerBound, state_withinLowerBound] =
-compareValueToThreshold(state, ByteOffset,
-svalBuilder.makeZeroArrayIndex(), svalBuilder);
+auto [PrecedesLowerBound, WithinLowerBound] = compareValueToThreshold(
+State, ByteOffset, SVB.makeZeroArrayIndex(), SVB);
 
-if (state_precedesLowerBound && !state_withinLowerBound) {
+if (PrecedesLowerBound && !WithinLowerBound) {
   // We know that the index definitely precedes the lower bound.
-  reportOOB(checkerContext, state_precedesLowerBound, OOB_Precedes);
+  std::string RegName = getRegionName(Reg);
+  std::string Msg = getPrecedesMsg(RegName, ByteOffset);
+  reportOOB(C, PrecedesLowerBound, OOB_Precedes, ByteOffset, RegName, Msg);

steakhal wrote:

To me, `move` describes the intent: I produced something, and this is the only 
place that is supposed to consume it. Keep in mind that this code is cold, thus 
we can do whatever we want, including making unnecessary copies.
BTW I didn't see that the result of the first is actually used for the second 
call. This way this code makes all sense.

https://github.com/llvm/llvm-project/pull/70056
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [analyzer] Fix uninitialized base class with initializer list when ctor is not declared in the base class (#70464) (PR #70792)

2023-11-02 Thread Balazs Benics via cfe-commits

steakhal wrote:

> As #59493 is an array, which is different from the test case I provided and 
> the ones in #61919 and #54533, although this pr can correctly handle the 
> array case, do I still need to add the array one to the test case?

It would be really nice if you could. Thanks!



https://github.com/llvm/llvm-project/pull/70792
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [analyzer] Fix uninitialized base class with initializer list when ctor is not declared in the base class (#70464) (PR #70792)

2023-11-02 Thread Balazs Benics via cfe-commits

steakhal wrote:

> Do I need to
> 
> * create a new PR?
> 
> * push directly to this PR on the original branch `Snape3058:issue-70464`?
> 
> * commit directly without revision?
> 
> 
> Which operation is correct?
> 
> (Sorry for not familiar with GitHub) -(

I'd prefer a new PR, and mentioning the original PR (the fix) and the issue for 
which it adds the test.

https://github.com/llvm/llvm-project/pull/70792
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang][analyzer] Add a test case to PR-70792 for Issue-59493 (PR #71073)

2023-11-02 Thread Balazs Benics via cfe-commits


@@ -66,3 +66,23 @@ struct Derived : Base {
 void entry() { Derived test; }
 
 } // namespace delegate_ctor_call
+
+// Additional test case from issue #59493
+namespace init_list_array {
+
+struct Base {
+  int foox[1];

steakhal wrote:

I'd suggest a more realistic array size for the test and also test some of the 
subsequent elements, including zero initialized elements.

https://github.com/llvm/llvm-project/pull/71073
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [analyzer][NFC] Add a test case to PR-70792 for Issue-59493 (PR #71073)

2023-11-02 Thread Balazs Benics via cfe-commits

https://github.com/steakhal edited 
https://github.com/llvm/llvm-project/pull/71073
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang][analyzer] Add a test case to PR-70792 for Issue-59493 (PR #71073)

2023-11-02 Thread Balazs Benics via cfe-commits

https://github.com/steakhal requested changes to this pull request.


https://github.com/llvm/llvm-project/pull/71073
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [clang][analyzer] Add a test case to PR-70792 for Issue-59493 (PR #71073)

2023-11-02 Thread Balazs Benics via cfe-commits

https://github.com/steakhal edited 
https://github.com/llvm/llvm-project/pull/71073
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [analyzer][NFC] Add a test case to PR-70792 for Issue-59493 (PR #71073)

2023-11-02 Thread Balazs Benics via cfe-commits

https://github.com/steakhal approved this pull request.

Thanks

https://github.com/llvm/llvm-project/pull/71073
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [analyzer][NFC] Rework SVal kind representation (PR #71039)

2023-11-03 Thread Balazs Benics via cfe-commits

https://github.com/steakhal updated 
https://github.com/llvm/llvm-project/pull/71039

>From 3bc43ab005aa76a43644d4d93286215b490cc8fa Mon Sep 17 00:00:00 2001
From: Balazs Benics 
Date: Thu, 2 Nov 2023 10:21:03 +0100
Subject: [PATCH 1/2] [analyzer][NFC] Rework SVal kind representation

The goal of this patch is to refine how the `SVal` base and sub-kinds
are represented by forming one unified enum describing the possible SVals.
This means that the `unsigned SVal::Kind` and the attached bit-packing
semantics would be replaced by a single unified enum.
This is more conventional and leads to a better debugging experience by default.
This eases the need of using debug pretty-printers, or the use of
runtime functions doing the printing for us like we do today by calling
`Val.dump()` whenever we inspect the values.

Previously, the first 2 bits of the `unsigned SVal::Kind` discriminated
the following quartet: `UndefinedVal`, `UnknownVal`, `Loc`, or `NonLoc`.
The rest of the upper bits represented the sub-kind, where the value
represented the index among only the `Loc`s or `NonLoc`s, effectively
attaching 2 meanings of the upper bits depending on the base-kind.
We don't need to pack these bits, as we have plenty even if we would use
just a plan-old `unsigned char`.

Consequently, in this patch, I propose to lay out all the (non-abstract)
`SVal` kinds into a single enum, along with some metadata (`BEGIN_Loc`,
`END_Loc`, `BEGIN_NonLoc`, `END_NonLoc`) artificial enum values, similar
how we do with the `MemRegions`.

Note that in the unified `SVal::Kind` enum, to differentiate
`nonloc::ConcreteInt` from `loc::ConcreteInt`, I had to prefix them with
`Loc` and `NonLoc` to resolve this ambiguity.
This should not surface in general, because I'm replacing the
`nonloc::Kind` enum items with `inline constexpr` global constants to
mimic the original behavior - and offer nicer spelling to these enum
values.

Some `SVal` constructors were not marked explicit, which I now mark as
such to follow best practices, and marked others as `/*implicit*/` to
clarify the intent.
During refactoring, I also found at least one function not marked
`LLVM_ATTRIBUTE_RETURNS_NONNULL`, so I did that.

The `TypeRetrievingVisitor` visitor had some accidental dead code,
namely: `VisitNonLocConcreteInt` and `VisitLocConcreteInt`.

Previously, the `SValVisitor` expected visit handlers of
`VisitNonLocX(nonloc::X)` and `VisitLocX(loc::X)`, where
I felt that envoding `NonLoc` and `Loc` in the name is not necessary as
the type of the parameter would select the right overload anyways, so I
simplified the naming of those visit functions.

The rest of the diff is a lot of times just formatting, because
`getKind()` by nature, frequently appears in switches, which means that
the whole switch gets automatically reformatted. I could probably undo
the formatting, but I didn't want to deviate from the rule unless
explicitly requested.
---
 .../StaticAnalyzer/Checkers/SValExplainer.h   |  10 +-
 .../Core/PathSensitive/SValVisitor.h  |  57 ++---
 .../Core/PathSensitive/SVals.def  |  38 ++-
 .../StaticAnalyzer/Core/PathSensitive/SVals.h | 225 ++
 clang/lib/StaticAnalyzer/Core/SValBuilder.cpp |  28 +--
 clang/lib/StaticAnalyzer/Core/SVals.cpp   |  87 ---
 .../Core/SimpleConstraintManager.cpp  |   4 +-
 .../StaticAnalyzer/Core/SimpleSValBuilder.cpp |  73 +++---
 clang/lib/StaticAnalyzer/Core/Store.cpp   |   2 +-
 9 files changed, 220 insertions(+), 304 deletions(-)

diff --git a/clang/include/clang/StaticAnalyzer/Checkers/SValExplainer.h 
b/clang/include/clang/StaticAnalyzer/Checkers/SValExplainer.h
index 3ae28c1dba3eb5a..43a70f596a4da28 100644
--- a/clang/include/clang/StaticAnalyzer/Checkers/SValExplainer.h
+++ b/clang/include/clang/StaticAnalyzer/Checkers/SValExplainer.h
@@ -65,7 +65,7 @@ class SValExplainer : public FullSValVisitor {
 return "undefined value";
   }
 
-  std::string VisitLocMemRegionVal(loc::MemRegionVal V) {
+  std::string VisitMemRegionVal(loc::MemRegionVal V) {
 const MemRegion *R = V.getRegion();
 // Avoid the weird "pointer to pointee of ...".
 if (auto SR = dyn_cast(R)) {
@@ -76,7 +76,7 @@ class SValExplainer : public FullSValVisitor {
 return "pointer to " + Visit(R);
   }
 
-  std::string VisitLocConcreteInt(loc::ConcreteInt V) {
+  std::string VisitConcreteInt(loc::ConcreteInt V) {
 const llvm::APSInt &I = V.getValue();
 std::string Str;
 llvm::raw_string_ostream OS(Str);
@@ -84,11 +84,11 @@ class SValExplainer : public FullSValVisitor {
 return Str;
   }
 
-  std::string VisitNonLocSymbolVal(nonloc::SymbolVal V) {
+  std::string VisitSymbolVal(nonloc::SymbolVal V) {
 return Visit(V.getSymbol());
   }
 
-  std::string VisitNonLocConcreteInt(nonloc::ConcreteInt V) {
+  std::string VisitConcreteInt(nonloc::ConcreteInt V) {
 const llvm::APSInt &I = V.getValue();
 std::string Str;
 llvm::raw_string_ostream OS(Str);
@@ -97,7

[clang] [analyzer][NFC] Rework SVal kind representation (PR #71039)

2023-11-03 Thread Balazs Benics via cfe-commits

steakhal wrote:

> But I have to point out that this patch doesn't address the fact that `const 
> void* Data` is not friendly to debuggers, especially with type information 
> encoded in another member. So even with this patch applied, someone would 
> still have to write (and maintain) a custom formatter on debugger side to 
> display `Data` correctly.
> 
> If you refactor `const void* Data` to be a `llvm::PointerUnion`, then it will 
> be picked up automatically (`PointerUnion` is a popular type, so I've already 
> written a formatter for it.) Together with removing `BaseBits` from `Kind` 
> and making the latter non-packed, `SVal` will have a debugger-friendly layout.

What other ways are to make a `const void *` debugger-friendly - other than 
`llvm::PointerUnion`?
I've considered using plain-old unions, but I found that approach pretty ugly, 
and I'm not even sure if it would help debugging experience.

https://github.com/llvm/llvm-project/pull/71039
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [analyzer] Fix assertion failure in `CXXInstanceCall::getCXXThisVal` (PR #70837)

2023-11-04 Thread Balazs Benics via cfe-commits


@@ -30,3 +30,24 @@ void test(int i) {
   clang_analyzer_dump(g4);
   // expected-warning@-1 {{&i [as 64 bit integer]}}
 }
+
+struct A {
+  int n;
+  void set(int x) {
+n = x;
+  }
+};
+using ptr_size = decltype(sizeof(void *));
+void gh_69922(ptr_size p) {
+  // expected-warning-re@+1 {{(reg_${{[0-9]+}}) & 1U}}
+  clang_analyzer_dump(__builtin_bit_cast(A*, p & 1));

steakhal wrote:

Yes, that is to form a `SymExpr`, which is a `SymbolVal` (NonLoc) wrapping a 
`BO_And` `SymIntExpr` I think.

https://github.com/llvm/llvm-project/pull/70837
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [analyzer] Fix assertion failure in `CXXInstanceCall::getCXXThisVal` (PR #70837)

2023-11-04 Thread Balazs Benics via cfe-commits


@@ -30,3 +30,24 @@ void test(int i) {
   clang_analyzer_dump(g4);
   // expected-warning@-1 {{&i [as 64 bit integer]}}
 }
+
+struct A {
+  int n;
+  void set(int x) {
+n = x;
+  }
+};
+using ptr_size = decltype(sizeof(void *));

steakhal wrote:

Fixed. Now using `size_t`.

https://github.com/llvm/llvm-project/pull/70837
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [analyzer] Fix assertion failure in `CXXInstanceCall::getCXXThisVal` (PR #70837)

2023-11-04 Thread Balazs Benics via cfe-commits

https://github.com/steakhal updated 
https://github.com/llvm/llvm-project/pull/70837

>From 2de19fc8e14319674ce87c18771ba1b8ba22f79b Mon Sep 17 00:00:00 2001
From: Balazs Benics 
Date: Mon, 23 Oct 2023 18:10:29 +0200
Subject: [PATCH 1/3] [analyzer] Fix assertion failure in
 CXXInstanceCall::getCXXThisVal

Workaround the case when the `this` pointer is actually a `NonLoc`, by
returning `Unknown` instead.
The solution isn't ideal, as `this` should be really a `Loc`, but due to
how casts work, I feel this is our easiest and best option.

I've considered using `SVB.evalCast(ThisVal, Base->getType(), QualType())`,
but it doesn't work as `EvalCastVisitor::VisitNonLocSymbolVal()` only
evaluates casts that happen from NonLoc to NonLocs.

When I tried to actually implement that case, I figured:
1) Create a SymbolicRegion from that nonloc::SymbolVal; but SymbolRegion
   ctor expects a pointer type for the symbol.
2) Okay, just have a SymbolCast, getting us the pointer type; but
   SymbolRegion expects SymbolData symbols, not generic SymExprs, as stated:

> // Because pointer arithmetic is represented by ElementRegion layers,
> // the base symbol here should not contain any arithmetic.

3) We can't use ElementRegions to perform this cast because to have an
   ElementRegion, you already have to have a SubRegion that you want to
   cast, but the point is that we don't have that.

At this point, I gave up, and just returned `Unknown` xD
IMO this is still better than having a crash.

Fixes #69922
---
 clang/lib/StaticAnalyzer/Core/CallEvent.cpp |  5 ++---
 clang/test/Analysis/builtin_bitcast.cpp | 21 +
 2 files changed, 23 insertions(+), 3 deletions(-)

diff --git a/clang/lib/StaticAnalyzer/Core/CallEvent.cpp 
b/clang/lib/StaticAnalyzer/Core/CallEvent.cpp
index ad5bb66c4fff3c8..20bc64dc2631cec 100644
--- a/clang/lib/StaticAnalyzer/Core/CallEvent.cpp
+++ b/clang/lib/StaticAnalyzer/Core/CallEvent.cpp
@@ -715,10 +715,9 @@ void CXXInstanceCall::getExtraInvalidatedValues(
 SVal CXXInstanceCall::getCXXThisVal() const {
   const Expr *Base = getCXXThisExpr();
   // FIXME: This doesn't handle an overloaded ->* operator.
-  if (!Base)
+  SVal ThisVal = Base ? getSVal(Base) : UnknownVal();
+  if (isa(ThisVal))
 return UnknownVal();
-
-  SVal ThisVal = getSVal(Base);
   assert(ThisVal.isUnknownOrUndef() || isa(ThisVal));
   return ThisVal;
 }
diff --git a/clang/test/Analysis/builtin_bitcast.cpp 
b/clang/test/Analysis/builtin_bitcast.cpp
index 396e7caa45f6acd..13475694d287a22 100644
--- a/clang/test/Analysis/builtin_bitcast.cpp
+++ b/clang/test/Analysis/builtin_bitcast.cpp
@@ -30,3 +30,24 @@ void test(int i) {
   clang_analyzer_dump(g4);
   // expected-warning@-1 {{&i [as 64 bit integer]}}
 }
+
+struct A {
+  int n;
+  void set(int x) {
+n = x;
+  }
+};
+using ptr_size = decltype(sizeof(void *));
+void gh_69922(ptr_size p) {
+  // expected-warning-re@+1 {{(reg_${{[0-9]+}}) & 1U}}
+  clang_analyzer_dump(__builtin_bit_cast(A*, p & 1));
+
+  __builtin_bit_cast(A*, p & 1)->set(2); // no-crash
+  // However, since the `this` pointer is expected to be a Loc, but we have
+  // NonLoc there, we simply give up and resolve it as `Unknown`.
+  // Then, inside the `set()` member function call we can't evaluate the
+  // store to the member variable `n`.
+
+  clang_analyzer_dump(__builtin_bit_cast(A*, p & 1)->n); // Ideally, this 
should print "2".
+  // expected-warning-re@-1 {{(reg_${{[0-9]+}}) & 1U}}
+}

>From bcb048c09dcc7bb2728d46afc0ff9a09cf71f663 Mon Sep 17 00:00:00 2001
From: Balazs Benics 
Date: Tue, 31 Oct 2023 19:20:16 +0100
Subject: [PATCH 2/3] Add fixme and add the ineffective evalCast

---
 clang/lib/StaticAnalyzer/Core/CallEvent.cpp   | 9 +++--
 clang/lib/StaticAnalyzer/Core/SValBuilder.cpp | 5 +
 2 files changed, 12 insertions(+), 2 deletions(-)

diff --git a/clang/lib/StaticAnalyzer/Core/CallEvent.cpp 
b/clang/lib/StaticAnalyzer/Core/CallEvent.cpp
index 20bc64dc2631cec..76fb7481f65194b 100644
--- a/clang/lib/StaticAnalyzer/Core/CallEvent.cpp
+++ b/clang/lib/StaticAnalyzer/Core/CallEvent.cpp
@@ -716,8 +716,13 @@ SVal CXXInstanceCall::getCXXThisVal() const {
   const Expr *Base = getCXXThisExpr();
   // FIXME: This doesn't handle an overloaded ->* operator.
   SVal ThisVal = Base ? getSVal(Base) : UnknownVal();
-  if (isa(ThisVal))
-return UnknownVal();
+
+  if (isa(ThisVal)) {
+SValBuilder &SVB = getState()->getStateManager().getSValBuilder();
+QualType OriginalTy = ThisVal.getType(SVB.getContext());
+return SVB.evalCast(ThisVal, Base->getType(), OriginalTy);
+  }
+
   assert(ThisVal.isUnknownOrUndef() || isa(ThisVal));
   return ThisVal;
 }
diff --git a/clang/lib/StaticAnalyzer/Core/SValBuilder.cpp 
b/clang/lib/StaticAnalyzer/Core/SValBuilder.cpp
index d89d82626f72694..9375f39b2d71dd4 100644
--- a/clang/lib/StaticAnalyzer/Core/SValBuilder.cpp
+++ b/clang/lib/StaticAnalyzer/Core/SValBuilder.cpp
@@ -980,6 +980,11 @@ class EvalCastVisitor : public 
SValVisitor {
   re

[clang] [analyzer] Fix assertion failure in `CXXInstanceCall::getCXXThisVal` (PR #70837)

2023-11-04 Thread Balazs Benics via cfe-commits

https://github.com/steakhal closed 
https://github.com/llvm/llvm-project/pull/70837
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [analyzer][solver] On SymSym RelOps, check EQClass members for contradictions (PR #71284)

2023-11-04 Thread Balazs Benics via cfe-commits

https://github.com/steakhal created 
https://github.com/llvm/llvm-project/pull/71284

The idea is that if we see a `X RELOP Y` being constrained to a RangeSet `S`, 
then check the eqclasses of X and Y respectively and for `X' RELOP Y'` 
SymSymExprs and try to infer their ranges.
If there is no contradiction with any of the equivalent alternatives, then 
intersecting all these RangeSets should never be empty - aka. there should be a 
value satisfying the constraints we have.

It costs around `|eqclass(X)| + |eqclass(y)|`.

The approach has its limitations, as demonstrated by 
`gh_62215_contradicting_nested_right_equivalent`, where we would need to apply 
the same logic, but on a sub-expression of a direct operand.

Before the patch, line 90, 100, and 112 would be reachable; and become 
unreachable after this. Line 127 will remain still reachable, but keep in mind 
that when cross-checking with Z3 (aka. Z3 refutation), then all 4 reports would 
be eliminated.

The idea comes from
https://github.com/llvm/llvm-project/issues/62215#issuecomment-1792878474

Fixes #62215

>From 92ece501b340c3a2a52b5a4614ddb70bb3e35c93 Mon Sep 17 00:00:00 2001
From: Balazs Benics 
Date: Sat, 4 Nov 2023 13:44:28 +0100
Subject: [PATCH] [analyzer][solver] On SymSym RelOps, check EQClass members
 for contradictions

The idea is that if we see a `X RELOP Y` being constrained to a RangeSet `S`,
then check the eqclasses of X and Y respectively and for `X' RELOP Y'`
SymSymExprs and try to infer their ranges.
If there is no contradiction with any of the equivalent alternatives,
then intersecting all these RangeSets should never be empty - aka. there
should be a value satisfying the constraints we have.

It costs around `|eqclass(X)| + |eqclass(y)|`.

The approach has its limitations, as demonstrated by
`gh_62215_contradicting_nested_right_equivalent`, where we would need to
apply the same logic, but on a sub-expression of a direct operand.

Before the patch, line 90, 100, and 112 would be reachable; and become
unreachable after this. Line 127 will remain still reachable, but keep
in mind that when cross-checking with Z3 (aka. Z3 refutation), then all
4 reports would be eliminated.

The idea comes from
https://github.com/llvm/llvm-project/issues/62215#issuecomment-1792878474

Fixes #62215
---
 .../Core/RangeConstraintManager.cpp   | 53 +++
 clang/test/Analysis/constraint-assignor.c | 48 +
 2 files changed, 101 insertions(+)

diff --git a/clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp 
b/clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
index 5de99384449a4c8..d631369e895d3a9 100644
--- a/clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
+++ b/clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
@@ -2067,6 +2067,12 @@ class ConstraintAssignor : public 
ConstraintAssignorBase {
 return Assignor.assign(CoS, NewConstraint);
   }
 
+  /// Check if using an equivalent operand alternative would lead to
+  /// contradiction.
+  /// If a contradiction is witnessed, returns false; otherwise returns true.
+  bool handleEquivalentAlternativeSymOperands(const SymSymExpr *SymSym,
+  RangeSet Constraint);
+
   /// Handle expressions like: a % b != 0.
   template 
   bool handleRemainderOp(const SymT *Sym, RangeSet Constraint) {
@@ -2218,11 +2224,58 @@ bool ConstraintAssignor::assignSymExprToConst(const 
SymExpr *Sym,
   return true;
 }
 
+bool ConstraintAssignor::handleEquivalentAlternativeSymOperands(
+const SymSymExpr *SymSym, RangeSet Constraint) {
+  SymbolRef LHS = SymSym->getLHS();
+  SymbolRef RHS = SymSym->getRHS();
+  EquivalenceClass LHSClass = EquivalenceClass::find(State, LHS);
+  EquivalenceClass RHSClass = EquivalenceClass::find(State, RHS);
+  SymbolSet SymbolsEqWithLHS = LHSClass.getClassMembers(State);
+  SymbolSet SymbolsEqWithRHS = RHSClass.getClassMembers(State);
+  llvm::SmallVector AlternativeSymSyms;
+
+  // Gather left alternatives.
+  for (SymbolRef AlternativeLHS : SymbolsEqWithLHS) {
+if (AlternativeLHS == LHS)
+  continue;
+AlternativeSymSyms.emplace_back(AlternativeLHS, SymSym->getOpcode(), RHS,
+SymSym->getType());
+  }
+
+  // Gather right alternatives.
+  for (SymbolRef AlternativeRHS : SymbolsEqWithRHS) {
+if (AlternativeRHS == RHS)
+  continue;
+AlternativeSymSyms.emplace_back(LHS, SymSym->getOpcode(), AlternativeRHS,
+SymSym->getType());
+  }
+
+  // Crosscheck the inferred ranges.
+  for (SymSymExpr AltSymSym : AlternativeSymSyms) {
+RangeSet AltSymSymConstrant =
+SymbolicRangeInferrer::inferRange(RangeFactory, State, &AltSymSym);
+Constraint = intersect(RangeFactory, Constraint, AltSymSymConstrant);
+
+// Check if we witnessed a contradiction with the equivalent alternative
+// operand.
+if (Constraint.isEmpty()) {
+  State = nullptr;
+  return false;
+}
+  }
+  r

[clang] [analyzer][NFC] Rework SVal kind representation (PR #71039)

2023-11-04 Thread Balazs Benics via cfe-commits

https://github.com/steakhal updated 
https://github.com/llvm/llvm-project/pull/71039

>From 8f16d3000a91df33d416dd09381175ddeb7e5ed3 Mon Sep 17 00:00:00 2001
From: Balazs Benics 
Date: Sat, 4 Nov 2023 15:25:42 +0100
Subject: [PATCH] [analyzer][NFC] Rework SVal kind representation

The goal of this patch is to refine how the `SVal` base and sub-kinds
are represented by forming one unified enum describing the possible SVals.
This means that the `unsigned SVal::Kind` and the attached bit-packing
semantics would be replaced by a single unified enum.
This is more conventional and leads to a better debugging experience by default.
This eases the need of using debug pretty-printers, or the use of
runtime functions doing the printing for us like we do today by calling
`Val.dump()` whenever we inspect the values.

Previously, the first 2 bits of the `unsigned SVal::Kind` discriminated
the following quartet: `UndefinedVal`, `UnknownVal`, `Loc`, or `NonLoc`.
The rest of the upper bits represented the sub-kind, where the value
represented the index among only the `Loc`s or `NonLoc`s, effectively
attaching 2 meanings of the upper bits depending on the base-kind.
We don't need to pack these bits, as we have plenty even if we would use
just a plan-old `unsigned char`.

Consequently, in this patch, I propose to lay out all the (non-abstract)
`SVal` kinds into a single enum, along with some metadata (`BEGIN_Loc`,
`END_Loc`, `BEGIN_NonLoc`, `END_NonLoc`) artificial enum values, similar
how we do with the `MemRegions`.

Note that in the unified `SVal::Kind` enum, to differentiate
`nonloc::ConcreteInt` from `loc::ConcreteInt`, I had to prefix them with
`Loc` and `NonLoc` to resolve this ambiguity.
This should not surface in general, because I'm replacing the
`nonloc::Kind` enum items with `inline constexpr` global constants to
mimic the original behavior - and offer nicer spelling to these enum
values.

Some `SVal` constructors were not marked explicit, which I now mark as
such to follow best practices, and marked others as `/*implicit*/` to
clarify the intent.
During refactoring, I also found at least one function not marked
`LLVM_ATTRIBUTE_RETURNS_NONNULL`, so I did that.

The `TypeRetrievingVisitor` visitor had some accidental dead code,
namely: `VisitNonLocConcreteInt` and `VisitLocConcreteInt`.

Previously, the `SValVisitor` expected visit handlers of
`VisitNonLocX(nonloc::X)` and `VisitLocX(loc::X)`, where
I felt that envoding `NonLoc` and `Loc` in the name is not necessary as
the type of the parameter would select the right overload anyways, so I
simplified the naming of those visit functions.

The rest of the diff is a lot of times just formatting, because
`getKind()` by nature, frequently appears in switches, which means that
the whole switch gets automatically reformatted. I could probably undo
the formatting, but I didn't want to deviate from the rule unless
explicitly requested.
---
 .../StaticAnalyzer/Checkers/SValExplainer.h   |  10 +-
 .../Core/PathSensitive/SValVisitor.h  |  55 ++---
 .../Core/PathSensitive/SVals.def  |  38 ++-
 .../StaticAnalyzer/Core/PathSensitive/SVals.h | 226 ++
 clang/lib/StaticAnalyzer/Core/SValBuilder.cpp |  28 +--
 clang/lib/StaticAnalyzer/Core/SVals.cpp   |  87 ---
 .../Core/SimpleConstraintManager.cpp  |   4 +-
 .../StaticAnalyzer/Core/SimpleSValBuilder.cpp |  73 +++---
 clang/lib/StaticAnalyzer/Core/Store.cpp   |   2 +-
 9 files changed, 219 insertions(+), 304 deletions(-)

diff --git a/clang/include/clang/StaticAnalyzer/Checkers/SValExplainer.h 
b/clang/include/clang/StaticAnalyzer/Checkers/SValExplainer.h
index 3ae28c1dba3eb5a..43a70f596a4da28 100644
--- a/clang/include/clang/StaticAnalyzer/Checkers/SValExplainer.h
+++ b/clang/include/clang/StaticAnalyzer/Checkers/SValExplainer.h
@@ -65,7 +65,7 @@ class SValExplainer : public FullSValVisitor {
 return "undefined value";
   }
 
-  std::string VisitLocMemRegionVal(loc::MemRegionVal V) {
+  std::string VisitMemRegionVal(loc::MemRegionVal V) {
 const MemRegion *R = V.getRegion();
 // Avoid the weird "pointer to pointee of ...".
 if (auto SR = dyn_cast(R)) {
@@ -76,7 +76,7 @@ class SValExplainer : public FullSValVisitor {
 return "pointer to " + Visit(R);
   }
 
-  std::string VisitLocConcreteInt(loc::ConcreteInt V) {
+  std::string VisitConcreteInt(loc::ConcreteInt V) {
 const llvm::APSInt &I = V.getValue();
 std::string Str;
 llvm::raw_string_ostream OS(Str);
@@ -84,11 +84,11 @@ class SValExplainer : public FullSValVisitor {
 return Str;
   }
 
-  std::string VisitNonLocSymbolVal(nonloc::SymbolVal V) {
+  std::string VisitSymbolVal(nonloc::SymbolVal V) {
 return Visit(V.getSymbol());
   }
 
-  std::string VisitNonLocConcreteInt(nonloc::ConcreteInt V) {
+  std::string VisitConcreteInt(nonloc::ConcreteInt V) {
 const llvm::APSInt &I = V.getValue();
 std::string Str;
 llvm::raw_string_ostream OS(Str);
@@ -97,7 +97

[clang] [analyzer][NFC] Rework SVal kind representation (PR #71039)

2023-11-04 Thread Balazs Benics via cfe-commits

https://github.com/steakhal closed 
https://github.com/llvm/llvm-project/pull/71039
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [analyzer][solver] On SymSym RelOps, check EQClass members for contradictions (PR #71284)

2023-11-04 Thread Balazs Benics via cfe-commits

steakhal wrote:

For crossreference: I raised some related questions around having void casts 
artificially keeping constraints and symbols alive at discuss:
https://discourse.llvm.org/t/range-based-solver-and-eager-symbol-garbage-collection/74670

https://github.com/llvm/llvm-project/pull/71284
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [analyzer][solver] On SymSym RelOps, check EQClass members for contradictions (PR #71284)

2023-11-06 Thread Balazs Benics via cfe-commits

steakhal wrote:

> I think every time we need to iterate over all member of an equivalence 
> class, we might do something wrong. The point of the equivalence class would 
> be to make sure those elements are equivalent. One way to avoid iteration 
> would be to always use the representative of the equivalence class. E.g., 
> each time we record a new constraint to a member of the class, we add this 
> information to the representative element. Every time we do a query, we first 
> look up the representative element which already should have all the info 
> from the class and use it instead of the original symbol.
> 
> Would something like this work or am I missing something?

I had to think about it for a little while, and here are my thoughts:
In an example like here:
```c++
void gh_62215(int x, int y, int z) {
  if (x != y) return; // x == y
  if (z <= x) return; // z > x
  if (z >= y) return; // z < y
  clang_analyzer_warnIfReached(); // no-warning: This should be dead code.
  (void)(x + y + z); // keep the constraints alive.
}
```
Here, after the first if, we would have an eqclass of `{x,y}`; with a single 
constraint `x != y: [0,0]`.
After the second if, we would have notionally 3 eqclasses: `{x,y}`, and two 
trivial ones `x` and `y`.
We would also have 3 constraints: `x != y: [0,0]`, `z <= x: [0,0]`, `z <= y: 
[0,0]`.

Note that the keys in the constraints map (symbols) can be 'arbitrarily' 
complex and refer to already dead symbols.
In this example, I keep these symbols artificially alive to make the case 
slightly simpler to discuss.

So, if I understand you correctly, at the 3rd if statement, we should 
canonicalize the symbol we are constraining by walking every sub-symbol and 
substituting it with its equivalent counterpart (if any), by basically with its 
eqclass' representative symbol.
In this example, it would mean that instead of adding a constraint `z >= y: 
[0,0]`, we would instead `z >= x: [0,0]` (assuming that `x` is the 
representative symbol of eqclass `{x,y}`.
I believe this canonicalization could work (and would solve the other 
demonstrative limitation I added in this PR), but would also incur overhead.
And from a design point of view, an eqclass would need to keep the 
representative symbol alive, because otherwise, we can't substitute a symbol 
with the representative symbol.
(Note that for this reason, we don't actually have a representative symbol, but 
rather a unique integral value as an ID for the eqclass - which happens to be 
equal to the `SymbolRef` pointer value of the representative symbol if that's 
still alive.

Here is an example where the representative symbol `x` of the eqclass `{x,y}` 
would die, but remain the ID of the eqclass:
```c++
  // same example, but with:
  (void)(y + z); // X is not mentioned!, thus reclaimed after the 2nd if 
statement.
```

https://github.com/llvm/llvm-project/pull/71284
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [analyzer] Trust base to derived casts for dynamic types (PR #69057)

2023-12-13 Thread Balazs Benics via cfe-commits

steakhal wrote:

Thanks for the replies.
I'll come back to this PR once I have some time; maybe during the holidays.
Both assertions directly relate to this PR for sure.

> I looked into Tom's bug report and I hit the following assertion in a debug 
> build:
> 
> ```
> clang: 
> /srv/repos/llvm-project/clang/lib/StaticAnalyzer/Core/DynamicType.cpp:134: 
> clang::ento::ProgramStateRef 
> clang::ento::setDynamicTypeAndCastInfo(clang::ento::ProgramStateRef, const 
> clang::ento::MemRegion*, clang::QualType, clang::QualType, bool): Assertion 
> `(CastToTy->isAnyPointerType() || CastToTy->isReferenceType()) && 
> "DynamicTypeInfo should always be a pointer."' failed.
> ```
> [...]
> What I don't understand is why these are not reference types. @steakhal do 
> you know more here?
Its because prior we `unbox` them I believe, which here sort of means that we 
drop refs and pointers I think.




https://github.com/llvm/llvm-project/pull/69057
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [analyzer] Switch to PostStmt callbacks in ArrayBoundV2 (PR #72107)

2023-11-28 Thread Balazs Benics via cfe-commits
=?utf-8?q?Donát?= Nagy ,
=?utf-8?q?Donát?= Nagy ,
=?utf-8?q?Donát?= Nagy ,
=?utf-8?q?Donát?= Nagy 
Message-ID:
In-Reply-To: 


steakhal wrote:

I'm in favor of this change. I'll pull the patch downstream and report back how 
it performed.
Coming back to the `&array[size]` example, actually I believe that it's 
well-formed in C, but UB in C++, but I'm not a language lawyer. @shafik 
probably knows this better :sweat_smile: 
However, `(array + size)` should be well-formed in both C and C++.

https://github.com/llvm/llvm-project/pull/72107
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [analyzer] Use AllocaRegion in MallocChecker (PR #72402)

2023-11-28 Thread Balazs Benics via cfe-commits
=?utf-8?q?Donát?= Nagy ,
=?utf-8?q?Donát?= Nagy 
Message-ID:
In-Reply-To: 


https://github.com/steakhal approved this pull request.

Overall, I'm in favor of this change.
On the other hand, I'd urge for not to regress on the diagnostics.
To me, `alloca` is like a VLA; which is prone to misuses, thus the edge-cases 
count there (esp. if tainted).

https://github.com/llvm/llvm-project/pull/72402
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [analyzer] Switch to PostStmt callbacks in ArrayBoundV2 (PR #72107)

2023-11-28 Thread Balazs Benics via cfe-commits
=?utf-8?q?Donát?= Nagy ,
=?utf-8?q?Donát?= Nagy ,
=?utf-8?q?Donát?= Nagy ,
=?utf-8?q?Donát?= Nagy 
Message-ID:
In-Reply-To: 


https://github.com/steakhal edited 
https://github.com/llvm/llvm-project/pull/72107
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [analyzer] Switch to PostStmt callbacks in ArrayBoundV2 (PR #72107)

2023-11-28 Thread Balazs Benics via cfe-commits
=?utf-8?q?Donát?= Nagy ,
=?utf-8?q?Donát?= Nagy ,
=?utf-8?q?Donát?= Nagy ,
=?utf-8?q?Donát?= Nagy 
Message-ID:
In-Reply-To: 


steakhal wrote:

FYI I edited the PR summary so that I'm not tagged there directly because if 
someone is tagged in a commit message on GH, that person will be notified each 
time that commit is pushed in public. Consequently, the tagged person gets 
spammed by GH from all the public forks each time they rebase and push this 
commit :D
Such a great feature, right?

https://github.com/llvm/llvm-project/pull/72107
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [analyzer] Switch to PostStmt callbacks in ArrayBoundV2 (PR #72107)

2023-11-28 Thread Balazs Benics via cfe-commits
=?utf-8?q?Donát?= Nagy ,
=?utf-8?q?Donát?= Nagy ,
=?utf-8?q?Donát?= Nagy ,
=?utf-8?q?Donát?= Nagy 
Message-ID:
In-Reply-To: 


steakhal wrote:

> @steakhal thanks for the checking and sorry for the unintentional spamming.
> 
> > Such a great feature, right?
> 
> Just wonderful 😄

To clarify, you did not spam me. I'm worried about merging a commit directly 
mentioning people. That would be the point when forks start to pick up the 
commit (and my name e.g.) and start spamming. I just wanted to raise awareness 
of this being a thing, and why I did that edit.

https://github.com/llvm/llvm-project/pull/72107
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [analyzer] Let the checkers query upper and lower bounds on symbols (PR #74141)

2023-12-04 Thread Balazs Benics via cfe-commits

https://github.com/steakhal approved this pull request.

The patch makes sense to me.

https://github.com/llvm/llvm-project/pull/74141
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [analyzer] Switch to PostStmt callbacks in ArrayBoundV2 (PR #72107)

2023-12-04 Thread Balazs Benics via cfe-commits
=?utf-8?q?Donát?= Nagy ,
=?utf-8?q?Donát?= Nagy ,
=?utf-8?q?Donát?= Nagy ,
=?utf-8?q?Donát?= Nagy ,
=?utf-8?q?Donát?= Nagy ,
=?utf-8?q?Donát?= Nagy 
Message-ID:
In-Reply-To: 


https://github.com/steakhal edited 
https://github.com/llvm/llvm-project/pull/72107
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [analyzer] Switch to PostStmt callbacks in ArrayBoundV2 (PR #72107)

2023-12-04 Thread Balazs Benics via cfe-commits
=?utf-8?q?Donát?= Nagy ,
=?utf-8?q?Donát?= Nagy ,
=?utf-8?q?Donát?= Nagy ,
=?utf-8?q?Donát?= Nagy ,
=?utf-8?q?Donát?= Nagy ,
=?utf-8?q?Donát?= Nagy 
Message-ID:
In-Reply-To: 


https://github.com/steakhal approved this pull request.

Sorry for reporting back only after a week or so. The analysis was done in like 
a day, but then we had some credential issues that blocked me actually doing 
the diffing.
Now about the diffing: a lot of things changed because now we report earlier, 
so a different range gets highlighted. This caused me some problems, so I 
decided to not do anything fancy, but randomly pick a bunch of issues.
In short, the picked issues looked better; but effectively remained the same. 
This is good.

https://github.com/llvm/llvm-project/pull/72107
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [analyzer] Switch to PostStmt callbacks in ArrayBoundV2 (PR #72107)

2023-12-04 Thread Balazs Benics via cfe-commits
=?utf-8?q?Donát?= Nagy ,
=?utf-8?q?Donát?= Nagy ,
=?utf-8?q?Donát?= Nagy ,
=?utf-8?q?Donát?= Nagy ,
=?utf-8?q?Donát?= Nagy ,
=?utf-8?q?Donát?= Nagy 
Message-ID:
In-Reply-To: 



@@ -413,6 +464,19 @@ bool ArrayBoundCheckerV2::isFromCtypeMacro(const Stmt *S, 
ASTContext &ACtx) {
   (MacroName == "isupper") || (MacroName == "isxdigit"));
 }
 
+bool ArrayBoundCheckerV2::isInAddressOf(const Stmt *S, ASTContext &ACtx) {
+  ParentMapContext &ParentCtx = ACtx.getParentMapContext();
+  do {
+const DynTypedNodeList Parents = ParentCtx.getParents(*S);
+if (Parents.empty())
+  return false;
+S = Parents[0].get();
+  } while (isa_and_nonnull(S));
+  if (const auto *UnaryOp = dyn_cast_or_null(S))
+return UnaryOp->getOpcode() == UO_AddrOf;
+  return false;

steakhal wrote:

```suggestion
  const auto *UnaryOp = dyn_cast_or_null(S);
  return UnaryOp && UnaryOp->getOpcode() == UO_AddrOf;
```

https://github.com/llvm/llvm-project/pull/72107
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[clang] [analyzer] Fix "sprintf" parameter modeling in CStringChecker (PR #74345)

2023-12-04 Thread Balazs Benics via cfe-commits

https://github.com/steakhal created 
https://github.com/llvm/llvm-project/pull/74345

Review the commits one by one. I plan to merge them manually by pushing both of 
these at once.
This PR intends to fix #74269.

>From 1359a7ef528358cc7e10a751aa885c6bd8ac8d1c Mon Sep 17 00:00:00 2001
From: Balazs Benics 
Date: Mon, 4 Dec 2023 17:40:09 +0100
Subject: [PATCH 1/2] [analyzer][NFC] Prefer CallEvent over CallExpr in APIs

This change only uplifts existing APIs, without any semantic changes.
This is the continuation of 44820630dfa45bc47748a5abda7d4a9cb86da2c1.

Benefits of using CallEvents over CallExprs:
The callee decl is traced through function pointers if possible.
This will be important to fix #74269 in a follow-up patch.
---
 .../Checkers/CStringChecker.cpp   | 381 ++
 1 file changed, 203 insertions(+), 178 deletions(-)

diff --git a/clang/lib/StaticAnalyzer/Checkers/CStringChecker.cpp 
b/clang/lib/StaticAnalyzer/Checkers/CStringChecker.cpp
index 31f5b03dcdeba..f5dbf9d82beee 100644
--- a/clang/lib/StaticAnalyzer/Checkers/CStringChecker.cpp
+++ b/clang/lib/StaticAnalyzer/Checkers/CStringChecker.cpp
@@ -121,7 +121,7 @@ class CStringChecker : public Checker< eval::Call,
const CallEvent *Call) const;
 
   using FnCheck = std::function;
+ const CallEvent &)>;
 
   CallDescriptionMap Callbacks = {
   {{CDF_MaybeBuiltin, {"memcpy"}, 3},
@@ -173,56 +173,53 @@ class CStringChecker : public Checker< eval::Call,
   StdCopyBackward{{"std", "copy_backward"}, 3};
 
   FnCheck identifyCall(const CallEvent &Call, CheckerContext &C) const;
-  void evalMemcpy(CheckerContext &C, const CallExpr *CE, CharKind CK) const;
-  void evalMempcpy(CheckerContext &C, const CallExpr *CE, CharKind CK) const;
-  void evalMemmove(CheckerContext &C, const CallExpr *CE, CharKind CK) const;
-  void evalBcopy(CheckerContext &C, const CallExpr *CE) const;
-  void evalCopyCommon(CheckerContext &C, const CallExpr *CE,
+  void evalMemcpy(CheckerContext &C, const CallEvent &Call, CharKind CK) const;
+  void evalMempcpy(CheckerContext &C, const CallEvent &Call, CharKind CK) 
const;
+  void evalMemmove(CheckerContext &C, const CallEvent &Call, CharKind CK) 
const;
+  void evalBcopy(CheckerContext &C, const CallEvent &Call) const;
+  void evalCopyCommon(CheckerContext &C, const CallEvent &Call,
   ProgramStateRef state, SizeArgExpr Size,
   DestinationArgExpr Dest, SourceArgExpr Source,
   bool Restricted, bool IsMempcpy, CharKind CK) const;
 
-  void evalMemcmp(CheckerContext &C, const CallExpr *CE, CharKind CK) const;
+  void evalMemcmp(CheckerContext &C, const CallEvent &Call, CharKind CK) const;
 
-  void evalstrLength(CheckerContext &C, const CallExpr *CE) const;
-  void evalstrnLength(CheckerContext &C, const CallExpr *CE) const;
-  void evalstrLengthCommon(CheckerContext &C,
-   const CallExpr *CE,
+  void evalstrLength(CheckerContext &C, const CallEvent &Call) const;
+  void evalstrnLength(CheckerContext &C, const CallEvent &Call) const;
+  void evalstrLengthCommon(CheckerContext &C, const CallEvent &Call,
bool IsStrnlen = false) const;
 
-  void evalStrcpy(CheckerContext &C, const CallExpr *CE) const;
-  void evalStrncpy(CheckerContext &C, const CallExpr *CE) const;
-  void evalStpcpy(CheckerContext &C, const CallExpr *CE) const;
-  void evalStrlcpy(CheckerContext &C, const CallExpr *CE) const;
-  void evalStrcpyCommon(CheckerContext &C, const CallExpr *CE, bool ReturnEnd,
-bool IsBounded, ConcatFnKind appendK,
+  void evalStrcpy(CheckerContext &C, const CallEvent &Call) const;
+  void evalStrncpy(CheckerContext &C, const CallEvent &Call) const;
+  void evalStpcpy(CheckerContext &C, const CallEvent &Call) const;
+  void evalStrlcpy(CheckerContext &C, const CallEvent &Call) const;
+  void evalStrcpyCommon(CheckerContext &C, const CallEvent &Call,
+bool ReturnEnd, bool IsBounded, ConcatFnKind appendK,
 bool returnPtr = true) const;
 
-  void evalStrcat(CheckerContext &C, const CallExpr *CE) const;
-  void evalStrncat(CheckerContext &C, const CallExpr *CE) const;
-  void evalStrlcat(CheckerContext &C, const CallExpr *CE) const;
+  void evalStrcat(CheckerContext &C, const CallEvent &Call) const;
+  void evalStrncat(CheckerContext &C, const CallEvent &Call) const;
+  void evalStrlcat(CheckerContext &C, const CallEvent &Call) const;
 
-  void evalStrcmp(CheckerContext &C, const CallExpr *CE) const;
-  void evalStrncmp(CheckerContext &C, const CallExpr *CE) const;
-  void evalStrcasecmp(CheckerContext &C, const CallExpr *CE) const;
-  void evalStrncasecmp(CheckerContext &C, const CallExpr *CE) const;
-  void evalStrcmpCommon(CheckerContext &C,
-const CallExpr *CE,
-bool IsBounded = false,
-bool IgnoreCa

[clang] [analyzer][docs] Update the release notes for llvm-18 (PR #76446)

2023-12-27 Thread Balazs Benics via cfe-commits

https://github.com/steakhal milestoned 
https://github.com/llvm/llvm-project/pull/76446
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


  1   2   3   4   5   6   7   8   9   10   >