[clang] [Wunsafe-buffer-usage] False positives for & expression indexing constant size array (arr[anything & 0]) (PR #112284)

via cfe-commits Wed, 30 Oct 2024 11:06:48 -0700

================
@@ -420,6 +420,118 @@ AST_MATCHER(CXXConstructExpr, 
isSafeSpanTwoParamConstruct) {
   return false;
 }
 
+class MaxValueEval : public RecursiveASTVisitor<MaxValueEval> {
+
+  std::vector<llvm::APInt> val;
+  ASTContext &Context;
+  llvm::APInt Max;
+  unsigned bit_width;
+
+public:
+  typedef RecursiveASTVisitor<MaxValueEval> VisitorBase;
+
+  explicit MaxValueEval(ASTContext &Ctx, const Expr *exp) : Context(Ctx) {
+    bit_width = Ctx.getIntWidth(exp->getType());
+    Max = llvm::APInt::getSignedMaxValue(bit_width);
+    val.clear();
+  }
+
+  bool findMatch(Expr *exp) {
+    TraverseStmt(exp);
+    return true;
+  }
+
+  bool VisitDeclRefExpr(DeclRefExpr *dre) {
+    val.push_back(Max);
+    return false;
+  }
+
+  bool VisitArraySubscriptExpr(ArraySubscriptExpr *E) {
+    val.push_back(Max);
+    return false;
+  }
+
+  bool EvaluateExpression(Expr *exp) {
+    Expr::EvalResult EVResult;
+    if (exp->EvaluateAsInt(EVResult, Context)) {
+      llvm::APSInt Result = EVResult.Val.getInt();
+      val.push_back(Result);
+      return true;
+    }
+    return false;
+  }
+
+  bool VisitBinaryOperator(BinaryOperator *E) {
----------------
jkorous-apple wrote:


Let's first establish the scope of this work.

The current approach seems very ambitious (which is generally great!) but 
misses some of the tricky details.
For example the same arithmetic expression evaluated in APInt domain (Arbitrary 
Precision Integer) and e. g. `uint32_t` domain gives different results because 
of overflow/underflow, etc.

I don't think we need to be that ambitious here and now - if we leave some 
false positives for `-Wunsafe-buffer-usage` for `my_array[ /* some weird 
expression */ ]` that's fine and we can improve it later using the data from 
the adoption as a signal for prioritization.
I also assume correct implementation of the most ambitious version of this 
functionality would take way more effort and time than is reasonable to invest 
into this problem at this point.

The conclusion for me is - let's support only certain types of expressions for 
now and keep the implementation simpler.

Based on what we've seen in the adoption and my intuition I feel that this 
approximation might be sufficient for now (let's discuss what I've missed 
though!):
`arbitrary_unsigned_E & constant` - max is `constant`
`constant & arbitrary_unsigned_E` - max is `constant`
`arbitrary_unsigned_E % constant` - max is `constant`
`arbitrary_unsigned_E >> constant` - max is something like (this is just buggy 
pseudo-code) `pow( max(bitwidth(arbitrary_E) - constant, 0), 2 )`

For this we might also avoid using the recursive visitor and simplify the 
implementation to just a single function where we traverse the expression 
manually using `getLHS`, `getRHS`, etc.

https://github.com/llvm/llvm-project/pull/112284
_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [Wunsafe-buffer-usage] False positives for & expression indexing constant size array (arr[anything & 0]) (PR #112284)

Reply via email to