Author: Arseniy Zaostrovnykh
Date: 2025-03-17T08:23:31+01:00
New Revision: 57e36419b251f7e5a86566c86b4d61fbd605db5c

URL: https://github.com/llvm/llvm-project/commit/57e36419b251f7e5a86566c86b4d61fbd605db5c
DIFF: https://github.com/llvm/llvm-project/commit/57e36419b251f7e5a86566c86b4d61fbd605db5c.diff

LOG: [analyzer] Introduce per-entry-point statistics (#131175)

So far CSA was relying on the LLVM Statistic package that allowed us to
gather some data about analysis of an entire translation unit. However,
the translation unit consists of a collection of loosely related entry
points. Aggregating data across multiple such entry points is often
counterproductive.

This change introduces a new lightweight always-on facility to collect
Boolean or numerical statistics for each entry point and dump them in a
CSV format. Such a format makes it easy to aggregate data across multiple
translation units and analyze it with common data-processing tools.

We break down the existing statistics that were collected on the per-TU
basis into values per entry point.

Additionally, we enable the statistics unconditionally (STATISTIC ->
ALWAYS_ENABLED_STATISTIC) to facilitate their use (you can gather the
data with a simple run-time flag rather than having to recompile the
analyzer). These statistics are very light and add virtually no
overhead.
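
To illustrate the idea in the log message, the following is a minimal, self-contained sketch of per-entry-point counters that are snapshotted and reset after each entry point and then dumped as sorted CSV rows. It is a simplified model, not the analyzer's code: `MiniCounter`, `MiniRegistry`, `bump`, and the single-counter layout are hypothetical names chosen for illustration, and only the snapshot-then-reset and sorted-rows behavior mirrors what the diff below does.

```cpp
// Simplified model of per-entry-point statistics (hypothetical names, not the
// analyzer's implementation).
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

struct MiniCounter {
  std::string Name;
  unsigned Value;
};

struct Snapshot {
  std::string EntryPoint;
  std::vector<unsigned> Values;
};

class MiniRegistry {
  std::vector<MiniCounter> Counters;
  std::vector<Snapshot> Snapshots;

public:
  // Registers a counter and returns its index in the registry.
  std::size_t add(std::string Name) {
    Counters.push_back(MiniCounter{std::move(Name), 0});
    return Counters.size() - 1;
  }

  // Adds Inc to the counter at index Idx.
  void bump(std::size_t Idx, unsigned Inc = 1) { Counters[Idx].Value += Inc; }

  // Records the current values for one entry point, then resets all counters
  // so that the next entry point starts from zero.
  void takeSnapshot(std::string EntryPoint) {
    Snapshot S{std::move(EntryPoint), {}};
    for (MiniCounter &C : Counters) {
      S.Values.push_back(C.Value);
      C.Value = 0;
    }
    Snapshots.push_back(std::move(S));
  }

  // Emits a header row followed by one row per entry point, sorted by text.
  std::string dumpAsCSV() const {
    std::ostringstream OS;
    OS << "EntryPoint";
    for (const MiniCounter &C : Counters)
      OS << ", " << C.Name;
    OS << "\n";

    std::vector<std::string> Rows;
    for (const Snapshot &S : Snapshots) {
      std::ostringstream Row;
      Row << '"' << S.EntryPoint << '"';
      for (unsigned V : S.Values)
        Row << ", " << V;
      Row << "\n";
      Rows.push_back(Row.str());
    }
    std::sort(Rows.begin(), Rows.end());
    for (const std::string &R : Rows)
      OS << R;
    return OS.str();
  }
};
```

In this model, a translation-unit-wide counter would only report the sum over all entry points, whereas the snapshots keep one row per entry point, which is the distinction the log message draws.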

Co-authored-by: Balazs Benics <benicsbal...@gmail.com>

CPP-6160

Added:
    clang/docs/analyzer/developer-docs/Statistics.rst
    clang/include/clang/StaticAnalyzer/Core/PathSensitive/EntryPointStats.h
    clang/lib/StaticAnalyzer/Core/EntryPointStats.cpp
    clang/test/Analysis/analyzer-stats/entry-point-stats.cpp
    clang/test/Analysis/csv2json.py

Modified:
    clang/docs/analyzer/developer-docs.rst
    clang/include/clang/StaticAnalyzer/Core/AnalyzerOptions.def
    clang/lib/StaticAnalyzer/Checkers/AnalyzerStatsChecker.cpp
    clang/lib/StaticAnalyzer/Core/BugReporter.cpp
    clang/lib/StaticAnalyzer/Core/CMakeLists.txt
    clang/lib/StaticAnalyzer/Core/CoreEngine.cpp
    clang/lib/StaticAnalyzer/Core/ExprEngine.cpp
    clang/lib/StaticAnalyzer/Core/ExprEngineCallAndReturn.cpp
    clang/lib/StaticAnalyzer/Core/WorkList.cpp
    clang/lib/StaticAnalyzer/Core/Z3CrosscheckVisitor.cpp
    clang/lib/StaticAnalyzer/Frontend/AnalysisConsumer.cpp
    clang/test/Analysis/analyzer-config.c
    clang/test/lit.cfg.py

Removed:


################################################################################
diff --git a/clang/docs/analyzer/developer-docs.rst b/clang/docs/analyzer/developer-docs.rst
index 60c0e71ad847c..a925cf7ca02e1 100644
--- a/clang/docs/analyzer/developer-docs.rst
+++ b/clang/docs/analyzer/developer-docs.rst
@@ -12,3 +12,4 @@ Contents:
    developer-docs/nullability
    developer-docs/RegionStore
    developer-docs/PerformanceInvestigation
+   developer-docs/Statistics

diff --git a/clang/docs/analyzer/developer-docs/Statistics.rst b/clang/docs/analyzer/developer-docs/Statistics.rst
new file mode 100644
index 0000000000000..595b44dd95753
--- /dev/null
+++ b/clang/docs/analyzer/developer-docs/Statistics.rst
@@ -0,0 +1,33 @@
+===================
+Analysis Statistics
+===================
+
+Clang Static Analyzer enjoys two facilities to collect statistics: per translation unit and per entry point.
+We use `llvm/ADT/Statistic.h`_ for numbers describing the entire translation unit.
+We use `clang/StaticAnalyzer/Core/PathSensitive/EntryPointStats.h`_ to collect data for each symbolic-execution entry point.
+
+.. _llvm/ADT/Statistic.h: https://github.com/llvm/llvm-project/blob/main/llvm/include/llvm/ADT/Statistic.h#L171
+.. _clang/StaticAnalyzer/Core/PathSensitive/EntryPointStats.h: https://github.com/llvm/llvm-project/blob/main/clang/include/clang/StaticAnalyzer/Core/PathSensitive/EntryPointStats.h
+
+In many cases, it makes sense to collect statistics on both translation-unit level and entry-point level. You can use the two macros defined in EntryPointStats.h for that:
+
+- ``STAT_COUNTER`` for additive statistics, for example, "the number of steps executed", "the number of functions inlined".
+- ``STAT_MAX`` for maximizing statistics, for example, "the maximum environment size", or "the longest execution path".
+
+If you want to define a statistic that makes sense only for the entire translation unit, for example, "the number of entry points", Statistic.h defines two macros: ``STATISTIC`` and ``ALWAYS_ENABLED_STATISTIC``.
+You should prefer ``ALWAYS_ENABLED_STATISTIC`` unless you have a good reason not to.
+``STATISTIC`` is controlled by ``LLVM_ENABLE_STATS`` / ``LLVM_FORCE_ENABLE_STATS``.
+However, note that with ``LLVM_ENABLE_STATS`` disabled, only storage of the values is disabled, the computations producing those values still carry on unless you took an explicit precaution to make them conditional too.
+
+If you want to define a statistic only for entry point, EntryPointStats.h has four classes at your disposal:
+
+
+- ``BoolEPStat`` - a boolean value assigned at most once per entry point. For example: "has the inline limit been reached".
+- ``UnsignedEPStat`` - an unsigned value assigned at most once per entry point. For example: "the number of source characters in an entry-point body".
+- ``CounterEPStat`` - an additive statistic. It starts with 0 and you can add to it as many times as needed. For example: "the number of bugs discovered".
+- ``UnsignedMaxEPStat`` - a maximizing statistic. It starts with 0 and when you join it with a value, it picks the maximum of the previous value and the new one. For example, "the longest execution path of a bug".
+
+To produce a CSV file with all the statistics collected per entry point, use the ``dump-entry-point-stats-to-csv=<file>.csv`` parameter.
+
+Note, EntryPointStats.h is not meant to be complete, and if you feel it is lacking certain kind of statistic, odds are that it does.
+Feel free to extend it!

diff --git a/clang/include/clang/StaticAnalyzer/Core/AnalyzerOptions.def b/clang/include/clang/StaticAnalyzer/Core/AnalyzerOptions.def
index 2aa00db411844..f9f22a9ced650 100644
--- a/clang/include/clang/StaticAnalyzer/Core/AnalyzerOptions.def
+++ b/clang/include/clang/StaticAnalyzer/Core/AnalyzerOptions.def
@@ -353,6 +353,12 @@ ANALYZER_OPTION(bool, DisplayCTUProgress, "display-ctu-progress",
                 "the analyzer's progress related to ctu.",
                 false)
 
+ANALYZER_OPTION(
+    StringRef, DumpEntryPointStatsToCSV, "dump-entry-point-stats-to-csv",
+    "If provided, the analyzer will dump statistics per entry point "
+    "into the specified CSV file.",
+    "")
+
 ANALYZER_OPTION(bool, ShouldTrackConditions, "track-conditions",
                 "Whether to track conditions that are a control dependency of "
                 "an already tracked variable.",

diff --git a/clang/include/clang/StaticAnalyzer/Core/PathSensitive/EntryPointStats.h b/clang/include/clang/StaticAnalyzer/Core/PathSensitive/EntryPointStats.h
new file mode 100644
index 0000000000000..633fb7aa8f72d
--- /dev/null
+++ b/clang/include/clang/StaticAnalyzer/Core/PathSensitive/EntryPointStats.h
@@ -0,0 +1,162 @@
+// EntryPointStats.h - Tracking statistics per entry point ------*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#ifndef CLANG_INCLUDE_CLANG_STATICANALYZER_CORE_PATHSENSITIVE_ENTRYPOINTSTATS_H
+#define CLANG_INCLUDE_CLANG_STATICANALYZER_CORE_PATHSENSITIVE_ENTRYPOINTSTATS_H
+
+#include "llvm/ADT/Statistic.h"
+#include "llvm/ADT/StringRef.h"
+
+namespace llvm {
+class raw_ostream;
+} // namespace llvm
+
+namespace clang {
+class Decl;
+
+namespace ento {
+
+class EntryPointStat {
+public:
+  llvm::StringLiteral name() const { return Name; }
+
+  static void lockRegistry();
+
+  static void takeSnapshot(const Decl *EntryPoint);
+  static void dumpStatsAsCSV(llvm::raw_ostream &OS);
+  static void dumpStatsAsCSV(llvm::StringRef FileName);
+
+protected:
+  explicit EntryPointStat(llvm::StringLiteral Name) : Name{Name} {}
+  EntryPointStat(const EntryPointStat &) = delete;
+  EntryPointStat(EntryPointStat &&) = delete;
+  EntryPointStat &operator=(EntryPointStat &) = delete;
+  EntryPointStat &operator=(EntryPointStat &&) = delete;
+
+private:
+  llvm::StringLiteral Name;
+};
+
+class BoolEPStat : public EntryPointStat {
+  std::optional<bool> Value = {};
+
+public:
+  explicit BoolEPStat(llvm::StringLiteral Name);
+  unsigned value() const { return Value && *Value; }
+  void set(bool V) {
+    assert(!Value.has_value());
+    Value = V;
+  }
+  void reset() { Value = {}; }
+};
+
+// used by CounterEntryPointTranslationUnitStat
+class CounterEPStat : public EntryPointStat {
+  using EntryPointStat::EntryPointStat;
+  unsigned Value = {};
+
+public:
+  explicit CounterEPStat(llvm::StringLiteral Name);
+  unsigned value() const { return Value; }
+  void reset() { Value = {}; }
+  CounterEPStat &operator++() {
+    ++Value;
+    return *this;
+  }
+
+  CounterEPStat &operator++(int) {
+    // No difference as you can't extract the value
+    return ++(*this);
+  }
+
+  CounterEPStat &operator+=(unsigned Inc) {
+    Value += Inc;
+    return *this;
+  }
+};
+
+// used by UnsignedMaxEtryPointTranslationUnitStatistic
+class UnsignedMaxEPStat : public EntryPointStat {
+  using EntryPointStat::EntryPointStat;
+  unsigned Value = {};
+
+public:
+  explicit UnsignedMaxEPStat(llvm::StringLiteral Name);
+  unsigned value() const { return Value; }
+  void reset() { Value = {}; }
+  void updateMax(unsigned X) { Value = std::max(Value, X); }
+};
+
+class UnsignedEPStat : public EntryPointStat {
+  using EntryPointStat::EntryPointStat;
+  std::optional<unsigned> Value = {};
+
+public:
+  explicit UnsignedEPStat(llvm::StringLiteral Name);
+  unsigned value() const { return Value.value_or(0); }
+  void reset() { Value.reset(); }
+  void set(unsigned V) {
+    assert(!Value.has_value());
+    Value = V;
+  }
+};
+
+class CounterEntryPointTranslationUnitStat {
+  CounterEPStat M;
+  llvm::TrackingStatistic S;
+
+public:
+  CounterEntryPointTranslationUnitStat(const char *DebugType,
+                                       llvm::StringLiteral Name,
+                                       llvm::StringLiteral Desc)
+      : M(Name), S(DebugType, Name.data(), Desc.data()) {}
+  CounterEntryPointTranslationUnitStat &operator++() {
+    ++M;
+    ++S;
+    return *this;
+  }
+
+  CounterEntryPointTranslationUnitStat &operator++(int) {
+    // No difference with prefix as the value is not observable.
+    return ++(*this);
+  }
+
+  CounterEntryPointTranslationUnitStat &operator+=(unsigned Inc) {
+    M += Inc;
+    S += Inc;
+    return *this;
+  }
+};
+
+class UnsignedMaxEntryPointTranslationUnitStatistic {
+  UnsignedMaxEPStat M;
+  llvm::TrackingStatistic S;
+
+public:
+  UnsignedMaxEntryPointTranslationUnitStatistic(const char *DebugType,
+                                                llvm::StringLiteral Name,
+                                                llvm::StringLiteral Desc)
+      : M(Name), S(DebugType, Name.data(), Desc.data()) {}
+  void updateMax(uint64_t Value) {
+    M.updateMax(static_cast<unsigned>(Value));
+    S.updateMax(Value);
+  }
+};
+
+#define STAT_COUNTER(VARNAME, DESC) \
+  static clang::ento::CounterEntryPointTranslationUnitStat VARNAME = { \
+      DEBUG_TYPE, #VARNAME, DESC}
+
+#define STAT_MAX(VARNAME, DESC) \
+  static clang::ento::UnsignedMaxEntryPointTranslationUnitStatistic VARNAME = \
+      {DEBUG_TYPE, #VARNAME, DESC}
+
+} // namespace ento
+} // namespace clang
+
+#endif // CLANG_INCLUDE_CLANG_STATICANALYZER_CORE_PATHSENSITIVE_ENTRYPOINTSTATS_H

diff --git a/clang/lib/StaticAnalyzer/Checkers/AnalyzerStatsChecker.cpp b/clang/lib/StaticAnalyzer/Checkers/AnalyzerStatsChecker.cpp
index a54f1b1e71d47..d030e69a2a6e0 100644
--- a/clang/lib/StaticAnalyzer/Checkers/AnalyzerStatsChecker.cpp
+++ b/clang/lib/StaticAnalyzer/Checkers/AnalyzerStatsChecker.cpp
@@ -13,12 +13,12 @@
 #include "clang/StaticAnalyzer/Core/BugReporter/BugReporter.h"
 #include "clang/StaticAnalyzer/Core/Checker.h"
 #include "clang/StaticAnalyzer/Core/CheckerManager.h"
+#include "clang/StaticAnalyzer/Core/PathSensitive/EntryPointStats.h"
 #include "clang/StaticAnalyzer/Core/PathSensitive/ExplodedGraph.h"
 #include "clang/StaticAnalyzer/Core/PathSensitive/ExprEngine.h"
 #include "llvm/ADT/STLExtras.h"
 #include "llvm/ADT/SmallPtrSet.h"
 #include "llvm/ADT/SmallString.h"
-#include "llvm/ADT/Statistic.h"
 #include "llvm/Support/raw_ostream.h"
 #include <optional>
 
@@ -27,10 +27,9 @@ using namespace ento;
 
 #define DEBUG_TYPE "StatsChecker"
 
-STATISTIC(NumBlocks,
-          "The # of blocks in top level functions");
-STATISTIC(NumBlocksUnreachable,
-          "The # of unreachable blocks in analyzing top level functions");
+STAT_COUNTER(NumBlocks, "The # of blocks in top level functions");
+STAT_COUNTER(NumBlocksUnreachable,
+             "The # of unreachable blocks in analyzing top level functions");
 
 namespace {
 class AnalyzerStatsChecker : public Checker<check::EndAnalysis> {

diff --git a/clang/lib/StaticAnalyzer/Core/BugReporter.cpp b/clang/lib/StaticAnalyzer/Core/BugReporter.cpp
index a4f9e092e8205..5f78fc433275d 100644
--- a/clang/lib/StaticAnalyzer/Core/BugReporter.cpp
+++ b/clang/lib/StaticAnalyzer/Core/BugReporter.cpp
@@ -39,6 +39,7 @@
 #include "clang/StaticAnalyzer/Core/Checker.h"
 #include "clang/StaticAnalyzer/Core/CheckerManager.h"
 #include "clang/StaticAnalyzer/Core/CheckerRegistryData.h"
+#include "clang/StaticAnalyzer/Core/PathSensitive/EntryPointStats.h"
 #include "clang/StaticAnalyzer/Core/PathSensitive/ExplodedGraph.h"
 #include "clang/StaticAnalyzer/Core/PathSensitive/ExprEngine.h"
 #include "clang/StaticAnalyzer/Core/PathSensitive/MemRegion.h"
@@ -54,7 +55,6 @@
 #include "llvm/ADT/SmallPtrSet.h"
 #include "llvm/ADT/SmallString.h"
 #include "llvm/ADT/SmallVector.h"
-#include "llvm/ADT/Statistic.h"
 #include "llvm/ADT/StringExtras.h"
 #include "llvm/ADT/StringRef.h"
 #include "llvm/ADT/iterator_range.h"
@@ -82,19 +82,19 @@ using namespace llvm;
 
 #define DEBUG_TYPE "BugReporter"
 
-STATISTIC(MaxBugClassSize,
-          "The maximum number of bug reports in the same equivalence class");
-STATISTIC(MaxValidBugClassSize,
-          "The maximum number of bug reports in the same equivalence class "
-          "where at least one report is valid (not suppressed)");
-
-STATISTIC(NumTimesReportPassesZ3, "Number of reports passed Z3");
-STATISTIC(NumTimesReportRefuted, "Number of reports refuted by Z3");
-STATISTIC(NumTimesReportEQClassAborted,
-          "Number of times a report equivalence class was aborted by the Z3 "
-          "oracle heuristic");
-STATISTIC(NumTimesReportEQClassWasExhausted,
-          "Number of times all reports of an equivalence class was refuted");
+STAT_MAX(MaxBugClassSize,
+         "The maximum number of bug reports in the same equivalence class");
+STAT_MAX(MaxValidBugClassSize,
+         "The maximum number of bug reports in the same equivalence class "
+         "where at least one report is valid (not suppressed)");
+
+STAT_COUNTER(NumTimesReportPassesZ3, "Number of reports passed Z3");
+STAT_COUNTER(NumTimesReportRefuted, "Number of reports refuted by Z3");
+STAT_COUNTER(NumTimesReportEQClassAborted,
+             "Number of times a report equivalence class was aborted by the Z3 "
+             "oracle heuristic");
+STAT_COUNTER(NumTimesReportEQClassWasExhausted,
+             "Number of times all reports of an equivalence class was refuted");
 
 BugReporterVisitor::~BugReporterVisitor() = default;
 

diff --git a/clang/lib/StaticAnalyzer/Core/CMakeLists.txt b/clang/lib/StaticAnalyzer/Core/CMakeLists.txt
index fb9394a519eb7..d0a9b202f9c52 100644
--- a/clang/lib/StaticAnalyzer/Core/CMakeLists.txt
+++ b/clang/lib/StaticAnalyzer/Core/CMakeLists.txt
@@ -24,6 +24,7 @@ add_clang_library(clangStaticAnalyzerCore
   CoreEngine.cpp
   DynamicExtent.cpp
   DynamicType.cpp
+  EntryPointStats.cpp
   Environment.cpp
   ExplodedGraph.cpp
   ExprEngine.cpp

diff --git a/clang/lib/StaticAnalyzer/Core/CoreEngine.cpp b/clang/lib/StaticAnalyzer/Core/CoreEngine.cpp
index d96211c3a6635..5c05c9c87f124 100644
--- a/clang/lib/StaticAnalyzer/Core/CoreEngine.cpp
+++ b/clang/lib/StaticAnalyzer/Core/CoreEngine.cpp
@@ -22,12 +22,12 @@
 #include "clang/Basic/LLVM.h"
 #include "clang/StaticAnalyzer/Core/AnalyzerOptions.h"
 #include "clang/StaticAnalyzer/Core/PathSensitive/BlockCounter.h"
+#include "clang/StaticAnalyzer/Core/PathSensitive/EntryPointStats.h"
 #include "clang/StaticAnalyzer/Core/PathSensitive/ExplodedGraph.h"
 #include "clang/StaticAnalyzer/Core/PathSensitive/ExprEngine.h"
 #include "clang/StaticAnalyzer/Core/PathSensitive/FunctionSummary.h"
 #include "clang/StaticAnalyzer/Core/PathSensitive/WorkList.h"
 #include "llvm/ADT/STLExtras.h"
-#include "llvm/ADT/Statistic.h"
 #include "llvm/Support/Casting.h"
 #include "llvm/Support/ErrorHandling.h"
 #include "llvm/Support/FormatVariadic.h"
@@ -43,14 +43,12 @@ using namespace ento;
 
 #define DEBUG_TYPE "CoreEngine"
 
-STATISTIC(NumSteps,
-          "The # of steps executed.");
-STATISTIC(NumSTUSteps, "The # of STU steps executed.");
-STATISTIC(NumCTUSteps, "The # of CTU steps executed.");
-STATISTIC(NumReachedMaxSteps,
-          "The # of times we reached the max number of steps.");
-STATISTIC(NumPathsExplored,
-          "The # of paths explored by the analyzer.");
+STAT_COUNTER(NumSteps, "The # of steps executed.");
+STAT_COUNTER(NumSTUSteps, "The # of STU steps executed.");
+STAT_COUNTER(NumCTUSteps, "The # of CTU steps executed.");
+ALWAYS_ENABLED_STATISTIC(NumReachedMaxSteps,
+                         "The # of times we reached the max number of steps.");
+STAT_COUNTER(NumPathsExplored, "The # of paths explored by the analyzer.");
 
 //===----------------------------------------------------------------------===//
 // Core analysis engine.

diff --git a/clang/lib/StaticAnalyzer/Core/EntryPointStats.cpp b/clang/lib/StaticAnalyzer/Core/EntryPointStats.cpp
new file mode 100644
index 0000000000000..617002cce90eb
--- /dev/null
+++ b/clang/lib/StaticAnalyzer/Core/EntryPointStats.cpp
@@ -0,0 +1,201 @@
+//===- EntryPointStats.cpp --------------------------------------*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#include "clang/StaticAnalyzer/Core/PathSensitive/EntryPointStats.h"
+#include "clang/AST/DeclBase.h"
+#include "clang/Analysis/AnalysisDeclContext.h"
+#include "llvm/ADT/STLExtras.h"
+#include "llvm/ADT/StringExtras.h"
+#include "llvm/ADT/StringRef.h"
+#include "llvm/Support/FileSystem.h"
+#include "llvm/Support/ManagedStatic.h"
+#include "llvm/Support/raw_ostream.h"
+#include <iterator>
+
+using namespace clang;
+using namespace ento;
+
+namespace {
+struct Registry {
+  std::vector<BoolEPStat *> BoolStats;
+  std::vector<CounterEPStat *> CounterStats;
+  std::vector<UnsignedMaxEPStat *> UnsignedMaxStats;
+  std::vector<UnsignedEPStat *> UnsignedStats;
+
+  bool IsLocked = false;
+
+  struct Snapshot {
+    const Decl *EntryPoint;
+    std::vector<bool> BoolStatValues;
+    std::vector<unsigned> UnsignedStatValues;
+
+    void dumpAsCSV(llvm::raw_ostream &OS) const;
+  };
+
+  std::vector<Snapshot> Snapshots;
+};
+} // namespace
+
+static llvm::ManagedStatic<Registry> StatsRegistry;
+
+namespace {
+template <typename Callback> void enumerateStatVectors(const Callback &Fn) {
+  Fn(StatsRegistry->BoolStats);
+  Fn(StatsRegistry->CounterStats);
+  Fn(StatsRegistry->UnsignedMaxStats);
+  Fn(StatsRegistry->UnsignedStats);
+}
+} // namespace
+
+static void checkStatName(const EntryPointStat *M) {
+#ifdef NDEBUG
+  return;
+#endif // NDEBUG
+  constexpr std::array AllowedSpecialChars = {
+      '+', '-', '_', '=', ':', '(', ')', '@', '!', '~',
+      '$', '%', '^', '&', '*', '\'', ';', '<', '>', '/'};
+  for (unsigned char C : M->name()) {
+    if (!std::isalnum(C) && !llvm::is_contained(AllowedSpecialChars, C)) {
+      llvm::errs() << "Stat name \"" << M->name() << "\" contains character '"
+                   << C << "' (" << static_cast<int>(C)
+                   << ") that is not allowed.";
+      assert(false && "The Stat name contains unallowed character");
+    }
+  }
+}
+
+void EntryPointStat::lockRegistry() {
+  auto CmpByNames = [](const EntryPointStat *L, const EntryPointStat *R) {
+    return L->name() < R->name();
+  };
+  enumerateStatVectors(
+      [CmpByNames](auto &Stats) { llvm::sort(Stats, CmpByNames); });
+  enumerateStatVectors(
+      [](const auto &Stats) { llvm::for_each(Stats, checkStatName); });
+  StatsRegistry->IsLocked = true;
+}
+
+static bool isRegistered(llvm::StringLiteral Name) {
+  auto ByName = [Name](const EntryPointStat *M) { return M->name() == Name; };
+  bool Result = false;
+  enumerateStatVectors([ByName, &Result](const auto &Stats) {
+    Result = Result || llvm::any_of(Stats, ByName);
+  });
+  return Result;
+}
+
+BoolEPStat::BoolEPStat(llvm::StringLiteral Name) : EntryPointStat(Name) {
+  assert(!StatsRegistry->IsLocked);
+  assert(!isRegistered(Name));
+  StatsRegistry->BoolStats.push_back(this);
+}
+
+CounterEPStat::CounterEPStat(llvm::StringLiteral Name) : EntryPointStat(Name) {
+  assert(!StatsRegistry->IsLocked);
+  assert(!isRegistered(Name));
+  StatsRegistry->CounterStats.push_back(this);
+}
+
+UnsignedMaxEPStat::UnsignedMaxEPStat(llvm::StringLiteral Name)
+    : EntryPointStat(Name) {
+  assert(!StatsRegistry->IsLocked);
+  assert(!isRegistered(Name));
+  StatsRegistry->UnsignedMaxStats.push_back(this);
+}
+
+UnsignedEPStat::UnsignedEPStat(llvm::StringLiteral Name)
+    : EntryPointStat(Name) {
+  assert(!StatsRegistry->IsLocked);
+  assert(!isRegistered(Name));
+  StatsRegistry->UnsignedStats.push_back(this);
+}
+
+static std::vector<unsigned> consumeUnsignedStats() {
+  std::vector<unsigned> Result;
+  Result.reserve(StatsRegistry->CounterStats.size() +
+                 StatsRegistry->UnsignedMaxStats.size() +
+                 StatsRegistry->UnsignedStats.size());
+  for (auto *M : StatsRegistry->CounterStats) {
+    Result.push_back(M->value());
+    M->reset();
+  }
+  for (auto *M : StatsRegistry->UnsignedMaxStats) {
+    Result.push_back(M->value());
+    M->reset();
+  }
+  for (auto *M : StatsRegistry->UnsignedStats) {
+    Result.push_back(M->value());
+    M->reset();
+  }
+  return Result;
+}
+
+static std::vector<llvm::StringLiteral> getStatNames() {
+  std::vector<llvm::StringLiteral> Ret;
+  auto GetName = [](const EntryPointStat *M) { return M->name(); };
+  enumerateStatVectors([GetName, &Ret](const auto &Stats) {
+    transform(Stats, std::back_inserter(Ret), GetName);
+  });
+  return Ret;
+}
+
+void Registry::Snapshot::dumpAsCSV(llvm::raw_ostream &OS) const {
+  OS << '"';
+  llvm::printEscapedString(
+      clang::AnalysisDeclContext::getFunctionName(EntryPoint), OS);
+  OS << "\", ";
+  auto PrintAsBool = [&OS](bool B) { OS << (B ? "true" : "false"); };
+  llvm::interleaveComma(BoolStatValues, OS, PrintAsBool);
+  OS << ((BoolStatValues.empty() || UnsignedStatValues.empty()) ? "" : ", ");
+  llvm::interleaveComma(UnsignedStatValues, OS);
+}
+
+static std::vector<bool> consumeBoolStats() {
+  std::vector<bool> Result;
+  Result.reserve(StatsRegistry->BoolStats.size());
+  for (auto *M : StatsRegistry->BoolStats) {
+    Result.push_back(M->value());
+    M->reset();
+  }
+  return Result;
+}
+
+void EntryPointStat::takeSnapshot(const Decl *EntryPoint) {
+  auto BoolValues = consumeBoolStats();
+  auto UnsignedValues = consumeUnsignedStats();
+  StatsRegistry->Snapshots.push_back(
+      {EntryPoint, std::move(BoolValues), std::move(UnsignedValues)});
+}
+
+void EntryPointStat::dumpStatsAsCSV(llvm::StringRef FileName) {
+  std::error_code EC;
+  llvm::raw_fd_ostream File(FileName, EC, llvm::sys::fs::OF_Text);
+  if (EC)
+    return;
+  dumpStatsAsCSV(File);
+}
+
+void EntryPointStat::dumpStatsAsCSV(llvm::raw_ostream &OS) {
+  OS << "EntryPoint, ";
+  llvm::interleaveComma(getStatNames(), OS);
+  OS << "\n";
+
+  std::vector<std::string> Rows;
+  Rows.reserve(StatsRegistry->Snapshots.size());
+  for (const auto &Snapshot : StatsRegistry->Snapshots) {
+    std::string Row;
+    llvm::raw_string_ostream RowOs(Row);
+    Snapshot.dumpAsCSV(RowOs);
+    RowOs << "\n";
+    Rows.push_back(RowOs.str());
+  }
+  llvm::sort(Rows);
+  for (const auto &Row : Rows) {
+    OS << Row;
+  }
+}

diff --git a/clang/lib/StaticAnalyzer/Core/ExprEngine.cpp b/clang/lib/StaticAnalyzer/Core/ExprEngine.cpp
index 914eb0f4ef6bd..12a5b248c843f 100644
--- a/clang/lib/StaticAnalyzer/Core/ExprEngine.cpp
+++ b/clang/lib/StaticAnalyzer/Core/ExprEngine.cpp
@@ -49,6 +49,7 @@
 #include "clang/StaticAnalyzer/Core/PathSensitive/ConstraintManager.h"
 #include "clang/StaticAnalyzer/Core/PathSensitive/CoreEngine.h"
 #include "clang/StaticAnalyzer/Core/PathSensitive/DynamicExtent.h"
+#include "clang/StaticAnalyzer/Core/PathSensitive/EntryPointStats.h"
 #include "clang/StaticAnalyzer/Core/PathSensitive/ExplodedGraph.h"
 #include "clang/StaticAnalyzer/Core/PathSensitive/LoopUnrolling.h"
 #include "clang/StaticAnalyzer/Core/PathSensitive/LoopWidening.h"
@@ -67,7 +68,6 @@
 #include "llvm/ADT/ImmutableSet.h"
 #include "llvm/ADT/STLExtras.h"
 #include "llvm/ADT/SmallVector.h"
-#include "llvm/ADT/Statistic.h"
 #include "llvm/Support/Casting.h"
 #include "llvm/Support/Compiler.h"
 #include "llvm/Support/DOTGraphTraits.h"
@@ -90,16 +90,18 @@ using namespace ento;
 
 #define DEBUG_TYPE "ExprEngine"
 
-STATISTIC(NumRemoveDeadBindings,
-          "The # of times RemoveDeadBindings is called");
-STATISTIC(NumMaxBlockCountReached,
-          "The # of aborted paths due to reaching the maximum block count in "
-          "a top level function");
-STATISTIC(NumMaxBlockCountReachedInInlined,
-          "The # of aborted paths due to reaching the maximum block count in "
-          "an inlined function");
-STATISTIC(NumTimesRetriedWithoutInlining,
-          "The # of times we re-evaluated a call without inlining");
+STAT_COUNTER(NumRemoveDeadBindings,
+             "The # of times RemoveDeadBindings is called");
+STAT_COUNTER(
+    NumMaxBlockCountReached,
+    "The # of aborted paths due to reaching the maximum block count in "
+    "a top level function");
+STAT_COUNTER(
+    NumMaxBlockCountReachedInInlined,
+    "The # of aborted paths due to reaching the maximum block count in "
+    "an inlined function");
+STAT_COUNTER(NumTimesRetriedWithoutInlining,
+             "The # of times we re-evaluated a call without inlining");
 
 //===----------------------------------------------------------------------===//
 // Internal program state traits.

diff --git a/clang/lib/StaticAnalyzer/Core/ExprEngineCallAndReturn.cpp b/clang/lib/StaticAnalyzer/Core/ExprEngineCallAndReturn.cpp
index 02facf786830d..1a44ba4f49133 100644
--- a/clang/lib/StaticAnalyzer/Core/ExprEngineCallAndReturn.cpp
+++ b/clang/lib/StaticAnalyzer/Core/ExprEngineCallAndReturn.cpp
@@ -19,9 +19,9 @@
 #include "clang/StaticAnalyzer/Core/CheckerManager.h"
 #include "clang/StaticAnalyzer/Core/PathSensitive/CallEvent.h"
 #include "clang/StaticAnalyzer/Core/PathSensitive/DynamicExtent.h"
+#include "clang/StaticAnalyzer/Core/PathSensitive/EntryPointStats.h"
 #include "clang/StaticAnalyzer/Core/PathSensitive/ExprEngine.h"
 #include "llvm/ADT/SmallSet.h"
-#include "llvm/ADT/Statistic.h"
 #include "llvm/Support/Casting.h"
 #include "llvm/Support/Compiler.h"
 #include "llvm/Support/SaveAndRestore.h"
@@ -32,14 +32,14 @@ using namespace ento;
 
 #define DEBUG_TYPE "ExprEngine"
 
-STATISTIC(NumOfDynamicDispatchPathSplits,
-  "The # of times we split the path due to imprecise dynamic dispatch info");
+STAT_COUNTER(
+    NumOfDynamicDispatchPathSplits,
+    "The # of times we split the path due to imprecise dynamic dispatch info");
 
-STATISTIC(NumInlinedCalls,
-  "The # of times we inlined a call");
+STAT_COUNTER(NumInlinedCalls, "The # of times we inlined a call");
 
-STATISTIC(NumReachedInlineCountMax,
-  "The # of times we reached inline count maximum");
+STAT_COUNTER(NumReachedInlineCountMax,
+             "The # of times we reached inline count maximum");
 
 void ExprEngine::processCallEnter(NodeBuilderContext& BC, CallEnter CE,
                                   ExplodedNode *Pred) {

diff --git a/clang/lib/StaticAnalyzer/Core/WorkList.cpp b/clang/lib/StaticAnalyzer/Core/WorkList.cpp
index 7042a9020837a..9f40926e9a026 100644
--- a/clang/lib/StaticAnalyzer/Core/WorkList.cpp
+++ b/clang/lib/StaticAnalyzer/Core/WorkList.cpp
@@ -11,11 +11,11 @@
 //===----------------------------------------------------------------------===//
 
 #include "clang/StaticAnalyzer/Core/PathSensitive/WorkList.h"
-#include "llvm/ADT/PriorityQueue.h"
-#include "llvm/ADT/DenseSet.h"
+#include "clang/StaticAnalyzer/Core/PathSensitive/EntryPointStats.h"
 #include "llvm/ADT/DenseMap.h"
+#include "llvm/ADT/DenseSet.h"
+#include "llvm/ADT/PriorityQueue.h"
 #include "llvm/ADT/STLExtras.h"
-#include "llvm/ADT/Statistic.h"
 #include <deque>
 #include <vector>
 
@@ -24,8 +24,8 @@ using namespace ento;
 
 #define DEBUG_TYPE "WorkList"
 
-STATISTIC(MaxQueueSize, "Maximum size of the worklist");
-STATISTIC(MaxReachableSize, "Maximum size of auxiliary worklist set");
+STAT_MAX(MaxQueueSize, "Maximum size of the worklist");
+STAT_MAX(MaxReachableSize, "Maximum size of auxiliary worklist set");
 
 //===----------------------------------------------------------------------===//
 // Worklist classes for exploration of reachable states.

diff --git a/clang/lib/StaticAnalyzer/Core/Z3CrosscheckVisitor.cpp b/clang/lib/StaticAnalyzer/Core/Z3CrosscheckVisitor.cpp
index c4dd016f70d86..fca792cdf86f7 100644
--- a/clang/lib/StaticAnalyzer/Core/Z3CrosscheckVisitor.cpp
+++ b/clang/lib/StaticAnalyzer/Core/Z3CrosscheckVisitor.cpp
@@ -14,8 +14,8 @@
 #include "clang/StaticAnalyzer/Core/BugReporter/Z3CrosscheckVisitor.h"
 #include "clang/StaticAnalyzer/Core/AnalyzerOptions.h"
 #include "clang/StaticAnalyzer/Core/BugReporter/BugReporter.h"
+#include "clang/StaticAnalyzer/Core/PathSensitive/EntryPointStats.h"
 #include "clang/StaticAnalyzer/Core/PathSensitive/SMTConv.h"
-#include "llvm/ADT/Statistic.h"
 #include "llvm/Support/SMTAPI.h"
 #include "llvm/Support/Timer.h"
 
@@ -25,20 +25,21 @@
 // Multiple `check()` calls might be called on the same query if previous
 // attempts of the same query resulted in UNSAT for any reason. Each query is
 // only counted once for these statistics, the retries are not accounted for.
-STATISTIC(NumZ3QueriesDone, "Number of Z3 queries done");
-STATISTIC(NumTimesZ3TimedOut, "Number of times Z3 query timed out");
-STATISTIC(NumTimesZ3ExhaustedRLimit,
-          "Number of times Z3 query exhausted the rlimit");
-STATISTIC(NumTimesZ3SpendsTooMuchTimeOnASingleEQClass,
-          "Number of times report equivalenece class was cut because it spent "
-          "too much time in Z3");
-
-STATISTIC(NumTimesZ3QueryAcceptsReport,
-          "Number of Z3 queries accepting a report");
-STATISTIC(NumTimesZ3QueryRejectReport,
-          "Number of Z3 queries rejecting a report");
-STATISTIC(NumTimesZ3QueryRejectEQClass,
-          "Number of times rejecting an report equivalenece class");
+STAT_COUNTER(NumZ3QueriesDone, "Number of Z3 queries done");
+STAT_COUNTER(NumTimesZ3TimedOut, "Number of times Z3 query timed out");
+STAT_COUNTER(NumTimesZ3ExhaustedRLimit,
+             "Number of times Z3 query exhausted the rlimit");
+STAT_COUNTER(
+    NumTimesZ3SpendsTooMuchTimeOnASingleEQClass,
+    "Number of times report equivalenece class was cut because it spent "
+    "too much time in Z3");
+
+STAT_COUNTER(NumTimesZ3QueryAcceptsReport,
+             "Number of Z3 queries accepting a report");
+STAT_COUNTER(NumTimesZ3QueryRejectReport,
+             "Number of Z3 queries rejecting a report");
+STAT_COUNTER(NumTimesZ3QueryRejectEQClass,
+             "Number of times rejecting an report equivalenece class");
 
 using namespace clang;
 using namespace ento;

diff --git a/clang/lib/StaticAnalyzer/Frontend/AnalysisConsumer.cpp b/clang/lib/StaticAnalyzer/Frontend/AnalysisConsumer.cpp
index 8a4bb35925e2c..b4222eddc09f9 100644
--- a/clang/lib/StaticAnalyzer/Frontend/AnalysisConsumer.cpp
+++ b/clang/lib/StaticAnalyzer/Frontend/AnalysisConsumer.cpp
@@ -32,10 +32,10 @@
 #include "clang/StaticAnalyzer/Core/CheckerManager.h"
 #include "clang/StaticAnalyzer/Core/PathDiagnosticConsumers.h"
 #include "clang/StaticAnalyzer/Core/PathSensitive/AnalysisManager.h"
+#include "clang/StaticAnalyzer/Core/PathSensitive/EntryPointStats.h"
 #include "clang/StaticAnalyzer/Core/PathSensitive/ExprEngine.h"
 #include "llvm/ADT/PostOrderIterator.h"
 #include "llvm/ADT/ScopeExit.h"
-#include "llvm/ADT/Statistic.h"
 #include "llvm/Support/FileSystem.h"
 #include "llvm/Support/Path.h"
 #include "llvm/Support/Program.h"
@@ -51,17 +51,18 @@ using namespace ento;
 
 #define DEBUG_TYPE "AnalysisConsumer"
 
-STATISTIC(NumFunctionTopLevel, "The # of functions at top level.");
-STATISTIC(NumFunctionsAnalyzed,
-          "The # of functions and blocks analyzed (as top level "
-          "with inlining turned on).");
-STATISTIC(NumBlocksInAnalyzedFunctions,
-          "The # of basic blocks in the analyzed functions.");
-STATISTIC(NumVisitedBlocksInAnalyzedFunctions,
-          "The # of visited basic blocks in the analyzed functions.");
-STATISTIC(PercentReachableBlocks, "The % of reachable basic blocks.");
-STATISTIC(MaxCFGSize, "The maximum number of basic blocks in a function.");
-
+STAT_COUNTER(NumFunctionTopLevel, "The # of functions at top level.");
+ALWAYS_ENABLED_STATISTIC(NumFunctionsAnalyzed,
+                         "The # of functions and blocks analyzed (as top level "
+                         "with inlining turned on).");
+ALWAYS_ENABLED_STATISTIC(NumBlocksInAnalyzedFunctions,
+                         "The # of basic blocks in the analyzed functions.");
+ALWAYS_ENABLED_STATISTIC(
+    NumVisitedBlocksInAnalyzedFunctions,
+    "The # of visited basic blocks in the analyzed functions.");
+ALWAYS_ENABLED_STATISTIC(PercentReachableBlocks,
+                         "The % of reachable basic blocks.");
+STAT_MAX(MaxCFGSize, "The maximum number of basic blocks in a function.");
 //===----------------------------------------------------------------------===//
 // AnalysisConsumer declaration.
 //===----------------------------------------------------------------------===//
@@ -128,7 +129,9 @@ class AnalysisConsumer : public AnalysisASTConsumer,
         PP(CI.getPreprocessor()), OutDir(outdir), Opts(opts),
         Plugins(plugins), Injector(std::move(injector)), CTU(CI),
         MacroExpansions(CI.getLangOpts()) {
+    EntryPointStat::lockRegistry();
     DigestAnalyzerOptions();
+
     if (Opts.AnalyzerDisplayProgress || Opts.PrintStats ||
         Opts.ShouldSerializeStats) {
       AnalyzerTimers = std::make_unique<llvm::TimerGroup>(
@@ -653,6 +656,10 @@ void AnalysisConsumer::HandleTranslationUnit(ASTContext &C) {
     PercentReachableBlocks =
         (FunctionSummaries.getTotalNumVisitedBasicBlocks() * 100) /
         NumBlocksInAnalyzedFunctions;
+
+  if (!Opts.DumpEntryPointStatsToCSV.empty()) {
+    EntryPointStat::dumpStatsAsCSV(Opts.DumpEntryPointStatsToCSV);
+  }
 }

 AnalysisConsumer::AnalysisMode
@@ -688,6 +695,8 @@ AnalysisConsumer::getModeForDecl(Decl *D, AnalysisMode Mode) {
   return Mode;
 }

+static UnsignedEPStat PathRunningTime("PathRunningTime");
+
 void AnalysisConsumer::HandleCode(Decl *D, AnalysisMode Mode,
                                   ExprEngine::InliningModes IMode,
                                   SetOfConstDecls *VisitedCallees) {
@@ -732,6 +741,7 @@ void AnalysisConsumer::HandleCode(Decl *D, AnalysisMode Mode,

   if ((Mode & AM_Path) && checkerMgr->hasPathSensitiveCheckers()) {
     RunPathSensitiveChecks(D, IMode, VisitedCallees);
+    EntryPointStat::takeSnapshot(D);
     if (IMode != ExprEngine::Inline_Minimal)
       NumFunctionsAnalyzed++;
   }
diff --git a/clang/test/Analysis/analyzer-config.c b/clang/test/Analysis/analyzer-config.c
index 978a7509ee5e3..00177769f3243 100644
--- a/clang/test/Analysis/analyzer-config.c
+++ b/clang/test/Analysis/analyzer-config.c
@@ -79,6 +79,7 @@
 // CHECK-NEXT: debug.AnalysisOrder:RegionChanges = false
 // CHECK-NEXT: display-checker-name = true
 // CHECK-NEXT: display-ctu-progress = false
+// CHECK-NEXT: dump-entry-point-stats-to-csv = ""
 // CHECK-NEXT: eagerly-assume = true
 // CHECK-NEXT: elide-constructors = true
 // CHECK-NEXT: expand-macros = false
diff --git a/clang/test/Analysis/analyzer-stats/entry-point-stats.cpp b/clang/test/Analysis/analyzer-stats/entry-point-stats.cpp
new file mode 100644
index 0000000000000..bddba084ee4bf
--- /dev/null
+++ b/clang/test/Analysis/analyzer-stats/entry-point-stats.cpp
@@ -0,0 +1,96 @@
+// REQUIRES: asserts
+// RUN: %clang_analyze_cc1 -analyzer-checker=core \
+// RUN: -analyzer-config dump-entry-point-stats-to-csv="%t.csv" \
+// RUN: -verify %s
+// RUN: %csv2json "%t.csv" | FileCheck --check-prefix=CHECK %s
+//
+// CHECK: {
+// CHECK-NEXT: "fib(unsigned int)": {
+// CHECK-NEXT: "NumBlocks": "{{[0-9]+}}",
+// CHECK-NEXT: "NumBlocksUnreachable": "{{[0-9]+}}",
+// CHECK-NEXT: "NumCTUSteps": "{{[0-9]+}}",
+// CHECK-NEXT: "NumFunctionTopLevel": "{{[0-9]+}}",
+// CHECK-NEXT: "NumInlinedCalls": "{{[0-9]+}}",
+// CHECK-NEXT: "NumMaxBlockCountReached": "{{[0-9]+}}",
+// CHECK-NEXT: "NumMaxBlockCountReachedInInlined": "{{[0-9]+}}",
+// CHECK-NEXT: "NumOfDynamicDispatchPathSplits": "{{[0-9]+}}",
+// CHECK-NEXT: "NumPathsExplored": "{{[0-9]+}}",
+// CHECK-NEXT: "NumReachedInlineCountMax": "{{[0-9]+}}",
+// CHECK-NEXT: "NumRemoveDeadBindings": "{{[0-9]+}}",
+// CHECK-NEXT: "NumSTUSteps": "{{[0-9]+}}",
+// CHECK-NEXT: "NumSteps": "{{[0-9]+}}",
+// CHECK-NEXT: "NumTimesReportEQClassAborted": "{{[0-9]+}}",
+// CHECK-NEXT: "NumTimesReportEQClassWasExhausted": "{{[0-9]+}}",
+// CHECK-NEXT: "NumTimesReportPassesZ3": "{{[0-9]+}}",
+// CHECK-NEXT: "NumTimesReportRefuted": "{{[0-9]+}}",
+// CHECK-NEXT: "NumTimesRetriedWithoutInlining": "{{[0-9]+}}",
+// CHECK-NEXT: "NumTimesZ3ExhaustedRLimit": "{{[0-9]+}}",
+// CHECK-NEXT: "NumTimesZ3QueryAcceptsReport": "{{[0-9]+}}",
+// CHECK-NEXT: "NumTimesZ3QueryRejectEQClass": "{{[0-9]+}}",
+// CHECK-NEXT: "NumTimesZ3QueryRejectReport": "{{[0-9]+}}",
+// CHECK-NEXT: "NumTimesZ3SpendsTooMuchTimeOnASingleEQClass": "{{[0-9]+}}",
+// CHECK-NEXT: "NumTimesZ3TimedOut": "{{[0-9]+}}",
+// CHECK-NEXT: "NumZ3QueriesDone": "{{[0-9]+}}",
+// CHECK-NEXT: "MaxBugClassSize": "{{[0-9]+}}",
+// CHECK-NEXT: "MaxCFGSize": "{{[0-9]+}}",
+// CHECK-NEXT: "MaxQueueSize": "{{[0-9]+}}",
+// CHECK-NEXT: "MaxReachableSize": "{{[0-9]+}}",
+// CHECK-NEXT: "MaxValidBugClassSize": "{{[0-9]+}}",
+// CHECK-NEXT: "PathRunningTime": "{{[0-9]+}}"
+// CHECK-NEXT: },
+// CHECK-NEXT: "main(int, char **)": {
+// CHECK-NEXT: "NumBlocks": "{{[0-9]+}}",
+// CHECK-NEXT: "NumBlocksUnreachable": "{{[0-9]+}}",
+// CHECK-NEXT: "NumCTUSteps": "{{[0-9]+}}",
+// CHECK-NEXT: "NumFunctionTopLevel": "{{[0-9]+}}",
+// CHECK-NEXT: "NumInlinedCalls": "{{[0-9]+}}",
+// CHECK-NEXT: "NumMaxBlockCountReached": "{{[0-9]+}}",
+// CHECK-NEXT: "NumMaxBlockCountReachedInInlined": "{{[0-9]+}}",
+// CHECK-NEXT: "NumOfDynamicDispatchPathSplits": "{{[0-9]+}}",
+// CHECK-NEXT: "NumPathsExplored": "{{[0-9]+}}",
+// CHECK-NEXT: "NumReachedInlineCountMax": "{{[0-9]+}}",
+// CHECK-NEXT: "NumRemoveDeadBindings": "{{[0-9]+}}",
+// CHECK-NEXT: "NumSTUSteps": "{{[0-9]+}}",
+// CHECK-NEXT: "NumSteps": "{{[0-9]+}}",
+// CHECK-NEXT: "NumTimesReportEQClassAborted": "{{[0-9]+}}",
+// CHECK-NEXT: "NumTimesReportEQClassWasExhausted": "{{[0-9]+}}",
+// CHECK-NEXT: "NumTimesReportPassesZ3": "{{[0-9]+}}",
+// CHECK-NEXT: "NumTimesReportRefuted": "{{[0-9]+}}",
+// CHECK-NEXT: "NumTimesRetriedWithoutInlining": "{{[0-9]+}}",
+// CHECK-NEXT: "NumTimesZ3ExhaustedRLimit": "{{[0-9]+}}",
+// CHECK-NEXT: "NumTimesZ3QueryAcceptsReport": "{{[0-9]+}}",
+// CHECK-NEXT: "NumTimesZ3QueryRejectEQClass": "{{[0-9]+}}",
+// CHECK-NEXT: "NumTimesZ3QueryRejectReport": "{{[0-9]+}}",
+// CHECK-NEXT: "NumTimesZ3SpendsTooMuchTimeOnASingleEQClass": "{{[0-9]+}}",
+// CHECK-NEXT: "NumTimesZ3TimedOut": "{{[0-9]+}}",
+// CHECK-NEXT: "NumZ3QueriesDone": "{{[0-9]+}}",
+// CHECK-NEXT: "MaxBugClassSize": "{{[0-9]+}}",
+// CHECK-NEXT: "MaxCFGSize": "{{[0-9]+}}",
+// CHECK-NEXT: "MaxQueueSize": "{{[0-9]+}}",
+// CHECK-NEXT: "MaxReachableSize": "{{[0-9]+}}",
+// CHECK-NEXT: "MaxValidBugClassSize": "{{[0-9]+}}",
+// CHECK-NEXT: "PathRunningTime": "{{[0-9]+}}"
+// CHECK-NEXT: }
+// CHECK-NEXT: }
+// CHECK-NOT: non_entry_point
+
+// expected-no-diagnostics
+int non_entry_point(int end) {
+  int sum = 0;
+  for (int i = 0; i <= end; ++i) {
+    sum += i;
+  }
+  return sum;
+}
+
+int fib(unsigned n) {
+  if (n <= 1) {
+    return 1;
+  }
+  return fib(n - 1) + fib(n - 2);
+}
+
+int main(int argc, char **argv) {
+  int i = non_entry_point(argc);
+  return i;
+}
diff --git a/clang/test/Analysis/csv2json.py b/clang/test/Analysis/csv2json.py
new file mode 100644
index 0000000000000..3c20d689243e7
--- /dev/null
+++ b/clang/test/Analysis/csv2json.py
@@ -0,0 +1,102 @@
+#!/usr/bin/env python
+#
+# ===- csv2json.py - Static Analyzer test helper ---*- python -*-===#
+#
+# Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+# See https://llvm.org/LICENSE.txt for license information.
+# SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+#
+# ===------------------------------------------------------------------------===#
+
+r"""
+Clang Static Analyzer test helper
+=================================
+
+This script converts a CSV file to a JSON file with a specific structure.
+
+The JSON file contains a single dictionary. The keys of this dictionary
+are taken from the first column of the CSV. The values are dictionaries
+themselves, mapping the CSV header names (except the first column) to
+the corresponding row values.
+
+
+Usage:
+  csv2json.py <source-file>
+
+Example:
+  // RUN: %csv2json.py %t | FileCheck %s
+"""
+
+import csv
+import sys
+import json
+
+
+def csv_to_json_dict(csv_filepath):
+    """
+    Args:
+        csv_filepath: The path to the input CSV file.
+
+    Raises:
+        FileNotFoundError: If the CSV file does not exist.
+        csv.Error: If there is an error parsing the CSV file.
+        Exception: For any other unexpected errors.
+    """
+    try:
+        with open(csv_filepath, "r", encoding="utf-8") as csvfile:
+            reader = csv.reader(csvfile)
+
+            # Read the header row (column names)
+            try:
+                header = next(reader)
+            except StopIteration:  # Handle empty CSV file
+                json.dumps({}, indent=2)  # write an empty dict
+                return
+
+            # handle a csv file that contains no rows, not even a header row.
+            if not header:
+                json.dumps({}, indent=2)
+                return
+
+            other_column_names = [name.strip() for name in header[1:]]
+
+            data_dict = {}
+
+            for row in reader:
+                if len(row) != len(header):
+                    raise csv.Error("Inconsistent CSV file")
+                    exit(1)
+
+                key = row[0]
+                value_map = {}
+
+                for i, col_name in enumerate(other_column_names):
+                    # +1 to skip the first column
+                    value_map[col_name] = row[i + 1].strip()
+
+                data_dict[key] = value_map
+
+            return json.dumps(data_dict, indent=2)
+
+    except FileNotFoundError:
+        raise FileNotFoundError(f"Error: CSV file not found at {csv_filepath}")
+    except csv.Error as e:
+        raise csv.Error(f"Error parsing CSV file: {e}")
+    except Exception as e:
+        raise Exception(f"An unexpected error occurred: {e}")
+
+
+def main():
+    """Example usage with error handling."""
+    csv_file = sys.argv[1]
+
+    try:
+        print(csv_to_json_dict(csv_file))
+    except (FileNotFoundError, csv.Error, Exception) as e:
+        print(str(e))
+    except:
+        print("An error occured")
+
+
+if __name__ == "__main__":
+    main()
diff --git a/clang/test/lit.cfg.py b/clang/test/lit.cfg.py
index 9820ddd1f14af..025ef7a9133ea 100644
--- a/clang/test/lit.cfg.py
+++ b/clang/test/lit.cfg.py
@@ -186,6 +186,14 @@ def have_host_clang_repl_cuda():
         )
     )

+    csv2json_path = os.path.join(config.test_source_root, "Analysis", "csv2json.py")
+    config.substitutions.append(
+        (
+            "%csv2json",
+            '"%s" %s' % (config.python_executable, csv2json_path),
+        )
+    )
+
 llvm_config.add_tool_substitutions(tools, tool_dirs)

 config.substitutions.append(
_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits