whisperity added a comment. In general, perhaps you should go over the rendered text once again and make use of //emph// and **bold** some more. Explanation looks okay from my part, although I'm really not knowledgeable about CSA internals. But there are plenty of "typesetting" issues.
Do you have to stick to 80-col in the documentation file? It is customary (at least in LaTeX) that prose is organised in a "one line of code per sentence" fashion. This also makes handling the diff easier... In the latter part towards the discussion about how iterators are targeted, the "80-col" is broken anyways. ================ Comment at: clang/docs/analyzer/developer-docs.rst:14 developer-docs/RegionStore - \ No newline at end of file + developer-docs/ContainerModeling.rst + ---------------- How come this new addition is a `.rst` but the rest isn't. Does the rendering work without it? We should keep a style here, either way. So turn all into proper `.rst` links, or drop the `.rst` here. ================ Comment at: clang/docs/analyzer/developer-docs/ContainerModeling.rst:5 + +The goal of checker ``alpha.cplusplus.ContainerModeling`` is to provide a +symbolic abstraction for containers to the Clang Static Analyzer. There are ---------------- Can the checker's name be turned into a link, or am I mixing up individual checker documentations' availability with Tidy? Either way, show the check's name in bold face. ================ Comment at: clang/docs/analyzer/developer-docs/ContainerModeling.rst:6 +The goal of checker ``alpha.cplusplus.ContainerModeling`` is to provide a +symbolic abstraction for containers to the Clang Static Analyzer. There are +various concepts regarding containers that help formulate static analysis ---------------- ================ Comment at: clang/docs/analyzer/developer-docs/ContainerModeling.rst:8 +various concepts regarding containers that help formulate static analysis +problems more concisely. The size of the container, whether is empty or not are +the most trivial motivating examples. Standard containers can be iterated, and ---------------- ================ Comment at: clang/docs/analyzer/developer-docs/ContainerModeling.rst:11 +this idiom is well-adopted in case of non-standard container implementations as +well, because it can be used to provide a compatible interface to algorithms. +Therefore iterator modeling is closely related to containers. Iterators extend ---------------- what does `it` refer to here? ================ Comment at: clang/docs/analyzer/developer-docs/ContainerModeling.rst:21 +the modeling checkers mentioned: + * :ref:`alpha-cplusplus-InvalidatedIterator` + * :ref:`alpha-cplusplus-IteratorRange` ---------------- Will this appear to the reader as `alpha-` or `alpha.`? ================ Comment at: clang/docs/analyzer/developer-docs/ContainerModeling.rst:29 + +According to ContainerModeling, a value ``c`` with type ``C`` is considered a +container if either of the following holds: ---------------- ================ Comment at: clang/docs/analyzer/developer-docs/ContainerModeling.rst:35 + This should be detected by type C having member functions ``T C::begin()`` + and ``T C::end()``, where T is an `iterator + <https://en.cppreference.com/w/cpp/iterator#Iterator_categories>`_. ---------------- ================ Comment at: clang/docs/analyzer/developer-docs/ContainerModeling.rst:35-36 + This should be detected by type C having member functions ``T C::begin()`` + and ``T C::end()``, where T is an `iterator + <https://en.cppreference.com/w/cpp/iterator#Iterator_categories>`_. + * The expression ``begin(c)`` and ``end(c)`` are both valid expressions in a ---------------- whisperity wrote: > Iterator was linked already, I don't think this needs to repeat. ================ Comment at: clang/docs/analyzer/developer-docs/ContainerModeling.rst:37-47 + * The expression ``begin(c)`` and ``end(c)`` are both valid expressions in a + given scope, and return an `iterator + <https://en.cppreference.com/w/cpp/iterator#Iterator_categories>`_. + This should be detected by checking the existence functions with the + corresponding names, and can either be user-defined free functions or + template specialization of the standard-defined ``template<typename T> + constexpr auto std::begin(T& t) -> decltype(t.begin())`` and ---------------- What about [[ https://en.cppreference.com/w/cpp/algorithm/ranges/for_each | neibloids (see right before Parameters) ]] (callables that are explicitly disabling ADL at a call site even though they look like an ADL call, with tricks, usually found in the `ranges` library). ================ Comment at: clang/docs/analyzer/developer-docs/ContainerModeling.rst:40 + <https://en.cppreference.com/w/cpp/iterator#Iterator_categories>`_. + This should be detected by checking the existence functions with the + corresponding names, and can either be user-defined free functions or ---------------- Existence function? ================ Comment at: clang/docs/analyzer/developer-docs/ContainerModeling.rst:51-55 + - std::array + - std::vector + - std::deque + - std::list + - std::forward_list ---------------- Format these as `monospace` too. ================ Comment at: clang/docs/analyzer/developer-docs/ContainerModeling.rst:57 + +Example of a custom container with member functions. + ---------------- ================ Comment at: clang/docs/analyzer/developer-docs/ContainerModeling.rst:71 + +Example of a custom container with free functions. + ---------------- ================ Comment at: clang/docs/analyzer/developer-docs/ContainerModeling.rst:87 + +Example of a custom container with std template specialization. + ---------------- ================ Comment at: clang/docs/analyzer/developer-docs/ContainerModeling.rst:106 +A container is modeled if it has an associated ``MemRegion``, and this ``MemRegion``, +or rather the ``const MemRegion*`` (and pointers to its subclasses), that is accessible +by the ``MemRegionManager`` is what uniquely identifies a container. Temporary ---------------- (While I don't at all agree with the whole //put the `*` to the variable name, not the type name// yadda, Clang prints type names with a ` ` before the `*`, so let's keep it consistent...ly bad.) ================ Comment at: clang/docs/analyzer/developer-docs/ContainerModeling.rst:110 + +A container is tracked from the ``ProgramPoint``, where either ``begin`` or ``end`` +member function (or free function) is called. Abstract modeling uses ``SymbolRef``-s for the ---------------- ================ Comment at: clang/docs/analyzer/developer-docs/ContainerModeling.rst:114 +in form of assumptions (inside ``ConstraintManager``). +For specifying positions inside the container we use one of the following expressions + - ``<begin-symbol> + <concrete-value>`` for specifying a position relative to the beginning of the container. ---------------- ================ Comment at: clang/docs/analyzer/developer-docs/ContainerModeling.rst:122 + +Containers are modeled in the GDM by their region (MemRegion*) as their associated key, +this region is immutable, it cannot change during the lifetime of the modeled object. ---------------- ================ Comment at: clang/docs/analyzer/developer-docs/ContainerModeling.rst:123 +Containers are modeled in the GDM by their region (MemRegion*) as their associated key, +this region is immutable, it cannot change during the lifetime of the modeled object. +The begin and end symbols are conjured and are completely unrelated to the region of ---------------- ================ Comment at: clang/docs/analyzer/developer-docs/ContainerModeling.rst:126 +the container. For each region we store the only the begin and end symbols, other properties +are to be computed from these, and their relationships stored in the ContstraintManager. + ---------------- ================ Comment at: clang/docs/analyzer/developer-docs/ContainerModeling.rst:147 +container modeling. The problem of RValue/LValue (more precisely prvalue, xvalue, and +lvalue see `value categories <https://en.cppreference.com/w/cpp/language/value_category>`_) +modeling is not prominent is case containers alone, as no temporary objects are considered. ---------------- ================ Comment at: clang/docs/analyzer/developer-docs/ContainerModeling.rst:148 +lvalue see `value categories <https://en.cppreference.com/w/cpp/language/value_category>`_) +modeling is not prominent is case containers alone, as no temporary objects are considered. +However, this is an issue to be solved when it comes to modeling iterators. ---------------- I think something is missing for here, I am incapable of decyphering this sentence. ================ Comment at: clang/docs/analyzer/developer-docs/ContainerModeling.rst:154-155 + modeling of copy constructors are needed. + No constructors of containers are modeled, there is a WIP + `patch <https://reviews.llvm.org/D87388>`_ for default constructor. + ---------------- ================ Comment at: clang/docs/analyzer/developer-docs/ContainerModeling.rst:157 + +There is a limitation in the size of Symbols handled by ``the ConstraintManager``, namely that +every offset is assumed to be at most ``typesize/4`` in size, otherwise the ``ConstraintManager`` ---------------- ================ Comment at: clang/docs/analyzer/developer-docs/ContainerModeling.rst:158 +There is a limitation in the size of Symbols handled by ``the ConstraintManager``, namely that +every offset is assumed to be at most ``typesize/4`` in size, otherwise the ``ConstraintManager`` +could not reorder expressions containing the the symbol. As an orthogonal issue symbol-symbol ---------------- Is `typesize` a defined thing in the constraint manager? Otherwise, wouldn't `sizeof(T) / 4` be better readable? ================ Comment at: clang/docs/analyzer/developer-docs/ContainerModeling.rst:159 +every offset is assumed to be at most ``typesize/4`` in size, otherwise the ``ConstraintManager`` +could not reorder expressions containing the the symbol. As an orthogonal issue symbol-symbol +comparisons still cannot be handled properly if the ``ContstraintManager`` would also be able ---------------- ================ Comment at: clang/docs/analyzer/developer-docs/ContainerModeling.rst:161-162 +comparisons still cannot be handled properly if the ``ContstraintManager`` would also be able +answer questions like: is symbol A less than symbol B (instead of just reporting the possible +range of the values a symbol can have). + ---------------- ================ Comment at: clang/docs/analyzer/developer-docs/ContainerModeling.rst:165-166 +.. note:: + There is a WIP extension: if range of 2 symbols is disjunct and the max of first is less than + the min of the second, report less relation. `patch <https://reviews.llvm.org/D77792>`_ + This patch would be needed to compare the sizes of containers. ---------------- ================ Comment at: clang/docs/analyzer/developer-docs/ContainerModeling.rst:169 + If the containers don't overlap in memory, then this would provide a way to determine the size + differences. E.g.: If we could store ``a = b + 5`` even if the ranges of a and b is unknown, + reordering of this would produce: ``a - b = 5`` and this can have a range attached in the ---------------- ================ Comment at: clang/docs/analyzer/developer-docs/ContainerModeling.rst:180 + +A value with type T is possibly considered an iterator: +If T is ---------------- (same with T -> `T` below consistently) ================ Comment at: clang/docs/analyzer/developer-docs/ContainerModeling.rst:185 + - destructible + - can be incremented (both post and prefix unary plus-plus operator are defined) +AND ---------------- ================ Comment at: clang/docs/analyzer/developer-docs/ContainerModeling.rst:204 +The tracking of an iterator begins if a value is detected with the preceding properties *and* its name +has 'iterator'/'iter'/'it' postfix. In special cases pointers are also treated as iterators, namely, +if they are results of ``begin`` or ``end`` member functions or free functions. ---------------- What does the `'` do here? ================ Comment at: clang/docs/analyzer/developer-docs/ContainerModeling.rst:210 + - only track an iterator if its generating expression is a function call which has at least 1 argument, that is an already tracked iterator (and the first iterator parameter's container will be the parent container of the returned iterator) +If either of these heuristics matches the tracking of iterator should be skipped. + ---------------- Should be? Is. ================ Comment at: clang/docs/analyzer/developer-docs/ContainerModeling.rst:213 +Iterators are modeled in the GDM with 2 kinds of keys: + - Region (``const MemRegion*``) + - Symbol (``SymbolRef``) ---------------- ================ Comment at: clang/docs/analyzer/developer-docs/ContainerModeling.rst:224 + + - a single conjured symbol (SymbolVal) + - a conjured symbol (SymbolVal) + a number (``ConcreteInt``) (This for is useful for reordering) ---------------- ================ Comment at: clang/docs/analyzer/developer-docs/ContainerModeling.rst:227 + +Functions like find (when alpha.cplusplus.STLAlgorithmModeling is enabled) handle cases where an +element is found, and a case where it is not ---------------- ================ Comment at: clang/docs/analyzer/developer-docs/ContainerModeling.rst:236 + +Example of a pointer iterator implementation (conceptionally no difference between inline and non-inline modes) + ---------------- ================ Comment at: clang/docs/analyzer/developer-docs/ContainerModeling.rst:261-282 + Cont c; // no modeling should be done here + + int* it = c.begin(); // container-modeling begins by tracking c as a container of type Cont + // begin() member function call triggers the modeling + // iterator-modeling also begins by tracking the value of c.begin() + // as an iterator + // we check if the value has the necessary iterator-properties ---------------- Perhaps it would be better if indivudal comments for each line would simply be on the line before the code? Something like: ``` // No modelling needed: Container C; // Bla bla foo() ``` ================ Comment at: clang/docs/analyzer/developer-docs/ContainerModeling.rst:263 + + int* it = c.begin(); // container-modeling begins by tracking c as a container of type Cont + // begin() member function call triggers the modeling ---------------- Yeaaaaaaah... ================ Comment at: clang/docs/analyzer/developer-docs/ContainerModeling.rst:290 + +Contrary to the container-modeling, not only lvalue iterators are tracked. This is the reason +why 2 different keys are used in the GDM for iterators. An lvalue iterator has a Region ---------------- ================ Comment at: clang/docs/analyzer/developer-docs/ContainerModeling.rst:290-293 +Contrary to the container-modeling, not only lvalue iterators are tracked. This is the reason +why 2 different keys are used in the GDM for iterators. An lvalue iterator has a Region +(``const MemRegion*``) and it is used if available. If no Region is found for an iterator value +then a Symbol is used (``SymbolRef``). ---------------- whisperity wrote: > Previously, `lvalue` was written as `LValue`! ================ Comment at: clang/docs/analyzer/developer-docs/ContainerModeling.rst:292 +why 2 different keys are used in the GDM for iterators. An lvalue iterator has a Region +(``const MemRegion*``) and it is used if available. If no Region is found for an iterator value +then a Symbol is used (``SymbolRef``). ---------------- ================ Comment at: clang/docs/analyzer/developer-docs/ContainerModeling.rst:295 + +In the case of pointer iterators (where std::is_pointer<T>::value is true for the type T of the iterator), +the modeling of the symbolic value is simpler. The lifetime of such values is simple to model, ---------------- ================ Comment at: clang/docs/analyzer/developer-docs/ContainerModeling.rst:299 + +Example of a pointer iterator. + ---------------- ================ Comment at: clang/docs/analyzer/developer-docs/ContainerModeling.rst:311 +Iterators implemented as pointer live generally in the SymbolMap (the map containing ``SymbolRef``-s as +opposed to the map containing the ``const MemRegion*``-s), and can not be only represented with LValues (and +consequently inside the RegionMap), as tracking the value of symbolic offset of an iterator must handle ---------------- ================ Comment at: clang/docs/analyzer/developer-docs/ContainerModeling.rst:312 +opposed to the map containing the ``const MemRegion*``-s), and can not be only represented with LValues (and +consequently inside the RegionMap), as tracking the value of symbolic offset of an iterator must handle +expressions where there may only be temporaries with no LValues associated with them. ---------------- ================ Comment at: clang/docs/analyzer/developer-docs/ContainerModeling.rst:315 + +We cannot consistently track only LValues (``MemRegionVal``-s) or only ``RValues`` (``SymbolVal`` or ``LazyCompoundVal``), +because pointer iterators have only Rvalues that always identifies them (across multiple subexpressions), ---------------- LValues, `RValues`, incosistency ================ Comment at: clang/docs/analyzer/developer-docs/ContainerModeling.rst:317 +because pointer iterators have only Rvalues that always identifies them (across multiple subexpressions), +class instance variables only have Lvalues for this role. SymbolMap always has iterator values that are RValues. +RegionMap can have iterator values which are LVals but also values which are RValues. ---------------- ================ Comment at: clang/docs/analyzer/developer-docs/ContainerModeling.rst:320 + +In case of class instance implemented iterators, the operations are ``CXXOperatorCallExpr``-s (not built-in +operators). Also sometimes RValues of such instances are modeled as ``LazyCompoundVal`` ``SVal``-s, but can ---------------- Could we turn this on its head and simply call these "user-defined iterator types" instead of this, while technically correct, rather wonky "class instance" kind of stuff? ================ Comment at: clang/docs/analyzer/developer-docs/ContainerModeling.rst:324-325 + +The modeling of special container-related member functions can be found in ``Iterator.cpp``, and +algorithm modeling in ``STLAlgorithmModeling.cpp``. + ---------------- Which directory? Previously the algorithm checker was referenced with its logical name `alpha.cplusplus.SomethingSomething`. ================ Comment at: clang/docs/analyzer/developer-docs/ContainerModeling.rst:328-329 +The semantic difference between the 2 iterator implementation with respect to their ``SVals`` is +that accessing a pointers ``SVal`` always return reference to a Region (no way to be a symbol, SymRegion), +but in case if a class instance iterator can be a symbol (SymExpr). + ---------------- ================ Comment at: clang/docs/analyzer/developer-docs/ContainerModeling.rst:331 + +Example of a class instance iterator. + ---------------- ================ Comment at: clang/docs/analyzer/developer-docs/ContainerModeling.rst:335 + + class cont_it {...}; + cont_it it = cont.begin(); ---------------- ================ Comment at: clang/docs/analyzer/developer-docs/ContainerModeling.rst:340 + +Example of a container which has iterators as elements. + ---------------- ================ Comment at: clang/docs/analyzer/developer-docs/ContainerModeling.rst:360 + + **Option 1**: use only RValues (``SymbolRef``-s) as keys + Class instance iterators (possibly with ``LazyCompoundVal``) can not be used as keys as they are now, because ---------------- I think LLVM's RST rendering rules have a `.. option::` element, which renders it in a nice style. But this might also be Tidy-specific... ================ Comment at: clang/docs/analyzer/developer-docs/ContainerModeling.rst:363 + they do not satisfy map key criteria (this could be maybe solved by defining an ordering on them), but the + main issue is that even though they wrap a SymbolVal, this wrapping is an implementation detail and should not + be relied upon, also meaning they should not be unwrapped. ---------------- ================ Comment at: clang/docs/analyzer/developer-docs/ContainerModeling.rst:367 + + **Option 2**: use only LValues (``const MemRegion*``) as keys + Class instance iterators that evaluate as a result of multiple subexpressions have RValues and these immediate ---------------- ================ Comment at: clang/docs/analyzer/developer-docs/ContainerModeling.rst:369 + Class instance iterators that evaluate as a result of multiple subexpressions have RValues and these immediate + RValues break a cheain of value propagation. lazyCompoundVal should not be used as keys in a map (every + operation results in a temporary which can be tracked). ---------------- ================ Comment at: clang/docs/analyzer/developer-docs/ContainerModeling.rst:369 + Class instance iterators that evaluate as a result of multiple subexpressions have RValues and these immediate + RValues break a cheain of value propagation. lazyCompoundVal should not be used as keys in a map (every + operation results in a temporary which can be tracked). ---------------- whisperity wrote: > typo ================ Comment at: clang/docs/analyzer/developer-docs/ContainerModeling.rst:372 + Pointer iterators that evaluate as a result of multiple subexpressions have RValues and these immediate RValues break + the chain of value propagation. lazyCompoundVal should not be used as keys in a map (every operation results in a + temporary which can be tracked). no temporaries are created during the evaluation of expressions (i + 1 + 2) there ---------------- ================ Comment at: clang/docs/analyzer/developer-docs/ContainerModeling.rst:373-374 + the chain of value propagation. lazyCompoundVal should not be used as keys in a map (every operation results in a + temporary which can be tracked). no temporaries are created during the evaluation of expressions (i + 1 + 2) there + is no intermediate lvalue for i + 1. + ---------------- Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D91948/new/ https://reviews.llvm.org/D91948 _______________________________________________ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits