Hi Lei,

Just to follow up, I know this conversation has been a bit roundabout.  I'm 
trying to get a clear design for these kind of checkers, and I'm gradually 
understanding the requirements here.

I'm a bit nervous about caching data in the checker itself, since we've 
recently been changing the lifetime of a Checker object.  Previously the 
lifetime of a Checker object was bound to the lifetime of an ExprEngine, but 
now Checkers are managed by CheckerManager, and can conceptually be used by 
multiple ExprEngines.

Keeping data in LocationContexts is a both hit and miss because they were 
intended to model context-sensitivity.  I can completely buy the argument that 
checkers may wish to record information in LocationContexts that are, well, 
tied to a given location context.  Your statistical checker, however, will be 
making assumptions about the LocationContext that doesn't seem true in all 
cases.

I'm also a bit skeptical that all statistical checks would run in a strict 
"textual" (or one-function-at-a-time, no inling) versus the other extreme of 
whole program IPA.  It seems like different statistical checkers will want the 
freedom to collect, and possibly merge, different kinds of statistics, possibly 
from different LocationContexts.  That said, how statistics are collected is 
strictly a policy decision of the individual checker, but I'd be much happier 
that a statistical checker was fine with working in IPA mode as well; it just 
was smart about what statistics to collect.

In a nutshell, I see three kinds of data the checker may wish to collect:

1) "global" data, that spans the entirety of all code analyzed by a Checker 
instance

2) "location context" specific data, that spans the analysis of a given *call* 
to a function

3) per-path/per-state data, which is stored in the GDM of a GRState

I think your statistical checker will likely one to use (1) or (2).  You 
outlined the restriction of using (2) and assuming that a single 
LocationContext would be made for each analyzed function.  I think that's a 
fairly brittle restriction that's likely to bite us later.  It seems to me that 
if a Checker wants to keep statistics for just a given function (regardless of 
LocationContexts), then that data isn't context-sensitive, and should be stored 
in the Checker instance.  Essentially, the Checker object would have a DenseMap 
mapping from FunctionDecls to a statistics map.  If a Checker saw that it 
already had statistics for a given FunctionDecl, it could either collect/merge 
additional statistics, or decide not to collect any additional statistics.  
This would allow a statistical checker to work fine in the case of multiple 
LocationContexts per FunctionDecl.

What do you think?

Ted

On Apr 13, 2011, at 7:39 PM, Ted Kremenek wrote:

> Hi Lei,
> 
> I think there are multiple possible approaches here.
> 
> First, I think it's reasonable for us to add a "Generic Data Map" mechanism, 
> similar to what we have in GRState, for keeping checker-specific data in 
> LocationContexts.  It needs to be a general mechanism, however, so that the 
> LocationContext API doesn't have anything relating to a specific checker; 
> just a generic way to hold data.  That data also needs to be deallocated once 
> the LocationContext object is freed.
> 
> That said, a simpler mechanism may just be to keep a DenseMap in the 
> statistical checker from LocationContexts -> statistics, and then notify a 
> checker when the LocationContext object gets destroyed.  Actually, we can 
> possibly implement the Generic Data Map mechanism in this way.
> 
> Concerning "inlining", what you call "textual" analysis versus WPA mode 
> reduces down to us having a "policy" object that dictates when we inline and 
> when we don't.  With a policy object, we could change when and how IPA is 
> done, and reuse the same analysis engine.  Different checkers could then 
> articulate that they require a specific IPA policy in place (in the case of 
> this checker, no IPA).  This idea is only half-formed, but I think there are 
> multiple places in the analyzer engine where various "policies" could be used 
> to help guide how analysis is done (e.g., how are loops handled, etc.).
> 
> If your checker truly won't work in IPA, then it is probably okay to just use 
> a DenseMap in the Checker itself, and just invalidate that DenseMap when we 
> start analyzing a function.  That seems like the least complicated solution, 
> although it prohibits a single instance of the Checker from being used 
> simultaneously by multiple ExprEngines.  After thinking about this some, I 
> think this is a reasonable approach, just as long as you understand the 
> restrictions here and that the statistics really will only work when 
> analyzing a single function at a time.
> 
> On Apr 11, 2011, at 11:10 PM, 章磊 wrote:
> 
>> Hi Ted, 
>>  
>> I think statistical checker is some different from the checkers we have. 
>> 
>> IMO, when doing statistical check, the static analyzer should not work in 
>> the "whole-program-analysis" mode, but in the "textual" statistic mode. 
>> 
>> In this mode, we treat every top level functiondecl (also the inline 
>> function) has a body as a cell. Then we do symbolic execution in every cell 
>> to check CheckedReturn state for calls (i.e, a call to foo()), and textually 
>> count a function decl's (i.e, foo()) CheckedReturn state from all the 
>> occurrences (calls). 
>> 
>> So we need a new action to activate this "textual" mode, and in this mode 
>> the inline IPA should be deactivated (i think the function summary IPA also 
>> works). Then every cell will have only one LocationContext, right? 
>>  
>> Here is a example to make it clearer. 
>>  
>> int f1(); 
>>  
>> int main() { 
>>   f2(); 
>>   f3(); 
>> } 
>>  
>> void f2() { 
>>   f1(); 
>> } 
>>  
>> void f3() { 
>>   f2(); 
>>   if (f1()) 
>>     return; 
>> } 
>>  
>> There are two "textual" calls to f1(), the one in f3() is checked and the 
>> one in f2() is unchecked. But if we do the check in WPA Mode(IMO, as viewed 
>> from "execute"), there are 3 calls to f1(), one of them is checked and the 
>> other two are not. 
>>  
>> We should do it in the "textual" way, right?
>> 
>> What you say about this?
>> 
>> 2011/4/12 Ted Kremenek <[email protected]>
>> Lei,
>> 
>> Just to make this a little clearer, suppose we had:
>> 
>> unsigned factorial(unsigned n) {
>>  if (n == 0)
>>   return 1;
>>  return n * factorial(n - 1);
>> }
>> 
>> Currently when we analyze this function, we create only one LocationContext. 
>>  However, one way to do IPA is to due inlining of function calls.  In this 
>> case, we might create a second LocationContext during the first recursive 
>> call to factorial.  If we wanted to further inline, we might create a second 
>> LocationContext, etc., each one capturing the "context" of the recursive 
>> call and simulating an abstract stack frame (obviously bounding the amount 
>> of LocationContexts we create).
>> 
>> On Apr 11, 2011, at 10:53 AM, Ted Kremenek wrote:
>> 
>> > I thought about this some more, and using the LocationContext might be a 
>> > reasonable place to put checker-specific data that represents "global" 
>> > information (i.e not specific to a GRState) but also limited in scope 
>> > (i.e. the data is limited to the scope of analyzing a given *call* to a 
>> > function).  Once we support analysis inlining, it will be possible for 
>> > multiple LocationContext objects to be around at the once for the same 
>> > function.  Are the statistics you are interested in specific to a given 
>> > LocationContext?
>> >
>> > On Apr 11, 2011, at 10:19 AM, Ted Kremenek <[email protected]> wrote:
>> >
>> >> Hi Lei,
>> >>
>> >> LocationContext should not contain any checker-specific data.  It is only 
>> >> intended to model context-sensitivity (i.e., it simulates an abstract 
>> >> stack frame, or "location" in an abstract call chain).
>> >>
>> >> What are you trying to do?
>> >>
>> >> Ted
>> >>
>> >> On Apr 10, 2011, at 8:26 PM, 章磊 wrote:
>> >>
>> >>> Hi Clang,
>> >>>
>> >>> This patch add a DenseMap to record the CheckedReturn count for a 
>> >>> functiondecl(fielddecl, vardecl as function pointer) in LocationContext.
>> >>>
>> >>> ImmutableMap seems not alright here, is DenseMap ok?
>> >>>
>> >>> And how to make it checker-specific?
>> >>>
>> >>> This patch is preparation for statistical UncheckedRenturn checker.
>> >>> I'll appreciate it if there are any advice about this patch.
>> >>>
>> >>> --
>> >>> Best regards!
>> >>>
>> >>> Lei Zhang
>> >>> <CRResultMap.patch>_______________________________________________
>> >>> cfe-commits mailing list
>> >>> [email protected]
>> >>> http://lists.cs.uiuc.edu/mailman/listinfo/cfe-commits
>> >>
>> >>
>> >> _______________________________________________
>> >> cfe-commits mailing list
>> >> [email protected]
>> >> http://lists.cs.uiuc.edu/mailman/listinfo/cfe-commits
>> 
>> 
>> 
>> 
>> -- 
>> Best regards!
>> 
>> Lei Zhang
> 

_______________________________________________
cfe-commits mailing list
[email protected]
http://lists.cs.uiuc.edu/mailman/listinfo/cfe-commits

Reply via email to