babati added a comment.

> please consider stripping comments from the line and normalizing whitespace
> before hashing and calculating the column position.

> This will make the hash more robust, as re-indenting a line or adding
> comments to it will not spoil the hashes.


I changed the patch so that the normalized line is used instead of the raw 
content of the line: whitespace and comments are removed from the line before 
hashing.
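
To illustrate the idea (this is not the code from the patch), here is a minimal
Python sketch. The names normalize_line and line_hash are hypothetical, MD5 is
used only as an example digest, and the comment stripping is deliberately
naive: it does not handle comment markers inside string literals or block
comments that span several lines.

```python
import hashlib
import re


def normalize_line(line: str) -> str:
    """Return the line without comments and without any whitespace.

    Hypothetical helper, a rough approximation only.
    """
    # Drop /* ... */ comments that open and close on this line.
    line = re.sub(r"/\*.*?\*/", "", line)
    # Drop a trailing // comment.
    line = re.sub(r"//.*$", "", line)
    # Remove all remaining whitespace, so re-indentation has no effect.
    return re.sub(r"\s+", "", line)


def line_hash(line: str) -> str:
    """Hash the normalized line (MD5 chosen only for illustration)."""
    return hashlib.md5(normalize_line(line).encode("utf-8")).hexdigest()
```

With this normalization, "    int x = 0; // init" and "int x=0;" hash to the
same value, so re-indenting a line or adding a comment to it does not change
the hash.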

> By redundant, I mean that this information is already encoded in the report; 
> even if it's not part of the issue id. I can see this argument go either way. 
> However, if we do decide to include the filename, we would need to change 
> clang/utils/analyzer/CmpRuns.py and the current issue_hash so that it's all 
> consistent.


The hash should be a unique identifier of a concrete defect. If one hash 
identifies multiple defects in different files at the same time, that must be 
considered a fault (from the user's perspective). 
If the user suppresses one defect, they may discover later that the 
suppression also made two or more other bugs "disappear".

The filename should be part of the hash because otherwise the hashes will 
clash if, for example:
- there are multiple main() functions in the codebase with the same signature 
  (this is likely), and
- each of them contains an identical line with a defect.
In that case the same bug hash would be generated for all of them.

Including the filename in the hash would decrease the likelihood of such cases.
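
The clash can be sketched in a few lines of Python. This is only an
illustration under assumptions: issue_hash and its parameters are hypothetical
names, the set of hashed fields is simplified, and MD5 is an arbitrary choice;
none of this is taken from the patch.

```python
import hashlib


def issue_hash(function_signature, normalized_line, checker, filename=None):
    """Build an issue hash from a few fields (hypothetical, simplified)."""
    parts = [function_signature, normalized_line, checker]
    if filename is not None:
        # Including the file name separates otherwise identical reports.
        parts.insert(0, filename)
    # Join with a NUL separator so adjacent fields cannot run together.
    return hashlib.md5("\x00".join(parts).encode("utf-8")).hexdigest()


signature = "int main(int, char **)"
line = "int*p=0;*p=1;"  # already normalized
checker = "core.NullDereference"

# Without the file name, the two reports are indistinguishable.
same = issue_hash(signature, line, checker) == issue_hash(signature, line, checker)

# With the file name, the reports from a.c and b.c get different hashes.
different = (issue_hash(signature, line, checker, "a.c")
             != issue_hash(signature, line, checker, "b.c"))
```

Here both same and different are True: two main() functions with an identical
defective line share one hash unless the file name is part of the input.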


http://reviews.llvm.org/D10305




_______________________________________________
cfe-commits mailing list
[email protected]
http://lists.cs.uiuc.edu/mailman/listinfo/cfe-commits
