babati added a comment.

> please consider stripping the line from comments and normalize white spaces
> before hashing and calculating column position.
> This will make the hash more robust, as a re-indentation or adding comments
> to a line will not spoil the hashes.

I changed the patch so that the normalized line is used instead of the raw content of the line: whitespace and comments are removed from the line before hashing.

> By redundant, I mean that this information is already encoded in the report;
> even if it's not part of the issue id. I can see this argument go either way.
> However, if we do decide to include the filename, we would need to change
> clang/utils/analyzer/CmpRuns.py and the current issue_hash so that it's all
> consistent.

The hash should be a unique identifier of a concrete defect. If one hash identifies multiple defects in different files at the same time, that must be considered a fault (from the user's perspective): a user who suppresses one report may discover later that, with that suppression, two or more other bugs "disappeared" as well.

The filename should be part of the hash because otherwise there will be a hash clash if, for example:

- there are multiple main() functions in the codebase with the same signature (this is likely), and
- each of them contains the same line with a defect.

In that case the same bug hash would be generated for all of them. Including the filename in the hash would decrease the likelihood of such cases.

http://reviews.llvm.org/D10305

_______________________________________________
cfe-commits mailing list
[email protected]
http://lists.cs.uiuc.edu/mailman/listinfo/cfe-commits
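
To illustrate the two points argued in this comment (hashing the normalized line, and including the filename in the hash), here is a minimal Python sketch. It is not the code from the patch: the names normalize_line and issue_hash, the set of hashed fields, the checker name, and the choice of SHA-256 are all assumptions made for the example, and the comment stripping is deliberately naive (it ignores string literals and multi-line comments).

```python
import hashlib
import re

# Hypothetical helpers for illustration only; not the Clang implementation.
_BLOCK_COMMENT = re.compile(r"/\*.*?\*/")   # /* ... */ closed on the same line
_LINE_COMMENT = re.compile(r"//.*$")        # // up to end of line
_WHITESPACE = re.compile(r"\s+")


def normalize_line(line: str) -> str:
    """Remove comments and all whitespace from one source line.

    Simplification: comment markers inside string literals and comments
    spanning several lines are not handled.
    """
    line = _BLOCK_COMMENT.sub("", line)
    line = _LINE_COMMENT.sub("", line)
    return _WHITESPACE.sub("", line)


def issue_hash(filename: str, function_signature: str, checker: str,
               line: str, include_filename: bool = True) -> str:
    """Hash an issue from assumed fields; the real field set may differ."""
    parts = [function_signature, checker, normalize_line(line)]
    if include_filename:
        parts.insert(0, filename)
    # Join with a separator that cannot appear in the fields themselves.
    payload = "\x00".join(parts)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


# Re-indenting or commenting a line leaves the hash unchanged.
h1 = issue_hash("a.c", "int main()", "core.NullDereference", "  *p = 0;  // oops")
h2 = issue_hash("a.c", "int main()", "core.NullDereference", "*p=0;")
same_after_reformat = (h1 == h2)

# Two files with an identical main() and an identical defective line:
# without the filename the hashes clash, with it they differ.
clash = (issue_hash("a.c", "int main()", "core.NullDereference", "*p = 0;", False)
         == issue_hash("b.c", "int main()", "core.NullDereference", "*p = 0;", False))
distinct = (issue_hash("a.c", "int main()", "core.NullDereference", "*p = 0;")
            != issue_hash("b.c", "int main()", "core.NullDereference", "*p = 0;"))
```

In this sketch, suppressing the report in a.c by its hash would also hide the one in b.c when the filename is left out, which is the scenario described above.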
