On Thu, Aug 4, 2016 at 10:45 AM, Alexander Kornienko <ale...@google.com> wrote:
> alexfh added inline comments.
>
> ================
> Comment at: clang-tidy/misc/ArgumentCommentCheck.cpp:124
> @@ +123,3 @@
> +  InDecl = InDecl.trim('_');
> +  return InComment.compare_lower(InDecl) == 0;
> +}
> ----------------
> aaron.ballman wrote:
>> alexfh wrote:
>> > aaron.ballman wrote:
>> > > Correct, which means this won't behave properly in some locales with
>> > > UTF-8 identifiers. Consider Turkish, where İ (U+0130 “Latin Capital
>> > > Letter I With Dot Above”) is the uppercase form of i, and I is the
>> > > uppercase form of ı (U+0131 “Latin Small Letter Dotless I”). If the
>> > > comment contains one case of such a pair while the identifier
>> > > contains the other, the comparison will currently fail, while a
>> > > locale-aware comparison would succeed. You run into similar things
>> > > with SS vs ß in German as well, where the uppercase form is two
>> > > characters while the lowercase is only a single character.
>> > Interesting, though it looks like there's now an official capital ẞ
>> > https://en.wikipedia.org/wiki/Capital_%E1%BA%9E (which is not frequently
>> > needed anyway, I guess).
>> >
>> > At the end of the day, what we get is that the non-strict mode is
>> > currently somewhat stricter for non-ASCII characters. The same will
>> > happen with all other parts of LLVM that rely on
>> > `StringRef::compare_lower`. I don't think we need a separate test for
>> > this _here_, since it's a problem on a completely different level. And
>> > I guess the use of non-ASCII identifiers in C++ will cause much more
>> > serious problems than a slightly stricter clang-tidy warning ;]
>> We may just have different testing philosophies -- I would advocate for a
>> test because we know of a use case that's broken with this particular use
>> of `compare_lower`. Not all uses of `compare_lower` are problematic,
>> after all.
>> However, I'm not going to fight for that test case too hard because this
>> is hopefully an edge case that is low-impact. A FIXME would also suffice.
> I'm reluctant to add a test case, since the cost of making it work and
> maintaining it on both Linux and Windows is higher than its value, IMO
> (that's my takeaway from writing clang-format's limited support for
> Unicode).

I am totally okay with that line of reasoning. I was mostly looking for
some marker that says "if this acts funky, it's expected, not accidental."
The FIXME scratches that itch for me, so thank you!

~Aaron

>
> https://reviews.llvm.org/D23135
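
For illustration, here is a minimal, self-contained sketch of the
behaviour discussed in this thread. It does not use LLVM;
equalsAsciiInsensitive, trimUnderscores and sameName are hypothetical
stand-ins, written on the assumption that the comparison folds only the
ASCII letters A-Z and compares every other byte verbatim. The non-ASCII
names are spelled as UTF-8 byte escapes so the example does not depend on
the source file encoding.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// ASCII-only lowercase: bytes outside 'A'..'Z' pass through unchanged.
static unsigned char asciiToLower(unsigned char C) {
  return (C >= 'A' && C <= 'Z') ? static_cast<unsigned char>(C - 'A' + 'a')
                                : C;
}

// Hypothetical stand-in for an ASCII-only case-insensitive equality test.
bool equalsAsciiInsensitive(const std::string &LHS, const std::string &RHS) {
  if (LHS.size() != RHS.size())
    return false;
  for (std::size_t I = 0; I < LHS.size(); ++I)
    if (asciiToLower(LHS[I]) != asciiToLower(RHS[I]))
      return false;
  return true;
}

// Hypothetical helper: strip leading and trailing underscores.
std::string trimUnderscores(const std::string &S) {
  const std::size_t Begin = S.find_first_not_of('_');
  if (Begin == std::string::npos)
    return std::string();
  const std::size_t End = S.find_last_not_of('_');
  return S.substr(Begin, End - Begin + 1);
}

// Mirrors the shape of the quoted snippet: trim the declared name, then
// compare it to the name in the comment, ignoring ASCII case only.
bool sameName(const std::string &InComment, const std::string &InDecl) {
  return equalsAsciiInsensitive(InComment, trimUnderscores(InDecl));
}

// Usage sketch: call from main() to check the cases from the thread.
void demoComparisons() {
  // ASCII names match regardless of case and surrounding underscores.
  assert(sameName("Foo", "foo_"));
  // U+00C4 vs U+00E4 (A/a with diaeresis): the UTF-8 bytes differ.
  assert(!sameName("\xC3\x84" "rger", "\xC3\xA4" "rger"));
  // U+0130 (capital I with dot above) vs ASCII 'i': lengths differ.
  assert(!sameName("\xC4\xB0", "i"));
}
```

With these assumptions, the ASCII-only comparison treats names that
differ only in the case of a non-ASCII letter as different, which is the
"somewhat stricter" non-strict mode described in the quoted review.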