LegalizeAdulthood marked 5 inline comments as done.
================
Comment at: clang-tidy/modernize/RawStringLiteralCheck.cpp:96
@@ +95,3 @@
+ std::string Delimiter;
+ for (int Counter = 0; containsDelimiter(Bytes, Delimiter); ++Counter) {
+ Delimiter = (Counter == 0) ? "lit" : "lit" + std::to_string(Counter);
----------------
alexfh wrote:
> LegalizeAdulthood wrote:
> > alexfh wrote:
> > > Please address my comment above that relates to this code. Specifically,
> > > I find this generic algorithm unnecessarily inefficient and too rigid,
> > > i.e. one can't configure a custom set of delimiters that don't follow the
> > > <prefix><number> pattern. I suggest using a list of pre-concatenated
> > > pairs of strings (like `R"lit(` and `)lit"`).
> > Raw strings are new and not many people are using them because the don't
> > have a good way to automatically convert disgusting quoted strings to raw
> > strings. So I don't think it is reasonable to draw conclusions by looking
> > in existing code bases for raw strings.
> >
> > We're having the same conversation we've had before. I'm trying to do a
> > basic check and get things working properly and the review comments are
> > tending towards a desire for some kind of perfection.
> >
> > I don't see why we have to make the perfect the enemy of the good. Nothing
> > you are suggesting **must** be present now in order for this check to
> > function properly and reasonably.
> We're looking at this from different perspectives. From my point of view,
> preventing a potentially wasteful code in ClangTidy checks before it's even
> committed is much easier than tracking down performance issues when tens of
> checks run on multiple machines analyzing millions lines of code. So I'm
> asking to avoid writing code using string allocations and concatenations when
> there are good alternatives. Apart from possible performance issues, in this
> case the generic solution you suggest is targets extremely rare cases, while
> most real-world situations can be handled with a fixed set of delimiters
> (possibly, configured by the user, while I expect the preferences for the
> choice of delimiters to be very different in different teams).
I believe we agree on the following:
- In order to turn a non-raw string literal into a raw string literal I have
to surround it with `R"(` and `)"`.
- If the existing string literal contains `)"` already, then surrounding the
literal with the above will result in a syntax error. Therefore, every
potential raw string literal will have to be searched at least once for this
non-delimited form of raw string literal.
- `clang-tidy` checks should never introduce syntax errors.
Therefore, I can either not transform the string literal if it contains `)"`,
or I can come up with some delimited form of raw string literal where the
delimiter does not appear in the string. In order to not transform the
literal, I first have to search for `)"` in order to know that it should not be
transformed. Therefore, the search for `)"` must be performed, no matter what
algorithm is used to determine a delimited or non-delimited raw string literal.
Where we seem to disagree is what algorithm should be used to determine the
delimiter when a delimited raw string literal is required.
My implementation is the minimum performance impact for the typical case where
the string does not contain `)"`.
Now let's talk about the cases where searching the string repeatedly for a
delimiter would be expensive.
First, the string literal will have to contain `)"`.
Second, the string literal would have to be lengthy for the delimiter searching
to be expensive. Most of the time lengthy string literals are when someone has
transformed a text file into a giant string literal. Such text files include
newlines and in the latest version of the code, I've decided to skip any string
literal containing a newline. It simply results in too many strings being
converted to raw string literals and obfuscates the newlines. The one case
where the embedded newlines makes sense is the case of the text file converted
into a string literal.
Repeated searching of the string for the literal delimiter could be avoided by
changing the routine to search for the delimiter. If present, it could examine
the characters following the literal to make the literal more unique and then
continue searching from there to look for more occurrences of the extended
delimiter. It would proceed incrementally through the rest of the string to
create a unique delimiter in a single pass through the string. I think the
resulting literals would be even more non-sensical than the current
implementation, but it would result in a delimiter obtained by a single pass
through the string. It's a significantly more complicated implementation and
results in an even weirder delimiter to handle a very edge case.
If I follow your suggested approach of using a fixed number of string
delimiters, I still have to check the strings for the presence of those
delimiters surrounded by `)` and `"`, so I am not saving anything in terms of
performance. In the worst case i still have to perform multiple searches
through the string and then fail the entire transformation if all of them are
present in the string. Failing the transformation significantly complicates
the entire check because more state has to be passed around up and down the
call chain of the functions evaluating the string literal so that we can abort
the whole replacement.
I ran this check on the entire LLVM code base and not once did the check
produce a raw string literal containing a delimiter. Therefore the performance
question of how to choose a unique delimiter we are discussing is moot. It
never entered the picture.
I'm happy to run this check on other large code bases to determine the
likelihood of `)"` appearing in a string and the need for determining a unique
delimiter.
================
Comment at: clang-tidy/modernize/RawStringLiteralCheck.cpp:116
@@ +115,3 @@
+ if (containsEscapedCharacters(Result, Literal))
+ replaceWithRawStringLiteral(Result, Literal);
+ }
----------------
alexfh wrote:
> aaron.ballman wrote:
> > A configuration option makes sense to me, but I would be fine if it was a
> > follow-on patch. I think that someone running this check is likely fine
> > with the level of noise it will produce (which should be moderately low).
> Agreed. This can be done in a follow up.
In my current version of the code, I've decided to skip strings containing
newlines. There are too many heuristics involved and subjective evaluations
about readability when newlines are present in raw string literals.
The one case where I see it to be useful is when you process a text file into a
C++ source file as a giant string literal. However, I think the better
solution there is to have the generating utility use raw string literals
directly instead of running `clang-tidy` on it afterwards.
================
Comment at: test/clang-tidy/modernize-raw-string-literal.cpp:1
@@ +1,2 @@
+// RUN: %check_clang_tidy %s modernize-raw-string-literal %t
+
----------------
alexfh wrote:
> As usual, please add tests ensuring that replacements are done correctly in
> templates with multiple instantiations and in macro arguments (but not in
> macro bodies), e.g.
>
> template<typename T>
> void f() {
> (void)"asdf\\\\\\";
> // CHECK-FIXES: void f() {
> // CHECK-FIXES-NEXT: {{^ }}(void)R"(asdf\\\)";{{$}}
> // CHECK-FIXES-NEXT: {{^}$}}
> }
> void g() {
> f<int>();
> f<double>();
> }
>
> #define M(x) x x x "a\\b\\"
> void h() {
> const char *const s = M("c\\d\\");
> // CHECK-FIXES: {{^}}#define M(x) x x x "a\\b\\"{{$}}
> // CHECK-FIXES: {{^ }}const char *const s = M(R"(c\d\)");
> }
>
>
I thought I could get it to handle macro arguments, but when I applied the
check, it ended up inlining the macro instead, so I don't know exactly how to
do that. In the meantime, I've simply skipped any location that came from
macro body or argument expansion. This could be improved in the future, but
prevents unwanted changes in the source file now.
http://reviews.llvm.org/D16529
_______________________________________________
cfe-commits mailing list
[email protected]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits