Re: [PATCH] D16529: [clang-tidy] Add modernize-raw-string-literal check

Richard via cfe-commits Sun, 07 Feb 2016 16:52:35 -0800

LegalizeAdulthood marked 5 inline comments as done.

================
Comment at: clang-tidy/modernize/RawStringLiteralCheck.cpp:96
@@ +95,3 @@
+  std::string Delimiter;
+  for (int Counter = 0; containsDelimiter(Bytes, Delimiter); ++Counter) {
+    Delimiter = (Counter == 0) ? "lit" : "lit" + std::to_string(Counter);
----------------
alexfh wrote:
> LegalizeAdulthood wrote:
> > alexfh wrote:
> > > Please address my comment above that relates to this code. Specifically, 
> > > I find this generic algorithm unnecessarily inefficient and too rigid, 
> > > i.e. one can't configure a custom set of delimiters that don't follow the 
> > > <prefix><number> pattern. I suggest using a list of pre-concatenated 
> > > pairs of strings (like `R"lit(` and `)lit"`).
> > Raw strings are new and not many people are using them because the don't 
> > have a good way to automatically convert disgusting quoted strings to raw 
> > strings.  So I don't think it is reasonable to draw conclusions by looking 
> > in existing code bases for raw strings.
> > 
> > We're having the same conversation we've had before.  I'm trying to do a 
> > basic check and get things working properly and the review comments are 
> > tending towards a desire for some kind of perfection.
> > 
> > I don't see why we have to make the perfect the enemy of the good.  Nothing 
> > you are suggesting **must** be present now in order for this check to 
> > function properly and reasonably.
> We're looking at this from different perspectives. From my point of view, 
> preventing a potentially wasteful code in ClangTidy checks before it's even 
> committed is much easier than tracking down performance issues when tens of 
> checks run on multiple machines analyzing millions lines of code. So I'm 
> asking to avoid writing code using string allocations and concatenations when 
> there are good alternatives. Apart from possible performance issues, in this 
> case the generic solution you suggest is targets extremely rare cases, while 
> most real-world situations can be handled with a fixed set of delimiters 
> (possibly, configured by the user, while I expect the preferences for the 
> choice of delimiters to be very different in different teams).
I believe we agree on the following:

  - In order to turn a non-raw string literal into a raw string literal I have 
to surround it with `R"(` and `)"`.
  - If the existing string literal contains `)"` already, then surrounding the 
literal with the above will result in a syntax error.  Therefore, every 
potential raw string literal will have to be searched at least once for this 
non-delimited form of raw string literal.
  - `clang-tidy` checks should never introduce syntax errors.

Therefore, I can either not transform the string literal if it contains `)"`, 
or I can come up with some delimited form of raw string literal where the 
delimiter does not appear in the string.  In order to not transform the 
literal, I first have to search for `)"` in order to know that it should not be 
transformed.  Therefore, the search for `)"` must be performed, no matter what 
algorithm is used to determine a delimited or non-delimited raw string literal.

Where we seem to disagree is what algorithm should be used to determine the 
delimiter when a delimited raw string literal is required.

My implementation is the minimum performance impact for the typical case where 
the string does not contain `)"`.

Now let's talk about the cases where searching the string repeatedly for a 
delimiter would be expensive.

First, the string literal will have to contain `)"`.

Second, the string literal would have to be lengthy for the delimiter searching 
to be expensive.  Most of the time lengthy string literals are when someone has 
transformed a text file into a giant string literal.  Such text files include 
newlines and in the latest version of the code, I've decided to skip any string 
literal containing a newline.  It simply results in too many strings being 
converted to raw string literals and obfuscates the newlines.  The one case 
where the embedded newlines makes sense is the case of the text file converted 
into a string literal.

Repeated searching of the string for the literal delimiter could be avoided by 
changing the routine to search for the delimiter.  If present, it could examine 
the characters following the literal to make the literal more unique and then 
continue searching from there to look for more occurrences of the extended 
delimiter.  It would proceed incrementally through the rest of the string to 
create a unique delimiter in a single pass through the string.  I think the 
resulting literals would be even more non-sensical than the current 
implementation, but it would result in a delimiter obtained by a single pass 
through the string.  It's a significantly more complicated implementation and 
results in an even weirder delimiter to handle a very edge case.

If I follow your suggested approach of using a fixed number of string 
delimiters, I still have to check the strings for the presence of those 
delimiters surrounded by `)` and `"`, so I am not saving anything in terms of 
performance.  In the worst case i still have to perform multiple searches 
through the string and then fail the entire transformation if all of them are 
present in the string.  Failing the transformation significantly complicates 
the entire check because more state has to be passed around up and down the 
call chain of the functions evaluating the string literal so that we can abort 
the whole replacement.

I ran this check on the entire LLVM code base and not once did the check 
produce a raw string literal containing a delimiter.  Therefore the performance 
question of how to choose a unique delimiter we are discussing is moot.  It 
never entered the picture.

I'm happy to run this check on other large code bases to determine the 
likelihood of `)"` appearing in a string and the need for determining a unique 
delimiter.

================
Comment at: clang-tidy/modernize/RawStringLiteralCheck.cpp:116
@@ +115,3 @@
+    if (containsEscapedCharacters(Result, Literal))
+      replaceWithRawStringLiteral(Result, Literal);
+  }
----------------
alexfh wrote:
> aaron.ballman wrote:
> > A configuration option makes sense to me, but I would be fine if it was a 
> > follow-on patch. I think that someone running this check is likely fine 
> > with the level of noise it will produce (which should be moderately low).
> Agreed. This can be done in a follow up.
In my current version of the code, I've decided to skip strings containing 
newlines.  There are too many heuristics involved and subjective evaluations 
about readability when newlines are present in raw string literals.

The one case where I see it to be useful is when you process a text file into a 
C++ source file as a giant string literal.  However, I think the better 
solution there is to have the generating utility use raw string literals 
directly instead of running `clang-tidy` on it afterwards.

================
Comment at: test/clang-tidy/modernize-raw-string-literal.cpp:1
@@ +1,2 @@
+// RUN: %check_clang_tidy %s modernize-raw-string-literal %t
+
----------------
alexfh wrote:
> As usual, please add tests ensuring that replacements are done correctly in 
> templates with multiple instantiations and in macro arguments (but not in 
> macro bodies), e.g.
> 
>   template<typename T>
>   void f() {
>     (void)"asdf\\\\\\";
>     // CHECK-FIXES: void f() {
>     // CHECK-FIXES-NEXT: {{^  }}(void)R"(asdf\\\)";{{$}}
>     // CHECK-FIXES-NEXT: {{^}$}}
>   }
>   void g() {
>     f<int>();
>     f<double>();
>   }
> 
>   #define M(x) x x x "a\\b\\"
>   void h() {
>     const char *const s = M("c\\d\\");
>     // CHECK-FIXES: {{^}}#define M(x) x x x "a\\b\\"{{$}}
>     // CHECK-FIXES: {{^  }}const char *const s = M(R"(c\d\)");
>   }
> 
> 
I thought I could get it to handle macro arguments, but when I applied the 
check, it ended up inlining the macro instead, so I don't know exactly how to 
do that.  In the meantime, I've simply skipped any location that came from 
macro body or argument expansion.  This could be improved in the future, but 
prevents unwanted changes in the source file now.

http://reviews.llvm.org/D16529

_______________________________________________
cfe-commits mailing list
[email protected]
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

Re: [PATCH] D16529: [clang-tidy] Add modernize-raw-string-literal check

Reply via email to