On Fri, 25 May 2018, Martin Sebor wrote:

Why couldn't nonzero_chars be constant in that case?

void f(const char*s){
  s[0]='a';
  // I know that strlen(s) is at least 1 here
}

I was responding specifically to your question about the strlen()
CSE.  Above there is no call to strlen().  What I understood you
were asking about is something like:

 int f (char *s)
 {
   int n0 = strlen (s);
   s[7]='a';
   int n1 = strlen (s);   // can this be replaced by n0?
   return n0 == n1;
 }

Yes, but I realized afterwards that calling strlen causes strinfo to be updated, replacing a non-tight nonzero_chars with the lhs of the call. So I need a different use of strinfo to demonstrate the issue.

The example may be completely off, but just to try and describe what I
am wondering about:

int f(const char*s,char*restrict t,char c){

The const shouldn't have been there.

// we know nothing about s for now.
s[0]='a'; // nonzero_chars should now be 1
strcpy(t,s);
s[2]=c; // this is after nonzero_chars
strcpy(t,s);
}

If we assume that writing to s[2] is after the end of the original
string s, we can remove one of the strcpy. But that's not a valid
transformation in general.

Above, s[0] creates strinfo for s with nonzero_chars == 1 and
full_string_p == false.  Then, after the first call to strcpy,
that same strinfo is invalidated/removed because the length of
s is unknown.

strcpy writes to t, which cannot alias s, so I don't see why this should invalidate the strinfo for s.

The next assignment to s[2] doesn't do anything
(in the strlen pass) because the right operand is not constant).

Well, it should at least invalidate stuff.

In the second strcpy there is no strinfo for s.

This was a bad example again, because we don't have the relevant optimization anyway (2 identical calls to strcpy(t,s) in a row are not simplified to a single one).

I originally wanted to see if we could CSE calls to strlen(s) before and
after the write, but the lhs of the first call to strlen would replace
nonzero_chars. And I don't know so well what we use strinfo for and thus
how dangerous it is not to invalidate it.

The pass does track non-constant strlen() results so I think
the optimization you describe should be possible.  For instance,
here it should be safe to eliminate the second strlen:

 int f (char *s)
 {
   s[0] = 'a';
   s[1] = 'b';
   int n0 = __builtin_strlen (s);

   s[0] = 'x';
   int n1 = __builtin_strlen (s);

   return n0 == n1;
 }

It isn't optimized today because (as I explained above) the first
strlen() replaces the constant nonzero_chars == 2 with the non-
constant result of strlen(s).

Let me try a completely different example

char*f(){
  char*p=__builtin_calloc(12,1);
  __builtin_memset(p+5,1,2);
  __builtin_memset(p,0,12);
  return p;
}

With your patch from 12/1 and -fno-tree-dse, this is miscompiled and misses the final memset.


I also still fear it may be possible to come up with a more strlen-like example along the lines of

s[0]='a'; // nonzero_chars=1
- do something with s that relies on its length and doesn't invalidate
s[2]='a'; // patch prevents it from invalidating
- do something else with s

where the fact that s has a different length in lines 2 and 4 is hidden by the patch. But maybe all the transformations are carefully written to avoid this problem...

--
Marc Glisse

Reply via email to