On 22 November 2016 at 20:18, Richard Biener <rguent...@suse.de> wrote: > On Tue, 22 Nov 2016, Prathamesh Kulkarni wrote: > >> On 21 November 2016 at 15:10, Richard Biener <rguent...@suse.de> wrote: >> > On Sun, 20 Nov 2016, Prathamesh Kulkarni wrote: >> > >> >> Hi, >> >> As suggested by Martin in PR78153 strlen's return value cannot exceed >> >> PTRDIFF_MAX. >> >> So I set it's range to [0, PTRDIFF_MAX - 1] in extract_range_basic() >> >> in the attached patch. >> >> >> >> However it regressed strlenopt-3.c: >> >> >> >> Consider fn1() from strlenopt-3.c: >> >> >> >> __attribute__((noinline, noclone)) size_t >> >> fn1 (char *p, char *q) >> >> { >> >> size_t s = strlen (q); >> >> strcpy (p, q); >> >> return s - strlen (p); >> >> } >> >> >> >> The optimized dump shows the following: >> >> >> >> __attribute__((noclone, noinline)) >> >> fn1 (char * p, char * q) >> >> { >> >> size_t s; >> >> size_t _7; >> >> long unsigned int _9; >> >> >> >> <bb 2>: >> >> s_4 = strlen (q_3(D)); >> >> _9 = s_4 + 1; >> >> __builtin_memcpy (p_5(D), q_3(D), _9); >> >> _7 = 0; >> >> return _7; >> >> >> >> } >> >> >> >> which introduces the regression, because the test expects "return 0;" in >> >> fn1(). >> >> >> >> The issue seems to be in vrp2: >> >> >> >> Before the patch: >> >> Visiting statement: >> >> s_4 = strlen (q_3(D)); >> >> Found new range for s_4: VARYING >> >> >> >> Visiting statement: >> >> _1 = s_4; >> >> Found new range for _1: [s_4, s_4] >> >> marking stmt to be not simulated again >> >> >> >> Visiting statement: >> >> _7 = s_4 - _1; >> >> Applying pattern match.pd:111, gimple-match.c:27997 >> >> Match-and-simplified s_4 - _1 to 0 >> >> Intersecting >> >> [0, 0] >> >> and >> >> [0, +INF] >> >> to >> >> [0, 0] >> >> Found new range for _7: [0, 0] >> >> >> >> __attribute__((noclone, noinline)) >> >> fn1 (char * p, char * q) >> >> { >> >> size_t s; >> >> long unsigned int _1; >> >> long unsigned int _9; >> >> >> >> <bb 2>: >> >> s_4 = strlen (q_3(D)); >> >> _9 = s_4 + 1; >> >> __builtin_memcpy (p_5(D), q_3(D), _9); >> >> _1 = s_4; >> >> return 0; >> >> >> >> } >> >> >> >> >> >> After the patch: >> >> Visiting statement: >> >> s_4 = strlen (q_3(D)); >> >> Intersecting >> >> [0, 9223372036854775806] >> >> and >> >> [0, 9223372036854775806] >> >> to >> >> [0, 9223372036854775806] >> >> Found new range for s_4: [0, 9223372036854775806] >> >> marking stmt to be not simulated again >> >> >> >> Visiting statement: >> >> _1 = s_4; >> >> Intersecting >> >> [0, 9223372036854775806] EQUIVALENCES: { s_4 } (1 elements) >> >> and >> >> [0, 9223372036854775806] >> >> to >> >> [0, 9223372036854775806] EQUIVALENCES: { s_4 } (1 elements) >> >> Found new range for _1: [0, 9223372036854775806] >> >> marking stmt to be not simulated again >> >> >> >> Visiting statement: >> >> _7 = s_4 - _1; >> >> Intersecting >> >> ~[9223372036854775807, 9223372036854775809] >> >> and >> >> ~[9223372036854775807, 9223372036854775809] >> >> to >> >> ~[9223372036854775807, 9223372036854775809] >> >> Found new range for _7: ~[9223372036854775807, 9223372036854775809] >> >> marking stmt to be not simulated again >> >> >> >> __attribute__((noclone, noinline)) >> >> fn1 (char * p, char * q) >> >> { >> >> size_t s; >> >> long unsigned int _1; >> >> size_t _7; >> >> long unsigned int _9; >> >> >> >> <bb 2>: >> >> s_4 = strlen (q_3(D)); >> >> _9 = s_4 + 1; >> >> __builtin_memcpy (p_5(D), q_3(D), _9); >> >> _1 = s_4; >> >> _7 = s_4 - _1; >> >> return _7; >> >> >> >> } >> >> >> >> Then forwprop4 turns >> >> _1 = s_4 >> >> _7 = s_4 - _1 >> >> into >> >> _7 = 0 >> >> >> >> and we end up with: >> >> _7 = 0 >> >> return _7 >> >> in optimized dump. >> >> >> >> Running ccp again after forwprop4 trivially solves the issue, however >> >> I am not sure if we want to run ccp again ? >> >> >> >> The issue is probably with extract_range_from_ssa_name(): >> >> For _1 = s_4 >> >> >> >> Before patch: >> >> VR for s_4 is set to varying. >> >> So VR for _1 is set to [s_4, s_4] by extract_range_from_ssa_name. >> >> Since VR for _1 is [s_4, s_4] it implicitly implies that _1 is equal to >> >> s_4, >> >> and vrp is able to transform _7 = s_4 - _1 to _7 = 0 (by using >> >> match.pd pattern x - x -> 0). >> >> >> >> After patch: >> >> VR for s_4 is set to [0, PTRDIFF_MAX - 1] >> >> And correspondingly VR for _1 is set to [0, PTRDIFF_MAX - 1] >> >> so IIUC, we then lose the information that _1 is equal to s_4, >> > >> > We don't lose it, it's in its set of equivalencies. >> Ah, I missed that, thanks. For some reason I had mis-conception that >> equivalences stores >> variables which have same value-ranges but are not necessarily equal. >> > >> >> and vrp doesn't transform _7 = s_4 - _1 to _7 = 0. >> >> forwprop4 does that because it sees that s_4 and _1 are equivalent. >> >> Does this sound correct ? >> > >> > Yes. So the issue is really that vrp_visit_assignment_or_call calls >> > gimple_fold_stmt_to_constant_1 with vrp_valueize[_1] which when >> > we do not have a singleton VR_RANGE does not fall back to looking >> > at equivalences (there's not a good cheap way to do that currently because >> > VRP doesn't keep a proper copy lattice but simply IORs equivalences >> > from all equivalences). In theory simply using the first set bit >> > might work. Thus sth like >> > >> > @@ -7057,6 +7030,12 @@ vrp_valueize (tree name) >> > || is_gimple_min_invariant (vr->min)) >> > && vrp_operand_equal_p (vr->min, vr->max)) >> > return vr->min; >> > + else if (vr->equiv && ! bitmap_empty_p (vr->equiv)) >> > + { >> > + unsigned num = bitmap_first_set_bit (vr->equiv); >> > + if (num < SSA_NAME_VERSION (name)) >> > + return ssa_name (num); >> > + } >> > } >> > return name; >> > } >> > >> > might work with the idea of simply doing canonicalization to one of >> > the equivalences. But as we don't allow copies in the SSA def stmt >> > (via vrp_valueize_1) I'm not sure that's good enough canonicalization. >> IIUC, we record the equivalent variables in vr->equiv >> but do not canonicalize to one of the equivalence like "copy-of value" >> in copyprop ? >> Using first set bit unfortunately doesn't help for the above case. >> >> Sorry if this sounds silly, should we just run copyprop/ccp once again >> after vrp2 to ensure that there are no copies left ? > > why? forwprop also does copy and constant propagation. For the > regression simply adjust the pass dump you scan. Well, with the patch the redundant store to and load from _7 still remains in optimized dump for fn1() in strlenopt-3.c:
__attribute__((noclone, noinline)) fn1 (char * p, char * q) { size_t s; size_t _7; long unsigned int _9; <bb 2>: s_4 = strlen (q_3(D)); _9 = s_4 + 1; __builtin_memcpy (p_5(D), q_3(D), _9); _7 = 0; return _7; } Running ccp again after forwprop4 would get rid of _7. Without the patch we have return _0; in optimized dump. Thanks, Prathamesh > >> However that might be quite expensive ? >> Or make vrp track copies like copyprop using a separate copy-of lattice ? > > Ideally we'd unify the three SSA propagation passes into one. We'd > have to have separate lattices for copy&constant and range&known-bits. > > Richard. > >> Thanks, >> Prathamesh >> > >> > Richard. >> >> > > -- > Richard Biener <rguent...@suse.de> > SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB > 21284 (AG Nuernberg)