On Tue, Jul 25, 2017 at 12:48 PM, Richard Biener <richard.guent...@gmail.com> wrote:
> On Mon, Jul 10, 2017 at 10:24 AM, Bin.Cheng <amker.ch...@gmail.com> wrote:
>> On Tue, Jun 27, 2017 at 11:49 AM, Bin Cheng <bin.ch...@arm.com> wrote:
>>> Hi,
>>> This is a followup patch better handling below case:
>>>   for (i = 0; i < n; i++)
>>>     {
>>>       a[i] = 1;
>>>       a[i+2] = 2;
>>>     }
>>> Instead of generating root variables by loading from memory and propagating
>>> with PHI nodes, like:
>>>   t0 = a[0];
>>>   t1 = a[1];
>>>   for (i = 0; i < n; i++)
>>>     {
>>>       a[i] = 1;
>>>       t2 = 2;
>>>       t0 = t1;
>>>       t1 = t2;
>>>     }
>>>   a[n] = t0;
>>>   a[n+1] = t1;
>>> We can simply store loop invariant values after loop body if we know loop
>>> iterates more than chain->length times, like:
>>>   for (i = 0; i < n; i++)
>>>     {
>>>       a[i] = 1;
>>>     }
>>>   a[n] = 2;
>>>   a[n+1] = 2;
>>>
>>> Bootstrap(O2/O3) in patch series on x86_64 and AArch64. Is it OK?
>> Update patch wrto changes in previous patch.
>> Bootstrap and test on x86_64 and AArch64. Is it OK?
>
> + if (TREE_CODE (val) == INTEGER_CST || TREE_CODE (val) == REAL_CST)
> +   continue;
>
> Please use CONSTANT_CLASS_P (val) instead. I suppose VECTOR_CST or
> FIXED_CST would be ok as well for example.
>
> Ok with that change. Did we eventually optimize this in followup
> passes previously?

Probably not? Given the test below:

int a[10000], b[10000], c[10000];
int
f(void)
{
  int i, n = 100;
  int t0 = a[0];
  int t1 = a[1];
  for (i = 0; i < n; i++)
    {
      a[i] = 1;
      int t2 = 2;
      t0 = t1;
      t1 = t2;
    }
  a[n] = t0;
  a[n+1] = t1;
  return 0;
}

The optimized dump is:

  <bb 2> [1.00%] [count: INV]:
  t1_8 = a[1];
  ivtmp.9_17 = (unsigned long) &a;
  _16 = ivtmp.9_17 + 400;

  <bb 3> [99.00%] [count: INV]:
  # t1_20 = PHI <2(3), t1_8(2)>
  # ivtmp.9_2 = PHI <ivtmp.9_1(3), ivtmp.9_17(2)>
  _15 = (void *) ivtmp.9_2;
  MEM[base: _15, offset: 0B] = 1;
  ivtmp.9_1 = ivtmp.9_2 + 4;
  if (ivtmp.9_1 != _16)
    goto <bb 3>; [98.99%] [count: INV]
  else
    goto <bb 4>; [1.01%] [count: INV]

  <bb 4> [1.00%] [count: INV]:
  a[100] = t1_20;
  a[101] = 2;
  return 0;

So we currently eliminate one phi and leave the other behind. The phi
that does go away is eliminated in vrp1/dce2.

Thanks,
bin
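
For anyone who wants to convince themselves that the two forms in the
quoted patch description store the same values, here is a small
standalone C sketch. It is not taken from the patch. The function names,
the N_MAX bound, and the use of the store distance 2 as the chain length
are all assumptions made for illustration only. The n > 2 guard mirrors
the "iterates more than chain->length times" condition from the
description.

```c
#include <assert.h>
#include <string.h>

#define N_MAX 64 /* hypothetical upper bound on n for this sketch */

/* The loop from the patch description: two stores per iteration. */
static void store_original(int *a, int n)
{
    for (int i = 0; i < n; i++) {
        a[i] = 1;
        a[i + 2] = 2;
    }
}

/* Hand-written form of the described result: the invariant stores are
   moved after the loop.  The guard assumes a chain length of 2.  */
static void store_transformed(int *a, int n)
{
    assert(n > 2);
    for (int i = 0; i < n; i++)
        a[i] = 1;
    a[n] = 2;
    a[n + 1] = 2;
}

/* Returns 1 when both versions leave identical array contents. */
int stores_match(int n)
{
    int x[N_MAX + 2], y[N_MAX + 2];

    assert(n > 2 && n <= N_MAX);
    /* Fill both arrays with the same sentinel bytes so that any element
       written by only one version shows up in the comparison.  */
    memset(x, 0xff, sizeof x);
    memset(y, 0xff, sizeof y);
    store_original(x, n);
    store_transformed(y, n);
    return memcmp(x, y, sizeof x) == 0;
}
```

Calling stores_match (n) for any n in the range 3..N_MAX compares the
two variants element by element. Inside the loop every a[i+2] = 2 store
except the last two is overwritten by a later a[i] = 1, which is why
only a[n] and a[n+1] need to be stored after the loop.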