2014/1/14 Richard Biener <richard.guent...@gmail.com>: > On Tue, Jan 14, 2014 at 1:47 PM, Ilya Enkovich <enkovich....@gmail.com> wrote: >> 2014/1/14 Richard Biener <richard.guent...@gmail.com>: >>> On Tue, Jan 14, 2014 at 10:15 AM, Ilya Enkovich <enkovich....@gmail.com> >>> wrote: >>>> Hi, >>>> >>>> I've been working for some time on the prototype of the Pointer Bounds >>>> Checker which uses function clones for instrumentation >>>> (http://gcc.gnu.org/ml/gcc-patches/2013-11/msg03327.html). After >>>> several experiments with this approach I want to share my results and >>>> ask for some feedback to make a decision about the future steps. >>>> >>>> Firstly I want to remind the reasons for digging in this direction. In >>>> the original approach bounds of call arguments and input parameters >>>> are associated with arguments via special built-in calls. It creates >>>> implicit data flow compiler is not aware about which confuses some >>>> optimizations resulting in miss-optimization and breaks bounds data >>>> flow. Thus optimizations have to be fixed to get better pointers >>>> protection. >>>> >>>> Clones approach does not use special built-in function calls to >>>> associate bounds with call arguments and input parameters. Each >>>> function which should be instrumented gets an additional version and >>>> only this special version will be instrumented.This new version gets >>>> additional bound arguments to express input bounds. When function call >>>> is instrumented, it is redirected to instrumented version and all >>>> bounds are passed as explicit call arguments. Thus we have explicit >>>> pointer bounds flow similar to regular function parameters. It should >>>> allow to avoid changes in optimization, avoid miss-optimizations, >>>> allow existing IPA optimizations to work with bound args (e.g. >>>> propagate constant bounds value and remove checks in called function). >>>> >>>> I made a prototype implementation of this approach in the following way: >>>> >>>> - Add new IPA pass before early local passes to produce versions for >>>> all functions to be instrumented. >>>> - Put instrumentation pass after SSA pass. >>>> - Add new pass after IPA passes to remove bodies of functions which >>>> have instrumented versions. Function nodes may still be required for >>>> calls in not instrumented code. But we do not emit this code and >>>> therefore function bodies are not needed. >>>> >>>> Positive changes are: >>>> >>>> - IPA optimizations are not confused by bound parameters >>>> - bounds are now more like regular arguments; it makes their >>>> processing in expand easier >>>> - functions with bounds not attached to any pointer are allowed >>> >>> First of all thanks for trying to work in this direction. Comments on the >>> issues you encountered below (also CCed Honza as he should be more >>> familiar with reachability and clone issues). >>> >>>> On simple codes this approach worked well but on a bigger tests some >>>> issues were revealed. >>>> >>>> 1. Nodes reachability. Instrumented version is actually always >>>> reachable when original function is reachable because it is always >>>> emitted instead of the original. Thus I had to fix reachability >>>> analysis to achieve it. Another similar problem is check whether node >>>> can be removed after inline when inlining instrumented function. Not >>>> hard to fix but probably other similar problems exist. >>> >>> I suppose you do not update the callgraph / the call stmts when >>> creating the clones? Btw, is it desirable to inline the uninstrumented >>> function and then instrument the result (thus run cloning and >>> instrumentation after early inlining?)? Especially handling always_inlines >>> before cloning/isntrumentation looks very sensible. >> >> Right. Created clones have the same code as the original function and >> therefore same cgraph edges. I suppose instrumentation after early >> inlining is OK and may be preferred because inline shouldn't lead to >> any losses of bounds information. I tried variant when instrumentation >> works right after early inlining but with cloning still before early >> local passes. In general it looked OK. >> >>> >>>> 2. Function processing order. Function processing order is determined >>>> before early local passes. But during function instrumentation call >>>> graph is modified significantly and used topological order becomes >>>> outdated. That causes some troubles. E.g. function marked as 'always >>>> inline' cannot be inlined because it is not in SSA form yet. Surely >>>> inlining problem may be solved by just putting instrumentation after >>>> early inline, but similar problem may exist in other passes too. To >>>> resolve this problem I tried to split early local passes into three >>>> parts. The first one builds SSA, the second one performs >>>> instrumentation, the last one does the rest. Each part is performed on >>>> all functions before the next one starts. Thus I get all functions in >>>> SSA form and all instrumentation performed before starting early >>>> optimizations. Unfortunately such passes order leads to invalid SSA >>>> because of local_pure_const optimization affecting callers correctness >>>> (in case caller SSA was built before optimization revealed 'pure' or >>>> 'const' flag). >>> >>> Generally the processing order of early_local_passes is chosen >>> to get better optimization - changing it shouldn't affect correctness >>> and thus the issues you observe demand fixing anyway. >>> (I've noted in the past that the early_local_passes processing order >>> should more explicitely honor callgraph SCCs, eventually even iterating). >>> >>> Moving SSA build out of the early_local_passes and into a >>> separate lowering stage is possible, the pure-const stuff is >>> handled by keeping pass_fixup_cfg where it is now I think. >>> In theory you can go into SSA form in all_lowering_passes >>> already (but you have to adjust some pieces dealing with >>> late created functions). >> >> Currently pass_fixup_cfg is called at the beginning of the early local >> passes and therefore function processing order mattes much here. If >> caller is processed before the callee marked as pure then there is no >> additional pass_fixup_cfg for the caller after pure flag for callee is >> set. Probably additional IPA pass is required after early_local_passes >> to fixup all functions? > > You probably need an additional fixup_cfg pass somewhere. > >>> Note that the idea of having stuff inside a single early-local-passes >>> pipeline is that it increases cache locality and reduces memory >>> usage by reclaiming functions no longer needed after inlining. >>> >>>> In general I feel that having special function version for >>>> instrumentation has a better potential, should lead to less intrusive >>>> changes in the compiler and better code quality. >>>> >>>> But before continue this implementation I would like to get some >>>> feedback and probably some advice on how to order passes to get the >>>> best result. Currently I incline to have all functions instrumented >>>> before any local optimizations and solve pure_const problem by >>>> modifying all callers when attribute is computed for some function. >>> >>> As noted above the pure_const proble is fixed by the fixup_cfg pass. >>> >>> Generally if instrumentation is not strictly required before inlining >>> then doing cloning and instrumentation as a "real" IPA pass would >>> be the way to go. You'd create clones during IPA analysis, >>> fixup clones during IPA propagate (IPA inline should notify you >>> of any decisions it makes), and instantiate and instrument the >>> remaining required clones at IPA transform time. >> >> Instrumentation is not required before inlining but is required before >> any optimization pass. Optimizations may change pointers flow and >> therefore break bounds flow, resulting in possible false bound >> violations. It means instrumentation has to be performed before >> early_optimizations. > > Ah, ok ... > >> Thus I may split early_local_passes into three >> IPA passes: >> 1. build SSA + early_inline >> 2. Make instrumented versions + release bodies of original functions >> (for functions having instrumented version) >> 3. Perform early_optimizations. > > That defeats the purpose and functioning of early inlining though. > early inlining wants the callee to have gone through all early local > passes - doing the above breaks this. > > So I suppose your original ordering is the only sensible thing unless > somebody comes up with new great ideas ;)
Thanks for your comments! I'll continue my experiments with with my initial early_local_passes splitting. Will just put there original functions bodies release to avoid overhead for their useless optimizations. So, it will be 3 IPA passes: 1. SSA build 2. Make instrumented versions + release bodies of original functions 3. early_inline + early optimizations. Will be back with more results. Thanks, Ilya > > (eventually inlining always-inline functions can be split off and done > earlier) > > Richard. > >> >> Thanks, >> Ilya >> >>> >>> Generally if doing it before early opts is not too intrusive making it >>> a real IPA pass later shouldn't be too difficult (heh ...). >>> >>> Honza may have more comments here, of course. >>> >>> Thanks, >>> Richard. >>> >>>> Thanks, >>>> Ilya