David Malcolm <dmalc...@redhat.com> writes:
> On Wed, 2015-08-05 at 16:22 -0400, Trevor Saunders wrote:
>> On Wed, Aug 05, 2015 at 11:34:28AM -0400, David Malcolm wrote:
>> > On Wed, 2015-08-05 at 11:28 -0400, David Malcolm wrote:
>> > > On Wed, 2015-08-05 at 13:47 +0200, Richard Biener wrote:
>> > > > On Wed, Aug 5, 2015 at 12:57 PM, Trevor Saunders
>> > > > <tbsau...@tbsaunde.org> wrote:
>> > > > > On Mon, Jul 27, 2015 at 11:06:58AM +0200, Richard Biener wrote:
>> > > > >> On Sat, Jul 25, 2015 at 4:37 AM, <tbsaunde+...@tbsaunde.org> wrote:
>> > > > >> > From: Trevor Saunders <tbsaunde+...@tbsaunde.org>
>> > > > >> >
>> > > > >> > * config/arc/arc.h, config/bfin/bfin.h, config/frv/frv.h,
>> > > > >> > config/ia64/ia64-protos.h, config/ia64/ia64.c,
>> > > > >> > config/ia64/ia64.h,
>> > > > >> > config/lm32/lm32.h, config/mep/mep.h, config/mmix/mmix.h,
>> > > > >> > config/rs6000/rs6000.c, config/rs6000/xcoff.h,
>> > > > >> > config/spu/spu.h,
>> > > > >> > config/visium/visium.h, defaults.h: Define
>> > > > >> > ASM_OUTPUT_LABEL to
>> > > > >> > the name of a function.
>> > > > >> > * output.h (default_output_label): New prototype.
>> > > > >> > * varasm.c (default_output_label): New function.
>> > > > >> > * vmsdbgout.c: Include tm_p.h.
>> > > > >> > * xcoffout.c: Likewise.
>> > > > >>
>> > > > >> Just a general remark - the GCC output machinery is known to be
>> > > > >> slow,
>> > > > >> adding indirect calls might be not the very best idea without
>> > > > >> refactoring
>> > > > >> some of it.
>> > > > >>
>> > > > >> Did you do any performance measurements for artificial testcases
>> > > > >> exercising the specific bits you change?
>> > > > >
>> > > > > sorry about the delay, but I finally got a chance to do some
>> > > > > perf tests
>> > > > > of the first patch. I took three test cases fold-const.ii,
>> > > > > insn-emit.ii
>> > > > > and a random .i from firefox and did 3 trials of the length of 100
>> > > > > compilations.
>> > > > > The only non-default flag was -std=gnu++11.
>> > > > > > [...snip results...]
>> > > > >
>> > > > > So, roughly that looks to me like a range from improving by .5% to
>> > > > > regressing by 1%. I'm not sure what could cause an improvement, so I
>> > > > > kind of wonder how valid these results are.
>> > > >
>> > > > Hmm, indeed. The speedup looks suspicious.
>> > > >
>> > > > > Another question is how one can refactor the output machinery to be
>> > > > > faster. My first thought is to buffer text internally before
>> > > > > calling
>> > > > > stdio functions, but that seems like a giant job.
>> > > >
>> > > > stdio functions are already buffering, so I don't know either.
>> > > >
>> > > > But yes, going the libas route would improve things here, or for
>> > > > example enhancing gas to be able to eat target binary data
>> > > > without the need to encode it in printable characters...
>> > > >
>> > > >   .raw_data number-of-bytes
>> > > >   <raw data>
>> > > >
>> > > > Makes it quite unparsable to editors of course ...
>> > >
>> > > A middle-ground might be to do both:
>> > >
>> > >   .raw_data number-of-bytes
>> > >   <raw data>
>> >
>> > Sorry, I hit "Send" too early; I meant something like this as a
>> > middle-ground:
>> >
>> >   .raw_data number-of-bytes
>> >   <raw data>
>> >
>> >   ; comment giving the formatted text
>> >
>> > so that cc1 etc are doing the formatting work to make the comment, so
>> > that human readers can see what the raw data is meant to be, but the
>> > assembler doesn't have to do work to parse it.
>>
>> well, having random bytes in the file might still screw up editors, and
>> I'd kind of expect that to be slower overall since gcc still does the
>> formatting, and both gcc and as do more IO.
>>
>> > FWIW, I once had a go at hiding asm_out_file behind a class interface,
>> > trying to build up higher-level methods on top of raw text printing.
>> > Maybe that's a viable migration strategy (I didn't finish that patch).
>>
>> I was thinking about trying that, but I couldn't think of a good way to
>> do it incrementally.
>>
>> Trev
>
> Attached is a patch from some experimentation, very much a
> work-in-progress.
>
> It eliminates the macro ASM_OUTPUT_LABEL in favor of calls to a method
> of an "output" object:
>
>   g_output.output_label (lab);
>
> g_output would be a thin wrapper around asm_out_file (with the
> assumption that asm_out_file never changes to point at anything else).
>
> One idea here is to gradually replace uses of asm_out_file with methods
> of g_output, giving us a possible approach for tackling the "don't
> format so much and then parse it again" optimization.
>
> Another idea here is to use templates and specialization in place of
> target macros, to capture things in the type system;
> g_output is actually:
>
>   output<target_t> g_output;
>
> which has a default implementation of output_label corresponding to the
> current default ASM_OUTPUT_LABEL:
>
>   template <typename Target>
>   inline void
>   output<Target>::output_label (const char *name)
>   {
>     assemble_name (name);
>     puts (":\n");
>   }
>
> ...but a specific Target traits class could have a specialization e.g.
>
>   template <>
>   inline void
>   output<target_arm>::output_label (const char *name)
>   {
>     arm_asm_output_labelref (name);
>   }
>
> This could give us (I hope) equivalent performance to the current
> macro-based approach, but without using the preprocessor, albeit adding
> some C++ (the non-trivial use of templates gives me pause).

I might be missing the point, sorry, but it sounds like this enshrines
the idea of having a single target.  An integrated assembler or tighter
asm output would be nice, but when I last checked LLVM was usually
faster than GCC even when compiling to asm, even though LLVM does use
indirection (in the form of virtual functions) for its output routines.
I don't think indirect function calls themselves are the problem --
as long as we get the abstraction right :-)

Thanks,
Richard
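
To make the single-target point concrete, here is a rough sketch of the
virtual-function alternative.  It is only an illustration under stated
assumptions: asm_output, underscore_target_output, make_output and
emit_for_both_targets are all made-up names, not anything in GCC or
LLVM, and a std::ostream stands in for asm_out_file.  The point is just
that the target is chosen at run time, so one binary can emit labels
for more than one target, which output<target_t> would not allow.

```cpp
#include <initializer_list>
#include <memory>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical output interface; none of these names exist in GCC.
class asm_output
{
public:
  explicit asm_output (std::ostream &out) : m_out (out) {}
  virtual ~asm_output () = default;

  // Default label syntax, mirroring the default "name:\n" form.
  virtual void output_label (const std::string &name)
  {
    m_out << name << ":\n";
  }

protected:
  std::ostream &m_out;  // stand-in for asm_out_file
};

// A made-up target whose labels carry a leading underscore.
class underscore_target_output : public asm_output
{
public:
  using asm_output::asm_output;

  void output_label (const std::string &name) override
  {
    m_out << '_' << name << ":\n";
  }
};

// The target is picked at run time rather than fixed by a template
// argument, at the cost of one indirect call per label.
std::unique_ptr<asm_output>
make_output (const std::string &target, std::ostream &out)
{
  if (target == "underscore")
    return std::make_unique<underscore_target_output> (out);
  return std::make_unique<asm_output> (out);
}

// Emit the same label for two targets and return the combined text.
std::string
emit_for_both_targets (const std::string &label)
{
  std::ostringstream text;
  for (const char *target : {"generic", "underscore"})
    make_output (target, text)->output_label (label);
  return text.str ();
}
```

Calling emit_for_both_targets ("foo") runs both label formats through
the same interface in a single process.  Whether the indirect call is
measurable next to the formatting and I/O is exactly the thing that
would need benchmarking.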