"Parallel" mode iterators
Maybe someone can help me a little bit with machine descriptions. Assume there is a define_insn pattern "foo", and that pattern takes two arguments. The first argument should have mode DI, SI or HI, and the second argument is always half the size of the first argument. One can define mode iterators:

  (define_mode_iterator ITER1 [DI SI HI])
  (define_mode_iterator ITER2 [SI HI QI])

Is it possible to write something like this:

  (define_insn "foo"
    [(set (match_operand:ITER1 0 ...)
      ...
      [(match_operand:ITER1 1 ...)
       (match_operand:ITER2 2 ...)]
      ...

so that the pattern is copied only for the combinations DI-SI, SI-HI and HI-QI, not for all nine combinations of the two iterators? (Or is there another way to get the mode of the second argument depending on the first argument?) Ciao Dominik ^_^ ^_^ -- Dominik Vogt IBM Germany
Re: "Parallel" mode iterators
2014-08-21 11:39 GMT+04:00 Dominik Vogt :
> One can define mode iterators for
>
> (define_mode_iterator ITER1 [DI SI HI])
> (define_mode_iterator ITER2 [SI HI QI])
>
> Is it possible to write something like this:
>
> (define_insn "foo"
>   [(set (match_operand:ITER1 0 ...)
>     ...
>     [(match_operand:ITER1 1 ...)
>      (match_operand:ITER2 2 ...)]
>     ...
>
> so that the pattern is copied only for the combinations DI-SI,
> SI-HI and HI-QI, not for all nine combinations of the two
> iterators? (Or is there another way to get the mode of the second
> argument depending on the first argument?)

Look at ssehalfvecmode in i386/sse.md:

  (define_mode_attr ssehalfvecmode
    [(V64QI "V32QI") (V32HI "V16HI") (V16SI "V8SI") (V8DI "V4DI")
     (V32QI "V16QI") (V16HI "V8HI") (V8SI "V4SI") (V4DI "V2DI")
     (V16QI "V8QI") (V8HI "V4HI") (V4SI "V2SI")
     (V16SF "V8SF") (V8DF "V4DF")
     (V8SF "V4SF") (V4DF "V2DF")
     (V4SF "V2SF")])

  (define_expand "avx_vextractf128<mode>"
    [(match_operand:<ssehalfvecmode> 0 "nonimmediate_operand")
     (match_operand:V_256 1 "register_operand")
     (match_operand:SI 2 "const_0_to_1_operand")]

-- Ilya
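Following that pointer, a minimal sketch of how the "foo" pattern from the question could be written with a mode attribute instead of a second iterator (the attribute name HALF, the unspec, predicates and constraints below are made up for illustration; the attribute mapping is the point):

  (define_mode_iterator ITER1 [DI SI HI])
  (define_mode_attr HALF [(DI "SI") (SI "HI") (HI "QI")])

  (define_insn "foo<mode>"
    [(set (match_operand:ITER1 0 "register_operand" "=r")
          (unspec:ITER1 [(match_operand:ITER1 1 "register_operand" "r")
                         (match_operand:<HALF> 2 "register_operand" "r")]
                        UNSPEC_FOO))]
    ""
    "...")

Because <HALF> is substituted separately for each mode of ITER1, only the three combinations DI-SI, SI-HI and HI-QI are generated, which also answers the "mode of the second argument depending on the first argument" part of the question.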
Re: LTO bootstrap compare errors for ARM64
On Wed, Aug 20, 2014 at 6:12 PM, Jan Hubicka wrote: >> On Wed, Aug 20, 2014 at 9:28 AM, Venkataramanan Kumar >> wrote: >> > Hi Honza, >> > >> > After discussing with Richard Biener via >> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=62077, it looks like it is >> > an existing problem in trunk and is masked due to the fact that stage1 >> > and stage2 compilers in trunk are built with --enable-checking and hence >> > same garbage collection tuning parameters. >> >> Note that it works on trunk with --enable-checking=release for >> whatever reason. > > Strange, do you know how the IR is affected by garbage collection? Well, for at least one remaining case it seems that type (parts) are re-used differently, so the input at the LTO-out streaming has type (parts) duplicated (or not) dependent on GC. The PR tells about the case I didn't finish tracking down. The case fixed is somebody modifying sth in place we stored into hash tables for re-use. Not sure what the second case is. I still think we should try to understand why trunk is not affected? Richard. > Honza >> >> > In release branches, stage 1 is built with some checks like "gc" but >> > stage 2 is not. >> > These gc parameters affect the LTO IR and it gets streamed differently. >> > >> > Currently for release branches we have a workaround of setting same gc >> > parameters for stage1 and stage2 builds (or) build stage1 with >> > --enable-checking=release. >> > >> > regards, >> > Venkat >> > >> > >> > On 11 August 2014 16:20, Venkataramanan Kumar >> > wrote: >> >> Hi Honza, >> >> >> >> I did not find any differences in tree level dumps. These are the dump >> >> differences in IPA >> >> >> >> In gimple-fold.c.000i.cgraph >> >> >> >> (--Snip--) >> >> < _Z25gimple_build_omp_continueP9tree_nodeS0_/761 >> >> (gimple_build_omp_continue(tree_node*, tree_node*)) @0x3ff7ebda548 >> >> --- >> >>> _Z25gimple_build_omp_continueP9tree_nodeS0_/761 >> >>> (gimple_build_omp_continue(tree_node*, tree_node*)) @0x3ff92b5a548 >> >> 28865c28865 >> >> < _Z26gimple_build_omp_taskgroupP21gimple_statement_base/760 >> >> (gimple_build_omp_taskgroup(gimple_statement_base*)) @0x3ff7ebda400 >> >> --- >> >>> _Z26gimple_build_omp_taskgroupP21gimple_statement_base/760 >> >>> (gimple_build_omp_taskgroup(gimple_statement_base*)) @0x3ff92b5a400 >> >> 28875c28875 >> >> < _Z23gimple_build_omp_masterP21gimple_statement_base/759 >> >> (gimple_build_omp_master(gimple_statement_base*)) @0x3ff7ebda2b8 >> >> --- >> >>> _Z23gimple_build_omp_masterP21gimple_statement_base/759 >> >>> (gimple_build_omp_master(gimple_statement_base*)) @0x3ff92b5a2b8 >> >> 28885c28885 >> >> < _Z24gimple_build_omp_sectionP21gimple_statement_base/758 >> >> (gimple_build_omp_section(gimple_statement_base*)) @0x3ff7ebda170 >> >> --- >> >>> _Z24gimple_build_omp_sectionP21gimple_statement_base/758 >> >>> (gimple_build_omp_section(gimple_statement_base*)) @0x3ff92b5a170 >> >> (--Snip--) >> >> >> >> >> >> In gimple.c.044i.profile_estimate >> >> >> >> (--Snip--) >> >> >> >> 1987c1987 >> >> < vec::qsort(int (*)(void const*, void >> >> const*)) (struct vec * const this, int (*) (const void *, const >> >> void *) cmp) >> >> --- >> >>> vec::qsort(int (*)(void const*, void >> >>> const*)) (struct vec * const this, int (*) (const void *, const >> >>> void *) cmp) >> >> (--Snip--) >> >> >> >> gimple.c.048i.inline >> >> >> >> (--Snip--) >> >> >> >> < min size: 6 >> >> --- >> >>> min size: 0 >> >> 6590c6590 >> >> < min size: 14 >> >> --- >> >>> min size: 0 >> >> 6607c6607 >> >> < min size: 28 >> >> (--Snip--) >> >> >> >> On 7
August 2014 19:14, Jan Hubicka wrote: >> >> As a first step I compared the "objdump -D" dump between >> "stage2-gcc/gimple.o" and "stage3-gcc/gimple.o". Differences are in >> LTO sections .gnu.lto_.decls.0, .gnu.lto_.symtab. >> Ref: http://paste.ubuntu.com/7949238/ >> >>> >> >>> If you see the differences already in .o files (i.e. at compile time), I >> >>> think the next >> >>> step is to produce -fdump-tree-all -fdump-ipa-all dumps of >> >>> stage2-gcc/gimple.o >> >>> and stage3-gcc/gimple.o and see how they differ. >> >>> >> >>> Debugging misoptimization of LTO stage2 compiler will be interesting - I >> >>> guess we can >> >>> first try to identify what is wrong rather than the usual bisecting method... >> >>> >> >>> Honza >> >> No differences when using "objdump -d". >> >> Next I passed "-save-temps" to stage2 and stage3 builds and compared >> the assembly files. The differences are in strings dumped via .ascii >> and .string directives. >> >> Next I checked the flags passed to the stage 2 and stage 3 builds. It >> is the same and below is the flag set being passed. >> >> -save-temps -O2 -g -flto -flto=jobserver -frandom-seed=1 >> -ffat-lto-objects -DIN_GCC -fno-exceptions -fno-rtti >> -fasynchronous-unwind-tables -W -Wall -Wno-narrowing -Wwrite-strings >> -Wcast-qual
Re: [match-and-simplify] express conversions in transform ?
On Tue, Aug 19, 2014 at 4:35 PM, Richard Biener wrote: > On Tue, Aug 19, 2014 at 12:18 PM, Prathamesh Kulkarni > wrote: >> I was wondering how to write transform that involves conversions >> (without using c_expr) ? >> for eg: >> (T1)(~(T2) X) -> ~(T1) X >> // T1, T2 has same precision and X is an integral type not narrower than T1, >> T2 >> >> this could be written as: >> (simplify >> (convert (bit_not (convert@0 integral_op_p@1))) >> if-expr >> transform) >> >> How would the transform be written ? >> with c_expr we could probably write as: (is that correct?) >> (bit_not { fold_convert (type, @1); }) > > No, unfortunately while that's correct for GENERIC it doesn't work > for GIMPLE code-gen. c_exprs are required to evaluate to > "atomics", that is, non-expressions. > >> I was wondering whether we should have an explicit support for >> conversions (something equivalent of "casting") ? >> sth like: >> (bit_not (cast type @0)) > > Indeed simply writing > >(bit_not (convert @1)) > > will make the code-generator "convert" @0 to its own type > (I knew the hack to simply use the first operands type is not > going to work in all cases... ;). Other conversion operators > include FIX_TRUNC_EXPR (float -> int), FLOAT_EXPR > (int -> float). > > For most cases it works by accident as the outermost expression > knows its type via 'type'. > > In the cases where we need to specify a type I'd like to avoid > making the type-converting operators take an additional operand. > Instead can we make it use a operator flag-with-argument? Like > > (bit_not (convert:type @1)) > > or for the type of a capture (convert:t@2 @1). > > We can also put in some more intelligence in automatically > determining the type to convert to. In your example we > know the bit_not is of type 'type' and this has to match > the type of its operands thus the 'convert' needs to convert > to 'type' as well. I'm sure there are cases we can't > auto-compute though, like chains of conversions (but not sure > they matter in practice). > > So let's try that - Micha will like more intelligence in the generator ;) > > I suggest to add a predicate to the generator conversion_op_p > to catch the various converting operators and then assert that > during code-gen we are going to create it with 'type' (outermost > type), not TREE_TYPE (ops[0]). Then start from the cases > we want to support and special-case (which requires passing > down the type of the enclosing expression). Um sorry, I am not sure if I understood that correctly. How would conversion_op_p know when to use 'type' and not TREE_TYPE (ops[0]) ? conversion_op_p would be used somewhat similar to in the attached patch (illustrate only) ? Thanks, Prathamesh > > Thanks, > Richard. > >> in case we want to cast to a type of another capture >> we could use: >> (foo (cast (type @0) @1)) ? // cast @1 to @0's type >> here type would be an operator that extracts type of the capture. >> >> Another use of having type as an operator I guess would be for >> type-matching (not sure if it's useful). >> for instance: >> (foo (bar @0 @1) @(type @0)@1) >> that checks if @0, @1 have same types. >> >> I was wondering on the same lines if we could introduce new >> keyword "precision" analogous to type ? 
>> (precision @0) would be short-hand for
>> TYPE_PRECISION (TREE_TYPE (@0))
>>
>> So far I found couple patterns in fold_unary_loc with conversions
>> in transform:
>> (T1)(X p+ Y) into ((T1)X p+ Y)
>> (T1)(~(T2) X) -> ~(T1) X
>> (T1) (X * Y) -> (T1) X * (T1) Y
>>
>> Thanks,
>> Prathamesh

Index: genmatch.c
===================================================================
--- genmatch.c	(revision 214258)
+++ genmatch.c	(working copy)
@@ -859,6 +859,18 @@ check_no_user_id (simplify *s)
 
 /* Code gen off the AST.  */
 
+bool
+conversion_op_p (e_operation *oper)
+{
+  if (strcmp (oper->op->id, "CONVERT_EXPR") != 0
+      && strcmp (oper->op->id, "FIX_TRUNC_EXPR") != 0
+      && strcmp (oper->op->id, "FLOAT_EXPR") != 0)
+    return false;
+
+  // ??? how to assert convert should use 'type'
+  return false;
+}
+
 void
 expr::gen_transform (FILE *f, const char *dest, bool gimple, int depth)
 {
@@ -875,9 +887,13 @@ expr::gen_transform (FILE *f, const char
       /* ??? Have another helper that is like gimple_build but may fail
 	 if seq == NULL.  */
       fprintf (f, "  if (!seq)\n"
-	       "    {\n"
-	       "      res = gimple_simplify (%s, TREE_TYPE (ops%d[0])",
+	       "    {\n");
+      if (!conversion_op_p (operation))
+	fprintf (f, "      res = gimple_simplify (%s, TREE_TYPE (ops%d[0])",
 	       operation->op->id, depth);
+      else
+	fprintf (f, "      res = gimple_simplify (%s, type", operation->op->id);
+
       for (unsigned i = 0; i < ops.length (); ++i)
 	fprintf (f, ", ops%d[%u]", depth, i);
       fprintf (f, ", seq, valueize);\n");
Re: LTO inhibiting dwarf lexical blocks output
On Wed, Aug 20, 2014 at 6:18 PM, Jan Hubicka wrote: >> On August 18, 2014 8:46:00 PM CEST, Jan Hubicka wrote: >> >> >> >> The following seems to fix it. In testing now. >> > >> >Will streaming as non-reference prevent DECL from being merged and >> >tails of BLOCK_VAR chains >> >to be corrupted? >> >> Yes, the decl ends up in the function section then, not the global types and >> decls one. > > Hmm, breaking one decl rule. tree-inliner used to do that years ago, too. > When a function declared > extern in local scope got inlined, we duplicated the node. I fixed that by > pushing these to > nonlocalized_list instead. Well, those decls are only there for debug info so nonlocalized stuff is only a memory optimization. I also remember trying to figure out a testcase convincing me we should stream that vector for LTO (we don't!). And I failed. > Perhaps we could do that earlier, in FE (or fixup in free lang data), for all > EXTERN decls > avoiding those duplications. So I concluded the same - either we should get rid of that or we should fix the FEs to put all non-automatics into this vector. Or simply put _all_ decls into a vector and not a DECL_CHAIN. Richard. > Honza >> >> Richard. >> >> > >> >Honza >> >> >> >> Richard. >> >> >> >> > Richard. >> >> > >> >> >> Thanks. >> >> >> Aldy >>
Re: Frame pointer optimization issues
On 21/08/14 00:24, Richard Henderson wrote: > On 08/20/2014 08:22 AM, Wilco Dijkstra wrote: >> 2. Change the mid-end to call _frame_pointer_required even when >> !flag_omit_frame_pointer. > > Um, it does that already. At least as far as I can see from > ira_setup_eliminable_regset and update_eliminables. > > It turns out to be much easier to re-enable a frame pointer for a given > function than to disable a frame pointer. Thus I believe that you should > approach -momit_leaf_frame_pointer as setting flag_omit_frame_pointer, and > then > re-enabling it in frame_pointer_required. This requires more than one line in > common/config/arch/arch.c, but it shouldn't be much more than ten. > >> A second issue with frame pointers is that update_eliminables() in reload1.c >> might set >> frame_pointer_needed to false without any checks. > > How? I don't see that path, since the very first thing update_eliminables > does > is call frame_pointer_required -- even before it calls can_eliminate. > > Incidentally, I was working on exactly this (plus improving the unwind info) > before I left on vacation a couple weeks ago. Note that you'll also need to > remove x29 from the fixed registers before eliminating the frame pointer does > any real good. Removing x29 from the list of fixed registers will cause any code relying on a frame chain to crash horribly (external profiling agents, for example); this conforms to the second option for frame-pointer use in AAPCS64. I've seen very little code that really benefits from an additional register here (performance mostly comes from savings in the prologue/epilogue), so I think users should have to explicitly remove it from the fixed list (-fcall-saved-x29) if that's their preference. R.
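For reference, a rough sketch of the approach Richard Henderson describes above (the hook and option names are placeholders, not an existing port; the idea is that -momit-leaf-frame-pointer implies -fomit-frame-pointer and the target hook then forces the frame pointer back on in non-leaf functions):

  /* Hypothetical TARGET_FRAME_POINTER_REQUIRED implementation.  */
  static bool
  arch_frame_pointer_required (void)
  {
    /* When the user only asked to omit the frame pointer in leaves,
       keep it in every non-leaf function.  */
    if (TARGET_OMIT_LEAF_FRAME_POINTER /* assumed option macro */
        && !crtl->is_leaf)
      return true;
    return false;
  }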
Re: [match-and-simplify] express conversions in transform ?
On Thu, Aug 21, 2014 at 10:42 AM, Prathamesh Kulkarni wrote: > On Tue, Aug 19, 2014 at 4:35 PM, Richard Biener > wrote: >> On Tue, Aug 19, 2014 at 12:18 PM, Prathamesh Kulkarni >> wrote: >>> I was wondering how to write transform that involves conversions >>> (without using c_expr) ? >>> for eg: >>> (T1)(~(T2) X) -> ~(T1) X >>> // T1, T2 has same precision and X is an integral type not narrower than >>> T1, T2 >>> >>> this could be written as: >>> (simplify >>> (convert (bit_not (convert@0 integral_op_p@1))) >>> if-expr >>> transform) >>> >>> How would the transform be written ? >>> with c_expr we could probably write as: (is that correct?) >>> (bit_not { fold_convert (type, @1); }) >> >> No, unfortunately while that's correct for GENERIC it doesn't work >> for GIMPLE code-gen. c_exprs are required to evaluate to >> "atomics", that is, non-expressions. >> >>> I was wondering whether we should have an explicit support for >>> conversions (something equivalent of "casting") ? >>> sth like: >>> (bit_not (cast type @0)) >> >> Indeed simply writing >> >>(bit_not (convert @1)) >> >> will make the code-generator "convert" @0 to its own type >> (I knew the hack to simply use the first operands type is not >> going to work in all cases... ;). Other conversion operators >> include FIX_TRUNC_EXPR (float -> int), FLOAT_EXPR >> (int -> float). >> >> For most cases it works by accident as the outermost expression >> knows its type via 'type'. >> >> In the cases where we need to specify a type I'd like to avoid >> making the type-converting operators take an additional operand. >> Instead can we make it use a operator flag-with-argument? Like >> >> (bit_not (convert:type @1)) >> >> or for the type of a capture (convert:t@2 @1). >> >> We can also put in some more intelligence in automatically >> determining the type to convert to. In your example we >> know the bit_not is of type 'type' and this has to match >> the type of its operands thus the 'convert' needs to convert >> to 'type' as well. I'm sure there are cases we can't >> auto-compute though, like chains of conversions (but not sure >> they matter in practice). >> >> So let's try that - Micha will like more intelligence in the generator ;) >> >> I suggest to add a predicate to the generator conversion_op_p >> to catch the various converting operators and then assert that >> during code-gen we are going to create it with 'type' (outermost >> type), not TREE_TYPE (ops[0]). Then start from the cases >> we want to support and special-case (which requires passing >> down the type of the enclosing expression). > Um sorry, I am not sure if I understood that correctly. > How would conversion_op_p know when to use 'type' and not TREE_TYPE (ops[0]) ? > conversion_op_p would be used somewhat similar to in the attached > patch (illustrate only) ? Yeah, kind of. Attached is my try which handles (bit_not (convert (plus @0 (convert @1 fine (converting @1 to the type of @0) but obviously not (convert (convert @1)) which is impossible to auto-guess and not (bit_not (convert (plus (convert @0) @1))) though that can be made work by first generating code for @1 and then for (convert @0) so it can access the type of @1 to convert to its type. I'll give the patch proper testing and will install it. Thanks, Richard. > Thanks, > Prathamesh >> >> Thanks, >> Richard. >> >>> in case we want to cast to a type of another capture >>> we could use: >>> (foo (cast (type @0) @1)) ? // cast @1 to @0's type >>> here type would be an operator that extracts type of the capture. 
>>> >>> Another use of having type as an operator I guess would be for >>> type-matching (not sure if it's useful). >>> for instance: >>> (foo (bar @0 @1) @(type @0)@1) >>> that checks if @0, @1 have same types. >>> >>> I was wondering on the same lines if we could introduce new >>> keyword "precision" analogous to type ? >>> (precision @0) would be short-hand for >>> TYPE_PRECISION (TREE_TYPE (@0)) >>> >>> So far I found couple patterns in fold_unary_loc with conversions >>> in transform: >>> (T1)(X p+ Y) into ((T1)X p+ Y) >>> (T1)(~(T2) X) -> ~(T1) X >>> (T1) (X * Y) -> (T1) X * (T1) Y >>> >>> Thanks, >>> Prathamesh p2-2 Description: Binary data
RE: Frame pointer optimization issues
> From: gcc-ow...@gcc.gnu.org [mailto:gcc-ow...@gcc.gnu.org] On Behalf Of > Wilco Dijkstra > > One could hack this a bit further and set flag_omit_frame_pointer = 2 to > differentiate between a > user setting and the override hack, but that's just making things even worse. > So I see 3 possible > solutions: > > 1. Add a copy of flag_omit_frame_pointer, and only modify that in the > override. This is the generic > correct solution that allows any kind of modifications on the copies. This > could be done by making > all flags separate variables and automating the copy in the options parsing > code. Any code that > writes the x_flag_ variables should eventually be fixed to stop doing this to > avoid these bugs (i386 > does this 22 times and c6x 2x). I just sent a patch [1] following a similar approach for a problem on ARM target when compiling for thumb1 and optimizing for size. In this case call_used_regs will be modified to not save some registers in the prologue, ignoring any value the user might have set. [1] https://gcc.gnu.org/ml/gcc-patches/2014-08/msg01970.html So I think a general solution would be useful. Best regards, Thomas
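A sketch of what option 1 quoted above might look like (the names are illustrative only, not an existing port; the point is that the user-visible flag is copied once in the option-override hook and never written back):

  /* Hypothetical target-private copy of the user's setting.  */
  static int arch_omit_frame_pointer;

  static void
  arch_option_override (void)
  {
    /* Record what the user asked for; later code reads the copy
       instead of modifying flag_omit_frame_pointer itself.  */
    arch_omit_frame_pointer = flag_omit_frame_pointer;
  }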
Re: [match-and-simplify] express conversions in transform ?
On Thu, Aug 21, 2014 at 11:57 AM, Richard Biener wrote: > On Thu, Aug 21, 2014 at 10:42 AM, Prathamesh Kulkarni > wrote: >> On Tue, Aug 19, 2014 at 4:35 PM, Richard Biener >> wrote: >>> On Tue, Aug 19, 2014 at 12:18 PM, Prathamesh Kulkarni >>> wrote: I was wondering how to write transform that involves conversions (without using c_expr) ? for eg: (T1)(~(T2) X) -> ~(T1) X // T1, T2 has same precision and X is an integral type not narrower than T1, T2 this could be written as: (simplify (convert (bit_not (convert@0 integral_op_p@1))) if-expr transform) How would the transform be written ? with c_expr we could probably write as: (is that correct?) (bit_not { fold_convert (type, @1); }) >>> >>> No, unfortunately while that's correct for GENERIC it doesn't work >>> for GIMPLE code-gen. c_exprs are required to evaluate to >>> "atomics", that is, non-expressions. >>> I was wondering whether we should have an explicit support for conversions (something equivalent of "casting") ? sth like: (bit_not (cast type @0)) >>> >>> Indeed simply writing >>> >>>(bit_not (convert @1)) >>> >>> will make the code-generator "convert" @0 to its own type >>> (I knew the hack to simply use the first operands type is not >>> going to work in all cases... ;). Other conversion operators >>> include FIX_TRUNC_EXPR (float -> int), FLOAT_EXPR >>> (int -> float). >>> >>> For most cases it works by accident as the outermost expression >>> knows its type via 'type'. >>> >>> In the cases where we need to specify a type I'd like to avoid >>> making the type-converting operators take an additional operand. >>> Instead can we make it use a operator flag-with-argument? Like >>> >>> (bit_not (convert:type @1)) >>> >>> or for the type of a capture (convert:t@2 @1). >>> >>> We can also put in some more intelligence in automatically >>> determining the type to convert to. In your example we >>> know the bit_not is of type 'type' and this has to match >>> the type of its operands thus the 'convert' needs to convert >>> to 'type' as well. I'm sure there are cases we can't >>> auto-compute though, like chains of conversions (but not sure >>> they matter in practice). >>> >>> So let's try that - Micha will like more intelligence in the generator ;) >>> >>> I suggest to add a predicate to the generator conversion_op_p >>> to catch the various converting operators and then assert that >>> during code-gen we are going to create it with 'type' (outermost >>> type), not TREE_TYPE (ops[0]). Then start from the cases >>> we want to support and special-case (which requires passing >>> down the type of the enclosing expression). >> Um sorry, I am not sure if I understood that correctly. >> How would conversion_op_p know when to use 'type' and not TREE_TYPE (ops[0]) >> ? >> conversion_op_p would be used somewhat similar to in the attached >> patch (illustrate only) ? > > Yeah, kind of. Attached is my try which handles > >(bit_not (convert (plus @0 (convert @1 > > fine (converting @1 to the type of @0) but obviously not > >(convert (convert @1)) > > which is impossible to auto-guess and not > > (bit_not (convert (plus (convert @0) @1))) > > though that can be made work by first generating code for @1 > and then for (convert @0) so it can access the type of @1 > to convert to its type. > > I'll give the patch proper testing and will install it. 
And if we really need sth like (convert (convert @0)) then this can either use a c-expr or we can invent sth like (convert (match-type @1 (convert @0))) that is, add a "fake" binary operation to guess the type. Richard. > Thanks, > Richard. > >> Thanks, >> Prathamesh >>> >>> Thanks, >>> Richard. >>> in case we want to cast to a type of another capture we could use: (foo (cast (type @0) @1)) ? // cast @1 to @0's type here type would be an operator that extracts type of the capture. Another use of having type as an operator I guess would be for type-matching (not sure if it's useful). for instance: (foo (bar @0 @1) @(type @0)@1) that checks if @0, @1 have same types. I was wondering on the same lines if we could introduce new keyword "precision" analogous to type ? (precision @0) would be short-hand for TYPE_PRECISION (TREE_TYPE (@0)) So far I found couple patterns in fold_unary_loc with conversions in transform: (T1)(X p+ Y) into ((T1)X p+ Y) (T1)(~(T2) X) -> ~(T1) X (T1) (X * Y) -> (T1) X * (T1) Y Thanks, Prathamesh
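To make the conclusion of this thread concrete, the (T1)(~(T2) X) -> ~(T1) X example from the start would then look roughly like this (a sketch only, in the pattern syntax used in this thread; the exact spelling of the condition depends on the final grammar):

  (simplify
    (convert (bit_not (convert@0 @1)))
    (if (TYPE_PRECISION (type) == TYPE_PRECISION (TREE_TYPE (@0))
         && TYPE_PRECISION (TREE_TYPE (@1)) >= TYPE_PRECISION (type))
     (bit_not (convert @1))))

relying on the generator to pick 'type' (the type of the outermost expression) for the convert in the result, as discussed above.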
Possible "C++ for embedded systems" WG21 working group
Hi folks, since the last WG21 meeting held at Rapperswil, where I gave a presentation on issues and opportunities for improvement in the C++ language regarding embedded systems development, I'm pursuing the creation of a Committee Working Group. There was agreement that the issues were relevant and I was told that the group could be created if more people are interested and involved. I created a mailing list where these issues are being discussed and some proposals are being drafted, to be presented at the next WG21 meeting at Urbana. If anyone is interested in participating, please send me an email (daniel dot gutson at taller technologies dot com) telling me whether you have ever attended a WG21 meeting and/or would be willing to participate in one. Having gcc and clang maintainers involved would be great, to bring an implementer POV into the discussions. Thanks! Daniel. -- Daniel F. Gutson Chief Engineering Officer, SPD San Lorenzo 47, 3rd Floor, Office 5 Córdoba, Argentina Phone: +54 351 4217888 / +54 351 4218211 Skype: dgutson
RE: Frame pointer optimization issues
> Richard Henderson wrote: > On 08/20/2014 08:22 AM, Wilco Dijkstra wrote: > > 2. Change the mid-end to call _frame_pointer_required even when > > !flag_omit_frame_pointer. > > Um, it does that already. At least as far as I can see from > ira_setup_eliminable_regset and update_eliminables. No, in ira_setup_eliminable_regset the frame pointer is always forced if !flag_omit_frame_pointer without allowing frame_pointer_required to override it: frame_pointer_needed = (! flag_omit_frame_pointer ... || targetm.frame_pointer_required ()); This would allow targets to choose whether to do leaf frame pointer optimization: frame_pointer_needed = ((! flag_omit_frame_pointer && targetm.frame_pointer_required ()) > It turns out to be much easier to re-enable a frame pointer for a given > function than to disable a frame pointer. Thus I believe that you should > approach -momit_leaf_frame_pointer as setting flag_omit_frame_pointer, and > then > re-enabling it in frame_pointer_required. This requires more than one line in > common/config/arch/arch.c, but it shouldn't be much more than ten. As I explained it is not correct to force flag_omit_frame_pointer to be true. This is what is done today and it fails in various cases. So unless the way options are handled is changed, this possibility is out. > > A second issue with frame pointers is that update_eliminables() in > > reload1.c might set > > frame_pointer_needed to false without any checks. > > How? I don't see that path, since the very first thing update_eliminables > does > is call frame_pointer_required -- even before it calls can_eliminate. Update_eliminables() does indeed call frame_pointer_required at the start; however, this only blocks elimination *from* HARD_FRAME_POINTER_REGNUM, while the code at the end clears frame_pointer_needed if FRAME_POINTER_REGNUM can be eliminated into any register other than HARD_FRAME_POINTER_REGNUM. The middle bit of the function is not relevant as HARD_FRAME_POINTER_REGNUM should only be eliminable into SP (but even if, say, it could be eliminable into another register X, it will only block eliminations of X to SP). So frame_pointer_needed can be cleared even when frame_pointer_required is true... In principle if this function worked reliably then we could implement leaf FPO using this mechanism. Unfortunately it doesn't: update_eliminables is not called in trivial leaf functions even when can_eliminate always returns true, so the frame pointer is never removed. Additionally I'd be worried about compilation performance as it would introduce extra register allocation passes for ~50% of functions. Wilco
[NEW PLATFORM] [HELP] GCC on the Broadcom VideoCore IV VPU
The BCM2835 (the RPi chip) has a general-purpose CPU with a custom ISA and SIMD extensions (and no MMU). As I develop a FOSS blob for the Pi (github.com/freeblob), I need something less crappy than the ACK (it can only create 5 vars per function). The GCC port for VC4 has mangled stack frames and is available at github.com/mm120/gcc-vc4, branch vc4. Can anyone figure out what is wrong in the code and submit a patch (pull request, mainlining, or a custom repo)? Thanks in advance.
gcc-4.8-20140821 is now available
Snapshot gcc-4.8-20140821 is now available on ftp://gcc.gnu.org/pub/gcc/snapshots/4.8-20140821/ and on various mirrors, see http://gcc.gnu.org/mirrors.html for details. This snapshot has been generated from the GCC 4.8 SVN branch with the following options: svn://gcc.gnu.org/svn/gcc/branches/gcc-4_8-branch revision 214296 You'll find: gcc-4.8-20140821.tar.bz2 Complete GCC MD5=e88381671a58922a04cc9f287e8ed8a8 SHA1=d834783f4f9b9b09f6bb4babfbab9097ab9e6317 Diffs from 4.8-20140814 are available in the diffs/ subdirectory. When a particular snapshot is ready for public consumption the LATEST-4.8 link is updated and a message is sent to the gcc list. Please do not use a snapshot before it has been announced that way.
Is that a problem?
Hi, After applying a patch to GCC to make it warn about strict aliasing violations, like this:

diff --git a/gcc/tree-inline.c b/gcc/tree-inline.c
index b6ecaa4..95e745c 100644
--- a/gcc/tree-inline.c
+++ b/gcc/tree-inline.c
@@ -2913,6 +2913,10 @@ setup_one_parameter (copy_body_data *id, tree p, tree value, tree fn,
 	}
     }
+  if (warn_strict_aliasing > 2)
+    if (strict_aliasing_warning (TREE_TYPE (rhs), TREE_TYPE(p), rhs))
+      warning (OPT_Wstrict_aliasing, "during inlining function %s into function %s", fndecl_name(fn), function_name(cfun));
+

compiling gcc/testsuite/g++.dg/opt/pmf1.C triggers that warning:

gcc/testsuite/g++.dg/opt/pmf1.C: In function 'int main()':
gcc/testsuite/g++.dg/opt/pmf1.C:72:42: warning: dereferencing type-punned pointer will break strict-aliasing rules. With expression: &t, type of expresssion: struct Container *, type to cast: struct WorldObject * const. [-Wstrict-aliasing]
 t.forward(itemfunptr, &Item::fred, 1);
                                      ^
gcc/testsuite/g++.dg/opt/pmf1.C:72:42: warning: during inlining function void WorldObject::forward(memfunT, arg1T, arg2T) [with memfunT = void (Container::*)(void (Item::*)(int), int); arg1T = void (Item::*)(int); arg2T = int; Derived = Container] into function int main() [-Wstrict-aliasing]

Is that a problem here? We try to cast type Container to its base type WorldObject; does that violate strict aliasing? Let's take a look at this. -- Lin Zuojian
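For reference, a stripped-down hypothetical example of the shape of code being flagged (this is not the actual pmf1.C test, which uses a template base class; it only shows why the reported cast looks suspicious to a purely type-based check even though it is an ordinary derived-to-base conversion):

  // Converting a derived object's address to its base class is a
  // well-defined conversion, not type punning.
  struct WorldObject { int id; };
  struct Container : WorldObject { };

  int get_id (Container *t)
  {
    WorldObject *w = t;   // derived-to-base pointer conversion
    return w->id;         // reads the Container's base subobject
  }

The question in the mail is whether warning on such a conversion, as happens at the inlined call boundary above, is a false positive.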
Re: Is that a problem?
Hi, I know what is going on now: strict_aliasing_warning does not take TBAA into account. We might want to fix it. -- Lin Zuojian
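If the fix is indeed to make the check TBAA-aware, one possible direction (purely an illustrative sketch, not a tested patch; it assumes the otype/type parameters of strict_aliasing_warning and leaves the surrounding logic out) is to ask the alias machinery whether the two types may alias before warning:

  /* Sketch: bail out instead of warning when TBAA says the access is OK.  */
  alias_set_type set1 = get_alias_set (otype);
  alias_set_type set2 = get_alias_set (type);
  if (set1 == 0 || set2 == 0 || alias_sets_conflict_p (set1, set2))
    return false;   /* the types may alias, so there is nothing to report */

Whether this belongs in strict_aliasing_warning itself or in the new call site in the inliner is left open here.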