include tree.h instead of tree-core.h in expr.h
In expr.h:

/* For tree_fits_[su]hwi_p, tree_to_[su]hwi, fold_convert, size_binop,
   ssize_int, TREE_CODE, TYPE_SIZE, int_size_in_bytes,  */
#include "tree-core.h"

However, the functions tree_to_shwi(), tree_fits_shwi_p(), etc. are declared in tree.h, not in tree-core.h.

This is not a problem because tree_to_shwi() gets called from within macros in expr.h (e.g. from the ADD_PARM_SIZE and SUB_PARM_SIZE macros), and when these macros are used, tree.h has already been included in that translation unit before expr.h (for example, alias.c includes tree.h before expr.h).

I am replacing macros with static inline functions (http://gcc.gnu.org/ml/gcc/2013-12/msg00148.html). Changing ADD_PARM_SIZE() from a macro to a function now gives an error that tree_to_shwi() (and other things like ssizetype) are not declared in scope. Including tree.h removes these errors.

Would it be better to include tree.h instead of tree-core.h (tree.h includes tree-core.h anyway), or shall I leave these macros untouched?

Thanks and Regards,
Prathamesh
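The difference described above can be reproduced outside GCC. The following is a minimal sketch, not GCC code: struct node, node_value, ADD_SIZE_MACRO, add_size and demo_total are all hypothetical names standing in for tree, tree_to_shwi and ADD_PARM_SIZE. It shows that a macro body may mention a function that is declared only later in the translation unit, while an inline function body needs the declaration to be visible at the point of definition.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for a "core" header: only the type name is visible here.  */
struct node;

/* A macro body is not compiled at this point, so mentioning node_value ()
   is harmless even though nothing has declared it yet.  */
#define ADD_SIZE_MACRO(TO, N) ((TO) += node_value (N))

/* Stand-in for the fuller header that arrives later in the translation
   unit (the role tree.h plays for expr.h in the message above).  */
struct node { long value; };

static long
node_value (const struct node *n)
{
  return n->value;
}

/* An inline function body is compiled where it is defined, so it has to
   come after the declaration of node_value ().  Moving this definition
   above node_value () is expected to be diagnosed by the compiler.  */
static inline void
add_size (long *to, const struct node *n)
{
  assert (to != NULL && n != NULL);
  *to += node_value (n);
}

/* Uses both forms once and returns the running total.  */
static long
demo_total (void)
{
  struct node n = { 40 };
  long total = 2;
  ADD_SIZE_MACRO (total, &n);	/* Expanded here, after the declaration.  */
  add_size (&total, &n);
  return total;
}
```

Calling demo_total () from a test program exercises both forms with the same operand. The ordering constraint on add_size is the point: a header that defines such a function must itself provide or include the declarations it uses.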
Re: include tree.h instead of tree-core.h in expr.h
On Wed, Dec 18, 2013 at 6:57 AM, Prathamesh Kulkarni wrote:
> Would it be better to include tree.h instead of tree-core.h (tree.h
> includes tree-core.h anyway), or shall I leave these macros untouched?

Better leave these macros intact for now. We are trying to flatten out the #include tree. Adding tree.h to another header goes in the opposite direction.

Please add a note describing the conflict.

Diego.
Re: include tree.h instead of tree-core.h in expr.h
On 12/18/2013 08:08 AM, Diego Novillo wrote:
> On Wed, Dec 18, 2013 at 6:57 AM, Prathamesh Kulkarni wrote:
>> Would it be better to include tree.h instead of tree-core.h (tree.h
>> includes tree-core.h anyway), or shall I leave these macros untouched?
>
> Better leave these macros intact for now. We are trying to flatten out
> the #include tree. Adding tree.h to another header goes in the
> opposite direction.
>
> Please add a note describing the conflict.

Looks like function.c is the primary user of {ADD,SUB}_PARM_SIZE, with a single use of ADD_PARM_SIZE in calls.c. I'd suggest moving both new functions to function.c and exporting the prototype for add_parm_size() in function.h. calls.c already includes function.h.

I can't imagine that call to ADD_PARM_SIZE in calls.c having much impact on compile time...

Andrew
Re: include tree.h instead of tree-core.h in expr.h
On Wed, Dec 18, 2013 at 8:20 AM, Andrew MacLeod wrote:
> Looks like function.c is the primary user of {ADD,SUB}_PARM_SIZE, with a
> single use of ADD_PARM_SIZE in calls.c. I'd suggest moving both new
> functions to function.c and exporting the prototype for add_parm_size() in
> function.h. calls.c already includes function.h.
>
> I can't imagine that call to ADD_PARM_SIZE in calls.c having much impact on
> compile time...

Ah, yes, if the usage pattern of these macros is so simple, that's a better option.

Diego.
Re: include tree.h instead of tree-core.h in expr.h
On Wed, Dec 18, 2013 at 6:54 PM, Diego Novillo wrote:
> On Wed, Dec 18, 2013 at 8:20 AM, Andrew MacLeod wrote:
>> Looks like function.c is the primary user of {ADD,SUB}_PARM_SIZE, with a
>> single use of ADD_PARM_SIZE in calls.c. I'd suggest moving both new
>> functions to function.c and exporting the prototype for add_parm_size() in
>> function.h. calls.c already includes function.h.
>
> Ah, yes, if the usage pattern of these macros is so simple, that's a
> better option.

ADD_PARM_SIZE is called at 4 places from the following callers:

   File        Function                         Line  Call
0  calls.c     initialize_argument_information  1356  ADD_PARM_SIZE (*args_size, args[i].locate.size.var);
1  function.c  assign_parm_is_stack_parm        2566  ADD_PARM_SIZE (all->stack_args_size, data->locate.size.var);
2  function.c  locate_and_pad_parm              3866  ADD_PARM_SIZE (locate->size, sizetree);
3  function.c  pad_below                        3959  ADD_PARM_SIZE (*offset_ptr, s2);

As suggested by Andrew, I shall move them into function.c and export their prototype in function.h.

Thanks and Regards,
Prathamesh
Re: include tree.h instead of tree-core.h in expr.h
On Wed, Dec 18, 2013 at 7:13 PM, Prathamesh Kulkarni wrote:
> ADD_PARM_SIZE is called at 4 places from the following callers:
> [...]

Sorry for the bad formatting. ADD_PARM_SIZE is called at the following places:
http://pastebin.com/SiUfkX3F

> As suggested by Andrew, I shall move them into function.c and export
> their prototype in function.h
RE: Question about omp-low.c
> -----Original Message-----
> From: Jakub Jelinek [mailto:ja...@redhat.com]
> Sent: Wednesday, December 18, 2013 1:58 AM
> To: Iyer, Balaji V
> Cc: Jason Merrill (ja...@redhat.com); 'gcc@gcc.gnu.org'
> Subject: Re: Question about omp-low.c
>
> On Wed, Dec 18, 2013 at 04:46:40AM +, Iyer, Balaji V wrote:
> > I have a question regarding the parallel for implementation. I am
> > implementing _Cilk_for based on the routines in omp-low.c and I would like
> > to create a child function but would like to move the items that
> > gimplify_omp_for inserts in for_pre_body in the top-level function. I need to
> > do this because in _Cilk_for, we insert the body of the function into a child
> > function and then we call a builtin CilkPlus function called
> > __cilkrts_cilk_for_64 and pass in the child function's name, data pointer,
> > loop-count and grain.
> >
> > The loop count computation gets to be an issue in C++ when we use
> > iterator.
> >
> > For example, if we have something like this:
> > Vector array;
> > For (vector::iterator iter = array.begin(); iter != array.end
> > (); iter++)
>
> OpenMP also supports C++ iterators, so I don't see why you don't follow
> that.
> The iterators are lowered already by the C++ FE, what the middle-end sees is
> an integral iterator. Just look at one of the several
> libgomp/testsuite/libgomp.c++/for-* testcases.

I think we are talking about two different things, or I am not understanding you. I am using OMP's iterator handling mechanism, but the question I have is about the pre-body. It is pre-gimplifying the condition and the initial value and storing them in pre_body. The pre-body is then pushed into the child function. I want the pre-body to be pushed into the main function. Is it possible for me to do that?

> By following what we do for OpenMP here, I'd hope you can get rid of the
> loop_count you've added to the gimple structure, what is grain, is that
> specific to each of the collapsed trees, or does Cilk+ support only
> collapse(1),
> and if so or if it is global for the _Cilk_for and not for each iterator,
> just add a
> clause for it instead.

Grain is a value that the user specifies and that is passed directly to the runtime to tell it how to divide the work (different from step-size, please see cilk_for_grain.c in my patch). It is very Cilk runtime specific.

I need loop-count because I have to pass it into the Cilk runtime function call (along with grain). I calculate this value well before the gimplification of the condition, initial value and step-size. Now, if I can somehow find a way to make GCC move the pre-body to the parent function instead of the child function, I can try to eliminate this field from the structure.

> Jakub
Re: Question about omp-low.c
On Wed, Dec 18, 2013 at 03:29:04PM +, Iyer, Balaji V wrote:
> I think we are talking two different things or I am not understanding you.
> I am using OMP's iterator handling mechanism, but the question I have is
> about the pre-body. It is pre-gimplifying the condition and the initial
> value and storing it in pre_body. The prebody is then pushed into the
> child function. I want the pre-body to be pushed into main function. Is
> it possible for me to do that?

Well, unless you are adding a parallel region around your GIMPLE_FOR variant explicitly, you'd automatically evaluate the pre_body before that. If you have an artificial parallel there, you need to take care of it, e.g. during gimplification for your variant.

What I was complaining about is:

@@ -523,6 +524,12 @@ struct GTY(()) gimple_omp_for_iter {
   /* Increment.  */
   tree incr;
+
+  /* Loop count, only used by _Cilk_for.  */
+  tree loop_count;
+
+  /* Grain value, only used by _Cilk_for.  */
+  tree grain;
 };

 /* GIMPLE_OMP_FOR */

Don't do this, compute the loop count during omp expansion (there is already code that does that for you; after all, for #pragma omp for the loop count is typically (unless static schedule) passed as a parameter to the runtime as well). And for grain, really use an artificial clause; it can be called CILK_CLAUSE__GRAIN_ (similar to how we put _s around the LOOPTEMP or SIMDUID clause names, those are also artificial clauses).

Jakub
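The suggestion to carry grain as an artificial clause rather than as a new struct field can be illustrated with a toy model. This is only a sketch under stated assumptions: enum clause_code, struct clause, chain_clause, find_clause and grain_or_default are hypothetical and are not GCC's tree or gimple data structures. The idea it shows is that a construct-specific value rides on the existing clause chain and is looked up on demand, so constructs that never use it pay nothing.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical clause codes; CLAUSE_GRAIN plays the role of the
   artificial clause suggested above.  */
enum clause_code { CLAUSE_SCHEDULE, CLAUSE_COLLAPSE, CLAUSE_GRAIN };

/* One link of a clause chain.  The operand is a plain long here for
   simplicity.  */
struct clause
{
  enum clause_code code;
  long operand;
  struct clause *next;
};

/* Fill in NODE (storage owned by the caller) and prepend it to CHAIN.  */
static struct clause *
chain_clause (struct clause *node, enum clause_code code, long operand,
	      struct clause *chain)
{
  assert (node != NULL);
  node->code = code;
  node->operand = operand;
  node->next = chain;
  return node;
}

/* Return the first clause with code CODE, or NULL if there is none.  */
static const struct clause *
find_clause (const struct clause *chain, enum clause_code code)
{
  for (; chain != NULL; chain = chain->next)
    if (chain->code == code)
      return chain;
  return NULL;
}

/* Grain requested by the clause chain, or 0 when no grain clause is
   present (the default of 0 is an assumption of this sketch).  */
static long
grain_or_default (const struct clause *chain)
{
  const struct clause *c = find_clause (chain, CLAUSE_GRAIN);
  return c != NULL ? c->operand : 0;
}
```

A loop without a grain clause simply has no such link in its chain, which is the contrast with adding a field that every loop statement would carry.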
RE: Question about omp-low.c
> Don't do this, compute loop count during omp expansion (there is already
> code that does that for you, after all, for #pragma omp for the loop count is
> typically (unless static schedule) passed as parameter to the runtime as well.

Where does this happen? Is there a routine that you can point me to that will compute the loop-count?

Thanks,

Balaji V. Iyer.
Re: Question about omp-low.c
On Wed, Dec 18, 2013 at 04:16:57PM +, Iyer, Balaji V wrote:
> > Don't do this, compute loop count during omp expansion (there is already
> > code that does that for you, after all, for #pragma omp for the loop count
> > is typically (unless static schedule) passed as parameter to the runtime as
> > well.
>
> Where does this happen? Is there a routine that you can point me to that will
> compute the loop-count?

E.g. extract_omp_for_data (that does that only for collapse>1 though), otherwise you get from that routine just n1, n2, step and cond_code and from that you can easily compute it as extract_omp_for_data does:

  tree itype = TREE_TYPE (loop->v);

  if (POINTER_TYPE_P (itype))
    itype = signed_type_for (itype);
  t = build_int_cst (itype, (loop->cond_code == LT_EXPR ? -1 : 1));
  t = fold_build2_loc (loc, PLUS_EXPR, itype,
                       fold_convert_loc (loc, itype, loop->step), t);
  t = fold_build2_loc (loc, PLUS_EXPR, itype, t,
                       fold_convert_loc (loc, itype, loop->n2));
  t = fold_build2_loc (loc, MINUS_EXPR, itype, t,
                       fold_convert_loc (loc, itype, loop->n1));
  if (TYPE_UNSIGNED (itype) && loop->cond_code == GT_EXPR)
    t = fold_build2_loc (loc, TRUNC_DIV_EXPR, itype,
                         fold_build1_loc (loc, NEGATE_EXPR, itype, t),
                         fold_build1_loc (loc, NEGATE_EXPR, itype,
                                          fold_convert_loc (loc, itype,
                                                            loop->step)));
  else
    t = fold_build2_loc (loc, TRUNC_DIV_EXPR, itype, t,
                         fold_convert_loc (loc, itype, loop->step));

Jakub
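The tree-building code quoted above amounts to the arithmetic (step + adj + n2 - n1) / step, with adj = -1 for a < condition and adj = +1 for a > condition, using truncating division. Here is a minimal sketch of that arithmetic on plain signed integers; enum cond_code and loop_count are hypothetical names, not GCC identifiers, and the unsigned GT_EXPR branch (which negates both operands before dividing) is not needed for signed values and is left out.

```c
#include <assert.h>

/* Hypothetical stand-ins for LT_EXPR and GT_EXPR.  */
enum cond_code { COND_LT, COND_GT };

/* Number of iterations of
     for (v = n1; v < n2; v += step)   when COND is COND_LT, step > 0
     for (v = n1; v > n2; v += step)   when COND is COND_GT, step < 0
   computed in the same order as the quoted fold_build2_loc calls.
   Zero-trip loops with n1 already past n2 are assumed to be handled
   by the caller; this sketch does not clamp the result.  */
static long
loop_count (long n1, long n2, long step, enum cond_code cond)
{
  assert (step != 0);
  long t = (cond == COND_LT) ? -1 : 1;	/* Rounding adjustment.  */
  t = step + t;
  t = t + n2;
  t = t - n1;
  return t / step;			/* C division truncates, like TRUNC_DIV_EXPR.  */
}
```

For example, n1 = 0, n2 = 10, step = 3 visits 0, 3, 6, 9, and the formula gives (3 - 1 + 10 - 0) / 3 = 4. Counting down with n1 = 10, n2 = 0, step = -3 visits 10, 7, 4, 1, and the formula gives (-3 + 1 + 0 - 10) / -3 = 4.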
bootstrap failure powerpc64 FreeBSD r206072
Hi,

the revision 206072 causes here on FreeBSD powerpc64 a bootstrap failure in stage 3. I'm a bit confused. What would you need from me to help me analyze the situation? PR plus stage 3 preprocessed source of tree-ssa-ifcombine.c?

I'm a bit out of sync regarding gcc development, so please bear with me and update me with the latest requirements/procedures needed to track such an issue.

Thanks,
Andreas

--
cc1-checksum.o libbackend.a main.o tree-browser.o libcommon-target.a libcommon.a ../libcpp/libcpp.a ../libdecnumber/libdecnumber.a libcommon.a ../libcpp/libcpp.a ../libbacktrace/.libs/libbacktrace.a ../libiberty/libiberty.a ../libdecnumber/libdecnumber.a -L/usr/local/lib -lmpc -lmpfr -lgmp -rdynamic -L../zlib -lz
/usr/local/bin/ld: libbackend.a(tree-ssa-ifcombine.o): unknown relocation type 84952 for `*UND*'
/usr/local/bin/ld: libbackend.a(tree-ssa-ifcombine.o): unknown relocation type 84960 for `*UND*'
---

Revision 206070 bootstraps fine.

Binutils 2.24

config:
[tritium:head/objdir/gcc] andreast% ./xgcc -v
Using built-in specs.
COLLECT_GCC=./xgcc
Target: powerpc64-unknown-freebsd11.0
Configured with: /export/devel/net/src/gcc/head/gcc/configure --prefix=/export/build/src/gcc/head/testbin --with-gmp=/usr/local --disable-nls --with-as=/usr/local/bin/as --with-ld=/usr/local/bin/ld --disable-multilib --enable-languages=c,c++,fortran
Thread model: posix
RE: Question about omp-low.c
> -----Original Message-----
> From: Jakub Jelinek [mailto:ja...@redhat.com]
> Sent: Wednesday, December 18, 2013 11:28 AM
> To: Iyer, Balaji V
> Cc: Jason Merrill (ja...@redhat.com); 'gcc@gcc.gnu.org'
> Subject: Re: Question about omp-low.c
>
> E.g. extract_omp_for_data (that does that only for collapse>1 though),
> otherwise you get from that routine just n1, n2, step and cond_code and
> from that you can easily compute it as extract_omp_for_data does:
> [...]

Hi Jakub,

I looked into this, but the issue I have is, for the following code:

Int main (void) {
  _Cilk_for (int ii = W; ii < (X+Y); ii = ii + (q+z))
}

It will gimplify X+Y and q+z, put the results in new variables that are stored in pre_body, and replace the above code like this:

Int main (void) {
  _Cilk_for (int ii = W; ii < D1234; ii = ii + D1235)
}

The pre_body will have something like this:

D1234 = X + Y;
D1235 = q + z;

Finally, I need to replace the _Cilk_for with something like this in the main function:

Int main (void) {
  __cilkrts_cilk_for_64 (main.cilk_for_fn.0, &cilk_for_data, , grain);
}

During the expand_omp_for stage (or at any stage after gimplification), if I calculate the loop count, I should get something like this:

Loop_count = (D1234 - W) / D1235

Now, the pre_body is pushed into the child function, and thus the values calculated for D1234 and D1235 will be in the child function and not in main, which will result in getting the wrong loop count. This is why I need a loop count field: to calculate this value in the pre-gimplification phase and store it.

Now, if I can somehow make it push the pre-body into the main function and not into the child function, I won't have this issue anymore. Can you please suggest where the pre-body is being pushed into the child function and/or how I could stop it? I tried to walk through the code by putting a breakpoint in gimple_omp_for_pre_body_ptr and I am still not finding this...

In most OMP code I have seen, the loop count is inserted/used in the child function and thus this is not an issue.

Thanks,

Balaji V. Iyer.

> Jakub
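The ordering problem in this message can be pictured with ordinary C in place of GIMPLE. The sketch below is not the Cilk Plus runtime and not compiler output: toy_cilk_for, child_fn, struct loop_data and parent_loop are hypothetical, loosely modeled on the call described above (child function, data pointer, loop count, grain). It shows the arrangement being asked for, where the pre-body temporaries (named d1234 and d1235 after the D1234 and D1235 in the message) are evaluated in the parent so that the count is known before the runtime is called. The count uses the rounded-up form from the extract_omp_for_data snippet rather than the simplified (D1234 - W) / D1235.

```c
#include <assert.h>

/* Hypothetical outlined-body signature: run iterations [low, high).  */
typedef void (*loop_body_fn) (void *data, long low, long high);

/* Toy stand-in for the runtime entry point: hands the body COUNT
   iterations in chunks of GRAIN, sequentially.  */
static void
toy_cilk_for (loop_body_fn body, void *data, long count, long grain)
{
  if (grain <= 0)
    grain = 1;
  for (long low = 0; low < count; low += grain)
    {
      long high = low + grain < count ? low + grain : count;
      body (data, low, high);
    }
}

/* Values the child needs in order to rebuild ii from the iteration
   number.  */
struct loop_data
{
  long start;
  long step;
  long sum;
};

/* Outlined loop body: here it just accumulates every value of ii.  */
static void
child_fn (void *p, long low, long high)
{
  struct loop_data *d = p;
  for (long k = low; k < high; k++)
    d->sum += d->start + k * d->step;
}

/* Parent for: _Cilk_for (ii = W; ii < X + Y; ii = ii + (q + z)).  */
static long
parent_loop (long W, long X, long Y, long q, long z, long grain)
{
  /* Pre-body, evaluated in the parent.  */
  long d1234 = X + Y;
  long d1235 = q + z;
  assert (d1235 > 0);
  /* The count must exist here because the runtime call needs it.  */
  long count = (d1235 - 1 + d1234 - W) / d1235;
  struct loop_data data = { W, d1235, 0 };
  toy_cilk_for (child_fn, &data, count, grain);
  return data.sum;
}
```

If d1234 and d1235 were computed only inside child_fn, parent_loop would have nothing from which to derive count before the call, which is the situation the message describes.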
Re: Question about omp-low.c
On Thu, Dec 19, 2013 at 05:14:16AM +, Iyer, Balaji V wrote:
> I looked into this, but the issue I have is, for the following code:
>
> Int main (void) {
>   _Cilk_for (int ii = W; ii < (X+Y); ii = ii + (q+z))

This doesn't have a body, Int won't compile either.

Can you post -fdump-tree-{original,gimple,omplower,ompexp} dump for some short simple testcase just to see what design decisions you've made so far? Perhaps something should be reconsidered...

Jakub