Re: Re: [GSoC'19, libgomp work-stealing] Task parallelism runtime
Hi Jakub,

I think the current semaphore-based sleep system ought to be improved. I'm not sure how yet, but since the GSoC deadline is approaching, I'll just post the results without the semaphores.

Instead of sleeping on a per-task basis (for example, there are depend waits, task waits, taskgroup waits, etc.), I think we should simply put the threads to sleep when the queue is empty and wake them up whenever a task finishes executing or a new task is added to the queue. This shouldn't be too difficult to implement using semaphores. However, since the current gomp semaphores are not always the most performant, I'm not absolutely certain how to do this. I'll defer this until after GSoC. Let me know if you have an idea.

Ray Kim
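The proposal above (a single sleep condition, "queue empty", woken on "task added" and "task finished") can be sketched as follows. This is a minimal, hypothetical illustration, not libgomp code: it swaps in a POSIX mutex and condition variable for gomp semaphores, and `struct pool`, `pool_push`, `worker`, `pool_wait_idle`, `pool_shutdown` and `bump` are invented names.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

#define QCAP 64                 /* hypothetical fixed queue capacity */

typedef void (*task_fn) (void *);
struct task { task_fn fn; void *arg; };

struct pool
{
  pthread_mutex_t lock;
  pthread_cond_t wake;          /* broadcast on "task added" and "task finished" */
  struct task q[QCAP];          /* circular FIFO of pending tasks */
  size_t head, count;
  size_t running;               /* tasks currently executing */
  bool shutdown;
};

static void pool_init (struct pool *p)
{
  pthread_mutex_init (&p->lock, NULL);
  pthread_cond_init (&p->wake, NULL);
  p->head = p->count = p->running = 0;
  p->shutdown = false;
}

/* Enqueue a task and wake sleepers; returns false if the queue is full.  */
static bool pool_push (struct pool *p, task_fn fn, void *arg)
{
  bool ok = false;
  pthread_mutex_lock (&p->lock);
  if (p->count < QCAP)
    {
      p->q[(p->head + p->count) % QCAP] = (struct task) { fn, arg };
      p->count++;
      ok = true;
      pthread_cond_broadcast (&p->wake);
    }
  pthread_mutex_unlock (&p->lock);
  return ok;
}

/* Worker loop: the only reason a worker sleeps is an empty queue.  */
static void *worker (void *data)
{
  struct pool *p = data;
  pthread_mutex_lock (&p->lock);
  for (;;)
    {
      while (p->count == 0 && !p->shutdown)
        pthread_cond_wait (&p->wake, &p->lock);
      if (p->count == 0)        /* shutting down and nothing left to run */
        break;
      struct task t = p->q[p->head];
      p->head = (p->head + 1) % QCAP;
      p->count--;
      p->running++;
      pthread_mutex_unlock (&p->lock);
      t.fn (t.arg);             /* run the task without holding the lock */
      pthread_mutex_lock (&p->lock);
      p->running--;
      /* A finished task may satisfy a dependence or a taskwait, so wake
         every sleeper and let each one re-check its own condition.  */
      pthread_cond_broadcast (&p->wake);
    }
  pthread_mutex_unlock (&p->lock);
  return NULL;
}

/* taskwait-like wait: sleep until nothing is queued or running.  */
static void pool_wait_idle (struct pool *p)
{
  pthread_mutex_lock (&p->lock);
  while (p->count != 0 || p->running != 0)
    pthread_cond_wait (&p->wake, &p->lock);
  pthread_mutex_unlock (&p->lock);
}

/* Let workers drain the queue and exit, then join them.  */
static void pool_shutdown (struct pool *p, pthread_t *threads, size_t n)
{
  pthread_mutex_lock (&p->lock);
  p->shutdown = true;
  pthread_cond_broadcast (&p->wake);
  pthread_mutex_unlock (&p->lock);
  for (size_t i = 0; i < n; i++)
    pthread_join (threads[i], NULL);
  pthread_cond_destroy (&p->wake);
  pthread_mutex_destroy (&p->lock);
}

/* Example task for the usage note: increments the int it is given.  */
static void bump (void *arg)
{
  ++*(int *) arg;
}
```

Usage: initialise a `struct pool`, start a few threads running `worker`, `pool_push` tasks such as `bump`, call `pool_wait_idle`, then `pool_shutdown`. Broadcasting on every event keeps the sketch simple and avoids lost wake-ups (the idle waiter and the workers share one condition), at the cost of waking more threads than strictly necessary, which is the same kind of performance trade-off mentioned above for gomp semaphores.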
Re: [PATCH] Builtin function roundeven folding implementation
I have made the respective changes and fixed the indentations and it passes the testing.

> I encourage a followup looking for and fixing further places in the source
> tree that handle round-to-integer function families (ceil / floor / trunc
> / round / rint / nearbyint) and should handle roundeven as well, as that
> would lead to more optimization of roundeven calls. Such places aren't
> that easy to search for because most of those names are common words used
> in other contexts in the compiler. But, for example, match.pd has
> patterns

I will follow up to make these optimizations for sure.

Thanks,
Tejas

On Sat, 24 Aug 2019 at 02:08, Joseph Myers wrote:
>
> On Fri, 23 Aug 2019, Tejas Joshi wrote:
>
> > diff --git a/gcc/builtins.c b/gcc/builtins.c
> > index 9a766e4ad63..5149d901a96 100644
> > --- a/gcc/builtins.c
> > +++ b/gcc/builtins.c
> > @@ -2056,6 +2056,7 @@ mathfn_built_in_2 (tree type, combined_fn fn)
> > CASE_MATHFN (REMQUO)
> > CASE_MATHFN_FLOATN (RINT)
> > CASE_MATHFN_FLOATN (ROUND)
> > +CASE_MATHFN (ROUNDEVEN)
>
> This should use CASE_MATHFN_FLOATN, as for the other round-to-integer
> functions.
>
> > + /* Check lowest bit, if not set, return true. */
> > + else if (REAL_EXP (r) <= SIGNIFICAND_BITS)
> > + {
> > +unsigned int n = SIGNIFICAND_BITS - REAL_EXP (r);
> > +int w = n / HOST_BITS_PER_LONG;
> > +
> > +unsigned long num = ((unsigned long)1 << (n % HOST_BITS_PER_LONG));
> > +
> > +if ((r->sig[w] & num) == 0)
> > + return true;
>
> Fix the indentation here (the braces should be indented two columns from
> the "else", the contents then two columns from the braces).
>
> > + }
> > +
> > + else
>
> And remove the stray blank line before "else".
>
> > +/* Return true if R is halfway between two integers, else return
> > + false. The function is not valid for rvc_inf and rvc_nan classes. */
> > +
> > +bool
> > +is_halfway_below (const REAL_VALUE_TYPE *r)
> > +{
> > + gcc_assert (r->cl != rvc_inf);
> > + gcc_assert (r->cl != rvc_nan);
> > + int i;
>
> Explicitly check for rvc_zero and return false in that case (that seems to
> be the convention in real.c, rather than relying on code using REAL_EXP to
> do something sensible for zero, which has REAL_EXP of 0).
>
> > + else if (REAL_EXP (r) < SIGNIFICAND_BITS)
> > + {
>
> Another place to fix indentation.
>
> > +void
> > +real_roundeven (REAL_VALUE_TYPE *r, format_helper fmt,
> > + const REAL_VALUE_TYPE *x)
> > +{
> > + if (is_halfway_below (x))
> > + {
>
> Again, fix indentation throughout this function.
>
> The patch is OK with those fixes, assuming the fixed patch passes testing.
> I encourage a followup looking for and fixing further places in the source
> tree that handle round-to-integer function families (ceil / floor / trunc
> / round / rint / nearbyint) and should handle roundeven as well, as that
> would lead to more optimization of roundeven calls. Such places aren't
> that easy to search for because most of those names are common words used
> in other contexts in the compiler. But, for example, match.pd has
> patterns
>
> /* trunc(trunc(x)) -> trunc(x), etc. */
>
> /* f(x) -> x if x is integer valued and f does nothing for such values. */
>
> /* truncl(extend(x)) -> extend(trunc(x)), etc., if x is a double. */
>
> /* truncl(extend(x)) and trunc(extend(x)) -> extend(truncf(x)), etc.,
> if x is a float. */
>
> which should apply to roundeven as well.
>
> --
> Joseph S. Myers
> jos...@codesourcery.com
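For readers following the review: roundeven rounds to the nearest integer and breaks exact ties toward the even neighbour. The sketch below follows the shape visible in the quoted excerpts (an explicit zero check, a halfway test, then a parity test) but on plain `double` values. It is an illustration only, not the real.c code: it does not use `REAL_VALUE_TYPE`, it assumes |x| < 2^52 so the `long long` conversions are exact, it ignores NaN and Inf, and `is_halfway`, `is_even` and `roundeven_sketch` are hypothetical names.

```c
#include <stdbool.h>

/* Assumes |x| < 2^52 so the (long long) conversions below are exact;
   NaN and Inf are not handled.  */

/* True if x lies exactly halfway between two integers (e.g. 2.5, -3.5).  */
static bool is_halfway (double x)
{
  if (x == 0.0)                 /* explicit zero check, as asked in the review */
    return false;
  double frac = x - (double) (long long) x;   /* exact: x minus trunc (x) */
  return frac == 0.5 || frac == -0.5;
}

/* True if the integral value r is even (the "lowest bit" test).  */
static bool is_even (double r)
{
  return (long long) r % 2 == 0;
}

/* Round to nearest, ties to even.  Returns +0.0 for -0.5, whereas a full
   implementation would preserve the sign of zero.  */
static double roundeven_sketch (double x)
{
  long long t = (long long) x;                /* truncate toward zero */
  if (is_halfway (x))
    {
      /* The two candidates are t and t + 1 (or t - 1 for negative x);
         keep whichever one is even.  */
      long long away = t + (x > 0 ? 1 : -1);
      return is_even ((double) t) ? (double) t : (double) away;
    }
  double frac = x - (double) t;               /* in (-1, 1), not +-0.5 */
  if (frac > 0.5)
    return (double) (t + 1);
  if (frac < -0.5)
    return (double) (t - 1);
  return (double) t;
}
```

With this sketch, 2.5 and -2.5 round to 2 and -2, while 3.5 and -3.5 round to 4 and -4; non-tie inputs such as 1.6 behave like ordinary rounding.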
Re: [PATCH] Builtin function roundeven folding implementation
Hi,

On Sun, Aug 25 2019, Tejas Joshi wrote:
> I have made the respective changes and fixed the indentations and it
> passes the testing.

Great, please send the patch (to me and to the mailing list too), so that I can commit it.

Thanks,

Martin
Re: Expansion of narrowing math built-ins into power instructions
Hello.

> > Similarly addtfsf3 that multiplies TFmode and produces an SFmode result, and
> > so on.

I want to extend this patch for FADDL and DADDL. What operand constraints should I use for TFmode alongside "f"?

> In cases where long double and double have the same mode,
> the daddl function should use the existing adddf3 pattern.

So, should I use adddf3 for DADDL directly? How would I map the add3 optab with DADDL?

Thanks,
Tejas

On Sat, 24 Aug 2019 at 15:23, Richard Sandiford wrote:
>
> Martin Jambor writes:
> > Hello,
> >
> > On Thu, Aug 22 2019, Segher Boessenkool wrote:
> >>> > Hi Tejas,
> >>> >
> >>> > [ Please do not top-post. ]
> >>
> >> On Thu, Aug 22, 2019 at 01:27:06PM +0530, Tejas Joshi wrote:
> >>> > What happens then? "It does not work" is very very vague. At least it
> >>> > seems the compiler does build now?
> >>>
> >>> Oh, compiler builds but instruction is still "bl fadd". It should be
> >>> "fadds" right?
> >>
> >> Yes, but that means the problem is earlier, before it hits RTL perhaps.
> >>
> >> Compile with -dap, look at the expand dump (the lowest numbered one, 234
> >> or so), and see what it looked like in the final Gimple, and then in the
> >> RTL generated from that. And then drill down.
> >>
> >
> > Tejas sent me his patch and I looked at why it did not work. I found
> > two reasons:
> >
> > 1. associated_internal_fn (in builtins.c) does not handle
> >    DEF_INTERNAL_OPTAB_FN kind of internal functions, and Tejas
> >    (sensibly, I'd say) used that macro to define the internal function.
> >    But when I worked around that by manually adding a case for it in the
> >    switch statement, I ran into an assert because...
> >
> > 2. direct_internal_fn_supported_p on which replacement_internal_fn
> >    depends to expand built-ins as internal functions cannot handle
> >    conversion optabs... and narrowing is a kind of conversion and the
> >    optab is added as such with OPTAB_CD.
> >
> > Actually, the second statement is not entirely true because somehow it
> > can handle optab while_ult which is a conversion optab but a) the way it
> > is handled, if I can understand it at all, seems to be a big hack and
> > would be even worse if we decided to copy that for all narrowing math
> > functions
>
> Think "big hack" is a bit unfair. The way that the internal function
> maps argument types to the optab modes, and the way it expands calls
> into rtl, depends on the "optab type" argument (the final argument to
> DEF_INTERNAL_OPTAB_FN). This is relatively flexible in that it can use
> a single-mode "direct" optab or a dual-mode "conversion" optab, with the
> modes coming from whichever arguments are appropriate. New optab types
> can be added as needed.
>
> FWIW, several other DEF_INTERNAL_OPTAB_FNs are conversion optabs too
> (e.g. IFN_LOAD_LANES, IFN_STORE_LANES, IFN_MASK_LOAD, etc.).
>
> But...
>
> > and b) it gets both modes from argument types whereas we need one from
> > the result type and so we would have to rewrite
> > replacement_internal_fn anyway.
>
> ...yeah, I agree this breaks the current model. The reason IFN_WHILE_ULT
> doesn't rely on the return type is that if you have:
>
>    _2 = .WHILE_ULT (_0, _1) // returning a vector of 4 booleans
>    _3 = .WHILE_ULT (_0, _1) // returning a vector of 8 booleans
>
> then the calls look equivalent. So instead we pass an extra argument
> indicating the required boolean vector "shape".
>
> The same "problem" could in principle apply to FADD if we ever needed
> to support double+double->_Float16 for example.
>
> > Therefore, at least for now (GSoC deadline is kind of looming), I
> > decided that the best way forward would be to not rely on internal
> > functions but plug into expand_builtin() and I wrote the following,
> > lightly tested patch - which of course misses testcases and stuff - but
> > I'd be curious about any feedback now anyway. When I proposed a very
> > similar approach for the roundeven x86_64 expansion, Uros actually then
> > opted for a solution based on internal functions, so I am curious
> > whether there are simple alternatives I do not see.
> >
> > Tejas, of course cases for other fadd variants should at least be added
> > to expand_builtin.
> >
> > Thanks,
> >
> > Martin
> >
> >
> > 2019-08-23  Tejas Joshi
> >             Martin Jambor
> >
> >     * builtins.c (expand_builtin_binary_conversion): New function.
> >     (expand_builtin): Call it.
> >     * config/rs6000/rs6000.md (unspec): Add UNSPEC_ADD_NARROWING.
> >     (add_truncdfsf3): New define_insn.
> >     * optabs.def (fadd_optab): New.
> >
> > [...]
> > diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
> > index 9461693bcd1..3f56880c23f 100644
> > --- a/gcc/internal-fn.def
> > +++ b/gcc/internal-fn.def
> > @@ -140,6 +140,8 @@ DEF_INTERNAL_OPTAB_FN (WHILE_ULT, ECF_CONST |
> > ECF_NOTHROW, while_ult, while)
> > DEF_INTERNAL_OPTAB_FN (VEC_SHL_INSERT, ECF_CONST | ECF_NOTHROW,
> > vec_shl_insert, binary)
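The choice the expander has to make for each narrowing built-in can be modelled as below. This is a toy model, not GCC code: `fp_mode`, `narrowing_builtin`, `target_has_narrowing_add`, `choose_expansion` and `expands_to` are invented names, and the pattern names only follow the `add<mode>3` convention and the `add_truncdfsf3` name from the ChangeLog. It encodes three points from the thread: when operand and result modes coincide (daddl where long double has the same mode as double), reuse the plain `adddf3` pattern; when the target provides a narrowing pattern, expand inline; otherwise fall back to a library call, the "bl fadd" seen in the generated code.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Toy model only: these names are illustrative, not GCC internals.  */
enum fp_mode { SFmode, DFmode, TFmode };
static const char *const mode_suffix[] = { "sf", "df", "tf" };

struct narrowing_builtin
{
  const char *name;       /* library function used if no insn is available */
  enum fp_mode operand;   /* mode of both inputs */
  enum fp_mode result;    /* mode of the (possibly narrower) result */
};

/* Hypothetical target description: only add_truncdfsf3 exists, as in the
   patch under discussion.  */
static bool target_has_narrowing_add (enum fp_mode op, enum fp_mode res)
{
  return op == DFmode && res == SFmode;
}

/* Write the pattern (or libcall) the expander would use for B into BUF.  */
static void choose_expansion (struct narrowing_builtin b, char *buf, size_t len)
{
  if (b.operand == b.result)
    /* Same mode, e.g. daddl when long double is DFmode: plain add<mode>3.  */
    snprintf (buf, len, "add%s3", mode_suffix[b.result]);
  else if (target_has_narrowing_add (b.operand, b.result))
    snprintf (buf, len, "add_trunc%s%s3",
              mode_suffix[b.operand], mode_suffix[b.result]);
  else
    /* No usable pattern: emit a library call (the "bl fadd" case).  */
    snprintf (buf, len, "libcall %s", b.name);
}

/* Convenience check used in the example below.  */
static bool expands_to (struct narrowing_builtin b, const char *want)
{
  char buf[32];
  choose_expansion (b, buf, sizeof buf);
  return strcmp (buf, want) == 0;
}
```

In this model, `fadd` (DFmode operands, SFmode result) selects `add_truncdfsf3`, `daddl` on a configuration where long double is DFmode selects `adddf3`, and `faddl` with TFmode operands stays a `faddl` call until a TFmode pattern exists.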
Re: Expansion of narrowing math built-ins into power instructions
[ Please don't top-post ]

On Sun, Aug 25, 2019 at 07:32:01PM +0530, Tejas Joshi wrote:
> I want to extend this patch for FADDL and DADDL. What operand
> constraints should I use for TFmode alongside "f"?

It depends on the instruction you use, and what registers that then works on. GPRs get "r", FPRs get "f" for SFmode but "d" otherwise, the VRs get "v", if all VSRs are allowed you get "wa". And there are some mode attributes to go with mode iterators for when you handle multiple modes (which you always do, you need to handle KF as well).

What machine insns do you want to generate? There most likely is something a lot like it already, so take that as example?

> > In cases where long double and double have the same mode,
> > the daddl function should use the existing adddf3 pattern.

Sure, that probably should be handled in generic code (not rs6000). Where it would generate an adddfdf2 it should just do an adddf3.

> So, should I use adddf3 for DADDL directly? How would I map the
> add3 optab with DADDL?

Simply check if source and target mode are the same?


Segher
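The operand-constraint rules given in this reply can be written down as a small lookup. This is only an illustration: the `reg_kind` and `fp_mode` enums and `operand_constraint` are hypothetical, and which register kind a TFmode or KFmode operand lives in still depends on the instruction chosen, as the reply says.

```c
/* Illustrative encoding of the rs6000 constraint rules quoted above;
   the enums and the function name are hypothetical.  */
enum reg_kind { GPR, FPR, VR, VSR };
enum fp_mode { SFmode, DFmode, TFmode, KFmode };

static const char *operand_constraint (enum reg_kind k, enum fp_mode m)
{
  switch (k)
    {
    case GPR:
      return "r";                       /* general-purpose registers */
    case FPR:
      return m == SFmode ? "f" : "d";   /* FPRs: "f" for SFmode, else "d" */
    case VR:
      return "v";                       /* AltiVec vector registers */
    case VSR:
      return "wa";                      /* any VSX register */
    }
  return "";                            /* not reached for valid K */
}
```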
gcc-10-20190825 is now available
Snapshot gcc-10-20190825 is now available on
  ftp://gcc.gnu.org/pub/gcc/snapshots/10-20190825/
and on various mirrors, see http://gcc.gnu.org/mirrors.html for details.

This snapshot has been generated from the GCC 10 SVN branch
with the following options: svn://gcc.gnu.org/svn/gcc/trunk revision 274915

You'll find:

 gcc-10-20190825.tar.xz               Complete GCC

  SHA256=40d60384573a7bad93588a3bdc47613d4f2f331d3bee919439b5d4ee4204d0e4
  SHA1=3eaadda6e8a7dc915a29e0c5400b0ac38f621602

Diffs from 10-20190818 are available in the diffs/ subdirectory.

When a particular snapshot is ready for public consumption the LATEST-10
link is updated and a message is sent to the gcc list. Please do not use
a snapshot before it has been announced that way.