Re: [EXT] Re: Can LTO minor version be updated in backward compatible way ?
On Fri, Jul 19, 2019 at 10:30 AM Florian Weimer wrote: > > * Romain Geissler: > > > That may fly in the open source world, however I expect some vendors > > shipping proprietary code might be fine with assembly/LTO > > representation of their product, but not source. > > They can't ship LTO today anyway due to the format incompatibility, so > that's not really an argument against source-based LTO. Source-based LTO doesn't really work unless you can re-synthesize source from the IL. At least I don't see how you can do whole-program analysis on source and then cut it into appropriate pieces, duplicating some things and some not to make up for the parallel final compile step. Richard. > Thanks, > Florian
Re: [GSoC'19] Parallelize GCC with Threads -- Second Evaluation Status
On Sun, 21 Jul 2019, Giuliano Belinassi wrote:
> Hi all,
>
> Here is my second evaluation report, together with a simple program that
> I was able to compile with my parallel version of GCC. Keep in mind that
> I still have lots of concurrent issues inside the compiler and therefore
> my branch will fail to compile pretty much anything else.
>
> To reproduce my current branch, use the following steps:
>
> 1-) Clone https://gitlab.com/flusp/gcc/tree/giulianob_parallel
> 2-) Edit gcc/graphunit.c's variable `num_threads` to 1.
> 3-) Compile with --disable-bootstrap --enable-languages=c
> 4-) make
> 5-) Edit gcc/graphunit.c's variable `num_threads` to 2, for instance.
> 6-) make install DESTDIR="somewhere_else_that_doesnt_break_your_gcc"
> 7-) compile the program using -O2
>
> I am attaching my report in markdown format, which you can convert to pdf
> using `pandoc` if you find it difficult to read in the current format.
>
> I am also open to suggestions. Please do not hesitate to comment :)

Thanks for the report, and it's great that you are making progress!

I suggest you add a --param (edit params.def) so one can choose num_threads
on the command line instead of needing to recompile GCC. Just keep the
default "safe" so that GCC's own build will still work.

For most of the allocators I think that in the end we want to keep them
global but have either per-thread freelists or a freelist implementation
that can work (allocate and free) without locking, employing some RCU
scheme. Not introducing per-thread state is probably leaner on the
implementation. It would of course mean taking a lock when the freelist
needs to be re-filled from the main pool, but that's hopefully not common.
I don't know of an RCU allocator freelist implementation to copy/learn
from, but experimenting with one before going the per-thread-freelist route
might be interesting. Maybe not all allocators need to be treated equally
either.

Your memory-block issue is likely that you added

  {
    if (!instance)
      instance = XNEW (memory_block_pool);

but, misleading as it is, XNEW doesn't invoke C++ new but just malloc, so
the allocated structure isn't initialized since its constructor isn't
invoked. Just use

  instance = new memory_block_pool;

With that I get helgrind to run (without complaining!) on your testcase. I
also get to compile gimple-match.c with two threads for more than one
minute before crashing on some EVRP global state (somehow I knew the
passes' global state would be quite a distraction...).

I hope the project will be motivation to clean up the way we handle
pass-specific global state.

Thanks again,
Richard.
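The XNEW pitfall described above is easy to reproduce in isolation. Below is a minimal sketch, not the actual GCC code: the free_list member and helper names are illustrative, and only the XNEW expansion to xmalloc/malloc is taken from libiberty.

  #include <cstdlib>

  /* Stand-in for GCC's memory_block_pool; the constructor sets up the
     state that later allocations rely on.  */
  struct memory_block_pool
  {
    void *free_list;
    memory_block_pool () : free_list (nullptr) {}
  };

  static memory_block_pool *instance;

  /* The problematic pattern: XNEW(T) expands to (T *) xmalloc (sizeof (T)),
     i.e. raw malloc'ed storage.  No constructor runs, so free_list holds
     indeterminate garbage.  */
  static void
  init_instance_with_xnew ()
  {
    if (!instance)
      instance = (memory_block_pool *) std::malloc (sizeof (memory_block_pool));
  }

  /* The fix: operator new allocates and runs the constructor, so the pool
     starts out in a well-defined state.  */
  static void
  init_instance_with_new ()
  {
    if (!instance)
      instance = new memory_block_pool;
  }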
Re: Doubts regarding the _Dependent_ptr keyword
On Mon, Jul 22, 2019 at 2:27 AM Akshat Garg wrote: > Hi all, > Consider part of an example(figure 20) from doc P0190R4( > http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2017/p0190r4.pdf) > shown below: > > 1. void thread1 (void) > 2. { > 3.int * volatile p; > 4.p = rcu_dereference(gip); > 5.if (p) > 6.assert(*(p+p[0]) == 42); > 7. } > The .gimple code produced is : > > 1. thread1 () > 2. { > 3. atomic int * D.1992; > 4.int * volatile p; > 5. { > 6.atomic int * * __atomic_load_ptr; > 7. atomic int * __atomic_load_tmp; > 8.try > 9. { > 10.__atomic_load_ptr = &gip; > 11._1 = __atomic_load_8 (__atomic_load_ptr, 1); > 12._2 = (atomic int *) _1; > 13.__atomic_load_tmp = _2; > 14.D.1992 = __atomic_load_tmp; > 15. } > 16.finally > 17. { > 18.__atomic_load_tmp = {CLOBBER}; > 19. } > 20. } > 21. p = D.1992; > 22. p.2_3 = p; > 23. if (p.2_3 != 0B) goto ; else goto ; > 24. : > 25. p.3_4 = p; > 26. p.4_5 = p; > 27. _6 = *p.4_5; > 28. _7 = (long unsigned int) _6; > 29. _8 = _7 * 4; > 30. _9 = p.3_4 + _8; > 31. _10 = *_9; > 32. _11 = _10 == 42; > 33. _12 = (int) _11; > 34. assert (_12); > 35. : > 36. } > > assert at line 34 in .gimple code still breaks the dependency given by the > user. I believe, there should be some ssa defined variable of p or p itself > in assert. This is happening when I am considering pointer as volatile > qualified. If I consider it as _Dependent_ptr qualified then it surely > produces the broken dependency code. Let me know, if I wrong somewhere. > > p appears as memory here which we load its value to p.3_4 which we then offset by _8 and load from the computed address into _10 which then appears in the assert condition. I think that's as good as it can get ... Richard. > -Akshat > > > > > On Wed, Jul 17, 2019 at 4:23 PM Akshat Garg wrote: > >> On Tue, Jul 2, 2019 at 9:06 PM Jason Merrill wrote: >> >>> On Mon, Jul 1, 2019 at 8:59 PM Paul E. McKenney >>> wrote: >>> > >>> > On Tue, Jul 02, 2019 at 05:58:48AM +0530, Akshat Garg wrote: >>> > > On Tue, Jun 25, 2019 at 9:49 PM Akshat Garg >>> wrote: >>> > > >>> > > > On Tue, Jun 25, 2019 at 4:04 PM Ramana Radhakrishnan < >>> > > > ramana@googlemail.com> wrote: >>> > > > >>> > > >> On Tue, Jun 25, 2019 at 11:03 AM Akshat Garg >>> wrote: >>> > > >> > >>> > > >> > As we have some working front-end code for _Dependent_ptr, What >>> should >>> > > >> we do next? What I understand, we can start adding the library for >>> > > >> dependent_ptr and its functions for C corresponding to the ones >>> we created >>> > > >> as C++ template library. Then, after that, we can move on to >>> generating the >>> > > >> assembly code part. >>> > > >> > >>> > > >> >>> > > >> >>> > > >> I think the next step is figuring out how to model the Dependent >>> > > >> pointer information in the IR and figuring out what optimizations >>> to >>> > > >> allow or not with that information. At this point , I suspect we >>> need >>> > > >> a plan on record and have the conversation upstream on the lists. >>> > > >> >>> > > >> I think we need to put down a plan on record. >>> > > >> >>> > > >> Ramana >>> > > > >>> > > > [CCing gcc mailing list] >>> > > > >>> > > > So, shall I start looking over the pointer optimizations only and >>> see what >>> > > > information we may be needed on the same examples in the IR itself? >>> > > > >>> > > > - Akshat >>> > > > >>> > > I have coded an example where equality comparison kills dependency >>> from the >>> > > document P0190R4 as shown below : >>> > > >>> > > 1. struct rcutest rt = {1, 2, 3}; >>> > > 2. 
void thread0 () >>> > > 3. { >>> > > 4. rt.a = -42; >>> > > 5. rt.b = -43; >>> > > 6. rt.c = -44; >>> > > 7. rcu_assign_pointer(gp, &rt); >>> > > 8. } >>> > > 9. >>> > > 10. void thread1 () >>> > > 11. { >>> > > 12.int i = -1; >>> > > 13.int j = -1; >>> > > 14._Dependent_ptr struct rcutest *p; >>> > > 15. >>> > > 16.p = rcu_dereference(gp); >>> > > 17.j = p->a; >>> > > 18. if (p == &rt) >>> > > 19.i = p->b; /*Dependency breaking point*/ >>> > > 20. else if(p) >>> > > 21. i = p->c; >>> > > 22. assert(i<0); >>> > > 23. assert(j<0); >>> > > 24. } >>> > > The gimple unoptimized code produced for lines 17-24 is shown below >>> > > >>> > > 1. if (p_16 == &rt) >>> > > 2. goto ; [INV] >>> > > 3. else >>> > > 4.goto ; [INV] >>> > > 5. >>> > > 6. : >>> > > 7. i_19 = p_16->b; >>> > > 8. goto ; [INV] >>> > > 9. >>> > > 10. : >>> > > 11. if (p_16 != 0B) >>> > > 12.goto ; [INV] >>> > > 13. else >>> > > 14.goto ; [INV] >>> > > 15. >>> > > 16. : >>> > > 17. i_18 = p_16->c; >>> > > 18. >>> > > 19. : >>> > > 20. # i_7 = PHI >>> > > 21. _3 = i_7 < 0; >>> > > 22. _4 = (int) _3; >>> > > 23. assert (_4); >>> > > 24. _5 = j_17 < 0; >>> > > 25. _6 = (int) _5; >>> > > 26. assert (_6); >>> > > 27. return; >>> > > >>> > > The optimize
Re: Doubts regarding the _Dependent_ptr keyword
On Mon, Jul 22, 2019 at 2:11 PM Richard Biener wrote: > On Mon, Jul 22, 2019 at 2:27 AM Akshat Garg wrote: > >> Hi all, >> Consider part of an example(figure 20) from doc P0190R4( >> http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2017/p0190r4.pdf) >> shown below: >> >> 1. void thread1 (void) >> 2. { >> 3.int * volatile p; >> 4.p = rcu_dereference(gip); >> 5.if (p) >> 6.assert(*(p+p[0]) == 42); >> 7. } >> The .gimple code produced is : >> >> 1. thread1 () >> 2. { >> 3. atomic int * D.1992; >> 4.int * volatile p; >> 5. { >> 6.atomic int * * __atomic_load_ptr; >> 7. atomic int * __atomic_load_tmp; >> 8.try >> 9. { >> 10.__atomic_load_ptr = &gip; >> 11._1 = __atomic_load_8 (__atomic_load_ptr, 1); >> 12._2 = (atomic int *) _1; >> 13.__atomic_load_tmp = _2; >> 14.D.1992 = __atomic_load_tmp; >> 15. } >> 16.finally >> 17. { >> 18.__atomic_load_tmp = {CLOBBER}; >> 19. } >> 20. } >> 21. p = D.1992; >> 22. p.2_3 = p; >> 23. if (p.2_3 != 0B) goto ; else goto ; >> 24. : >> 25. p.3_4 = p; >> 26. p.4_5 = p; >> 27. _6 = *p.4_5; >> 28. _7 = (long unsigned int) _6; >> 29. _8 = _7 * 4; >> 30. _9 = p.3_4 + _8; >> 31. _10 = *_9; >> 32. _11 = _10 == 42; >> 33. _12 = (int) _11; >> 34. assert (_12); >> 35. : >> 36. } >> >> assert at line 34 in .gimple code still breaks the dependency given by >> the user. I believe, there should be some ssa defined variable of p or p >> itself in assert. This is happening when I am considering pointer as >> volatile qualified. If I consider it as _Dependent_ptr qualified then it >> surely produces the broken dependency code. Let me know, if I wrong >> somewhere. >> >> > p appears as memory here which we load its value to p.3_4 which we then > offset by _8 and load from the > computed address into _10 which then appears in the assert condition. I > think that's as good as it can > get ... > > Richard. > Thank you for your reply. For, the same example above, consider this ( https://github.com/AKG001/rtl_opt/blob/master/p0190r4_fig20.c.239r.cse1#L402) instruction at rtl level changed form this ( https://github.com/AKG001/rtl_opt/blob/master/p0190r4_fig20.c.239r.cse1#L231) during the cse pass. The variable p.2_3 gets replaced by a temporary _1 but _1 is not any dependent pointer where, p.2_3 was. Is this also not breaking any dependencies? -Akshat >> >> >> >> >> On Wed, Jul 17, 2019 at 4:23 PM Akshat Garg wrote: >> >>> On Tue, Jul 2, 2019 at 9:06 PM Jason Merrill wrote: >>> On Mon, Jul 1, 2019 at 8:59 PM Paul E. McKenney wrote: > > On Tue, Jul 02, 2019 at 05:58:48AM +0530, Akshat Garg wrote: > > On Tue, Jun 25, 2019 at 9:49 PM Akshat Garg wrote: > > > > > On Tue, Jun 25, 2019 at 4:04 PM Ramana Radhakrishnan < > > > ramana@googlemail.com> wrote: > > > > > >> On Tue, Jun 25, 2019 at 11:03 AM Akshat Garg wrote: > > >> > > > >> > As we have some working front-end code for _Dependent_ptr, What should > > >> we do next? What I understand, we can start adding the library for > > >> dependent_ptr and its functions for C corresponding to the ones we created > > >> as C++ template library. Then, after that, we can move on to generating the > > >> assembly code part. > > >> > > > >> > > >> > > >> I think the next step is figuring out how to model the Dependent > > >> pointer information in the IR and figuring out what optimizations to > > >> allow or not with that information. At this point , I suspect we need > > >> a plan on record and have the conversation upstream on the lists. > > >> > > >> I think we need to put down a plan on record. 
> > >> > > >> Ramana > > > > > > [CCing gcc mailing list] > > > > > > So, shall I start looking over the pointer optimizations only and see what > > > information we may be needed on the same examples in the IR itself? > > > > > > - Akshat > > > > > I have coded an example where equality comparison kills dependency from the > > document P0190R4 as shown below : > > > > 1. struct rcutest rt = {1, 2, 3}; > > 2. void thread0 () > > 3. { > > 4. rt.a = -42; > > 5. rt.b = -43; > > 6. rt.c = -44; > > 7. rcu_assign_pointer(gp, &rt); > > 8. } > > 9. > > 10. void thread1 () > > 11. { > > 12.int i = -1; > > 13.int j = -1; > > 14._Dependent_ptr struct rcutest *p; > > 15. > > 16.p = rcu_dereference(gp); > > 17.j = p->a; > > 18. if (p == &rt) > > 19.i = p->b; /*Dependency breaking point*/ > > 20. else if(p) > > 21. i = p->c; > > 22. assert(i<0); > > 23. assert(j<0); > > 24. } > > The gimple unoptimized code produced for lin
Re: [GSoC'19] Parallelize GCC with Threads -- Second Evaluation Status
On Mon, 22 Jul 2019, Richard Biener wrote: > On Sun, 21 Jul 2019, Giuliano Belinassi wrote: > > > Hi all, > > > > Here is my second evaluation report, together with a simple program that > > I was able to compile with my parallel version of GCC. Keep in mind that > > I still have lots of concurrent issues inside the compiler and therefore > > my branch will fail to compile pretty much anything else. > > > > To reproduce my current branch, use the following steps: > > > > 1-) Clone https://gitlab.com/flusp/gcc/tree/giulianob_parallel > > > > 2-) Edit gcc/graphunit.c's variable `num_threads` to 1. > > > > 3-) Compile with --disable-bootstrap --enable-languages=c > > > > 4-) make > > > > 5-) Edit gcc/graphunit.c's variable `num_threads` to 2, for instance. > > > > 6-) make install DESTDIR="somewhere_else_that_doesnt_break_your_gcc" > > > > 7-) compile the program using -O2 > > > > I a attaching my report in markdown format, which you can convert to pdf > > using `pandoc` if you find it difficult to read in the current format. > > > > I am also open to suggestions. Please do not hesitate to comment :) > > Thanks for the report and it's great that you are making progress! > > I suggest you add a --param (edit params.def) so one can choose > num_threads on the command-line instead of needing to recompile GCC. > Just keep the default "safe" so that GCC build itself will still work. > > For most of the allocators I think that in the end we want to > keep most of them global but have either per-thread freelists > or a freelist implementation that can work (allocate and free) > without locking, employing some RCU scheme. Not introducing > per-thread state is probably leaner on the implementation. > It would of course mean taking a lock when the freelist needs to > be re-filled from the main pool but that's hopefully not common. > I don't know a RCU allocator freelist implementation to copy/learn > from, but experimenting with such before going the per thread freelist > might be interesting. Maybe not all allocators need to be treated > equal either. > > Your memory-block issue is likely that you added > > { > if (!instance) > instance = XNEW (memory_block_pool); > > but as misleading as it is, XNEW doesn't invoke C++ new but > just malloc so the allocated structure isn't initialized > since it's constructor isn't invoked. Just use > > instance = new memory_block_pool; > > with that I get helgrind to run (without complaining!) on your > testcase. I also get to compile gimple-match.c with two threads > for more than one minute before crashing on some EVRP global > state (somehow I knew the passes global state would be quite a > distraction...). > > I hope the project will be motivation to cleanup the way we > handle pass-specific global state. Btw, to get to "working" state quicker you might consider concentrating on a pass subset for which you can conveniently restrict optimization to just -Og, effectively parallelizing pass_all_optimizations_g only, you then probably hit more issues in infrastructure which is more interesting for the project (we know there's a lot of pass-specific global state...). Of course the time spent in pass_all_optimizations_g is minimal... I then hit tree-ssa-live.c:usedvars quickly (slap __thread on it) and after that the EVRP issue via the sprintf_length pass. Richard.
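For reference, the "slap __thread on it" fix mentioned above looks roughly like the following sketch; the bitmap typedef is a stand-in for the real declaration in GCC's bitmap.h, and the point is only that the GNU __thread specifier gives each compilation thread its own copy of a pass-local global.

  /* Stand-in for GCC's bitmap type (really declared in bitmap.h).  */
  typedef struct bitmap_head *bitmap;

  /* Shared-global version: two threads running the pass concurrently
     would race on this.  */
  /* static bitmap usedvars;  */

  /* Thread-local version: one instance per thread, no locking needed.  */
  static __thread bitmap usedvars;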
Re: Doubts regarding the _Dependent_ptr keyword
On Mon, Jul 22, 2019 at 10:54 AM Akshat Garg wrote: > On Mon, Jul 22, 2019 at 2:11 PM Richard Biener > wrote: > >> On Mon, Jul 22, 2019 at 2:27 AM Akshat Garg wrote: >> >>> Hi all, >>> Consider part of an example(figure 20) from doc P0190R4( >>> http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2017/p0190r4.pdf) >>> shown below: >>> >>> 1. void thread1 (void) >>> 2. { >>> 3.int * volatile p; >>> 4.p = rcu_dereference(gip); >>> 5.if (p) >>> 6.assert(*(p+p[0]) == 42); >>> 7. } >>> The .gimple code produced is : >>> >>> 1. thread1 () >>> 2. { >>> 3. atomic int * D.1992; >>> 4.int * volatile p; >>> 5. { >>> 6.atomic int * * __atomic_load_ptr; >>> 7. atomic int * __atomic_load_tmp; >>> 8.try >>> 9. { >>> 10.__atomic_load_ptr = &gip; >>> 11._1 = __atomic_load_8 (__atomic_load_ptr, 1); >>> 12._2 = (atomic int *) _1; >>> 13.__atomic_load_tmp = _2; >>> 14.D.1992 = __atomic_load_tmp; >>> 15. } >>> 16.finally >>> 17. { >>> 18.__atomic_load_tmp = {CLOBBER}; >>> 19. } >>> 20. } >>> 21. p = D.1992; >>> 22. p.2_3 = p; >>> 23. if (p.2_3 != 0B) goto ; else goto ; >>> 24. : >>> 25. p.3_4 = p; >>> 26. p.4_5 = p; >>> 27. _6 = *p.4_5; >>> 28. _7 = (long unsigned int) _6; >>> 29. _8 = _7 * 4; >>> 30. _9 = p.3_4 + _8; >>> 31. _10 = *_9; >>> 32. _11 = _10 == 42; >>> 33. _12 = (int) _11; >>> 34. assert (_12); >>> 35. : >>> 36. } >>> >>> assert at line 34 in .gimple code still breaks the dependency given by >>> the user. I believe, there should be some ssa defined variable of p or p >>> itself in assert. This is happening when I am considering pointer as >>> volatile qualified. If I consider it as _Dependent_ptr qualified then it >>> surely produces the broken dependency code. Let me know, if I wrong >>> somewhere. >>> >>> >> p appears as memory here which we load its value to p.3_4 which we then >> offset by _8 and load from the >> computed address into _10 which then appears in the assert condition. I >> think that's as good as it can >> get ... >> >> Richard. >> > > Thank you for your reply. For, the same example above, consider this ( > https://github.com/AKG001/rtl_opt/blob/master/p0190r4_fig20.c.239r.cse1#L402) > instruction at rtl level changed form this ( > https://github.com/AKG001/rtl_opt/blob/master/p0190r4_fig20.c.239r.cse1#L231) > during the cse pass. The variable p.2_3 gets replaced by a temporary _1 but > _1 is not any dependent pointer where, p.2_3 was. Is this also not breaking > any dependencies > I'm not sure. In general CSE can break dependences. If the dependent pointer chain needs to conver multiple levels of indirections from the original atomic operation you need to make sure to not expose atomics as CSEable. Thus on RTL have them all UNSPECs. Richard. > -Akshat >>> >>> >>> >>> >>> On Wed, Jul 17, 2019 at 4:23 PM Akshat Garg wrote: >>> On Tue, Jul 2, 2019 at 9:06 PM Jason Merrill wrote: > On Mon, Jul 1, 2019 at 8:59 PM Paul E. McKenney > wrote: > > > > On Tue, Jul 02, 2019 at 05:58:48AM +0530, Akshat Garg wrote: > > > On Tue, Jun 25, 2019 at 9:49 PM Akshat Garg > wrote: > > > > > > > On Tue, Jun 25, 2019 at 4:04 PM Ramana Radhakrishnan < > > > > ramana@googlemail.com> wrote: > > > > > > > >> On Tue, Jun 25, 2019 at 11:03 AM Akshat Garg > wrote: > > > >> > > > > >> > As we have some working front-end code for _Dependent_ptr, > What should > > > >> we do next? What I understand, we can start adding the library > for > > > >> dependent_ptr and its functions for C corresponding to the ones > we created > > > >> as C++ template library. 
Then, after that, we can move on to > generating the > > > >> assembly code part. > > > >> > > > > >> > > > >> > > > >> I think the next step is figuring out how to model the Dependent > > > >> pointer information in the IR and figuring out what > optimizations to > > > >> allow or not with that information. At this point , I suspect > we need > > > >> a plan on record and have the conversation upstream on the > lists. > > > >> > > > >> I think we need to put down a plan on record. > > > >> > > > >> Ramana > > > > > > > > [CCing gcc mailing list] > > > > > > > > So, shall I start looking over the pointer optimizations only > and see what > > > > information we may be needed on the same examples in the IR > itself? > > > > > > > > - Akshat > > > > > > > I have coded an example where equality comparison kills dependency > from the > > > document P0190R4 as shown below : > > > > > > 1. struct rcutest rt = {1, 2, 3}; > > > 2. void thread0 () > > > 3. { > > > 4. rt.a = -42; > > > 5. rt.b = -43; > > > 6. rt.c = -44; > > > 7. rcu_assign_pointer(gp, &rt); > > > 8. } > > > 9. > > >
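To make the point above about CSE breaking dependences concrete, here is a hedged sketch of the quoted P0190R4 example with rcu_assign_pointer/rcu_dereference reduced to plain C++ atomics; the ordering choices and names are illustrative and are not the proposed _Dependent_ptr semantics.

  #include <atomic>
  #include <cassert>

  struct rcutest { int a, b, c; };

  rcutest rt = {1, 2, 3};
  std::atomic<rcutest *> gp;

  void thread0 ()
  {
    rt.a = -42;
    rt.b = -43;
    rt.c = -44;
    gp.store (&rt, std::memory_order_release);   /* rcu_assign_pointer */
  }

  void thread1 ()
  {
    rcutest *p = gp.load (std::memory_order_consume);   /* rcu_dereference */
    int i = -1;
    if (p == &rt)
      /* Dependency-breaking point: once the compiler knows p == &rt, CSE may
         rewrite p->b as rt.b.  The load then no longer uses the value
         returned by the atomic load, so the address dependency that ordered
         it after the publication is gone.  */
      i = p->b;
    else if (p)
      i = p->c;
    if (p)
      assert (i < 0);
  }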
gcc/config/arch/arch.opt: Option mask gen problem
Hi all,

Is it possible, in the arch.opt file, to have GCC generate a bitmask
relative to a user-defined variable without an associated option name? To
illustrate my problem, consider the following option file snippet:

...
Variable
HOST_WIDE_INT riscv_bitmanip_flags = 0
...
mbmi-zbb
Target Mask(BITMANIP_ZBB) Var(riscv_bitmanip_flags)
Support the base subset of the Bitmanip extension.
...

This generates the following lines in build/gcc/options.h (marker added by
me for clarity):

...
#define OPTION_MASK_BITMANIP_ZBB (HOST_WIDE_INT_1U << 0) //
#define OPTION_MASK_BITMANIP_ZBC (HOST_WIDE_INT_1U << 1)
#define OPTION_MASK_BITMANIP_ZBE (HOST_WIDE_INT_1U << 2)
#define OPTION_MASK_BITMANIP_ZBF (HOST_WIDE_INT_1U << 3)
#define OPTION_MASK_BITMANIP_ZBM (HOST_WIDE_INT_1U << 4)
#define OPTION_MASK_BITMANIP_ZBP (HOST_WIDE_INT_1U << 5)
#define OPTION_MASK_BITMANIP_ZBR (HOST_WIDE_INT_1U << 6)
#define OPTION_MASK_BITMANIP_ZBS (HOST_WIDE_INT_1U << 7)
#define OPTION_MASK_BITMANIP_ZBT (HOST_WIDE_INT_1U << 8)
#define MASK_DIV (1U << 0)
#define MASK_EXPLICIT_RELOCS (1U << 1)
#define MASK_FDIV (1U << 2)
#define MASK_SAVE_RESTORE (1U << 3)
#define MASK_STRICT_ALIGN (1U << 4)
#define MASK_64BIT (1U << 5)
#define MASK_ATOMIC (1U << 6)
#define MASK_BITMANIP (1U << 7)
#define MASK_DOUBLE_FLOAT (1U << 8)
#define MASK_HARD_FLOAT (1U << 9)
#define MASK_MUL (1U << 10)
#define MASK_RVC (1U << 11)
#define MASK_RVE (1U << 12)
...

But I don't want the user to be able to pass "-mbmi-zbb" or "-mno-bmi-zbb"
on the command line: I only want the generation of the
`x_riscv_bitmanip_flags` variable and the associated bitmasks so that I can
use them elsewhere in the backend code. So I remove the name and
description from the entry, like so:

...
Target Mask(BITMANIP_ZBB) Var(riscv_bitmanip_flags)
...

But now, in the build/gcc/options.h file, the bitmask becomes relative to
the generic `x_target_flags` variable:

#define OPTION_MASK_BITMANIP_ZBC (HOST_WIDE_INT_1U << 0)
#define OPTION_MASK_BITMANIP_ZBE (HOST_WIDE_INT_1U << 1)
#define OPTION_MASK_BITMANIP_ZBF (HOST_WIDE_INT_1U << 2)
#define OPTION_MASK_BITMANIP_ZBM (HOST_WIDE_INT_1U << 3)
#define OPTION_MASK_BITMANIP_ZBP (HOST_WIDE_INT_1U << 4)
#define OPTION_MASK_BITMANIP_ZBR (HOST_WIDE_INT_1U << 5)
#define OPTION_MASK_BITMANIP_ZBS (HOST_WIDE_INT_1U << 6)
#define OPTION_MASK_BITMANIP_ZBT (HOST_WIDE_INT_1U << 7)
#define MASK_DIV (1U << 0)
#define MASK_EXPLICIT_RELOCS (1U << 1)
#define MASK_FDIV (1U << 2)
#define MASK_SAVE_RESTORE (1U << 3)
#define MASK_STRICT_ALIGN (1U << 4)
#define MASK_64BIT (1U << 5)
#define MASK_ATOMIC (1U << 6)
#define MASK_BITMANIP (1U << 7)
#define MASK_DOUBLE_FLOAT (1U << 8)
#define MASK_HARD_FLOAT (1U << 9)
#define MASK_MUL (1U << 10)
#define MASK_RVC (1U << 11)
#define MASK_RVE (1U << 12)
#define MASK_BITMANIP_ZBB (1U << 13) //

Could someone suggest a way to get around this problem in the .opt file?

Best Regards,
Maxim
Re: [EXT] Re: Can LTO minor version be updated in backward compatible way ?
* Richard Biener:

> On Fri, Jul 19, 2019 at 10:30 AM Florian Weimer wrote:
>>
>> * Romain Geissler:
>>
>> > That may fly in the open source world, however I expect some vendors
>> > shipping proprietary code might be fine with assembly/LTO
>> > representation of their product, but not source.
>>
>> They can't ship LTO today anyway due to the format incompatibility, so
>> that's not really an argument against source-based LTO.
>
> Source-based LTO doesn't really work unless you can re-synthesize
> source from the IL. At least I don't see how you can do whole-program
> analysis on source and then cut it into appropriate pieces, duplicating
> some things and some not to make up for the parallel final compile step.

Oh, I meant using source code only as a portable serialization of the
program, instead of serializing unstable, compiler-specific IR. If the
whole program does not fit into memory, the compiler will still have to
maintain on-disk data structures, but at least there wouldn't be a
compatibility aspect to those anymore.

Thanks,
Florian
Re: [EXT] Re: Can LTO minor version be updated in backward compatible way ?
On Mon, Jul 22, 2019 at 1:15 PM Florian Weimer wrote: > > * Richard Biener: > > > On Fri, Jul 19, 2019 at 10:30 AM Florian Weimer wrote: > >> > >> * Romain Geissler: > >> > >> > That may fly in the open source world, however I expect some vendors > >> > shipping proprietary code might be fine with assembly/LTO > >> > representation of their product, but not source. > >> > >> They can't ship LTO today anyway due to the format incompatibility, so > >> that's not really an argument against source-based LTO. > > > > Source-based LTO doesn't really work unless you can re-synthesize > > source from the IL. At least I don't see how you can do whole-program > > analysis on source and then cut it into appropriate pieces, duplicating > > some things and some not to make up for the parallel final compile step. > > Oh, I meant using source code only as a portable serialization of the > program, instead of serializing unstable, compiler-specific IR. If the > whole program does not fit into memory, the compiler will still have to > maintain on-disk data structures, but at least there wouldn't a > compatibility aspect to those anymore. OK, but then we'd need to re-do the compile and IPA analysis stage at each link with the appropriate frontend. But sure, that would be possible. Richard. > Thanks, > Florian
Re: Can LTO minor version be updated in backward compatible way ?
On 7/17/19 8:10 PM, Jeff Law wrote: > On 7/17/19 11:29 AM, Andi Kleen wrote: >> Romain Geissler writes: >>> >>> I have no idea of the LTO format and if indeed it can easily be updated >>> in a backward compatible way. But I would say it would be nice if it >>> could, and would allow adoption for projects spread on many teams >>> depending on each others and unable to re-build everything at each >>> toolchain update. >> >> Right now any change to an compiler option breaks the LTO format >> in subtle ways. In fact even the minor changes that are currently >> done are not frequent enough to catch all such cases. >> >> So it's unlikely to really work. > Right and stable LTO bytecode really isn't on the radar at this time. > > IMHO it's more important right now to start pushing LTO into the > mainstream for the binaries shipped by the vendors (and stripping the > LTO bits out of any static libraries/.o's shipped by the vendors). > > > SuSE's announcement today is quite ironic. Why and what is ironic about it? > Red Hat's toolchain team is > planning to propose switching to LTO by default for Fedora 32 and were > working through various details yesterday. Great! > Our proposal will almost > certainly include stripping out the LTO bits from .o's and any static > libraries. Yes, we do it as well for now. Martin > > Jeff >
Re: Can LTO minor version be updated in backward compatible way ?
On 7/22/19 8:25 AM, Martin Liška wrote: > On 7/17/19 8:10 PM, Jeff Law wrote: >> On 7/17/19 11:29 AM, Andi Kleen wrote: >>> Romain Geissler writes: I have no idea of the LTO format and if indeed it can easily be updated in a backward compatible way. But I would say it would be nice if it could, and would allow adoption for projects spread on many teams depending on each others and unable to re-build everything at each toolchain update. >>> >>> Right now any change to an compiler option breaks the LTO format >>> in subtle ways. In fact even the minor changes that are currently >>> done are not frequent enough to catch all such cases. >>> >>> So it's unlikely to really work. >> Right and stable LTO bytecode really isn't on the radar at this time. >> >> IMHO it's more important right now to start pushing LTO into the >> mainstream for the binaries shipped by the vendors (and stripping the >> LTO bits out of any static libraries/.o's shipped by the vendors). >> >> >> SuSE's announcement today is quite ironic. > > Why and what is ironic about it? Sorry, you'd have to have internal context -- we'd been discussing it within the Red Hat team for Fedora 32 the previous day. One of the questions that came up was whether or not any other major distributor was shipping with LTO enabled :-) Jeff
Re: [RFC] Disabling ICF for interrupt functions
Hi,

On Fri, 19 Jul 2019 16:32:21 +0300 (MSK) Alexander Monakov wrote:
> On Fri, 19 Jul 2019, Jozef Lawrynowicz wrote:
>
> > For MSP430, the folding of identical functions marked with the "interrupt"
> > attribute by -fipa-icf-functions results in wrong code being generated.
> > Interrupts have different calling conventions than regular functions, so
> > inserting a CALL from one identical interrupt to another is not correct and
> > will result in stack corruption.
>
> But ICF by creating an alias would be fine, correct? As I understand, the
> real issue here is that gcc does not know how to correctly emit a call to
> "interrupt" functions (because they have unusual ABI and exist basically to
> have their address stored somewhere).

Yes, I presume in most cases an alias would be ok. It's just that users
sometimes do funky things with interrupt functions to achieve the best
possible performance for their programs, so I wouldn't want to rule out that
identical interrupts may need distinct addresses in some situations. I cannot
think of a use case for that right now, though. So having the option to
disable it somehow would be desirable.

> So I think the solution shouldn't be in disabling ICF altogether, but rather
> in adding a way to recognize that a function has quasi-unknown ABI and thus
> not directly callable (so any other optimization can see that it may not emit
> a call to this function), then teaching ICF to check that when deciding to
> fold by creating a wrapper.

I agree, this is a nice suggestion. "call" instructions should not be allowed
to be generated at all for MSP430 (and whichever other targets') interrupt
functions, whether that comes from the user explicitly calling the interrupt
from their code or from GCC generating the call.

This would have to be caught at the point that an optimization pass first
considers inserting a CALL to the interrupt; i.e., if the machine description
tries to prevent the generation of a call to an interrupt function only once
the RTL has been generated (e.g. by blanking on the define_expand for "call"),
we are going to get ICEs/wrong code a lot of the time, particularly in the
case originally mentioned here, where there would be an empty interrupt
function.

> (would it be possible to tell ICF that addresses of interrupt functions are
> not significant so it can fold them by creating aliases?)

I'll take a look.

Thanks,
Jozef

> Alexander
Re: Can LTO minor version be updated in backward compatible way ?
On Wed, Jul 17, 2019 at 2:10 PM Jeff Law wrote:
>
> ...
> SuSE's announcement today is quite ironic. Red Hat's toolchain team is
> planning to propose switching to LTO by default for Fedora 32 and were
> working through various details yesterday. Our proposal will almost
> certainly include stripping out the LTO bits from .o's and any static
> libraries.

Be sure to include an ARMv7 test case where one source file uses the default
arch flags and one source file uses -march=armv7-a -mfpu=neon (with runtime
feature checking). For example:

a.cpp - default flags
b.cpp - -march=armv7-a -mfpu=neon

We can't seem to get around errors like this during a link driven through GCC:

[ 303s] /usr/lib/gcc/armv7hl-suse-linux-gnueabi/9/include/arm_neon.h:4835:48: fatal error: You must enable NEON instructions (e.g. '-mfloat-abi=softfp' '-mfpu=neon') to use these intrinsics.
[ 303s] 4835 | return (uint32x4_t)__builtin_neon_vshl_nv4si ((int32x4_t) __a, __b);
[ 303s] |^
[ 303s] compilation terminated.

The only thing we have found to sidestep the problem is to disable LTO for
ARM.

Jeff
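For reference, the b.cpp half of such a reproducer would look roughly like the sketch below: it is compiled with -march=armv7-a -mfpu=neon and only reached after a runtime CPU-feature check in a.cpp. The function name is illustrative, and vshlq_n_u32 is chosen because in GCC 9's arm_neon.h it appears to map to the __builtin_neon_vshl_nv4si builtin named in the quoted error.

  /* b.cpp: built with -march=armv7-a -mfpu=neon.  Under LTO the body can be
     recompiled at link time in a context where the NEON flags are lost,
     which triggers the arm_neon.h error above.  */
  #include <arm_neon.h>

  uint32x4_t
  shift_left_by_one (uint32x4_t v)
  {
    return vshlq_n_u32 (v, 1);
  }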
Re: [RFC] Disabling ICF for interrupt functions
On Mon, 22 Jul 2019, Jozef Lawrynowicz wrote: > This would have to be caught at the point that an optimization pass > first considers inserting a CALL to the interrupt, i.e., if the machine > description tries to prevent the generation of a call to an interrupt function > once the RTL has been generated (e.g. by blanking on the define_expand for > "call"), we are going to have ICEs/wrong code generated a lot of the time. > Particularly in the case originally mentioned here - there would be an empty > interrupt function. Yeah, I imagine it would need to be a new target hook direct_call_allowed_p receiving a function decl, or something like that. > > (would it be possible to tell ICF that addresses of interrupt functions are > > not significant so it can fold them by creating aliases?) > > I'll take a look. Sorry, I didn't say explicitly, but that was meant more as a remark to IPA maintainers: currently in GCC "address taken" implies "address significant", so "address not significant" would have to be a new attribute, or a new decl bit (maybe preferable for languages where function addresses are not significant by default). Alexander
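A rough sketch of what the suggested hook could look like for MSP430 follows. The hook name, signature, and registration are hypothetical (no such hook exists today); only lookup_attribute, DECL_ATTRIBUTES, and NULL_TREE are existing GCC interfaces, and the code would live in the backend with the usual GCC includes.

  /* Hypothetical implementation in gcc/config/msp430/msp430.c: refuse
     direct calls to functions carrying the "interrupt" attribute.  */
  static bool
  msp430_direct_call_allowed_p (tree fndecl)
  {
    return lookup_attribute ("interrupt",
                             DECL_ATTRIBUTES (fndecl)) == NULL_TREE;
  }

  /* Hypothetical registration; ICF and call expansion would query the hook
     before emitting a direct call or folding one body into a call:
     #undef  TARGET_DIRECT_CALL_ALLOWED_P
     #define TARGET_DIRECT_CALL_ALLOWED_P msp430_direct_call_allowed_p  */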
Re: Re: [GSoC'19, libgomp work-stealing] Task parallelism runtime
On Sun, Jul 21, 2019 at 04:46:33PM +0900, 김규래 wrote:
> About the snippet below,
>
> if (gomp_barrier_last_thread (state))
>   {
>     if (team->task_count == 0)
>       {
>         gomp_team_barrier_done (&team->barrier, state);
>         gomp_mutex_unlock (&team->task_lock);
>         gomp_team_barrier_wake (&team->barrier, 0);
>         return;
>       }
>     gomp_team_barrier_set_waiting_for_tasks (&team->barrier);
>   }
>
> Am I safe to assume that gomp_barrier_last_thread is thread-safe?

Yes, you can look up the definition. gomp_barrier_last_thread is just a bit
in the state bitmask passed to the routine; it is set on the last thread
that encounters the barrier, which is figured out by doing atomic
subtraction from the counter.

Jakub
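A toy sketch of the scheme described above (not the libgomp implementation; names and types are illustrative): each arriving thread atomically subtracts from a counter, and only the thread that brings it to zero observes the "last" condition, so the check needs no extra locking.

  #include <atomic>

  struct toy_barrier
  {
    /* Initialized to the team size before each barrier cycle.  */
    std::atomic<unsigned> remaining;
  };

  /* Returns true for exactly one thread per barrier cycle: the last arrival.
     fetch_sub returns the value before the subtraction, so seeing 1 means
     this thread completed the count.  */
  static bool
  toy_barrier_arrive_last_p (toy_barrier &bar)
  {
    return bar.remaining.fetch_sub (1, std::memory_order_acq_rel) == 1;
  }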
Re: gcc/config/arch/arch.opt: Option mask gen problem
On Mon, Jul 22, 2019 at 4:05 AM Maxim Blinov wrote: > Is it possible, in the arch.opt file, to have GCC generate a bitmask > relative to a user-defined variable without an associated name? To > illustrate my problem, consider the following option file snippet: > ... > But, I don't want the user to be able to pass "-mbmi-zbb" or > "-mno-bmi-zbb" on the command line: If you don't want an option, why are you making changes to the riscv.opt file? This is specifically for supporting command line options. Adding a variable here does mean that it will automatically be saved and restored, and I can see the advantage of doing that, even if it is only indirectly tied to options. You could add a variable here, and then manually define the bitmasks yourself in riscv-opt.h or riscv.h. Or you could just add the variable to the machine_function struct in riscv.c, which will also automatically save and restore the variable. Jim
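A sketch of the manual route suggested above: keep only the Variable entry in the .opt file so riscv_bitmanip_flags is still generated, saved, and restored, and define the bits by hand in a backend header. The choice of riscv-opts.h and the macro names below are assumptions for illustration.

  /* riscv-opts.h (or riscv.h): hand-written masks over the .opt-generated
     riscv_bitmanip_flags variable, with no command-line options attached.  */
  #define MASK_BITMANIP_ZBB (HOST_WIDE_INT_1U << 0)
  #define MASK_BITMANIP_ZBC (HOST_WIDE_INT_1U << 1)

  #define TARGET_BITMANIP_ZBB \
    ((riscv_bitmanip_flags & MASK_BITMANIP_ZBB) != 0)
  #define TARGET_BITMANIP_ZBC \
    ((riscv_bitmanip_flags & MASK_BITMANIP_ZBC) != 0)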
Re: Re: [GSoC'19, libgomp work-stealing] Task parallelism runtime
> Yes, you can look up the definition. > gomp_ barrier_last_thread is just a bit in the state bitmask passed to the > routine, it is set on the last thread that encounters the barrier, which is > figured out by doing atomic subtraction from the counter. I saw the implementation, just wanted to be sure that's the general case. Thanks. Ray Kim -Original Message- From: "Jakub Jelinek" To: "김규래"; Cc: ; Sent: 2019-07-23 (화) 03:54:13 (GMT+09:00) Subject: Re: Re: [GSoC'19, libgomp work-stealing] Task parallelism runtime On Sun, Jul 21, 2019 at 04:46:33PM +0900, 김규래 wrote: > About the snippet below, > > if (gomp_barrier_last_thread (state)) > { > if (team->task_count == 0) > { > gomp_team_barrier_done (&team->barrier, state); > gomp_mutex_unlock (&team->task_lock); > gomp_team_barrier_wake (&team->barrier, 0); > return; > } > gomp_team_barrier_set_waiting_for_tasks (&team->barrier); > } > > Am I safe to assume that gomp_barrier_last_thread is thread-safe? Yes, you can look up the definition. gomp_ barrier_last_thread is just a bit in the state bitmask passed to the routine, it is set on the last thread that encounters the barrier, which is figured out by doing atomic subtraction from the counter. Jakub
flow control statement
From which header file are they (the for and while loops) called? Can you tell me where they (the for or while loops) are stored, like iostream? I wonder about their source code. I'll look at (investigate) them and try to write a new header.