On Thu, 18 Jul 2024, Richard Biener wrote:
> >If both b and c are scalars and the type of true?b:c has the same size
> >as the element type of a, then b and c are converted to a vector type
> >whose elements have this type and with the same number of elements as a.
> >
> > (in https:
On Thu, 18 Jul 2024, Richard Biener wrote:
> > (also, in C op2 and op3 of a ternary operator always have integer promotions
> > applied, but for vector selection we should use unpromoted types)
>
> Yes. So a good testcase would use a char-typed variable then. It’s
> unfortunate C and C++ do no
build-time dependency on installed headers.
So maybe we have that as an option too.
Alexander
On Fri, 22 Dec 2023, Alexander Monakov wrote:
> From: Daniil Frolov
>
> PR 66487 is asking to provide sanitizer-like detection for C++ object
> lifetime violations that are worked around with
In PR 114480 we are hitting a case where tree-into-ssa scales
quadratically due to prune_unused_phi_nodes doing O(N log N)
work for N basic blocks, for each variable individually.
Sorting the 'defs' array is especially costly.
It is possible to assist gcc_qsort by laying out dfs_out entries
in the
Hello!
I looked at ternlog a bit last year, so I'd like to offer some drive-by
comments. If you want to tackle them in a follow-up patch, or leave them for
someone else to handle, please let me know.
On Fri, 17 May 2024, Roger Sayle wrote:
> This revised patch has been tested on x86_64-pc-linux-gnu
On Tue, 28 May 2024, Richard Biener wrote:
> On Wed, May 15, 2024 at 12:59 PM Alexander Monakov wrote:
> >
> >
> > Hello,
> >
> > I'd like to ask if anyone has any new thoughts on this patch.
> >
> > Let me also point out that valgrind/memcheck.
On Tue, 28 May 2024, Richard Biener wrote:
> > I am a bit confused what you mean by "cheaper". Could it be that we are not
> > on the same page regarding the machine code behind client requests?
>
> Probably "cheaper" in register usage.
But it doesn't matter considering that execution under Va
295,155 branch-misses:u          # 0.65% of all branches  ( +- 0.11% )
0.033242 +- 0.000418 seconds time elapsed ( +- 1.26% )
Is the revised patch below still ok? I've rolled the configury changes into it,
and dropped the (now unnecessary) AVX2 helper
On Thu, 22 Aug 2024, Marc Poulhiès wrote:
> Single argument static_assert is C++17 only.
>
> libcpp/ChangeLog:
>
> * lex.cc: fix static_assert to use 2 arguments.
When pushing, please fix the entry to mention the function name if that's
not too much trouble:
* lex.cc (search_line
The recently introduced search_line_fast_ssse3 raised padding
requirement from 16 to 64, which was adjusted in read_file_guts,
but the corresponding ' + 16' in _cpp_convert_input was overlooked.
libcpp/ChangeLog:
PR preprocessor/116458
* charset.cc (_cpp_convert_input): Bump paddi
Tie together the two functions that ensure tail padding with
search_line_ssse3 via CPP_BUFFER_PADDING macro.
libcpp/ChangeLog:
* internal.h (CPP_BUFFER_PADDING): New macro; use it ...
* charset.cc (_cpp_convert_input): ...here, and ...
* files.cc (read_file_guts): ...here,
On Fri, 14 Jun 2024, Kong, Lingling wrote:
> The APX CFCMOV[1] feature implements conditional faulting, which means
> that all memory faults are suppressed when the condition code evaluates
> to false for a load or store of a memory operand. Now we could load or
> store a memory operand that may trap or
On Tue, 30 Jul 2024, Richard Biener wrote:
> > Oh, and please add a small comment why we don't use XFmode here.
>
> Will do.
>
> /* Do not enable XFmode, there is padding in it and it suffers
> from normalization upon load like SFmode and DFmode when
> not using S
On Tue, 30 Jul 2024, Jakub Jelinek wrote:
> On Tue, Jul 30, 2024 at 03:43:25PM +0300, Alexander Monakov wrote:
> >
> > On Tue, 30 Jul 2024, Richard Biener wrote:
> >
> > > > Oh, and please add a small comment why we don't use XFmode here.
> > >
On Tue, Jul 30, 2024 at 03:00:49PM +0200, Richard Biener wrote:
> > What mangling fld performs depends on the contents of the FP control
> > word which is awkward.
For float/double loads (FLDS and FLDL) we know format conversion changes
SNaNs to QNaNs, but it's a widening conversion, so e.g. ro
Hi,
On Tue, 30 Jul 2024, Andi Kleen wrote:
> AVX2 is widely available on x86, and it allows doing the scanner line
> check 32 bytes at a time. The code is similar to the SSE2 code path,
> just using AVX and 32 bytes at a time instead of SSE2's 16 bytes.
>
> Also adjust the code to allow inlini
On Tue, 30 Jul 2024, Andi Kleen wrote:
> > I have looked at this code before. When AVX2 is available, so is SSSE3,
> > and then a much more efficient approach is available: instead of comparing
> > against \r \n \\ ? one-by-one, build a vector
> >
> > 0 1 2 3 4 5 6 7 8 9 a b c
Hello!
As discussed, I'm sending patches that reimplement our SSE4.2 search_line_fast
helper with SSSE3, and then add the corresponding AVX2 helper. They are on top
of Andi's "Remove MMX code path in lexer" patch, which was approved, but not
committed yet (Andi, can you push your own patch?).
App
Upcoming patches first drop the Binutils ISA requirement from SSE4.2 to SSSE3,
then bump it to AVX2. Instead of fiddling with detection, just bump
our configure check to AVX2 immediately: if by some accident somebody
builds GCC without AVX2 support in the assembler, they will get the SSE2
vectorized lexer, whi
Since the characters we are searching for (CR, LF, '\', '?') all have
distinct ASCII codes mod 16, PSHUFB can help match them all at once.
libcpp/ChangeLog:
* lex.cc (search_line_sse42): Replace with...
(search_line_ssse3): ... this new function. Adjust the use...
(init_v
Use the same PSHUFB-based matching as in the SSSE3 helper, just 2x
wider.
Directly use the new helper if __AVX2__ is defined. It makes the other
helpers unused, so mark them inline to prevent warnings.
Rewrite and simplify init_vectorized_lexer.
libcpp/ChangeLog:
* files.cc (read_file_g
On Tue, 6 Aug 2024, Alexander Monakov wrote:
> --- a/libcpp/files.cc
> +++ b/libcpp/files.cc
[...]
> + pad = HAVE_AVX2 ? 32 : 16;
This should have been
#ifdef HAVE_AVX2
pad = 32;
#else
pad = 16;
#endif
Alexander
On Wed, 7 Aug 2024, Richard Biener wrote:
> OK with that change.
>
> Did you think about a AVX512 version (possibly with 32 byte vectors)?
> In case there's a more efficient variant of pshufb/pmovmskb available
> there - possibly
> the load on the branch unit could be lessened with using maskin
On Wed, 7 Aug 2024, Richard Biener wrote:
> > > + data = *(const v16qi_u *)s;
> > > + /* Prevent propagation into pshufb and pcmp as memory operand. */
> > > + __asm__ ("" : "+x" (data));
> >
> > It would probably make sense to a file a PR on this separately,
> > to eventually fi
On Wed, 7 Aug 2024, Richard Biener wrote:
> > > This is probably to work around bugs in older compiler versions? If
> > > not I agree.
> >
> > This is deliberate hand-tuning to avoid a subtle issue: pshufb is not
> > macro-fused on Intel, so with propagation it is two uops early in the
> > CPU
On Thu, 22 Aug 2013, Gabriel Dos Reis wrote:
> > - I would like to recall issue if we can make NEW_EXPR annotated with
> >MALLOC attribute. Without it, it is basically impossible to track
> >any dynamically allocated objects in the middle-end
>
> operator new is replaceable by user progr
On Thu, 22 Aug 2013, Jan Hubicka wrote:
> > On Thu, 22 Aug 2013, Gabriel Dos Reis wrote:
> > > > - I would like to recall issue if we can make NEW_EXPR annotated with
> > > >MALLOC attribute. Without it, it is basically impossible to track
> > > >any dynamically allocated objects in th
Hello,
Could you use the existing facilities instead, such as adjust_priority hook,
or making the compare-branch insn sequence a SCHED_GROUP?
Alexander
On Wed, Sep 4, 2013 at 9:53 PM, Steven Bosscher wrote:
>
> On Wed, Sep 4, 2013 at 10:58 AM, Alexander Monakov wrote:
> > Hello,
> >
> > Could you use the existing facilities instead, such as adjust_priority hook,
> > or making the compare-branch insn sequ
On Fri, 6 Sep 2013, Wei Mi wrote:
> SCHED_GROUP works after I add chain_to_prev_insn after
> add_branch_dependences, in order to chain control dependences to prev
> insn for sched group.
chain_to_prev_insn is done at the end of deps_analyze_insn; why is that not
sufficient?
Alexander
On Tue, 10 Sep 2013, Wei Mi wrote:
> Because deps_analyze_insn only analyzes data deps but no control deps.
> Control deps are included by add_branch_dependences. Without the
> chain_to_prev_insn in the end of add_branch_dependences, jmp will be
> control dependent on every previous insn in the
On Wed, 11 Sep 2013, Wei Mi wrote:
> I tried that and it caused some regressions, so I chose to do
> chain_to_prev_insn another time in add_branch_dependences. There could
> be some dependence between those two functions.
(please don't top-post on this list)
In that case you can adjust 'last
On Wed, 11 Sep 2013, Wei Mi wrote:
> I agree with you that explicit handling in sched-deps.c for this
> feature does not look good. So I moved it to sched_init (instead of
> ix86_sched_init_global, because ix86_sched_init_global is used to
> install scheduling hooks), and then it is possible for other
On Thu, 12 Sep 2013, Wei Mi wrote:
> Thanks, fixed. New patch attached.
Thanks. At this point you need feedback from the x86 and scheduler maintainers.
I would recommend resubmitting the patch with a ChangeLog entry, and with
the text of the patch inline in the email (your last mail has the patch
Hello,
You probably want to disable this transformation when the number of iterations
is predicted to be small, right?
Shouldn't dot product transform be predicated on -fassociative-math?
Do you have a vision of a generalized pattern matcher to allow adding other
routines easily?
I'm curious wh
, unconditionally. That looks historical, or an oversight rather than
deliberate, and the following patch makes expansion use ix86_fp_compare_mode
like in other places. Bootstrapped and regtested on x86_64-linux.
2013-10-11 Alexander Monakov
* config/i386/i386.c
Hello,
A very minor nit: in common.opt, entries for the options should be changed to
fregmove
Common Ignore
Does nothing. Preserved for backward compatibility.
instead of removing them altogether, so the compiler does not start rejecting
build commands with such options. There are now a few suc
This patch makes it possible to use __attribute__((shared)) to place non-automatic
variables in shared memory.
* config/nvptx/nvptx.c (nvptx_encode_section_info): Handle "shared"
attribute.
(nvptx_handle_shared_attribute): New. Use it...
(nvptx_attribute_table): ... here (new
* config/nvptx/nvptx.c (nvptx_declare_function_name): Fix warning.
---
gcc/ChangeLog.gomp-nvptx | 4
gcc/config/nvptx/nvptx.c | 2 +-
2 files changed, 5 insertions(+), 1 deletion(-)
diff --git a/gcc/config/nvptx/nvptx.c b/gcc/config/nvptx/nvptx.c
index 45aebdd..f63f840 100644
--- a/g
Link libgfortran for offloaded code as well.
* testsuite/libgomp.fortran/fortran.exp (lang_link_flags): Pass
-foffload=-lgfortran in addition to -lgfortran.
* testsuite/libgomp.oacc-fortran/fortran.exp (lang_link_flags): Ditto.
---
libgomp/ChangeLog.gomp-nvptx
This patch removes two zero-size stubs, there's no need for these overrides.
* config/nvptx/sections.c: Delete.
* config/nvptx/splay-tree.c: Delete.
---
libgomp/ChangeLog.gomp-nvptx | 5 +
libgomp/config/nvptx/sections.c | 0
libgomp/config/nvptx/splay-tree.c | 0
3 file
This patch adds a few insn patterns used for OpenMP SIMD
reduction/lastprivate/ordered lowering for SIMT execution. OpenMP lowering
produces GOMP_SIMT_... internal functions when lowering SIMD constructs that
can be offloaded to a SIMT device. After lto stream-in, those internal
functions are tri
Hello,
I'm pushing this patch series to the gomp-nvptx branch. It adds the
following:
- backend and omp-low.c changes for SIMT-style SIMD region handling
- libgomp changes for running the fortran testsuite
- libgomp changes for spawning multiple OpenMP teams
I'll perform a trunk merge and
This patch removes the nvptx fortran.c stub that provides only
_gfortran_abort. It is possible to link libgfortran on NVPTX with
-foffload=-lgfortran.
* config/nvptx/fortran.c: Delete.
---
libgomp/ChangeLog.gomp-nvptx | 4
libgomp/config/nvptx/fortran.c | 40 -
This patch implements time.c on NVPTX with the %clock64 register. The PTX
documentation describes %globaltimer as explicitly off-limits for us.
* config/nvptx/time.c: New.
---
libgomp/ChangeLog.gomp-nvptx | 4
libgomp/config/nvptx/time.c | 49 +
This is the libgomp plugin side of omp_clock_wtime support on NVPTX. Query
GPU frequency and copy the value into the device image.
At the moment CUDA driver sets GPU to a fixed frequency when a CUDA context is
created (the default is to use the highest non-boost frequency, but it can be
altered w
This patch extends SIMD-via-SIMT lowering in omp-low.c to handle all loops,
lowering reduction/lastprivate/ordered appropriately (but it still chickens
out on collapsed loops, handling them as if safelen=1). New SIMT lowering
snippets use new internal functions that are folded for non-SIMT targets
On Tue, 19 Jan 2016, Alexander Monakov wrote:
> > You mean you already have implemented something along the lines I
> > proposed?
>
> Yes, I was implementing OpenMP teams, and it made sense to add warps per block
> limiting at the same time (i.e. query CU_FUNC_ATTRIBUTE_... a
"../../lock.c"
#ifdef LIBGOMP_GNU_SYMBOL_VERSIONING
/* gomp_mutex_* can be safely locked in one thread and
diff --git a/libgomp/config/nvptx/lock.c b/libgomp/config/nvptx/lock.c
index e69de29..7731704 100644
--- a/libgomp/config/nvptx/lock.c
+++ b/libgomp/config/nvptx/lock.c
@@ -0,0 +1,41
This adds necessary plumbing to spawn multiple teams.
To be reverted on this branch prior to merge.
---
gcc/builtin-types.def| 7 +-
gcc/fortran/types.def| 5 +-
gcc/omp-builtins.def | 2 +-
gcc/omp-low.c
* config/nvptx/icv-device.c (omp_get_num_teams): Update.
(omp_get_team_num): Ditto.
* config/nvptx/target.c (GOMP_teams): Update.
* config/nvptx/team.c (nvptx_thrs): Place in shared memory.
* icv.c (gomp_num_teams_var): Define.
* libgomp.h (gomp_num_t
This complements multiple teams support on the libgomp plugin side.
* plugin/plugin-nvptx.c (struct targ_fn_descriptor): Add new fields.
(struct ptx_device): Ditto. Set them...
(nvptx_open_device): ...here.
(GOMP_OFFLOAD_load_image): Set new targ_fn_descriptor fiel
On Wed, 20 Jan 2016, Ilya Verbin wrote:
> I agree that OpenMP doesn't guarantee that all target regions must be executed
> on the device, but in this case a user can't be sure that some library
> function
> always will offload (because the library might be replaced by fallback
> version),
> and h
On Tue, 26 Jan 2016, Thomas Schwinge wrote:
> A very similar problem also exists for nvptx offloading (Nathan CCed),
> where we emit similar warnings (enabled by default). As nvptx offloading
> happens during link-time (not compile-time, as with hsa offloading),
> these don't affect GCC's compile
On Mon, 11 Jan 2016, Alexander Monakov wrote:
> On Mon, 11 Jan 2016, Thomas Schwinge wrote:
> > Alexander, would you please also submit a fix for that for nvptx-tools'
> > nvptx-run.c? (Or want me to do that?)
>
> I can do that, along with another small change I used f
Hello!
The following patch fixes subtle breakage on NVPTX where unsigned comparisons
would sometimes be mistranslated by the PTX JIT. The new test demonstrates that
by using the %nctaid.x (number of blocks) register, but a comparison against
a value in constant memory can also trigger that (I could ea
Hello,
On Wed, 3 Feb 2016, Nathan Sidwell wrote:
> 1) extend the -fopenacc-dim=X:Y:Z syntax to allow '-' indicating a runtime
> choice. (0 also indicates that, but I thought best to have an explicit syntax
> as well).
Does it work when the user specifies one of the dimensions, so that references
On Wed, 3 Feb 2016, Nathan Sidwell wrote:
> You can only override at runtime those dimensions that you said you'd override
> at runtime when you compiled your program.
Ah, I see. That's not obvious to me, so perhaps added documentation can be
expanded to explain that? (I now see that the plugin
t 3 arguments to untagged entries. Old libgomp
plugins unaware of the change should be able to detect failure to provide
sufficient arguments to entries emitted from new compiler from the failure of
cuLaunchKernel
Alexander Monakov (5):
libgomp plugin: correct types
Revert "nvptx plugin:
.
Revert
2015-12-09 Alexander Monakov
* plugin/plugin-nvptx.c (nvptx_open_device): Adjust heap size.
diff --git a/libgomp/plugin/plugin-nvptx.c b/libgomp/plugin/plugin-nvptx.c
index 79fd253..cb6a3ac 100644
--- a/libgomp/plugin/plugin-nvptx.c
+++ b/libgomp/plugin/plugin
Handling of the arguments array wrongly assumed that 'long' would match the
size of 'void *'. As that would break on MinGW, use 'intptr_t'. Use 'int' for
'teams' and 'threads', as that's what cuLaunchKernel accepts.
* plugin/plugin-nvptx.c (nvptx_adjust_launch_bounds): Adjust types.
This patch implements the NVPTX backend part of the transition to
host-allocated soft stacks. The compiler-emitted kernel entry code now
accepts a pointer to stack storage and a per-warp stack size, and initializes
__nvptx_stacks based on that (as well as trivially initializing __nvptx_uni).
The re
This patch implements the NVPTX libgomp part of the transition to
host-allocated soft stacks. The wrapper around gomp_nvptx_main previously
responsible for that is no longer needed.
This mostly reverts commit b408f1293e29a009ba70a3fda7b800277e1f310a.
* config/nvptx/team.c: (gomp_nvptx_ma
This patch implements the libgomp plugin part of the transition to
host-allocated soft stacks. For now only a simple scheme with
allocation/deallocation per launch is implemented; a followup change is
planned to cache and reuse allocations when appropriate.
The call to cuLaunchKernel is changed t
On Mon, 22 Feb 2016, Nathan Sidwell wrote:
> On 02/15/16 13:44, Alexander Monakov wrote:
> > This patch implements the NVPTX backend part of the transition to
>
> > + static const char template64[] = ENTRY_TEMPLATE ("64", "8",
> > "ma
On Mon, 22 Feb 2016, Nathan Sidwell wrote:
> On 02/22/16 15:25, Alexander Monakov wrote:
>
> > Template strings have an embedded nul character at the position where ORIG
> > goes, so template_2 is set to point at the position following the embedded
> > nul
> >
Hello Nathan,
On Wed, 9 Sep 2015, Nathan Sidwell wrote:
> I've applied this patch to port some cleanups, mainly formatting and loop
> idioms from the gomp4 branch.
This patch that you committed to trunk in September 2015 forcefully disables
generation of line number information, undoing a part of
> Attachment is the patch which repair -fno-plt support for AArch64.
>
> aarch64_is_noplt_call_p will only be true if:
>
> * gcc is generating position independent code.
> * function symbol has declaration.
> * either -fno-plt or "(no_plt)" attribute specified.
> * it's a external functio
This patch makes one OpenACC-specific path in nvptx_record_offload_symbol
optional.
* config/nvptx/nvptx.c (nvptx_record_offload_symbol): Allow missing
OpenACC attributes.
---
gcc/config/nvptx/nvptx.c | 19 +++
1 file changed, 11 insertions(+), 8 deletions(-)
diff
This is a minimal patch for NVPTX OpenMP offloading, using Jakub's initial
implementation. It makes it possible to run '#pragma omp target' successfully,
without any parallel execution: 1 team of 1 thread is spawned on the device, and
target regions with '#pragma omp parallel' will fail with a link error.
This patch makes it possible to see, with GOMP_DEBUG=1 in the environment,
when target regions are executed on the host.
* target.c (GOMP_target): Use gomp_debug on fallback path.
---
libgomp/target.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/libgomp/target.c b/libgomp/target.c
index 6ca80ad..1c
This patch makes it possible to invoke mkoffload meaningfully with -fopenmp. The check
for -fopenacc flag is specific to gomp4 branch: trunk does not have it.
* config/nvptx/mkoffload.c (main): Do not check for -fopenacc.
---
gcc/config/nvptx/mkoffload.c | 7 ++-
1 file changed, 2 insertions(+)
Hello,
This patch series implements some minimally required changes to have OpenMP
offloading working for NVPTX target on the gomp4 branch. '#pragma omp target'
and data updates should work, but all parallel execution functionality remains
stubbed out (uses of '#pragma omp parallel' in target reg
Although newlib headers define most pthreads types, pthread_attr_t is not
available. Macro-replace it by 'void' to keep the prototype of
gomp_init_thread_affinity unchanged, and do not declare gomp_thread_attr.
* libgomp.h: Define pthread_attr_t to void on NVPTX.
---
libgomp/libgomp.h |
This stub header only provides empty struct gomp_barrier_t. For now I've
punted on providing a minimally-correct implementation.
* config/nvptx/bar.h: New file.
---
libgomp/config/nvptx/bar.h | 38 ++
1 file changed, 38 insertions(+)
create mode 10064
This patch ports env.c to NVPTX. It drops all environment parsing routines
since there's no "environment" on the device. For now, the useful effect of
the patch is providing 'omp_is_initial_device' to distinguish host execution
from target execution in user code.
Several functions use gomp_icv,
@@ -0,0 +1,67 @@
+/* Copyright (C) 2015 Free Software Foundation, Inc.
+ Contributed by Alexander Monakov
+
+ This file is part of the GNU Offloading and Multi Processing Library
+ (libgomp).
+
+ Libgomp is free software; you can redistribute it and/or modify it
+ under the terms of the GNU
On Wed, 23 Sep 2015, Bernd Schmidt wrote:
> I have two major concerns here. Can I ask you how much experience you have
> with GPU programming and ptx?
I'd say I have a good understanding of the programming model and nvidia
hardware architecture, having used CUDA tools and paid attention to
r&d ne
On Thu, 24 Sep 2015, Jakub Jelinek wrote:
> On Wed, Sep 23, 2015 at 08:22:22PM +0300, Alexander Monakov wrote:
> > This patch ports env.c to NVPTX. It drops all environment parsing routines
> > since there's no "environment" on the device. For now, the use
> I'd prefer here the https://gcc.gnu.org/ml/gcc-patches/2015-04/msg01418.html
> changes to libgomp.h and associated configury changes.
OK, like the following?
[gomp4] libgomp: guard pthreads usage by LIBGOMP_USE_PTHREADS
This makes it possible to avoid referencing pthread types and functions on nvptx.
On Thu, 15 Oct 2015, Jakub Jelinek wrote:
> Looking at Cuda, for async target region kernels we'd probably use
> a non-default stream and enqueue the async kernel in there. I see
> we can e.g. cudaEventRecord into the stream and then either cudaEventQuery
> to busy poll the event, or cudaEventSync
On Thu, 15 Oct 2015, Jakub Jelinek wrote:
> > - this functionality doesn't currently work through CUDA MPS
> > ("multi-process
> > server", for funneling CUDA calls from different processes through a
> > single "server" process, avoiding context-switch overhead on the device,
> > som
The NVPTX backend emits each function either as .func (callable only from the
device code) or as .kernel (entry point for a parallel region). OpenMP
lowering adds "omp target entrypoint" attribute to functions outlined from
target regions. Unlike OpenACC offloading, OpenMP offloading does not in
Hello,
This patch series moves libgomp/nvptx porting further along to get initial
bits of parallel execution working, mostly unbreaking the testsuite. Please
have a look! I'm interested in feedback, and would like to know if it's
suitable to become a part of a branch.
This patch series ports en
Due to special treatment of types, emitting variables of type _Bool at
global scope is impossible: extern references are emitted with .u8, but
definitions use .u64. This patch fixes the issue by treating the boolean
type as an integer type.
* config/nvptx/nvptx.c (init_output_initializer): Also
This makes it possible to emit decls in 'shared' memory from the middle-end.
* config/nvptx/nvptx.c (nvptx_legitimate_address_p): Adjust prototype.
(nvptx_section_for_decl): If type of decl has a specific address
space, return it.
(nvptx_addr_space_from_address): Ditto.
NVPTX does not support alloca or variable-length stack allocations, thus
heap allocation needs to be used instead. I've opted to make this a generic
change instead of guarding it with an #ifdef: libgomp usually leaves thread
stack size up to libc, so avoiding unbounded stack allocation makes sense
This patch removes 0-size libgomp stubs where generic implementations can be
compiled for the NVPTX target.
It also removes non-stub critical.c, which contains assembly implementations
for GOMP_atomic_{start,end}, but does not contain implementations for
GOMP_critical_*. My understanding is that
This provides minimal implementations of gomp_dynamic_max_threads and
omp_get_num_procs.
* config/nvptx/proc.c: New.
---
libgomp/config/nvptx/proc.c | 40
1 file changed, 40 insertions(+)
diff --git a/libgomp/config/nvptx/proc.c b/libgomp/config/n
(note to reviewers: I'm not sure what we're after here, on the high level;
will be happy to rework the patch in a saner manner based on feedback, or even
drop it for now)
At the moment the attribute setting logic in omp-low.c is such that if a
function that should be present in target code does no
NVPTX provides vprintf, but there's no stream separation: everything is
printed as if to stdout. This is the minimal change to get error.c working.
* error.c [__nvptx__]: Replace vfprintf, fputs, fputc with [v]printf.
---
libgomp/error.c | 5 +
1 file changed, 5 insertions(+)
diff
The approach I've taken in libgomp/nvptx is to have a single entry point,
gomp_nvptx_main, that can take care of initial allocation, transferring
control to target region function, and finalization.
At the moment it has the prototype:
void gomp_nvptx_main(void (*fn)(void*), void *fndata);
but it'
(This patch serves as a straw man proposal to have something concrete for
discussion and further patches)
On PTX, stack memory is private to each thread. When master thread constructs
'omp_data_o' on its own stack and passes it to other threads via
GOMP_parallel by reference, other threads cannot
This patch ports team.c to nvptx by arranging an initialization/cleanup
routine, gomp_nvptx_main, that all (pre-started) threads can run. It
initializes a thread pool and proceeds to run gomp_thread_start in all threads
except thread zero, which runs original target region function.
Thread-privat
Note: this patch will have to be more complex if we go with 'approach 2'
described in a later patch, 07/14 "launch target functions via gomp_nvptx_main".
For OpenMP offloading, libgomp invokes 'gomp_nvptx_main' as the accelerator
kernel, passing it a pointer to outlined target region function. Th
On NVPTX, we don't need most of target.c functionality, except for GOMP_teams.
Provide it as a copy of the generic implementation for now (it most likely
will need to change down the line: on NVPTX we do need to spawn several
thread blocks for #pragma omp teams).
Alternatively, it might make sense
On NVPTX, there are 16 hardware barriers for each thread team, and each barrier
has a variable waiter count. The instruction 'bar.sync N, M;' waits on barrier
number N until M threads have arrived. M should be pre-multiplied by the warp
width. It's also possible to 'post' the barrier without susp
On Tue, 20 Oct 2015, Bernd Schmidt wrote:
> On 10/20/2015 08:34 PM, Alexander Monakov wrote:
> > Due to special treatment of types, emitting variables of type _Bool in
> > global scope is impossible: extern references are emitted with .u8, but
> > definitions use .u64.
On Tue, 20 Oct 2015, Bernd Schmidt wrote:
> On 10/20/2015 08:34 PM, Alexander Monakov wrote:
> > This allows to emit decls in 'shared' memory from the middle-end.
> >
> > * config/nvptx/nvptx.c (nvptx_legitimate_address_p): Adjust prototype.
> >
On Tue, 20 Oct 2015, Bernd Schmidt wrote:
> On 10/20/2015 08:34 PM, Alexander Monakov wrote:
> > 2. Make gomp_nvptx_main a device (.func) function. To have that work, we'd
> > need to additionally emit a "trampoline" of sorts in the NVPTX backend. For
> &g