On 02.03.2017 at 13:02, Jakub Jelinek wrote:
And this needs to use *matmul_fn instead of *matmul_p too.
The whole point is that matmul_p is only loaded using __atomic_load_n
and only optionally stored using __atomic_store_n.
Ok with those changes.
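
For readers following along, here is a minimal sketch of the pattern Jakub describes: the shared pointer matmul_p is read only with __atomic_load_n and written only with __atomic_store_n, and the actual call goes through the local copy matmul_fn. Everything other than those names is an assumption made for illustration: the simplified signature, matmul_vanilla, select_matmul and the relaxed memory order are not taken from the patch.

```c
#include <stddef.h>

/* Hypothetical, simplified signature; the real routine takes array
   descriptors plus the try_blas/blas_limit/gemm arguments. */
typedef void (*matmul_fn_t)(const double *a, const double *b,
                            double *c, size_t n);

/* Plain fallback: c = a * b for square n x n row-major matrices. */
static void matmul_vanilla(const double *a, const double *b,
                           double *c, size_t n)
{
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++) {
            double s = 0.0;
            for (size_t k = 0; k < n; k++)
                s += a[i * n + k] * b[k * n + j];
            c[i * n + j] = s;
        }
}

/* Hypothetical selector; a real one would inspect CPU features. */
static matmul_fn_t select_matmul(void)
{
    return matmul_vanilla;
}

/* Shared cache of the chosen variant. It is touched only through the
   __atomic builtins, never by a plain load or store. */
static matmul_fn_t matmul_p;

void matmul(const double *a, const double *b, double *c, size_t n)
{
    /* Work on a local copy so the call never rereads matmul_p. */
    matmul_fn_t matmul_fn = __atomic_load_n(&matmul_p, __ATOMIC_RELAXED);

    if (matmul_fn == NULL) {
        matmul_fn = select_matmul();
        /* Several threads may get here at once; they all store the
           same value, so the store is optional for correctness. */
        __atomic_store_n(&matmul_p, matmul_fn, __ATOMIC_RELAXED);
    }

    (*matmul_fn)(a, b, c, n);
}
```

Calling through *matmul_fn rather than *matmul_p is the point of the review comment: a plain read of matmul_p at the call site would be a non-atomic access again.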
Thanks! Committed as
https://gcc.gnu.org/vie
On Thu, Mar 02, 2017 at 12:57:05PM +0100, Thomas Koenig wrote:
> --- m4/matmul.m4 (revision 245836)
> +++ m4/matmul.m4 (working copy)
> @@ -123,9 +123,14 @@ void matmul_'rtype_code` ('rtype` * const restrict
> 'rtype` * const restrict a, 'rtype` * const restrict b, int try_blas,
>
Hi Jakub,
Actually, I see a problem, but not related to this patch.
I bet e.g. tsan would complain heavily on the wrappers, because the code
is racy:
Here is a patch implementing your suggestion. Tested at least so
far that all matmul test cases pass on my machine.
OK for trunk?
Regards
On Thu, Mar 02, 2017 at 11:45:59AM +0100, Thomas Koenig wrote:
> Here's the updated version, which just uses FMA for AVX2.
>
> OK for trunk?
>
> Regards
>
> Thomas
>
> 2017-03-01 Thomas Koenig
>
> PR fortran/78379
> * m4/matmul.m4: (matmul_'rtype_code`_avx2): Also gene
Here's the updated version, which just uses FMA for AVX2.
OK for trunk?
Regards
Thomas
2017-03-01 Thomas Koenig
PR fortran/78379
* m4/matmul.m4: (matmul_'rtype_code`_avx2): Also generate for
reals. Add fma to target options.
(matmul_'rtype_code`):
On Thu, Mar 02, 2017 at 10:03:28AM +0100, Thomas Koenig wrote:
> On 02.03.2017 at 09:43, Jakub Jelinek wrote:
> > On Wed, Mar 01, 2017 at 10:00:08PM +0100, Thomas Koenig wrote:
> > > @@ -101,7 +93,7 @@
> > > `static void
> > > 'matmul_name` ('rtype` * const restrict retarray,
> > > 'rtype` * c
On 02.03.2017 at 09:43, Jakub Jelinek wrote:
On Wed, Mar 01, 2017 at 10:00:08PM +0100, Thomas Koenig wrote:
@@ -101,7 +93,7 @@
`static void
'matmul_name` ('rtype` * const restrict retarray,
'rtype` * const restrict a, 'rtype` * const restrict b, int try_blas,
- int blas_limit, b
On Wed, Mar 01, 2017 at 10:00:08PM +0100, Thomas Koenig wrote:
> @@ -101,7 +93,7 @@
> `static void
> 'matmul_name` ('rtype` * const restrict retarray,
> 'rtype` * const restrict a, 'rtype` * const restrict b, int try_blas,
> - int blas_limit, blas_call gemm) __attribute__((__target__("
On Thu, Mar 02, 2017 at 10:09:31AM +0200, Janne Blomqvist wrote:
> > Here's something from the new matmul_r8_avx2:
> >
> > 156c: c4 62 e5 b8 fd vfmadd231pd %ymm5,%ymm3,%ymm15
> > 1571: c4 c1 79 10 04 06 vmovupd (%r14,%rax,1),%xmm0
> > 1577: c4 62 dd b8 d
On Thu, Mar 2, 2017 at 9:09 AM, Janne Blomqvist
wrote:
> On Thu, Mar 2, 2017 at 9:50 AM, Thomas Koenig wrote:
>> On 02.03.2017 at 08:32, Janne Blomqvist wrote:
>>>
>>> On Wed, Mar 1, 2017 at 11:00 PM, Thomas Koenig
>>> wrote:
Hello world,
the attached patch enables FMA for t
On Thu, Mar 2, 2017 at 9:50 AM, Thomas Koenig wrote:
> On 02.03.2017 at 08:32, Janne Blomqvist wrote:
>>
>> On Wed, Mar 1, 2017 at 11:00 PM, Thomas Koenig
>> wrote:
>>>
>>> Hello world,
>>>
>>> the attached patch enables FMA for the AVX2 and AVX512F variants of
>>> matmul. This should bring a v
On 02.03.2017 at 08:32, Janne Blomqvist wrote:
On Wed, Mar 1, 2017 at 11:00 PM, Thomas Koenig wrote:
Hello world,
the attached patch enables FMA for the AVX2 and AVX512F variants of
matmul. This should bring a very nice speedup (although I have
been unable to run benchmarks due to lack of a
On Wed, Mar 1, 2017 at 11:00 PM, Thomas Koenig wrote:
> Hello world,
>
> the attached patch enables FMA for the AVX2 and AVX512F variants of
> matmul. This should bring a very nice speedup (although I have
> been unable to run benchmarks due to lack of a suitable machine).
In lieu of benchmarks,
Hi Jerry,
I would prefer that it was tested on the actual expected platform. Does
anyone anywhere on this list have access to one of these machines to test?
If anybody wants to test who does not have --enable-maintainer-mode
activated, here is a patch that works "out of the box".
Regards
On 03/01/2017 01:00 PM, Thomas Koenig wrote:
Hello world,
the attached patch enables FMA for the AVX2 and AVX512F variants of
matmul. This should bring a very nice speedup (although I have
been unable to run benchmarks due to lack of a suitable machine).
Question: Is this still appropriate for
Hello world,
the attached patch enables FMA for the AVX2 and AVX512F variants of
matmul. This should bring a very nice speedup (although I have
been unable to run benchmarks due to lack of a suitable machine).
Question: Is this still appropriate for the current state of trunk?
Or rather, OK for
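
As a rough illustration of what "add fma to target options" means in practice, the sketch below compiles one copy of a loop with the target attribute set to "avx2,fma" and picks it at run time only if the CPU reports both features. It is not the matmul code: the dot-product kernel and the names dot_avx2_fma, dot_generic and dot are made up for this example, and it assumes GCC or Clang (the x86-specific part is guarded so the file still builds elsewhere).

```c
#include <stddef.h>

/* Generic version, built with whatever flags the file is compiled with. */
static double dot_generic(const double *x, const double *y, size_t n)
{
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        s += x[i] * y[i];
    return s;
}

#if defined(__x86_64__) || defined(__i386__)
/* Same loop, but the compiler may use AVX2 and FMA instructions in
   this one function, independent of the global -m options. */
static double dot_avx2_fma(const double *x, const double *y, size_t n)
    __attribute__((__target__("avx2,fma")));

static double dot_avx2_fma(const double *x, const double *y, size_t n)
{
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        s += x[i] * y[i];
    return s;
}
#endif

/* Hypothetical dispatcher: use the FMA build only when the CPU has it. */
double dot(const double *x, const double *y, size_t n)
{
#if defined(__x86_64__) || defined(__i386__)
    if (__builtin_cpu_supports("avx2") && __builtin_cpu_supports("fma"))
        return dot_avx2_fma(x, y, n);
#endif
    return dot_generic(x, y, n);
}
```

Whether the compiler actually emits vfmadd instructions for the attributed function depends on the optimization level and on the floating-point contraction settings, so the disassembly is worth checking, as was done for matmul_r8_avx2 in this thread.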