PING. Slightly updated patch attached, which further improves the generic size fallback that is used when the element size is not 2/4/8 bytes. After changing the us_perf benchmark to use real(10), the performance with the v2 patch is:
Unformatted sequential write/read performance test
Record size  Write MB/s           Read MB/s
==========================================================
          4  59.028550429522085   86.019754350948787
          8  79.028327063130590   95.803502000733374
         16  99.980457395413296   138.68367462874946
         32  122.56886206338788   180.05609910155042
         64  152.00478266944486   212.69931319407567
        128  197.74137934940202   235.19728791956828
        256  155.36245780017779   244.60578379215929
        512  157.13385845966246   245.07467397691480
       1024  177.26553799130201   260.44908357795623
       2048  208.22852888945587   260.21587143113527
       4096  222.88410474980634   262.66162209490591
       8192  226.71167580652920   265.81191407123663
      16384  206.51818241747065   263.59395165591724
      32768  230.18707026455866   265.88990325026526
      65536  229.19783089391504   268.04485112932684
     131072  231.12215662044449   267.40543904427710
     262144  230.72012123598142   267.60086931504122
     524288  230.48959460456055   268.78750211303725

With the new v3 patch I get:
Unformatted sequential write/read performance test
Record size  Write MB/s           Read MB/s
==========================================================
          4  59.779061121239941   92.777125264010024
          8  92.727504266051341   126.64775563782673
         16  128.94793911163904   184.69194300482837
         32  169.78916283536847   267.06752001266767
         64  209.50296476919556   341.60515130910238
        128  236.36709738360679   416.73212655882151
        256  251.79029695383340   465.46804746749740
        512  259.62269939828633   500.87346060356265
       1024  265.08842337586458   508.95530627428275
       2048  268.71795530051884   532.12211365683640
       4096  280.86546884821030   546.88907054369884
       8192  286.96049684823578   569.60958187426183
      16384  292.04368984868103   608.11503416324865
      32768  292.96677387959392   629.80651297065833
      65536  291.69098580137114   624.27103478079641
     131072  292.75666234956418   605.99766136491496
     262144  291.35520038228975   611.59061455535834
     524288  292.15446100501691   623.76232623081580
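
For anyone reading without the attachment, the split between the 2/4/8 byte fast paths and the generic size fallback can be sketched roughly as follows. This is only an illustration, not the code in the patch: the name bswap_copy and its signature are made up here, and the fallback shown is the plain byte-by-byte loop, which may well differ from what the patch does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (name and signature are assumptions, not taken
   from the patch): copy NELEMS elements of SIZE bytes each from SRC to
   DEST, reversing the byte order of every element.  SRC and DEST must
   not overlap.  */
void
bswap_copy (void *dest, const void *src, size_t size, size_t nelems)
{
  unsigned char *d = dest;
  const unsigned char *s = src;
  size_t i, j;

  assert (size > 0);

  switch (size)
    {
    case 2:
      for (i = 0; i < nelems; i++, d += 2, s += 2)
        {
          uint16_t v;
          memcpy (&v, s, sizeof (v));
          v = (uint16_t) ((v >> 8) | (v << 8));
          memcpy (d, &v, sizeof (v));
        }
      break;

    case 4:
      for (i = 0; i < nelems; i++, d += 4, s += 4)
        {
          uint32_t v;
          memcpy (&v, s, sizeof (v));
          v = __builtin_bswap32 (v);
          memcpy (d, &v, sizeof (v));
        }
      break;

    case 8:
      for (i = 0; i < nelems; i++, d += 8, s += 8)
        {
          uint64_t v;
          memcpy (&v, s, sizeof (v));
          v = __builtin_bswap64 (v);
          memcpy (d, &v, sizeof (v));
        }
      break;

    default:
      /* Generic fallback for every other element size: reverse the
         bytes of each element one at a time.  */
      for (i = 0; i < nelems; i++, d += size, s += size)
        for (j = 0; j < size; j++)
          d[j] = s[size - 1 - j];
      break;
    }
}
```

With this sketch, any element size other than 2, 4 or 8 bytes takes the default branch, which is the path the real(10) numbers above exercise in the real code.
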
On Sat, Jan 5, 2013 at 11:13 PM, Janne Blomqvist
<[email protected]> wrote:
> On Sat, Jan 5, 2013 at 5:35 PM, Richard Biener
> <[email protected]> wrote:
>> On Fri, Jan 4, 2013 at 11:35 PM, Andreas Schwab <[email protected]>
>> wrote:
>>> Janne Blomqvist <[email protected]> writes:
>>>
>>>> diff --git a/libgfortran/io/file_pos.c b/libgfortran/io/file_pos.c
>>>> index c8ecc3a..bf2250a 100644
>>>> --- a/libgfortran/io/file_pos.c
>>>> +++ b/libgfortran/io/file_pos.c
>>>> @@ -140,15 +140,21 @@ unformatted_backspace (st_parameter_filepos *fpp, gfc_unit *u)
>>>> }
>>>> else
>>>> {
>>>> + uint32_t u32;
>>>> + uint64_t u64;
>>>> switch (length)
>>>> {
>>>> case sizeof(GFC_INTEGER_4):
>>>> - reverse_memcpy (&m4, p, sizeof (m4));
>>>> + memcpy (&u32, p, sizeof (u32));
>>>> + u32 = __builtin_bswap32 (u32);
>>>> + m4 = *(GFC_INTEGER_4*)&u32;
>>>
>>> Isn't that an aliasing violation?
>>
>> It looks like one. Why not simply do
>>
>> m4 = (GFC_INTEGER_4) u32;
>>
>> ? I suppose GFC_INTEGER_4 is always the same size as uint32_t but signed?
>
> Yes, GFC_INTEGER_4 is a typedef for int32_t. As for why I didn't do
> the above, C99 6.3.1.3(3) says that if the unsigned value is outside
> the range of the signed variable, the result is
> implementation-defined. Though I suppose the sensible
> "implementation-defined behavior" in this case on a two's complement
> target is to just do a bitwise copy.
>
> Anyway, to be really safe one could use memcpy instead; the compiler
> optimizes small fixed size memcpy's just fine. Updated patch attached.
>
>
> --
> Janne Blomqvist
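
To make the memcpy variant discussed above concrete, here is a minimal standalone sketch. The function name read_swapped_marker4 is hypothetical and int32_t stands in for GFC_INTEGER_4; the real code lives inline in unformatted_backspace and may look different. The point is that both copies go through memcpy, so there is neither a pointer cast that could violate aliasing rules nor an unsigned-to-signed conversion with an implementation-defined result.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper, not from the patch: read a 32-bit record marker
   stored at P in the opposite byte order from the host and return it
   as a signed value.  */
int32_t
read_swapped_marker4 (const void *p)
{
  uint32_t u32;
  int32_t m4;

  assert (p != NULL);

  /* Fetch the raw bytes without assuming P is suitably aligned.  */
  memcpy (&u32, p, sizeof (u32));
  u32 = __builtin_bswap32 (u32);

  /* Bitwise copy into the signed variable; unlike *(int32_t *) &u32
     this does not access u32 through a pointer of a different type.  */
  memcpy (&m4, &u32, sizeof (m4));
  return m4;
}
```

Compilers such as GCC typically turn small fixed-size memcpy calls like these into plain register moves, so the safer form should not cost anything at run time.
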
--
Janne Blomqvist
Attachment: bswap3.diff
