PING. Slightly updated patch attached (v3), which further improves the generic fallback used when the element size is not 2, 4 or 8 bytes. With the us_perf benchmark changed to use real(10), the v2 patch gives the following performance:
Unformatted sequential write/read performance test
Record size  Write MB/s            Read MB/s
==========================================================
4            59.028550429522085    86.019754350948787
8            79.028327063130590    95.803502000733374
16           99.980457395413296    138.68367462874946
32           122.56886206338788    180.05609910155042
64           152.00478266944486    212.69931319407567
128          197.74137934940202    235.19728791956828
256          155.36245780017779    244.60578379215929
512          157.13385845966246    245.07467397691480
1024         177.26553799130201    260.44908357795623
2048         208.22852888945587    260.21587143113527
4096         222.88410474980634    262.66162209490591
8192         226.71167580652920    265.81191407123663
16384        206.51818241747065    263.59395165591724
32768        230.18707026455866    265.88990325026526
65536        229.19783089391504    268.04485112932684
131072       231.12215662044449    267.40543904427710
262144       230.72012123598142    267.60086931504122
524288       230.48959460456055    268.78750211303725

With the new v3 patch I get:

Unformatted sequential write/read performance test
Record size  Write MB/s            Read MB/s
==========================================================
4            59.779061121239941    92.777125264010024
8            92.727504266051341    126.64775563782673
16           128.94793911163904    184.69194300482837
32           169.78916283536847    267.06752001266767
64           209.50296476919556    341.60515130910238
128          236.36709738360679    416.73212655882151
256          251.79029695383340    465.46804746749740
512          259.62269939828633    500.87346060356265
1024         265.08842337586458    508.95530627428275
2048         268.71795530051884    532.12211365683640
4096         280.86546884821030    546.88907054369884
8192         286.96049684823578    569.60958187426183
16384        292.04368984868103    608.11503416324865
32768        292.96677387959392    629.80651297065833
65536        291.69098580137114    624.27103478079641
131072       292.75666234956418    605.99766136491496
262144       291.35520038228975    611.59061455535834
524288       292.15446100501691    623.76232623081580

On Sat, Jan 5, 2013 at 11:13 PM, Janne Blomqvist <blomqvist.ja...@gmail.com> wrote:
> On Sat, Jan 5, 2013 at 5:35 PM, Richard Biener
> <richard.guent...@gmail.com> wrote:
>> On Fri, Jan 4, 2013 at 11:35 PM, Andreas Schwab <sch...@linux-m68k.org>
>> wrote:
>>> Janne Blomqvist <blomqvist.ja...@gmail.com> writes:
>>>
>>>> diff --git a/libgfortran/io/file_pos.c b/libgfortran/io/file_pos.c
>>>> index c8ecc3a..bf2250a 100644
>>>> --- a/libgfortran/io/file_pos.c
>>>> +++ b/libgfortran/io/file_pos.c
>>>> @@ -140,15 +140,21 @@ unformatted_backspace (st_parameter_filepos *fpp, gfc_unit *u)
>>>>          }
>>>>        else
>>>>          {
>>>> +          uint32_t u32;
>>>> +          uint64_t u64;
>>>>            switch (length)
>>>>              {
>>>>              case sizeof(GFC_INTEGER_4):
>>>> -              reverse_memcpy (&m4, p, sizeof (m4));
>>>> +              memcpy (&u32, p, sizeof (u32));
>>>> +              u32 = __builtin_bswap32 (u32);
>>>> +              m4 = *(GFC_INTEGER_4*)&u32;
>>>
>>> Isn't that an aliasing violation?
>>
>> It looks like one.  Why not simply do
>>
>>   m4 = (GFC_INTEGER_4) u32;
>>
>> ?  I suppose GFC_INTEGER_4 is always the same size as uint32_t but signed?
>
> Yes, GFC_INTEGER_4 is a typedef for int32_t. As for why I didn't do
> the above, C99 6.3.1.3(3) says that if the unsigned value is outside
> the range of the signed variable, the result is
> implementation-defined. Though I suppose the sensible
> "implementation-defined behavior" in this case on a two's complement
> target is to just do a bitwise copy.
>
> Anyway, to be really safe one could use memcpy instead; the compiler
> optimizes small fixed size memcpy's just fine. Updated patch attached.
>
>
> --
> Janne Blomqvist

--
Janne Blomqvist
Attachment: bswap3.diff