http://gcc.gnu.org/bugzilla/show_bug.cgi?id=56981
--- Comment #3 from Tobias Burnus <burnus at gcc dot gnu.org> 2013-04-17 09:39:58 UTC ---
(In reply to comment #2)
> There is a seek inside next_record_w_unf. That function is used for DIRECT
> I/O.
> Looks conceptually wrong to me for sequential unformatted. I won't have time
> for a few days to look at this further.

Well, what gfortran does is:
* write a place-holder record length in the leading record marker
* write the actual data
* write the trailing record marker (1st call to write_us_marker in next_record_w_unf)
* write the actual length of this record, i.e. seek back + write_us_marker + seek to past the trailing record marker (all in next_record_w_unf)

I think what other compilers do is to make use of the following item in the Fortran standard: "The value of the RECL= specifier shall be positive. It specifies the length of each record in a file being connected for direct access, or specifies the maximum length of a record in a file being connected for sequential access." (F2008, "9.5.6.15 RECL= specifier in the OPEN statement")

I tried the following program:
-------------------------------
integer, allocatable :: array(:)
integer :: rl, i
open(99,file="/dev/null",form="unformatted")
inquire(99,recl=rl)
allocate(array(1024*1024*100))
array = 0
print *,rl, size(array)/4
write(99) (array, i=1,1000)
close(99)
end
-------------------------------

With gfortran, it takes only 0.203s and one has:
     19 mmap
     26 open
    392 lseek
   1784 write
The question is why there are that many seeks. There should be only a single record!

With pathf95, it fails after 0.099s with the error: "This request exceeds the maximum record size."

And with g95, it takes 4.946s (!) until it fails with "Writing more data than the record size (RECL)":
     11 close
     17 fstat
     20 mprotect
     21 stat
     25 mmap
     30 write
     47 open
where the mmap+munmap pairs seem to take the lion's share of the time.
However, one can do better: NAG f95 only needs 0.007s and does:
      5 read
      6 lseek
      8 mprotect
     10 fstat
     23 stat
     29 mmap
     40 open
   2003 write

Maybe something like the following would work:
* Create a reasonably sized buffer.
* Use it to buffer the writes, and if it fits, write the length, the buffer, the length.
* If the argument is a (too) big array, write the length of the data in the buffer plus the array byte size, then the data - and only if another item comes, seek to the beginning and update the length.

That should take care of:
  write(99) i, j, k
  write(99) i, j, k, small_array
  write(99) big_array
and even
  write(99) i, j, k, big_array
but it will not help for
  write(99) big_array1, big_array2
I think that covers the most important cases.

One question is how large the buffer should be initially, whether it should be resizable - and how long it should remain allocated. Even a small buffer of 1024 bytes (= 128 real(8) values) will help when writing small data like in the example of comment 0. If it is larger, the issue of freeing the data and/or resizing becomes more important - and one needs to be careful not to require a huge amount of memory and/or do very frequent memory allocation+freeing, which causes the problems with g95.
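
The buffering idea above can be modelled roughly as follows. This is only a sketch in Python to show the order of writes and seeks, not libgfortran code: the class RecordWriter, its put/end_record methods, the 4-byte little-endian marker and the capacity value are all assumptions made for illustration.

```python
import io
import struct

# Assumption: 4-byte little-endian record markers (a real runtime
# would use native byte order and a configurable marker size).
MARKER = struct.Struct("<i")


class RecordWriter:
    """Hypothetical model of one unformatted sequential record writer."""

    def __init__(self, stream, capacity=1024):
        self.stream = stream      # seekable binary stream
        self.capacity = capacity  # buffer size in bytes
        self.seeks = 0            # number of seeks issued so far
        self._reset()

    def _reset(self):
        self.buf = bytearray()
        self.start = None    # offset of the leading marker once it is written
        self.announced = 0   # length stored in the leading marker
        self.length = 0      # actual record length so far

    def put(self, item):
        """Add one I/O item (bytes) to the current record."""
        self.length += len(item)
        if self.start is None and len(self.buf) + len(item) <= self.capacity:
            self.buf += item          # still fits: just buffer it
            return
        if self.start is None:
            # First item that does not fit: announce buffered bytes plus
            # this item, then write the buffer. A real runtime would track
            # the offset itself instead of calling tell().
            self.start = self.stream.tell()
            self.announced = self.length
            self.stream.write(MARKER.pack(self.announced))
            self.stream.write(self.buf)
            self.buf.clear()
        self.stream.write(item)       # big item goes out unbuffered

    def end_record(self):
        """Finish the record; seek back only if the announced length is stale."""
        if self.start is None:
            # Everything fit: length, buffer, length. No seek needed.
            self.stream.write(MARKER.pack(self.length))
            self.stream.write(self.buf)
        elif self.announced != self.length:
            # Further items followed the big one: patch the leading marker.
            end = self.stream.tell()
            self.stream.seek(self.start)
            self.stream.write(MARKER.pack(self.length))
            self.stream.seek(end)
            self.seeks += 2
        self.stream.write(MARKER.pack(self.length))
        self._reset()
```

In this model, "i, j, k" and "i, j, k, big_array" finish without any seek, while "big_array1, big_array2" still needs the seek back to patch the leading marker, as noted above. Once the buffer has been flushed, later items are written directly; whether they should be re-buffered is left open here.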
* * *

Closer look at NAG: It does the following (allocate moved before open, inquire removed):

open("/dev/null", O_RDWR) = 3
mmap(NULL, 3856, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2aaaaab0e000
mmap(NULL, 1048576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2aaaaab22000
fstat(3, {st_mode=S_IFCHR|0666, st_rdev=makedev(1, 3), ...}) = 0
ioctl(3, SNDCTL_TMR_TIMEBASE or SNDRV_TIMER_IOCTL_NEXT_DEVICE or TCGETS, 0x7fff645b1c90) = -1 ENOTTY (Inappropriate ioctl for device)
fstat(3, {st_mode=S_IFCHR|0666, st_rdev=makedev(1, 3), ...}) = 0
ioctl(3, SNDCTL_TMR_TIMEBASE or SNDRV_TIMER_IOCTL_NEXT_DEVICE or TCGETS, 0x7fff645b2200) = -1 ENOTTY (Inappropriate ioctl for device)
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2aaaaac22000
lseek(3, 0, SEEK_CUR) = 0
write(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096) = 4096
write(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 419426304) = 419426304
(and 999 further write lines)
lseek(3, 0, SEEK_SET) = 0
read(3, "", 4096) = 0
lseek(3, 12, SEEK_CUR) = 0
write(3, "\0\0\0\250", 4) = 4
lseek(3, 18446744072233156608, SEEK_SET) = 0
read(3, "", 4096) = 0
lseek(3, 20, SEEK_CUR) = 0
lseek(3, 0, SEEK_CUR) = 0
ftruncate(3, 0) = -1 EINVAL (Invalid argument)
close(3) = 0
munmap(0x2aaaaac22000, 4096) = 0
munmap(0x2aaaab7a3000, 419430400) = 0