I'm still thinking about whether it is possible to use buffered read in more cases of mapfile.
(1) Suggestion: use buffered read for a regular file for delim != '\n'? > It won't work on any system that doesn't return -1/ESPIPE when you try > to lseek on a terminal device. I found a related interesting discussion in a FreeBSD mailing list: https://lists.freebsd.org/pipermail/freebsd-hackers/2011-November/036863.html The BUGS section mentioned in this FreeBSD discussion can be found on the FreeBSD man page: https://www.freebsd.org/cgi/man.cgi?query=lseek&sektion=2 lseek(2) may succeed on non-seekable streams including some terminal device (mentioned in Chet's reply) and a tape device (mentioned in the above FreeBSD discussion). According to the discussion, BSD tar has changed to use lseek(2) for the seek capability test in FreeBSD only when the file descriptor is a regular file or a block device. The current devel branch (builtins/mapfile.def L187..L194) is equivalent to the following code: #ifndef __CYGWIN__ if (delim == '\n') unbuffered_read = (lseek (fd, 0L, SEEK_CUR) < 0) && (errno == ESPIPE); else unbuffered_read = 1; /* #1 */ #else unbuffered_read = 1; #endif Here I suggest modifying the line #1 in the following way so as to switch to the buffered read when the file descriptor is a regular file and lseek(2) succeeds on it. (BSD tar seemed to treat block devices the same as regular files, but I'm afraid that the above-mentioned tape device might be implemented as a block device, so I just allowed the buffered read only for regular files): #ifndef __CYGWIN__ if (delim == '\n') unbuffered_read = (lseek (fd, 0L, SEEK_CUR) < 0) && (errno == ESPIPE); else { struct stat st; unbuffered_read = 1; if (fstat (fd, &st) >= 0 && S_ISREG(st.st_mode) && lseek (fd, 0L, SEEK_CUR) >= 0) unbuffered_read = 0; } #else unbuffered_read = 1; #endif ---- (2) Suggestion: use unbuffered_read only when the options `-n/-C' are used? It seems that the unbuffered read is only needed when `mapfile' is supplied with the option `-n max_nlines' or the option `-C callback'. In the other cases, `mapfile' anyway reads the stream till the end without stopping, so it seems that we don't have to care about the unprocessed data in the read buffer. What do you think of turning on unbuffered_read only when the option `-n' or the option `-C' is specified? ---- (3) Suggestion: change the order of `run_callback' and `zsyncfd'? I noticed that `zsyncfd' is called before `run_callback': builtins/mapfile.def L215..L223 > /* Has a callback been registered and if so is it time to call it? */ > if (callback && line_count && (line_count % callback_quantum) == 0) > { > run_callback (callback, array_index, line); > > /* Reset the buffer for bash own stream. */ > if (unbuffered_read == 0) > zsyncfd (fd); > } I feel it is better to call `zsyncfd' before `run_callback'. If the `callback' reads data from the same file descriptor when there is unprocessed data in the read buffer, something strange will happen. `zsyncfd' will just shift the stream position backward by the same number of bytes as the unprocessed data. Since it is more natural that `callback' finds the unprocessed data in the stream, I think `zsyncfd' is better to be called before `run_callback'. ---- I attach a combined patch for (1)--(3). This time, I didn't include the special treatment on __GLIBC__, but maybe we can skip the check of S_ISREG (st.st_mode) if __GLIBC__ is defined. Remark: In the current devel branch, when delim == '\n' and lseek(2) fails with errno != ESPIPE, `mapfile' still performs buffered read. I kept this behavior in the attached patch, but maybe this could also be changed. I doubt that we could still expect that `zsyncfd' will work when lseek(2) fails (even though errno != ESPIPE).
0001-mapfile-buffered-read-in-more-cases.patch
Description: Binary data