Re: Memory leak in function with local array

2021-05-03 Thread Chet Ramey

On 5/2/21 8:19 AM, thomas.gren...@aalto.fi wrote:


Bash Version: 5.1
Patch Level: 4
Release Status: release

Description:
 There seems to be a memory leak in bash when a local array is declared 
within a function.
 I also tested this on "GNU bash, version 5.0.17(1)-release 
(x86_64-redhat-linux-gnu)" and that one works without any leaks.


Repeat-By:
 #!/bin/bash
 leak() { local arr=("leak"); }
 while :; do leak; done


Thanks for the report. This was fixed back in January as the result of
https://lists.gnu.org/archive/html/bug-bash/2021-01/msg00082.html.

--
``The lyf so short, the craft so long to lerne.'' - Chaucer
 ``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, UTech, CWRUc...@case.eduhttp://tiswww.cwru.edu/~chet/



Re: Why does "mapfile -d delim" (delim != '\n') use unbuffered read?

2021-05-03 Thread Chet Ramey

On 5/2/21 9:51 AM, Koichi Murase wrote:

Maybe I'm asking a stupid question, but, as in the subject, why does
the builtin "mapfile -d delim" use unbuffered read when delim != '\n'?


It's the shell being careful in the general case. You need to guarantee
behavior in all of the cases where read(2) will not return until it sees
a newline.


  Can we use buffered read for seekable file descriptors the same as
the `delim == '\n'' case?


You should be able to, yes.



This treatment of `mapfile' for "delim != '\n'" exists since the
mapfile delimiter is first introduced by commit 25a0eacfe "commit
bash-20140625 snapshot". Would it be a problem to change to the
buffered read also for non-LF delimiters? If we could remove the above
two lines (i.e., if (delim != '\n') unbuffered_read = 1;), I'd be very
happy...


Try it out and see.

Chet
--
``The lyf so short, the craft so long to lerne.'' - Chaucer
 ``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, UTech, CWRUc...@case.eduhttp://tiswww.cwru.edu/~chet/



Re: Why does "mapfile -d delim" (delim != '\n') use unbuffered read?

2021-05-03 Thread Chet Ramey

On 5/3/21 10:14 AM, Chet Ramey wrote:




This treatment of `mapfile' for "delim != '\n'" exists since the
mapfile delimiter is first introduced by commit 25a0eacfe "commit
bash-20140625 snapshot". Would it be a problem to change to the
buffered read also for non-LF delimiters? If we could remove the above
two lines (i.e., if (delim != '\n') unbuffered_read = 1;), I'd be very
happy...


Try it out and see.


I should have mentioned up front that this will not work in the general
case on at least macOS, the BSDs, and (I believe) AIX.

It won't work on any system that doesn't return -1/ESPIPE when you try
to lseek on a terminal device. Glibc does; the above systems don't;
POSIX makes the behavior undefined.

--
``The lyf so short, the craft so long to lerne.'' - Chaucer
 ``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, UTech, CWRUc...@case.eduhttp://tiswww.cwru.edu/~chet/



Re: Why does "mapfile -d delim" (delim != '\n') use unbuffered read?

2021-05-03 Thread Koichi Murase
2021年5月4日(火) 3:40 Chet Ramey :
> On 5/3/21 10:14 AM, Chet Ramey wrote:
> >> This treatment of `mapfile' for "delim != '\n'" exists since the
> >> mapfile delimiter is first introduced by commit 25a0eacfe "commit
> >> bash-20140625 snapshot". Would it be a problem to change to the
> >> buffered read also for non-LF delimiters? If we could remove the above
> >> two lines (i.e., if (delim != '\n') unbuffered_read = 1;), I'd be very
> >> happy...
> >
> > Try it out and see.

Actually, I had tried it with my Linux before I sent my previous mail,
and it seemed to work without problems. But I was not sure if it works
in every situation so asked the question.

> I should have mentioned up front that this will not work in the general
> case on at least macOS, the BSDs, and (I believe) AIX.
>
> It won't work on any system that doesn't return -1/ESPIPE when you try
> to lseek on a terminal device. Glibc does; the above systems don't;
> POSIX makes the behavior undefined.

Oh, I see. Thank you for the explanation. I was somehow assuming that
seekable file descriptors can be always correctly detected by the
previous lines (builtins/mapfile.def L188).  Can we check __GLIBC__ to
detect glibc lseek and safely skip the check on the delimiter?
Something like

#ifndef __GLIBC__
  /* Use the buffered read only when the delimiter is a newline because
 unseekable fd's may not be detected by non-glibc lseek(2). */
  if (delim != '\n')
unbuffered_read = 1;
#endif

Thank you,
Koichi



Re: Why does "mapfile -d delim" (delim != '\n') use unbuffered read?

2021-05-03 Thread Koichi Murase
I'm still thinking about whether it is possible to use buffered read
in more cases of mapfile.

(1) Suggestion: use buffered read for a regular file for delim != '\n'?

> It won't work on any system that doesn't return -1/ESPIPE when you try
> to lseek on a terminal device.

I found a related interesting discussion in a FreeBSD mailing list:
https://lists.freebsd.org/pipermail/freebsd-hackers/2011-November/036863.html
The BUGS section mentioned in this FreeBSD discussion can be found on
the FreeBSD man page:
https://www.freebsd.org/cgi/man.cgi?query=lseek&sektion=2

lseek(2) may succeed on non-seekable streams including some terminal
device (mentioned in Chet's reply) and a tape device (mentioned in the
above FreeBSD discussion).  According to the discussion, BSD tar has
changed to use lseek(2) for the seek capability test in FreeBSD only
when the file descriptor is a regular file or a block device.

The current devel branch (builtins/mapfile.def L187..L194) is
equivalent to the following code:

#ifndef __CYGWIN__
  if (delim == '\n')
unbuffered_read = (lseek (fd, 0L, SEEK_CUR) < 0) && (errno == ESPIPE);
  else
unbuffered_read = 1; /* #1 */
#else
  unbuffered_read = 1;
#endif

Here I suggest modifying the line #1 in the following way so as to
switch to the buffered read when the file descriptor is a regular file
and lseek(2) succeeds on it. (BSD tar seemed to treat block devices
the same as regular files, but I'm afraid that the above-mentioned
tape device might be implemented as a block device, so I just allowed
the buffered read only for regular files):

#ifndef __CYGWIN__
  if (delim == '\n')
unbuffered_read = (lseek (fd, 0L, SEEK_CUR) < 0) && (errno == ESPIPE);
  else
{
  struct stat st;
  unbuffered_read = 1;
  if (fstat (fd, &st) >= 0 &&
S_ISREG(st.st_mode) &&
lseek (fd, 0L, SEEK_CUR) >= 0)
unbuffered_read = 0;
}
#else
  unbuffered_read = 1;
#endif


(2) Suggestion: use unbuffered_read only when the options `-n/-C' are used?

It seems that the unbuffered read is only needed when `mapfile' is
supplied with the option `-n max_nlines' or the option `-C callback'.
In the other cases, `mapfile' anyway reads the stream till the end
without stopping, so it seems that we don't have to care about the
unprocessed data in the read buffer.  What do you think of turning on
unbuffered_read only when the option `-n' or the option `-C' is
specified?


(3) Suggestion: change the order of `run_callback' and `zsyncfd'?

I noticed that `zsyncfd' is called before `run_callback':

builtins/mapfile.def L215..L223
> /* Has a callback been registered and if so is it time to call it? */
> if (callback && line_count && (line_count % callback_quantum) == 0)
>   {
> run_callback (callback, array_index, line);
>
> /* Reset the buffer for bash own stream. */
> if (unbuffered_read == 0)
>   zsyncfd (fd);
>   }

I feel it is better to call `zsyncfd' before `run_callback'.  If the
`callback' reads data from the same file descriptor when there is
unprocessed data in the read buffer, something strange will happen.
`zsyncfd' will just shift the stream position backward by the same
number of bytes as the unprocessed data.  Since it is more natural
that `callback' finds the unprocessed data in the stream, I think
`zsyncfd' is better to be called before `run_callback'.


I attach a combined patch for (1)--(3).  This time, I didn't include
the special treatment on __GLIBC__, but maybe we can skip the check of
S_ISREG (st.st_mode) if __GLIBC__ is defined.

Remark: In the current devel branch, when delim == '\n' and lseek(2)
fails with errno != ESPIPE, `mapfile' still performs buffered read.  I
kept this behavior in the attached patch, but maybe this could also be
changed.  I doubt that we could still expect that `zsyncfd' will work
when lseek(2) fails (even though errno != ESPIPE).


0001-mapfile-buffered-read-in-more-cases.patch
Description: Binary data