Re: Bash scripting and large files: input with the read builtin from a redirection gives unexpected result with files larger than 2GB.

2012-03-04 Thread Chet Ramey
On 3/2/12 6:47 AM, Jean-François Gagné wrote:

> uname output: Linux  2.6.32-5-amd64 #1 SMP Tue Jun 14 09:42:28 UTC 
> 2011 x86_64 GNU/Linux
> Machine Type: x86_64-pc-linux-gnu
> 
> Bash Version: 4.1
> Patch Level: 5
> Release Status: release
> 
> Description:
> When reading data with the 'read' builtin from a redirection, read has 
> unexpected behavior after reading 2G of data.  
> 
> Repeat-By:
> 
> 
> yes "0123456789abcdefghijklmnopqrs" | head -n 1 > file
> while read line; do file=${line:0:10}; echo $file; done < file | uniq -c
> 
> 
> results in
> 
> 
> 71582790 0123456789
>   1 mnopqrs
>   3 0123456789
>   1 mnopqrs
>   3 0123456789
>   1 mnopqrs
>   3 0123456789
>   1 mnopqrs
>   3 0123456789
> ...
> 
> So the problem happens after reading 71.582.790 x30 = 2.147.483.700 bytes of 
> data, just a little over 2^31.

Compile and run the attached program.  If it prints out `4', which it does
on all of the Debian systems I've tried, file offsets are limited to 32
bits, and accessing files greater than 2 GB is going to be unreliable.

Chet
-- 
``The lyf so short, the craft so long to lerne.'' - Chaucer
 ``Ars longa, vita brevis'' - Hippocrates
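Chet Ramey, ITS, CWRU    c...@case.edu    http://cnswww.cns.cwru.edu/~chet/
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

/* Print the size of off_t in bytes: 4 means 32-bit file offsets, 8 means 64-bit. */
int
main(int c, char **v)
{
  printf("%d\n", (int)sizeof(off_t));
  exit(0);
}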


Re: Bash scripting and large files: input with the read builtin from a redirection gives unexpected result with files larger than 2GB.

2012-03-04 Thread Jonathan Nieder
Hi Chet,

Chet Ramey wrote:

> Compile and run the attached program.  If it prints out `4', which it does
> on all of the Debian systems I've tried, file offsets are limited to 32
> bits, and accessing files greater than 2 GB is going to be unreliable.

off_t is typedef'd to off64_t if you compile with -D_FILE_OFFSET_BITS=64.
The AC_SYS_LARGEFILE autoconf macro is supposed to cause bash to be
built with -D_FILE_OFFSET_BITS=64 (and seems to work that way here,
based on the result of "eu-readelf -s /bin/bash | grep open").

Jean-François is using a 64-bit architecture, where _FILE_OFFSET_BITS=64
is the default, anyway.

So I fear the problem is somewhere else.  (Maybe an "int" is used as an
offset somewhere.)

Hope that helps,
Jonathan



Re: Bash scripting and large files: input with the read builtin from a redirection gives unexpected result with files larger than 2GB.

2012-03-04 Thread Bob Proulx
Chet Ramey wrote:
> Jean-François Gagné wrote:
> > uname output: Linux  2.6.32-5-amd64 #1 SMP Tue Jun 14 09:42:28 UTC 
> > 2011 x86_64 GNU/Linux
> > Machine Type: x86_64-pc-linux-gnu
>
> Compile and run the attached program.  If it prints out `4', which it does
> on all of the Debian systems I've tried, file offsets are limited to 32
> bits, and accessing files greater than 2 GB is going to be unreliable.

Apparently all of the Debian systems you have tried are 32-bit
systems.  On the reporter's 64-bit amd64 system it will print out 8.

Additionally, the bash configure script includes the AC_SYS_LARGEFILE
macro, which tests whether the system can use large files.  If it can,
the macro defines _FILE_OFFSET_BITS=64, and in that case the size of
off_t will be 8 bytes too.  If you compile the test program with
-D_FILE_OFFSET_BITS=64 the result will also be 8 even on 32-bit
systems.

By default a 32-bit bash will be large file aware on all systems
that support large files, because it will have been compiled with
_FILE_OFFSET_BITS=64.  I just looked at the config.log from a build of
bash and it included these lines.

  configure:4710: checking for special C compiler options needed for large files
  configure:4805: result: no
  configure:4811: checking for _FILE_OFFSET_BITS value needed for large files
  ...
  configure:4922: result: 64
  ...
  ac_cv_sys_file_offset_bits=64

Bob



Re: Bash scripting and large files: input with the read builtin from a redirection gives unexpected result with files larger than 2GB.

2012-03-04 Thread Andreas Schwab
Bob Proulx  writes:

> Chet Ramey wrote:
>> Jean-François Gagné wrote:
>> > uname output: Linux  2.6.32-5-amd64 #1 SMP Tue Jun 14 09:42:28 UTC 
>> > 2011 x86_64 GNU/Linux
>> > Machine Type: x86_64-pc-linux-gnu
>>
>> Compile and run the attached program.  If it prints out `4', which it does
>> on all of the Debian systems I've tried, file offsets are limited to 32
>> bits, and accessing files greater than 2 GB is going to be unreliable.
>
> Apparently all of the Debian systems you have tried are 32-bit
> systems.  On the reporter's 64-bit amd64 system it will print out 8.

But it won't help if you don't use it.

diff --git a/lib/sh/zread.c b/lib/sh/zread.c
index 0fd1199..3731a41 100644
--- a/lib/sh/zread.c
+++ b/lib/sh/zread.c
@@ -161,7 +161,7 @@ zsyncfd (fd)
  int fd;
 {
   off_t off;
-  int r;
+  off_t r;
 
   off = lused - lind;
   r = 0;

Andreas.

-- 
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."



Re: Bash scripting and large files: input with the read builtin from a redirection gives unexpected result with files larger than 2GB.

2012-03-04 Thread Chet Ramey
On 3/4/12 3:51 PM, Andreas Schwab wrote:
> Bob Proulx  writes:
> 
>> Chet Ramey wrote:
>>> Jean-François Gagné wrote:
>>>> uname output: Linux  2.6.32-5-amd64 #1 SMP Tue Jun 14 09:42:28 UTC 
>>>> 2011 x86_64 GNU/Linux
>>>> Machine Type: x86_64-pc-linux-gnu
>>>
>>> Compile and run the attached program.  If it prints out `4', which it does
>>> on all of the Debian systems I've tried, file offsets are limited to 32
>>> bits, and accessing files greater than 2 GB is going to be unreliable.
>>
>> Apparently all of the Debian systems you have tried are 32-bit
>> systems.  On the reporter's 64-bit amd64 system it will print out 8.
> 
> But it won't help if you don't use it.
> 
> diff --git a/lib/sh/zread.c b/lib/sh/zread.c
> index 0fd1199..3731a41 100644
> --- a/lib/sh/zread.c
> +++ b/lib/sh/zread.c
> @@ -161,7 +161,7 @@ zsyncfd (fd)
>   int fd;
>  {
>    off_t off;
> -  int r;
> +  off_t r;
>  
>    off = lused - lind;
>    r = 0;

That's true, and I made this change some months ago.  The question is
whether or not it makes a real difference, since the only use of that
variable is to check whether the return value from lseek is -1.  I
suppose under the right set of circumstances it could.

Chet
-- 
``The lyf so short, the craft so long to lerne.'' - Chaucer
 ``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, ITS, CWRU    c...@case.edu    http://cnswww.cns.cwru.edu/~chet/



Re: Bash scripting and large files: input with the read builtin from a redirection gives unexpected result with files larger than 2GB.

2012-03-04 Thread Chet Ramey
On 3/4/12 4:32 PM, Chet Ramey wrote:

> That's true, and I made this change some months ago.  The question is
> whether or not it makes a real difference, since the only use of that
> variable is to check whether the return value from lseek is -1.  I
> suppose under the right set of circumstances it could.

Sorry, I forgot to attach the patch.

Chet


-- 
``The lyf so short, the craft so long to lerne.'' - Chaucer
 ``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, ITS, CWRU    c...@case.edu    http://cnswww.cns.cwru.edu/~chet/
*** ../bash-4.2-patched/lib/sh/zread.c	Mon Mar  2 08:54:45 2009
--- lib/sh/zread.c	Thu Jul 28 18:16:53 2011
***************
*** 161,166 ****
   int fd;
  {
!   off_t off;
!   int r;
  
    off = lused - lind;
--- 161,165 ----
   int fd;
  {
!   off_t off, r;
  
    off = lused - lind;
***************
*** 169,173 ****
      r = lseek (fd, -off, SEEK_CUR);
  
!   if (r >= 0)
      lused = lind = 0;
  }
--- 168,172 ----
      r = lseek (fd, -off, SEEK_CUR);
  
!   if (r != -1)
      lused = lind = 0;
  }


Re: Bash scripting and large files: input with the read builtin from a redirection gives unexpected result with files larger than 2GB.

2012-03-04 Thread Andreas Schwab
Chet Ramey  writes:

> That's true, and I made this change some months ago.  The question is
> whether or not it makes a real difference,

Of course it does, int != off_t.

Andreas.

-- 
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."



Re: Bash scripting and large files: input with the read builtin from a redirection gives unexpected result with files larger than 2GB.

2012-03-04 Thread Chet Ramey
On 3/2/12 6:47 AM, Jean-François Gagné wrote:

> Description:
> When reading data with the 'read' builtin from a redirection, read has 
> unexpected behavior after reading 2G of data.  
> 
> Repeat-By:
> 
> 
> yes "0123456789abcdefghijklmnopqrs" | head -n 1 > file
> while read line; do file=${line:0:10}; echo $file; done < file | uniq -c
> 
> 
> results in
> 
> 
> 71582790 0123456789
>   1 mnopqrs
>   3 0123456789
>   1 mnopqrs
>   3 0123456789
>   1 mnopqrs
>   3 0123456789
>   1 mnopqrs
>   3 0123456789
> ...
> 
> So the problem happens after reading 71.582.790 x30 = 2.147.483.700 bytes of 
> data, just a little over 2^31.
> 
> but  the following:
> 
> cat file | while read line; do file=${line:0:10}; echo $file; done | uniq -c
> 
> works fine:
> 
> 1 0123456789

This works fine with the patch I posted.

Chet
-- 
``The lyf so short, the craft so long to lerne.'' - Chaucer
 ``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, ITS, CWRU    c...@case.edu    http://cnswww.cns.cwru.edu/~chet/