[issue4681] mmap offset should be off_t instead of ssize_t

2008-12-17 Thread saa

New submission from saa :

In trying to use the mmap offset to allow me to work with a 12GB file, I 
encountered the fact that the new offset parameter was a ssize_t, which 
is only 32 bits on my platform (OS X).  The mmap offset in C is defined 
to be an off_t 
(http://opengroup.org/onlinepubs/007908775/xsh/mmap.html), which is 64 
bits on my platform.  The attached patch attempts to fix the non-Windows 
code to have an off_t offset parameter and allows my code to access all 
12 GB of my file.  As a warning, this is the first time I've even looked 
at the Python source, and wouldn't even consider myself a Python coder; 
I just use Python every few months.  Review the patch carefully.

--
components: Extension Modules
files: mmap_off_t.patch
keywords: patch
messages: 77957
nosy: saa
severity: normal
status: open
title: mmap offset should be off_t instead of ssize_t
type: behavior
versions: Python 2.6
Added file: http://bugs.python.org/file12375/mmap_off_t.patch

___
Python tracker 
<http://bugs.python.org/issue4681>
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue4681] mmap offset should be off_t instead of ssize_t

2008-12-17 Thread saa

saa  added the comment:

Sorry, I should have explained that the bulk of my patch was derived from 
existing code.  The code that references fpos_t is basically a cut and 
paste of code from Modules/bz2module.c.  I don't know if this is a valid 
usage of fpos_t, but if it isn't, you should probably file a bug on the 
existing usage in bz2module and suggest a fix.  Thanks for reviewing!

___
Python tracker 
<http://bugs.python.org/issue4681>
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue4681] mmap offset should be off_t instead of ssize_t

2008-12-17 Thread saa

saa  added the comment:

Thanks again, revised patch attached.

Added file: http://bugs.python.org/file12376/mmap_off_t_2.patch

___
Python tracker 
<http://bugs.python.org/issue4681>
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue4681] mmap offset should be off_t instead of ssize_t

2008-12-17 Thread saa

saa  added the comment:

Ah, I think I'm beginning to understand.  Thanks for the explanation.  So, 
if I understand correctly, I should refactor the patch and just remove all 
of the Py_off_t and just use off_t because the standard says that *is* the 
type.  Correct?

___
Python tracker 
<http://bugs.python.org/issue4681>
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue4681] mmap offset should be off_t instead of ssize_t, and size calculation needs corrected

2008-12-19 Thread saa

saa  added the comment:

There are bigger problems here.  When the code calculates the size to map 
if size is passed as zero, it simply assigns the 64-bit st_size to the 
size to mmap.  On a 32-bit system, it is obvious that the amount to map 
must be less than a 32-bit value.  The calculation should take into 
account at least the following two things: 1) it should be the size of the 
file minus the offset, and 2) it shouldn't ask for an unreasonable amount 
of memory.  I will experiment with this and submit a new patch (and delete 
the two I've already uploaded) within a couple of days, or at a minimum, 
provide an update to this bug.

--
title: mmap offset should be off_t instead of ssize_t -> mmap offset should be 
off_t instead of ssize_t, and size calculation needs corrected

___
Python tracker 
<http://bugs.python.org/issue4681>
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue4681] mmap offset should be off_t instead of ssize_t, and size calculation needs corrected

2008-12-19 Thread saa

saa  added the comment:

Notes on the problem.

The Python mmap documentation says: If length is 0, the maximum length 
of the map will be the current size of the file when mmap is called.

Currently, on a system where off_t is 64-bit and size_t is 32-bit, if 
the size of the file (st_size) is > 0x', the code will in effect 
try to map (st_size & 0x').  This is suboptimal (imagine if the 
size of the file is 0x1''1000 or even 0x1'').

In addition, it seems weird that a caller would have to check the size 
of the returned mmap against the size of the file to see if the entire 
file was mapped or not.

Finally, it appears that there isn't a reliable, architecture 
independent, quick, and easy way to figure out what the maximum mmap 
size is.

With these points in mind, I am going to work on coming up with a patch 
that will change the behavior to simply return an error when a zero 
length is passed to mmap and the entire file can not be mapped.  This 
will put the onus on the caller to determine an appropriate mmap size 
for "chunking" the processing of the file, but the behavior will become 
more consistent.

Of course, comments and suggestions are welcome.  However, I'm going to 
have to wrap up my immediate involvement on this in the next couple of 
days (Yay, vacation!).

___
Python tracker 
<http://bugs.python.org/issue4681>
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue4681] mmap offset should be off_t instead of ssize_t, and size calculation needs corrected

2008-12-19 Thread saa

saa  added the comment:

Ok, put a fork in me, 'cuz I'm done with this.  The latest is that 
mmap.size() is defined to return the size of the file and not the size 
of the data.  It tries to return it as a ssize_t, which of course, on 
systems where off_t is 64-bits and ssize_t is 32-bits, won't work for 
sizes that won't fit in 32-bits.

Without the size of the data, it is unclear how one would traverse a 
file using the offset parameter.  As part of fixing mmap, I would 
suggest someone should write an example of how it should be used in 
these cases and use that as a test case.

I'm going to leave this up to someone with more knowledge of how this 
stuff *should* work in Python, but I'm going to go redo my program in C.
Thanks for listening and the future efforts.

___
Python tracker 
<http://bugs.python.org/issue4681>
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com