[issue4681] mmap offset should be off_t instead of ssize_t
New submission from saa : In trying to use the mmap offset to allow me to work with a 12GB file, I encountered the fact that the new offset parameter was a ssize_t, which is only 32 bits on my platform (OS X). The mmap offset in C is defined to be an off_t (http://opengroup.org/onlinepubs/007908775/xsh/mmap.html), which is 64 bits on my platform. The attached patch attempts to fix the non-Windows code to have an off_t offset parameter and allows my code to access all 12 GB of my file. As a warning, this is the first time I've even looked at the Python source, and wouldn't even consider myself a Python coder; I just use Python every few months. Review the patch carefully. -- components: Extension Modules files: mmap_off_t.patch keywords: patch messages: 77957 nosy: saa severity: normal status: open title: mmap offset should be off_t instead of ssize_t type: behavior versions: Python 2.6 Added file: http://bugs.python.org/file12375/mmap_off_t.patch ___ Python tracker <http://bugs.python.org/issue4681> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue4681] mmap offset should be off_t instead of ssize_t
saa added the comment: Sorry, I should have explained that the bulk of my patch was derived from existing code. The code that references fpos_t is basically a cut and paste of code from Modules/bz2module.c. I don't know if this is a valid usage of fpos_t, but if it isn't, you should probably file a bug on the existing usage in bz2module and suggest a fix. Thanks for reviewing! ___ Python tracker <http://bugs.python.org/issue4681> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue4681] mmap offset should be off_t instead of ssize_t
saa added the comment: Thanks again, revised patch attached. Added file: http://bugs.python.org/file12376/mmap_off_t_2.patch ___ Python tracker <http://bugs.python.org/issue4681> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue4681] mmap offset should be off_t instead of ssize_t
saa added the comment: Ah, I think I'm beginning to understand. Thanks for the explanation. So, if I understand correctly, I should refactor the patch and just remove all of the Py_off_t and just use off_t because the standard says that *is* the type. Correct? ___ Python tracker <http://bugs.python.org/issue4681> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue4681] mmap offset should be off_t instead of ssize_t, and size calculation needs corrected
saa added the comment: There are bigger problems here. When the code calculates the size to map if size is passed as zero, it simply assigns the 64-bit st_size to the size to mmap. On a 32-bit system, it is obvious that the amount to map must be less than a 32-bit value. The calculation should take into account at least the following two things: 1) it should be the size of the file minus the offset, and 2) it shouldn't ask for an unreasonable amount of memory. I will experiment with this and submit a new patch (and delete the two I've already uploaded) within a couple of days, or at a minimum, provide an update to this bug. -- title: mmap offset should be off_t instead of ssize_t -> mmap offset should be off_t instead of ssize_t, and size calculation needs corrected ___ Python tracker <http://bugs.python.org/issue4681> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue4681] mmap offset should be off_t instead of ssize_t, and size calculation needs corrected
saa added the comment: Notes on the problem. The Python mmap documentation says: If length is 0, the maximum length of the map will be the current size of the file when mmap is called. Currently, on a system where off_t is 64-bit and size_t is 32-bit, if the size of the file (st_size) is > 0x', the code will in effect try to map (st_size & 0x'). This is suboptimal (imagine if the size of the file is 0x1''1000 or even 0x1''). In addition, it seems weird that a caller would have to check the size of the returned mmap against the size of the file to see if the entire file was mapped or not. Finally, it appears that there isn't a reliable, architecture independent, quick, and easy way to figure out what the maximum mmap size is. With these points in mind, I am going to work on coming up with a patch that will change the behavior to simply return an error when a zero length is passed to mmap and the entire file can not be mapped. This will put the onus on the caller to determine an appropriate mmap size for "chunking" the processing of the file, but the behavior will become more consistent. Of course, comments and suggestions are welcome. However, I'm going to have to wrap up my immediate involvement on this in the next couple of days (Yay, vacation!). ___ Python tracker <http://bugs.python.org/issue4681> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue4681] mmap offset should be off_t instead of ssize_t, and size calculation needs corrected
saa added the comment: Ok, put a fork in me, 'cuz I'm done with this. The latest is that mmap.size() is defined to return the size of the file and not the size of the data. It tries to return it as a ssize_t, which of course, on systems where off_t is 64-bits and ssize_t is 32-bits, won't work for sizes that won't fit in 32-bits. Without the size of the data, it is unclear how one would traverse a file using the offset parameter. As part of fixing mmap, I would suggest someone should write an example of how it should be used in these cases and use that as a test case. I'm going to leave this up to someone with more knowledge of how this stuff *should* work in Python, but I'm going to go redo my program in C. Thanks for listening and the future efforts. ___ Python tracker <http://bugs.python.org/issue4681> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com