[ python-Bugs-1451466 ] reading very large files
Bugs item #1451466, was opened at 2006-03-16 18:21
Message generated for change (Comment added) made by richardchristen
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1451466&group_id=5470
Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: Python Interpreter Core
>Group: Python 2.5
Status: Open
Resolution: None
>Priority: 7
Private: No
Submitted By: christen (richardchristen)
Assigned to: Nobody/Anonymous (nobody)
Summary: reading very large files
Initial Comment:
I work on the human genome.
I extracted words from chromosomes using a suffix tree
(C code compiled for 64-bit and run on a Sun with 300 GB RAM, since
my suffix tree requires 150 GB RAM for chromosome 1,
the largest one).
This gave some >5 GB files, for example one with 163763326
lines for chr 4, the one presently analyzed.
Using Python 2.4.2 on a 32-bit Windows computer (1.5 GB
RAM), I read this file line by line with either
    for li in file:
        pass  # do something
or
    li = file.readline()
    while li != '':
        li = file.readline()
I got problems seemingly around the 4 GB boundary
(after reading the first problematic line): for some
lines (not all), li held the correct content
but with the first word of the next line appended
(see below).
As a result, a simple
    file1 = open('1')
    file2 = open('2', 'w')
    li = file1.readline()
    while li != '':
        file2.write(li)
        li = file1.readline()
produced a second file of only
163754385 lines.
The problem lines were "seemingly random", i.e. not in a
row, with the last line being OK.
The same code on the same file, but on my OS X
64-bit dual-core machine, went fine, despite the use of the
default Python 2.2.3 and "file Python" showing it is a
Mach-O executable ppc, i.e. a 32-bit app.
Everything was run from the command line.
The first file looks like this:
...
TCAGCCACAGCAGAAAGTGA:\t33240 551212 751185
TCAGCCACAGCAGAAAGTGC:\t131324047
TCAGCCACAGCACTGTGTTA:\t61641912
The second file contains lines like this:
TCAGCCACAGCAGAAAGTGC:\t131324047TCAGCCACAGCAGAAGAAGA:
which is 'first line' + 'first word of next line'.
PS1: The big file reads without problems in UEdit on the
Windows machine, so the OS itself is not the
problem (also, I transferred the big file from the Windows machine
to the Mac; if the file had had problems, it would have
been corrupted on the Mac).
PS2: I tried Python 2.3.5 on Windows, with the same
problem.
PS3: If needed, I can run the same test on a similar
file for chromosome 8, which is slightly below the 4
GB limit (3.99).
PS4: I think I remember having done a similar parsing
on a single-CPU Linux Athlon 64 a month ago, with no trouble.
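The copy test in this report compares line counts before and after a line-by-line read. A minimal sketch of the same check without writing a second file follows. It is not part of the original report: the function names and the chunk size are assumptions chosen for illustration, and it is written for a current Python rather than 2.4. It counts newline bytes from raw binary chunks and compares that with the number of lines the ordinary text-mode iterator yields.

```python
def count_newlines_binary(path, chunk_size=1 << 20):
    """Count b'\\n' bytes by reading raw chunks (no newline translation)."""
    total = 0
    with open(path, 'rb') as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            total += chunk.count(b'\n')
    return total


def count_lines_text(path):
    """Count lines as seen by ordinary text-mode line iteration."""
    total = 0
    with open(path) as f:
        for _ in f:
            total += 1
    return total
```

For a file whose last line ends with a newline, the two counts would be expected to agree; a lower text-mode count would suggest that some line endings were missed, as described above.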
--
>Comment By: christen (richardchristen)
Date: 2007-07-02 09:11
Message:
Logged In: YES
user_id=1477618
Originator: YES
In 2006, I reported a bug on 32-bit Windows when reading very large files:
python-Bugs-1451466
I have now tried with a 64-bit Windows machine and Python 2.5,
and I find the same bug.
For very large files (the two I tried were around 7-8 GB), the end of line
is sometimes not taken into account.
The file is fine: viewed in hex, the end-of-line characters are
perfectly OK at the place where the parser goes wrong.
Everything seems to be OK with the same script on my Mac OS X.
Example:
Original file reads:
###
.
Query= 10|ENSG00000203288|pseudogene|105829416|105829650|-
1|ENSE00001440927|105829519|105829650|-1|1
(132 letters)
Database: Homo_sapiens.NCBI36.45.dna.chromosome17
1 sequences; 78,774,742 total letters
...
###
in hex:
###
...
c5bd3500h: 32 2E 0D 0A 0D 0A 51 75 65 72 79 3D 20 31 30 7C ; 2.....Query= 10|
c5bd3510h: 45 4E 53 47 30 30 30 30 30 32 30 33 32 38 38 7C ; ENSG00000203288|
c5bd3520h: 70 73 65 75 64 6F 67 65 6E 65 7C 31 30 35 38 32 ; pseudogene|10582
c5bd3530h: 39 34 31 36 7C 31 30 35 38 32 39 36 35 30 7C 2D ; 9416|105829650|-
c5bd3540h: 0D 0A 31 7C 45 4E 53 45 30 30 30 30 31 34 34 30 ; ..1|ENSE00001440
c5bd3550h: 39 32 37 7C 31 30 35 38 32 39 35 31 39 7C 31 30 ; 927|105829519|10
c5bd3560h: 35 38 32 39 36 35 30 7C 2D 31 7C 31 0D 0A 20 20 ; 5829650|-1|1..  
c5bd3570h: 20 20 20 20 20 20 20 28 31 33 32 20 6C 65 74 74 ;        (132 lett
c5bd3580h: 65 72 73 29 0D 0A 0D 0A 44 61 74 61 62 61 73 65 ; ers)....Database
c5bd3590h: 3A 20 48 6F 6D 6F 5F 73 61 70 69 65 6E 73 2E 4E ; : Homo_sapiens.N
c5bd35a0h: 43 42 49 33 36 2E 34 35 2E 64 6E 61 2E 63 68 72 ; CBI36.45.dna.chr
c5bd35b0h: 6F 6D 6F 73 6F 6D 65 31 37 20 0D 0A 20 20 20 20 ; omosome17 ..    
c5bd35c0h: 20 20 20 20 20 20 20 31 20 73 65 71 75 65 6E 63 ;        1 sequenc
c5bd35d0h: 65 73 3B 20 37 38 2C 37 37 34 2C 37 34 32 20 74 ; es; 78,774,742 t
c5bd35e0h: 6F 74 61 6C 20 6C 65 74 74 65 72 73 0D 0A 0D 0A ; otal letters....
...
###
Demo: python script :
###
[ python-Bugs-1746071 ] class mutex doesn't do anything atomically
Bugs item #1746071, was opened at 2007-07-01 10:49
Message generated for change (Comment added) made by dbenbenn
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1746071&group_id=5470
Category: Python Library
Group: Python 2.5
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: David Benbennick (dbenbenn)
Assigned to: Nobody/Anonymous (nobody)
Summary: class mutex doesn't do anything atomically
Initial Comment:
>>> import mutex
>>> print mutex.mutex.testandset.__doc__
Atomic test-and-set -- grab the lock if it is not set,
return True if it succeeded.
The above docstring is wrong: the method is not atomic. This is easy to
see by inspecting the method's code:
    def testandset(self):
        """Atomic test-and-set -- grab the lock if it is not set,
        return True if it succeeded."""
        if not self.locked:
            self.locked = 1
            return True
        else:
            return False
Therefore, it is possible for two threads to lock the same mutex
simultaneously. So the mutex module cannot be used for mutual exclusion.
The documentation for mutex says "The mutex module defines a class that
allows mutual-exclusion via acquiring and releasing locks."
[http://docs.python.org/lib/module-mutex.html]. Perhaps it would be a
good idea to make the module actually do what the documentation says.
--
>Comment By: David Benbennick (dbenbenn)
Date: 2007-07-02 05:23
Message:
Logged In: YES
user_id=95581
Originator: YES
> How are you using mutex with threads, can you please provide some
> information.
I'm attaching an example program that demonstrates two threads both
locking the same mutex at the same time.
> If muobj is an instance of mutex class.
> muobj.testandset() for process-a will set the lock.
> muobj.testandset() for process-b will be dealt with self.lock = True and
> won't be able to set.
That isn't correct. It is possible for testandset to return True in both
thread-a and thread-b. What can happen is the following:
1) Thread a calls testandset(). It executes the line "if not self.locked", and finds the result to be True.
2) The OS switches threads.
3) Thread b calls testandset(). It executes the line "if not self.locked", and finds the result to be True.
4) Thread b sets "self.locked = 1" and returns True.
5) Thread a sets "self.locked = 1" and returns True.
File Added: ex.py
--
Comment By: O.R.Senthil Kumaran (orsenthil)
Date: 2007-07-01 23:10
Message:
Logged In: YES
user_id=942711
Originator: NO
Hi David,
I just fired up the docs and found this:
"The mutex module defines a class that allows mutual-exclusion via
acquiring and releasing locks. It does not require (or imply) threading or
multi-tasking, though it could be useful for those purposes."
The docs don't say anything about threads using a mutex object, but instead
say that if you want to use threading you can use a mutex object.
How are you using mutex with threads, can you please provide some
information.
If muobj is an instance of mutex class.
muobj.testandset() for process-a will set the lock.
muobj.testandset() for process-b will be dealt with self.lock = True and
won't be able to set.
--
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1746071&group_id=5470
___
Python-bugs-list mailing list
Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
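For the race described in bug #1746071, one way to make the test and the set a single step is to delegate to threading.Lock, whose non-blocking acquire is atomic. The following is a minimal sketch under that assumption, not the mutex module's actual code or a fix proposed in the thread. The class name LockedMutex and the helper contend are hypothetical, and the deferred-call queue of the real mutex class is left out.

```python
import threading


class LockedMutex(object):
    """Hypothetical stand-in for mutex.mutex with an atomic testandset()."""

    def __init__(self):
        # threading.Lock supplies the atomicity that a plain flag lacks.
        self._lock = threading.Lock()

    def testandset(self):
        """Grab the lock if it is free; return True on success."""
        # A non-blocking acquire tests and sets in one atomic step.
        return self._lock.acquire(False)

    def unlock(self):
        """Release the lock so that another caller can grab it."""
        self._lock.release()


def contend(mutex, n_threads=8):
    """Have n_threads call testandset() once each; return how many succeeded."""
    results = []

    def worker():
        results.append(mutex.testandset())

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(results)
```

With this sketch, contend(LockedMutex()) should report exactly one successful caller, however the threads are scheduled.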
[ python-Bugs-1706815 ] socket.error exceptions not subclass of StandardError
Bugs item #1706815, was opened at 2007-04-24 11:09
Message generated for change (Comment added) made by greg
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1706815&group_id=5470
Category: Python Library
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: John Nagle (nagle)
>Assigned to: Gregory P. Smith (greg)
Summary: socket.error exceptions not subclass of StandardError
Initial Comment:
The "socket.error" exception is a subclass of Exception, but not of
StandardError. It needs to be placed properly in the exception hierarchy,
presumably somewhere under IOError.
Socket errors have some known problems. See also:
[ 805194 ] Inappropriate error received using socket timeout
[ 1019808 ] wrong socket error returned
[ 1571878 ] Improvements to socket module exceptions
[ 708927 ] socket timeouts produce wrong errors in win32
Just figuring out what exceptions can be raised from the socket module is
tough. I've seen exceptions derived from "socket.error", exceptions from
IOError, and exceptions from the SSL layer, which patches the sockets
module when loaded. These are non-bug exceptions; that is, the problem is
out in the network, external to the program. Some are retryable, some
indicate that a different approach (different port, different protocol)
should be tried, and some mean that some named resource doesn't exist.
Programs need to make those distinctions reliably.
The most important distinction with sockets is "external network problem"
vs. "local program problem". To resolve this, I suggest a
"NetworkException" in the exception hierarchy, with all the things that
can go wrong due to conditions external to the local machine under that
exception.
I'd suggest the following:
1. Add "NetworkError" under "IOError" in the exception hierarchy.
2. Put the existing "socket.error" under "NetworkError". Since "socket.error" needs to be reparented anyway (it's currently a direct descendant of "Exception") this provides a good place for it.
3. Find any places where the socket module can raise IOError or OSError due to an external network condition, and make them raise something under NetworkError instead. Code that catches IOError will still work.
4. Put all errors in the various SSL modules (SSLError, etc.) which can be raised due to external network conditions under "NetworkError".
5. Move "urllib2.URLError", which is currently under IOError, down a level under NetworkError.
6. Move the misc. errors from "urllib", like "ContentTooShortError", which are currently under IOError, down a level under NetworkError.
7. URL translation errors from the IDNA (Unicode URL encoding) module probably should raise an error similar to that for an incorrect URL, rather than raising a UnicodeError.
Then, programs that catch NetworkError could be sure of catching all
network trouble conditions, but not local code bugs.
With these changes, any exception that's being caught now will still be
caught.
I'd suggest doing 1) above immediately, since that's a clear bug, but the
others need to be discussed.
--
>Comment By: Gregory P. Smith (greg)
Date: 2007-07-02 16:39
Message:
Logged In: YES
user_id=413
Originator: NO
agreed! the above suggestions sound good.
for number (3) if there are any places that raise OSError, that could lead
to code upgrade headaches as the new NetworkError would not be a subclass
of OSError. imho that's fine but others may disagree.
i am looking at implementing the immediate (1) and (2) as a starting
point.
--
Comment By: John Nagle (nagle)
Date: 2007-04-24 11:12
Message:
Logged In: YES
user_id=5571
Originator: YES
(See also PEP 352).
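Points 1, 2 and 5 of the proposal in bug #1706815 can be sketched as a small class hierarchy. This is only an illustration of the suggestion, not code from the standard library: NetworkError is the proposed name, while SocketError, URLError, classify, fetch and run are hypothetical stand-ins defined here so that the example is self-contained.

```python
class NetworkError(IOError):
    """Proposed base class for failures caused by conditions outside the local machine."""


class SocketError(NetworkError):
    """Hypothetical stand-in for socket.error reparented under NetworkError."""


class URLError(NetworkError):
    """Hypothetical stand-in for urllib2.URLError moved down one level."""


def classify(exc):
    """Return 'network' for external trouble and 'local' for anything else."""
    if isinstance(exc, NetworkError):
        return "network"
    return "local"


def fetch():
    """Simulate a network failure (illustrative only)."""
    raise SocketError("connection reset by peer")


def run():
    """Show that a handler written for IOError still catches the new classes."""
    try:
        fetch()
    except IOError as exc:
        return classify(exc)
```

Because every class here still derives from IOError, existing handlers keep working, while new code can catch NetworkError alone to separate network trouble from local bugs.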
[ python-Bugs-1746880 ] AMD64 installer does not place python25.dll in system dir
Bugs item #1746880, was opened at 2007-07-02 21:02
Message generated for change (Tracker Item Submitted) made by Item Submitter
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1746880&group_id=5470
Category: Installation
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Roger Upole (rupole)
Assigned to: Nobody/Anonymous (nobody)
Summary: AMD64 installer does not place python25.dll in system dir
Initial Comment:
Even with All Users selected, the 2.5.1 AMD64 msi installer puts python25.dll
in the base python directory. Attaching install log.
[ python-Bugs-1746880 ] AMD64 installer does not place python25.dll in system dir
Bugs item #1746880, was opened at 2007-07-02 21:02
Message generated for change (Comment added) made by rupole
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1746880&group_id=5470
Category: Installation
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Roger Upole (rupole)
Assigned to: Nobody/Anonymous (nobody)
Summary: AMD64 installer does not place python25.dll in system dir
Initial Comment:
Even with All Users selected, the 2.5.1 AMD64 msi installer puts python25.dll
in the base python directory. Attaching install log.
--
>Comment By: Roger Upole (rupole)
Date: 2007-07-02 21:17
Message:
Logged In: YES
user_id=771074
Originator: YES
This looks like a simple typo in msi.py:
    if msilib.Win64:
        SystemFolderName = "[SystemFolder64]"
It should be System64Folder instead.
[ python-Bugs-1746880 ] AMD64 installer does not place python25.dll in system dir
Bugs item #1746880, was opened at 2007-07-03 11:02
Message generated for change (Comment added) made by quiver
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1746880&group_id=5470
Category: Installation
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Roger Upole (rupole)
>Assigned to: Martin v. Löwis (loewis)
Summary: AMD64 installer does not place python25.dll in system dir
Initial Comment:
Even with All Users selected, the 2.5.1 AMD64 msi installer puts python25.dll
in the base python directory. Attaching install log.
--
>Comment By: George Yoshida (quiver)
Date: 2007-07-03 11:49
Message:
Logged In: YES
user_id=671362
Originator: NO
Martin, can you take a look?
--
Comment By: Roger Upole (rupole)
Date: 2007-07-03 11:17
Message:
Logged In: YES
user_id=771074
Originator: YES
This looks like a simple typo in msi.py:
    if msilib.Win64:
        SystemFolderName = "[SystemFolder64]"
It should be System64Folder instead.
[ python-Bugs-1746880 ] AMD64 installer does not place python25.dll in system dir
Bugs item #1746880, was opened at 2007-07-03 04:02
Message generated for change (Comment added) made by loewis
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1746880&group_id=5470
Category: Installation
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Roger Upole (rupole)
Assigned to: Martin v. Löwis (loewis)
Summary: AMD64 installer does not place python25.dll in system dir
Initial Comment:
Even with All Users selected, the 2.5.1 AMD64 msi installer puts python25.dll
in the base python directory. Attaching install log.
--
>Comment By: Martin v. Löwis (loewis)
Date: 2007-07-03 07:49
Message:
Logged In: YES
user_id=21627
Originator: NO
Will do. Most likely, rupole is right with his analysis.
--
Comment By: George Yoshida (quiver)
Date: 2007-07-03 04:49
Message:
Logged In: YES
user_id=671362
Originator: NO
Martin, can you take a look?
--
Comment By: Roger Upole (rupole)
Date: 2007-07-03 04:17
Message:
Logged In: YES
user_id=771074
Originator: YES
This looks like a simple typo in msi.py:
    if msilib.Win64:
        SystemFolderName = "[SystemFolder64]"
It should be System64Folder instead.
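The correction suggested for bug #1746880 amounts to swapping one property name. A minimal sketch of the corrected branch is given below. It is not the actual msi.py code: the function system_folder_name and its win64 argument are hypothetical stand-ins for the msilib.Win64 check quoted in the comment, and the 32-bit branch is an assumption.

```python
def system_folder_name(win64):
    """Return the installer directory property for the system DLL folder.

    `win64` stands in for the msilib.Win64 flag quoted in the comment.
    """
    if win64:
        # Corrected name suggested in the comment (was "[SystemFolder64]").
        return "[System64Folder]"
    # Assumption: 32-bit packages use the standard SystemFolder property.
    return "[SystemFolder]"
```

System64Folder and SystemFolder are both standard Windows Installer properties, which is consistent with the analysis that "[SystemFolder64]" was a typo.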
[ python-Bugs-1745108 ] 2.5.1 curses panel segfault in new_panel on aix 5.3
Bugs item #1745108, was opened at 2007-06-28 17:13
Message generated for change (Comment added) made by nnorwitz
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1745108&group_id=5470
Category: Extension Modules
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Mattias Wadenstein (maswan)
Assigned to: Nobody/Anonymous (nobody)
Summary: 2.5.1 curses panel segfault in new_panel on aix 5.3
Initial Comment:
I've compiled python 2.5.1 on AIX 5.3 with ncurses 5.6 and I get
segmentation faults as soon as any curses.panel tries to make a new panel.
The following test program gives a segmentation fault for me (remove the
new_panel line and it works fine):
    import curses
    from curses import panel
    def mkpanel(scr):
        win = curses.newwin(8,8,1,1)
        pan = panel.new_panel(win)
    curses.wrapper(mkpanel)
Also the test_curses program triggers this segfault.
A traceback puts the problem in:
root_panel(), line 57 in "p_new.c"
new_panel(win = 0x000110246dc0), line 90 in "p_new.c"
PyCurses_new_panel(self = (nil), args = 0x000110246dc0), line 396 in "_curses_panel.c"
PyCFunction_Call(func = 0x00011024a368, arg = 0x000110246dc0, kw = (nil)), line 73 in "methodobject.c"
Note that the ncurses I've compiled works fine with the shipped test
programs, so it seems to be an issue with the python interaction.
Please let me know if there is anything else that I can provide to help
track this bug down.
--
>Comment By: Neal Norwitz (nnorwitz)
Date: 2007-07-02 23:21
Message:
Logged In: YES
user_id=33168
Originator: NO
No python developer has access to AIX AFAIK. So you will likely need to
debug this problem yourself or provide access to an AIX box.
Here are some questions to get you started:
* Does this problem happen as a 32-bit exe rather than 64-bit?
* Did you use xlc, gcc, or some other compiler?
* What happens if you switch compilers?
* Does this happen if you disable optimization?
* What happens if you build a debug version of python (./configure --with-pydebug)?
* Do you have any memory debugging tool that you can use to track this down?
It looks like there is a problem dereferencing a function pointer. I don't
know why that might happen.
