[ python-Bugs-1451466 ] reading very large files

2007-07-02 Thread SourceForge.net
Bugs item #1451466, was opened at 2006-03-16 18:21
Message generated for change (Comment added) made by richardchristen
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1451466&group_id=5470

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: Python Interpreter Core
>Group: Python 2.5
Status: Open
Resolution: None
>Priority: 7
Private: No
Submitted By: christen (richardchristen)
Assigned to: Nobody/Anonymous (nobody)
Summary: reading very large files

Initial Comment:
I work on the human genome.
I extracted words from chromosomes using a suffix tree
(C code compiled for 64-bit and run on a Sun with 300 GB of RAM, since
my suffix tree requires 150 GB of RAM for chromosome 1,
the largest one).

This gave some files larger than 5 GB, for example one with 163763326
lines for chr 4, the one presently being analyzed.

Using Python 2.4.2 on a 32-bit Windows computer (1.5 GB
RAM), reading this file line by line with either

for li in file:
    do something

or

while li!='':
    li=file.readline()

I ran into problems, seemingly around the 4 GB boundary
(after reading the first problematic line): for some
lines (not all), li held the correct content
but with the first word of the next line appended to it
(see below).

As a result a simple
file1=open('1')
file2=open('2','w')
li=file1.readline()
while li!='':
    file2.write(li)
    li=file1.readline()

produced a second file of only
163754385 lines.
The problem lines were seemingly random, i.e. not
consecutive, and the last line was OK.
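
One way to cross-check a line count like the one above is to count newline
bytes in binary mode and compare that with the number of lines returned by
readline(). The following is only a sketch, written for a current Python 3
rather than the 2.x versions in this report; the function names, the latin-1
encoding and the small sample file are assumptions made for illustration.

```python
import os
import tempfile


def count_newline_bytes(path, chunk_size=1 << 20):
    """Count newline bytes by reading the file in fixed-size binary chunks."""
    total = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            total += chunk.count(b"\n")
    return total


def count_lines_readline(path):
    """Count lines the way the report does: repeated readline() calls."""
    total = 0
    # latin-1 is an assumption; it maps every byte to a character.
    with open(path, "r", encoding="latin-1") as f:
        li = f.readline()
        while li != "":
            total += 1
            li = f.readline()
    return total


def make_sample(n_lines):
    """Write a small CRLF-terminated sample file and return its path."""
    fd, path = tempfile.mkstemp(suffix=".txt")
    with os.fdopen(fd, "wb") as f:
        for i in range(n_lines):
            f.write(b"TCAGCCACAGCAGAAAGTGA:\t%d\r\n" % i)
    return path
```

On a file whose last line ends with a newline, the two counts should agree;
a lower readline() count would point at line ends that the line-based reader
did not recognize.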


The same code on the same file, but on my 64-bit dual-core OS X
machine, ran fine, despite using the
default Python 2.2.3, which "file Python" reports as a
Mach-O executable ppc, i.e. a 32-bit app.

Everything was run from the command line.


The first file looks like this:
...
TCAGCCACAGCAGAAAGTGA:\t33240 551212 751185
TCAGCCACAGCAGAAAGTGC:\t131324047
TCAGCCACAGCACTGTGTTA:\t61641912


The second file contains lines like this:
TCAGCCACAGCAGAAAGTGC:\t131324047TCAGCCACAGCAGAAGAAGA:

which is 'first line' + 'first word of next line'.

PS1: UEdit reads the big file on the
Windows machine without problems, so the OS itself is not the
problem (also, I transferred the big file from the Windows machine
to the Mac; if the file itself had been damaged, it would also have
been corrupted on the Mac).
PS2: I tried Python 2.3.5 on Windows, with the same
problem.
PS3: If needed, I can run the same test on a similar
file for chromosome 8, which is slightly below the 4
GB limit (3.99).
PS4: I think I remember doing a similar parsing job
on a single-CPU Linux Athlon 64 machine a month ago, with no trouble.

--

>Comment By: christen (richardchristen)
Date: 2007-07-02 09:11

Message:
Logged In: YES 
user_id=1477618
Originator: YES

In 2006, I reported a bug in reading very large files on 32-bit Windows:
python-Bugs-1451466

I have now tried a 64-bit Windows machine with Python 2.5,
and I find the same bug.

For very large files (the two I tried were around 7-8 GB), the end of line
is sometimes not taken into account.

The file is fine: viewed in hex, the end-of-line characters are
perfectly OK at the place where the parser goes wrong.
Everything seems to be OK with the same script on my Mac running OS X.

Example:
Original file reads:
###
.
Query= 10|ENSG00000203288|pseudogene|105829416|105829650|-
1|ENSE00001440927|105829519|105829650|-1|1
 (132 letters)

Database: Homo_sapiens.NCBI36.45.dna.chromosome17 
   1 sequences; 78,774,742 total letters
...
###

In hex:
###
...
c5bd3500h: 32 2E 0D 0A 0D 0A 51 75 65 72 79 3D 20 31 30 7C ; 2.....Query= 10|
c5bd3510h: 45 4E 53 47 30 30 30 30 30 32 30 33 32 38 38 7C ; ENSG00000203288|
c5bd3520h: 70 73 65 75 64 6F 67 65 6E 65 7C 31 30 35 38 32 ; pseudogene|10582
c5bd3530h: 39 34 31 36 7C 31 30 35 38 32 39 36 35 30 7C 2D ; 9416|105829650|-
c5bd3540h: 0D 0A 31 7C 45 4E 53 45 30 30 30 30 31 34 34 30 ; ..1|ENSE00001440
c5bd3550h: 39 32 37 7C 31 30 35 38 32 39 35 31 39 7C 31 30 ; 927|105829519|10
c5bd3560h: 35 38 32 39 36 35 30 7C 2D 31 7C 31 0D 0A 20 20 ; 5829650|-1|1..  
c5bd3570h: 20 20 20 20 20 20 20 28 31 33 32 20 6C 65 74 74 ;        (132 lett
c5bd3580h: 65 72 73 29 0D 0A 0D 0A 44 61 74 61 62 61 73 65 ; ers)....Database
c5bd3590h: 3A 20 48 6F 6D 6F 5F 73 61 70 69 65 6E 73 2E 4E ; : Homo_sapiens.N
c5bd35a0h: 43 42 49 33 36 2E 34 35 2E 64 6E 61 2E 63 68 72 ; CBI36.45.dna.chr
c5bd35b0h: 6F 6D 6F 73 6F 6D 65 31 37 20 0D 0A 20 20 20 20 ; omosome17 ..    
c5bd35c0h: 20 20 20 20 20 20 20 31 20 73 65 71 75 65 6E 63 ;        1 sequenc
c5bd35d0h: 65 73 3B 20 37 38 2C 37 37 34 2C 37 34 32 20 74 ; es; 78,774,742 t
c5bd35e0h: 6F 74 61 6C 20 6C 65 74 74 65 72 73 0D 0A 0D 0A ; otal letters....
...
###
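
A dump in the same layout as the one above can be produced from Python itself
by reading the region in binary mode, which sidesteps any line handling. This
is a minimal sketch for a current Python 3; read_region and hexdump are
hypothetical helper names, not part of any library.

```python
def read_region(path, offset, length):
    """Return `length` bytes starting at byte `offset`, read in binary mode."""
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(length)


def hexdump(data, base=0, width=16):
    """Format bytes as 'offset: hex bytes ; printable text' rows."""
    rows = []
    for i in range(0, len(data), width):
        row = data[i:i + width]
        hex_part = " ".join("%02X" % b for b in row)
        # Non-printable bytes (such as 0D and 0A) are shown as dots.
        text = "".join(chr(b) if 32 <= b < 127 else "." for b in row)
        rows.append("%08xh: %-*s ; %s" % (base + i, 3 * width - 1, hex_part, text))
    return rows
```

For example, print("\n".join(hexdump(read_region(path, offset, 64), base=offset)))
shows 64 bytes around a suspect line end, with path and offset supplied by
the caller.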


Demo: python script :
###

[ python-Bugs-1746071 ] class mutex doesn't do anything atomically

2007-07-02 Thread SourceForge.net
Bugs item #1746071, was opened at 2007-07-01 10:49
Message generated for change (Comment added) made by dbenbenn
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1746071&group_id=5470

Category: Python Library
Group: Python 2.5
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: David Benbennick (dbenbenn)
Assigned to: Nobody/Anonymous (nobody)
Summary: class mutex doesn't do anything atomically

Initial Comment:
>>> import mutex
>>> print mutex.mutex.testandset.__doc__
Atomic test-and-set -- grab the lock if it is not set,
return True if it succeeded.


The above docstring is wrong: the method is not atomic.  This is easy to see by 
inspecting the method's code:

def testandset(self):
    """Atomic test-and-set -- grab the lock if it is not set,
    return True if it succeeded."""
    if not self.locked:
        self.locked = 1
        return True
    else:
        return False

Therefore, it is possible for two threads to lock the same mutex 
simultaneously.  So the mutex module cannot be used for mutual exclusion.

The documentation for mutex says "The mutex module defines a class that allows 
mutual-exclusion via acquiring and releasing locks."  
[http://docs.python.org/lib/module-mutex.html].  Perhaps it would be a good 
idea to make the module actually do what the documentation says.
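
One possible shape for an atomic version is to guard the test and the set
with a real lock from the threading module. This is only a sketch, not the
mutex module's code: LockedMutex and contend are hypothetical names, and the
real class also has a queue of waiting functions that is left out here.

```python
import threading


class LockedMutex:
    """Hypothetical variant of testandset guarded by a threading.Lock."""

    def __init__(self):
        self.locked = False
        self._guard = threading.Lock()  # protects the test and the set together

    def testandset(self):
        # The check and the assignment happen while holding the guard,
        # so no other thread can run between them.
        with self._guard:
            if not self.locked:
                self.locked = True
                return True
            return False

    def unlock(self):
        with self._guard:
            self.locked = False


def contend(mutex, n_threads=8):
    """Call testandset() from several threads and collect the results."""
    results = []

    def worker():
        results.append(mutex.testandset())

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

With this guard in place, contend(LockedMutex()) should contain exactly one
True, however the threads are scheduled.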

--

>Comment By: David Benbennick (dbenbenn)
Date: 2007-07-02 05:23

Message:
Logged In: YES 
user_id=95581
Originator: YES

> How are you using mutex with threads, can you please provide some
> information.

I'm attaching an example program that demonstrates two threads both
locking the same mutex at the same time.

> If muobj is an instance of mutex class.
> muobj.testandset() for process-a will set the lock.
> muobj.testandset() for process-b will be dealt with self.lock = True and
> wont be able to set.

That isn't correct.  It is possible for testandset to return True in both
thread-a and thread-b.  What can happen is the following:

1) Thread a calls testandset().  It executes the line "if not
self.locked", and finds the result to be True.
2) The OS switches threads.
3) Thread b calls testandset().  It executes the line "if not
self.locked", and finds the result to be True.
4) Thread b sets "self.locked = 1" and returns True
5) Thread a sets "self.locked = 1" and returns True
File Added: ex.py
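
The interleaving in steps 1) to 5) can be forced deterministically, as a
sketch separate from the attached ex.py (whose contents are not shown here).
The after_test hook and the names RacyMutex and force_race are hypothetical;
the hook stands in for the OS thread switch of step 2) and makes each thread
wait until both have passed the "if not self.locked" test.

```python
import threading


class RacyMutex:
    """Same check-then-set logic as the quoted testandset, with no locking."""

    def __init__(self):
        self.locked = False

    def testandset(self, after_test=lambda: None):
        if not self.locked:
            after_test()  # hypothetical hook: stands in for a thread switch
            self.locked = 1
            return True
        else:
            return False


def force_race():
    """Run two threads so that both test the flag before either sets it."""
    mutex = RacyMutex()
    barrier = threading.Barrier(2)
    results = []

    def worker():
        # Each thread blocks after the test until the other has tested too.
        results.append(mutex.testandset(after_test=lambda: barrier.wait(timeout=5)))

    threads = [threading.Thread(target=worker) for _ in range(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Here force_race() returns [True, True]: both threads believe they hold the
mutex.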

--

Comment By: O.R.Senthil Kumaran (orsenthil)
Date: 2007-07-01 23:10

Message:
Logged In: YES 
user_id=942711
Originator: NO

Hi David,
I just fired up the docs and found this:
"The mutex module defines a class that allows mutual-exclusion via
acquiring and releasing locks. It does not require (or imply) threading or
multi-tasking, though it could be useful for those purposes."

The docs don't say anything about threads using a mutex object; they only say
that if you want to use threading, you can use a mutex object.

How are you using mutex with threads, can you please provide some
information.

If muobj is an instance of mutex class.
muobj.testandset() for process-a will set the lock.
muobj.testandset() for process-b will be dealt with self.lock = True and
wont be able to set.

--

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1746071&group_id=5470
___
Python-bugs-list mailing list 
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[ python-Bugs-1706815 ] socket.error exceptions not subclass of StandardError

2007-07-02 Thread SourceForge.net
Bugs item #1706815, was opened at 2007-04-24 11:09
Message generated for change (Comment added) made by greg
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1706815&group_id=5470

Category: Python Library
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: John Nagle (nagle)
>Assigned to: Gregory P. Smith (greg)
Summary: socket.error exceptions not subclass of StandardError

Initial Comment:
The "socket.error" exception is a subclass of Exception, but not of 
StandardError.  It needs to be placed properly in the exception hierarchy, 
presumably somewhere under IOError.

Socket errors have some known problems.  See also:

[ 805194 ] Inappropriate error received using socket timeout
[ 1019808 ] wrong socket error returned
[ 1571878 ] Improvements to socket module exceptions
[ 708927 ] socket timeouts produce wrong errors in win32 

Just figuring out what exceptions can be raised from the socket module is 
tough.  I've seen exceptions derived from "socket.error", exceptions from 
IOError, and exceptions from the SSL layer, which patches the
sockets module when loaded.  These are non-bug exceptions; that is, the problem 
is out in the network, external to the program.

Some are retryable, some indicate that a different approach (different port, 
different protocol) should be tried, and some mean that some named resource 
doesn't exist.  Programs need to make those distinctions reliably.

The most important distinction with sockets is "external network problem" vs. 
"local program problem".  To resolve this, I suggest a "NetworkException" in 
the exception hierarchy, with all the things that can go wrong due to 
conditions external to the local machine under that exception.

I'd suggest the following:

1.  Add "NetworkError" under "IOError" in the exception hierarchy.

2.  Put the existing "socket.error" under "NetworkError". Since "socket.error" 
needs to be reparented anyway (it's currently a direct descendant of 
"Exception") this provides a good place for it.

3.  Find any places where the socket module can raise IOError or OSError due to 
an external network condition, and make them raise something under NetworkError 
instead.  Code that catches IOError will still work.

4.  Put all errors in the various SSL modules (SSLError, etc.) which can be 
raised due to external network conditions under "NetworkError"

5.  Move "urllib2.URLError", which is currently under IOError, down a level 
under NetworkError.

6.  Move the misc. errors from "urllib", like "ContentTooShortError", which are 
currently under IOError, down a level under NetworkError.

7.  URL translation errors from the IDNA (Unicode URL encoding) module probably 
should raise an error similar to that for an incorrect URL, rather than raising 
a UnicodeError.  

Then, programs that catch NetworkError could be sure of catching all network 
trouble conditions, but not local code bugs. 

With these changes, any exception that's being caught now will still be caught.

I'd suggest doing 1) above immediately, since that's a clear bug, but the 
others need to be discussed.
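
To illustrate how the proposed layout would behave, here is a minimal sketch
of suggestions 1, 2 and 5. All three classes are hypothetical stand-ins
defined locally; none of them is the real socket or urllib2 class, and the
classify helper exists only to show which handler catches what.

```python
class NetworkError(IOError):
    """Hypothetical base class for failures caused by external network conditions."""


class SocketErrorStandIn(NetworkError):
    """Stand-in for a reparented socket.error (not the real class)."""


class URLErrorStandIn(NetworkError):
    """Stand-in for urllib2.URLError moved under NetworkError."""


def classify(exc):
    """Raise exc and report which handler catches it first."""
    try:
        raise exc
    except NetworkError:
        return "network"
    except IOError:
        return "io"
    except Exception:
        return "other"
```

Because NetworkError sits under IOError in this sketch, code that already
catches IOError keeps working, while new code can catch NetworkError alone to
separate network trouble from local errors.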

 

--

>Comment By: Gregory P. Smith (greg)
Date: 2007-07-02 16:39

Message:
Logged In: YES 
user_id=413
Originator: NO

agreed! the above suggestions sound good.

for number (3), if there are any places that raise OSError, that could lead
to code upgrade headaches, as the new NetworkError would not be a subclass
of OSError.  imho that's fine but others may disagree.

i am looking at implementing the immediate (1) and (2) as a starting
point.

--

Comment By: John Nagle (nagle)
Date: 2007-04-24 11:12

Message:
Logged In: YES 
user_id=5571
Originator: YES

(See also PEP 352).

--




[ python-Bugs-1746880 ] AMD64 installer does not place python25.dll in system dir

2007-07-02 Thread SourceForge.net
Bugs item #1746880, was opened at 2007-07-02 21:02
Message generated for change (Tracker Item Submitted) made by Item Submitter
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1746880&group_id=5470

Category: Installation
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Roger Upole (rupole)
Assigned to: Nobody/Anonymous (nobody)
Summary: AMD64 installer does not place python25.dll in system dir

Initial Comment:
Even with All Users selected, the 2.5.1 AMD64 msi installer puts python25.dll 
in the base python directory.  Attaching install log.



--




[ python-Bugs-1746880 ] AMD64 installer does not place python25.dll in system dir

2007-07-02 Thread SourceForge.net
Bugs item #1746880, was opened at 2007-07-02 21:02
Message generated for change (Comment added) made by rupole
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1746880&group_id=5470

Category: Installation
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Roger Upole (rupole)
Assigned to: Nobody/Anonymous (nobody)
Summary: AMD64 installer does not place python25.dll in system dir

Initial Comment:
Even with All Users selected, the 2.5.1 AMD64 msi installer puts python25.dll 
in the base python directory.  Attaching install log.



--

>Comment By: Roger Upole (rupole)
Date: 2007-07-02 21:17

Message:
Logged In: YES 
user_id=771074
Originator: YES

This looks like a simple typo in msi.py:
if msilib.Win64:
    SystemFolderName = "[SystemFolder64]"

It should be System64Folder instead.
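
As a sketch of the corrected selection (not the actual msi.py code), the
choice of property name could look like this. The function name and the
win64 parameter are hypothetical, and the 32-bit value "[SystemFolder]" is an
assumption about what the script uses in that case.

```python
def system_folder_property(win64):
    """Return the Windows Installer property reference for the system directory."""
    # System64Folder is the installer property for the 64-bit system
    # directory; "SystemFolder64" is the misspelling described above.
    if win64:
        return "[System64Folder]"
    # Assumed 32-bit counterpart.
    return "[SystemFolder]"
```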



--




[ python-Bugs-1746880 ] AMD64 installer does not place python25.dll in system dir

2007-07-02 Thread SourceForge.net
Bugs item #1746880, was opened at 2007-07-03 11:02
Message generated for change (Comment added) made by quiver
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1746880&group_id=5470

Category: Installation
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Roger Upole (rupole)
>Assigned to: Martin v. Löwis (loewis)
Summary: AMD64 installer does not place python25.dll in system dir

Initial Comment:
Even with All Users selected, the 2.5.1 AMD64 msi installer puts python25.dll 
in the base python directory.  Attaching install log.



--

>Comment By: George Yoshida (quiver)
Date: 2007-07-03 11:49

Message:
Logged In: YES 
user_id=671362
Originator: NO

Martin, can you take a look?

--

Comment By: Roger Upole (rupole)
Date: 2007-07-03 11:17

Message:
Logged In: YES 
user_id=771074
Originator: YES

This looks like a simple typo in msi.py:
if msilib.Win64:
    SystemFolderName = "[SystemFolder64]"

It should be System64Folder instead.



--




[ python-Bugs-1746880 ] AMD64 installer does not place python25.dll in system dir

2007-07-02 Thread SourceForge.net
Bugs item #1746880, was opened at 2007-07-03 04:02
Message generated for change (Comment added) made by loewis
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1746880&group_id=5470

Category: Installation
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Roger Upole (rupole)
Assigned to: Martin v. Löwis (loewis)
Summary: AMD64 installer does not place python25.dll in system dir

Initial Comment:
Even with All Users selected, the 2.5.1 AMD64 msi installer puts python25.dll 
in the base python directory.  Attaching install log.



--

>Comment By: Martin v. Löwis (loewis)
Date: 2007-07-03 07:49

Message:
Logged In: YES 
user_id=21627
Originator: NO

Will do. Most likely, rupole is right with his analysis.

--

Comment By: George Yoshida (quiver)
Date: 2007-07-03 04:49

Message:
Logged In: YES 
user_id=671362
Originator: NO

Martin, can you take a look?

--

Comment By: Roger Upole (rupole)
Date: 2007-07-03 04:17

Message:
Logged In: YES 
user_id=771074
Originator: YES

This looks like a simple typo in msi.py:
if msilib.Win64:
    SystemFolderName = "[SystemFolder64]"

It should be System64Folder instead.



--




[ python-Bugs-1745108 ] 2.5.1 curses panel segfault in new_panel on aix 5.3

2007-07-02 Thread SourceForge.net
Bugs item #1745108, was opened at 2007-06-28 17:13
Message generated for change (Comment added) made by nnorwitz
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1745108&group_id=5470

Category: Extension Modules
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Mattias Wadenstein (maswan)
Assigned to: Nobody/Anonymous (nobody)
Summary: 2.5.1 curses panel segfault in new_panel on aix 5.3

Initial Comment:
I've compiled python 2.5.1 on AIX 5.3 with ncurses 5.6 and I get segmentation 
faults as soon as any curses.panel  tries to make a new panel.

The following test program gives a segmentation fault for me (remove the 
new_panel line and it works fine):

import curses
from curses import panel
def mkpanel(scr):
    win = curses.newwin(8,8,1,1)
    pan = panel.new_panel(win)
curses.wrapper(mkpanel)

Also the test_curses program triggers this segfault. A traceback puts the 
problem in:

root_panel(), line 57 in "p_new.c"
new_panel(win = 0x000110246dc0), line 90 in "p_new.c"
PyCurses_new_panel(self = (nil), args = 0x000110246dc0), line 396 in "_curses_panel.c"
PyCFunction_Call(func = 0x00011024a368, arg = 0x000110246dc0, kw = (nil)), line 73 in "methodobject.c"

Note that the ncurses I've compiled works fine with the shipped test programs, 
so it seems to be an issue with the python interaction.

Please let me know if there is anything else that I can provide to help track 
this bug down.

--

>Comment By: Neal Norwitz (nnorwitz)
Date: 2007-07-02 23:21

Message:
Logged In: YES 
user_id=33168
Originator: NO

No python developer has access to AIX AFAIK.  So you will likely need to
debug this problem yourself or provide access to an AIX box.  Here are some
questions to get you started:

 * Does this problem happen as a 32-bit exe rather than 64-bit?
 * Did you use xlc, gcc, or some other compiler?
 * What happens if you switch compilers?
 * Does this happen if you disable optimization? 
 * What happens if you build a debug version of python (./configure
--with-pydebug)?
 * Do you have any memory debugging tool that you can use to track this
down?

It looks like there is a problem dereferencing a function pointer.  I don't
know why that might happen.

--
