[ python-Bugs-1745108 ] 2.5.1 curses panel segfault in new_panel on aix 5.3

2007-07-03 Thread SourceForge.net
Bugs item #1745108, was opened at 2007-06-29 00:13
Message generated for change (Comment added) made by maswan
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1745108&group_id=5470

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: Extension Modules
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Mattias Wadenstein (maswan)
Assigned to: Nobody/Anonymous (nobody)
Summary: 2.5.1 curses panel segfault in new_panel on aix 5.3

Initial Comment:
I've compiled Python 2.5.1 on AIX 5.3 with ncurses 5.6, and I get segmentation 
faults as soon as any curses.panel code tries to make a new panel.

The following test program gives a segmentation fault for me (remove the 
new_panel line and it works fine):

import curses
from curses import panel
def mkpanel(scr):
    win = curses.newwin(8,8,1,1)
    pan = panel.new_panel(win)
curses.wrapper(mkpanel)

Also the test_curses program triggers this segfault. A traceback puts the 
problem in:

root_panel(), line 57 in "p_new.c"
new_panel(win = 0x000110246dc0), line 90 in "p_new.c"
PyCurses_new_panel(self = (nil), args = 0x000110246dc0), line 396 in "_curses_panel.c"
PyCFunction_Call(func = 0x00011024a368, arg = 0x000110246dc0, kw = (nil)), line 73 in "methodobject.c"

Note that the ncurses I've compiled works fine with the shipped test programs, 
so it seems to be an issue with the python interaction.

Please let me know if there is anything else that I can provide to help track 
this bug down.

--

>Comment By: Mattias Wadenstein (maswan)
Date: 2007-07-03 08:12

Message:
Logged In: YES 
user_id=1831392
Originator: YES

We'll look into the issue of temporarily giving someone access, but it is
somewhat problematic.

Some answers until then:

* Same behaviour on both 32-bit and 64-bit
* Compiled with xlc v8.0
* I will try and set gcc up
* Same behaviour with or without optimization, debug flags, etc (just
slightly different backtraces, less verbose without debug symbols)
* Same thing --with-pydebug or without, just slightly different output
* I'll look, but nothing comes to mind right now

--

Comment By: Neal Norwitz (nnorwitz)
Date: 2007-07-03 06:21

Message:
Logged In: YES 
user_id=33168
Originator: NO

No python developer has access to AIX AFAIK.  So you will likely need to
debug this problem yourself or provide access to an AIX box.  Here are some
questions to get you started:

 * Does this problem happen as a 32-bit exe rather than 64-bit?
 * Did you use xlc, gcc, or some other compiler?
 * What happens if you switch compilers?
 * Does this happen if you disable optimization? 
 * What happens if you build a debug version of python (./configure
--with-pydebug)?
 * Do you have any memory debugging tool that you can use to track this
down?

It looks like there is a problem dereferencing a function pointer.  I don't
know why that might happen.

--

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1745108&group_id=5470
___
Python-bugs-list mailing list 
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[ python-Bugs-1746088 ] long.__str__ is quadratic time

2007-07-03 Thread SourceForge.net
Bugs item #1746088, was opened at 2007-07-01 15:42
Message generated for change (Comment added) made by marketdickinson
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1746088&group_id=5470

Category: Python Library
Group: Python 2.5
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: David Benbennick (dbenbenn)
Assigned to: Nobody/Anonymous (nobody)
Summary: long.__str__ is quadratic time

Initial Comment:
In the 2.5.1 source code, Objects/longobject.c:long_format() is used to convert 
a long int to a string.  When base is not a power of 2, it is quadratic time in 
the length of the output string.  Of course, this is fine on small numbers, but 
is catastrophic on huge numbers.

Suppose base is 10.  The problem is that the algorithm basically does the 
following: take the number mod 10 to get a digit, stick that digit on the 
output, divide the number by 10, and repeat.

To print an n digit number, there is an O(n log n) algorithm, using 
divide-and-conquer.  You break the number into 2 pieces, each n/2 digits long, 
and iterate on both pieces.
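
A minimal sketch of the divide-and-conquer conversion described above, in pure Python. The names to_decimal, _digits and CUTOFF are illustrative and are not taken from longobject.c, and the sketch assumes a Python that has int.bit_length(). Whether it actually beats the digit-at-a-time loop depends on how fast the underlying multiplication and division are.

```python
CUTOFF = 9  # hypothetical threshold: at or below this many digits, use plain str()


def _digits(n, width):
    """Return n as exactly `width` decimal digits, zero padded.

    Requires 0 <= n < 10**width.
    """
    if width <= CUTOFF:
        return str(n).zfill(width)
    half = width // 2
    # Split n into a high part and a low part of `half` digits.
    high, low = divmod(n, 10 ** half)
    return _digits(high, width - half) + _digits(low, half)


def to_decimal(n):
    """Decimal string of the integer n, built by divide and conquer."""
    if n < 0:
        return "-" + to_decimal(-n)
    # Upper bound on the digit count, using log10(2) < 0.30103.
    width = n.bit_length() * 30103 // 100000 + 1
    # The bound may overshoot by a digit, so strip the padding zeros.
    return _digits(n, width).lstrip("0") or "0"
```

For example, to_decimal(7 ** 2000) should give the same string as str(7 ** 2000). The sketch recomputes 10 ** half at every level; a real implementation would presumably cache those powers.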

Converting string to long has the same quadratic time problem, in 
PyLong_FromString().  The solution is the same, in reverse: break the string in 
half, convert each piece to a long, and combine the two longs into one.
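
The reverse direction can be sketched the same way. Again this is only an illustration in pure Python: from_decimal and CUTOFF are made-up names, not anything in PyLong_FromString(), and the combining step is only as fast as the multiplication underneath it.

```python
CUTOFF = 9  # hypothetical threshold: short pieces go straight to int()


def from_decimal(s):
    """Parse a decimal string into an integer by divide and conquer."""
    if s.startswith("-"):
        return -from_decimal(s[1:])
    if len(s) <= CUTOFF:
        return int(s)
    half = len(s) // 2
    # The last `half` characters are the low digits; the rest are the high digits.
    high, low = s[:-half], s[-half:]
    # Combine the two halves: high * 10**half + low.
    return from_decimal(high) * 10 ** half + from_decimal(low)
```

For example, from_decimal("9" * 3000) should equal 10 ** 3000 - 1.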


Alternatively, Python could just use GMP (GNU MP Bignum Library, 
http://gmplib.org/) to provide long integers.  That would make other 
operations, such as * and /, more efficient, too.  But it would require a much 
bigger change.

--

Comment By: Mark Dickinson (marketdickinson)
Date: 2007-07-03 14:22

Message:
Logged In: YES 
user_id=703403
Originator: NO

I'd call this a feature request rather than a bug.

If I understand correctly, an O(n^(1+epsilon)) printing algorithm would
rely on having an FFT-based fast multiplication algorithm, together with
some form of divide-and-conquer division---is this right?  These algorithms
are nontrivial to implement efficiently, and even then the crossover point
(where the FFT-based method becomes faster than the quadratic method) is
likely to be in the thousands of digits.  So I can't imagine there's much
demand for this---even a 4096-bit RSA key is only 1233 (or 1234) digits
long.  If you just want subquadratic printing (O(n^1.585) or so) then you'd
still need a subquadratic division (Python already has Karatsuba
multiplication for large integers); here I guess the crossover would be
smaller.  A subquadratic division for Python might well be of interest to
the developers, if someone could be persuaded to write and test one, and
demonstrate a significant positive impact on performance.
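
The two figures in the paragraph above can be checked with a few lines of Python. This is just arithmetic, written here for a current Python interpreter; the variable names are illustrative.

```python
import math

# A 4096-bit integer lies between 2**4095 and 2**4096 - 1.
smallest = 2 ** 4095
largest = 2 ** 4096 - 1
print(len(str(smallest)), len(str(largest)))   # prints: 1233 1234

# Karatsuba multiplication has exponent log2(3).
print(round(math.log(3, 2), 3))                # prints: 1.585
```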

What's your use-case for printing huge integers fast?  It doesn't seem
like a very common need.

Regarding GMP, I believe there are licensing issues:  it's not legal to
include GMP in core Python and release Python under its current non-GPL
license, or something like that---I don't know anything about the details. 
But have you encountered Martelli's gmpy? 

http://code.google.com/p/gmpy/



--




[ python-Bugs-1746088 ] long.__str__ is quadratic time

2007-07-03 Thread SourceForge.net
Bugs item #1746088, was opened at 2007-07-01 11:42
Message generated for change (Comment added) made by dbenbenn
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1746088&group_id=5470

Category: Python Library
Group: Python 2.5
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: David Benbennick (dbenbenn)
Assigned to: Nobody/Anonymous (nobody)
Summary: long.__str__ is quadratic time

>Comment By: David Benbennick (dbenbenn)
Date: 2007-07-03 14:23

Message:
Logged In: YES 
user_id=95581
Originator: YES

> rely on having an FFT-based fast multiplication algorithm, together with
> some form of divide-and-conquer division---is this right?

Yes, that's true: fast str() relies on fast division.  I had assumed
Python already had fast division; if it doesn't, I'd consider that a bug,
too.

> What's your use-case for printing huge integers fast?

Note that it's not a question of printing them *fast*.  With a quadratic
time algorithm, it's infeasible to print huge numbers *at all*.  My
personal use case is doing computations in Thompson's group F; an element
of F is a list of humongous fractions.  But I expect it's a problem that
often comes up in mathematical programming.  I'll admit it isn't a very
important bug, since anyone who is harmed by it will either use a different
language, or use gmpy, or print in hex.  But it's still a bug.

> Regarding GMP, I believe there are licensing issues:  it's not legal to
> include GMP in core Python and release Python under its current non-GPL
> license, or something like that---I don't know anything about the details.

I don't see what the problem would be.  Python's LICENSE file says that
Python's license is GPL compatible.  And in any case, GMP is LGPL, not GPL,
so any program can link to it.

--


[ python-Bugs-1746071 ] class mutex doesn't do anything atomically

2007-07-03 Thread SourceForge.net
Bugs item #1746071, was opened at 2007-07-01 20:19
Message generated for change (Comment added) made by orsenthil
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1746071&group_id=5470

Category: Python Library
Group: Python 2.5
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: David Benbennick (dbenbenn)
Assigned to: Nobody/Anonymous (nobody)
Summary: class mutex doesn't do anything atomically

Initial Comment:
>>> import mutex
>>> print mutex.mutex.testandset.__doc__
Atomic test-and-set -- grab the lock if it is not set,
return True if it succeeded.


The above docstring is wrong: the method is not atomic.  This is easy to see by 
inspecting the method's code:

    def testandset(self):
        """Atomic test-and-set -- grab the lock if it is not set,
        return True if it succeeded."""
        if not self.locked:
            self.locked = 1
            return True
        else:
            return False

Therefore, it is possible for two threads to lock the same mutex 
simultaneously.  So the mutex module cannot be used for mutual exclusion.

The documentation for mutex says "The mutex module defines a class that allows 
mutual-exclusion via acquiring and releasing locks."  
[http://docs.python.org/lib/module-mutex.html].  Perhaps it would be a good 
idea to make the module actually do what the documentation says.

--

Comment By: O.R.Senthil Kumaran (orsenthil)
Date: 2007-07-04 01:06

Message:
Logged In: YES 
user_id=942711
Originator: NO

Thanks David, there is something 'interesting' being observed here.
At a point:
Calling testandset in thread 1, m.locked is False
Calling testandset in thread 0, m.locked is False
Thread 0 locked
Resetting, trying again

Another place:
Calling testandset in thread 1, m.locked is False
Calling testandset in thread 0, m.locked is False
Thread 0 locked
Thread 1 locked
Hah, all these threads locked at the same time: [0, 1]

My doubts are still with threading, but I am unable to derive anything.
Should someone more experienced look into it? Or would you mind taking this
to c.l.p for suggestions?



--

Comment By: David Benbennick (dbenbenn)
Date: 2007-07-02 14:53

Message:
Logged In: YES 
user_id=95581
Originator: YES

> How are you using mutex with threads, can you please provide some
> information.

I'm attaching an example program that demonstrates two threads both
locking the same mutex at the same time.

> If muobj is an instance of mutex class.
> muobj.testandset() for process-a will set the lock.
> muobj.testandset() for process-b will be dealt with self.lock = True and
> wont be able to set.

That isn't correct.  It is possible for testandset to return True in both
thread-a and thread-b.  What can happen is the following:

1) Thread a calls testandset().  It executes the line "if not
self.locked", and finds the result to be True.
2) The OS switches threads.
3) Thread b calls testandset().  It executes the line "if not
self.locked", and finds the result to be True.
4) Thread b sets "self.locked = 1" and returns True
5) Thread a sets "self.locked = 1" and returns True
File Added: ex.py
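
The attached ex.py is not reproduced in this message. As a separate, deterministic illustration of steps 1) to 5), the sketch below splits testandset() at the point where a thread switch can occur and replays the interleaving by hand with generators instead of real threads. The Mutex class and testandset_steps are stand-ins written for this example, not code from the mutex module.

```python
class Mutex(object):
    """Stand-in for mutex.mutex, reduced to the `locked` flag."""

    def __init__(self):
        self.locked = 0


def testandset_steps(m):
    """testandset() split at the point where a thread switch can occur."""
    was_free = not m.locked   # the "if not self.locked" test
    yield None                # the scheduler may switch threads here
    if was_free:
        m.locked = 1
    yield was_free            # what testandset() would return


m = Mutex()
thread_a = testandset_steps(m)
thread_b = testandset_steps(m)

next(thread_a)                # 1) a tests the flag and finds it free
next(thread_b)                # 3) b tests the flag and finds it free
result_b = next(thread_b)     # 4) b sets the flag and gets True
result_a = next(thread_a)     # 5) a sets the flag and gets True
print(result_a, result_b)     # prints: True True
```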

--

Comment By: O.R.Senthil Kumaran (orsenthil)
Date: 2007-07-02 08:40

Message:
Logged In: YES 
user_id=942711
Originator: NO

Hi David,
I just fired up the docs and found this:
"The mutex module defines a class that allows mutual-exclusion via
acquiring and releasing locks. It does not require (or imply) threading or
multi-tasking, though it could be useful for those purposes."

The docs don't say anything about threads using a mutex object; they only
say that if you want to use threading, you can use a mutex object.

How are you using mutex with threads? Can you please provide some
information?

If muobj is an instance of the mutex class:
muobj.testandset() for process-a will set the lock.
muobj.testandset() for process-b will find self.locked already True and
won't be able to set it.

--




[ python-Bugs-1746071 ] class mutex doesn't do anything atomically

2007-07-03 Thread SourceForge.net
Bugs item #1746071, was opened at 2007-07-01 10:49
Message generated for change (Comment added) made by dbenbenn
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1746071&group_id=5470

Category: Python Library
Group: Python 2.5
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: David Benbennick (dbenbenn)
Assigned to: Nobody/Anonymous (nobody)
Summary: class mutex doesn't do anything atomically

>Comment By: David Benbennick (dbenbenn)
Date: 2007-07-03 17:26

Message:
Logged In: YES 
user_id=95581
Originator: YES

I've attached a patch to mutex.py that fixes the bug by acquiring a lock
in testandset() and unlock().  After you apply the patch, the previous
attachment will run forever.
File Added: patch.txt
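
The contents of patch.txt are not shown in this message. The following is only a sketch of the general approach it describes (taking a lock inside testandset() and unlock()), assuming threading.Lock and the with statement are available. LockedMutex and _guard are hypothetical names, and the queue handling of the real mutex class is left out.

```python
import threading


class LockedMutex(object):
    """Sketch of a mutex whose test-and-set is guarded by a real lock."""

    def __init__(self):
        self.locked = False
        self._guard = threading.Lock()   # hypothetical helper attribute

    def testandset(self):
        # The test and the set happen while holding _guard, so no other
        # thread can run between them.
        with self._guard:
            if not self.locked:
                self.locked = True
                return True
            return False

    def unlock(self):
        # The real mutex.unlock() also services a queue of waiting
        # callbacks; that part is omitted here.
        with self._guard:
            self.locked = False
```

With this in place, however many threads call testandset() on the same unlocked object, only one of them should get True.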

--


[ python-Bugs-1706815 ] socket.error exceptions not subclass of StandardError

2007-07-03 Thread SourceForge.net
Bugs item #1706815, was opened at 2007-04-24 11:09
Message generated for change (Comment added) made by greg
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1706815&group_id=5470

Category: Python Library
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: John Nagle (nagle)
Assigned to: Gregory P. Smith (greg)
Summary: socket.error exceptions not subclass of StandardError

Initial Comment:
The "socket.error" exception is a subclass of Exception, but not of 
StandardError.  It needs to be placed properly in the exception hierarchy, 
presumably somewhere under IOError.

Socket errors have some known problems.  See also:

[ 805194 ] Inappropriate error received using socket timeout
[ 1019808 ] wrong socket error returned
[ 1571878 ] Improvements to socket module exceptions
[ 708927 ] socket timeouts produce wrong errors in win32 

Just figuring out what exceptions can be raised from the socket module is 
tough.  I've seen exceptions derived from "socket.error", exceptions from 
IOError, and exceptions from the SSL layer, which patches the
sockets module when loaded.  These are non-bug exceptions; that is, the problem 
is out in the network, external to the program.

Some are retryable, some indicate that a different approach (different port, 
different protocol) should be tried, and some mean that some named resource 
doesn't exist.  Programs need to make those distinctions reliably.

The most important distinction with sockets is "external network problem" vs. 
"local program problem".  To resolve this, I suggest a "NetworkError" in 
the exception hierarchy, with all the things that can go wrong due to 
conditions external to the local machine under that exception.

I'd suggest the following:

1.  Add "NetworkError" under "IOError" in the exception hierarchy.

2.  Put the existing "socket.error" under "NetworkError". Since "socket.error" 
needs to be reparented anyway (it's currently a direct descendant of 
"Exception") this provides a good place for it.

3.  Find any places where the socket module can raise IOError or OSError due to 
an external network condition, and make them raise something under NetworkError 
instead.  Code that catches IOError will still work.

4.  Put all errors in the various SSL modules (SSLError, etc.) which can be 
raised due to external network conditions under "NetworkError"

5.  Move "urllib2.URLError", which is currently under IOError, down a level 
under NetworkError.

6.  Move the misc. errors from "urllib", like "ContentTooShortError", which are 
currently under IOError, down a level under NetworkError.

7.  URL translation errors from the IDNA (Unicode URL encoding) module probably 
should raise an error similar to that for an incorrect URL, rather than raising 
a UnicodeError.  

Then, programs that catch NetworkError could be sure of catching all network 
trouble conditions, but not local code bugs. 

With these changes, any exception that's being caught now will still be caught.

I'd suggest doing 1) above immediately, since that's a clear bug, but the 
others need to be discussed.
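
A rough sketch of how the proposed hierarchy would look from the caller's side. None of this exists in the standard library: NetworkError is the proposed class, SocketError and URLError are stand-ins for the reparented socket.error and urllib2.URLError, and classify() and fetch() are made up for the example.

```python
class NetworkError(IOError):
    """Proposed base class for failures caused by external network conditions."""


class SocketError(NetworkError):
    """Hypothetical stand-in for a reparented socket.error."""


class URLError(NetworkError):
    """Hypothetical stand-in for urllib2.URLError moved under NetworkError."""


def classify(exc):
    """Return 'network' for external trouble, 'local' for anything else."""
    if isinstance(exc, NetworkError):
        return "network"
    return "local"


def fetch():
    # Pretend the remote host refused the connection.
    raise SocketError("connection refused")


try:
    fetch()
except IOError as exc:        # existing handlers for IOError keep working
    caught = classify(exc)
```

Here caught ends up as "network", while something like classify(ValueError("bug")) gives "local".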

 

--

>Comment By: Gregory P. Smith (greg)
Date: 2007-07-03 20:25

Message:
Logged In: YES 
user_id=413
Originator: NO

Here's a patch against 2.6a0 svn HEAD implementing items 1 through 6
(attached).  I didn't look deeply to see whether 4 is really implemented,
but the SSL errors were derived from socket.error, so if they all use that
as their base then they should be covered.

test suites pass.
File Added: python-bug-1706815.diff

--

Comment By: Gregory P. Smith (greg)
Date: 2007-07-02 16:39

Message:
Logged In: YES 
user_id=413
Originator: NO

Agreed! The above suggestions sound good.

For number (3), if there are any places that raise OSError, that could lead
to code upgrade headaches, as the new NetworkError would not be a subclass
of OSError.  IMHO that's fine, but others may disagree.

I am looking at implementing the immediate (1) and (2) as a starting
point.

--

Comment By: John Nagle (nagle)
Date: 2007-04-24 11:12

Message:
Logged In: YES 
user_id=5571
Originator: YES

(See also PEP 352).

--




[ python-Bugs-1744752 ] Newline skipped in "for line in file"

2007-07-03 Thread SourceForge.net
Bugs item #1744752, was opened at 2007-06-28 04:23
Message generated for change (Comment added) made by nnorwitz
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1744752&group_id=5470

Category: Python Library
Group: Python 2.5
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Rune Devik (runedevik)
Assigned to: Nobody/Anonymous (nobody)
Summary: Newline skipped in "for line in file"

Initial Comment:
Creating new ticket for the bug described here since it was closed (and I was 
not able to reopen it): 
http://sourceforge.net/tracker/index.php?func=detail&aid=1636950&group_id=5470&atid=105470

The problem is that when you open a huge file on Windows in "r" mode, it 
will sometimes merge two lines. As I said in the ticket above (where it was 
probably overlooked, since I updated a closed ticket):

Hi

I have the same problem with a huge file (8GB) containing long lines. Sometimes 
two lines are merged into one, and when I rerun the test script that reads the 
file it is always the same lines that are merged. The merging also seems to 
happen more frequently towards the end of the file. I tried to reproduce it 
with a smaller data set (the 10 lines before the two lines that get merged, the 
two lines that get merged, and the 10 lines after them), but I was not able to. 
However, if you open the huge file in "rb" mode instead of "r" mode, everything 
works as it should and no lines are merged at all! If I copy the file over to 
Linux and rerun the test script, no lines are merged (regardless of whether the 
mode is "r" or "rb"), so this is Windows specific and might have something to do 
with the adding of \r\n if only \n is found when you open the file in "r" mode, 
maybe? I have reproduced it on both Python 2.3.5 and 2.5c1, on both Windows XP 
and Windows 2003.

More stats on the input file in both "r" mode and "rb" mode below:

Input file size: 8 695 828 KB

fp = open(file, "r"):
  - total number of lines read:  668909
  - length of the longest line:  13179792
  - length of the shortest line: 89
  - 56 lines contain the content of two lines
  - Always just two lines that are merged into one! 
  - Always the same lines that are merged rerunning the test on the same file. 

open(file, "rb"):
  - total number of lines read:  668965
  - length of the longest line:  13179793
  - length of the shortest line: 90
  - no lines merged

Regards,
Rune Devik
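
The test script itself is not included in the report. A sketch of how statistics like the ones above could be gathered is shown below; line_stats and compare_modes are made-up names, and the code is written for a current Python rather than 2.3 or 2.5. A one-character difference in line length between the two modes is what newline translation alone would produce, so the differing line counts are the part that needs explaining.

```python
def line_stats(path, mode):
    """Return (line count, longest line length, shortest line length)."""
    count = 0
    longest = 0
    shortest = None
    with open(path, mode) as fp:
        for line in fp:
            count += 1
            longest = max(longest, len(line))
            shortest = len(line) if shortest is None else min(shortest, len(line))
    return count, longest, shortest


def compare_modes(path):
    """Collect the statistics for text mode ("r") and binary mode ("rb")."""
    return {"r": line_stats(path, "r"), "rb": line_stats(path, "rb")}
```

Running compare_modes() on progressively smaller slices of the input is one possible way to look for a self-contained file that still shows different line counts.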

--

>Comment By: Neal Norwitz (nnorwitz)
Date: 2007-07-03 22:16

Message:
Logged In: YES 
user_id=33168
Originator: NO

Without a reproducible test case, there's really nothing we can do.  You
will need to debug this on your own.  Try setting a breakpoint in the
debugger in the file object, probably in get_line().  If you can make a
self contained test case, then we can help.

--
