[issue1141] reading large files

2007-09-10 Thread christen

New submission from christen:

On September 11, 2007 I downloaded Python 3k.

The good news:
Under Windows, Python 3k properly reads files larger than 4 GB (in
contrast to Python 2.5, which skips some lines; see below).

The bad news: Python 3k is very slow compared to Python 2.5; see the results
below. The code follows. It reads a 4.9 GB file of 81,017,719 lines (a GenBank
entry of bacterial sequences).

###
import time
print (time.localtime())
fichin=open(r'D:\pythons\16s\total_gb_161_16S.gb')
t0= time.localtime()
print (t0)
i=0

# count the lines, printing a timestamp at regular intervals
for li in fichin:
    i+=1
    if i%100==0:
        print (i,time.localtime())

fichin.close()
print ()
print (i)
print (time.localtime())
#


I got the following results (Windows XP 64) on the same machine, using
either py 3k or py 2.5
As soon as my BSD and Linux machines are done with calculations, I will
try that on them.
Best
Richard Christen


python 3k

(2007, 9, 10, 13, 53, 36, 0, 253, 1)
(2007, 9, 10, 13, 53, 36, 0, 253, 1)
100 (2007, 9, 10, 13, 53, 49, 0, 253, 1)
200 (2007, 9, 10, 13, 54, 3, 0, 253, 1)
300 (2007, 9, 10, 13, 54, 18, 0, 253, 1)
400 (2007, 9, 10, 13, 54, 32, 0, 253, 1)
500 (2007, 9, 10, 13, 54, 47, 0, 253, 1)

7700 (2007, 9, 10, 14, 14, 55, 0, 253, 1)
7800 (2007, 9, 10, 14, 15, 9, 0, 253, 1)
7900 (2007, 9, 10, 14, 15, 22, 0, 253, 1)
8000 (2007, 9, 10, 14, 15, 36, 0, 253, 1)
8100 (2007, 9, 10, 14, 15, 49, 0, 253, 1)

81017719#this is the proper number of lines 
(2007, 9, 10, 14, 15, 50, 0, 253, 1)


Python 2.5

(2007, 9, 10, 14, 18, 33, 0, 253, 1)
(2007, 9, 10, 14, 18, 33, 0, 253, 1)
(100, (2007, 9, 10, 14, 18, 34, 0, 253, 1))
(200, (2007, 9, 10, 14, 18, 34, 0, 253, 1))
(300, (2007, 9, 10, 14, 18, 35, 0, 253, 1))
(400, (2007, 9, 10, 14, 18, 35, 0, 253, 1))
(500, (2007, 9, 10, 14, 18, 36, 0, 253, 1))
...
(7700, (2007, 9, 10, 14, 19, 10, 0, 253, 1))
(7800, (2007, 9, 10, 14, 19, 11, 0, 253, 1))
(7900, (2007, 9, 10, 14, 19, 11, 0, 253, 1))
(8000, (2007, 9, 10, 14, 19, 12, 0, 253, 1))
(8100, (2007, 9, 10, 14, 19, 12, 0, 253, 1))
()
81014962  #python 2.5 missed some lines 
(2007, 9, 10, 14, 19, 12, 0, 253, 1)

--
components: Tests
messages: 55777
nosy: [EMAIL PROTECTED]
severity: normal
status: open
title: reading large files
type: behavior
versions: Python 3.0

__
Tracker <[EMAIL PROTECTED]>
<http://bugs.python.org/issue1141>
__
___
Python-bugs-list mailing list 
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue1141] reading large files

2007-09-10 Thread christen

christen added the comment:

Hi Martin

I could certainly do that, but how would you get my huge files? 5 GB of data
is quite big...

> If you want to compute runtimes, it is better to not convert them to
> local time. Instead, use the pattern
>
> start = time.time()
> ...
>   print time.time()-start # seconds since the program started
>   

OK I'll do that next time

Richard
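
A minimal sketch of the timing pattern quoted above, assuming Python 3.3 or later. Here `time.perf_counter()` is swapped in for `time.time()` because it is a monotonic clock intended for measuring intervals; the helper name `time_line_iteration` and the in-memory sample lines are hypothetical stand-ins for the real file.

```python
import time


def time_line_iteration(lines, report_every=500):
    """Iterate over lines and return (line_count, elapsed_seconds)."""
    start = time.perf_counter()
    count = 0
    for count, _line in enumerate(lines, start=1):
        if count % report_every == 0:
            # seconds elapsed since iteration started
            print(count, time.perf_counter() - start)
    return count, time.perf_counter() - start


# Stand-in for an open file object: any iterable of lines works.
count, elapsed = time_line_iteration(["a\n"] * 1200)
print("total lines read", count)
```

Passing an open file object instead of the list gives elapsed seconds directly, with no conversion to local time.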

__
Tracker <[EMAIL PROTECTED]>
<http://bugs.python.org/issue1141>
__
begin:vcard
fn:Richard Christen
n:Christen;Richard
org;quoted-printable:CNRS UMR 6543  & Universit=C3=A9 de Nice;Laboratoire de Biologie Virtuelle
adr:Parc Valrose;;Centre de Biochimie;Nice;;06108;France
email;internet:[EMAIL PROTECTED]
title;quoted-printable:Champion de saut en =C3=A9paisseur
tel;work:33- 492 076 947
url:http://bioinfo.unice.fr
version:2.1
end:vcard




[issue1141] reading large files

2007-09-10 Thread christen

christen added the comment:

Hi Stefan

Calculations are underway.
Neither reading nor writing works well with py3k.

You can try the code below on your own machine, using one of:
fichout.write(str(i)+' '*59+'\n')  # generates a big file (>4 GB)
fichout.write(str(i)+'\n')   # generates a file <4 GB

The big file is not read properly with Python 2.5 (the small one is).
The big file is slow to write and to read with Python 3k.

I will send you the results as soon as the run is done under 3k (very, very
slow indeed).

best
r

import sys
print(sys.version_info)
import time
print (time.strftime('%Y-%m-%d %H:%M:%S'))
liste=[]
start = time.time()
# write the test file
fichout=open('test.txt','w')
for i in xrange(85014961):
    if i%500==0 and i>0:
        print (i,time.time()-start)
    fichout.write(str(i)+' '*59+'\n')
fichout.close()
print ('total lines written ',i)
print (i,time.time()-start)
print ('*'*50)
# read the file back and count the lines
fichin=open('test.txt')
start3 = time.time()
for i,li in enumerate(fichin):
    if i%500==0 and i>0:
        print (i,time.time()-start3)
fichin.close()
print ('total lines read ',i)
print(time.time()-start)

__
Tracker <[EMAIL PROTECTED]>
<http://bugs.python.org/issue1141>
__




[issue1142] code sample showing errors reading large files with py 2.5

2007-09-10 Thread christen

New submission from christen:

Error in reading >4Go files under windows

try this:

import sys
print(sys.version_info)
import time
print (time.strftime('%Y-%m-%d %H:%M:%S'))
liste=[]
start = time.time()
# write the test file
fichout=open('test.txt','w')
for i in xrange(85014961):
    if i%500==0 and i>0:
        print (i,time.time()-start)
    fichout.write(str(i)+' '*59+'\n')
fichout.close()
print ('total lines written ',i)
print (i,time.time()-start)
print ('*'*50)
# read the file back and count the lines
fichin=open('test.txt')
start3 = time.time()
for i,li in enumerate(fichin):
    if i%500==0 and i>0:
        print (i,time.time()-start3)
fichin.close()
print ('total lines read ',i)
print(time.time()-start)

It generates a >4 GB file, and not all lines are read!
Example:
('total lines written ', 85014960)
('total lines read ', 85014950)
10 lines are missing.

if you replace by
fichout.write(str(i)+' '*59+'\n')

the file is now under 4 GB and is read properly.
I used both 32-bit and 64-bit Windows XP machines.

It seems to work with Linux and BSD (I did not try this example, but had no
problem with my home-made big files).
Problem: there are many examples of >4 GB files for the human genome and other
biological applications. I am almost sure that people are making mistakes,
because it took me a while to discover this...
Note: this does not happen with py 3k :-)
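
One way to cross-check a text-mode line count is to count newline bytes in binary mode, where no newline translation takes place. The following is a minimal sketch for Python 3; the helper names `count_lines_binary` and `count_lines_text` are hypothetical, and a small temporary file stands in for the >4 GB test file.

```python
import os
import tempfile


def count_lines_binary(path, chunk_size=1 << 20):
    """Count newline bytes by reading fixed-size chunks in binary mode."""
    total = 0
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                break
            total += chunk.count(b"\n")
    return total


def count_lines_text(path):
    """Count lines by iterating over the file in text mode."""
    with open(path) as handle:
        return sum(1 for _ in handle)


# Small demo file using the same line layout as the test script above.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    for i in range(1000):
        tmp.write(str(i) + " " * 59 + "\n")
    demo_path = tmp.name

binary_count = count_lines_binary(demo_path)
text_count = count_lines_text(demo_path)
os.remove(demo_path)
print(binary_count, text_count)
```

If the two counts differ for a file whose last line ends with a newline, lines are being lost or merged in text mode.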

--
components: Windows
messages: 55785
nosy: [EMAIL PROTECTED]
severity: urgent
status: open
title: code sample showing errors reading large files with py 2.5
type: behavior
versions: Python 2.5

__
Tracker <[EMAIL PROTECTED]>
<http://bugs.python.org/issue1142>
__



[issue1142] code sample showing errors reading large files with py 2.5

2007-09-10 Thread christen

christen added the comment:

I made a copy-paste error:

if you replace by
fichout.write(str(i)+' '*59+'\n')

should be 
if you replace by
fichout.write(str(i)+'\n')
of course :-(
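
The effect of the two write calls on file size can be estimated without writing anything. This is a rough sketch, assuming that Windows text mode stores each '\n' as the two bytes '\r\n'; the helper names `total_digits` and `file_size` are hypothetical.

```python
def total_digits(n):
    """Total number of characters in str(i) for all i in range(n)."""
    total = 0
    width = 1
    low = 0  # smallest value written with `width` digits (0 has one digit)
    while low < n:
        high = 10 ** width  # first value that needs width + 1 digits
        total += (min(n, high) - low) * width
        low = high
        width += 1
    return total


def file_size(n_lines, padding, eol_bytes):
    """Bytes on disk for n_lines lines of str(i) + padding spaces + line ending."""
    return total_digits(n_lines) + n_lines * (padding + eol_bytes)


N_LINES = 85014961
FOUR_GIB = 2 ** 32

big = file_size(N_LINES, padding=59, eol_bytes=2)    # str(i)+' '*59+'\n'
small = file_size(N_LINES, padding=0, eol_bytes=2)   # str(i)+'\n'
print("big file bytes:  ", big, "over 4 GiB:", big > FOUR_GIB)
print("small file bytes:", small, "over 4 GiB:", small > FOUR_GIB)
```

Under that assumption the padded variant lands above the 2**32-byte mark and the unpadded one stays well below it, with the same number of lines in both.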

__
Tracker <[EMAIL PROTECTED]>
<http://bugs.python.org/issue1142>
__



[issue1142] code sample showing errors reading large files with py 2.5/3.0

2007-09-10 Thread christen

christen added the comment:

Hi Guido

It is not the end of the file that is not read (see also below).

I found out about this about a year ago, when I was parsing very large
files resulting from "blast" on the human genome.
My parser choked after 4 GB, well before the end of the file: one line
was missing and my acc=li[x:y] ended up with an error, because acc was
never filled...
This was rather strange, because it had never happened before on my
Linux box.

I opened the file (which I had created myself) with an editor that could
show hex codes: the proper line was there and all right.
If I remember correctly, I modified my code to get a better look at what was
going on: in fact the missing line had been concatenated to the previous line,
despite the proper end of line being present (the hex code was OK). See
also below.

I forgot about it because nobody replied to my mails, and I thought it
was possibly related to 32-bit Windows. I recently moved to 64-bit Windows
(Windows has the best driver for SQL databases) and forgot about the bug
until I ran into it again. I then decided to try Python 3k: it reads
>4 GB files with no trouble but is very slow, both in reading and
writing files.
The following code produces either a <4 GB or a >4 GB file, depending on
which fichout.write is commented out.
Both files have the same number of lines, but the >4 GB one is not read
completely under Windows (32 or 64).
I have no such problem on Linux or BSD (Mac).

Python 3k on Windows reads both files OK, but is very, very slow (change
xrange to range; I guess it is presumptuous of me to advise you about that :-).

best
Richard

import sys
print(sys.version_info)
import time
print (time.strftime('%Y-%m-%d %H:%M:%S'))
liste=[]
start = time.time()
# write the test file
fichout=open('test.txt','w')
for i in xrange(85014961):
    if i%500==0 and i>0:
        print (i,time.time()-start)
    fichout.write(str(i)+' '*59+'\n')  #big file
    #fichout.write(str(i)+'\n')#small file, same number of lines

fichout.flush()
fichout.close()
print ('total lines written ',i)
print (i,time.time()-start)
print ('*'*50)
# read the file back and count the lines
fichin=open('test.txt')
start3 = time.time()
for i,li in enumerate(fichin):
    if i%500==0 and i>0:
        print (i,time.time()-start3)
fichin.close()
print ('total lines read ',i)
print(time.time()-start)

> Richard, can you somehow view the end of the file to see what its last
> lines actually are?  It should end like this:
>
> 85014951
> 85014952
> 85014953
> 85014954
> 85014955
> 85014956
> 85014957
> 85014958
> 85014959
> 85014960
>
>   

using a text editor reads:
85014944  
85014945  
85014946  
85014947  
85014948  
85014949  
85014950  
85014951  
85014952  
85014953  
85014954  
85014955  
85014956  
85014957  
85014958  
85014959  
85014960  

Windows Python 2.5, with
    if i>85014940:
        print i, li.strip()

prints:
(2, 5, 0, 'final', 0)
2007-09-11 07:58:47
(500, 2.6720001697540283)
(1000, 5.375)
(1500, 8.032648498535)
(2000, 10.70368664551)
(2500, 13.375)
(3000, 16.047000169754028)
(3500, 18.70368664551)
(4000, 21.36133514404)
(4500, 24.03264849854)
(5000, 26.68763760376)
(5500, 29.36133514404)
(6000, 32.03264849854)
(6500, 34.70368664551)
(7000, 37.40764849854)
(7500, 40.094000101089478)
(8000, 42.797000169754028)
(8500, 45.485000133514404)
85014941 85014951  
85014942 85014952  
85014943 85014953  
85014944 85014954  
85014945 85014955  
85014946 85014956  

[issue1142] code sample showing errors reading large files with py 2.5/3.0

2007-09-11 Thread christen

christen added the comment:

The bug is still there, but the problem is solved: simply use open('file', 'U').
See the outputs:

fichin=open('test.txt','U')
===>
(2, 5, 0, 'final', 0)
2007-09-12 08:00:43
(500, 9.31236239624)
(1000, 22.31236239624)
(1500, 35.094000101089478)
(2000, 47.81236239624)
(2500, 60.56236239624)
(3000, 73.265000104904175)
(3500, 85.95368664551)
(4000, 98.672000169754028)
(4500, 111.35900020599365)
(5000, 123.98400020599365)
(5500, 136.625)
(6000, 149.26500010490417)
(6500, 161.9060001373291)
(7000, 174.625)
(7500, 187.29700016975403)
(8000, 199.8910490417)
(8500, 212.5310001373291)
('total lines read ', 85014960)
212.56236

now with
fichin=open('test.txt')
or
fichin=open('test.txt','r')
===>

(2, 5, 0, 'final', 0)
2007-09-12 08:04:48
(500, 3.18763760376)
(1000, 6.3440001010894775)
(1500, 9.4690001010894775)
(2000, 12.594000101089478)
(2500, 15.719000101089478)
(3000, 18.844000101089478)
(3500, 21.969000101089478)
(4000, 25.094000101089478)
(4500, 28.219000101089478)
(5000, 31.344000101089478)
(5500, 34.469000101089478)
(6000, 37.594000101089478)
* 62410138   
62410139 *
* 62414887   
62414888 *
* 62415540   
62415541 *
* 62420289   
62420290 *
* 62420942   
62420943 *
* 62421595   
62421596 *
* 62422248   
62422249 *
* 62422901   
62422902 *
* 62427650   
62427651 *
* 62428303   
62428304 *
(6500, 40.75)
(7000, 43.95368664551)
(7500, 47.125)
(8000, 50.32868664551)
(8500, 53.51632424927)
('total lines read ', 85014950)
53.516324

best
Richard
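
For readers on Python 3: the 'U' mode flag used above was removed in Python 3.11, and universal newline handling is instead controlled by the `newline` argument of `open()`. The following is a small sketch of that behaviour on a demo file with mixed line endings; it does not reproduce the >4 GB problem itself.

```python
import os
import tempfile

# Write mixed line endings in binary mode so that nothing is translated.
with tempfile.NamedTemporaryFile("wb", suffix=".txt", delete=False) as tmp:
    tmp.write(b"1\r\n2\n3\r4\r\n")
    demo_path = tmp.name

# newline=None (the default): '\r\n', '\r' and '\n' all end a line
# and are returned to the caller as '\n'.
with open(demo_path, "r", newline=None) as handle:
    translated = handle.readlines()

# newline='': lines are still split on every ending,
# but the original ending characters are kept.
with open(demo_path, "r", newline="") as handle:
    untranslated = handle.readlines()

os.remove(demo_path)
print(translated)
print(untranslated)
```

Both calls yield four lines; only the line-ending characters differ.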

__
Tracker <[EMAIL PROTECTED]>
<http://bugs.python.org/issue1142>
__




[issue1451466] reading very large files

2010-05-12 Thread christen

christen  added the comment:

I have no idea, because
- I am using 2.5 (Windows) or 2.6 (2.5 because of old code that I
compiled for compatibility with 2.5, not 2.6)
- I am using open(file, 'U'), which solved the problem under Windows, and
the problem does not exist on Linux
best
Richard

Terry J. Reedy wrote:
> Terry J. Reedy  added the comment:
>
> Is this still an issue for 2.7?
>
> --
> nosy: +tjreedy
>
> ___
> Python tracker 
> <http://bugs.python.org/issue1451466>
> ___
>
>
>

--
nosy: +richard.chris...@unice.fr

___
Python tracker 
<http://bugs.python.org/issue1451466>
___



[issue24537] Py_Initialize unable to load the file system codec

2015-06-30 Thread Dana Christen

New submission from Dana Christen:

I'm using the C API to embed the Python interpreter (see the attached example). 
Everything works fine until I try to run the resulting executable on a machine 
without a Python installation. In that case, the call to Py_Initialize fails 
with the following message:

Fatal Python error: Py_Initialize: unable to load the file system codec
ImportError: No module named 'encodings'

This was on Windows 7 64 bit, and the program was compiled using MS Visual 
Studio 2010 in x64 Release mode, using the official Python 3.4.3 64 bit release 
(v3.4.3:9b73f1c3e601).
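
The report itself contains no diagnosis. A common cause of this message, stated here as an assumption, is that the embedded interpreter cannot find the standard library (in particular the `encodings` package) on the target machine. On a machine where Python does work, a short script such as the following sketch shows where that package is loaded from, which indicates what would have to be shipped with the executable or made reachable, for example through the `PYTHONHOME` environment variable.

```python
import encodings
import os
import sys

# Directory (or zip archive entry) that holds the 'encodings' package.
package_dir = os.path.dirname(encodings.__file__)
# Its parent is the standard library location the interpreter used.
stdlib_dir = os.path.dirname(package_dir)

print("sys.prefix:         ", sys.prefix)
print("encodings package:  ", package_dir)
print("standard library at:", stdlib_dir)
print("PYTHONHOME:         ", os.environ.get("PYTHONHOME", "<not set>"))
```

If the location printed for the standard library does not exist on the target machine, the embedded interpreter has nothing to import `encodings` from.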

--
components: Extension Modules
files: python_api_hello.c
messages: 245984
nosy: Dana Christen
priority: normal
severity: normal
status: open
title: Py_Initialize unable to load the file system codec
type: crash
versions: Python 3.4
Added file: http://bugs.python.org/file39838/python_api_hello.c

___
Python tracker 
<http://bugs.python.org/issue24537>
___