Re: [Tutor] f.readlines(size)

2017-06-06 Thread Peter Otten
Nancy Pham-Nguyen wrote:

> Hi,

Hi Nancy, the only justification for the readlines() method is to serve as a 
trap to trick newbies into writing scripts that consume more memory than 
necessary. While the size argument offers a way around that, there are still 
next to no use cases for readlines.

Iterating over a file directly is a very common operation, and a lot of work 
has gone into making it efficient. Use it whenever possible.

To read groups of lines consider

import itertools

# last chunk may be shorter
with open(FILENAME) as f:
    while True:
        chunk = list(itertools.islice(f, 3))
        if not chunk:
            break
        process_lines(chunk)

or 

# last chunk may be filled with None values
with open(FILENAME) as f:
    for chunk in itertools.zip_longest(f, f, f): # Py2: izip_longest
        process_lines(chunk)

In both cases you will get chunks of three lines, the only difference being 
the handling of the last chunk.
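
As a rough sketch, the two recipes above could be wrapped into reusable 
generators. The names chunks() and padded_chunks() are made up for this 
illustration (they are not itertools functions), the code assumes Python 3, 
and a plain list stands in for the open file:

```python
import itertools

def chunks(iterable, size):
    """Yield lists of up to `size` items; the last list may be shorter."""
    iterator = iter(iterable)
    while True:
        chunk = list(itertools.islice(iterator, size))
        if not chunk:
            return
        yield chunk

def padded_chunks(iterable, size, fillvalue=None):
    """Yield tuples of exactly `size` items; the last one is padded."""
    iterator = iter(iterable)
    # The same iterator is passed `size` times, so each tuple consumes
    # `size` consecutive items.
    return itertools.zip_longest(*[iterator] * size, fillvalue=fillvalue)

lines = ["a\n", "b\n", "c\n", "d\n"]   # stand-in for an open file
short_last = list(chunks(lines, 3))
padded_last = list(padded_chunks(lines, 3))
```

Passing an open file instead of the list should work the same way, because 
a file object is an iterator over its own lines.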

> I'm trying to understand the optional size argument in the file.readlines
> method. The help(file) shows:
>
>  |  readlines(...)
>  |      readlines([size]) -> list of strings, each a line from the file.
>  |
>  |      Call readline() repeatedly and return a list of the lines so read.
>  |      The optional size argument, if given, is an approximate bound on the
>  |      total number of bytes in the lines returned.
>
> From the documentation:
>
> f.readlines() returns a list containing all the lines of data in the file.
> If given an optional parameter sizehint, it reads that many bytes from the
> file and enough more to complete a line, and returns the lines from that.
> This is often used to allow efficient reading of a large file by lines, but
> without having to load the entire file in memory. Only complete lines will
> be returned.
>
> I wrote the function below to try it, thinking that it would print multiple
> times, 3 lines at a time, but it printed all in one shot, just like when I
> didn't specify the optional argument. Could someone explain what I've
> missed? See input file and output below.
>
> Thanks,
> Nancy

> def readLinesWithSize():
> # bufsize = 65536
> bufsize = 45  
> with open('input.txt') as f: while True:
> # print len(f.readlines(bufsize))   # this will print 33   
> print 
> lines = f.readlines(bufsize) print lines
> if not lines: break for line in lines:
> pass  readLinesWithSize() Output:

This seems to be messed up a little by a "helpful" email client. Therefore 
I'll give my own:

$ cat readlines_demo.py
LINESIZE=32
with open("tmp.txt", "w") as f:
    for i in range(30):
        f.write("{:02} {}\n".format(i, "x"*(LINESIZE-4)))

BUFSIZE = LINESIZE*3-1
print("bufsize", BUFSIZE)

with open("tmp.txt", "r") as f:
    while True:
        chunk = f.readlines(BUFSIZE)
        if not chunk:
            break
        print(sum(map(len, chunk)), "bytes:", chunk)
$ python3 readlines_demo.py
bufsize 95
96 bytes: ['00 \n', '01 \n', '02 \n']
96 bytes: ['03 \n', '04 \n', '05 \n']
96 bytes: ['06 \n', '07 \n', '08 \n']
...

So in Python 3 this does what you expect: readlines() stops collecting more 
lines once the total number of bytes exceeds the number specified.

"""
readlines(...) method of _io.TextIOWrapper instance
Return a list of lines from the stream.

hint can be specified to control the number of lines read: no more
lines will be read if the total size (in bytes/characters) of all
lines so far exceeds hint.
"""

In Python 2 the docstring is a little vague

"""
The optional size argument, if given, is an *approximate* *bound* on the
total number of bytes in the lines returned.
"""

(emphasis mine) and it seems that small size values which defeat the goal of 
making the operation efficient are ignored:

$ python readlines_demo.py
('bufsize', 95)
(960, 'bytes:', ['00 \n', '01 \n', ..., '28 \n', '29 \n'])

Playing around a bit on my system the minimum value with an effect seems to 
be about 2**13, but I haven't consulted the readlines source code to verify.

___
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
https://mail.python.org/mailman/listinfo/tutor


Re: [Tutor] Huge list comprehension

2017-06-06 Thread Peter Otten
syed zaidi wrote:

> 
> hi,
> 
> I would appreciate if you can help me suggesting a quick and efficient
> strategy for comparing multiple lists with one principal list
> 
> I have about 125 lists containing about 100,000 numerical entries in each
> 
> my principal list contains about 6 million entries.
> 
> I want to compare each small list with main list and append yes/no or 0/1
> in each new list corresponding to each of 125 lists
> 
> 
> The program is working but it takes ages to process huge files,
> Can someone pleases tell me how can I make this process fast. Right now it
> takes arounf 2 weeks to complete this task
> 
> 
> the code I have written and is working is as under:
> 
> 
> sample_name = []
> 
> main_op_list,principal_list = [],[]
> dictionary = {}
> 
> with open("C:/Users/INVINCIBLE/Desktop/T2D_ALL_blastout_batch.txt", 'r') as f:
>     reader = csv.reader(f, dialect = 'excel', delimiter='\t')
>     list2 = filter(None, reader)
>     for i in range(len(list2)):
>         col1 = list2[i][0]
>         operon = list2[i][1]
>         main_op_list.append(operon)
>         col1 = col1.strip().split("_")
>         sample_name = col1[0]
>         if dictionary.get(sample_name):
>             dictionary[sample_name].append(operon)
>         else:
>             dictionary[sample_name] = []
>             dictionary[sample_name].append(operon)
> locals().update(dictionary) ## converts dictionary keys to variables

Usually I'd refuse to go beyond the line above.
DO NOT EVER WRITE CODE LIKE THAT.
You have your data in a nice dict -- keep it there where it belongs.

> ##print DLF004
> dict_values = dictionary.values()
> dict_keys = dictionary.keys()
> print dict_keys
> print len(dict_keys)
> main_op_list_np = np.array(main_op_list)
> 
> 
> DLF002_1,DLF004_1,DLF005_1,DLF006_1,DLF007_1,DLF008_1,DLF009_1,DLF010_1,DLF012_1,DLF013_1,DLF014_1,DLM001_1,DLM002_1,DLM003_1,DLM004_1,DLM005_1,DLM006_1,DLM009_1,DLM011_1,DLM012_1,DLM018_1,DOF002_1,DOF003_1 =[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]

This is mind-numbing...

> for i in main_op_list_np:
>     if i in DLF002: DLF002_1.append('1')
>     else:DLF002_1.append('0')
>     if i in DLF004: DLF004_1.append('1')
>     else:DLF004_1.append('0')
>     if i in DLF005: DLF005_1.append('1')
>     else:DLF005_1.append('0')
>     if i in DLF006: DLF006_1.append('1')
>     else:DLF006_1.append('0')

... and this is, too. 

Remember that we are volunteers, and keep your code samples small. Whether 
there are three if-else checks or one hundred -- the logic remains the same.

Give us a small sample script and a small dataset to go with it, use dicts 
instead of dumping everything into the module namespace, explain the 
script's purpose in plain English, and identify the parts that take too long 
-- then I'll take another look.
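
For illustration only, here is a minimal sketch of what "keep it in a dict" 
could look like on a made-up miniature dataset. The operon names, the 
contents of `samples`, and the variable names `sample_sets` and `flags` are 
all hypothetical. Converting each sample's list to a set is an extra 
suggestion that is not in the original script; it assumes the operon values 
are hashable (strings, here) so that each `in` test is a hash lookup 
instead of a scan through a list:

```python
# Hypothetical miniature data set: sample name -> operons found in that sample.
samples = {
    "DLF002": ["op1", "op3"],
    "DLF004": ["op2"],
}
# Hypothetical stand-in for the principal list of operons.
principal_list = ["op1", "op2", "op3", "op4"]

# Build one set per sample so that "operon in members" is a hash lookup.
sample_sets = {name: set(operons) for name, operons in samples.items()}

# One list of '1'/'0' flags per sample, in principal_list order.
flags = {
    name: ['1' if operon in members else '0' for operon in principal_list]
    for name, members in sample_sets.items()
}
```

With this layout one loop covers every sample, however many there are, and 
no per-sample variable names are needed.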



Re: [Tutor] f.readlines(size)

2017-06-06 Thread Cameron Simpson

On 05Jun2017 21:04, Nancy Pham-Nguyen wrote:
> I'm trying to understand the optional size argument in file.readlines method.
> The help(file) shows:
>
>  |  readlines(...)
>  |      readlines([size]) -> list of strings, each a line from the file.
>  |
>  |      Call readline() repeatedly and return a list of the lines so read.
>  |      The optional size argument, if given, is an approximate bound on the
>  |      total number of bytes in the lines returned.
>
> From the documentation:
>
> f.readlines() returns a list containing all the lines of data in the file.
> If given an optional parameter sizehint, it reads that many bytes from the
> file and enough more to complete a line, and returns the lines from that.
> This is often used to allow efficient reading of a large file by lines,
> but without having to load the entire file in memory. Only complete lines
> will be returned.
>
> I wrote the function below to try it, thinking that it would print multiple
> times, 3 lines at a time, but it printed all in one shot, just like when I
> didn't specify the optional argument. Could someone explain what I've missed?
> See input file and output below.


I'm using this to test:

 from __future__ import print_function
 import sys
 lines = sys.stdin.readlines(1023)
 print(len(lines))
 print(sum(len(_) for _ in lines))
 print(repr(lines))

I've fed it a 41760 byte input (the size isn't important except that it needs 
to be "big enough"). The output starts like this:


 270
 8243

and then the line listing. That 8243 looks interesting, being close to 8192, a 
power of 2. The documentation you quote says:


 The optional size argument, if given, is an approximate bound on the total 
 number of bytes in the lines returned. [...] it reads that many bytes from 
 the file and enough more to complete a line, and returns the lines from that.


It looks to me like readlines uses the sizehint somewhat liberally; the purpose 
as described in the doco is to read input efficiently without using an 
unbounded amount of memory. Imagine feeding readlines() a terabyte input file, 
without the sizehint. It would try to pull it all into memory. With the 
sizehint you get a simple form of batching of the input into smallish groups of 
lines.


I would say, from my experiments here, that the underlying I/O is doing 8192 
byte reads from the file as the default buffer. So although I've asked for 1023 
bytes, readlines says something like: I want at least 1023 bytes; the I/O 
system loads 8192 bytes because that is its normal read size, then readlines 
picks up all the buffer. It does this so as to gather as many lines as readily 
available. It then asks for more data to complete the last line. The last line 
of my readlines() result is:


 %.class: %.java %.class-prereqs : $(("%.class-prereqs" G?

which is 68 bytes long including the newline character. 8192 + 68 = 8260, just 
over the 8243 bytes of "complete lines" I got back.


So this sizehint is just a clue, and does not change the behaviour of the 
underlying I/O. It just prevents readlines() reading the entire file.


If you want tighter control, may I suggest iterating over the file like this:

 for line in sys.stdin:
   ... do stuff with the line ...

This also does not change the underlying I/O buffer size, but it does let you 
gather only the lines you want: you can count line lengths or numbers or 
whatever criteria you find useful if you want to stop before the end of the 
file.
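
As one possible sketch of that idea, the generator below groups lines into 
batches by counting line lengths while iterating. The function name 
batches() and its max_size parameter are invented for this example, and an 
io.StringIO object stands in for sys.stdin so the snippet is self-contained. 
It assumes Python 3 text input, where len(line) counts characters rather 
than bytes:

```python
import io

def batches(lines, max_size):
    """Group lines into lists, closing a list once its total length
    reaches max_size. Only whole lines are ever returned."""
    batch, size = [], 0
    for line in lines:
        batch.append(line)
        size += len(line)   # characters for str lines, bytes for bytes lines
        if size >= max_size:
            yield batch
            batch, size = [], 0
    if batch:
        # Whatever is left over forms a final, shorter batch.
        yield batch

# Ten 8-character lines as a stand-in for sys.stdin.
stream = io.StringIO("".join("line %02d\n" % i for i in range(10)))
result = list(batches(stream, 20))
```

Unlike readlines() with a size hint, the batch size here depends only on 
the criterion in the loop, not on how much data the I/O layer happens to 
have buffered.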


Cheers,
Cameron Simpson 


Re: [Tutor] best Python Web-Development Applications framework.

2017-06-06 Thread Sebastian Silva
Hi,

It's been almost a month since you asked but I have a novel suggestion
for you to try: /Jappy/.

Because of the nature of the web and existing browsers, you'll find that
Python is applied predominantly server-side.

While back-end programming can be interesting, I have recently become
aware of how fun it is to program the browser with Python-like syntax [1].

I liked this approach so much that I wrote Jappy, a development
environment for learning Python. This IDE can run locally without an
Internet connection or from a static web host.

Here are the sources with instructions: https://github.com/somosazucar/artisan

And here's the IDE for demonstration to try in any browser:
http://people.sugarlabs.org/~icarito/artisan/Jappy.activity/


Here's a brief list of implemented functionality:

  * Python 3 syntax and comparable performance
  * Tabbed Code editor with syntax highlighting and Solarized color scheme
  * Supports multiple files using Python's import syntax
  * Six examples demonstrating language and API features
  * Unicode support. Emojis you can use directly in your code :-)
  * Runs on Webkit2 / Chrome / Firefox browser engines (IE not tested)
  * Gives access to HTML5, CSS3 and Javascript
  * Saves session in Sugar or Sugarizer Journal if available
  * Export to .zip (compiled JS code + source)
  * Import from .zip or as individual files
  * Jappy library offers browser friendly print, inputAsync, clearScreen
    statements
  * Jappy itself is written in Python / RapydScript
  * Experimental standalone Android build and .XO bundle

The Tutor@ list is invited to try it out and report feedback.

I have started work to prepare a course to introduce Python to kids and
have written this IDE for this purpose.

However I enjoyed writing it and have plans for adding interesting
functionality, as time allows.

Happy learning!

Sebastian

[1] Python-like: RapydScript NG provides Python-3 to Javascript
transcompilation directly in the browser.


On 07/05/17 10:23, Jojo Mwebaze wrote:
> Dear All,
>
> I am trying to figure out the best Python Web-Development Applications
> framework. I am trying to get started with building Web-Based Applications.
> Kindly advise which one is the best for a novice user
>
> Cheers,
> Johnson


[Tutor] f.readlines(size)

2017-06-06 Thread Nancy Pham-Nguyen via Tutor
Resend with my member's email address.



Hi,
I'm trying to understand the optional size argument in the file.readlines 
method. The help(file) shows:

 |  readlines(...)
 |      readlines([size]) -> list of strings, each a line from the file.
 |
 |      Call readline() repeatedly and return a list of the lines so read.
 |      The optional size argument, if given, is an approximate bound on the
 |      total number of bytes in the lines returned.

From the documentation:

f.readlines() returns a list containing all the lines of data in the file. 
If given an optional parameter sizehint, it reads that many bytes from the 
file and enough more to complete a line, and returns the lines from that. 
This is often used to allow efficient reading of a large file by lines, 
but without having to load the entire file in memory. Only complete lines 
will be returned.

I wrote the function below to try it, thinking that it would print multiple 
times, 3 lines at a time, but it printed all in one shot, just like when I 
didn't specify the optional argument. Could someone explain what I've missed? 
See input file and output below.

Thanks,
Nancy

  def readLinesWithSize():
      # bufsize = 65536
      bufsize = 45
      with open('input.txt') as f:
          while True:
              # print len(f.readlines(bufsize))   # this will print 33
              print
              lines = f.readlines(bufsize)
              print lines
              if not lines:
                  break
              for line in lines:
                  pass

  readLinesWithSize()

Output:
['1CSCO,100,18.04\n', '2ANTM,200,45.03\n', '3CSCO,150,19.05\n', 
'4MSFT,250,80.56\n', '5IBM,500,22.01\n', '6ANTM,250,44.23\n', 
'7GOOG,200,501.45\n', '8CSCO,175,19.56\n', '9MSFT,75,80.81\n', 
'10GOOG,300,502.65\n', '11IBM,150,25.01\n', '12CSCO1,100,18.04\n', 
'13ANTM1,200,45.03\n', '14CSCO1,150,19.05\n', '15MSFT1,250,80.56\n', 
'16IBM1,500,22.01\n', '17ANTM1,250,44.23\n', '18GOOG1,200,501.45\n', 
'19CSCO1,175,19.56\n', '20MSFT1,75,80.81\n', '21GOOG1,300,502.65\n', 
'22IBM1,150,25.01\n', '23CSCO2,100,18.04\n', '24ANTM2,200,45.03\n', 
'25CSCO2,150,19.05\n', '26MSFT2,250,80.56\n', '27IBM2,500,22.01\n', 
'28ANTM2,250,44.23\n', '29GOOG2,200,501.45\n', '30CSCO2,175,19.56\n', 
'31MSFT2,75,80.81\n', '32GOOG2,300,502.65\n', '33IBM2,150,25.01\n']

[]
The input file contains 33 lines of text, 15 or 16 letters each (15 - 16 
bytes):

1CSCO,100,18.04
2ANTM,200,45.03
3CSCO,150,19.05
4MSFT,250,80.56
5IBM,500,22.01
6ANTM,250,44.23
7GOOG,200,501.45
8CSCO,175,19.56
9MSFT,75,80.81
10GOOG,300,502.65
11IBM,150,25.01
12CSCO1,100,18.04
13ANTM1,200,45.03
14CSCO1,150,19.05
15MSFT1,250,80.56
16IBM1,500,22.01
17ANTM1,250,44.23
18GOOG1,200,501.45
19CSCO1,175,19.56
20MSFT1,75,80.81
21GOOG1,300,502.65
22IBM1,150,25.01
23CSCO2,100,18.04
24ANTM2,200,45.03
25CSCO2,150,19.05
26MSFT2,250,80.56
27IBM2,500,22.01
28ANTM2,250,44.23
29GOOG2,200,501.45
30CSCO2,175,19.56
31MSFT2,75,80.81
32GOOG2,300,502.65
33IBM2,150,25.0

   


Re: [Tutor] best Python Web-Development Applications framework.

2017-06-06 Thread Margie Roswell
I like gae-init, which is Flask, + Google Cloud + bootstrap + fontawesome,
etc.

On Sun, May 7, 2017, 11:27 AM Jojo Mwebaze wrote:

> Dear All,
>
> I am trying to figure out the best Python Web-Development Applications
> framework. I am trying to get started with building Web-Based Applications.
> Kindly advise which one is the best for a novice user
>
> Cheers,
> Johnson