Nancy Pham-Nguyen wrote:

> Hi,

Hi Nancy,

the only justification for the readlines() method is to serve as a trap to
trick newbies into writing scripts that consume more memory than necessary.
While the size argument offers a way around that, there are still next to no
use cases for readlines().

Iterating over a file directly is a very common operation, and a lot of work
has gone into making it efficient. Use it whenever possible.

To read groups of lines consider

import itertools

# FILENAME and process_lines() are placeholders
# last chunk may be shorter
with open(FILENAME) as f:
    while True:
        chunk = list(itertools.islice(f, 3))
        if not chunk:
            break
        process_lines(chunk)

or

import itertools

# last chunk may be filled with None values
with open(FILENAME) as f:
    for chunk in itertools.zip_longest(f, f, f):  # Py2: izip_longest
        process_lines(chunk)

In both cases you will get chunks of three lines, the only difference being
the handling of the last chunk.

> I'm trying to understand the optional size argument in file.readlines
> method. The help(file) shows:
>
>  |  readlines(...)
>  |      readlines([size]) -> list of strings, each a line from the file.
>  |
>  |      Call readline() repeatedly and return a list of the lines so read.
>  |      The optional size argument, if given, is an approximate bound on the
>  |      total number of bytes in the lines returned.
>
> From the documentation:
>
> f.readlines() returns a list containing all the lines of data in the file.
> If given an optional parameter sizehint, it reads that many bytes from the
> file and enough more to complete a line, and returns the lines from that.
> This is often used to allow efficient reading of a large file by lines,
> but without having to load the entire file in memory. Only complete lines
> will be returned.
>
> I wrote the function below to try it, thinking that it would print
> multiple times, 3 lines at a time, but it printed all in one shot, just
> like when I didn't specify the optional argument. Could someone explain
> what I've missed? See input file and output below.
> Thanks,
> Nancy
>
> def readLinesWithSize():
>     # bufsize = 65536
>     bufsize = 45
>     with open('input.txt') as f:
>         while True:
>             # print len(f.readlines(bufsize)) # this will print 33
>             print
>             lines = f.readlines(bufsize)
>             print lines
>             if not lines:
>                 break
>             for line in lines:
>                 pass
>
> readLinesWithSize()
>
> Output:

This seems to be messed up a little by a "helpful" email client. Therefore
I'll give my own:

$ cat readlines_demo.py
LINESIZE=32
with open("tmp.txt", "w") as f:
    for i in range(30):
        f.write("{:02} {}\n".format(i, "x"*(LINESIZE-4)))

BUFSIZE = LINESIZE*3-1
print("bufsize", BUFSIZE)

with open("tmp.txt", "r") as f:
    while True:
        chunk = f.readlines(BUFSIZE)
        if not chunk:
            break
        print(sum(map(len, chunk)), "bytes:", chunk)
$ python3 readlines_demo.py
bufsize 95
96 bytes: ['00 xxxxxxxxxxxxxxxxxxxxxxxxxxxx\n', '01 xxxxxxxxxxxxxxxxxxxxxxxxxxxx\n', '02 xxxxxxxxxxxxxxxxxxxxxxxxxxxx\n']
96 bytes: ['03 xxxxxxxxxxxxxxxxxxxxxxxxxxxx\n', '04 xxxxxxxxxxxxxxxxxxxxxxxxxxxx\n', '05 xxxxxxxxxxxxxxxxxxxxxxxxxxxx\n']
96 bytes: ['06 xxxxxxxxxxxxxxxxxxxxxxxxxxxx\n', '07 xxxxxxxxxxxxxxxxxxxxxxxxxxxx\n', '08 xxxxxxxxxxxxxxxxxxxxxxxxxxxx\n']
...

So in Python 3 this does what you expect: readlines() stops collecting more
lines once the total number of bytes exceeds the specified size.

"""
readlines(...) method of _io.TextIOWrapper instance
    Return a list of lines from the stream.

    hint can be specified to control the number of lines read: no more
    lines will be read if the total size (in bytes/characters) of all
    lines so far exceeds hint.
"""

In Python 2 the docstring is a little vague

"""
The optional size argument, if given, is an *approximate* *bound* on the
total number of bytes in the lines returned.
"""

(emphasis mine) and it seems that small size values, which would defeat the
goal of making the operation efficient, are ignored:

$ python readlines_demo.py
('bufsize', 95)
(960, 'bytes:', ['00 xxxxxxxxxxxxxxxxxxxxxxxxxxxx\n', '01 xxxxxxxxxxxxxxxxxxxxxxxxxxxx\n',
...
'28 xxxxxxxxxxxxxxxxxxxxxxxxxxxx\n', '29 xxxxxxxxxxxxxxxxxxxxxxxxxxxx\n'])

Playing around a bit on my system, the minimum value with an effect seems to
be about 2**13, but I haven't consulted the readlines source code to verify.

_______________________________________________
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
https://mail.python.org/mailman/listinfo/tutor
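
To check the Python 3 behaviour without writing a file, the same data can be fed through an in-memory stream. This is only a sketch under a few assumptions: io.StringIO stands in for tmp.txt, chunk_sizes() is a helper name made up for illustration, and I am assuming the in-memory stream honours the hint the same way a text file does in Python 3.

```python
import io

LINESIZE = 32
# Same 30 lines as in readlines_demo.py, kept in memory instead of tmp.txt.
DATA = "".join(
    "{:02} {}\n".format(i, "x" * (LINESIZE - 4)) for i in range(30)
)


def chunk_sizes(hint):
    """Return how many lines each readlines(hint) call delivered.

    Hypothetical helper for illustration only.
    """
    f = io.StringIO(DATA)
    sizes = []
    while True:
        chunk = f.readlines(hint)
        if not chunk:  # an empty list means the stream is exhausted
            break
        sizes.append(len(chunk))
    return sizes


if __name__ == "__main__":
    # Hint one short of three full lines, as in the demo script.
    print(chunk_sizes(LINESIZE * 3 - 1))
    # A tiny hint still yields at least one complete line per call.
    print(chunk_sizes(1))
```

With the hint one short of three lines I would expect ten chunks of three lines each, matching the python3 output shown earlier. A hint of 1 should give one line per call, because the line that pushes the total over the hint is still returned whole.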