* bob gailer <bgai...@gmail.com> [2009-11-17 15:26:20 -0500]:

> Date: Tue, 17 Nov 2009 15:26:20 -0500
> From: bob gailer <bgai...@gmail.com>
> To: Antonio de la Fuente <t...@muybien.org>
> CC: Python Tutor mailing list <tutor@python.org>
> Subject: Re: [Tutor] Introduction - log exercise
>
> Antonio de la Fuente wrote:
> >Hi everybody,
> >
> >This is my first post here. I have started learning Python and I am new to
> >programming; just some bash scripting, not much. Thank you for the
> >kind support and help that you provide in this list.
> >
> >This is my problem: I've got a log file that is filling up very quickly.
> >The log file is made of blocks separated by a blank line. Some blocks
> >contain a line "foo"; I want to discard the blocks with that line inside
> >them and create a new log file without those blocks, which will
> >drastically reduce the size of the log file.
> >
> >The log file is gzipped, so I am going to use the gzip module, and I am
> >going to pass the log file as an argument, so the sys module is required
> >as well.
> >
> >I will read lines from the file with a 'for loop', and then check them for
> >'foo' matches with a 'while loop'. If a line matches, I (somehow)
> >re-initialise the list; if there is no match for foo, I append the line
> >to the list. When I get to a blank line (end of block), I write myList to
> >an external file and start with another line.
> >
> >I am stuck with defining 'blank line'; I can't manage to get through the
> >while loop. Any hint here I would really appreciate.
> >I don't expect the solution, as I think this is a great exercise to get my
> >feet wet with Python, but if anyone thinks that this is the wrong way of
> >solving the problem, please let me know.
> >
> >
> >#!/usr/bin/python
> >
> >import sys
> >import gzip
> >
> >myList = []
> >
> ># At the moment not bother with argument part as I am testing it with a
> ># testing log file
> >#fileIn = gzip.open(sys.argv[1])
> >
> >fileIn = gzip.open('big_log_file.gz', 'r')
> >fileOut = open('outputFile', 'a')
> >
> >for line in fileIn:
> >    while line != 'blank_line':
> >        if line == 'foo':
> >            Somehow re-initialise myList
> >            break
> >        else:
> >            myList.append(line)
> >    fileOut.writelines(myList)
>
> Observations:
> 0 - The other responses did not understand your desire to drop any
> paragraph containing 'foo'.

Yes, paragraph == block, that's it.

> 1 - The while loop will run forever, as it keeps processing the same line.

Because of the tabs in the line with foo?!

> 2 - In your sample log file the line with 'foo' starts with a tab.
> line == 'foo' will always be false.

So I need first to get rid of those tabs, right? I can do that with
line.strip(), but then I need the same formatting for the fileOut.

> 3 - Is the first line in the file Tue Nov 17 16:11:47 GMT 2009 or blank?

The first line is Tue Nov 17 16:11:47 GMT 2009.

> 4 - Is the last line blank?

The last line is blank.

>
> Better logic:
>

I would never have thought of this way of solving the problem. Interesting.

> # open files
> paragraph = []
> keep = True
> for line in fileIn:
>     if line.isspace(): # end of paragraph

Aha! Finding the blank line.

>         if keep:
>             outFile.writelines(paragraph)
>         paragraph = []

This is what I called re-initialising the list.

>         keep = True
>     else:
>         if keep:
>             if line == '\tfoo':
>                 keep = False
>             else:
>                 paragraph.append(line)
> # anticipating last line not blank, write last paragraph
> if keep:
>     outFile.writelines(paragraph)
>
> # use shutil to rename
>

Thank you.

>
> --
> Bob Gailer
> Chapel Hill NC
> 919-636-4239

--
-----------------------------
Antonio de la Fuente Martínez
E-mail: t...@muybien.org
-----------------------------

The problem with people who have no vices is that generally you can be
pretty sure they're going to have some pretty annoying virtues.
		-- Elizabeth Taylor
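For anyone reading this thread later, the paragraph logic quoted above can be put together as a runnable sketch. This is not Bob's exact code. It assumes Python 3 (so that gzip.open accepts the 'rt' text mode), it compares a stripped copy of each line against the marker so the leading tab and the newline do not defeat the test while the output keeps its original formatting, and it also writes the blank separator line after each kept block. The names drop_blocks and filter_gzip_log and the marker parameter are made up for illustration.

```python
import gzip


def drop_blocks(lines, marker="foo"):
    """Yield the lines of every blank-line-separated block that contains
    no line equal to `marker` once surrounding whitespace is stripped."""
    block = []      # lines of the block currently being read
    keep = True     # turns False once the marker is seen in this block
    for line in lines:
        if line.strip() == "":            # blank line: end of block
            if keep and block:
                yield from block
                yield line                # keep the separator as well
            block = []                    # re-initialise for the next block
            keep = True
        elif line.strip() == marker:      # only the comparison is stripped
            keep = False
        else:
            block.append(line)            # stored with original formatting
    if keep and block:                    # last block, if no final blank line
        yield from block


def filter_gzip_log(src, dst, marker="foo"):
    """Read the gzipped log `src` as text and write the kept blocks to `dst`."""
    with gzip.open(src, "rt") as file_in, open(dst, "w") as file_out:
        file_out.writelines(drop_blocks(file_in, marker))
```

Calling filter_gzip_log('big_log_file.gz', 'outputFile') would then correspond to the file names used earlier in the thread. Keeping the block logic in a generator that takes any iterable of lines makes it easy to try on a small list before pointing it at the real log.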