* Antonio de la Fuente <t...@muybien.org> [2009-11-17 16:58:08 +0000]:
> Date: Tue, 17 Nov 2009 16:58:08 +0000
> From: Antonio de la Fuente <t...@muybien.org>
> To: Python Tutor mailing list <tutor@python.org>
> Subject: [Tutor] Introduction - log exercise
> Organization: (muybien.org)
> User-Agent: Mutt/1.5.20 (2009-06-14)
> Message-ID: <20091117165639.ga3...@cateto>
>
> Hi everybody,
>
> This is my first post here. I have started learning Python and I am new to
> programming; I have done just some bash scripting, not much.
> Thank you for the kind support and help that you provide on this list.
>
> This is my problem: I've got a log file that is filling up very quickly. The
> log file is made of blocks separated by a blank line, and some of these
> blocks contain a line "foo". I want to discard the blocks that contain that
> line and create a new log file without them, which will drastically reduce
> the size of the log file.
>
> The log file is gzipped, so I am going to use the gzip module, and I am going
> to pass the log file as an argument, so the sys module is required as well.
>
> I will read lines from the file with a 'for loop', and then I will check them
> for 'foo' matches with a 'while loop'. If a line matches, I (somehow)
> re-initialise the list, and if there is no match for foo, I append the line
> to the list. When I get to a blank line (end of block), I write myList to an
> external file and start with another line.
>
> I am stuck on defining 'blank line'; I can't manage to get through the while
> loop. I would really appreciate any hint here.
> I don't expect the solution, as I think this is a great exercise to get my
> feet wet with Python, but if anyone thinks that this is the wrong way of
> solving the problem, please let me know.
>
>
> #!/usr/bin/python
>
> import sys
> import gzip
>
> myList = []
>
> # At the moment not bother with argument part as I am testing it with a
> # testing log file
> #fileIn = gzip.open(sys.argv[1])
>
> fileIn = gzip.open('big_log_file.gz', 'r')
> fileOut = open('outputFile', 'a')
>
> for line in fileIn:
>     while line != 'blank_line':
>         if line == 'foo':
>             Somehow re-initialise myList
>             break
>         else:
>             myList.append(line)
>     fileOut.writelines(myList)
>
>
> Somehow rename outputFile with big_log_file.gz
>
> fileIn.close()
> fileOut.close()
>
> -------------------------------------------------------------
>
> The log file will be filled with:
>
>
> Tue Nov 17 16:11:47 GMT 2009
> bladi bladi bla
> tarila ri la
> patatin pataton
> tatati tatata
>
> Tue Nov 17 16:12:58 GMT 2009
> bladi bladi bla
> tarila ri la
> patatin pataton
> foo
> tatati tatata
>
> Tue Nov 17 16:13:42 GMT 2009
> bladi bladi bla
> tarila ri la
> patatin pataton
> tatati tatata
>
>
> etc, etc, etc
> ..............................................................
>
> Again, thank you.
>

This is how, with your help, I finally wrote the script.

The way I compress the file at the end of the script, opening the files
again, didn't feel right, but I couldn't make it work any other way, and I
was very tired at the time.

The first test seems to work, but I will do a more thorough test tomorrow.

Thank you all.

#!/usr/bin/python

# Importing modules that I'm going to need
import sys
import gzip
import os

# Only one argument is passed to the script: the gzip file.
# Check this before using sys.argv[1].
if len(sys.argv) != 2:
    print('Usage: log_exercise01.py <gzip logfile>')
    sys.exit(1)

# Initialising paragraph list
paragraph = []

# Flag to signal which paragraphs to keep and which ones to discard.
keep = True

# Opening the argument file and a temporary, uncompressed output file.
# Both are handled as bytes, so the literals below are bytes as well.
fileIn = gzip.open(sys.argv[1], 'rb')
fileOut = open('outputFile', 'wb')

# Main loop
for line in fileIn:
    # If a blank line in log file
    if line.isspace():
        # I append a blank line to list to keep the format
        paragraph.append(b'\n')
        # If true write paragraph to file, keeping the formatting with the
        # "".join trick
        if keep:
            fileOut.write(b"".join(paragraph))
        # Re-initialise list
        paragraph = []
        keep = True
    # Else append line to paragraph list, and if the line stripped of its
    # surrounding whitespace is 'foo', set flag keep to False to discard
    # the paragraph.
    else:
        paragraph.append(line)
        if line.strip() == b'foo':
            keep = False

# Write the last paragraph if the log does not end with a blank line.
if keep and paragraph:
    fileOut.write(b"".join(paragraph))

# Close both files so that everything is flushed before reopening them.
fileIn.close()
fileOut.close()

# Compressing the file that has been created, under the same name as the
# parameter passed to the script.
f_in = open('outputFile', 'rb')
f_out = gzip.open(sys.argv[1], 'wb')
f_out.writelines(f_in)
f_in.close()
f_out.close()

# Deleting the uncompressed file created by the script.
os.remove('outputFile')

-- 
-----------------------------
Antonio de la Fuente Martínez
E-mail: t...@muybien.org
-----------------------------

In a hierarchical organization, the higher the level, the greater the
confusion.
                -- Dow's Law.
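
On the compression step that "didn't feel right": one possible alternative is to write the filtered blocks straight into a second gzip file and then swap it into place, so no uncompressed copy is needed. This is only a sketch, and it assumes Python 3 (for `yield from`, `os.replace` and text-mode `gzip.open`). The names `drop_blocks` and `filter_gzip_log` are made up for illustration and are not part of the script above.

```python
import gzip
import os
import tempfile


def drop_blocks(lines, marker="foo"):
    """Yield the lines of each blank-line-separated block that lacks `marker`."""
    block, keep = [], True
    for line in lines:
        if line.isspace():
            # End of block: emit it, plus the separating blank line, if kept.
            if keep:
                yield from block
                yield "\n"
            block, keep = [], True
        else:
            block.append(line)
            # strip() removes the leading tab and the trailing newline.
            if line.strip() == marker:
                keep = False
    # The log may not end with a blank line; emit the last block as well.
    if keep and block:
        yield from block


def filter_gzip_log(path, marker="foo"):
    """Rewrite the gzip log at `path` without the blocks containing `marker`."""
    # Create the temporary file next to the log so the final swap stays on
    # the same filesystem.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(suffix=".gz", dir=directory)
    os.close(fd)
    try:
        with gzip.open(path, "rt") as src, gzip.open(tmp_path, "wt") as dst:
            dst.writelines(drop_blocks(src, marker))
        # Replace the original log only after the new one is fully written.
        os.replace(tmp_path, path)
    except BaseException:
        os.remove(tmp_path)
        raise
```

Calling `filter_gzip_log(sys.argv[1])` would then take the place of the main loop and the compression step. Keeping the filtering in a generator separate from the file handling also makes it easy to try `drop_blocks` on a plain list of lines first.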