Antonio de la Fuente wrote:
Hi everybody,

This is my first post here. I have started learning Python and I am new to
programming; I have done just some bash scripting, not much. Thank you for the kind support and help that you provide on this list.

This is my problem: I've got a log file that is filling up very quickly. This
log file is made of blocks separated by a blank line, and inside some of these
blocks there is a line "foo". I want to discard the blocks with that line inside
them and create a new log file without those blocks, which will drastically
reduce the size of the log file.
The log file is gzipped, so I am going to use the gzip module, and I am going to
pass the log file as an argument, so the sys module is required as well.
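
A note on reading the compressed file, assuming Python 3 (the code in this thread is Python 2 style): gzip.open() defaults to binary mode, so each line comes back as bytes and will never compare equal to a str such as 'foo'. Opening with mode 'rt' yields str lines instead. A minimal sketch follows; read_log_lines and the demo file are hypothetical, and in the real script the path would come from sys.argv[1].

```python
import gzip
import os
import tempfile


def read_log_lines(path):
    """Yield each line of a gzipped log file as a str (trailing newline kept)."""
    # 'rt' = read in text mode. Plain 'r' means 'rb' for gzip.open in
    # Python 3, which yields bytes, and b'foo\n' never equals the str 'foo'.
    with gzip.open(path, 'rt') as f:
        for line in f:
            yield line


# Demo: write a tiny made-up log, then read it back.
# In the real script the path would come from sys.argv[1] instead.
demo_path = os.path.join(tempfile.mkdtemp(), 'demo_log.gz')
with gzip.open(demo_path, 'wt') as f:
    f.write('Tue Nov 17 16:11:47 GMT 2009\n        bladi bladi bla\n\n')

lines = list(read_log_lines(demo_path))
print(lines)
```

Iterating over the file object one line at a time like this keeps memory use flat, which matters for a log that grows quickly.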

I will read lines from the file with a 'for loop', and then I will check them
for 'foo' matches with a 'while loop'. If a line matches, I (somehow)
re-initialise the list, and if there is no match for foo, I will append the line
to the list. When I get to a blank line (end of block), I write myList to an
external file, and start again with the next line.

I am stuck on defining a 'blank line', and I can't manage to get through the
while loop; any hint here would be really appreciated.
I don't expect the solution, as I think this is a great exercise to get my feet
wet with Python, but if anyone thinks that this is the wrong way of solving the
problem, please let me know.


#!/usr/bin/python

import sys
import gzip

myList = []

# At the moment I'm not bothering with the argument part, as I am testing it
# with a test log file
#fileIn = gzip.open(sys.argv[1])

fileIn = gzip.open('big_log_file.gz', 'r')
fileOut = open('outputFile', 'a')

for line in fileIn:
    while line != 'blank_line':
        if line == 'foo':
            # Somehow re-initialise myList
            break
        else:
            myList.append(line)
    fileOut.writelines(myList)


# Somehow rename outputFile with big_log_file.gz

fileIn.close()
fileOut.close()

-------------------------------------------------------------

The log file will be filled with:


Tue Nov 17 16:11:47 GMT 2009
        bladi bladi bla
        tarila ri la
        patatin pataton
        tatati tatata

Tue Nov 17 16:12:58 GMT 2009
        bladi bladi bla
        tarila ri la
        patatin pataton
        foo
        tatati tatata

Tue Nov 17 16:13:42 GMT 2009
        bladi bladi bla
        tarila ri la
        patatin pataton
        tatati tatata


etc., etc., etc.
..............................................................

Again, thank you.

You've got some good ideas, and I'm going to give you hints, rather than just writing it for you, as you suggested.

First, let me point out that there are advanced features in Python that could make this a short program that'd be very hard for a beginner to understand. I'll give you the words, but recommend that you not try it at this time. If you were to wrap the file in a generator that returned a "paragraph" at a time, the same way it now returns a line at a time, then the loop would simply be a for loop over that generator, checking each paragraph for whether it contained "foo" and, if not, writing it to the output.
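
For reference, a sketch of the generator idea, assuming Python 3. paragraphs and keep_block are hypothetical names, and treating a whitespace-only line as a block separator is an assumption based on the sample log.

```python
def paragraphs(lines):
    """Group an iterable of lines into blocks separated by blank lines.

    Yields each block as a list of its lines (newlines kept).
    """
    block = []
    for line in lines:
        if line.strip() == '':       # blank or whitespace-only line ends a block
            if block:
                yield block
                block = []
        else:
            block.append(line)
    if block:                        # last block may not end with a blank line
        yield block


def keep_block(block):
    """Keep a block only if no line in it is exactly 'foo' (ignoring whitespace)."""
    return not any(line.strip() == 'foo' for line in block)


sample = [
    'Tue Nov 17 16:11:47 GMT 2009\n', '        bladi bladi bla\n', '\n',
    'Tue Nov 17 16:12:58 GMT 2009\n', '        foo\n', '        tatati tatata\n', '\n',
    'Tue Nov 17 16:13:42 GMT 2009\n', '        patatin pataton\n',
]
kept = [block for block in paragraphs(sample) if keep_block(block)]
for block in kept:
    print(''.join(block))
```

The separating blank lines are not part of the yielded blocks, so code that writes kept blocks back out needs to add one blank line after each.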


But you can also do it without using advanced features, and that's what I'm going to try to outline.

Two things you'll be testing each line for:  is it blank, and is it "foo".
if line.isspace() will test whether a line is whitespace only, as Wayne pointed out. if line == "foo" will test whether a line is exactly "foo". But lines read from a file keep their trailing newline, and you apparently also have leading whitespace, so if those are irrelevant you might want
  if line.strip() == "foo"
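
To see the two tests side by side, here is a small sketch in Python 3 syntax; the sample strings are made up to mirror the log excerpt. Note that ''.isspace() is False, so line.strip() == '' is a slightly broader blank test that also covers an empty string. Iterating over a file never yields an empty string, though, so either test works inside the loop.

```python
blank = '\n'
spaces = '   \n'
foo_line = '        foo\n'

print(blank.isspace())            # True
print(spaces.isspace())           # True
print(''.isspace())               # False: an empty string has no characters at all
print(foo_line == 'foo')          # False: the indent and newline are still there
print(foo_line.strip() == 'foo')  # True
```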

I would start by just testing for blank lines. Try replacing all blank lines with "***** blank line ****" and print each line. See whether the output makes sense. If it does, go on to the next step.
  for line in ....
      if line-is-blank
          line-is-fancy-replacement
      print line
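
The first step might look like this in Python 3, as a sketch: print needs end='' because each line already ends in a newline (Python 2's print line would double-space the output). mark_blank_lines is a hypothetical name, and it also returns the lines it printed so they can be inspected.

```python
def mark_blank_lines(lines):
    """Step 1: print every line, replacing blank ones with a visible marker."""
    out = []
    for line in lines:
        if line.isspace():
            line = '***** blank line ****\n'
        out.append(line)
        print(line, end='')      # the line already ends with '\n'
    return out


sample = [
    'Tue Nov 17 16:11:47 GMT 2009\n',
    '        bladi bladi bla\n',
    '\n',
    'Tue Nov 17 16:12:58 GMT 2009\n',
    '        foo\n',
    '\n',
]
result = mark_blank_lines(sample)
```

In the real script, the lines would come from the gzipped log (for example, iterating over the file opened with gzip.open(path, 'rt')).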

Now, instead of just printing the line, add it to a list object. Before the for loop, create an empty list called paragraph (rather than a file). Inside the for loop, if the line is non-blank, add it to paragraph. If the line is blank, print the paragraph (with something before and after it, so you can see what came from each print statement). Then blank it (paragraph = []).
Check whether this result looks good, and if so, continue on.
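
A sketch of this second step, assuming Python 3; print_paragraphs is a hypothetical name, and the '--- start ---' / '--- end ---' markers are arbitrary. It also returns the printed paragraphs so the grouping can be checked.

```python
def print_paragraphs(lines):
    """Step 2: collect non-blank lines in a list; print it at each blank line."""
    paragraph = []
    printed = []                   # kept only so the result can be checked
    for line in lines:
        if line.isspace():
            print('--- start ---')
            print(''.join(paragraph), end='')
            print('--- end ---')
            printed.append(paragraph)
            paragraph = []         # blank it for the next block
        else:
            paragraph.append(line)
    return printed


sample = [
    'Tue Nov 17 16:11:47 GMT 2009\n', '        bladi bladi bla\n', '\n',
    'Tue Nov 17 16:12:58 GMT 2009\n', '        foo\n', '\n',
]
printed = print_paragraphs(sample)
```

As written, a final block that is not followed by a blank line is never printed; one more check after the loop would handle that.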

Next version of the code: whenever you have a non-blank line, in addition to adding it to the list, also check whether it's equal to "foo". If so, set a flag. When printing the paragraph, skip the printing if the flag is set. Remember that you'll have to clear this flag each time you blank
the paragraph, both before the loop and in the middle of the loop.
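
A sketch of the third step, assuming Python 3. filter_paragraphs and has_foo are hypothetical names; it collects the kept paragraphs in a list rather than printing them, skips the empty paragraphs produced by consecutive blank lines, and keeps a final block even if no blank line follows it.

```python
def filter_paragraphs(lines):
    """Step 3: like step 2, but drop any paragraph containing a 'foo' line."""
    kept = []
    paragraph = []
    has_foo = False                    # clear the flag before the loop...
    for line in lines:
        if line.isspace():
            if paragraph and not has_foo:
                kept.append(paragraph)
            paragraph = []
            has_foo = False            # ...and every time the paragraph is blanked
        else:
            paragraph.append(line)
            if line.strip() == 'foo':
                has_foo = True
    if paragraph and not has_foo:      # last block may not end in a blank line
        kept.append(paragraph)
    return kept


sample = [
    'Tue Nov 17 16:11:47 GMT 2009\n', '        bladi bladi bla\n', '\n',
    'Tue Nov 17 16:12:58 GMT 2009\n', '        foo\n', '        tatati tatata\n', '\n',
    'Tue Nov 17 16:13:42 GMT 2009\n', '        patatin pataton\n', '\n', '\n',
]
kept = filter_paragraphs(sample)
for paragraph in kept:
    print(''.join(paragraph))
```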

Once this makes sense, you can worry about actually writing the output to a real file, maybe compressing it, maybe doing deletes and renames as appropriate. You probably don't need the shutil module; the os module probably has enough functions for this (os.remove and os.rename, for example).
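
One way the last step could look, assuming Python 3.3 or later: write the kept blocks to a temporary gzip file, then swap it into place with os.replace(), which (unlike os.rename() on Windows) overwrites an existing destination. rewrite_log, no_foo, the '.tmp' suffix and the demo data are all hypothetical.

```python
import gzip
import os
import tempfile


def rewrite_log(path, keep_block):
    """Rewrite the gzipped log at path, keeping only blocks where keep_block(block) is true."""
    tmp_path = path + '.tmp'                 # hypothetical temporary file name
    block = []
    with gzip.open(path, 'rt') as src, gzip.open(tmp_path, 'wt') as dst:
        for line in src:
            if line.isspace():
                if block and keep_block(block):
                    dst.writelines(block)
                    dst.write('\n')          # put back the blank separator line
                block = []
            else:
                block.append(line)
        if block and keep_block(block):      # last block with no blank line after it
            dst.writelines(block)
    os.replace(tmp_path, path)               # swap the filtered file into place


def no_foo(block):
    """Keep a block only if none of its lines is 'foo' (surrounding whitespace ignored)."""
    return not any(line.strip() == 'foo' for line in block)


# Demo on a small made-up log written to a temporary directory.
demo_path = os.path.join(tempfile.mkdtemp(), 'demo_log.gz')
with gzip.open(demo_path, 'wt') as f:
    f.write('Tue Nov 17 16:11:47 GMT 2009\n        bladi bladi bla\n\n'
            'Tue Nov 17 16:12:58 GMT 2009\n        foo\n\n'
            'Tue Nov 17 16:13:42 GMT 2009\n        tatati tatata\n\n')

rewrite_log(demo_path, no_foo)
with gzip.open(demo_path, 'rt') as f:
    result = f.read()
print(result)
```

Writing to a separate file and renaming only at the end means the original log is left untouched if the script fails partway through.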

At any of these stages, if you get stuck, call for help. But your code will be only as complex as that stage needs, so we can find one bug at a time.

DaveA

_______________________________________________
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor