[Tutor] Simple Stats on Apache Logs

2010-02-11 Thread Lao Mao
Hi, I have 3 servers which generate about 2 GB of webserver logfiles in a day. These are available on my machine over NFS. I would like to draw up some stats which show, for a given keyword, how many times it appears in the logs, per hour, over the previous week. So the behavior might be: $ ./we
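A minimal sketch of the per-hour count described above. It assumes the logs use the usual Apache timestamp form, e.g. [11/Feb/2010:14:05:32 +0000], and that a plain substring match is good enough for the keyword; the name hourly_counts is made up for illustration and is not from the original post.

```python
import re
from collections import Counter
from datetime import datetime

# Captures day/month/year and hour from an Apache-style timestamp such as
# [11/Feb/2010:14:05:32 +0000] (assumed log format).
TIMESTAMP = re.compile(r"\[(\d{2}/\w{3}/\d{4}:\d{2}):\d{2}:\d{2} [+-]\d{4}\]")


def hourly_counts(lines, keyword):
    """Count the lines containing `keyword`, bucketed by the hour they were logged."""
    counts = Counter()
    for line in lines:
        if keyword not in line:
            continue
        match = TIMESTAMP.search(line)
        if match is None:
            continue  # skip lines without a recognisable timestamp
        hour = datetime.strptime(match.group(1), "%d/%b/%Y:%H")
        counts[hour] += 1
    return counts
```

Because the function only iterates over lines, an open file object can be passed in directly, so a large logfile is never loaded into memory in one go. Restricting the result to the previous week would be a matter of skipping hours older than a cutoff.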

Re: [Tutor] Simple Stats on Apache Logs

2010-02-11 Thread Lao Mao
Hi Christian,
> grep -c
> or if you are looking for only stuff for today for eg then
> grep | grep -c
I don't see how that will produce figures per hour!
> That would be the simplest implementation. For a python implementation
> think about dictionaries with multiple layers like {Date: {Keyw
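For what it's worth, the "dictionaries with multiple layers" idea could be sketched as below. The suggested layout is cut off in the message, so the date -> keyword -> hour -> count nesting is only a guess at what was meant, and make_stats and record are hypothetical helper names.

```python
from collections import defaultdict


def make_stats():
    """Build an empty date -> keyword -> hour -> count mapping (assumed layout)."""
    return defaultdict(lambda: defaultdict(lambda: defaultdict(int)))


def record(stats, date, keyword, hour):
    """Register one occurrence of `keyword` seen on `date` during `hour`."""
    stats[date][keyword][hour] += 1


stats = make_stats()
record(stats, "2010-02-11", "foo", 14)
record(stats, "2010-02-11", "foo", 14)
record(stats, "2010-02-11", "foo", 15)
print(stats["2010-02-11"]["foo"][14])  # prints 2
```

Using defaultdict means the inner layers are created on first use, which avoids checking at every level whether a key already exists.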

[Tutor] Downloading S3 Logs

2010-02-11 Thread Lao Mao
Hello, I've written the below to get the previous day's logs from an Amazon S3 bucket.

#!/usr/bin/python
import time
from datetime import datetime
import boto

daily_s3_log = open("/tmp/s3logs", "w+")
now = datetime.now()
connection = boto.connect_s3()
bucket = connection.get_bucket("downloads.se
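The "previous day" part can be sketched on its own, without touching S3. This assumes the log key names contain the date as YYYY-MM-DD, which is how S3 server access logs are normally named but is not confirmed by the post; previous_day_stamp and select_previous_day are hypothetical helpers, and the key names in the usage note are invented.

```python
from datetime import datetime, timedelta


def previous_day_stamp(now):
    """Return the calendar day before `now` as a 'YYYY-MM-DD' string."""
    return (now - timedelta(days=1)).strftime("%Y-%m-%d")


def select_previous_day(key_names, now):
    """Keep only the key names that mention the previous day's date (assumed naming)."""
    stamp = previous_day_stamp(now)
    return [name for name in key_names if stamp in name]
```

For example, select_previous_day(["log-2010-02-10-00-15-01-AAA", "log-2010-02-11-00-15-01-BBB"], datetime(2010, 2, 11, 9, 0)) would keep only the first name. Subtracting a timedelta handles month and year boundaries correctly, which manual arithmetic on the day number does not.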

[Tutor] Replacing part of a URL

2010-02-20 Thread Lao Mao
Hello, I need to be able to replace the last bit of a bunch of URLs. The URLs look like this: www.somesite.com/some/path/to/something.html They may be of varying lengths, but they'll always end with .something_or_other.html I want to take the "something" and replace it with something else. My
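One possible way to do the replacement with plain string methods is sketched below. It assumes the part to replace is everything between the last "/" and the final extension; replace_last_part is a made-up name, and the post is cut off before the poster's own attempt, so this is not a reconstruction of it.

```python
def replace_last_part(url, new_name):
    """Swap the final path component's name, keeping its extension if it has one."""
    head, sep, tail = url.rpartition("/")   # split off the last path component
    stem, dot, ext = tail.rpartition(".")   # split the name from its extension
    if not dot:
        # No extension on the last component: replace it wholesale.
        return head + sep + new_name
    return head + sep + new_name + dot + ext


print(replace_last_part("www.somesite.com/some/path/to/something.html", "other"))
# prints www.somesite.com/some/path/to/other.html
```

rpartition splits on the last occurrence of the separator, so the length of the path in front does not matter.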

[Tutor] Extracting comments from a file

2010-02-21 Thread Lao Mao
Hi, I have an HTML file containing XML-style comments. I'd like to extract only the comments. My sense of smell suggests that there's probably a library (maybe an XML library) that does this already. Otherwise, my current algorithm looks a bit like this:
* Iterate over file
* If current line c
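On the library question: the standard library's HTML parser does report comments, so a line-by-line scan may not be needed. A minimal sketch, assuming Python 3 (where the module is html.parser) and that the comments are ordinary <!-- ... --> blocks; CommentCollector and extract_comments are illustrative names.

```python
from html.parser import HTMLParser


class CommentCollector(HTMLParser):
    """Collect the text of every comment the parser encounters."""

    def __init__(self):
        super().__init__()
        self.comments = []

    def handle_comment(self, data):
        # `data` is the text between the comment delimiters.
        self.comments.append(data.strip())


def extract_comments(html_text):
    """Return the comments found in `html_text`, in document order."""
    parser = CommentCollector()
    parser.feed(html_text)
    parser.close()
    return parser.comments
```

Unlike a per-line check, this also copes with comments that span several lines or share a line with other markup.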