[Tutor] Making Regular Expressions readable

2010-03-08 Thread Stephen Nelson-Smith
Hi,

I've written this today:

#!/usr/bin/env python
import re

pattern = (
    r'(?P<IP>^(-|[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}'
    r'(, [0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3})*){1})'
    r' (?P<Identity>(\S*)) (?P<User>(\S*))'
    r' (?P<Timestamp>(\[[^\]]+\]))'
    r' (?P<Request>(\"([^"\\]*(?:\\.[^"\\]*)*)\")?)'
    r' (?P<Status>(\S*)) (?P<Size>(\S*))'
    r' (?P<Referer>(\"([^"\\]*(?:\\.[^"\\]*)*)\")?)'
    r' (?P<UserAgent>(\"([^"\\]*(?:\\.[^"\\]*)*)\")?)'
    r'( )?(?P<SiteIntelligenceCookie>(\"([^"\\]*(?:\\.[^"\\]*)*)\")?)'
)
# (the archive stripped the <name> parts of the named groups; all names
# except SiteIntelligenceCookie are reconstructed guesses)

regex = re.compile(pattern)

lines = 0
no_cookies = 0

for line in open('/home/stephen/scratch/feb-100.txt'):
    lines += 1
    line = line.strip()
    match = regex.match(line)

    if match:
        data = match.groupdict()
        if data['SiteIntelligenceCookie'] == '':
            no_cookies += 1
    else:
        print "Couldn't match ", line

print "I analysed %s lines." % (lines,)
print "There were %s lines with missing Site Intelligence cookies." % (no_cookies,)

It works fine, but it looks pretty unreadable and unmaintainable to
anyone who hasn't spent all day writing regular expressions.

I remember reading about verbose regular expressions.  Would these help?

How could I make the above more maintainable?
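[Editor's note: verbose regexes do help here. A sketch of the idea with re.VERBOSE, using a simplified subset of the fields; the group names here are illustrative, not the ones from the original script.]

```python
import re

# re.VERBOSE lets whitespace and comments live inside the pattern,
# so each field can be documented where it is matched.
pattern = re.compile(r"""
    ^(?P<IP>\S+)\s+            # client IP (or "-")
    (?P<Identity>\S+)\s+       # RFC 1413 identity (usually "-")
    (?P<User>\S+)\s+           # HTTP auth user (usually "-")
    (?P<Timestamp>\[[^\]]+\])  # e.g. [04/Nov/2009:04:02:10 +0000]
    \s+"(?P<Request>[^"]*)"    # the request line
    \s+(?P<Status>\d{3})       # status code
    \s+(?P<Size>\S+)           # response size (or "-")
""", re.VERBOSE)

line = '89.151.119.196 - - [04/Nov/2009:04:02:10 +0000] "GET / HTTP/1.1" 200 50'
m = pattern.match(line)
```

Each alternative or quantifier stays the same; only the layout changes, so the pattern can be reviewed field by field.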

S.

-- 
Stephen Nelson-Smith
Technical Director
Atalanta Systems Ltd
www.atalanta-systems.com
___
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor


[Tutor] Not Storing State

2009-02-27 Thread Stephen Nelson-Smith
Hi,

This is both a general question and a specific one.

I want to iterate over a bunch of lines; If any line contains a
certain string, I want to do something, otherwise do something else.
I could store state, e.g. line 1: did it contain the string? No, so
we're fine; on to the next line.

But, I'd like to avoid keeping state.

How can I do this?
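[Editor's note: the built-in any() (Python 2.5+) expresses this without an explicit flag variable; it short-circuits on the first hit. A minimal sketch with made-up data:]

```python
# sample data standing in for the real lines
lines = ["first line", "the magic word", "last line"]

# any() consumes the generator until the first match; no per-line
# state variable is carried around by hand
if any("magic" in line for line in lines):
    result = "do something"
else:
    result = "do something else"
```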

S.


[Tutor] CSS Minification

2009-11-02 Thread Stephen Nelson-Smith
Is there a Python CSS and/or javascript minifier available?

I've got to convert some ant scripts to python, and ant has a minifier
plugin that I need to replicate.

Maybe Beautiful Soup can do this?
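[Editor's note: Beautiful Soup is an HTML/XML parser, so it won't minify CSS; for real work a dedicated minifier (e.g. a port of the YUI Compressor) is the usual choice. As a sketch of what minification involves, a deliberately naive comment-and-whitespace stripper:]

```python
import re

def minify_css(css):
    # Naive sketch only: strips comments, collapses runs of whitespace,
    # and tightens spacing around punctuation. Real minifiers handle
    # many more cases (strings, hacks, units, etc.).
    css = re.sub(r'(?s)/\*.*?\*/', '', css)       # drop /* comments */
    css = re.sub(r'\s+', ' ', css)                # collapse whitespace
    css = re.sub(r'\s*([{};:,])\s*', r'\1', css)  # tighten punctuation
    return css.strip()
```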

S.

-- 
Stephen Nelson-Smith
Technical Director
Atalanta Systems Ltd
www.atalanta-systems.com


[Tutor] Logfile Manipulation

2009-11-08 Thread Stephen Nelson-Smith
I've got a large amount of data in the form of 3 apache and 3 varnish
logfiles from 3 different machines.  They are rotated at 0400.  The
logfiles are pretty big - maybe 6G per server, uncompressed.

I've got to produce a combined logfile covering 0000-2359 for a given
day, with a bit of filtering (removing lines based on a text match,
and a bit of substitution).

I've inherited a nasty shell script that does this but it is very slow
and not clean to read or understand.

I'd like to reimplement this in python.

Initial questions:

* How does Python compare in performance to shell, awk etc in a big
pipeline?  The shell script kills the CPU
* What's the best way to extract the data for a given time range, e.g.
0000-2359 yesterday?

Any advice or experiences?

S.
-- 
Stephen Nelson-Smith
Technical Director
Atalanta Systems Ltd
www.atalanta-systems.com


Re: [Tutor] Logfile Manipulation

2009-11-09 Thread Stephen Nelson-Smith
On Mon, Nov 9, 2009 at 8:47 AM, Alan Gauld  wrote:

> I'm not familiar with Apache log files so I'll let somebody else answer,
> but I suspect you can either use string.split() or a re.findall(). You might
> even be able to use csv. Or if they are in XML you could use ElementTree.
> It all depends on the data!

An apache logfile entry looks like this:

89.151.119.196 - - [04/Nov/2009:04:02:10 +0000] "GET
/service.php?s=nav&arg[]=&arg[]=home&q=ubercrumb/node%2F20812
HTTP/1.1" 200 50 "-" "-"

I want to extract 24 hrs of data based timestamps like this:

[04/Nov/2009:04:02:10 +0000]

I also need to do some filtering (e.g. I actually don't want anything
with service.php), and I also have to do some substitutions.  That's
trivial, other than not knowing the optimum place to do it: should I
do multiple passes, or should I try to do all the work at once, only
viewing each line once?  Also, what about reading from compressed
files?  The data comes in as 6 gzipped logfiles which expand to 6G in
total.
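
[Editor's note: gzip files can be streamed line by line without decompressing to disk, and the date selection and filtering can happen in the same pass. A sketch with hypothetical file contents; the date format follows the log's [04/Nov/2009:... stamps:]

```python
import gzip
import os
import tempfile

def day_entries(path, date_prefix):
    # Stream matching entries without loading the whole file into
    # memory: filter (service.php) and select (date prefix) in one pass.
    with gzip.open(path, 'rt') as f:   # 'rt' = text mode (Python 3)
        for line in f:
            if 'service.php' in line:
                continue
            if date_prefix in line:
                yield line

# tiny demonstration file with made-up entries
tmp = os.path.join(tempfile.mkdtemp(), 'access_log.gz')
with gzip.open(tmp, 'wt') as f:
    f.write('1.2.3.4 - - [04/Nov/2009:04:02:10 +0000] "GET /a HTTP/1.1" 200 50\n')
    f.write('1.2.3.4 - - [04/Nov/2009:04:02:11 +0000] "GET /service.php HTTP/1.1" 200 50\n')
    f.write('1.2.3.4 - - [05/Nov/2009:09:00:00 +0000] "GET /b HTTP/1.1" 200 50\n')

kept = list(day_entries(tmp, '[04/Nov/2009'))
```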

S.


Re: [Tutor] Logfile Manipulation

2009-11-09 Thread Stephen Nelson-Smith
Sorry - forgot to include the list.

On Mon, Nov 9, 2009 at 9:33 AM, Stephen Nelson-Smith  wrote:
> On Mon, Nov 9, 2009 at 9:10 AM, ALAN GAULD  wrote:
>>
>>> An apache logfile entry looks like this:
>>>
>>> 89.151.119.196 - - [04/Nov/2009:04:02:10 +0000] "GET
>>> /service.php?s=nav&arg[]=&arg[]=home&q=ubercrumb/node%2F20812
>>> HTTP/1.1" 200 50 "-" "-"
>>>
>>>I want to extract 24 hrs of data based timestamps like this:
>>>
>>> [04/Nov/2009:04:02:10 +0000]
>>
>> OK It looks like you could use a regex to extract the first
>> thing you find between square brackets. Then convert that to a time.
>
> I'm currently thinking I can just use a string comparison after the
> first entry for the day - that saves date arithmetic.
>
>> I'd opt for doing it all in one pass. With such large files you really
>> want to minimise the amount of time spent reading the file.
>> Plus with such large files you will need/want to process them
>> line by line anyway rather than reading the whole thing into memory.
>
> How do I handle concurrency?  I have 6 log files which I need to turn
> into one time-sequenced log.
>
> I guess I need to switch between each log depending on whether the
> next entry is the next chronological entry between all six.  Then on a
> per line basis I can also reject it if it matches the stuff I want to
> throw out, and substitute it if I need to, then write out to the new
> file.
>
> S.
>



-- 
Stephen Nelson-Smith
Technical Director
Atalanta Systems Ltd
www.atalanta-systems.com


Re: [Tutor] Logfile Manipulation

2009-11-09 Thread Stephen Nelson-Smith
Hi,

>> Any advice or experiences?
>>
>
> go here and download the pdf!
> http://www.dabeaz.com/generators-uk/
> Someone posted this the other day, and I went and read through it and played
> around a bit and it's exactly what you're looking for - plus it has one vs.
> slide of python vs. awk.
> I think you'll find the pdf highly useful and right on.

Looks like generators are a really good fit.  My biggest question
really is how to multiplex.

I have 6 logs per day, so I don't know how which one will have the
next consecutive entry.

I love the idea of making a big dictionary, but with 6G of data,
that's going to run me out of memory, isn't it?

S.


Re: [Tutor] Logfile Manipulation

2009-11-09 Thread Stephen Nelson-Smith
Hi,

> If you create iterators from the files that yield (timestamp, entry)
> pairs, you can merge the iterators using one of these recipes:
> http://code.activestate.com/recipes/491285/
> http://code.activestate.com/recipes/535160/

Could you show me how I might do that?

So far I'm at the stage of being able to produce loglines:

#! /usr/bin/env python
import gzip
class LogFile:
    def __init__(self, filename, date):
        self.f = gzip.open(filename, "r")
        for logline in self.f:
            self.line = logline
            self.stamp = " ".join(self.line.split()[3:5])
            if self.stamp.startswith(date):
                break

    def getline(self):
        ret = self.line
        self.line = self.f.readline()
        self.stamp = " ".join(self.line.split()[3:5])
        return ret

logs = [LogFile("a/access_log-20091105.gz", "[05/Nov/2009"),
        LogFile("b/access_log-20091105.gz", "[05/Nov/2009"),
        LogFile("c/access_log-20091105.gz", "[05/Nov/2009")]
while True:
  print [x.stamp for x in logs]
  nextline=min((x.stamp,x) for x in logs)
  print nextline[1].getline()


-- 
Stephen Nelson-Smith
Technical Director
Atalanta Systems Ltd
www.atalanta-systems.com


Re: [Tutor] Logfile Manipulation

2009-11-09 Thread Stephen Nelson-Smith
And the problem I have with the below is that I've discovered that the
input logfiles aren't strictly ordered, i.e. there is variance of a
second or so in some of the entries.

I can sort the biggest logfile (800M) using Unix sort in about 1.5
minutes on my workstation.  That's not really fast enough, with
potentially 12 other files...

Hrm...

S.

On Mon, Nov 9, 2009 at 1:35 PM, Stephen Nelson-Smith  wrote:
> Hi,
>
>> If you create iterators from the files that yield (timestamp, entry)
>> pairs, you can merge the iterators using one of these recipes:
>> http://code.activestate.com/recipes/491285/
>> http://code.activestate.com/recipes/535160/
>
> Could you show me how I might do that?
>
> So far I'm at the stage of being able to produce loglines:
>
> #! /usr/bin/env python
> import gzip
> class LogFile:
>  def __init__(self, filename, date):
>   self.f=gzip.open(filename,"r")
>   for logline in self.f:
>     self.line=logline
>     self.stamp=" ".join(self.line.split()[3:5])
>     if self.stamp.startswith(date):
>       break
>
>  def getline(self):
>    ret=self.line
>    self.line=self.f.readline()
>    self.stamp=" ".join(self.line.split()[3:5])
>    return ret
>
> logs=[LogFile("a/access_log-20091105.gz","[05/Nov/2009"),LogFile("b/access_log-20091105.gz","[05/Nov/2009"),LogFile("c/access_log-20091105.gz","[05/Nov/2009")]
> while True:
>  print [x.stamp for x in logs]
>  nextline=min((x.stamp,x) for x in logs)
>  print nextline[1].getline()
>
>
> --
> Stephen Nelson-Smith
> Technical Director
> Atalanta Systems Ltd
> www.atalanta-systems.com
>



-- 
Stephen Nelson-Smith
Technical Director
Atalanta Systems Ltd
www.atalanta-systems.com


Re: [Tutor] Logfile Manipulation

2009-11-09 Thread Stephen Nelson-Smith
On Mon, Nov 9, 2009 at 3:15 PM, Wayne Werner  wrote:
> On Mon, Nov 9, 2009 at 7:46 AM, Stephen Nelson-Smith 
> wrote:
>>
>> And the problem I have with the below is that I've discovered that the
>> input logfiles aren't strictly ordered - ie there is variance by a
>> second or so in some of the entries.
>
> Within a given set of 10 lines, is the first line and last line "in order" -

On average, in a sequence of 10 log lines, one will be out by one or
two seconds.

Here's a random slice:

05/Nov/2009:01:41:37
05/Nov/2009:01:41:37
05/Nov/2009:01:41:37
05/Nov/2009:01:41:37
05/Nov/2009:01:41:36
05/Nov/2009:01:41:37
05/Nov/2009:01:41:37
05/Nov/2009:01:41:37
05/Nov/2009:01:41:37
05/Nov/2009:01:41:37
05/Nov/2009:01:41:37
05/Nov/2009:01:41:37
05/Nov/2009:01:41:36
05/Nov/2009:01:41:37
05/Nov/2009:01:41:37
05/Nov/2009:01:41:38
05/Nov/2009:01:41:38
05/Nov/2009:01:41:37
05/Nov/2009:01:41:38
05/Nov/2009:01:41:38
05/Nov/2009:01:41:38
05/Nov/2009:01:41:38
05/Nov/2009:01:41:37
05/Nov/2009:01:41:38
05/Nov/2009:01:41:36
05/Nov/2009:01:41:38
05/Nov/2009:01:41:38
05/Nov/2009:01:41:38
05/Nov/2009:01:41:38
05/Nov/2009:01:41:39
05/Nov/2009:01:41:38
05/Nov/2009:01:41:39
05/Nov/2009:01:41:39
05/Nov/2009:01:41:39
05/Nov/2009:01:41:39
05/Nov/2009:01:41:40
05/Nov/2009:01:41:40
05/Nov/2009:01:41:41
> I don't know
> what the default python sorting algorithm is on a list, but AFAIK you'd be
> looking at a constant O(log 10)

I'm not a mathematician - what does this mean, in layperson's terms?

> log_generator = (d for d in logdata)
> mylist = # first ten values

OK

> while True:
>     try:
>         mylist.sort()

OK - sort the first 10 values.

>         nextdata = mylist.pop(0)

So the first value...

>         mylist.append(log_generator.next())

Right, this will add one more value?

>     except StopIteration:
>         print 'done'

> Or now that I look, python has a priority queue (
> http://docs.python.org/library/heapq.html ) that you could use instead. Just
> push the next value into the queue and pop one out - you give it some
> initial qty - 10 or so, and then it will always give you the smallest value.

That sounds very cool - and I see that one of the activestate recipes
Kent suggested uses heapq too.  I'll have a play.
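
[Editor's note: the push-one/pop-one pattern Wayne describes looks like this with heapq, on made-up nearly-sorted data; the primed window must be at least as large as the worst displacement:]

```python
import heapq

stream = [3, 1, 2, 5, 4, 7, 6, 9, 8, 10]  # nearly-sorted sample data

heap = stream[:5]         # prime the queue with some initial quantity
heapq.heapify(heap)

ordered = []
for value in stream[5:]:  # push one value in, pop the smallest out
    heapq.heappush(heap, value)
    ordered.append(heapq.heappop(heap))
while heap:               # drain whatever is left at the end
    ordered.append(heapq.heappop(heap))
```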

S.
-- 
Stephen Nelson-Smith
Technical Director
Atalanta Systems Ltd
www.atalanta-systems.com


[Tutor] Logfile multiplexing

2009-11-10 Thread Stephen Nelson-Smith
I have the following idea for multiplexing logfiles (ultimately into heapq):

import gzip

class LogFile:
def __init__(self, filename, date):
self.logfile = gzip.open(filename, 'r')
for logline in self.logfile:
self.line = logline
self.stamp = self.timestamp(self.line)
if self.stamp.startswith(date):
break

def timestamp(self, line):
return " ".join(self.line.split()[3:5])

def getline(self):
nextline = self.line
self.line = self.logfile.readline()
self.stamp = self.timestamp(self.line)
return nextline

The idea is that I can then do:

logs = [("log1", "[Nov/05/2009"), ("log2", "[Nov/05/2009"), ("log3",
"[Nov/05/2009"), ("log4", "[Nov/05/2009")]

I've tested it with one log (15M compressed, 211M uncompressed), and
it takes about 20 seconds to be ready to roll.

However, then I get unexpected behaviour:

~/system/tools/magpie $ python
Python 2.4.3 (#1, Jan 21 2009, 01:11:33)
[GCC 4.1.2 20071124 (Red Hat 4.1.2-42)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import magpie
>>> magpie.l
<magpie.LogFile instance at 0x...>
>>> magpie.l.stamp
'[05/Nov/2009:04:02:07 +0000]'
>>> magpie.l.getline()
'89.151.119.195 - - [05/Nov/2009:04:02:07 +0000] "GET
/service.php?s=nav&arg[]=&arg[]=home&q=ubercrumb/node%2F20812
HTTP/1.1" 200 50 "-" "-"\n'
>>> magpie.l.stamp
''
>>> magpie.l.getline()
''
>>>

I expected to be able to call getline() and get more lines...

a) What have I done wrong?
b) Is this an ok implementation?  What improvements could be made?
c) Is 20secs a reasonable time, or am I choosing a slow way to do this?

S.


Re: [Tutor] Logfile multiplexing

2009-11-10 Thread Stephen Nelson-Smith
Hi Kent,

> One error is that the initial line will be the same as the first
> response from getline(). So you should call getline() before trying to
> access a line. Also you may need to filter all lines - what if there
> is jitter at midnight, or the log rolls over before the end.

Well ultimately I definitely have to filter two logfiles per day, as
logs rotate at 0400.  Or do you mean something else?

> More important, though, you are pretty much writing your own iterator
> without using the iterator protocol. I would write this as:
>
> class LogFile:
>   def __init__(self, filename, date):
>       self.logfile = gzip.open(filename, 'r')
>       self.date = date
>
>   def __iter__(self)
>       for logline in self.logfile:
>           stamp = self.timestamp(logline)
>           if stamp.startswith(date):
>               yield (stamp, logline)
>
>   def timestamp(self, line):
>       return " ".join(self.line.split()[3:5])

Right - I think I understand that.

From here I get:

import gzip

class LogFile:
    def __init__(self, filename, date):
        self.logfile = gzip.open(filename, 'r')
        self.date = date

    def __iter__(self):
        for logline in self.logfile:
            stamp = self.timestamp(logline)
            if stamp.startswith(date):
                yield (stamp, logline)

    def timestamp(self, line):
        return " ".join(self.line.split()[3:5])

l = LogFile("/home/stephen/access_log-20091105.gz", "[04/Nov/2009")

I get:

Python 2.4.3 (#1, Jan 21 2009, 01:11:33)
[GCC 4.1.2 20071124 (Red Hat 4.1.2-42)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import kent
>>> kent.l
<kent.LogFile instance at 0x...>
>>> dir(kent.l)
['__doc__', '__init__', '__iter__', '__module__', 'date', 'logfile',
'timestamp']
>>> for line in kent.l:
...   print line
...
Traceback (most recent call last):
  File "", line 1, in ?
  File "kent.py", line 10, in __iter__
stamp = self.timestamp(logline)
  File "kent.py", line 15, in timestamp
return " ".join(self.line.split()[3:5])
AttributeError: LogFile instance has no attribute 'line'
>>> for stamp,line in kent.l:
...   print stamp,line
...
Traceback (most recent call last):
  File "", line 1, in ?
  File "kent.py", line 10, in __iter__
stamp = self.timestamp(logline)
  File "kent.py", line 15, in timestamp
return " ".join(self.line.split()[3:5])
AttributeError: LogFile instance has no attribute 'line'
>>> for stamp,logline in kent.l:
...   print stamp,logline
...
Traceback (most recent call last):
  File "", line 1, in ?
  File "kent.py", line 10, in __iter__
stamp = self.timestamp(logline)
  File "kent.py", line 15, in timestamp
return " ".join(self.line.split()[3:5])
AttributeError: LogFile instance has no attribute 'line'


> You are reading through the entire file on load because your timestamp
> check is failing. You are filtering out the whole file and returning
> just the last line. Check the dates you are supplying vs the actual
> data - they don't match.

Yes, I found that out in the end!  Thanks!

S.


Re: [Tutor] Logfile multiplexing

2009-11-10 Thread Stephen Nelson-Smith
Hi,

> probably that line should have been " ".join(line.split()[3:5]), i.e.
> no self. The line variable is a supplied argument.

Now I get:


Python 2.4.3 (#1, Jan 21 2009, 01:11:33)
[GCC 4.1.2 20071124 (Red Hat 4.1.2-42)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import kent
>>> kent.l
<kent.LogFile instance at 0x...>
>>> for a, b in kent.l
  File "", line 1
for a, b in kent.l
 ^
SyntaxError: invalid syntax
>>> for a, b in kent.l:
...   print a, b
...
Traceback (most recent call last):
  File "", line 1, in ?
  File "kent.py", line 11, in __iter__
if stamp.startswith(date):
NameError: global name 'date' is not defined

How does __iter__ know about date?  Should that be self.date?

S.


Re: [Tutor] Logfile multiplexing

2009-11-10 Thread Stephen Nelson-Smith
Hello,

On Tue, Nov 10, 2009 at 2:00 PM, Luke Paireepinart
 wrote:
>
>> Traceback (most recent call last):
>>  File "", line 1, in ?
>>  File "kent.py", line 11, in __iter__
>>    if stamp.startswith(date):
>> NameError: global name 'date' is not defined
>>
>> How does __iter__ know about date?  Should that be self.date?
>
> Yes.  self.date is set in the constructor.
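
[Editor's note: with both fixes from this thread applied (timestamp() using its line argument rather than self.line, and __iter__ using self.date rather than date), the class works as intended. A sketch in modern Python, with a hypothetical sample file to show the iteration:]

```python
import gzip
import os
import tempfile

class LogFile:
    def __init__(self, filename, date):
        self.logfile = gzip.open(filename, 'rt')  # 'rt': text mode (Python 3)
        self.date = date

    def __iter__(self):
        for logline in self.logfile:
            stamp = self.timestamp(logline)
            if stamp.startswith(self.date):       # fix: self.date, not date
                yield (stamp, logline)

    def timestamp(self, line):
        return " ".join(line.split()[3:5])        # fix: line, not self.line

# made-up sample data to demonstrate the filtering iteration
path = os.path.join(tempfile.mkdtemp(), 'log.gz')
with gzip.open(path, 'wt') as f:
    f.write('1.2.3.4 - - [05/Nov/2009:04:02:07 +0000] "GET / HTTP/1.1" 200 50\n')
    f.write('1.2.3.4 - - [06/Nov/2009:01:00:00 +0000] "GET /x HTTP/1.1" 200 50\n')

entries = list(LogFile(path, '[05/Nov/2009'))
```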

OK, so now I've given it the full load of logs:

>>> for time, entry in kent.logs:
...   print time, entry
...
Traceback (most recent call last):
  File "", line 1, in ?
ValueError: too many values to unpack

How do I get around this?!

S.


Re: [Tutor] Logfile multiplexing

2009-11-10 Thread Stephen Nelson-Smith
On Tue, Nov 10, 2009 at 3:48 PM, Stephen Nelson-Smith
 wrote:

> OK, so now i've given it the full load of logs:
>
>>>> for time, entry in kent.logs:
> ...   print time, entry
> ...
> Traceback (most recent call last):
>  File "", line 1, in ?
> ValueError: too many values to unpack
>
> How do I get around this?!

Erm, and now it's failing with only one logfile...

Code here:

http://pastebin.ca/1665013

S.


Re: [Tutor] Logfile multiplexing

2009-11-10 Thread Stephen Nelson-Smith
On Tue, Nov 10, 2009 at 3:59 PM, Stephen Nelson-Smith
 wrote:
> On Tue, Nov 10, 2009 at 3:48 PM, Stephen Nelson-Smith
>  wrote:
>
>> OK, so now i've given it the full load of logs:
>>
>>>>> for time, entry in kent.logs:
>> ...   print time, entry
>> ...
>> Traceback (most recent call last):
>>  File "", line 1, in ?
>> ValueError: too many values to unpack
>>
>> How do I get around this?!
>
> Erm, and now it's failing with only one logfile...
>
> Code here:
>
> http://pastebin.ca/1665013

OK - me being dumb.

So what I want to do is be able to multiplex the files - ie read the
next line of all 12 files at once, filter them accordingly, and then
write them out to one combined file.

My old code did this:

min((x.stamp, x) for x in logs)

What's the best way to do this now I'm using an iterable LogFile class?
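
[Editor's note: with an iterable LogFile yielding (stamp, line) tuples, the min()-per-step bookkeeping can be replaced by a lazy merge. A sketch using heapq.merge (Python 2.6+), with plain generators standing in for the log files:]

```python
import heapq

def fake_log(entries):
    # stands in for an iterable LogFile yielding (timestamp, line) pairs
    for entry in entries:
        yield entry

logs = [fake_log([('t1', 'a'), ('t3', 'b')]),
        fake_log([('t2', 'c'), ('t4', 'd')])]

# merge compares the (stamp, line) tuples lazily, one pending
# item per input, so nothing is read into memory up front
combined = [line for stamp, line in heapq.merge(*logs)]
```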

S.


Re: [Tutor] Logfile multiplexing

2009-11-11 Thread Stephen Nelson-Smith
Hi Kent,

> See the Python Cookbook recipes I referenced earlier.
> http://code.activestate.com/recipes/491285/
> http://code.activestate.com/recipes/535160/
>
> Note they won't fix up the jumbled ordering of your files but I don't
> think they will break from it either...

That's exactly the problem.  I do need the end product to be in
order.  The problem is that with my current design I'm still getting
stuff out of sync.  What I do at present is this:

Each of these columns is a log file (logfile A, B C D), with a number
of entries, slightly out of order.

1  1  1  1
2  2  2  2
3  3  3  3
A  B  C  D  ...

I currently take a slice through all (12) logs, and stick them in a
priority queue, and pop them off in order.  The problem comes that the
next slice could easily contain timestamps before the entries in  the
previous slice.  So I either need some kind of lookahead capability,
or I need to be feeding the queue one at a time, and hope the queue is
of sufficient size to cover the delta between the various logs.  It
all feels a bit brittle and wrong.

I don't really want to admit defeat and have a cron job sort the logs
before entry.  Anyone got any other ideas?
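
[Editor's note: one way to get that lookahead without sorting whole files is a buffered re-sort generator: it holds a small heap of pending entries and always emits the smallest. It is correct under the assumption that no entry is displaced by more than the window size, which fits the observed one-to-two-second jitter:]

```python
import heapq

def resort(iterable, window=10):
    # Re-order a nearly-sorted stream: keep `window` pending items in a
    # heap, always emitting the smallest. Correct as long as no item is
    # more than `window` positions out of place (an assumption).
    heap = []
    for item in iterable:
        heapq.heappush(heap, item)
        if len(heap) > window:
            yield heapq.heappop(heap)
    while heap:              # drain the buffer at end of input
        yield heapq.heappop(heap)

jumbled = [1, 2, 4, 3, 5, 7, 6, 8]   # made-up slightly-out-of-order data
smoothed = list(resort(jumbled, window=3))
```

Feeding each log through resort() before the merge keeps everything streaming, with memory bounded by the window size per file.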

Thanks all - I'm really learning a lot.

S.


Re: [Tutor] Logfile multiplexing

2009-11-11 Thread Stephen Nelson-Smith
Hi,

On Wed, Nov 11, 2009 at 10:05 AM, Alan Gauld  wrote:
> "Stephen Nelson-Smith"  wrote
>>
>> I don't really want to admit defeat and have a cron job sort the logs
>> before entry.  Anyone got any other ideas?
>
> Why would that be admitting defeat?

Well, it mean admitting defeat on solving the problem in python.  Yes
in practical terms, I should probably preprocess the data, but as a
programming exercise, learning how to sort a number of files into one
is something I'd like to crack.

Maybe the real lesson here is knowing which battles to fight, and a
good developer uses the right tools for the job.

S.


[Tutor] Iterator Merging

2009-11-11 Thread Stephen Nelson-Smith
So, following Kent and Alan's advice, I've preprocessed my data, and
have code that produces 6 LogFile iterator objects:

>>> import magpie
>>> magpie.logs[1]
<magpie.LogFile instance at 0x...>
>>> dir(magpie.logs[1])
['__doc__', '__init__', '__iter__', '__module__', 'date', 'logfile',
'timestamp']

>>> for timestamp, entry in itertools.islice(magpie.logs[1], 3):
...   print timestamp, entry
...
[05/Nov/2009:04:02:13 +0000] 192.168.41.107 - - [05/Nov/2009:04:02:13
+0000] "GET 
http://sekrit.com/taxonomy/term/27908?page=111&item_884=1&year=66&form_id=dynamic_build_learning_objectives_form&level=121
HTTP/1.1" 200 - "-" "Mozilla/5.0 (compatible; Googlebot/2.1;
+http://www.google.com/bot.html)"
[05/Nov/2009:04:02:13 +0000] 66.249.165.22 - - [05/Nov/2009:04:02:13
+0000] "GET 
/taxonomy/term/27908?page=111&item_884=1&year=66&form_id=objectives_form&level=121
HTTP/1.1" 200 28736 "-" "Mozilla/5.0 (compatible; Googlebot/2.1;
+http://www.google.com/bot.html)"
[05/Nov/2009:04:02:15 +0000] 162.172.185.126 - - [05/Nov/2009:04:02:15
+0000] "GET 
http://sekrit.com/sites/all/themes/liszt/images/backgrounds/grad_nav_5_h3.gif
HTTP/1.1" 304 0 "-" "Mozilla/4.0 (compatible;)"

This is great.

So I have a list of 6 of these iterator objects.

Kent mentioned feeding them into an iterator merger.  I've got the
iterator merger in place too:

>>> from imerge import imerge
>>> imerge
<function imerge at 0x...>
>>> imerge([1,3,4],[2,7])
<generator object at 0x...>
>>> list(imerge([1,3,4],[2,7]))
[1, 2, 3, 4, 7]

What I'm trying to work out is how to feed the data I have - 6 streams
of timestamp, entry into imerge.

How can I do this?
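
[Editor's note: since imerge takes any number of iterables as separate positional arguments (as the imerge([1,3,4],[2,7]) call above shows), a list of six iterators can be unpacked into the call with the * operator. A sketch, with heapq.merge standing in for the recipe's imerge (it has the same calling convention), and hypothetical (stamp, entry) streams:]

```python
import heapq

log_a = iter([('[05/Nov/2009:04:02:13 +0000]', 'entry a1'),
              ('[05/Nov/2009:04:02:15 +0000]', 'entry a2')])
log_b = iter([('[05/Nov/2009:04:02:14 +0000]', 'entry b1')])

logs = [log_a, log_b]
# the * unpacks the list into separate arguments: merge(log_a, log_b)
merged = [entry for stamp, entry in heapq.merge(*logs)]
```

The same unpacking works for the recipe: imerge(*logs).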

S.

-- 
Stephen Nelson-Smith
Technical Director
Atalanta Systems Ltd
www.atalanta-systems.com


[Tutor] Iterable Understanding

2009-11-13 Thread Stephen Nelson-Smith
I think I'm having a major understanding failure.

So having discovered that my Unix sort breaks on the last day of the
month, I've gone ahead and implemented a per log search, using heapq.

I've tested it with various data, and it produces a sorted logfile, per log.

So in essence this:

logs = [ LogFile( "/home/stephen/qa/ded1353/quick_log.gz", "04/Nov/2009" ),
         LogFile( "/home/stephen/qa/ded1408/quick_log.gz", "04/Nov/2009" ),
         LogFile( "/home/stephen/qa/ded1409/quick_log.gz", "04/Nov/2009" ) ]

Gives me a list of LogFiles - each of which has a getline() method,
which returns a tuple.

I thought I could merge iterables using Kent's recipe, or just with
heapq.merge()

But how do I get from a method that can produce a tuple, to some
mergable iterables?

for log in logs:
  l = log.getline()
  print l

This gives me three loglines.  How do I get more?  Other than while True:

Of course tuples are iterables, but that doesn't help, as I want to
sort on timestamp... so a list of tuples would be OK.  But how do I
construct that, bearing in mind I am trying not to use up too much
memory?

I think there's a piece of the jigsaw I just don't get.  Please help!

The code in full is here:

import gzip, heapq, re

class LogFile:
    def __init__(self, filename, date):
        self.logfile = gzip.open(filename, 'r')
        for logline in self.logfile:
            self.line = logline
            self.stamp = self.timestamp(self.line)
            if self.stamp.startswith(date):
                break
        self.initialise_heap()

    def timestamp(self, line):
        stamp = re.search(r'\[(.*?)\]', line).group(1)
        return stamp

    def initialise_heap(self):
        initlist = []
        self.heap = []
        for x in xrange(10):
            self.line = self.logfile.readline()
            self.stamp = self.timestamp(self.line)
            initlist.append((self.stamp, self.line))
        heapq.heapify(initlist)
        self.heap = initlist

    def getline(self):
        self.line = self.logfile.readline()
        stamp = self.timestamp(self.line)
        heapq.heappush(self.heap, (stamp, self.line))
        pop = heapq.heappop(self.heap)
        return pop

logs = [ LogFile( "/home/stephen/qa/ded1353/quick_log.gz", "04/Nov/2009" ),
         LogFile( "/home/stephen/qa/ded1408/quick_log.gz", "04/Nov/2009" ),
         LogFile( "/home/stephen/qa/ded1409/quick_log.gz", "04/Nov/2009" ) ]


Re: [Tutor] Iterable Understanding

2009-11-14 Thread Stephen Nelson-Smith
Hi,

>> for log in logs:
>>  l = log.getline()
>>  print l
>>
>> This gives me three loglines.  How do I get more?  Other than while True:
>>
> I presume that what you want is to get all lines from each log.

Well... what I want to do is create a single, sorted list by merging a
number of other sorted lists.

> for log in logs:
>     for line in log.getlines():
>     print l

This gives me three lines.

S.


Re: [Tutor] Iterable Understanding

2009-11-14 Thread Stephen Nelson-Smith
Gah! Failed to reply to all again!

On Sat, Nov 14, 2009 at 1:43 PM, Stephen Nelson-Smith
 wrote:
> Hi,
>> I'm not 100% sure to understand your needs and intention; just have a try. 
>> Maybe what you want actually is rather:
>>
>> for log in logs:
>>  for line in log:
>>    print l
>
> Assuming you meant print line.  This also gives me just three lines.
>
>> Meaning your log objects need be iterable. To do this, you must have an 
>> __iter__ method that would surely simply return the object's getline (or 
>> maybe replace it alltogether).
>
> I'm not sure I fully understand how that works, but yes, I created an
> __iter__ method:
>
>   def __iter__(self):
>      self.line=self.logfile.readline()
>      stamp=self.timestamp(self.line)
>      heapq.heappush(self.heap, (stamp, self.line))
>      pop = heapq.heappop(self.heap)
>      yield pop
>
> But I still don't see how I can iterate over it... I must be missing 
> something.
>
> I thought that perhaps I could make a generator function:
>
> singly = ((x.stamp, x.line) for x in logs)
> for l in singly:
>  print
>
> But this doesn't seem to help either.
>
>
>> Then when walking the log with for...in, python will silently call getline 
>> until error. This means getline must raise StopIteration when the log is 
>> "empty" and __iter__ must "reset" it.
>
> Yes, but for how long?  Having added the __iter__ method, if I now do:
>
> for log in logs:
>   for line in log:
>      print line
>
> I still get only three results.
>
>> Another solution may be to subtype "file", for a file is precisely an 
>> iterator over lines; and you really get your data from a file.
>
> I'm very sorry - I'm not sure I understand.  I get that a file is
> iterable by definition, but I'm not sure how subtyping it helps.
>
>> Simply (sic), there must some job done about this issue of time stamps 
>> (haven't studied in details). Still, i guess this track may be worth an 
>> little study.
>
> Sorry for not understanding :(
>
>> Once you get logs iterable, you may subtype list for your overall log 
>> collection and set it an __iter__ method like:
>>
>>    for log in self:
>>        for line in log:
>>            yield line
>>
>> (The trick is not from me.)
>
> OK - I make the logs iterable by giving them an __iter__ method - I
> get that.  I just don't know what you mean by 'subtype list'.
>
>> Then you can write:
>>    for line in my_log_collection
>
> That sounds useful
>
> S.
>



-- 
Stephen Nelson-Smith
Technical Director
Atalanta Systems Ltd
www.atalanta-systems.com


Re: [Tutor] Iterable Understanding

2009-11-14 Thread Stephen Nelson-Smith
Hi Wayne,

> Just write your own merge:
> (simplified and probably inefficient and first thing off the top of my head)
> newlist = []
> for x, y, z in zip(list1, list2, list3):

I think I need something like izip_longest, don't I, since the lists
will be of varied length?

Also, where do these lists come from?  They can't go in memory -
they're much too big.  This is why I felt using some kind if generator
was the right way - I can produce 3 (or 12) sets of tuples... i just
need to work out how to merge them.

>     if y > x < z:
>         newlist.append(x)
>     elif x > y < z:
>         newlist.append(y)
>     elif x > z < y:
>         newlist.append(z)
> I'm pretty sure that should work although it's untested.

Well, no it won't work.  The lists are in time order, but they won't
match up.  One log may have entries at the same numerical position (ie
the 10th log entry) but earlier than the entries on the previous
lines.  To give a simple example:

List 1      List 2      List 3
(1, cat)    (2, fish)   (1, cabbage)
(4, dog)    (5, pig)    (2, ferret)
(5, phone)  (6, horse)  (3, sausage)

Won't this result in the lowest number *per row* being added to the
new list?  Or am I misunderstanding how it works?

S.
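Since the full lists are too big for memory, it is worth noting that heapq.merge (available from 2.6) consumes any number of already-sorted iterables lazily, holding only one pending item per input. A minimal sketch — the three small lists here stand in for the sorted log streams:

```python
import heapq

# Illustrative stand-ins for three time-sorted log streams; in the real
# case each would be a generator reading a file lazily.
list1 = [(1, 'cat'), (4, 'dog'), (5, 'phone')]
list2 = [(2, 'fish'), (5, 'pig'), (6, 'horse')]
list3 = [(1, 'cabbage'), (2, 'ferret'), (3, 'sausage')]

# heapq.merge keeps only one pending item per input, so arbitrarily
# large sorted streams can be merged without loading them into memory.
result = list(heapq.merge(iter(list1), iter(list2), iter(list3)))
```

Unlike the zip approach, this compares across all inputs at every step, so the smallest timestamp overall always comes out next.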


[Tutor] Should a beginner learn Python 3.x

2009-11-14 Thread Stephen Nelson-Smith
My brother in law is learning python.  He's downloaded 3.1 for
Windows, and is having a play.  It's already confused him that print
"hello world" gives a syntax error

He's an absolute beginner with no programming experience at all.  I
think he might be following 'Python Programming for the Absolute
Beginner", or perhaps some online guides.  Should I advise him to
stick with 2.6 for a bit, since most of the material out  there will
be for 2.x?  Or since he's learning from scratch, should he jump
straight to 3.x?  In which case what can you recommend for him to work
through - I must stress he has absolutely no clue at all about
programming, no education beyond 16 yrs old, but is keen to learn.

S.

-- 
Stephen Nelson-Smith
Technical Director
Atalanta Systems Ltd
www.atalanta-systems.com


Re: [Tutor] Iterable Understanding

2009-11-15 Thread Stephen Nelson-Smith
Hi Martin,

Thanks for a very detailed response.  I'm about to head out, so I
can't put your ideas into practice yet, or get down to studying for a
while.

However, I had one thing I felt I should respond to.

> It's unclear from your previous posts (to me at least) -- are the
> individual log files already sorted, in chronological order?

Sorry if I didn't make this clear.  No they're not.  They are *nearly*
sorted - ie they're out by a few seconds, every so often, but they are
in order at the level of minutes, or even in the order of a few
seconds.

It was precisely because of this that I decided, following Alan's
advice, to pre-filter the data.  I compiled a unix sort command to do
this, and had a solution I was happy with, based on Kent's iterator
example, fed into heapq.merge.

However, I've since discovered that the unix sort isn't reliable on
the last and first day of the month.  So, I decided I'd need to sort
each logfile first.  The code at the start of *this* thread does this
- it uses a heapq per logfile and is able to produce a tuple of
timestamp, logline, which will be in exact chronological order.  What
I want to do is merge this output into a file.

I think I probably have enough to be getting on with, but I'll be sure
to return if I still have questions after studying the links you
provided, and trying the various suggestions people have made.

Thanks so very much!

S.


Re: [Tutor] Unexpected iterator

2009-11-15 Thread Stephen Nelson-Smith
> To upack your variables a and b you need an iterable object on the right
> side, which returns you exactly 2 variables

What does 'unpack' mean?  I've seen a few Python errors about packing
and unpacking.  What does it mean?

S.
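For the record, "unpacking" means binding each element of an iterable to one name in a single assignment; "packing" is the reverse, collecting several values into a tuple. A quick illustration:

```python
# Sequence unpacking: the right-hand side must yield exactly as many
# items as there are names on the left.
a, b = (1, 2)            # two items, two names
x, y, z = "abc"          # any iterable works, including strings

# A mismatch raises ValueError ("too many values to unpack", or
# "need more than N values to unpack" on older Pythons).
try:
    p, q = (1, 2, 3)
    mismatch_raised = False
except ValueError:
    mismatch_raised = True
```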


Re: [Tutor] Iterable Understanding

2009-11-15 Thread Stephen Nelson-Smith
Hi Marty,

Thanks for a very lucid reply!

> Well, you haven't described the unreliable behavior of unix sort so I
> can only guess, but I assume you know about the --month-sort (-M) flag?

Nope - but I can look it up.  The problem I have is that the source
logs are rotated at 0400 hrs, so I need two days of logs in order to
extract 24 hrs from  to 2359 (which is the requirement).  At
present, I preprocess using sort, which works fine as long as the
month doesn't change.

> import gzip
> from heapq import heappush, heappop, merge

Is this a preferred method, rather than just 'import heapq'?

> def timestamp(line):
>    # replace with your own timestamp function
>    # this appears to work with the sample logs I chose
>    stamp = ' '.join(line.split(' ', 3)[:-1])
>    return time.strptime(stamp, '%b %d %H:%M:%S')

I have some logfile entries with multiple IP addresses, so I can't
split using whitespace.

> class LogFile(object):
>    def __init__(self, filename, jitter=10):
>        self.logfile = gzip.open(filename, 'r')
>        self.heap = []
>        self.jitter = jitter
>
>    def __iter__(self):
>        while True:
>            for logline in self.logfile:
>                heappush(self.heap, (timestamp(logline), logline))
>                if len(self.heap) >= self.jitter:
>                    break

Really nice way to handle the batching of the initial heap - thank you!

>            try:
>                yield heappop(self.heap)
>            except IndexError:
>                raise StopIteration
>
> logs = [
>    LogFile("/home/stephen/qa/ded1353/quick_log.gz"),
>    LogFile("/home/stephen/qa/ded1408/quick_log.gz"),
>    LogFile("/home/stephen/qa/ded1409/quick_log.gz")
> ]
>
> merged_log = merge(*logs)
> with open('/tmp/merged_log', 'w') as output:
>    for stamp, line in merged_log:
>        output.write(line)

Oooh, I've never used 'with' before.  In fact I am currently
restricted to 2.4 on the machine on whch this will run.  That wasn't a
problem for heapq.merge, as I was just able to copy the code from the
2.6 source.  Or I could use Kent's recipe.

> ... which probably won't preserve the order of log entries that have the
> same timestamp, but if you need it to -- should be easy to accommodate.

I don't think  that is necessary, but I'm curious to know how...

Now... this is brilliant.  What it doesn't do that mine does, is
handle date - mine checks for whether it starts with the appropriate
date, so we can extract 24 hrs of data.  I'll need to try to include
that.  Also, I need to do some filtering and gsubbing, but I think I'm
firmly on the right path now, thanks to you.

> HTH,

Very much indeed.

S.


[Tutor] GzipFile has no attribute '__exit__'

2009-11-16 Thread Stephen Nelson-Smith
I'm trying to write a gzipped file on the fly:

merged_log = merge(*logs)

with gzip.open('/tmp/merged_log.gz', 'w') as output:
    for stamp, line in merged_log:
        output.write(line)

But I'm getting:

Traceback (most recent call last):
  File "./magpie.py", line 72, in 
with gzip.open('/tmp/merged_log.gz', 'w') as output:
AttributeError: GzipFile instance has no attribute '__exit__'

What am I doing wrong, and how do I put it right?

S.
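The error arises because GzipFile only gained __enter__/__exit__ (context-manager support) in Python 2.7/3.1; on older versions contextlib.closing supplies them by calling close() on exit. A sketch using a throwaway temp file in place of the original path:

```python
import gzip
import os
import tempfile
from contextlib import closing

# closing() turns any object with a close() method into a context
# manager -- which GzipFile lacked before Python 2.7/3.1.
path = os.path.join(tempfile.mkdtemp(), 'merged_log.gz')  # throwaway path
with closing(gzip.open(path, 'wb')) as output:
    output.write(b'one merged line\n')

with closing(gzip.open(path, 'rb')) as reread:
    data = reread.read()
```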


Re: [Tutor] I love python / you guys :)

2009-11-16 Thread Stephen Nelson-Smith
Hello all,

On Mon, Nov 16, 2009 at 6:58 AM, Stefan Lesicnik  wrote:
> hi,
>
> Although not a question, i just want to tell you guys how awesome you are!

+1

I've been a happy member of this list for years, even though I've
taken a 3 year Ruby sabbatical!

I've always found it to be full of invaluable advice, and is, in my
opinion, a real gem in Python's crown.  It's one of the reasons I feel
so confident in recommending python to anyone - they can be guaranteed
a friendly and informational welcome on this list.

Thank you very much - it is greatly appreciated.

S.


Re: [Tutor] proxy switcher - was Re: I love python / you guys :)

2009-11-16 Thread Stephen Nelson-Smith
Hi,

>> When i use our company's LAN i set my proxy variable by hand in .bashrc.
>> There are 4 files to insert proxy variable:
>>
>> in ~/.bashrc, /root/.bashrc, /etc/wgetrc and /etc/apt/apt.conf.
>>
>> The last one is actually rename e.g. mv to apt.conf to activate proxy and mv
>> to apt.conf.bak to deactivate. The proxy variable is something like this
>>
>> export http_proxy=http://username:passw...@proxy:port
>> ftp_proxy=$http_proxy
>>
>> To activate i uncomment them then source .bashrc. To deactivate i put back
>> the comment sign. I do it all in vim e.g. vim -o the-3-files-above. For
>> apt.conf see rename above. I deactivate because i have another internet
>> connection option via 3G usb modem. But thats another story.
>>
>> I will do this myself in python so please show me the way. Surely this can
>> be done.

Here's what I knocked up over lunch.  It doesn't cover the moving of
the file, I don't like that it's deep-nested, and I've not tested it,
but I welcome criticism and feedback:

files = ['file1', 'file2', 'file3', 'file4']
settings = ['export http_proxy=', 'ftp_proxy=']

for file in files:
  with open(file, 'rw') as file:
    for line in file:
      for setting in settings:
        if setting in line:
          if line[0] == '#':
            line = line[1:]
          else:
            line = '#' + line
          output.write(line)

S.


Re: [Tutor] proxy switcher - was Re: I love python / you guys :)

2009-11-16 Thread Stephen Nelson-Smith
Evening,
> Yes, you can, but not this way.  I'm guessing the op was changing his mind
> back and forth, between having two files, one for reading and one for
> writing, and trying to do it in place.  The code does neither/both.

Well, just neither I think!  I didn't check if 'rw' was possible.  My
plan was to read the file and write to the same file as the change was
made, to avoid having to use temporary files and os.move.  But I
wasn't near a machine with python on it, so it was untested.

> Take a look at fileinput.FileInput()  with the inplace option.  It makes it
> convenient to update text files "in place" by handling all the temp file
> copying and such.  It even handles iterating through the list of files.

Will definitely look into that.

> Good point about 'file' as its a built-in name.  If the code ever has to use
> the std meaning, you have a problem.  Worse, it's unreadable as is.

Thanks for that hint.  In what way is it unreadable?  Is it that the
ambiguity of the name obscures the intent?

>> Also, you didn't define 'output' anywhere.  Is this an implicit
>> declaration

No, just a dumb mistake.

> Another potential  bug with the code is if more than one "setting" could
> appear in a line. It would change the line for an odd number, and not for an
> even number of matches.

Not sure I follow that.  From the OP's description, it appeared he
would be entering these lines in.  I figured it was safe to trust the
OP not to put in duplicate data.  Maybe defensively I should check for
it anyway?

> Also, the nesting of output.write() is wrong, because file position isn't
> preserved, and random access in a text file isn't a good idea anyway.

Could you expand on this?

>  But
> there's not much point in debugging that till the OP decides how he's going
> to handle the updates, via new files and copying or renaming, or via
> inputfile.

I'll look up inputfile, and try again :)

S.
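The fileinput approach mentioned above, sketched with a throwaway file: with inplace=True, anything printed while iterating replaces the file's contents, and the module handles the temp-file copying itself. The toggle_comment helper and the sample contents are invented for illustration, and this is written for modern Python (print as a function):

```python
import fileinput
import os
import tempfile

def toggle_comment(path, needle):
    # With inplace=True, whatever is printed while iterating replaces
    # the file's contents; fileinput does the temp-file dance itself.
    for line in fileinput.input(files=[path], inplace=True):
        if needle in line:
            line = line[1:] if line.startswith('#') else '#' + line
        print(line, end='')

path = os.path.join(tempfile.mkdtemp(), 'bashrc')  # throwaway file
with open(path, 'w') as f:
    f.write('#export http_proxy=http://proxy:3128\nunrelated line\n')

toggle_comment(path, 'http_proxy=')
with open(path) as f:
    result = f.read()
```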


Re: [Tutor] Do you use unit testing?

2009-11-16 Thread Stephen Nelson-Smith
Great question,

I learned TDD with PyUnit, but since moved to using Ruby, and have
been spoilt by rspec and even cucumber.  My instincts are to write
things test first, but so far I'm not finding PyUnit easy enough to
get going.

Once I get into the groove, I find it's a wonderful way to work - and
avoids some of the drivel I've come up with in the last few days.

As a discipline - work out what we want to test, write the test, watch
it fail, make it pass - I find this a very productive way to think and
work.

S.

On Mon, Nov 16, 2009 at 8:54 PM, Modulok  wrote:
> List,
>
> A general question:
>
> How many of you guys use unit testing as a development model, or at
> all for that matter?
>
>  I just starting messing around with it and it seems painfully slow to
> have to write a test for everything you do. Thoughts, experiences,
> pros, cons?
>
> Just looking for input and different angles on the matter, from the
> Python community.
> -Modulok-
> ___
> Tutor maillist  -  tu...@python.org
> To unsubscribe or change subscription options:
> http://mail.python.org/mailman/listinfo/tutor
>



-- 
Stephen Nelson-Smith
Technical Director
Atalanta Systems Ltd
www.atalanta-systems.com
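For the red/green rhythm described above, the stdlib unittest module is enough on its own. A minimal, self-contained round trip — add() and the test names are invented, and the suite is run programmatically rather than via unittest.main() so the result can be inspected:

```python
import unittest

def add(a, b):
    return a + b

class TestAdd(unittest.TestCase):
    # The red/green loop: write these first, watch them fail, then
    # implement add() until they pass.
    def test_small_numbers(self):
        self.assertEqual(add(2, 2), 4)

    def test_negative(self):
        self.assertEqual(add(-1, 1), 0)

suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestAdd)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```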


[Tutor] Use of 'or'

2009-11-17 Thread Stephen Nelson-Smith
A friend of mine mentioned what he called the 'pythonic' idiom of:

print a or b

Isn't this a 'clever' kind or ternary - an if / else kind of thing?

I don't warm to it... should I?

S.
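For context, `a or b` evaluates to a if a is truthy and to b otherwise, so `print a or b` prints a fallback when a is empty. The catch: any falsy a (0, '', [], None) triggers the fallback, which an explicit conditional expression avoids. A quick demonstration:

```python
# 'or' returns the first truthy operand, not a boolean.
name = '' or 'anonymous'     # '' is falsy, so the fallback wins
count = 0 or 10              # surprise: a legitimate 0 is replaced too

# An explicit conditional only falls back on None.
val = 0
explicit = val if val is not None else 10
```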


[Tutor] Readable date arithmetic

2009-11-18 Thread Stephen Nelson-Smith
I have the following method:

def get_log_dates(the_date_we_want_data_for):
  t = time.strptime(the_date_we_want_data_for, '%Y%m%d')
  t2 = datetime.datetime(*t[:-2])
  extra_day = datetime.timedelta(days=1)
  t3 = t2 + extra_day
  next_log_date = t3.strftime('%Y%m%d')
  return (the_date_we_want_data_for, next_log_date)

Quite apart from not much liking the t[123] variables, does date
arithmetic really need to be this gnarly?

How could I improve the above, especially from a readability
perspective?  Or is it ok?

S.
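If Python >= 2.5 is available (the thread elsewhere mentions being stuck on 2.4, where the time-module detour is needed), datetime.strptime removes the intermediate variables entirely. A sketch keeping the original names:

```python
import datetime

def get_log_dates(the_date_we_want_data_for):
    # Parse, add a day, format back -- no t/t2/t3 intermediates.
    day = datetime.datetime.strptime(the_date_we_want_data_for, '%Y%m%d')
    next_day = day + datetime.timedelta(days=1)
    return (the_date_we_want_data_for, next_day.strftime('%Y%m%d'))

pair = get_log_dates('20091105')
```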


[Tutor] Why are these results different?

2009-11-19 Thread Stephen Nelson-Smith
I'm seeing different behaviour between code that looks to be the same.
 It obviously isn't the same, so I've misunderstood something:


>>> log_names
('access', 'varnish')
>>> log_dates
('20091105', '20091106')
>>> logs = itertools.chain.from_iterable(glob.glob('%sded*/%s*%s.gz'
...     % (source_dir, log, date)) for log in log_names for date in log_dates)
>>> for log in logs:
...   print log
...
/Volumes/UNTITLED 1/ded1/access_log-20091105.gz
/Volumes/UNTITLED 1/ded2/access_log-20091105.gz
/Volumes/UNTITLED 1/ded3/access_log-20091105.gz
/Volumes/UNTITLED 1/ded1/access_log-20091106.gz
/Volumes/UNTITLED 1/ded2/access_log-20091106.gz
/Volumes/UNTITLED 1/ded3/access_log-20091106.gz
/Volumes/UNTITLED 1/ded1/varnishncsa.log-20091105.gz
/Volumes/UNTITLED 1/ded2/varnishncsa.log-20091105.gz
/Volumes/UNTITLED 1/ded3/varnishncsa.log-20091105.gz
/Volumes/UNTITLED 1/ded1/varnishncsa.log-20091106.gz
/Volumes/UNTITLED 1/ded2/varnishncsa.log-20091106.gz
/Volumes/UNTITLED 1/ded3/varnishncsa.log-20091106.gz

However:

for date in log_dates:
  for log in log_names:
    logs = itertools.chain.from_iterable(
        glob.glob('%sded*/%s*%s.gz' % (source_dir, log, date)))

Gives me one character at a time when I iterate over logs.

Why is this?

And how, then, can I make the first more readable?

S.
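The difference comes from what chain.from_iterable is handed: it flattens exactly one level. In the first version the argument is a generator of lists, so flattening yields filenames; in the second it is a single list of strings, and iterating a string yields its characters. A small stand-in demonstration (the filenames are invented):

```python
from itertools import chain

# Like the working version: a sequence of glob result lists.
lists_of_names = (['a.gz', 'b.gz'], ['c.gz'])
# Like the broken version: a single glob result (a list of strings).
one_list_of_names = ['a.gz', 'b.gz']

flattened_files = list(chain.from_iterable(lists_of_names))
flattened_chars = list(chain.from_iterable(one_list_of_names))
```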


[Tutor] Replace try: except: finally:

2009-11-20 Thread Stephen Nelson-Smith
I need to make some code Python 2.4 compliant... the only thing I see
is use of try: except: finally:

To make this valid, I think I need to do a try: finally: and next try:
except: inside.  Is this correct?

The code has;

try:
    ...
    ...
    ...
except SystemExit:
    raise
except KeyboardInterrupt:
    if state.output.status:
        print >> sys.stderr, "\nStopped."
    sys.exit(1)
except:
    sys.excepthook(*sys.exc_info())
    sys.exit(1)
finally:
    for key in connections.keys():
        if state.output.status:
            print "Disconnecting from %s..." % denormalize(key),
        connections[key].close()
    if state.output.status:
        print "done."

How should I replace this?

S.



-- 
Stephen Nelson-Smith
Technical Director
Atalanta Systems Ltd
www.atalanta-systems.com
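Yes — before Python 2.5, try/except/finally cannot be combined in a single statement; the finally goes on an outer try and the excepts on an inner one. A stripped-down sketch of the shape, with the real bodies replaced by stand-in callables:

```python
def run(work, cleanup):
    # Python 2.4 form: finally on an outer try, excepts on an inner one.
    try:
        try:
            work()
        except KeyboardInterrupt:
            return 'interrupted'
        except Exception:
            return 'failed'
    finally:
        cleanup()        # runs on success, failure, and interrupt alike
    return 'ok'

log = []
ok = run(lambda: log.append('did work'), lambda: log.append('cleaned up'))
bad = run(lambda: 1 / 0, lambda: log.append('cleaned up again'))
```

Note that the finally clause fires even when an inner except returns early.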


Re: [Tutor] Iterable Understanding

2009-11-23 Thread Stephen Nelson-Smith
Martin,

>    def __iter__(self):
>        while True:
>            for logline in self.logfile:
>                heappush(self.heap, (timestamp(logline), logline))
>                if len(self.heap) >= self.jitter:
>                    break
>            try:
>                yield heappop(self.heap)
>            except IndexError:
>                raise StopIteration

In this __iter__ method, why are we wrapping a for loop in a while True?

S.
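The while True is there because the inner for loop is deliberately broken out of after `jitter` pushes: each pass of the while body pushes a few lines onto the heap, then yields exactly one (the smallest buffered so far), and re-entering the for loop resumes the same file iterator where it left off. The same pattern over a small nearly-sorted list, with the IndexError/StopIteration dance replaced by a plain return:

```python
from heapq import heappush, heappop

def nearly_sorted(iterable, jitter=3):
    # Reorder buffer: push up to `jitter` items, then yield the smallest.
    # Re-entering the for loop resumes the same iterator, which is why
    # the outer while loop is needed.
    source = iter(iterable)
    heap = []
    while True:
        for item in source:
            heappush(heap, item)
            if len(heap) >= jitter:
                break
        if not heap:
            return
        yield heappop(heap)

result = list(nearly_sorted([2, 1, 3, 5, 4, 6], jitter=3))
```

This only works when no item is displaced by more than `jitter` positions, which matches the "out by a few seconds" property of the logs.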


[Tutor] Monitoring a logfile

2009-12-01 Thread Stephen Nelson-Smith
Varnish has a dedicated (but not always reliable) logger service.  I'd
like to monitor the logs - specifically I want to check that a known
entry appears in there every minute (it should be there about 10 times
a minute).

What's going to be the best way to carry out this kind of check?  I
had a look at SEC, but it looks horrifically complicated.

Could someone point me in the right direction?  I think I basically
want to check the logfile every minute and confirm that an entry has
appeared since the last time I looked.  I just can't see the right way
to get started.

S.
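One simple direction, without SEC: persist the file offset between runs, seek to it, and scan only the newly appended lines for the expected entry. A sketch — new_lines_since, entry_seen, and the 'varnishncsa' needle are all invented for illustration, and it ignores log rotation (a file that shrinks would need the offset reset to zero):

```python
import os
import tempfile

def new_lines_since(path, offset):
    # Read only what was appended after `offset`; return the lines plus
    # the new offset to persist for the next run (e.g. from cron).
    with open(path) as f:
        f.seek(offset)
        return f.readlines(), f.tell()

def entry_seen(path, offset, needle='varnishncsa'):
    lines, offset = new_lines_since(path, offset)
    return any(needle in line for line in lines), offset

fd, path = tempfile.mkstemp()       # throwaway stand-in for the log
os.close(fd)
with open(path, 'w') as f:
    f.write('varnishncsa started\n')

seen_first, pos = entry_seen(path, 0)
with open(path, 'a') as f:
    f.write('unrelated entry\n')
seen_second, pos = entry_seen(path, pos)
```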


[Tutor] Trying to send a URL via XMPP

2009-12-10 Thread Stephen Nelson-Smith
Hi,

I'm trying to send a message to a user via XMPP - but I want them to
receive a clickable word.

I'm using Python 2.4 on RHEL 5.4 and  python-xmpp-0.4.1-6 from EPEL.

I've tried variations on:

>>> jid = xmpp.protocol.JID('motherin...@jabber.sekrit.org.uk')
>>> cl = xmpp.Client(jid.getDomain())
>>> cl.connect()
>>> cl.auth(jid.getNode(),'notgoodenough')
>>> cl.send(xmpp.protocol.Message("snelsonsm...@jabber.sekrit.org.uk", ">> xmlns='http://jabber.org/protocol/xhtml-im'>>> xmlns='http://www.w3.org/1999/xhtml'>>> href='http://rt.sekrit.org.uk/rt3/Ticket/Display.html?id=#77'>Ticket #77 
>>> updated."))

But every time I just receive the raw html

Any idea what I am doing wrong?

S.

-- 
Stephen Nelson-Smith
Technical Director
Atalanta Systems Ltd
www.atalanta-systems.com


[Tutor] Modules and Test Suites

2009-12-29 Thread Stephen Nelson-Smith
I do quite a lot of programming in Ruby.  When I do so, my code tends
to have the following layout:

/path/to/src/my_project

Inside my_project:

lib/
test/
my_project.rb

my_project.rb uses classes and helper methods in lib

Inside test, I have a test suite that also uses classes and helper
methods in ../lib

This seems like a sensible way to keep tests and other code separate.

In Python I don't know how to do this so I just have all my tests
in the same place as the rest of the code.

a) Is my above way a sensible and pythonic approach?
b) If so - how can I do it in Python?
c) If not, is there a better way than having all the tests in the same
place as the rest of the code?

S.

-- 
Stephen Nelson-Smith
Technical Director
Atalanta Systems Ltd
www.atalanta-systems.com
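One common way to get the Ruby-style layout in Python: keep the code in lib/ and have each file under test/ prepend ../lib to sys.path before its imports. A runnable sketch that builds a throwaway layout in a temp directory (the names are illustrative; real projects more often use a proper package plus a test runner such as unittest discovery):

```python
import importlib
import os
import sys
import tempfile

# Build a throwaway lib/ with one helper module in it.
root = tempfile.mkdtemp()
lib = os.path.join(root, 'lib')
os.mkdir(lib)
with open(os.path.join(lib, 'helpers.py'), 'w') as f:
    f.write('def shout(s):\n    return s.upper()\n')

# What a file under test/ would do before its imports:
sys.path.insert(0, lib)
helpers = importlib.import_module('helpers')
result = helpers.shout('hi')
```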


[Tutor] [OT] Urgent Help Needed

2007-06-06 Thread Stephen Nelson-Smith
Hello friends,

I urgently need to get hold of someone who can help me with the
closing stages of a database project - porting data from an old system
to a completely rewritten schema.

My lead developer has suffered a bereavement, and I need a SQL expert,
or programmer who could accomplish the porting.

I've budgeted a week to get the task done, so need someone who could
join my team at this very short notice on a week's contract.

If you know anyone, or feel you fit the bill, let me know off list.
I'm based in North London.

Thanks, and sorry for taking advantage of the list - hope you all understand.

S.


Re: [Tutor] [OT] Urgent Help Needed

2007-06-06 Thread Stephen Nelson-Smith
On 6/6/07, Tim Golden <[EMAIL PROTECTED]> wrote:

> You might want to mention the database (or databases) in
> question. Given the short timeframes, people'd feel more
> confident if it was the system they're familiar with.

Sorry yes.  We have an old (primitive) accounts system, which is
basically one big table, effectively a log of purchases.  This is in
MySQL 4.

We have a new model, which abstracts out into half a dozen tables
representing different entities.  This is going to be in MySQL 5.

What we're trying to do is extract identities from the transaction
table, accounting for things like name changes, company changes.
We've been doing it with SQL statements, and I have some code snippets
I can show.

S.


[Tutor] [Slightly OT] Inheritance, Polymorphism and Encapsulation

2007-09-18 Thread Stephen Nelson-Smith
Hello friends,

Over lunch today some colleagues discussed a question they are using
as a conversation starter in some preliminary chats in our developer
hiring process.

The question was:

"Place the following three in order: Inheritance, Polymorphism, Encapsulation."

They specifically did not define in *what* order, leaving that for
the answerer to decide.

I responded thus:

Encapsulation comes with OO - you get it for free.  Polymorphism is a
hugely important enabler, but this in itself is enabled by
Inheritance, so I put them in this order: Inheritance, Polymorphism,
Encapsulation.

My colleagues felt that of the three options this was the least
satisfactory, and showed a lack of understanding of OO design.  One
even suggested that one could have polymorphism without inheritance.

I'm curious as to your opinions - answer to the question, responses to
my answer, and to the feedback from my colleagues.

Thanks!

S.


Re: [Tutor] [Slightly OT] Inheritance, Polymorphism and Encapsulation

2007-09-18 Thread Stephen Nelson-Smith
Michael Langford wrote:

> Inheritance: Syntactic sugar that's not really needed to make a well
> organized system. Often overused, especially by programmers in big
> companies, beginning students of programmers, green engineers, and
> professors. In practice hides a lot of data, often making behavior
> surprising, therefore harder to maintain. Can be used in limited situations
> to great advantage, but like cologne on car salesmen, is used in greater
> amounts than it should be. One should always ask, can I make a simpler
> system with composition.

Pretty much exactly what my colleague said.  Having thought about it I
understand this.

> Polymorphism: The process of creating many classes which a single interface
> which are then all used by an object that doesn't know or need to know the
> type. Many people think you only get this by using inheritance and therefore
> use inheritance many places a simpler, less opaque, more lightweight
> solution will work. Most dynamically typed languages (most notably, python,
> ruby and smalltalk) don't even require you specify the interface explicitly
> to get polymorphic behavior.  C++ templates can do non-explicit interface
> polymorphism, however in a more complicated, blindingly fast to run,
> blindingly slow to compile way.

Also what my colleague said!  This is the bit I had missed.  Perhaps I
need to rethink / reload my understanding of polymorphism.

> Encapsulation: The process of taking what shouldn't matter to the external
> world, and locking it behind an interface. This principle works best when
> put into small, specialized libraries and designed for general use, as this
> is the only encapsulated form that is shown to last over time. Supposedly
> something OO based design allows, but in reality, the coupling among classes
> varies in differing amounts. The module/public visibility of Java is a good
> compromise with classes that hides some data but share some with certain
> other classes. C++ has large issues for historical reasons on this front, as
> the implementation section of a class is largely revealed through the class
> definition.

Interesting.  And again revealing of a weakness in my understanding.

I do think this is a good question for getting a sense of where a
person's understanding is.  I wonder how much this understanding is a
pre-requisite for being a good developer... not too much I hope!

>  --Michael

S.


Re: [Tutor] [Slightly OT] Inheritance, Polymorphism and Encapsulation

2007-09-19 Thread Stephen Nelson-Smith
On 9/19/07, Michael Langford <[EMAIL PROTECTED]> wrote:
> I do think this is a good question for getting a sense of where a
> person's understanding is.  I wonder how much this understanding is a
> pre-requisite for being a good developer... not too much I hope!
>
> A good developer is a very loaded term. :o)
>
> There are a lot of good programmers who are bad developers. A lot of being a
> good developer is getting things done that work well enough to supply
> whomever is downstream from you at work.

Agreed - we discussed this at work yesterday when designing the
interview tasks for our new recruits.  On this point, I feel I score
highly.

> It also has to do with several things such as source control
> tools, ticket tracking tools, testing, debugging, and deployment as well.

Right - which is the sort of stuff a fresh-faced university student
will take time to get up to speed with.

> You will find yourself making more informed choices on things and debugging
> things a lot better the more of this technical programming stuff you pick
> up. A good compiler/interpreter book and a lot of experimentation will
> really open your eyes on a lot of this, as will just doing your job while
> constantly reading reading reading about what you are learning and doing
> things like talk with your co-workers at lunch about hiring tests, and
> participating in online discussion boards.

:o) Yes - this is exactly what I feel, and try to do.  I'm lucky that
I work with some brilliant people.  I wasn't hired as a developer -
I'm a sysadmin who can also program.  I've been seconded to the
development team, and am loving it, but quickly realising how much I
don't know!

> In addition you should try doing really hard things that break or come
> really close to breaking the tools you use. But not too hard, you have to be
> able to *do* them after all. Learning how VM's and interpreters and
> compilers work, then understanding the concepts behind why they work that
> way really helps give you a framework to hang an understanding of what's
> going on in a program. One way to get this knowledge is to start far in the
> past and work forward. Another is to dive deep into something from today.

Inspiring advice! :)

> I will say if you're already in the workforce and not headed back out to
> school, the onus will be on you to pick up more of this. Much of the
> conceptual stuff you won't hear at work, except occasionally overheard in a
> discussion assuming you already know it. You'll have to ask questions, and
> you'll have to read up on it afterwards, because often your co-workers won't
> have the whole picture either.

Right.  And I must spend more time here again.  Since leaving my last
job most of my programming has been in Ruby (and now PHP and SQL).  I
sort of fear that my head won't be able to hold it all at once, so
I've neglected my Python studies!

S.


Re: [Tutor] uncomprehension on RE

2007-09-19 Thread Stephen Nelson-Smith
On 9/19/07, cedric briner <[EMAIL PROTECTED]> wrote:
> Hello,
>
> I do not understand the behaviour of this:
>
> import re
> re.search('(a)*','aaa').groups()
> ('a',)
>
> I was thinking that the ``*'' will operate on the group delimited by the
> parenthesis. And so, I was expecting this result:
> ('a', 'a', 'a')
>
> Is there something I'am missing ?



What you are trying to do, I think, is get the * to expand to the
number of times you expect your group to appear.  You cannot do this.
You need to specify as many groups as you want to get returned:

re.search('(x)(x)(x)', 'xxx').groups()  would work.

In your case you have a single group that matches several times.
Python simply returns one match.

Consider this:

>>> re.search('(.)*', 'abc').groups()
('c',)

Can you see how that happens?

You could do re.findall('x', 'xxx') - but I don't know what you are
actually trying to do.

S.
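The point above in runnable form: a repeated group retains only the text of its final repetition, while findall returns one result per match of the whole pattern:

```python
import re

# A repeated group retains only the text of its final repetition.
last_only = re.search('(a)*', 'aaa').groups()
last_char = re.search('(.)*', 'abc').groups()

# findall returns one result per match of the whole pattern.
every_a = re.findall('a', 'aaa')
```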


Re: [Tutor] Finding even and odd numbers

2007-09-19 Thread Stephen Nelson-Smith
On 9/19/07, Boykie Mackay <[EMAIL PROTECTED]> wrote:
> Hi guys,
>
> I have come across a bit of code to find if a group of numbers is odd or
> even.  The code snippet is shown below:
>
> if not n&1:
> return false
>
> The above should return false for all even numbers, numbers being
> represented by n.  I have tried to wrap my head around the 'not n&1' but
> I'm failing to understand what's going on.  Could someone please explain
> the statement.


It's a bitwise AND: n & 1 yields 1 when n is odd and 0 when n is even,
so 'not n & 1' is True exactly for the even numbers.

S.
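Spelled out: the lowest bit of a binary integer is its parity, so n & 1 is 1 for odd n and 0 for even n, and `not n & 1` (i.e. not (n & 1), since & binds tighter than not) is True exactly for the even numbers:

```python
def is_even(n):
    # & binds tighter than not, so this reads as: not (n & 1).
    return not n & 1

evens = [n for n in range(6) if is_even(n)]
```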


Re: [Tutor] uncomprehension on RE

2007-09-20 Thread Stephen Nelson-Smith
On 9/20/07, cedric briner <[EMAIL PROTECTED]> wrote:

> To let you know, I'm writing a script to generate bind9 configuration
> from a nis hosts table. So I was trying in a one re to catch from this:
>
>   [  ...] [# comment]
> e.g:
> 10.12.23.45 hostname1 alias1 alias2 alias3 # there is a nice comment
> 37.64.86.23 hostname2
> 35.25.89.34 hostname3 alias5

I'm not sure I follow.  Could you state explicitly what the input is
(I think the above), and what output you want?

S.


[Tutor] Permission Report

2007-10-07 Thread Stephen Nelson-Smith
Hello all,

I have a tree of code on a machine which has been tweaked and fiddled
with over several months, and which passes tests.

I have the same codebase in a new virtual machine.  A shell hack[0]
shows me that the permissions are very different between the two.

I could use rsync or something to synchronise them, but I would like
to produce a report of the sort:

Change file: foo from 755 to 775

So I can try to work out why this is necessary.

I'm not sure how best to proceed - I guess walk through the filesystem
gathering info using stat, then do the same on the new system, and
compare.

Or are there some clever modules I could use?

S.


[Tutor] Fwd: Permission Report

2007-10-08 Thread Stephen Nelson-Smith
Sorry...

-- Forwarded message --
From: Stephen Nelson-Smith <[EMAIL PROTECTED]>
Date: Oct 8, 2007 6:54 PM
Subject: Re: [Tutor] Permission Report
To: Alan Gauld <[EMAIL PROTECTED]>


On 10/8/07, Alan Gauld <[EMAIL PROTECTED]> wrote:

> Yes, os.walk and os.stat should do what you want.

Ok - I have:

import os, stat
permissions = {}

for dir, base, files in os.walk('/home/peter/third/accounts/'):
  for f in files:
file = os.path.join(dir, f)
perm = os.stat(file)[stat.ST_MODE]
permissions[file] = oct(stat.S_IMODE(perm))

This is fine - it stores the info I need.

But if I want to run the same procedure on a remote host, and store
the results in a dictionary so they can be compared, what would I do?
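
Once you have one such dictionary from each machine (however you ship it
across — sshfs, ssh plus pickle, etc.), the comparison itself is simple.
A sketch, producing the "Change file: foo from 755 to 775" style of report:

```python
def diff_permissions(old, new):
    # Report files present on both sides whose modes differ.
    report = []
    for path in sorted(set(old) & set(new)):
        if old[path] != new[path]:
            report.append("Change file: %s from %s to %s"
                          % (path, old[path], new[path]))
    return report
```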

S.
___
Tutor maillist  -  Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor


Re: [Tutor] Fwd: Permission Report

2007-10-10 Thread Stephen Nelson-Smith
On 10/10/07, Kent Johnson <[EMAIL PROTECTED]> wrote:
> Stephen Nelson-Smith wrote:
> > But if I want to run the same procedure on a remote host, and store
> > the results in a dictionary so they can be compared, what would I do?
>
> What kind of access do you have to the remote host? If you have
> filesystem access you can use the same program running locally.

In the end I used sshfs, which worked fine.

S.
___
Tutor maillist  -  Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor


[Tutor] VIX API

2007-10-11 Thread Stephen Nelson-Smith
Hello,

Does anyone know if there are python bindings for the VMware VIX API?

I googled for a bit, but didn't find them...

How tricky would it be to wrap the C API?

S.
___
Tutor maillist  -  Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor


[Tutor] NNTP Client

2007-11-13 Thread Stephen Nelson-Smith
Hello all,

I wish to pull all the articles for one particular newsgroup to a
local machine, on a regular basis.  I don't wish to read them - I will
be parsing the contents programatically.  In your view is it going to
be best to use an 'off-the-shelf' news reader, or ought it to be
straightforward to write a client that does this task?  If so, any
pointers would be most welcome.

S.
___
Tutor maillist  -  Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor


Re: [Tutor] NNTP Client

2007-11-13 Thread Stephen Nelson-Smith
On Nov 13, 2007 2:13 PM, Stephen Nelson-Smith <[EMAIL PROTECTED]> wrote:

> ought it to be straightforward to write a client that does this task?

Well:

>>> server = NNTP('news.gmane.org')
>>> resp, count, first, last, name =
server.group("gmane.linux.redhat.enterprise.announce")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.5/nntplib.py", line 346, in group
resp = self.shortcmd('GROUP ' + name)
  File "/usr/lib/python2.5/nntplib.py", line 260, in shortcmd
return self.getresp()
  File "/usr/lib/python2.5/nntplib.py", line 215, in getresp
resp = self.getline()
  File "/usr/lib/python2.5/nntplib.py", line 207, in getline
if not line: raise EOFError
EOFError

What's wrong with that then?

S.
___
Tutor maillist  -  Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor


Re: [Tutor] NNTP Client

2007-11-13 Thread Stephen Nelson-Smith
On Nov 13, 2007 4:01 PM, Stephen Nelson-Smith <[EMAIL PROTECTED]> wrote:
> >>> server = NNTP('news.gmane.org')
>
> What's wrong with that then?

server, apparently:

>>> s.group("gmane.discuss")
('211 11102 10 11329 gmane.discuss', '11102', '10', '11329', 'gmane.discuss')
>>> server.group("gmane.discuss")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.5/nntplib.py", line 346, in group
resp = self.shortcmd('GROUP ' + name)
  File "/usr/lib/python2.5/nntplib.py", line 259, in shortcmd
self.putcmd(line)
  File "/usr/lib/python2.5/nntplib.py", line 199, in putcmd
self.putline(line)
  File "/usr/lib/python2.5/nntplib.py", line 194, in putline
self.sock.sendall(line)
  File "<string>", line 1, in sendall
socket.error: (32, 'Broken pipe')

Stupid of me.

S.
___
Tutor maillist  -  Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor


[Tutor] [OT] Vacancy - python systems programmer

2007-11-15 Thread Stephen Nelson-Smith
All,

I may shortly be in the position of being able to hire a python
systems programmer for a short contract (1-2 days initially to spike
an ongoing project).

The ideal person will have the following:

* Solid experience of Python for systems programming and database interaction
* Familiarity with MySQL 5 - inserting and querying medium-sized
databases with Python
* Systems experience on RHEL or equivalent clone
* Sysadmin experience on Unix / Linux with patching and package
management systems
* Experience in a high-availability environment, eg managed hosting.
* Experience of working in an agile environment (test first, pairing)

This is, of course, a shopping list, but is intended to give a sense
of the sort of background I'm after.

If this sounds like the sort of thing you'd be interested in, please
contact me off list.

Thanks,

S.
___
Tutor maillist  -  Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor


[Tutor] Parsing Word Docs

2007-03-08 Thread Stephen Nelson-Smith
Hello all,

I have a directory containing a load of word documents, say 100 or so.
which is updated every hour.

I want a cgi script that effectively does a grep on the word docs, and
returns each doc that matches the search term.

I've had a look at doing this by looking at each binary file and
reimplementing strings(1) to capture useful info.  I've also read that
one can treat a word doc as a COM object.  Am I right in thinking that
I can't do this on python under unix?

What other ways are there?  Or is the binary parsing the way to go?

S.
___
Tutor maillist  -  Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor


[Tutor] How do to an svn checkout using the svn module?

2007-03-09 Thread Stephen Nelson-Smith
Hello,

I want to do a simple svn checkout using the python svn module.  I
haven't been able to find any/much/basic documentation that discusses
such client operations.

This should be very easy, I imagine!

What do I need to do?

S.
___
Tutor maillist  -  Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor


Re: [Tutor] How do to an svn checkout using the svn module?

2007-03-09 Thread Stephen Nelson-Smith
On 3/9/07, Kent Johnson <[EMAIL PROTECTED]> wrote:

> Did you find the pysvn Programmer's Guide that comes with pysvn? It has
> this example:

Ah.. no I haven't got pysvn installed... but will take a look.

What I do have is:

>>> import sys
>>> import svn.core
>>> import svn.client
>>> import sys
>>> pool = svn.core.svn_pool_create(None)
>>> svn.core.svn_config_ensure( None, pool )
>>> ctx = svn.client.svn_client_ctx_t()
>>> config = svn.core.svn_config_get_config( None, pool )
>>> ctx.config = config
>>> rev = svn.core.svn_opt_revision_t()
>>> rev.kind = svn.core.svn_opt_revision_head
>>> rev.number = 0
>>> ctx.auth_baton = svn.core.svn_auth_open( [], pool )
>>> url = "https://svn.uk.delarue.com/repos/prdrep/prddoc/"
>>> path ="/tmp"
>>> svn.client.svn_client_checkout(url, path, rev, 0, ctx, pool)

Traceback (most recent call last):
  File "<stdin>", line 1, in ?
libsvn._core.SubversionException: ("PROPFIND request failed on
'/repos/prdrep/prddoc'", 175002)

Not sure what I am doing wrong... the url is correct.

> Kent

S.
___
Tutor maillist  -  Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor


Re: [Tutor] Parsing Word Docs

2007-03-09 Thread Stephen Nelson-Smith
On 3/8/07, Tim Golden <[EMAIL PROTECTED]> wrote:

> Simplest thing's probably antiword (http://www.winfield.demon.nl/)
> and then whatever text-scanning approach you want.

I've gone for:

#!/usr/bin/env python

import glob, os

url = "/home/cherp/prddoc"
searchstring = "dxpolbl.p"
worddocs = []

for (dirpath, dirnames, filenames) in os.walk(url):
  for f in filenames:
if f.endswith(".doc"):
  worddocs.append(os.path.join(dirpath,f))

for path in worddocs:
  # glob.glob() on an exact path is a no-op, so iterate the list directly.
  if searchstring in open(path, "r").read():
    print "Found it in: ", os.path.basename(path)

Now... I want to convert this to a cgi-script... how do I grab
$QUERY_STRING in python?
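
CGI servers hand the query string to the script in the QUERY_STRING
environment variable. A minimal sketch (this ignores URL-decoding, which
the standard cgi module's FieldStorage handles properly; the parameter
name is just an example):

```python
import os

def get_query_param(name, environ=os.environ):
    # QUERY_STRING looks like "search=dxpolbl.p&limit=10".
    raw = environ.get('QUERY_STRING', '')
    for pair in raw.split('&'):
        if '=' in pair:
            key, value = pair.split('=', 1)
            if key == name:
                return value
    return None
```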

S.
___
Tutor maillist  -  Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor


[Tutor] [OT] ETL Tools

2007-03-29 Thread Stephen Nelson-Smith
Hello all,

Does anyone know of any ETL (Extraction, Transformation, Loading)
tools in Python (or at any rate, !Java)?

I have lots (and lots) of raw data in the form of log files which I
need to process and aggregate and then do a whole bunch of group-by
operations, before dumping them into text/relational database for a
search engine to access.

At present we have a bunch of scripts in perl and ruby, and a berkley
and mysql database for the grouping operations.  This is proving to be
a little slow with the amount of data we now have, so I am looking
into alternatives.

Does anyone have any experience of this sort of  thing?  Or know
someone who does, that I could talk to?

Best regards,

S.
___
Tutor maillist  -  Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor


[Tutor] HTML Parsing

2008-04-21 Thread Stephen Nelson-Smith
Hi,

I want to write a little script that parses an apache mod_status page.

I want it to return simple the number of page requests a second and
the number of connections.

It seems this is very complicated... I can do it in a shell one-liner:

curl 10.1.2.201/server-status 2>&1 | grep -i request | grep dt | {
IFS='> ' read _ rps _; IFS='> ' read _ currRequests _ _ _ _
idleWorkers _; echo $rps $currRequests $idleWorkers   ; }

But that's horrid.

So is:

$ eval `printf '3 requests currently being processed, 17 idle
workers\n 2.82 requests/sec - 28.1 kB/second - 10.0
kB/request\n' | sed -nr '// { N;
s@([0-9]*)[^,]*,([0-9]*).*([0-9.]*)[EMAIL PROTECTED]((\1+\2));[EMAIL 
PROTECTED];
}'`
$ echo "workers: $workers reqs/secs $requests"
workers: 20 reqs/sec 2.82

The page looks like this:



Apache Status

Apache Server Status for 10.1.2.201

Server Version: Apache/2.0.46 (Red Hat)
Server Built: Aug  1 2006 09:25:45

Current Time: Monday, 21-Apr-2008 14:29:44 BST
Restart Time: Monday, 21-Apr-2008 13:32:46 BST
Parent Server Generation: 0
Server uptime:  56 minutes 58 seconds
Total accesses: 10661 - Total Traffic: 101.5 MB
CPU Usage: u6.03 s2.15 cu0 cs0 - .239% CPU load
3.12 requests/sec - 30.4 kB/second - 9.7 kB/request
9 requests currently being processed, 11 idle workers


How can/should I do this?

S.
___
Tutor maillist  -  Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor


Re: [Tutor] HTML Parsing

2008-04-21 Thread Stephen Nelson-Smith
On 4/21/08, Andreas Kostyrka <[EMAIL PROTECTED]> wrote:
> As usual there are a number of ways.
>
>  But I basically see two steps here:
>
>  1.) capture all dt elements. If you want to stick with the standard
>  library, htmllib would be the module. Else you can use e.g.
>  BeautifulSoup or something comparable.

I want to stick with standard library.

How do you capture <dt> elements?

S.
___
Tutor maillist  -  Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor


Re: [Tutor] HTML Parsing

2008-04-21 Thread Stephen Nelson-Smith
Hi,

>  for lineno, line in enumerate(html):

- python 2.2 has no enumerate()

Can we code around this?
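
One way to code around it: define a stand-in (the builtin appeared in
Python 2.3; on 2.2 a generator also needs `from __future__ import
generators`, and on 2.3+ you'd want a different name so as not to shadow
the builtin):

```python
def my_enumerate(iterable):
    # Minimal stand-in for the builtin enumerate():
    # yields (index, item) pairs, counting from 0.
    index = 0
    for item in iterable:
        yield index, item
        index += 1
```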

>   x = line.find("requests/sec")
>   if x >= 0:
>no_requests_sec = line[3:x]
>break
>  for lineno, line in enumerate(html[lineno+1:]):
>   x = line.find("requests currently being processed")
>   if x >= 0:
>no_connections = line[3:x]

That all looks ok.

S.
___
Tutor maillist  -  Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor


Re: [Tutor] HTML Parsing

2008-04-22 Thread Stephen Nelson-Smith
Hello,

>  For data this predictable, simple regex matching will probably work fine.

I thought that too...

Anyway - here's what I've come up with:

#!/usr/bin/python

import urllib, sgmllib, re

mod_status = urllib.urlopen("http://10.1.2.201/server-status")
status_info = mod_status.read()
mod_status.close()

class StatusParser(sgmllib.SGMLParser):
def parse(self, string):
self.feed(string)
self.close()

def __init__(self, verbose=0):
sgmllib.SGMLParser.__init__(self, verbose)
self.information = []
self.inside_dt_element = False

def start_dt(self, attributes):
self.inside_dt_element = True

def end_dt(self):
self.inside_dt_element = False

def handle_data(self, data):
if self.inside_dt_element:
self.information.append(data)

def get_data(self):
return self.information


status_parser = StatusParser()
status_parser.parse(status_info)

rps_pattern = re.compile( '(\d+\.\d+) requests/sec' )
connections_pattern = re.compile( '(\d+) requests\D*(\d+) idle.*' )

for line in status_parser.get_data():
rps_match = rps_pattern.search( line )
connections_match =  connections_pattern.search( line )
if rps_match:
rps = float(rps_match.group(1))
elif connections_match:
        connections = (int(connections_match.group(1)) +
                       int(connections_match.group(2)))

rps_threshold = 10
connections_threshold = 100

if rps > rps_threshold:
print "CRITICAL: %s Requests per second" % rps
else:
print "OK: %s Requests per second" % rps

if connections > connections_threshold:
print "CRITICAL: %s Simultaneous Connections" % connections
else:
print "OK: %s Simultaneous Connections" % connections

Comments and criticism please.
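
One small robustness point: if neither pattern ever matches (say the page
layout changes), rps and connections are never assigned, and the threshold
checks raise NameError. Wrapping the matching loop with safe defaults
avoids that — a sketch of the same logic:

```python
import re

RPS_PATTERN = re.compile(r'(\d+\.\d+) requests/sec')
CONNECTIONS_PATTERN = re.compile(r'(\d+) requests\D*(\d+) idle')

def extract_stats(lines):
    # Same matching as above, but a page that matches nothing
    # still yields usable numbers instead of a NameError.
    rps, connections = 0.0, 0
    for line in lines:
        rps_match = RPS_PATTERN.search(line)
        connections_match = CONNECTIONS_PATTERN.search(line)
        if rps_match:
            rps = float(rps_match.group(1))
        elif connections_match:
            connections = (int(connections_match.group(1)) +
                           int(connections_match.group(2)))
    return rps, connections
```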

S.
___
Tutor maillist  -  Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor


[Tutor] Sending Mail

2008-04-22 Thread Stephen Nelson-Smith
smtpserver = 'relay.clara.net'

RECIPIENTS = ['[EMAIL PROTECTED]']
SENDER = '[EMAIL PROTECTED]'
message = """Subject: HTTPD ALERT: %s requests %s connections
Please investigate ASAP.""" % (rps, connections)

session = smtplib.SMTP(smtpserver)
smtpresult = session.sendmail(SENDER, RECIPIENTS, message)
if smtpresult:
errstr = ""
for recip in smtpresult.keys():
errstr = """Could not deliver mail to: %s

Server said: %s
%s

%s""" % (recip, smtpresult[recip][0], smtpresult[recip][1], errstr)
raise smtplib.SMTPException, errstr

This sends emails

But gmail says it came from "unknown sender"

I see an envelope-from in the headers.

What am I missing?
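
A likely cause: the SENDER argument to sendmail() only sets the envelope-from;
the sender a client displays comes from a From: header inside the message
itself, which the message above never sets. A sketch of building the message
with proper headers (addresses here are placeholders):

```python
def build_message(sender, recipients, subject, body):
    # RFC 2822: headers, a blank line, then the body.  Without the
    # From: header most clients show "unknown sender".
    headers = ["From: %s" % sender,
               "To: %s" % ", ".join(recipients),
               "Subject: %s" % subject]
    return "\r\n".join(headers) + "\r\n\r\n" + body
```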

S.
___
Tutor maillist  -  Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor


[Tutor] Web Stats

2008-06-11 Thread Stephen Nelson-Smith
Hi,

I've been asked to produce a report showing all possible resources in
a website, together with statistics on how frequently they've been
visited.  Nothing fancy - just number and perhaps date of last visit.
 This has to include resources which have not been visited, as the
point is to clean out old stuff.

I have several years of apache weblogs.

Is there something out there that already does this?  If not, or if
it's interesting and not beyond the ken of a reasonable programmer,
could anyone provide some pointers on where to start?

Thanks,

S.
___
Tutor maillist  -  Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor


Re: [Tutor] Web Stats

2008-06-11 Thread Stephen Nelson-Smith
Hello,

>>  This has to include resources which have not been visited, as the
>> point is to clean out old stuff.
>
> Take a look at AWStats (not Python).

Doesn't this 'only' parse weblogs?  I'd still need some kind of spider
to tell me all the possible resources available wouldn't I?  It's a
big website, with 1000s of pages.

> For do it yourself, loghetti
> might be a good starting point
> http://code.google.com/p/loghetti/

Looks interesting, but again don't I fall foul of the "how can I know
about what, by definition, doesn't feature in a log?" problem?

S.
> Kent
>
___
Tutor maillist  -  Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor


[Tutor] Init Scripts

2008-07-04 Thread Stephen Nelson-Smith
Hello,

I've been wrestling with some badly written init scripts, and picking
my way through the redhat init script system.  I'm getting to the
point of thinking I could do this sort of thing in Python just as
effectively.

Are there any pointers available?  Eg libraries that give process
information, so I can obtain status information?

S.
___
Tutor maillist  -  Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor


[Tutor] Short Contract

2008-09-08 Thread Stephen Nelson-Smith
Hello,

I'm looking for someone to help on a short contract to build a
centralised blogging system.  I want a planet-style aggregation of
blogs, but with the ability to see and make comments on each
individual blog, from the central planet page.

Ideally, it would also have a little 'icon' mug-shot of the person who
writes each blog next to their entry, and a dynamically generated 'go
to this guy's blog' button under each entry.

I'd be happy to take an existing open-source tool (eg venus) and
modify it for purpose - I think that's a better idea than writing a
new blog engine.

The blogs don't exist yet, so we can ensure they all live on the same
blog server.

If this is something you have bandwidth for, or an interest in, please
contact me off list and we can discuss  it.

Thanks,

S.
___
Tutor maillist  -  Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor


[Tutor] Capturing and parsing over telnet

2008-11-30 Thread Stephen Nelson-Smith
I want to write a program that connects to a TCP port using telnet,
and issues commands, parsing the output the command provides, and then
issuing another command.

This might look like this:

$ telnet water.fieldphone.net 7456
Welcome to water, enter your username
>_ sheep
Enter your password
>_ sheep123
>_ examine here
[some info to parse]
[.]
[.]
>_ some command based on parsing the previous screen
[more info to parse]
[.]
[.]

I am confident I can parse the info, if I can read it in.

I am not sure how to handle the telnet I/O

How should I proceed?

S.
___
Tutor maillist  -  Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor


Re: [Tutor] Capturing and parsing over telnet

2008-11-30 Thread Stephen Nelson-Smith
Hi,

> How about pexpect;
> http://www.noah.org/wiki/Pexpect

Ah yes - I've used that before to good effect.

ATM I'm playing with telnetlib.  Is there a way to read everything on
the screen, even if I don't know what it will be?

eg:
c = telnetlib.Telnet("test.lan")
c.read_until("name: ")
c.write("test\n")
c.read_until("word: ")
c.write("test\n")

And now I don't know what I will see - nor is there a prompt which
indicates that the output is complete.

I effectively want something like c.read_everything()

I tried c.read_all() but I don't get anything from that.

Suggestions?

S.
___
Tutor maillist  -  Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor


Re: [Tutor] Capturing and parsing over telnet

2008-11-30 Thread Stephen Nelson-Smith
Hi,

> I effectively want something like c.read_everything()

Looks like read_very_eager() does what I want.

S.
___
Tutor maillist  -  Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor


[Tutor] League Secretary Application

2015-05-30 Thread Stephen Nelson-Smith
Hello,

I'm the league secretary for a table tennis league.  I have to generate a
weekly results report, league table, and player averages, from results
cards which arrive by post or email.

The data is of the form:

Division: 1
Week: 7
Home: Some Team
Away: Different Team
Player A: Fred Bloggs
Player B: Nora Batty
Player X: Jim Smith
Player Y: Edna Jones
A vs X: 3-0
B vs Y: 3-2
A vs Y: 3-0
B vs X: 3-2
Doubles: 3-1

From this I can calculate the points allocated to teams and produce a table.

I've not done any real python for about 6 years, but figured it'd be fun to
design and write something that would take away the time and error issues
associated with generating this manually.  Sure I could build a
spreadsheet, but this seems more fun.

I'm currently thinking through possible approaches, from parsing results
written in, eg YAML, to a menu-driven system, to a web app.  I'm generally
in favour of the simplest thing that could possibly work, but I am
conscious that there's a lot of room for data entry error and thus
validation, if I just parse a file, or make a CLI.  OTOH I have never ever
written a web app, with forms etc.

There's no time constraint here - this is merely for fun, and to make my
life easier.

Any thoughts?

S.
___
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
https://mail.python.org/mailman/listinfo/tutor


Re: [Tutor] League Secretary Application

2015-05-30 Thread Stephen Nelson-Smith
Hullo,

On Sat, May 30, 2015 at 3:49 PM, Laura Creighton  wrote:

>
> 2.  How do you receive your data now?  Do you want to change this,
> perhaps extend the capabilities -- i.e. let people send an sms
> with results to your cell phone?  Or limit the capabilities ("Stop
> phoning me with this stuff!  Use the webpage!)  How you get your
> data is very relevant to the design.
>

I get a physical card, or a photograph of the same.  It'd be possible in
the future to get people to use a website or a phone app, but for now, I
enter the data from the cards, manually.


> 3.  After you have performed your calculation and made a table, what
> do you do with it?  Email it to members?  Publish it in a
> weekly dead-tree newspaper?  Post it to a website?  What you
> want to do with it once you have it is also very relevant to the
> design.
>

ATM I send an email out, and someone else takes that data and publishes it
on a website.

S.
___
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
https://mail.python.org/mailman/listinfo/tutor


[Tutor] Sorting a list of list

2015-06-05 Thread Stephen Nelson-Smith
As part of my league secretary program (to which thread I shall reply again
shortly), I need to sort a list of lists.  I've worked out that I can use
sorted() and operator.itemgetter to sort by a value at a known position in
each list.  Is it possible to do this at a secondary level?  So if the
items are the same, we use the secondary key?

Current function:

>>> def sort_table(table, col=0):
... return sorted(table, key=operator.itemgetter(col), reverse=True)
...
>>> sort_table(results, 6)
[['spip', 2, 2, 0, 10, 0, 4], ['hpip', 2, 0, 2, 2, 8, 0]]
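
itemgetter accepts several indices, and sorted() compares the resulting
tuples element by element, so a secondary key falls out naturally. A sketch
(which column breaks the tie is up to you — the 4 below is just an example):

```python
import operator

def sort_table(table, *cols):
    # Sort by the first column given; ties are broken by the next, etc.
    return sorted(table, key=operator.itemgetter(*cols), reverse=True)

# sort_table(results, 6, 4) sorts on column 6, ties broken by column 4.
```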

S.
___
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
https://mail.python.org/mailman/listinfo/tutor


[Tutor] Ideas for Child's Project

2015-01-06 Thread Stephen Nelson-Smith
Hello,

My son is interested in programming, and has dabbled in Scratch and done a
tiny bit of Python at school.  He's 11 and is going for an entrance exam
for a selective school in a couple of weeks.  They've asked him to bring
along something to demonstrate an interest, and present it to them.

In talking about it, we hit upon the idea that he might like to embark upon
a prorgamming challenge, or learning objective / project, spending say 30
mins a day for the next week or two, so he can show what he's done and talk
about what he learned.

Any suggestions for accessible yet challenging and stimulating projects?

Any recommendations for books / websites / tutorials that are worth a look?

S.
___
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
https://mail.python.org/mailman/listinfo/tutor


Re: [Tutor] Ideas for Child's Project

2015-01-07 Thread Stephen Nelson-Smith
Hi Danny,

On Tue, Jan 6, 2015 at 10:07 PM, Danny Yoo  wrote:

> On Tue, Jan 6, 2015 at 1:46 PM, Stephen Nelson-Smith 
> wrote:
>
> You might want to look at Bootstrapworld, a curriculum for
> middle-school/high-school math using programming and games:
>
> http://www.bootstrapworld.org/
>
> Students who go through the material learn how math can be used
> productively toward writing a video game.  Along the way, they learn
> the idea of function, of considering inputs and outputs, and how to
> test what they've designed.
>

Sounds ace.  I had a look.  It seems optimised for Racket, which is also
cool so I installed Racket on my son's computer and let him have a
play.  He immediately got into it, and got the hang of functions and
expressions etc.

S.
___
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
https://mail.python.org/mailman/listinfo/tutor


Re: [Tutor] Debugging a sort error.

2019-01-13 Thread Stephen Nelson-Smith
Hi,

On Sun, Jan 13, 2019 at 8:34 AM  wrote:

> description.sort()
> TypeError: unorderable types: float() < str()

So, fairly obviously, we can't test whether a float is less than a
string.  Any more than we can tell if a grapefruit is faster than a
cheetah.  So there must be items in description that are strings and
floats.

With 2000 lines, you're going to struggle to eyeball this, so try
something like this:

In [69]: irrational_numbers = [3.14159265, 1.606695, "pi", "Pythagoras
Constant"]
In [70]: from collections import Counter
In [71]: dict(Counter([type(e) for e in irrational_numbers]))
Out[71]: {float: 2, str: 2}

If this shows only strings with your data, I'll eat my hat.

S.
___
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
https://mail.python.org/mailman/listinfo/tutor


Re: [Tutor] Python installtion

2019-01-13 Thread Stephen Nelson-Smith
Hi,

On Mon, Jan 7, 2019 at 11:11 AM mousumi sahu
 wrote:
>
> Dear Sir,
> I am trying to install python 2.7.10 on HPC. Python 2.6 has already been
> install on root. I do not have root authority. Please suggest me how can I
> do this.

Sorry - I replied to you directly, by accident.  Take 2, with reply all:

You need to do a local installation of Python, and set up your system
to use that in preference to the one at the system level.  Although
it's possible to do this with various manual steps, there's a really
handy tool you can use which will make your life easier, and allow you
to manage multiple versions of Python, which might be useful, if you
wanted, say, to be able to run both Python 2 and Python 3.  The tool
is called `pyenv`, and as long as you have a bash/zsh shell, and your
system has a C compiler and associated tools already installed, you
can install and use it.

The simplest approach is to clone the tool it from git, modify your
shell to use it, and then use it to install Python.  Here's a sample
way to set it up.  This won't necessarily match your exact
requirements, but you can try it, and please come back if you have any
further questions:

1. Clone the git repo into your home directory

git clone https://github.com/pyenv/pyenv.git ~/.pyenv

Pyenv is very simple, conceptually.  It's just a set of shell scripts
to automate the process of fetching, compiling, and installing
versions of Python, and then massaging your shell to make sure the
versions you have installed are used in preference to anything else.
So now you have the tool, you need to configure your shell to use it.
I'm going to assume you're using Bash.

2. Make sure the contents of the pyenv tool is available on your path

echo 'export PYENV_ROOT="$HOME/.pyenv"' >> ~/.bash_profile
echo 'export PATH="$PYENV_ROOT/bin:$PATH"' >> ~/.bash_profile

Note - this might need to be .bashrc, or something else, depending on
your os/distro/setup.  However, in principle you're just making the
pyenv tool (which itself is just a set of shell scripts) available at
all times.

3. Set your shell to initialise the pyenv tool every time you start a new shell

echo -e 'if command -v pyenv 1>/dev/null 2>&1; then\n  eval "$(pyenv
init -)"\nfi' >> ~/.bash_profile

Again: this might need to be .bashrc

4. Now open a new shell, and check you have pyenv available:

$ pyenv
pyenv 1.2.9-2-g6309aaf2
Usage: pyenv <command> [<args>]

Some useful pyenv commands are:
   commands    List all available pyenv commands
   local       Set or show the local application-specific Python version
   global      Set or show the global Python version
   shell       Set or show the shell-specific Python version
   install     Install a Python version using python-build
   uninstall   Uninstall a specific Python version
   rehash      Rehash pyenv shims (run this after installing executables)
   version     Show the current Python version and its origin
   versions    List all Python versions available to pyenv
   which       Display the full path to an executable
   whence      List all Python versions that contain the given executable

See `pyenv help <command>' for information on a specific command.
For full documentation, see: https://github.com/pyenv/pyenv#readme

If you don't have pyenv working at this stage, come back and I'll help
you troubleshoot.  Assuming you do, continue:

5. Now you can install a version of Python, locally :

pyenv install --list

This shows you the various options of Pythons you can install.  You
want the latest 2.7:

pyenv install 2.7.15

This will fetch the source code of Python, and compile and install it
for you, and place it in your local shell environment, where you can
use it.

If this step doesn't work, it's probably because your system doesn't
have a compiler and associated tools.  I can help you troubleshoot
that, but ultimately you'll need support from your system
administrator at this point.

Assuming it has installed Python, you now just need to tell your shell
that you want to use it:

pyenv local 2.7.15

This will make your shell find your 2.7.15 installation ahead of the
system python:

$ python --version
Python 2.7.15

Now you can run and use your Python.

Any further questions, sing out.

S.
___
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
https://mail.python.org/mailman/listinfo/tutor