[Tutor] Making Regular Expressions readable
Hi,

I've written this today:

#!/usr/bin/env python
import re

pattern = r'(?P<RemoteIP>^(-|[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}(, [0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3})*){1}) (?P<Identity>(\S*)) (?P<User>(\S*)) (?P<Timestamp>(\[[^\]]+\])) (?P<Request>(\"([^"\\]*(?:\\.[^"\\]*)*)\")?) (?P<StatusCode>(\S*)) (?P<Size>(\S*)) (?P<Referrer>(\"([^"\\]*(?:\\.[^"\\]*)*)\")?) (?P<UserAgent>(\"([^"\\]*(?:\\.[^"\\]*)*)\")?)( )?(?P<SiteIntelligenceCookie>(\"([^"\\]*(?:\\.[^"\\]*)*)\")?)'
regex = re.compile(pattern)

lines = 0
no_cookies = 0
for line in open('/home/stephen/scratch/feb-100.txt'):
    lines += 1
    line = line.strip()
    match = regex.match(line)
    if match:
        data = match.groupdict()
        if data['SiteIntelligenceCookie'] == '':
            no_cookies += 1
    else:
        print "Couldn't match ", line

print "I analysed %s lines." % (lines,)
print "There were %s lines with missing Site Intelligence cookies." % (no_cookies,)

It works fine, but it looks pretty unreadable and unmaintainable to anyone who hasn't spent all day writing regular expressions. I remember reading about verbose regular expressions. Would these help? How could I make the above more maintainable?

S.

-- 
Stephen Nelson-Smith
Technical Director
Atalanta Systems Ltd
www.atalanta-systems.com

___
Tutor maillist - Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor
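Verbose regular expressions would indeed help here. With `re.VERBOSE`, whitespace inside the pattern is ignored and comments are allowed, so each field can sit on its own documented line. Below is a sketch of the idea against a simplified Apache-style line; the group names and the exact sub-patterns are illustrative, not the original script's:

```python
import re

# A sketch of how re.VERBOSE lets you document a pattern field by field;
# group names here are illustrative, not taken from the original script.
pattern = re.compile(r"""
    ^(?P<ip>\S+)\s                  # client IP (or "-")
    (?P<ident>\S+)\s                # RFC 1413 identity
    (?P<user>\S+)\s                 # HTTP auth user
    \[(?P<timestamp>[^\]]+)\]\s     # e.g. 04/Nov/2009:04:02:10 +0000
    "(?P<request>[^"]*)"\s          # request line
    (?P<status>\d{3})\s             # status code
    (?P<size>\S+)                   # response size or "-"
""", re.VERBOSE)

line = '89.151.119.196 - - [04/Nov/2009:04:02:10 +0000] "GET / HTTP/1.1" 200 50'
m = pattern.match(line)
print(m.group('status'))  # -> 200
```

Inside a verbose pattern, literal whitespace must be written explicitly (`\s` above), which is most of the rewriting work when converting an existing pattern.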
[Tutor] Not Storing State
Hi,

This is both a general question and a specific one. I want to iterate over a bunch of lines; if any line contains a certain string, I want to do something, otherwise do something else.

I can store state - eg line 1 - did it contain the string? No... OK, we're cool, next line. But I'd like to avoid keeping state. How can I do this?

S.
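One stateless way to express "did any line contain the string" is to feed a generator expression to the built-in `any()`. A minimal sketch with made-up sample data:

```python
lines = [
    "alpha",
    "beta contains needle",
    "gamma",
]

# any() short-circuits: it stops reading as soon as a match is found,
# and no per-line flag variable is needed.
if any("needle" in line for line in lines):
    result = "found"
else:
    result = "not found"

print(result)  # -> found
```

The same shape works directly on a file object, since files iterate line by line.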
[Tutor] CSS Minification
Is there a Python CSS and/or javascript minifier available? I've got to convert some ant scripts to python, and ant has a minifier plugin that I need to replicate. Maybe Beautiful Soup can do this?

S.
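For what it's worth, Beautiful Soup is an HTML/XML parser, so it isn't the right tool here; there are Python ports of cssmin/jsmin around. Purely to illustrate what "minification" means, here is a deliberately naive whitespace-and-comment stripper (the function name and rules are made up, and real minifiers do far more):

```python
import re

def naive_minify_css(css):
    """A very naive CSS minifier sketch: strips comments and
    collapses whitespace. Real minifiers do much more than this."""
    css = re.sub(r'/\*.*?\*/', '', css, flags=re.DOTALL)  # drop /* ... */ comments
    css = re.sub(r'\s+', ' ', css)                        # collapse runs of whitespace
    css = re.sub(r'\s*([{};:,])\s*', r'\1', css)          # tighten around punctuation
    return css.strip()

print(naive_minify_css("body {\n  color: red; /* note */\n}"))  # -> body{color:red;}
```

This is unsafe for CSS that contains strings or URLs with meaningful whitespace, which is exactly why a dedicated minifier is the better answer.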
[Tutor] Logfile Manipulation
I've got a large amount of data in the form of 3 apache and 3 varnish logfiles from 3 different machines. They are rotated at 0400. The logfiles are pretty big - maybe 6G per server, uncompressed.

I've got to produce a combined logfile for 0000-2359 for a given day, with a bit of filtering (removing lines based on text match, bit of substitution). I've inherited a nasty shell script that does this but it is very slow and not clean to read or understand. I'd like to reimplement this in python.

Initial questions:

* How does Python compare in performance to shell, awk etc in a big pipeline? The shell script kills the CPU.
* What's the best way to extract the data for a given time, eg 0000-2359 yesterday?

Any advice or experiences?

S.
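For files this size, the shape that keeps memory flat is a single streaming pass: read the gzipped file line by line, drop unwanted lines, substitute, write out. A sketch of that shape (the filename, filter string, and substitution are placeholders):

```python
import gzip
import re

def filtered_lines(path, exclude='service.php'):
    """Stream a gzipped log line by line, dropping unwanted entries and
    applying a substitution. The filter and substitution are illustrative."""
    with gzip.open(path, 'rt') as f:
        for line in f:
            if exclude in line:
                continue                       # filter: skip matching lines
            yield re.sub(r'\s+$', '\n', line)  # substitution: tidy trailing whitespace
```

Because this is a generator, only one line is ever held in memory, and the decompression cost is paid once per file regardless of how many filters run on each line.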
Re: [Tutor] Logfile Manipulation
On Mon, Nov 9, 2009 at 8:47 AM, Alan Gauld wrote:
> I'm not familiar with Apache log files so I'll let somebody else answer,
> but I suspect you can either use string.split() or a re.findall(). You might
> even be able to use csv. Or if they are in XML you could use ElementTree.
> It all depends on the data!

An apache logfile entry looks like this:

89.151.119.196 - - [04/Nov/2009:04:02:10 +0000] "GET /service.php?s=nav&arg[]=&arg[]=home&q=ubercrumb/node%2F20812 HTTP/1.1" 200 50 "-" "-"

I want to extract 24 hrs of data based on timestamps like this:

[04/Nov/2009:04:02:10 +0000]

I also need to do some filtering (eg I actually don't want anything with service.php), and I also have to do some substitutions - that's trivial other than not knowing the optimum place to do it. IE should I do multiple passes? Or should I try to do all the work at once, only viewing each line once?

Also what about reading from compressed files? The data comes in as 6 gzipped logfiles which expand to 6G in total.

S.
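Pulling the timestamp out is exactly the "first thing between square brackets" case Alan describes, and `time.strptime` can then turn it into a comparable value. A sketch (the offset is split off here because `%z` support in `time.strptime` is not portable):

```python
import re
import time

line = ('89.151.119.196 - - [04/Nov/2009:04:02:10 +0000] '
        '"GET /service.php?s=nav HTTP/1.1" 200 50 "-" "-"')

# Grab whatever sits between the first pair of square brackets,
# then parse the date/time part, leaving the "+0000" offset aside.
stamp = re.search(r'\[([^\]]+)\]', line).group(1)   # '04/Nov/2009:04:02:10 +0000'
parsed = time.strptime(stamp.split()[0], '%d/%b/%Y:%H:%M:%S')
print(parsed.tm_hour)  # -> 4
```

`time.struct_time` values compare chronologically, so parsed stamps can be used directly as sort keys.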
Re: [Tutor] Logfile Manipulation
Sorry - forgot to include the list.

On Mon, Nov 9, 2009 at 9:33 AM, Stephen Nelson-Smith wrote:
> On Mon, Nov 9, 2009 at 9:10 AM, ALAN GAULD wrote:
>>
>>> An apache logfile entry looks like this:
>>>
>>> 89.151.119.196 - - [04/Nov/2009:04:02:10 +0000] "GET
>>> /service.php?s=nav&arg[]=&arg[]=home&q=ubercrumb/node%2F20812
>>> HTTP/1.1" 200 50 "-" "-"
>>>
>>> I want to extract 24 hrs of data based on timestamps like this:
>>>
>>> [04/Nov/2009:04:02:10 +0000]
>>
>> OK It looks like you could use a regex to extract the first
>> thing you find between square brackets. Then convert that to a time.
>
> I'm currently thinking I can just use a string comparison after the
> first entry for the day - that saves date arithmetic.
>
>> I'd opt for doing it all in one pass. With such large files you really
>> want to minimise the amount of time spent reading the file.
>> Plus with such large files you will need/want to process them
>> line by line anyway rather than reading the whole thing into memory.
>
> How do I handle concurrency? I have 6 log files which I need to turn
> into one time-sequenced log.
>
> I guess I need to switch between each log depending on whether the
> next entry is the next chronological entry between all six. Then on a
> per line basis I can also reject it if it matches the stuff I want to
> throw out, and substitute it if I need to, then write out to the new
> file.
>
> S.
Re: [Tutor] Logfile Manipulation
Hi,

>> Any advice or experiences?
>
> go here and download the pdf!
> http://www.dabeaz.com/generators-uk/
> Someone posted this the other day, and I went and read through it and played
> around a bit and it's exactly what you're looking for - plus it has one vs.
> slide of python vs. awk.
> I think you'll find the pdf highly useful and right on.

Looks like generators are a really good fit.

My biggest question really is how to multiplex. I have 6 logs per day, so I don't know which one will have the next consecutive entry. I love the idea of making a big dictionary, but with 6G of data, that's going to run me out of memory, isn't it?

S.
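The multiplexing question is the classic k-way merge: if each log can be exposed as an already-sorted stream of (timestamp, entry) pairs, they can be merged lazily, holding only one pending entry per stream in memory. `heapq.merge` (Python 2.6+; on 2.4 the ActiveState imerge recipe does the same job) is built for exactly this. A toy sketch with integer "timestamps":

```python
import heapq

# Three already-sorted streams of (timestamp, entry) pairs; because they
# are generators, only one pending item per stream is in memory at a time.
def stream(pairs):
    for pair in pairs:
        yield pair

a = stream([(1, 'a1'), (4, 'a2')])
b = stream([(2, 'b1'), (5, 'b2')])
c = stream([(3, 'c1'), (6, 'c2')])

merged = [entry for ts, entry in heapq.merge(a, b, c)]
print(merged)  # -> ['a1', 'b1', 'c1', 'a2', 'b2', 'c2']
```

No big dictionary is needed: the merge never looks more than one item deep into any stream.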
Re: [Tutor] Logfile Manipulation
Hi,

> If you create iterators from the files that yield (timestamp, entry)
> pairs, you can merge the iterators using one of these recipes:
> http://code.activestate.com/recipes/491285/
> http://code.activestate.com/recipes/535160/

Could you show me how I might do that?

So far I'm at the stage of being able to produce loglines:

#!/usr/bin/env python
import gzip

class LogFile:
    def __init__(self, filename, date):
        self.f = gzip.open(filename, "r")
        for logline in self.f:
            self.line = logline
            self.stamp = " ".join(self.line.split()[3:5])
            if self.stamp.startswith(date):
                break

    def getline(self):
        ret = self.line
        self.line = self.f.readline()
        self.stamp = " ".join(self.line.split()[3:5])
        return ret

logs = [LogFile("a/access_log-20091105.gz", "[05/Nov/2009"),
        LogFile("b/access_log-20091105.gz", "[05/Nov/2009"),
        LogFile("c/access_log-20091105.gz", "[05/Nov/2009")]

while True:
    print [x.stamp for x in logs]
    nextline = min((x.stamp, x) for x in logs)
    print nextline[1].getline()
Re: [Tutor] Logfile Manipulation
And the problem I have with the below is that I've discovered that the input logfiles aren't strictly ordered - ie there is variance by a second or so in some of the entries.

I can sort the biggest logfile (800M) using unix sort in about 1.5 mins on my workstation. That's not really fast enough, with potentially 12 other files.

Hrm...

S.

On Mon, Nov 9, 2009 at 1:35 PM, Stephen Nelson-Smith wrote:
> Hi,
>
>> If you create iterators from the files that yield (timestamp, entry)
>> pairs, you can merge the iterators using one of these recipes:
>> http://code.activestate.com/recipes/491285/
>> http://code.activestate.com/recipes/535160/
>
> Could you show me how I might do that?
>
> So far I'm at the stage of being able to produce loglines:
>
> #!/usr/bin/env python
> import gzip
>
> class LogFile:
>     def __init__(self, filename, date):
>         self.f = gzip.open(filename, "r")
>         for logline in self.f:
>             self.line = logline
>             self.stamp = " ".join(self.line.split()[3:5])
>             if self.stamp.startswith(date):
>                 break
>
>     def getline(self):
>         ret = self.line
>         self.line = self.f.readline()
>         self.stamp = " ".join(self.line.split()[3:5])
>         return ret
>
> logs = [LogFile("a/access_log-20091105.gz", "[05/Nov/2009"),
>         LogFile("b/access_log-20091105.gz", "[05/Nov/2009"),
>         LogFile("c/access_log-20091105.gz", "[05/Nov/2009")]
>
> while True:
>     print [x.stamp for x in logs]
>     nextline = min((x.stamp, x) for x in logs)
>     print nextline[1].getline()
Re: [Tutor] Logfile Manipulation
On Mon, Nov 9, 2009 at 3:15 PM, Wayne Werner wrote:
> On Mon, Nov 9, 2009 at 7:46 AM, Stephen Nelson-Smith wrote:
>>
>> And the problem I have with the below is that I've discovered that the
>> input logfiles aren't strictly ordered - ie there is variance by a
>> second or so in some of the entries.
>
> Within a given set of 10 lines, is the first line and last line "in order" -

On average, in a sequence of 10 log lines, one will be out by one or two seconds. Here's a random slice:

05/Nov/2009:01:41:37
05/Nov/2009:01:41:37
05/Nov/2009:01:41:37
05/Nov/2009:01:41:37
05/Nov/2009:01:41:36
05/Nov/2009:01:41:37
05/Nov/2009:01:41:37
05/Nov/2009:01:41:37
05/Nov/2009:01:41:37
05/Nov/2009:01:41:37
05/Nov/2009:01:41:37
05/Nov/2009:01:41:37
05/Nov/2009:01:41:36
05/Nov/2009:01:41:37
05/Nov/2009:01:41:37
05/Nov/2009:01:41:38
05/Nov/2009:01:41:38
05/Nov/2009:01:41:37
05/Nov/2009:01:41:38
05/Nov/2009:01:41:38
05/Nov/2009:01:41:38
05/Nov/2009:01:41:38
05/Nov/2009:01:41:37
05/Nov/2009:01:41:38
05/Nov/2009:01:41:36
05/Nov/2009:01:41:38
05/Nov/2009:01:41:38
05/Nov/2009:01:41:38
05/Nov/2009:01:41:38
05/Nov/2009:01:41:39
05/Nov/2009:01:41:38
05/Nov/2009:01:41:39
05/Nov/2009:01:41:39
05/Nov/2009:01:41:39
05/Nov/2009:01:41:39
05/Nov/2009:01:41:40
05/Nov/2009:01:41:40
05/Nov/2009:01:41:41

> I don't know
> what the default python sorting algorithm is on a list, but AFAIK you'd be
> looking at a constant O(log 10)

I'm not a mathematician - what does this mean, in layperson's terms?

> log_generator = (d for d in logdata)
> mylist = # first ten values

OK

> while True:
>     try:
>         mylist.sort()

OK - sort the first 10 values.

>         nextdata = mylist.pop(0)

So the first value...

>         mylist.append(log_generator.next())

Right, this will add another one value?

>     except StopIteration:
>         print 'done'

> Or now that I look, python has a priority queue (
> http://docs.python.org/library/heapq.html ) that you could use instead. Just
> push the next value into the queue and pop one out - you give it some
> initial qty - 10 or so, and then it will always give you the smallest value.

That sounds very cool - and I see that one of the activestate recipes Kent suggested uses heapq too. I'll have a play.

S.
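Wayne's quoted fragment is incomplete (the initial fill and the drain at end-of-stream are missing), so here is one runnable reading of it, hedged as a sketch; names are made up. Re-sorting on every step is wasteful next to a heap, but Python's sort is adaptive, so sorting a nearly sorted list is cheap:

```python
# A runnable version of the sliding-window idea quoted above: keep a
# small buffer, sort it, pop the smallest, refill from the stream.
# Correct as long as no entry is displaced by more than `window` positions.
def sliding_sort(logdata, window=10):
    log_iter = iter(logdata)
    buffer = []
    for item in log_iter:            # initial fill
        buffer.append(item)
        if len(buffer) >= window:
            break
    while buffer:
        buffer.sort()                # cheap: buffer is nearly sorted
        yield buffer.pop(0)
        try:
            buffer.append(next(log_iter))
        except StopIteration:
            pass                     # stream exhausted: drain the buffer

print(list(sliding_sort([1, 2, 4, 3, 5, 7, 6, 8], window=3)))  # -> [1, 2, 3, 4, 5, 6, 7, 8]
```

The heapq version replaces the sort/pop pair with `heappush`/`heappop`, turning each step from O(k log k) into O(log k) for a window of k.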
[Tutor] Logfile multiplexing
I have the following idea for multiplexing logfiles (ultimately into heapq):

import gzip

class LogFile:
    def __init__(self, filename, date):
        self.logfile = gzip.open(filename, 'r')
        for logline in self.logfile:
            self.line = logline
            self.stamp = self.timestamp(self.line)
            if self.stamp.startswith(date):
                break

    def timestamp(self, line):
        return " ".join(self.line.split()[3:5])

    def getline(self):
        nextline = self.line
        self.line = self.logfile.readline()
        self.stamp = self.timestamp(self.line)
        return nextline

The idea is that I can then do:

logs = [("log1", "[Nov/05/2009"), ("log2", "[Nov/05/2009"),
        ("log3", "[Nov/05/2009"), ("log4", "[Nov/05/2009")]

I've tested it with one log (15M compressed, 211M uncompressed), and it takes about 20 seconds to be ready to roll. However, then I get unexpected behaviour:

~/system/tools/magpie $ python
Python 2.4.3 (#1, Jan 21 2009, 01:11:33)
[GCC 4.1.2 20071124 (Red Hat 4.1.2-42)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import magpie
>>> magpie.l
>>> magpie.l.stamp
'[05/Nov/2009:04:02:07 +0000]'
>>> magpie.l.getline()
'89.151.119.195 - - [05/Nov/2009:04:02:07 +0000] "GET /service.php?s=nav&arg[]=&arg[]=home&q=ubercrumb/node%2F20812 HTTP/1.1" 200 50 "-" "-"\n'
>>> magpie.l.stamp
''
>>> magpie.l.getline()
''
>>>

I expected to be able to call getline() and get more lines...

a) What have I done wrong?
b) Is this an ok implementation? What improvements could be made?
c) Is 20secs a reasonable time, or am I choosing a slow way to do this?

S.
Re: [Tutor] Logfile multiplexing
Hi Kent,

> One error is that the initial line will be the same as the first
> response from getline(). So you should call getline() before trying to
> access a line. Also you may need to filter all lines - what if there
> is jitter at midnight, or the log rolls over before the end.

Well ultimately I definitely have to filter two logfiles per day, as logs rotate at 0400. Or do you mean something else?

> More important, though, you are pretty much writing your own iterator
> without using the iterator protocol. I would write this as:
>
> class LogFile:
>     def __init__(self, filename, date):
>         self.logfile = gzip.open(filename, 'r')
>         self.date = date
>
>     def __iter__(self):
>         for logline in self.logfile:
>             stamp = self.timestamp(logline)
>             if stamp.startswith(date):
>                 yield (stamp, logline)
>
>     def timestamp(self, line):
>         return " ".join(self.line.split()[3:5])

Right - I think I understand that. From here I get:

import gzip

class LogFile:
    def __init__(self, filename, date):
        self.logfile = gzip.open(filename, 'r')
        self.date = date

    def __iter__(self):
        for logline in self.logfile:
            stamp = self.timestamp(logline)
            if stamp.startswith(date):
                yield (stamp, logline)

    def timestamp(self, line):
        return " ".join(self.line.split()[3:5])

l = LogFile("/home/stephen/access_log-20091105.gz", "[04/Nov/2009")

I get:

Python 2.4.3 (#1, Jan 21 2009, 01:11:33)
[GCC 4.1.2 20071124 (Red Hat 4.1.2-42)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import kent
>>> kent.l
>>> dir(kent.l)
['__doc__', '__init__', '__iter__', '__module__', 'date', 'logfile', 'timestamp']
>>> for line in kent.l:
...     print line
...
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "kent.py", line 10, in __iter__
    stamp = self.timestamp(logline)
  File "kent.py", line 15, in timestamp
    return " ".join(self.line.split()[3:5])
AttributeError: LogFile instance has no attribute 'line'
>>> for stamp, line in kent.l:
...     print stamp, line
...
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "kent.py", line 10, in __iter__
    stamp = self.timestamp(logline)
  File "kent.py", line 15, in timestamp
    return " ".join(self.line.split()[3:5])
AttributeError: LogFile instance has no attribute 'line'
>>> for stamp, logline in kent.l:
...     print stamp, logline
...
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "kent.py", line 10, in __iter__
    stamp = self.timestamp(logline)
  File "kent.py", line 15, in timestamp
    return " ".join(self.line.split()[3:5])
AttributeError: LogFile instance has no attribute 'line'

> You are reading through the entire file on load because your timestamp
> check is failing. You are filtering out the whole file and returning
> just the last line. Check the dates you are supplying vs the actual
> data - they don't match.

Yes, I found that out in the end! Thanks!

S.
Re: [Tutor] Logfile multiplexing
Hi,

> probably that line should have been " ".join(line.split()[3:5]), i.e.
> no self. The line variable is a supplied argument.

Now I get:

Python 2.4.3 (#1, Jan 21 2009, 01:11:33)
[GCC 4.1.2 20071124 (Red Hat 4.1.2-42)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import kent
>>> kent.l
>>> for a, b in kent.l
  File "<stdin>", line 1
    for a, b in kent.l
                     ^
SyntaxError: invalid syntax
>>> for a, b in kent.l:
...     print a, b
...
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "kent.py", line 11, in __iter__
    if stamp.startswith(date):
NameError: global name 'date' is not defined

How does __iter__ know about date? Should that be self.date?

S.
Re: [Tutor] Logfile multiplexing
Hello,

On Tue, Nov 10, 2009 at 2:00 PM, Luke Paireepinart wrote:
>> Traceback (most recent call last):
>>   File "<stdin>", line 1, in ?
>>   File "kent.py", line 11, in __iter__
>>     if stamp.startswith(date):
>> NameError: global name 'date' is not defined
>>
>> How does __iter__ know about date? Should that be self.date?
>
> Yes. self.date is set in the constructor.

OK, so now I've given it the full load of logs:

>>> for time, entry in kent.logs:
...     print time, entry
...
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
ValueError: too many values to unpack

How do I get around this?!

S.
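Putting the two fixes from this thread together (use the `line` argument, not `self.line`, and `self.date`, not the bare `date`), a runnable version of Kent's class looks like this. An in-memory list of lines stands in for the gzipped file, purely for brevity:

```python
# A corrected sketch of the iterator class discussed above, with the two
# fixes applied (line, not self.line; self.date, not date). A plain list
# of strings stands in for the gzipped log here.
class LogFile(object):
    def __init__(self, lines, date):
        self.lines = lines          # any iterable of log lines
        self.date = date

    def __iter__(self):
        for logline in self.lines:
            stamp = self.timestamp(logline)
            if stamp.startswith(self.date):
                yield (stamp, logline)

    def timestamp(self, line):
        return " ".join(line.split()[3:5])

sample = [
    '1.2.3.4 - - [04/Nov/2009:04:02:10 +0000] "GET / HTTP/1.1" 200 50',
    '1.2.3.4 - - [05/Nov/2009:09:00:00 +0000] "GET / HTTP/1.1" 200 50',
]
log = LogFile(sample, "[04/Nov/2009")
print([stamp for stamp, line in log])  # -> ['[04/Nov/2009:04:02:10 +0000]']
```

Each iteration yields a (stamp, entry) pair, which is why the consuming loop must unpack exactly two names: `for stamp, entry in log:`.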
Re: [Tutor] Logfile multiplexing
On Tue, Nov 10, 2009 at 3:48 PM, Stephen Nelson-Smith wrote:
> OK, so now i've given it the full load of logs:
>
>>>> for time, entry in kent.logs:
> ...     print time, entry
> ...
> Traceback (most recent call last):
>   File "<stdin>", line 1, in ?
> ValueError: too many values to unpack
>
> How do I get around this?!

Erm, and now it's failing with only one logfile...

Code here:

http://pastebin.ca/1665013

S.
Re: [Tutor] Logfile multiplexing
On Tue, Nov 10, 2009 at 3:59 PM, Stephen Nelson-Smith wrote:
> On Tue, Nov 10, 2009 at 3:48 PM, Stephen Nelson-Smith wrote:
>
>> OK, so now i've given it the full load of logs:
>>
>>>>> for time, entry in kent.logs:
>> ...     print time, entry
>> ...
>> Traceback (most recent call last):
>>   File "<stdin>", line 1, in ?
>> ValueError: too many values to unpack
>>
>> How do I get around this?!
>
> Erm, and now it's failing with only one logfile...
>
> Code here:
>
> http://pastebin.ca/1665013

OK - me being dumb.

So what I want to do is be able to multiplex the files - ie read the next line of all 12 files at once, filter them accordingly, and then write them out to one combined file. My old code did this:

min((x.stamp, x) for x in logs)

What's the best way to do this now I'm using an iterable LogFile class?

S.
Re: [Tutor] Logfile multiplexing
Hi Kent,

> See the Python Cookbook recipes I referenced earlier.
> http://code.activestate.com/recipes/491285/
> http://code.activestate.com/recipes/535160/
>
> Note they won't fix up the jumbled ordering of your files but I don't
> think they will break from it either...

That's exactly the problem. I do need the end product to be in order. The problem is that on my current design I'm still getting stuff out of sync.

What I do at present is this. Each of these columns is a log file (logfile A, B, C, D), with a number of entries, slightly out of order:

1 1 1 1
2 2 2 2
3 3 3 3
A B C D

...

I currently take a slice through all (12) logs, and stick them in a priority queue, and pop them off in order. The problem comes that the next slice could easily contain timestamps before the entries in the previous slice.

So I either need some kind of lookahead capability, or I need to be feeding the queue one at a time, and hope the queue is of sufficient size to cover the delta between the various logs. It all feels a bit brittle and wrong.

I don't really want to admit defeat and have a cron job sort the logs before entry. Anyone got any other ideas?

Thanks all - I'm really learning a lot.

S.
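One way out of the brittleness is to separate the two jobs: first repair the per-file jitter with a small bounded heap (safe because each file is only out of order by a few positions), then hand the now-sorted streams to a k-way merge, which needs no lookahead at all. A sketch with integer timestamps; the `window` bound is the only assumption:

```python
import heapq

def smoothed(entries, window=10):
    """Repair per-file jitter with a bounded heap; correct as long as no
    entry is more than `window` positions out of its sorted place."""
    heap = []
    for entry in entries:
        heapq.heappush(heap, entry)
        if len(heap) > window:
            yield heapq.heappop(heap)
    while heap:                      # drain the heap at end of stream
        yield heapq.heappop(heap)

# Each "file" is almost sorted; smooth each one, then merge the
# now-sorted streams (heapq.merge assumes sorted inputs).
log_a = [1, 3, 2, 7]
log_b = [4, 5, 6, 8]
merged = list(heapq.merge(smoothed(log_a, 2), smoothed(log_b, 2)))
print(merged)  # -> [1, 2, 3, 4, 5, 6, 7, 8]
```

Memory use is bounded by `window` entries per file plus one pending entry per stream, regardless of file size.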
Re: [Tutor] Logfile multiplexing
Hi,

On Wed, Nov 11, 2009 at 10:05 AM, Alan Gauld wrote:
> "Stephen Nelson-Smith" wrote
>>
>> I don't really want to admit defeat and have a cron job sort the logs
>> before entry. Anyone got any other ideas?
>
> Why would that be admitting defeat?

Well, it means admitting defeat on solving the problem in Python. Yes, in practical terms, I should probably preprocess the data, but as a programming exercise, learning how to sort a number of files into one is something I'd like to crack.

Maybe the real lesson here is knowing which battles to fight, and that a good developer uses the right tools for the job.

S.
[Tutor] Iterator Merging
So, following Kent and Alan's advice, I've preprocessed my data, and have code that produces 6 LogFile iterator objects:

>>> import magpie
>>> magpie.logs[1]
>>> dir(magpie.logs[1])
['__doc__', '__init__', '__iter__', '__module__', 'date', 'logfile', 'timestamp']
>>> for timestamp, entry in itertools.islice(magpie.logs[1], 3):
...     print timestamp, entry
...
[05/Nov/2009:04:02:13 +0000] 192.168.41.107 - - [05/Nov/2009:04:02:13 +0000] "GET http://sekrit.com/taxonomy/term/27908?page=111&item_884=1&year=66&form_id=dynamic_build_learning_objectives_form&level=121 HTTP/1.1" 200 - "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
[05/Nov/2009:04:02:13 +0000] 66.249.165.22 - - [05/Nov/2009:04:02:13 +0000] "GET /taxonomy/term/27908?page=111&item_884=1&year=66&form_id=objectives_form&level=121 HTTP/1.1" 200 28736 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
[05/Nov/2009:04:02:15 +0000] 162.172.185.126 - - [05/Nov/2009:04:02:15 +0000] "GET http://sekrit.com/sites/all/themes/liszt/images/backgrounds/grad_nav_5_h3.gif HTTP/1.1" 304 0 "-" "Mozilla/4.0 (compatible;)"

This is great. So I have a list of 6 of these iterator objects. Kent mentioned feeding them into an iterator merger. I've got the iterator merger in place too:

>>> from imerge import imerge
>>> imerge
>>> imerge([1,3,4],[2,7])
>>> list(imerge([1,3,4],[2,7]))
[1, 2, 3, 4, 7]

What I'm trying to work out is how to feed the data I have - 6 streams of (timestamp, entry) pairs - into imerge. How can I do this?

S.

-- 
Stephen Nelson-Smith
Technical Director
Atalanta Systems Ltd
www.atalanta-systems.com
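Mergers in this style take each sorted stream as a separate positional argument, so a list of streams is fed in with star-unpacking: `imerge(*logs)`. A sketch, using `heapq.merge` (same calling convention) to stand in for the imerge recipe:

```python
# imerge-style mergers (and heapq.merge) take each sorted stream as a
# separate positional argument, so a list of streams is star-unpacked.
# heapq.merge stands in for the imerge recipe here.
from heapq import merge as imerge

streams = [
    iter([(1, 'a'), (3, 'b')]),
    iter([(2, 'c'), (4, 'd')]),
]
merged = list(imerge(*streams))
print(merged)  # -> [(1, 'a'), (2, 'c'), (3, 'b'), (4, 'd')]
```

Because the stream elements are (timestamp, entry) tuples, they compare on the timestamp first, which is exactly the ordering the merge needs.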
[Tutor] Iterable Understanding
I think I'm having a major understanding failure.

So having discovered that my Unix sort breaks on the last day of the month, I've gone ahead and implemented a per-log sort, using heapq. I've tested it with various data, and it produces a sorted logfile, per log.

So in essence this:

logs = [LogFile("/home/stephen/qa/ded1353/quick_log.gz", "04/Nov/2009"),
        LogFile("/home/stephen/qa/ded1408/quick_log.gz", "04/Nov/2009"),
        LogFile("/home/stephen/qa/ded1409/quick_log.gz", "04/Nov/2009")]

gives me a list of LogFiles - each of which has a getline() method, which returns a tuple. I thought I could merge iterables using Kent's recipe, or just with heapq.merge(). But how do I get from a method that can produce a tuple, to some mergeable iterables?

for log in logs:
    l = log.getline()
    print l

This gives me three loglines. How do I get more? Other than while True:

Of course tuples are iterables, but that doesn't help, as I want to sort on timestamp... so a list of tuples would be OK. But how do I construct that, bearing in mind I am trying not to use up too much memory? I think there's a piece of the jigsaw I just don't get. Please help!

The code in full is here:

import gzip, heapq, re

class LogFile:
    def __init__(self, filename, date):
        self.logfile = gzip.open(filename, 'r')
        for logline in self.logfile:
            self.line = logline
            self.stamp = self.timestamp(self.line)
            if self.stamp.startswith(date):
                break
        self.initialise_heap()

    def timestamp(self, line):
        stamp = re.search(r'\[(.*?)\]', line).group(1)
        return stamp

    def initialise_heap(self):
        initlist = []
        self.heap = []
        for x in xrange(10):
            self.line = self.logfile.readline()
            self.stamp = self.timestamp(self.line)
            initlist.append((self.stamp, self.line))
        heapq.heapify(initlist)
        self.heap = initlist

    def getline(self):
        self.line = self.logfile.readline()
        stamp = self.timestamp(self.line)
        heapq.heappush(self.heap, (stamp, self.line))
        pop = heapq.heappop(self.heap)
        return pop

logs = [LogFile("/home/stephen/qa/ded1353/quick_log.gz", "04/Nov/2009"),
        LogFile("/home/stephen/qa/ded1408/quick_log.gz", "04/Nov/2009"),
        LogFile("/home/stephen/qa/ded1409/quick_log.gz", "04/Nov/2009")]
Re: [Tutor] Iterable Understanding
Hi,

>> for log in logs:
>>     l = log.getline()
>>     print l
>>
>> This gives me three loglines. How do I get more? Other than while True:
>
> I presume that what you want is to get all lines from each log.

Well... what I want to do is create a single, sorted list by merging a number of other sorted lists.

> for log in logs:
>     for line in log.getlines():
>         print l

This gives me three lines.

S.
Re: [Tutor] Iterable Understanding
Gah! Failed to reply to all again!

On Sat, Nov 14, 2009 at 1:43 PM, Stephen Nelson-Smith wrote:
> Hi,
>
>> I'm not 100% sure to understand your needs and intention; just have a try.
>> Maybe what you want actually is rather:
>>
>> for log in logs:
>>     for line in log:
>>         print l
>
> Assuming you meant print line. This also gives me just three lines.
>
>> Meaning your log objects need be iterable. To do this, you must have an
>> __iter__ method that would surely simply return the object's getline (or
>> maybe replace it altogether).
>
> I'm not sure I fully understand how that works, but yes, I created an
> __iter__ method:
>
> def __iter__(self):
>     self.line = self.logfile.readline()
>     stamp = self.timestamp(self.line)
>     heapq.heappush(self.heap, (stamp, self.line))
>     pop = heapq.heappop(self.heap)
>     yield pop
>
> But I still don't see how I can iterate over it... I must be missing
> something.
>
> I thought that perhaps I could make a generator function:
>
> singly = ((x.stamp, x.line) for x in logs)
> for l in singly:
>     print
>
> But this doesn't seem to help either.
>
>> Then when walking the log with for...in, python will silently call getline
>> until error. This means getline must raise StopIteration when the log is
>> "empty" and __iter__ must "reset" it.
>
> Yes, but for how long? Having added the __iter__ method, if I now do:
>
> for log in logs:
>     for line in log:
>         print line
>
> I still get only three results.
>
>> Another solution may be to subtype "file", for a file is precisely an
>> iterator over lines; and you really get your data from a file.
>
> I'm very sorry - I'm not sure I understand. I get that a file is
> iterable by definition, but I'm not sure how subtyping it helps.
>
>> Simply (sic), there must be some job done about this issue of time stamps
>> (haven't studied in details). Still, I guess this track may be worth a
>> little study.
>
> Sorry for not understanding :(
>
>> Once you get logs iterable, you may subtype list for your overall log
>> collection and set it an __iter__ method like:
>>
>> for log in self:
>>     for line in log:
>>         yield line
>>
>> (The trick is not from me.)
>
> OK - I make the logs iterable by giving them an __iter__ method - I
> get that. I just don't know what you mean by 'subtype list'.
>
>> Then you can write:
>> for line in my_log_collection
>
> That sounds useful
>
> S.
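The reason the quoted `__iter__` only ever produces one line per log is that it is a generator which yields exactly once and then finishes. Looping inside `__iter__` (and draining the heap at end of file) keeps iteration going for the whole stream. A sketch, with made-up in-memory "logs" in place of gzipped files:

```python
import heapq

# A generator __iter__ that yields once ends after one item; looping
# inside it (and draining the heap at EOF) keeps iteration going.
class Log(object):
    def __init__(self, lines):
        self.lines = iter(lines)
        self.heap = []

    def __iter__(self):
        for line in self.lines:
            heapq.heappush(self.heap, line)
            if len(self.heap) > 2:        # small look-ahead window
                yield heapq.heappop(self.heap)
        while self.heap:                  # drain what's left at EOF
            yield heapq.heappop(self.heap)

log = Log([2, 1, 3, 5, 4])
print(list(log))  # -> [1, 2, 3, 4, 5]
```

With `__iter__` written this way, `for line in log:` runs to the end of the file, and the object plugs straight into any iterator merger.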
Re: [Tutor] Iterable Understanding
Hi Wayne,

> Just write your own merge:
> (simplified and probably inefficient and first thing off the top of my head)
> newlist = []
> for x, y, z in zip(list1, list2, list3):

I think I need something like izip_longest, don't I, since the lists will be of varied length? Also, where do these lists come from? They can't go in memory - they're much too big. This is why I felt using some kind of generator was the right way - I can produce 3 (or 12) sets of tuples... I just need to work out how to merge them.

>     if y > x < z:
>         newlist.append(x)
>     elif x > y < z:
>         newlist.append(y)
>     elif x > z < y:
>         newlist.append(z)
> I'm pretty sure that should work although it's untested.

Well, no it won't work. The lists are in time order, but they won't match up. One log may have entries at the same numerical position (ie the 10th log entry) but earlier than the entries on the previous lines. To give a simple example:

List 1       List 2       List 3
(1, cat)     (2, fish)    (1, cabbage)
(4, dog)     (5, pig)     (2, ferret)
(5, phone)   (6, horse)   (3, sausage)

Won't this result in the lowest number *per row* being added to the new list? Or am I misunderstanding how it works?

S.
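The suspicion is right: a row-by-row minimum only ever emits one of each three entries and loses ordering across rows, while a k-way merge of sorted streams interleaves everything correctly. Using the example lists from the message:

```python
import heapq

# Row-by-row minimum vs a k-way merge, on the example from the message.
list1 = [(1, 'cat'), (4, 'dog'), (5, 'phone')]
list2 = [(2, 'fish'), (5, 'pig'), (6, 'horse')]
list3 = [(1, 'cabbage'), (2, 'ferret'), (3, 'sausage')]

# Row-wise: picks one winner per row and silently drops the rest.
row_mins = [min(rows) for rows in zip(list1, list2, list3)]
print(row_mins)   # -> [(1, 'cabbage'), (2, 'ferret'), (3, 'sausage')]

# Merge: consumes each stream at its own pace, keeps every entry in order.
merged = list(heapq.merge(list1, list2, list3))
print(merged[:4]) # -> [(1, 'cabbage'), (1, 'cat'), (2, 'ferret'), (2, 'fish')]
```

Note the row-wise version never emits (1, cat) at all, which is the "won't match up" problem in miniature.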
[Tutor] Should a beginner learn Python 3.x
My brother-in-law is learning Python. He's downloaded 3.1 for Windows, and is having a play. It's already confused him that print "hello world" gives a syntax error.

He's an absolute beginner with no programming experience at all. I think he might be following 'Python Programming for the Absolute Beginner', or perhaps some online guides.

Should I advise him to stick with 2.6 for a bit, since most of the material out there will be for 2.x? Or since he's learning from scratch, should he jump straight to 3.x? In which case, what can you recommend for him to work through? I must stress he has absolutely no clue at all about programming, no education beyond 16 yrs old, but is keen to learn.

S.
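The error he's hitting is the first headline change in 3.x: print became an ordinary function. The call form works on both lines, which is one way to write beginner code that runs either way:

```python
# Python 2's statement form `print "hello world"` is a SyntaxError in
# Python 3, where print is a function; the call form below works in
# both 2.6+ and 3.x.
message = "hello world"
print(message)  # -> hello world
```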
Re: [Tutor] Iterable Understanding
Hi Martin,

Thanks for a very detailed response. I'm about to head out, so I can't put your ideas into practice yet, or get down to studying for a while. However, I had one thing I felt I should respond to.

> It's unclear from your previous posts (to me at least) -- are the
> individual log files already sorted, in chronological order?

Sorry if I didn't make this clear. No, they're not. They are *nearly* sorted - ie they're out by a few seconds, every so often, but they are in order at the level of minutes, or even at the level of a few seconds.

It was precisely because of this that I decided, following Alan's advice, to pre-filter the data. I compiled a unix sort command to do this, and had a solution I was happy with, based on Kent's iterator example, fed into heapq.merge. However, I've since discovered that the unix sort isn't reliable on the last and first day of the month.

So, I decided I'd need to sort each logfile first. The code at the start of *this* thread does this - it uses a heapq per logfile and is able to produce a tuple of (timestamp, logline), which will be in exact chronological order. What I want to do is merge this output into a file.

I think I probably have enough to be getting on with, but I'll be sure to return if I still have questions after studying the links you provided, and trying the various suggestions people have made.

Thanks so very much!

S.
Re: [Tutor] Unexpected iterator
> To unpack your variables a and b you need an iterable object on the right
> side, which returns you exactly 2 variables

What does 'unpack' mean? I've seen a few Python errors about packing and unpacking. What does it mean? S. ___ Tutor maillist - Tutor@python.org To unsubscribe or change subscription options: http://mail.python.org/mailman/listinfo/tutor
Re: [Tutor] Iterable Understanding
Hi Marty,

Thanks for a very lucid reply!

> Well, you haven't described the unreliable behavior of unix sort so I
> can only guess, but I assume you know about the --month-sort (-M) flag?

Nope - but I can look it up. The problem I have is that the source logs are rotated at 0400 hrs, so I need two days of logs in order to extract 24 hrs from to 2359 (which is the requirement). At present, I preprocess using sort, which works fine as long as the month doesn't change.

> import gzip
> from heapq import heappush, heappop, merge

Is this a preferred method, rather than just 'import heapq'?

> def timestamp(line):
>     # replace with your own timestamp function
>     # this appears to work with the sample logs I chose
>     stamp = ' '.join(line.split(' ', 3)[:-1])
>     return time.strptime(stamp, '%b %d %H:%M:%S')

I have some logfile entries with multiple IP addresses, so I can't split using whitespace.

> class LogFile(object):
>     def __init__(self, filename, jitter=10):
>         self.logfile = gzip.open(filename, 'r')
>         self.heap = []
>         self.jitter = jitter
>
>     def __iter__(self):
>         while True:
>             for logline in self.logfile:
>                 heappush(self.heap, (timestamp(logline), logline))
>                 if len(self.heap) >= self.jitter:
>                     break

Really nice way to handle the batching of the initial heap - thank you!

>             try:
>                 yield heappop(self.heap)
>             except IndexError:
>                 raise StopIteration
>
> logs = [
>     LogFile("/home/stephen/qa/ded1353/quick_log.gz"),
>     LogFile("/home/stephen/qa/ded1408/quick_log.gz"),
>     LogFile("/home/stephen/qa/ded1409/quick_log.gz")
> ]
>
> merged_log = merge(*logs)
> with open('/tmp/merged_log', 'w') as output:
>     for stamp, line in merged_log:
>         output.write(line)

Oooh, I've never used 'with' before. In fact I am currently restricted to 2.4 on the machine on which this will run. That wasn't a problem for heapq.merge, as I was just able to copy the code from the 2.6 source. Or I could use Kent's recipe.

> ...
> which probably won't preserve the order of log entries that have the
> same timestamp, but if you need it to -- should be easy to accommodate.

I don't think that is necessary, but I'm curious to know how...

Now... this is brilliant. What it doesn't do that mine does, is handle the date - mine checks whether each entry starts with the appropriate date, so we can extract 24 hrs of data. I'll need to try to include that. Also, I need to do some filtering and gsubbing, but I think I'm firmly on the right path now, thanks to you.

> HTH,

Very much indeed. S. ___ Tutor maillist - Tutor@python.org To unsubscribe or change subscription options: http://mail.python.org/mailman/listinfo/tutor
[Tutor] GzipFile has no attribute '__exit__'
I'm trying to write a gzipped file on the fly:

merged_log = merge(*logs)
with gzip.open('/tmp/merged_log.gz', 'w') as output:
    for stamp, line in merged_log:
        output.write(line)

But I'm getting:

Traceback (most recent call last):
  File "./magpie.py", line 72, in
    with gzip.open('/tmp/merged_log.gz', 'w') as output:
AttributeError: GzipFile instance has no attribute '__exit__'

What am I doing wrong, and how do I put it right? S. ___ Tutor maillist - Tutor@python.org To unsubscribe or change subscription options: http://mail.python.org/mailman/listinfo/tutor
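For context on the error above: GzipFile only gained context-manager support (__enter__/__exit__) in Python 2.7, so on earlier versions `with gzip.open(...)` fails exactly like this. One workaround is contextlib.closing, which supplies the missing protocol by calling close() on exit. A minimal sketch — write_gzip is an invented helper name, not code from the thread:

```python
import contextlib
import gzip

def write_gzip(path, chunks):
    # closing() gives any object with a .close() method the
    # context-manager protocol that pre-2.7 GzipFile lacks.
    with contextlib.closing(gzip.open(path, 'wb')) as output:
        for chunk in chunks:
            output.write(chunk)
```

A plain try/finally around gzip.open() and close() achieves the same effect without the import.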
Re: [Tutor] I love python / you guys :)
Hello all, On Mon, Nov 16, 2009 at 6:58 AM, Stefan Lesicnik wrote: > hi, > > Although not a question, i just want to tell you guys how awesome you are! +1 I've been a happy member of this list for years, even though I've taken a 3 year Ruby sabbatical! I've always found it to be full of invaluable advice, and is, in my opinion, a real gem in Python's crown. It's one of the reasons I feel so confident in recommending python to anyone - they can be guaranteed a friendly and informational welcome on this list. Thank you very much - it is greatly appreciated. S. ___ Tutor maillist - Tutor@python.org To unsubscribe or change subscription options: http://mail.python.org/mailman/listinfo/tutor
Re: [Tutor] proxy switcher - was Re: I love python / you guys :)
Hi,

>> When i use our company's LAN i set my proxy variable by hand in .bashrc.
>> There are 4 files to insert proxy variable:
>>
>> in ~/.bashrc, /root/.bashrc, /etc/wgetrc and /etc/apt/apt.conf.
>>
>> The last one is actually rename e.g. mv to apt.conf to activate proxy and mv
>> to apt.conf.bak to deactivate. The proxy variable is something like this
>>
>> export http_proxy=http://username:passw...@proxy:port
>> ftp_proxy=$http_proxy
>>
>> To activate i uncomment them then source .bashrc. To deactivate i put back
>> the comment sign. I do it all in vim e.g. vim -o the-3-files-above. For
>> apt.conf see rename above. I deactivate because i have another internet
>> connection option via 3G usb modem. But thats another story.
>>
>> I will do this myself in python so please show me the way. Surely this can
>> be done.

Here's what I knocked up over lunch. It doesn't cover the moving of the file, I don't like that it's deep-nested, and I've not tested it, but I welcome criticism and feedback:

files = ['file1', 'file2', 'file3', 'file4']
settings = ['export http_proxy=', 'ftp_proxy=']

for file in files:
    with open(file, 'rw') as file:
        for line in file:
            for setting in settings:
                if setting in line:
                    if line[0] == '#':
                        line = line[1:]
                    else:
                        line = '#' + line
                    output.write(line)

S. ___ Tutor maillist - Tutor@python.org To unsubscribe or change subscription options: http://mail.python.org/mailman/listinfo/tutor
Re: [Tutor] proxy switcher - was Re: I love python / you guys :)
Evening,

> Yes, you can, but not this way. I'm guessing the op was changing his mind back and forth, between having two files, one for reading and one for writing, and trying to do it in place. The code does neither/both.

Well, just neither I think! I didn't check if 'rw' was possible. My plan was to read the file and write to the same file as the change was made, to avoid having to use temporary files and os.move. But I wasn't near a machine with python on it, so it was untested.

> Take a look at fileinput.FileInput() with the inplace option. It makes it convenient to update text files "in place" by handling all the temp file copying and such. It even handles iterating through the list of files.

Will definitely look into that.

> Good point about 'file' as its a built-in name. If the code ever has to use the std meaning, you have a problem. Worse, it's unreadable as is.

Thanks for that hint. In what way is it unreadable? Because the intent is not clear, because of the ambiguity of the name?

> Also, you didn't define 'output' anywhere. Is this an implicit declaration?

No, just a dumb mistake.

> Another potential bug with the code is if more than one "setting" could appear in a line. It would change the line for an odd number, and not for an even number of matches.

Not sure I follow that. From the OP's description, it appeared he would be entering these lines in. I figured it was safe to trust the OP not to put in duplicate data. Maybe defensively I should check for it anyway?

> Also, the nesting of output.write() is wrong, because file position isn't preserved, and random access in a text file isn't a good idea anyway.

Could you expand on this?

> But there's not much point in debugging that till the OP decides how he's going to handle the updates, via new files and copying or renaming, or via inputfile.

I'll look up fileinput, and try again :) S.
___ Tutor maillist - Tutor@python.org To unsubscribe or change subscription options: http://mail.python.org/mailman/listinfo/tutor
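To make the fileinput suggestion concrete: with inplace=True, fileinput renames each input file to a backup and redirects standard output into a fresh copy, so writing the (possibly modified) lines rewrites the file in place. A sketch applying the same comment-toggling logic as the original snippet — toggle_proxy_lines is an invented name, and this is not the OP's final code:

```python
import fileinput
import sys

def toggle_proxy_lines(paths, settings):
    # inplace=True: each file is moved to a backup and sys.stdout is
    # redirected into a new copy, so every line written here ends up
    # back in the file.
    for line in fileinput.input(paths, inplace=True):
        for setting in settings:
            if setting in line:
                # toggle the leading comment marker
                line = line[1:] if line.startswith('#') else '#' + line
                break
        sys.stdout.write(line)
```

Calling it twice on the same file restores the original contents, since each call flips the '#' on the matching lines.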
Re: [Tutor] Do you use unit testing?
Great question, I learned TDD with PyUnit, but since moved to using Ruby, and have been spoilt by rspec and even cucumber. My instincts are to write things test first, but so far I'm not finding PyUnit easy enough to get going. Once I get into the groove, I find it's a wonderful way to work - and avoids some of the drivel I've come up with in the last few days. As a discipline - work out what we want to test, write the test, watch it fail, make it pass - I find this a very productive way to think and work. S. On Mon, Nov 16, 2009 at 8:54 PM, Modulok wrote: > List, > > A general question: > > How many of you guys use unit testing as a development model, or at > all for that matter? > > I just starting messing around with it and it seems painfully slow to > have to write a test for everything you do. Thoughts, experiences, > pros, cons? > > Just looking for input and different angles on the matter, from the > Python community. > -Modulok- > ___ > Tutor maillist - tu...@python.org > To unsubscribe or change subscription options: > http://mail.python.org/mailman/listinfo/tutor > -- Stephen Nelson-Smith Technical Director Atalanta Systems Ltd www.atalanta-systems.com ___ Tutor maillist - Tutor@python.org To unsubscribe or change subscription options: http://mail.python.org/mailman/listinfo/tutor
[Tutor] Use of 'or'
A friend of mine mentioned what he called the 'pythonic' idiom of:

print a or b

Isn't this a 'clever' kind of ternary - an if / else kind of thing? I don't warm to it... should I? S. ___ Tutor maillist - Tutor@python.org To unsubscribe or change subscription options: http://mail.python.org/mailman/listinfo/tutor
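For anyone puzzling over the idiom: `or` returns one of its operands, not a coerced boolean — `a or b` evaluates to `a` if `a` is truthy, otherwise to `b`. A tiny sketch (first_truthy is an invented name):

```python
def first_truthy(a, b):
    # 'a or b' evaluates to a if a is truthy, otherwise to b --
    # it returns one of the operands, not True/False.
    return a or b
```

So `print a or b` prints b only when a is falsy (None, 0, '', empty containers) — which is narrower than a general if/else ternary, since a ternary can test any condition, not just the truthiness of a.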
[Tutor] Readable date arithmetic
I have the following method:

def get_log_dates(the_date_we_want_data_for):
    t = time.strptime(the_date_we_want_data_for, '%Y%m%d')
    t2 = datetime.datetime(*t[:-2])
    extra_day = datetime.timedelta(days=1)
    t3 = t2 + extra_day
    next_log_date = t3.strftime('%Y%m%d')
    return (the_date_we_want_data_for, next_log_date)

Quite apart from not much liking the t[123] variables, does date arithmetic really need to be this gnarly? How could I improve the above, especially from a readability perspective? Or is it ok? S. ___ Tutor maillist - Tutor@python.org To unsubscribe or change subscription options: http://mail.python.org/mailman/listinfo/tutor
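One possible tidier spelling, going straight through datetime and skipping the intermediate t/t2/t3 names — this is a suggested sketch, not the original code, and note that datetime.datetime.strptime requires Python 2.5+:

```python
import datetime

def get_log_dates(the_date_we_want_data_for):
    # Parse straight to a datetime, add a day, format back.
    wanted = datetime.datetime.strptime(the_date_we_want_data_for, '%Y%m%d')
    next_day = wanted + datetime.timedelta(days=1)
    return (the_date_we_want_data_for, next_day.strftime('%Y%m%d'))
```

timedelta handles month and year rollover, so the end-of-month case that broke the unix sort works without special-casing.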
[Tutor] Why are these results different?
I'm seeing different behaviour between code that looks to be the same. It obviously isn't the same, so I've misunderstood something:

>>> log_names
('access', 'varnish')
>>> log_dates
('20091105', '20091106')
>>> logs = itertools.chain.from_iterable(glob.glob('%sded*/%s*%s.gz' %
...     (source_dir, log, date)) for log in log_names for date in log_dates)
>>> for log in logs:
...     print log
...
/Volumes/UNTITLED 1/ded1/access_log-20091105.gz
/Volumes/UNTITLED 1/ded2/access_log-20091105.gz
/Volumes/UNTITLED 1/ded3/access_log-20091105.gz
/Volumes/UNTITLED 1/ded1/access_log-20091106.gz
/Volumes/UNTITLED 1/ded2/access_log-20091106.gz
/Volumes/UNTITLED 1/ded3/access_log-20091106.gz
/Volumes/UNTITLED 1/ded1/varnishncsa.log-20091105.gz
/Volumes/UNTITLED 1/ded2/varnishncsa.log-20091105.gz
/Volumes/UNTITLED 1/ded3/varnishncsa.log-20091105.gz
/Volumes/UNTITLED 1/ded1/varnishncsa.log-20091106.gz
/Volumes/UNTITLED 1/ded2/varnishncsa.log-20091106.gz
/Volumes/UNTITLED 1/ded3/varnishncsa.log-20091106.gz

However:

for date in log_dates:
    for log in log_names:
        logs = itertools.chain.from_iterable(glob.glob('%sded*/%s*%s.gz' % (source_dir, log, date)))

Gives me one character at a time when I iterate over logs. Why is this? And how, then, can I make the first more readable? S. ___ Tutor maillist - Tutor@python.org To unsubscribe or change subscription options: http://mail.python.org/mailman/listinfo/tutor
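The difference is in what chain.from_iterable receives: the first version hands it a generator of lists (one list per glob call), so flattening yields filenames; the second hands it a single list of strings, and flattening a list of strings iterates each string, yielding characters. A minimal demonstration with invented filenames:

```python
import itertools

# Stand-in for what glob.glob returns: a list of path strings.
files = ['a.gz', 'b.gz']

# Flattening a sequence *of lists* yields the lists' elements...
flat = list(itertools.chain.from_iterable([files]))

# ...but flattening the list of strings iterates each string,
# yielding individual characters -- the behaviour seen above.
chars = list(itertools.chain.from_iterable(files))
```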
[Tutor] Replace try: except: finally:
I need to make some code Python 2.4 compliant... the only thing I see is use of try: except: finally: To make this valid, I think I need to do a try: finally: and nest a try: except: inside. Is this correct? The code has:

try:
    ...
    ...
    ...
except SystemExit:
    raise
except KeyboardInterrupt:
    if state.output.status:
        print >> sys.stderr, "\nStopped."
    sys.exit(1)
except:
    sys.excepthook(*sys.exc_info())
    sys.exit(1)
finally:
    for key in connections.keys():
        if state.output.status:
            print "Disconnecting from %s..." % denormalize(key),
        connections[key].close()
    if state.output.status:
        print "done."

How should I replace this? S. -- Stephen Nelson-Smith Technical Director Atalanta Systems Ltd www.atalanta-systems.com ___ Tutor maillist - Tutor@python.org To unsubscribe or change subscription options: http://mail.python.org/mailman/listinfo/tutor
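A sketch of the Python 2.4-compatible nesting — yes, a try/except inside a try/finally. Here the real bodies are replaced by placeholder callables (work and cleanup are invented names), keeping only the structure:

```python
import sys

def run_with_cleanup(work, cleanup):
    # Python 2.4 cannot combine except and finally in one try
    # statement, so the except clauses go on an inner try, and the
    # finally on the outer one.
    try:
        try:
            work()
        except SystemExit:
            raise
        except KeyboardInterrupt:
            sys.stderr.write('\nStopped.\n')
            sys.exit(1)
    finally:
        cleanup()    # runs whether work() succeeded, raised, or exited
```

Exceptions the inner excepts don't catch still propagate, and the finally clause still runs on the way out, matching the 2.5+ combined form.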
Re: [Tutor] Iterable Understanding
Martin,

> def __iter__(self):
>     while True:
>         for logline in self.logfile:
>             heappush(self.heap, (timestamp(logline), logline))
>             if len(self.heap) >= self.jitter:
>                 break
>         try:
>             yield heappop(self.heap)
>         except IndexError:
>             raise StopIteration

In this __iter__ method, why are we wrapping a for loop in a while True? S. ___ Tutor maillist - Tutor@python.org To unsubscribe or change subscription options: http://mail.python.org/mailman/listinfo/tutor
[Tutor] Monitoring a logfile
Varnish has a dedicated (but not always reliable) logger service. I'd like to monitor the logs - specifically, I want to check that a known entry appears in there every minute (it should be there about 10 times a minute). What's going to be the best way to carry out this kind of check? I had a look at SEC, but it looks horrifically complicated. Could someone point me in the right direction? I think I basically want to be able to check the logfile every minute, and verify that an entry has appeared since the last time I checked - I just can't see the right way to get started. S. ___ Tutor maillist - Tutor@python.org To unsubscribe or change subscription options: http://mail.python.org/mailman/listinfo/tutor
[Tutor] Trying to send a URL via XMPP
Hi,

I'm trying to send a message to a user via XMPP - but I want them to receive a clickable word. I'm using Python 2.4 on RHEL 5.4 and python-xmpp-0.4.1-6 from EPEL. I've tried variations on:

>>> jid = xmpp.protocol.JID('motherin...@jabber.sekrit.org.uk')
>>> cl = xmpp.Client(jid.getDomain())
>>> cl.connect()
>>> cl.auth(jid.getNode(), 'notgoodenough')
>>> cl.send(xmpp.protocol.Message("snelsonsm...@jabber.sekrit.org.uk", "<html xmlns='http://jabber.org/protocol/xhtml-im'><body xmlns='http://www.w3.org/1999/xhtml'><a href='http://rt.sekrit.org.uk/rt3/Ticket/Display.html?id=#77'>Ticket #77 updated.</a></body></html>"))

(The markup tags were eaten by the list archive; the namespaces shown identify it as an XHTML-IM payload.) But every time I just receive the raw html. Any idea what I am doing wrong? S. -- Stephen Nelson-Smith Technical Director Atalanta Systems Ltd www.atalanta-systems.com ___ Tutor maillist - Tutor@python.org To unsubscribe or change subscription options: http://mail.python.org/mailman/listinfo/tutor
[Tutor] Modules and Test Suites
I do quite a lot of programming in Ruby. When I do so, my code tends to have the following layout:

/path/to/src/my_project

Inside my_project:

lib/
test/
my_project.rb

my_project.rb uses classes and helper methods in lib. Inside test, I have a test suite that also uses classes and helper methods in ../lib. This seems like a sensible way to keep tests and other code separate. In Python I don't know how to do this, so I just have all my tests in the same place as the rest of the code.

a) Is my above way a sensible and pythonic approach?
b) If so - how can I do it in Python?
c) If not, is there a better way than having all the tests in the same place as the rest of the code?

S. -- Stephen Nelson-Smith Technical Director Atalanta Systems Ltd www.atalanta-systems.com ___ Tutor maillist - Tutor@python.org To unsubscribe or change subscription options: http://mail.python.org/mailman/listinfo/tutor
[Tutor] [OT] Urgent Help Needed
Hello friends,

I urgently need to get hold of someone who can help me with the closing stages of a database project - porting data from an old system to a completely rewritten schema. My lead developer has suffered a bereavement, and I need a SQL expert, or a programmer who could accomplish the porting. I've budgeted a week to get the task done, so I need someone who could join my team at this very short notice on a week's contract. If you know anyone, or feel you fit the bill, let me know off list. I'm based in North London. Thanks, and sorry for taking advantage of the list - hope you all understand. S. ___ Tutor maillist - Tutor@python.org http://mail.python.org/mailman/listinfo/tutor
Re: [Tutor] [OT] Urgent Help Needed
On 6/6/07, Tim Golden <[EMAIL PROTECTED]> wrote: > You might want to mention the database (or databases) in > question. Given the short timeframes, people'd feel more > confident if it was the system they're familiar with. Sorry yes. We have an old (primitive) accounts system, which is basically one big table, effectively a log of purchases. This is in MySQL 4. We have a new model, which abstracts out into half a dozen tables representing different entities. This is going to be in MySQL 5. What we're trying to do is extract identities from the transaction table, accounting for things like name changes, company changes. We've been doing it with SQL statements, and I have some code snippets I can show. S. ___ Tutor maillist - Tutor@python.org http://mail.python.org/mailman/listinfo/tutor
[Tutor] [Slightly OT] Inheritance, Polymorphism and Encapsulation
Hello friends, Over lunch today some colleagues discussed a question they are using as a conversation starter in some preliminary chats in our developer hiring process. The question was: "Place the following three in order: Inheritance, Polymorphism, Encapsulation." They specifically did not define in *what* order, leaving that for the answerer to decide. I responded thus: Encapsulation comes with OO - you get it for free. Polymorphism is a hugely important enabler, but this in itself is enabled by Inheritance, so I put them in this order: Inheritance, Polymorphism, Encapsulation. My colleagues felt that of the three options this was the least satisfactory, and showed a lack of understanding of OO design. One even suggested that one could have polymorphism without inheritance. I'm curious as to your opinions - answer to the question, responses to my answer, and to the feedback from my colleagues. Thanks! S. ___ Tutor maillist - Tutor@python.org http://mail.python.org/mailman/listinfo/tutor
Re: [Tutor] [Slightly OT] Inheritance, Polymorphism and Encapsulation
Michael Langford wrote:

> Inheritance: Syntactic sugar that's not really needed to make a well organized system. Often overused, especially by programmers in big companies, beginning students of programmers, green engineers, and professors. In practice hides a lot of data, often making behavior surprising, therefore harder to maintain. Can be used in limited situations to great advantage, but like cologne on car salesmen, is used in greater amounts than it should be. One should always ask, can I make a simpler system with composition.

Pretty much exactly what my colleague said. Having thought about it I understand this.

> Polymorphism: The process of creating many classes with a single interface which are then all used by an object that doesn't know or need to know the type. Many people think you only get this by using inheritance and therefore use inheritance many places a simpler, less opaque, more lightweight solution will work. Most dynamically typed languages (most notably, python, ruby and smalltalk) don't even require you specify the interface explicitly to get polymorphic behavior. C++ templates can do non-explicit interface polymorphism, however in a more complicated, blindingly fast to run, blindingly slow to compile way.

Also what my colleague said! This is the bit I had missed. Perhaps I need to rethink / reload my understanding of polymorphism.

> Encapsulation: The process of taking what shouldn't matter to the external world, and locking it behind an interface. This principle works best when put into small, specialized libraries and designed for general use, as this is the only encapsulated form that is shown to last over time. Supposedly something OO based design allows, but in reality, the coupling among classes varies in differing amounts. The module/public visibility of Java is a good compromise with classes that hides some data but share some with certain other classes.
> C++ has large issues for historical reasons on this front, as the implementation section of a class is largely revealed through the class definition.

Interesting. And again revealing of a weakness in my understanding. I do think this is a good question for getting a sense of where a person's understanding is. I wonder how much this understanding is a prerequisite for being a good developer... not too much I hope!

> --Michael

S. ___ Tutor maillist - Tutor@python.org http://mail.python.org/mailman/listinfo/tutor
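On the 'polymorphism without inheritance' point above, a minimal Python illustration with invented classes — the polymorphic call works through duck typing, with no shared base class:

```python
class Duck(object):
    def speak(self):
        return 'quack'

class Robot(object):        # deliberately no common base class with Duck
    def speak(self):
        return 'beep'

def chorus(speakers):
    # Polymorphic dispatch: any object with a .speak() method works;
    # no inheritance relationship or declared interface is required.
    return [s.speak() for s in speakers]
```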
Re: [Tutor] [Slightly OT] Inheritance, Polymorphism and Encapsulation
On 9/19/07, Michael Langford <[EMAIL PROTECTED]> wrote:

> I do think this is a good question for getting a sense of where a person's understanding is. I wonder how much this understanding is a pre-requistite for being a good developer...
>
> A good developer is a very loaded term. :o)
>
> There are a lot of good programmers who are bad developers. A lot of being a good developer is getting things done that work well enough to supply whomever is downstream from you at work.

Agreed - we discussed this at work yesterday when designing the interview tasks for our new recruits. On this point, I feel I score highly.

> It also has to do with several things such as source control tools, ticket tracking tools, testing, debugging, and deployment as well.

Right - which is the sort of stuff a fresh-faced university student will take time to get up to speed with.

> You will find yourself making more informed choices on things and debugging things a lot better the more of this technical programming stuff you pick up. A good compiler/interpreter book and a lot of experimentation will really open your eyes on a lot of this, as will just doing your job while constantly reading reading reading about what you are learning and doing things like talk with your co-workers at lunch about hiring tests, and participating in online discussion boards. :o)

Yes - this is exactly what I feel, and try to do. I'm lucky that I work with some brilliant people. I wasn't hired as a developer - I'm a sysadmin who can also program. I've been seconded to the development team, and am loving it, but quickly realising how much I don't know!

> In addition you should try doing really hard things that break or come really close to breaking the tools you use. But not too hard, you have to be able to *do* them after all.
> Learning how VM's and interpreters and compilers work, then understanding the concepts behind why they work that way really helps give you a framework to hang an understanding of what's going on in a program. One way to get this knowledge is to start far in the past and work forward. Another is to dive deep into something from today.

Inspiring advice! :)

> I will say if you're already in the workforce and not headed back out to school, the onus will be on you to pick up more of this. Much of the conceptual stuff you won't hear at work, except occasionally overheard in a discussion assuming you already know it. You'll have to ask questions, and you'll have to read up on it afterwards, because often your co-workers won't have the whole picture either.

Right. And I must spend more time here again. Since leaving my last job most of my programming has been in Ruby (and now PHP and SQL). I sort of fear that my head won't be able to hold it all at once, so I've neglected my Python studies! S. ___ Tutor maillist - Tutor@python.org http://mail.python.org/mailman/listinfo/tutor
Re: [Tutor] uncomprehension on RE
On 9/19/07, cedric briner <[EMAIL PROTECTED]> wrote:
> Hello,
>
> I do not understand the behaviour of this:
>
> import re
> re.search('(a)*','aaa').groups()
> ('a',)
>
> I was thinking that the ``*'' will operate on the group delimited by the
> parenthesis. And so, I was expecting this result:
> ('a', 'a', 'a')
>
> Is there something I'am missing ?

What you are trying to do, I think, is get the * to expand to the number of times you expect your group to appear. You cannot do this. You need to specify as many groups as you want to get returned:

re.search('(x)(x)(x)', 'xxx').groups()

would work. In your case you have a single group that matches several times; Python keeps only the last match. Consider this:

>>> re.search('(.)*', 'abc').groups()
('c',)

Can you see how that happens? You could do re.findall('x', 'xxx') - but I don't know what you are actually trying to do. S. ___ Tutor maillist - Tutor@python.org http://mail.python.org/mailman/listinfo/tutor
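A compact demonstration of the behaviour described above — a repeated capturing group keeps only its last match, while findall collects every occurrence:

```python
import re

# The group is repeated by *, but .group(1) holds only the text of
# the *last* repetition: 'c', not ('a', 'b', 'c').
last = re.search('(.)*', 'abc').group(1)

# To collect every occurrence, apply findall to the unrepeated pattern.
every = re.findall('.', 'abc')
```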
Re: [Tutor] Finding even and odd numbers
On 9/19/07, Boykie Mackay <[EMAIL PROTECTED]> wrote:
> Hi guys,
>
> I have come across a bit of code to find if a group of numbers is odd or
> even. The code snippet is shown below:
>
> if not n&1:
>     return false
>
> The above should return false for all even numbers, numbers being
> represented by n. I have tried to wrap my head around the 'not n&1' but
> I'm failing to understand what's going on. Could someone please explain
> the statement.

It's just a bitwise AND: n & 1 is 1 for an odd number and 0 for an even one. S. ___ Tutor maillist - Tutor@python.org http://mail.python.org/mailman/listinfo/tutor
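Spelled out as a function (is_odd is an invented name) — the lowest bit of a binary integer is set exactly when the number is odd:

```python
def is_odd(n):
    # n & 1 masks off everything except the lowest bit, which is
    # 1 for odd integers and 0 for even ones.
    return bool(n & 1)
```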
Re: [Tutor] uncomprehension on RE
On 9/20/07, cedric briner <[EMAIL PROTECTED]> wrote:
> To let you know, I'm writing a script to generate bind9 configuration
> from a nis hosts table. So I was trying in a one re to catch from this:
>
> [ ...] [# comment]
> e.g:
> 10.12.23.45 hostname1 alias1 alias2 alias3 # there is a nice comment
> 37.64.86.23 hostname2
> 35.25.89.34 hostname3 alias5

I'm not sure I follow. Could you state explicitly what the input is (I think the above), and what output you want? S. ___ Tutor maillist - Tutor@python.org http://mail.python.org/mailman/listinfo/tutor
[Tutor] Permission Report
Hello all, I have a tree of code on a machine which has been tweaked and fiddled with over several months, and which passes tests. I have the same codebase in a new virtual machine. A shell hack[0] shows me that the permissions are very different between the two. I could use rsync or something to synchronise them, but I would like to produce a report of the sort: Change file: foo from 755 to 775 So I can try to work out why this is necessary. I'm not sure how best to proceed - I guess walk through the filesystem gathering info using stat, then do the same on the new system, and compare. Or are there some clever modules I could use? S. ___ Tutor maillist - Tutor@python.org http://mail.python.org/mailman/listinfo/tutor
[Tutor] Fwd: Permission Report
Sorry...

-- Forwarded message --
From: Stephen Nelson-Smith <[EMAIL PROTECTED]>
Date: Oct 8, 2007 6:54 PM
Subject: Re: [Tutor] Permission Report
To: Alan Gauld <[EMAIL PROTECTED]>

On 10/8/07, Alan Gauld <[EMAIL PROTECTED]> wrote:
> Yes, os.walk and os.stat should do what you want.

Ok - I have:

import os, stat

permissions = {}
for dir, base, files in os.walk('/home/peter/third/accounts/'):
    for f in files:
        file = os.path.join(dir, f)
        perm = os.stat(file)[stat.ST_MODE]
        permissions[file] = oct(stat.S_IMODE(perm))

This is fine - it stores the info I need. But if I want to run the same procedure on a remote host, and store the results in a dictionary so they can be compared, what would I do? S. ___ Tutor maillist - Tutor@python.org http://mail.python.org/mailman/listinfo/tutor
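Once a permissions dictionary exists for each host (the remote one gathered by running the same walk there, or over a network filesystem), producing the report asked for in the first message is plain dictionary comparison. A sketch — permission_changes is an invented helper, and the mode strings are assumed to be stored as in the snippet above:

```python
def permission_changes(old, new):
    # Given two {path: mode-string} dicts, report entries present in
    # both whose modes differ, in the requested report format.
    lines = []
    for path in sorted(set(old) & set(new)):
        if old[path] != new[path]:
            lines.append('Change file: %s from %s to %s'
                         % (path, old[path], new[path]))
    return lines
```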
Re: [Tutor] Fwd: Permission Report
On 10/10/07, Kent Johnson <[EMAIL PROTECTED]> wrote: > Stephen Nelson-Smith wrote: > > But if I want to run the same procedure on a remote host, and store > > the results in a dictionary so they can be compared, what would I do? > > What kind of access do you have to the remote host? If you have > filesystem access you can use the same program running locally. In the end I used sshfs, which worked fine. S. ___ Tutor maillist - Tutor@python.org http://mail.python.org/mailman/listinfo/tutor
[Tutor] VIX API
Hello, Does anyone know if there are python bindings for the VMware VIX API? I googled for a bit, but didn't find them... How tricky would it be to wrap the C API? S. ___ Tutor maillist - Tutor@python.org http://mail.python.org/mailman/listinfo/tutor
[Tutor] NNTP Client
Hello all, I wish to pull all the articles for one particular newsgroup to a local machine, on a regular basis. I don't wish to read them - I will be parsing the contents programatically. In your view is it going to be best to use an 'off-the-shelf' news reader, or ought it to be straightforward to write a client that does this task? If so, any pointers would be most welcome. S. ___ Tutor maillist - Tutor@python.org http://mail.python.org/mailman/listinfo/tutor
Re: [Tutor] NNTP Client
On Nov 13, 2007 2:13 PM, Stephen Nelson-Smith <[EMAIL PROTECTED]> wrote:
> ought it to be straightforward to write a client that does this task?

Well:

>>> server = NNTP('news.gmane.org')
>>> resp, count, first, last, name = server.group("gmane.linux.redhat.enterprise.announce")
Traceback (most recent call last):
  File "", line 1, in
  File "/usr/lib/python2.5/nntplib.py", line 346, in group
    resp = self.shortcmd('GROUP ' + name)
  File "/usr/lib/python2.5/nntplib.py", line 260, in shortcmd
    return self.getresp()
  File "/usr/lib/python2.5/nntplib.py", line 215, in getresp
    resp = self.getline()
  File "/usr/lib/python2.5/nntplib.py", line 207, in getline
    if not line: raise EOFError
EOFError

What's wrong with that then? S. ___ Tutor maillist - Tutor@python.org http://mail.python.org/mailman/listinfo/tutor
Re: [Tutor] NNTP Client
On Nov 13, 2007 4:01 PM, Stephen Nelson-Smith <[EMAIL PROTECTED]> wrote:
> >>> server = NNTP('news.gmane.org')
>
> What's wrong with that then?

server, apparently:

>>> s.group("gmane.discuss")
('211 11102 10 11329 gmane.discuss', '11102', '10', '11329', 'gmane.discuss')
>>> server.group("gmane.discuss")
Traceback (most recent call last):
  File "", line 1, in
  File "/usr/lib/python2.5/nntplib.py", line 346, in group
    resp = self.shortcmd('GROUP ' + name)
  File "/usr/lib/python2.5/nntplib.py", line 259, in shortcmd
    self.putcmd(line)
  File "/usr/lib/python2.5/nntplib.py", line 199, in putcmd
    self.putline(line)
  File "/usr/lib/python2.5/nntplib.py", line 194, in putline
    self.sock.sendall(line)
  File "", line 1, in sendall
socket.error: (32, 'Broken pipe')

Stupid of me. S. ___ Tutor maillist - Tutor@python.org http://mail.python.org/mailman/listinfo/tutor
[Tutor] [OT] Vacancy - python systems programmer
All,

I may shortly be in the position of being able to hire a python systems programmer for a short contract (1-2 days initially to spike an ongoing project). The ideal person will have the following:

* Solid experience of Python for systems programming and database interaction
* Familiarity with MySQL 5 - inserting and querying medium-sized databases with Python
* Systems experience on RHEL or equivalent clone
* Sysadmin experience on Unix / Linux with patching and package management systems
* Experience in a high-availability environment, eg managed hosting
* Experience of working in an agile environment (test first, pairing)

This is, of course, a shopping list, but is intended to give a sense of the sort of background I'm after. If this sounds like the sort of thing you'd be interested in, please contact me off list. Thanks, S. ___ Tutor maillist - Tutor@python.org http://mail.python.org/mailman/listinfo/tutor
[Tutor] Parsing Word Docs
Hello all, I have a directory containing a load of word documents, say 100 or so. which is updated every hour. I want a cgi script that effectively does a grep on the word docs, and returns each doc that matches the search term. I've had a look at doing this by looking at each binary file and reimplementing strings(1) to capture useful info. I've also read that one can treat a word doc as a COM object. Am I right in thinking that I can't do this on python under unix? What other ways are there? Or is the binary parsing the way to go? S. ___ Tutor maillist - Tutor@python.org http://mail.python.org/mailman/listinfo/tutor
[Tutor] How do to an svn checkout using the svn module?
Hello, I want to do a simple svn checkout using the python svn module. I haven't been able to find any/much/basic documentation that discusses such client operations. This should be very easy, I imagine! What do I need to do? S. ___ Tutor maillist - Tutor@python.org http://mail.python.org/mailman/listinfo/tutor
Re: [Tutor] How do to an svn checkout using the svn module?
On 3/9/07, Kent Johnson <[EMAIL PROTECTED]> wrote:
> Did you find the pysvn Programmer's Guide that comes with pysvn? It has
> this example:

Ah.. no I haven't got pysvn installed... but will take a look. What I do have is:

>>> import sys
>>> import svn.core
>>> import svn.client
>>> import sys
>>> pool = svn.core.svn_pool_create(None)
>>> svn.core.svn_config_ensure( None, pool )
>>> ctx = svn.client.svn_client_ctx_t()
>>> config = svn.core.svn_config_get_config( None, pool )
>>> ctx.config = config
>>> rev = svn.core.svn_opt_revision_t()
>>> rev.kind = svn.core.svn_opt_revision_head
>>> rev.number = 0
>>> ctx.auth_baton = svn.core.svn_auth_open( [], pool )
>>> url = "https://svn.uk.delarue.com/repos/prdrep/prddoc/"
>>> path = "/tmp"
>>> svn.client.svn_client_checkout(url, path, rev, 0, ctx, pool)
Traceback (most recent call last):
  File "", line 1, in ?
libsvn._core.SubversionException: ("PROPFIND request failed on '/repos/prdrep/prddoc'", 175002)

Not sure what I am doing wrong... the url is correct.

> Kent

S.
___ Tutor maillist - Tutor@python.org http://mail.python.org/mailman/listinfo/tutor
Re: [Tutor] Parsing Word Docs
On 3/8/07, Tim Golden <[EMAIL PROTECTED]> wrote:
> Simplest thing's probably antiword (http://www.winfield.demon.nl/)
> and then whatever text-scanning approach you want.

I've gone for:

#!/usr/bin/env python
import glob, os

url = "/home/cherp/prddoc"
searchstring = "dxpolbl.p"
worddocs = []

for (dirpath, dirnames, filenames) in os.walk(url):
    for f in filenames:
        if f.endswith(".doc"):
            worddocs.append(os.path.join(dirpath, f))

for d in worddocs:
    for i in glob.glob(d):
        if searchstring in open(i, "r").read():
            print "Found it in: ", i.split('/')[-1]

Now... I want to convert this to a cgi-script... how do I grab $QUERY_STRING in python?

S.
___ Tutor maillist - Tutor@python.org http://mail.python.org/mailman/listinfo/tutor
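For reference, the web server hands a CGI script its query string in the QUERY_STRING environment variable; in the Python 2 era the usual answer was `cgi.FieldStorage`. A minimal sketch in modern Python 3 syntax, with a simulated query (`q=dxpolbl.p` is illustrative only):

```python
import os
from urllib.parse import parse_qs

# The web server passes the raw query string via the environment;
# simulate it here so the sketch is self-contained.
os.environ["QUERY_STRING"] = "q=dxpolbl.p"

# parse_qs turns "q=dxpolbl.p" into {"q": ["dxpolbl.p"]}.
params = parse_qs(os.environ.get("QUERY_STRING", ""))
searchstring = params.get("q", [""])[0]
print(searchstring)  # -> dxpolbl.p
```

In a real CGI script you would drop the simulated assignment and read whatever the server provides.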
[Tutor] [OT] ETL Tools
Hello all,

Does anyone know of any ETL (Extraction, Transformation, Loading) tools in Python (or at any rate, !Java)? I have lots (and lots) of raw data in the form of log files which I need to process and aggregate and then do a whole bunch of group-by operations, before dumping them into a text/relational database for a search engine to access.

At present we have a bunch of scripts in perl and ruby, and a Berkeley and mysql database for the grouping operations. This is proving to be a little slow with the amount of data we now have, so I am looking into alternatives. Does anyone have any experience of this sort of thing? Or know someone who does, that I could talk to?

Best regards,

S.
___ Tutor maillist - Tutor@python.org http://mail.python.org/mailman/listinfo/tutor
[Tutor] HTML Parsing
Hi,

I want to write a little script that parses an apache mod_status page. I want it to return simply the number of page requests a second and the number of connections. It seems this is very complicated... I can do it in a shell one-liner:

curl 10.1.2.201/server-status 2>&1 | grep -i request | grep dt | { IFS='> ' read _ rps _; IFS='> ' read _ currRequests _ _ _ _ idleWorkers _; echo $rps $currRequests $idleWorkers ; }

But that's horrid. So is:

$ eval `printf '3 requests currently being processed, 17 idle workers\n 2.82 requests/sec - 28.1 kB/second - 10.0 kB/request\n' | sed -nr '// { N; s@([0-9]*)[^,]*,([0-9]*).*([0-9.]*)[EMAIL PROTECTED]((\1+\2));[EMAIL PROTECTED]; }'`
$ echo "workers: $workers reqs/secs $requests"
workers: 20 reqs/sec 2.82

The page looks like this:

Apache Status
Apache Server Status for 10.1.2.201
Server Version: Apache/2.0.46 (Red Hat)
Server Built: Aug 1 2006 09:25:45
Current Time: Monday, 21-Apr-2008 14:29:44 BST
Restart Time: Monday, 21-Apr-2008 13:32:46 BST
Parent Server Generation: 0
Server uptime: 56 minutes 58 seconds
Total accesses: 10661 - Total Traffic: 101.5 MB
CPU Usage: u6.03 s2.15 cu0 cs0 - .239% CPU load
3.12 requests/sec - 30.4 kB/second - 9.7 kB/request
9 requests currently being processed, 11 idle workers

How can/should I do this?

S.
___ Tutor maillist - Tutor@python.org http://mail.python.org/mailman/listinfo/tutor
Re: [Tutor] HTML Parsing
On 4/21/08, Andreas Kostyrka <[EMAIL PROTECTED]> wrote:
> As usual there are a number of ways.
>
> But I basically see two steps here:
>
> 1.) capture all dt elements. If you want to stick with the standard
> library, htmllib would be the module. Else you can use e.g.
> BeautifulSoup or something comparable.

I want to stick with the standard library. How do you capture elements?

S.
___ Tutor maillist - Tutor@python.org http://mail.python.org/mailman/listinfo/tutor
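Capturing elements with nothing but the standard library means subclassing the parser and flipping a flag when the dt tag opens and closes, collecting text while the flag is set. The era's modules were htmllib/sgmllib; the same pattern with today's stdlib `html.parser` looks like this (a sketch in Python 3 syntax, sample HTML made up to resemble mod_status output):

```python
from html.parser import HTMLParser

class DtCollector(HTMLParser):
    """Collect the text inside every <dt>...</dt> element."""
    def __init__(self):
        super().__init__()
        self.inside_dt = False
        self.items = []

    def handle_starttag(self, tag, attrs):
        if tag == "dt":
            self.inside_dt = True

    def handle_endtag(self, tag):
        if tag == "dt":
            self.inside_dt = False

    def handle_data(self, data):
        # Only keep text seen while we are inside a dt element.
        if self.inside_dt:
            self.items.append(data)

p = DtCollector()
p.feed("<dl><dt>3.12 requests/sec</dt><dt>9 requests, 11 idle workers</dt></dl>")
print(p.items)  # ['3.12 requests/sec', '9 requests, 11 idle workers']
```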
Re: [Tutor] HTML Parsing
Hi,

> for lineno, line in enumerate(html):

(python 2.2 has no enumerate()) Can we code around this?

> x = line.find("requests/sec")
> if x >= 0:
>    no_requests_sec = line[3:x]
>    break
> for lineno, line in enumerate(html[lineno+1:]):
> x = line.find("requests currently being processed")
> if x >= 0:
>    no_connections = line[3:x]

That all looks ok.

S.
___ Tutor maillist - Tutor@python.org http://mail.python.org/mailman/listinfo/tutor
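Coding around the missing builtin is straightforward: enumerate() arrived in Python 2.3, and on 2.2 a small zip-based substitute behaves the same for sequences. A sketch (shown in Python 3 syntax; the try/except keeps the real builtin when it exists):

```python
# enumerate() appeared in Python 2.3; on 2.2 define a stand-in.
try:
    enumerate
except NameError:
    def enumerate(seq):
        # Pair each item with its index, like the builtin does.
        return zip(range(len(seq)), seq)

lines = ["a", "b", "c"]
pairs = list(enumerate(lines))
print(pairs)  # [(0, 'a'), (1, 'b'), (2, 'c')]
```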
Re: [Tutor] HTML Parsing
Hello,

> For data this predictable, simple regex matching will probably work fine.

I thought that too... Anyway - here's what I've come up with:

#!/usr/bin/python
import urllib, sgmllib, re

mod_status = urllib.urlopen("http://10.1.2.201/server-status")
status_info = mod_status.read()
mod_status.close()

class StatusParser(sgmllib.SGMLParser):
    def parse(self, string):
        self.feed(string)
        self.close()

    def __init__(self, verbose=0):
        sgmllib.SGMLParser.__init__(self, verbose)
        self.information = []
        self.inside_dt_element = False

    def start_dt(self, attributes):
        self.inside_dt_element = True

    def end_dt(self):
        self.inside_dt_element = False

    def handle_data(self, data):
        if self.inside_dt_element:
            self.information.append(data)

    def get_data(self):
        return self.information

status_parser = StatusParser()
status_parser.parse(status_info)

rps_pattern = re.compile( '(\d+\.\d+) requests/sec' )
connections_pattern = re.compile( '(\d+) requests\D*(\d+) idle.*' )

for line in status_parser.get_data():
    rps_match = rps_pattern.search( line )
    connections_match = connections_pattern.search( line )
    if rps_match:
        rps = float(rps_match.group(1))
    elif connections_match:
        connections = int(connections_match.group(1)) + int(connections_match.group(2))

rps_threshold = 10
connections_threshold = 100

if rps > rps_threshold:
    print "CRITICAL: %s Requests per second" % rps
else:
    print "OK: %s Requests per second" % rps

if connections > connections_threshold:
    print "CRITICAL: %s Simultaneous Connections" % connections
else:
    print "OK: %s Simultaneous Connections" % connections

Comments and criticism please.

S.
___ Tutor maillist - Tutor@python.org http://mail.python.org/mailman/listinfo/tutor
[Tutor] Sending Mail
smtpserver = 'relay.clara.net'
RECIPIENTS = ['[EMAIL PROTECTED]']
SENDER = '[EMAIL PROTECTED]'

message = """Subject: HTTPD ALERT: %s requests %s connections

Please investigate ASAP.""" % (rps, connections)

session = smtplib.SMTP(smtpserver)
smtpresult = session.sendmail(SENDER, RECIPIENTS, message)

if smtpresult:
    errstr = ""
    for recip in smtpresult.keys():
        errstr = """Could not deliver mail to: %s
Server said: %s %s
%s""" % (recip, smtpresult[recip][0], smtpresult[recip][1], errstr)
    raise smtplib.SMTPException, errstr

This sends emails. But gmail says it came from "unknown sender". I see an envelope-from in the headers. What am I missing?

S.
___ Tutor maillist - Tutor@python.org http://mail.python.org/mailman/listinfo/tutor
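The likely cause: `smtplib.sendmail()` only sets the SMTP envelope sender, and Gmail displays the From: header of the message body, which this message never includes. The fix is to put From:/To: header lines into the message text itself. A sketch (addresses and values below are hypothetical, Python 3 syntax):

```python
# "unknown sender" happens when the message text has no From: header.
# sendmail() sets only the envelope sender; header fields must be
# written into the message body ourselves.
sender = "alerts@example.com"       # hypothetical address
recipients = ["ops@example.com"]    # hypothetical address
rps, connections = 12.5, 104        # sample values

message = """From: %s
To: %s
Subject: HTTPD ALERT: %s requests %s connections

Please investigate ASAP.""" % (sender, ", ".join(recipients), rps, connections)

print(message.splitlines()[0])  # From: alerts@example.com
```

This message string would then be passed to `session.sendmail(sender, recipients, message)` as before; the modern `email.message` module builds the headers for you, too.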
[Tutor] Web Stats
Hi, I've been asked to produce a report showing all possible resources in a website, together with statistics on how frequently they've been visited. Nothing fancy - just number and perhaps date of last visit. This has to include resources which have not been visited, as the point is to clean out old stuff. I have several years of apache weblogs. Is there something out there that already does this? If not, or if it's interesting and not beyond the ken of a reasonable programmer, could anyone provide some pointers on where to start? Thanks, S. ___ Tutor maillist - Tutor@python.org http://mail.python.org/mailman/listinfo/tutor
Re: [Tutor] Web Stats
Hello, >> This has to include resources which have not been visited, as the >> point is to clean out old stuff. > > Take a look at AWStats (not Python). Doesn't this 'only' parse weblogs? I'd still need some kind of spider to tell me all the possible resources available wouldn't I? It's a big website, with 1000s of pages. > For do it yourself, loghetti > might be a good starting point > http://code.google.com/p/loghetti/ Looks interesting, but again don't I fall foul of the "how can I know about what, by definition, doesn't feature in a log?" problem? S. > Kent > ___ Tutor maillist - Tutor@python.org http://mail.python.org/mailman/listinfo/tutor
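The "what never appears in the logs" problem dissolves once you have two sets: all resources that exist (from walking the docroot or spidering) and all paths seen in the logs. Resources never visited are the set difference. A sketch with made-up sample data (Python 3 syntax):

```python
# All resources known to exist, e.g. from os.walk over the docroot
# or a spider of the site (sample data, illustrative only).
all_resources = {"/index.html", "/about.html",
                 "/old/page1.html", "/old/page2.html"}

# Paths extracted from apache access logs: the request field is the
# quoted section, and the path is its second token.
log_lines = [
    '10.0.0.1 - - [01/Jan/2009:10:00:00 +0000] "GET /index.html HTTP/1.1" 200 512',
    '10.0.0.2 - - [01/Jan/2009:10:01:00 +0000] "GET /about.html HTTP/1.1" 200 1024',
]
visited = set()
for line in log_lines:
    path = line.split('"')[1].split()[1]
    visited.add(path)

never_visited = all_resources - visited
print(sorted(never_visited))  # ['/old/page1.html', '/old/page2.html']
```

Counting visits and last-visit dates is then a dict keyed on path instead of a plain set.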
[Tutor] Init Scripts
Hello, I've been wrestling with some badly written init scripts, and picking my way through the redhat init script system. I'm getting to the point of thinking I could do this sort of thing in Python just as effectively. Are there any pointers available? Eg libraries that give process information, so I can obtain status information? S. ___ Tutor maillist - Tutor@python.org http://mail.python.org/mailman/listinfo/tutor
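One stdlib-only way to get the basic status information an init script needs is the POSIX "signal 0" trick: `os.kill(pid, 0)` performs permission and existence checks without sending a signal. (Third-party psutil gives much richer process information; the sketch below, in Python 3 syntax, sticks to the standard library.)

```python
import os

def pid_running(pid):
    """Return True if a process with this pid exists.
    Signal 0 does error checking only; no signal is delivered."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # exists, but owned by another user
    return True

# Our own process certainly exists, so this prints True.
print(pid_running(os.getpid()))
```

An init script's `status` action would read the daemon's pidfile and call `pid_running()` on the stored pid.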
[Tutor] Short Contract
Hello, I'm looking for someone to help on a short contract to build a centralised blogging system. I want a planet-style aggregation of blogs, but with the ability to see and make comments on each individual blog, from the central planet page. Ideally, it would also have a little 'icon' mug-shot of the person who writes each blog next to their entry, and a dynamically generated 'go to this guy's blog' button under each entry. I'd be happy to take an existing open-source tool (eg venus) and modify it for purpose - I think that's a better idea than writing a new blog engine. The blogs don't exist yet, so we can ensure they all live on the same blog server. If this is something you have bandwidth for, or an interest in, please contact me off list and we can discuss it. Thanks, S. ___ Tutor maillist - Tutor@python.org http://mail.python.org/mailman/listinfo/tutor
[Tutor] Capturing and parsing over telnet
I want to write a program that connects to a TCP port using telnet, and issues commands, parsing the output the command provides, and then issuing another command. This might look like this:

$ telnet water.fieldphone.net 7456
Welcome to water, enter your username
>_ sheep
Enter your password
>_ sheep123
>_ examine here
[some info to parse]
[.]
[.]
>_ some command based on parsing the previous screen
[more info to parse]
[.]
[.]

I am confident I can parse the info, if I can read it in. I am not sure how to handle the telnet I/O. How should I proceed?

S.
___ Tutor maillist - Tutor@python.org http://mail.python.org/mailman/listinfo/tutor
Re: [Tutor] Capturing and parsing over telnet
Hi,

> How about pexpect;
> http://www.noah.org/wiki/Pexpect

Ah yes - I've used that before to good effect. ATM I'm playing with telnetlib. Is there a way to read everything on the screen, even if I don't know what it will be? eg:

c = telnetlib.Telnet("test.lan")
c.read_until("name: ")
c.write("test\n")
c.read_until("word: ")
c.write("test\n")

And now I don't know what I will see - nor is there a prompt which indicates that the output is complete. I effectively want something like c.read_everything(). I tried c.read_all() but I don't get anything from that. Suggestions?

S.
___ Tutor maillist - Tutor@python.org http://mail.python.org/mailman/listinfo/tutor
Re: [Tutor] Capturing and parsing over telnet
Hi, > I effectively want something like c.read_everything() Looks like read_very_eager() does what I want. S. ___ Tutor maillist - Tutor@python.org http://mail.python.org/mailman/listinfo/tutor
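Since `read_very_eager()` returns whatever has arrived without blocking, a common pattern is to poll it until the output goes quiet for a moment, then treat the accumulated bytes as "everything on the screen". A sketch of that loop, tested with a stand-in reader because there is no telnet server here (the timings are illustrative):

```python
import time

def read_everything(read_chunk, quiet_time=1.0, poll=0.2):
    """Poll a non-blocking reader (e.g. Telnet.read_very_eager) until
    no new data has arrived for `quiet_time` seconds, then return
    everything collected so far."""
    buf = []
    last_data = time.time()
    while time.time() - last_data < quiet_time:
        chunk = read_chunk()
        if chunk:
            buf.append(chunk)
            last_data = time.time()  # output still flowing; keep waiting
        time.sleep(poll)
    return b"".join(buf)

# Demonstrate with a fake reader standing in for c.read_very_eager:
chunks = [b"some info ", b"to parse"]
fake = lambda: chunks.pop(0) if chunks else b""
data = read_everything(fake, quiet_time=0.3, poll=0.05)
print(data)  # b'some info to parse'
```

With a real connection you would pass `c.read_very_eager` as `read_chunk`; the quiet-time threshold is a judgment call based on how bursty the server's output is.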
[Tutor] League Secretary Application
Hello,

I'm the league secretary for a table tennis league. I have to generate a weekly results report, league table, and player averages, from results cards which arrive by post or email. The data is of the form:

Division: 1
Week: 7
Home: Some Team
Away: Different Team
Player A: Fred Bloggs
Player B: Nora Batty
Player X: Jim Smith
Player Y: Edna Jones
A vs X: 3-0
B vs Y: 3-2
A vs Y: 3-0
B vs X: 3-2
Doubles: 3-1

From this I can calculate the points allocated to teams and produce a table. I've not done any real python for about 6 years, but figured it'd be fun to design and write something that would take away the time and error issues associated with generating this manually. Sure I could build a spreadsheet, but this seems more fun.

I'm currently thinking through possible approaches, from parsing results written in, eg YAML, to a menu-driven system, to a web app. I'm generally in favour of the simplest thing that could possibly work, but I am conscious that there's a lot of room for data entry error and thus validation, if I just parse a file, or make a CLI. OTOH I have never ever written a web app, with forms etc. There's no time constraint here - this is merely for fun, and to make my life easier.

Any thoughts?

S.
___ Tutor maillist - Tutor@python.org To unsubscribe or change subscription options: https://mail.python.org/mailman/listinfo/tutor
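Since the card is already "key: value" per line, the simplest thing that could possibly work is a small parser that splits on the colon and recognises score values by shape. A sketch (the `parse_card` helper is hypothetical, Python 3 syntax):

```python
import re

CARD = """Division: 1
Week: 7
Home: Some Team
Away: Different Team
A vs X: 3-0
B vs Y: 3-2
Doubles: 3-1"""

def parse_card(text):
    """Turn a results card into a dict; N-M scores become int tuples."""
    card = {}
    for line in text.splitlines():
        key, _, value = line.partition(":")
        value = value.strip()
        m = re.match(r"(\d+)-(\d+)$", value)
        card[key.strip()] = (int(m.group(1)), int(m.group(2))) if m else value
    return card

card = parse_card(CARD)
print(card["Home"], card["A vs X"])  # Some Team (3, 0)
```

Validation (known team names, scores summing sensibly) can then be layered on top of the parsed dict, whichever front end — file, CLI, or web form — feeds it.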
Re: [Tutor] League Secretary Application
Hullo, On Sat, May 30, 2015 at 3:49 PM, Laura Creighton wrote: > > 2. How do you receive your data now? Do you want to change this, > perhaps extend the capabilities -- i.e. let people send an sms > with results to your cell phone? Or limit the capabilities ("Stop > phoning me with this stuff! Use the webpage!) How you get your > data is very relevant to the design. > I get a physical card, or a photograph of the same. It'd be possible in the future to get people to use a website or a phone app, but for now, I enter the data from the cards, manually. > 3. After you have performed your calculation and made a table, what > do you do with it? Email it to members? Publish it in a > weekly dead-tree newspaper? Post it to a website? What you > want to do with it once you have it is also very relevant to the > design. > ATM I send an email out, and someone else takes that data and publishes it on a website. S. ___ Tutor maillist - Tutor@python.org To unsubscribe or change subscription options: https://mail.python.org/mailman/listinfo/tutor
[Tutor] Sorting a list of list
As part of my league secretary program (to which thread I shall reply again shortly), I need to sort a list of lists. I've worked out that I can use sorted() and operator.itemgetter to sort by a value at a known position in each list. Is it possible to do this at a secondary level? So if the items are the same, we use the secondary key?

Current function:

>>> def sort_table(table, col=0):
...     return sorted(table, key=operator.itemgetter(col), reverse=True)
...
>>> sort_table(results, 6)
[['spip', 2, 2, 0, 10, 0, 4], ['hpip', 2, 0, 2, 2, 8, 0]]

S.
___ Tutor maillist - Tutor@python.org To unsubscribe or change subscription options: https://mail.python.org/mailman/listinfo/tutor
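Yes: `itemgetter` accepts multiple indices, in which case it returns a tuple, and sorted() compares tuples element by element, so the second index only matters on ties. A sketch with an extra made-up row added to force a tie on column 6:

```python
from operator import itemgetter

results = [
    ["spip", 2, 2, 0, 10, 0, 4],
    ["hpip", 2, 0, 2, 2, 8, 0],
    ["mpip", 2, 1, 1, 6, 4, 4],  # made-up row: ties with spip on column 6
]

# Primary key: column 6; secondary key: column 4, used only on ties.
table = sorted(results, key=itemgetter(6, 4), reverse=True)
print([row[0] for row in table])  # ['spip', 'mpip', 'hpip']
```

spip and mpip tie on column 6 (both 4), so column 4 (10 vs 6) breaks the tie.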
[Tutor] Ideas for Child's Project
Hello,

My son is interested in programming, and has dabbled in Scratch and done a tiny bit of Python at school. He's 11 and is going for an entrance exam for a selective school in a couple of weeks. They've asked him to bring along something to demonstrate an interest, and present it to them.

In talking about it, we hit upon the idea that he might like to embark upon a programming challenge, or learning objective / project, spending say 30 mins a day for the next week or two, so he can show what he's done and talk about what he learned.

Any suggestions for accessible yet challenging and stimulating projects? Any recommendations for books / websites / tutorials that are worth a look?

S.
___ Tutor maillist - Tutor@python.org To unsubscribe or change subscription options: https://mail.python.org/mailman/listinfo/tutor
Re: [Tutor] Ideas for Child's Project
Hi Danny, On Tue, Jan 6, 2015 at 10:07 PM, Danny Yoo wrote: > On Tue, Jan 6, 2015 at 1:46 PM, Stephen Nelson-Smith > wrote: > > You might want to look at Bootstrapworld, a curriculum for > middle-school/high-school math using programming and games: > > http://www.bootstrapworld.org/ > > Students who go through the material learn how math can be used > productively toward writing a video game. Along the way, they learn > the idea of function, of considering inputs and outputs, and how to > test what they've designed. > Sounds ace. I had a look. It seems optimised for Racket, which is also cool so I installed Racket on my son's computer and let him have a play. He immediately got into it, and got the hang of functions and expressions etc. S. ___ Tutor maillist - Tutor@python.org To unsubscribe or change subscription options: https://mail.python.org/mailman/listinfo/tutor
Re: [Tutor] Debugging a sort error.
Hi,

On Sun, Jan 13, 2019 at 8:34 AM wrote:
> description.sort()
> TypeError: unorderable types: float() < str()

So, fairly obviously, we can't test whether a float is less than a string. Any more than we can tell if a grapefruit is faster than a cheetah. So there must be items in description that are strings and floats. With 2000 lines, you're going to struggle to eyeball this, so try something like this:

In [69]: irrational_numbers = [3.14159265, 1.606695, "pi", "Pythagoras Constant"]
In [70]: from collections import Counter
In [71]: dict(Counter([type(e) for e in irrational_numbers]))
Out[71]: {float: 2, str: 2}

If with your data, this shows only strings, I'll eat my hat.

S.
___ Tutor maillist - Tutor@python.org To unsubscribe or change subscription options: https://mail.python.org/mailman/listinfo/tutor
Re: [Tutor] Python installtion
Hi,

On Mon, Jan 7, 2019 at 11:11 AM mousumi sahu wrote:
>
> Dear Sir,
> I am trying to install python 2.7.10 on HPC. Python 2.6 has already been
> installed on root. I do not have root authority. Please suggest me how can I
> do this.

Sorry - I replied to you directly, by accident. Take 2, with reply all:

You need to do a local installation of Python, and set up your system to use that in preference to the one at the system level. Although it's possible to do this with various manual steps, there's a really handy tool you can use which will make your life easier, and allow you to manage multiple versions of Python, which might be useful, if you wanted, say, to be able to run both Python 2 and Python 3. The tool is called `pyenv`, and as long as you have a bash/zsh shell, and your system has a C compiler and associated tools already installed, you can install and use it.

The simplest approach is to clone the tool from git, modify your shell to use it, and then use it to install Python. Here's a sample way to set it up. This won't necessarily match your exact requirements, but you can try it, and please come back if you have any further questions:

1. Clone the git repo into your home directory

git clone https://github.com/pyenv/pyenv.git ~/.pyenv

Pyenv is very simple, conceptually. It's just a set of shell scripts to automate the process of fetching, compiling, and installing versions of Python, and then massaging your shell to make sure the versions you have installed are used in preference to anything else. So now you have the tool, you need to configure your shell to use it. I'm going to assume you're using Bash.

2. Make sure the contents of the pyenv tool is available on your path

echo 'export PYENV_ROOT="$HOME/.pyenv"' >> ~/.bash_profile
echo 'export PATH="$PYENV_ROOT/bin:$PATH"' >> ~/.bash_profile

Note - this might need to be .bashrc, or something else, depending on your os/distro/setup. However, in principle you're just making the pyenv tool (which itself is just a set of shell scripts) available at all times.

3. Set your shell to initialise the pyenv tool every time you start a new shell

echo -e 'if command -v pyenv 1>/dev/null 2>&1; then\n  eval "$(pyenv init -)"\nfi' >> ~/.bash_profile

Again: this might need to be .bashrc

4. Now open a new shell, and check you have pyenv available:

$ pyenv
pyenv 1.2.9-2-g6309aaf2
Usage: pyenv <command> [<args>]

Some useful pyenv commands are:
   commands    List all available pyenv commands
   local       Set or show the local application-specific Python version
   global      Set or show the global Python version
   shell       Set or show the shell-specific Python version
   install     Install a Python version using python-build
   uninstall   Uninstall a specific Python version
   rehash      Rehash pyenv shims (run this after installing executables)
   version     Show the current Python version and its origin
   versions    List all Python versions available to pyenv
   which       Display the full path to an executable
   whence      List all Python versions that contain the given executable

See `pyenv help <command>' for information on a specific command. For full documentation, see: https://github.com/pyenv/pyenv#readme

If you don't have pyenv working at this stage, come back and I'll help you troubleshoot. Assuming you do, continue:

5. Now you can install a version of Python, locally:

pyenv install --list

This shows you the various options of Pythons you can install. You want the latest 2.7:

pyenv install 2.7.15

This will fetch the source code of Python, and compile and install it for you, and place it in your local shell environment, where you can use it. If this step doesn't work, it's probably because your system doesn't have a compiler and associated tools. I can help you troubleshoot that, but ultimately you'll need support from your system administrator at this point.

Assuming it has installed Python, now you just need to tell your shell that you want to use it:

pyenv local 2.7.15

This will make your shell find your 2.7.15 installation ahead of the system python:

$ python --version
Python 2.7.15

Now you can run and use your Python. Any further questions, sing out.

S.
___ Tutor maillist - Tutor@python.org To unsubscribe or change subscription options: https://mail.python.org/mailman/listinfo/tutor