Stephen Nelson-Smith wrote:
> Nope - but I can look it up. The problem I have is that the source
> logs are rotated at 0400 hrs, so I need two days of logs in order to
> extract 24 hrs from 0000 to 2359 (which is the requirement). At
> present, I preprocess using sort, which works fine as long as the
> month doesn't change.

Still not sure without more detail, but IIRC from your previous posts,
your log entry timestamps are formatted with the abbreviated month name
instead of the month number. Without the -M flag, the sort command will ...
well, erm ... sort the month names alphabetically. With the -M
(--month-sort) flag, they are sorted chronologically.

Just a guess, of course. I suppose this is drifting a bit off topic, in
any case, but it may still serve to demonstrate the importance of
converting your string-based timestamps into something that can be sorted
accurately by your Python code -- the most obvious being time or datetime
objects, IMHO.

<snip>
>> class LogFile(object):
>>     def __init__(self, filename, jitter=10):
>>         self.logfile = gzip.open(filename, 'r')
>>         self.heap = []
>>         self.jitter = jitter
>>
>>     def __iter__(self):
>>         while True:
>>             for logline in self.logfile:
>>                 heappush(self.heap, (timestamp(logline), logline))
>>                 if len(self.heap) >= self.jitter:
>>                     break
>
> Really nice way to handle the batching of the initial heap - thank you!
>
>>             try:
>>                 yield heappop(self.heap)
>>             except IndexError:
>>                 raise StopIteration
<snip>
>> ... which probably won't preserve the order of log entries that have the
>> same timestamp, but if you need it to -- should be easy to accommodate.
>
> I don't think that is necessary, but I'm curious to know how...

I'd imagine something like this might work ...

from heapq import heappush, heappop

class LogFile(object):
    def __init__(self, filename, jitter=10):
        self.logfile = open(filename, 'r')
        self.heap = []
        self.jitter = jitter

    def __iter__(self):
        line_count = 0
        while True:
            # top up the heap until it holds `jitter` entries
            for logline in self.logfile:
                line_count += 1
                # timestamp() is the parsing helper from earlier in the thread;
                # line_count breaks ties between equal timestamps
                heappush(self.heap, ((timestamp(logline), line_count), logline))
                if len(self.heap) >= self.jitter:
                    break
            try:
                yield heappop(self.heap)
            except IndexError:
                # heap is empty and the file is exhausted; a plain return ends
                # the generator (raise StopIteration fails on Python 3.7+)
                return

The key concept is to pass additional unique data to heappush, something
related to the order of lines from input. So, you could probably do
something with file.tell() also.
But beware, it seems you can't reliably tell() a file object opened in
'r' mode, used as an iterator[1] -- and in Python 3.x attempting to do so
raises an IOError.

[1] http://mail.python.org/pipermail/python-list/2008-November/156865.html

HTH,
Marty
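
P.S. For what it's worth, here is a minimal, self-contained sketch that puts the two ideas together: parsing the timestamps into datetime objects so that Nov sorts before Dec, and pushing a line counter alongside the timestamp so entries with equal timestamps keep their input order. The timestamp() helper, the 'Nov 30 23:59:58 ...' line layout, the fixed year, and the sorted_lines() name are all assumptions made up for illustration; your real log format will almost certainly need a different parser.

```python
import heapq
from datetime import datetime


def timestamp(logline, year=2009):
    """Hypothetical parser: assumes each line starts with 'Mon DD HH:MM:SS'.

    The year is not present in this assumed layout, so it is supplied by
    the caller. Note that %b depends on the locale (English names assumed).
    """
    return datetime.strptime("%d %s" % (year, logline[:15]),
                             "%Y %b %d %H:%M:%S")


def sorted_lines(lines, jitter=10):
    """Yield lines in timestamp order, given input that is at most
    roughly `jitter` lines out of order. Lines with equal timestamps
    come out in their original input order."""
    heap = []
    lines = iter(lines)
    line_count = 0
    while True:
        # Top up the heap until it holds `jitter` entries (or input ends).
        for logline in lines:
            line_count += 1
            # (timestamp, line_count) is unique per line, so ties on the
            # timestamp are broken by input order, never by line content.
            heapq.heappush(heap, ((timestamp(logline), line_count), logline))
            if len(heap) >= jitter:
                break
        if not heap:
            return  # input exhausted and heap drained
        yield heapq.heappop(heap)[1]


sample = [
    "Nov 30 23:59:58 host before midnight",
    "Dec 01 00:00:01 host after midnight",
    "Nov 30 23:59:59 host tie A",
    "Nov 30 23:59:59 host tie B",
]

ordered = list(sorted_lines(sample, jitter=3))
for line in ordered:
    print(line)
```

With this sample, a plain string sort would put the "Dec" line first, whereas the datetime key keeps it last, and "tie A" stays ahead of "tie B". The jitter value has to be at least as large as the furthest any line is displaced from its sorted position, otherwise a late-arriving early entry can still come out in the wrong place.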