-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Hi Steven,
On 03/05/14 02:53, Steven D'Aprano wrote: > Hi Bob, and welcome! > > My responses interleaved with yours, below. > > On Fri, May 02, 2014 at 11:19:26PM +0100, Bob Williams wrote: >> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 >> >> Hi, >> >> I'm fairly new to coding and python. My system is linux >> (openSUSE 13.1). > > Nice to know. And I see you have even more infomation about your > system in your email signature, including your email client and > uptime. But what you don't tell us is what version of Python you're > using. I'm going to guess that it is something in the 3.x range, > since you call print as a function rather than a statement, but > can't be sure. > I have both 2 and 3 installed here, but I call this script with python2. I guess I should concentrate on Python 3 as that's the way things are going. > Fortunately in this case I don't think the exact version matters. > > > [...] >> fullPath = [] # declare (initially empty) lists truncPath = [] >> >> with codecs.open('/var/log/rsyncd.log', 'r') as rsyncd_log: for >> line in rsyncd_log.readlines(): fullPath += [line.decode('utf-8', >> 'ignore').strip()] > > A small note about performance here. If your log files are very > large (say, hundreds of thousands or millions of lines) you will > find that this part is *horribly horrible slow*. There's two > problems, a minor and a major one. > The log file is typically a few thousand lines, so my code runs fast enough, but I understand your point. > First, rsyncd_log.readlines will read the entire file in one go. > Since you end up essentially copying the whole file, you end up > with two large lists of lines. There are ways to solve that, and > process the lines lazily, one line at a time without needing to > store the whole file. But that's not the big problem. > > The big problem is this: > > fullPath += [line.decode('utf-8', 'ignore').strip()] > > which is an O(N**2) algorithm. Do you know that terminology? Very > briefly: O(1) means approximately constant time: tripling the size > of the input makes no difference to the processing time. O(N) means > linear time: tripling the input triples the processing time. > O(N**2) means quadratic time: tripling the input increases the > processing time not by a factor of three, but a factor of three > squared, or nine. > > With small files, and fast computers, you won't notice. But with > huge files and a slow computer, that could be painful. > > Instead, a better approach is: > > fullPath.append(line.decode('utf-8', 'ignore').strip()) > > which avoids the O(N**2) performance trap. > Understood. > [snip] > > Now at last we get to your immediate problem: the above is intended > to iterate over the lines of fullPath. But it starts at the > beginning of the file, which may not be today. The first time you > hit a line which is not today, the program exits, before it gets a > chance to advance to the more recent days. That probably means that > it looks at the first line in the log, determines that it is not > today, and exits. > I'd missed the (now obvious) point that my if condition contained two terms which both have to be true. > I'm going to suggest a more streamlined algorithm. Most of it is > actual Python code, assuming you're using Python 3. Only the > "process this line" part needs to be re-written. > > new_activity = False # Nothing has happened today. with > open('/var/log/rsyncd.log', 'r', encoding='utf-8', errors='ignore') > as rsyncd_log: for line in rsyncd_log: line = line.strip() if > line[0:10] == today and 'recv' in line: new_activity = True process > this line # <== fix this > > if not new_activity: print("no new albums have been added today") > Thanks. This works nicely. > > > This has the benefit that every line is touched only once, not > three times as in your version. Performance is linear, not > quadratic. You should be able to adapt this to your needs. > > Good luck, and feel free to ask questions! > - -- Bob Williams System: Linux 3.11.10-7-desktop Distro: openSUSE 13.1 (x86_64) with KDE Development Platform: 4.13.0 Uptime: 06:00am up 11:26, 4 users, load average: 0.00, 0.02, 0.05 -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.22 (GNU/Linux) Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/ iEYEARECAAYFAlNlXCYACgkQ0Sr7eZJrmU71OACfSy1XWSDA08DNAndcA89AZg6Z +2IAniQJIrSd7wVJWl2MEtEdHlcdkwfj =YtSq -----END PGP SIGNATURE----- _______________________________________________ Tutor maillist - Tutor@python.org To unsubscribe or change subscription options: https://mail.python.org/mailman/listinfo/tutor