[Tutor] Logical error?

2014-05-02 Thread Bob Williams
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

Hi,

I'm fairly new to coding and Python. My system is Linux (openSUSE
13.1). I've written the following code to examine a log file, and
extract strings from certain lines if two conditions are met, namely
that the file has been modified today, and the line contains the
string 'recv'.

- ---Code---
#!/usr/bin/python

import sys
import datetime
import codecs
import subprocess

# Format date as YYYY/MM/DD
today = (datetime.datetime.now()).strftime('%Y/%m/%d')

fullPath = []   # declare (initially empty) lists
truncPath = []

with codecs.open('/var/log/rsyncd.log', 'r') as rsyncd_log:
    for line in rsyncd_log.readlines():
        fullPath += [line.decode('utf-8', 'ignore').strip()]
    if fullPath[-1][0:10] == today:
        print("\n   Rsyncd.log has been modified in the last 24 hours...")
    else:
        print("\n   No recent rsync activity. Nothing to do.\n")
        sys.exit()

# Search for lines starting with today's date and containing 'recv'
# Strip everything up to and including 'recv' and following last '/'
# path separator
for i in range(0, len(fullPath)):
    if fullPath[i][0:10] == today and 'recv' in fullPath[i]:
        print("got there")
        begin = fullPath[i].find('recv ')
        end = fullPath[i].rfind('/')
        fullPath[i] = fullPath[i][begin+5:end]
        truncPath.append(fullPath[i])
        print("   ...and the following new albums have been added:\n")
    else:
        print("   ...but no new music has been downloaded.\n")
        sys.exit()

- ---Code---

The file rsyncd.log typically contains lines such as (sorry about the
wrapping):

2014/05/02 19:43:14 [20282]
host109-145-nnn-xxx.range109-145.btcentralplus.com recv Logical
Progression Level 3 (1998) Intense/17 Words 2 B Heard Collective -
Sonic Weapon.flac 72912051 72946196

I would expect the script to output a list of artist and album names,
eg Logical Progression Level 3 (1998) Intense. IOW what is between the
string 'recv' and the trailing '/'. What it actually produces is:

:~> python ~/bin/newstuff.py

   Rsyncd.log has been modified in the last 24 hours...
   ...but no new music has been downloaded.

This suggests that the first 'if' clause (matching the first 10
characters of the last line) is satisfied, but the second one isn't,
as the flow jumps to the second 'else' clause.

As the script runs without complaint, this is presumably a logical
error rather than a syntax error, but I cannot see where I've gone wrong.

Bob
- -- 
Bob Williams
System:  Linux 3.11.10-7-desktop
Distro:  openSUSE 13.1 (x86_64) with KDE Development Platform: 4.13.0
Uptime:  06:00am up 11:26, 4 users, load average: 0.00, 0.02, 0.05
-BEGIN PGP SIGNATURE-
Version: GnuPG v2.0.22 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

iEYEARECAAYFAlNkGewACgkQ0Sr7eZJrmU57YwCgg91pxyQbFMSe+TqHkEjMuzQ6
03MAnRQ50up6v+kYE+Hf/jK6yOqQw4Ma
=+w0s
-END PGP SIGNATURE-
___
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
https://mail.python.org/mailman/listinfo/tutor


[Tutor] Fwd: Logical error?

2014-05-02 Thread C Smith
The first test checks whether the last element of fullPath has today's
date. The second loop starts at the first element of fullPath. If that
line is not from today, you hit the else clause in the second loop and
end up running sys.exit().




Re: [Tutor] Logical error?

2014-05-02 Thread Steven D'Aprano
Hi Bob, and welcome!

My responses interleaved with yours, below.

On Fri, May 02, 2014 at 11:19:26PM +0100, Bob Williams wrote:
> Hi,
> 
> I'm fairly new to coding and python. My system is linux (openSUSE
> 13.1). 

Nice to know. And I see you have even more information about your system 
in your email signature, including your email client and uptime. But 
what you don't tell us is what version of Python you're using. I'm going 
to guess that it is something in the 3.x range, since you call print as 
a function rather than a statement, but can't be sure.

Fortunately in this case I don't think the exact version matters.


[...]
> fullPath = []   # declare (initially empty) lists
> truncPath = []
> 
> with codecs.open('/var/log/rsyncd.log', 'r') as rsyncd_log:
>     for line in rsyncd_log.readlines():
>         fullPath += [line.decode('utf-8', 'ignore').strip()]

A small note about performance here. If your log files are very large 
(say, hundreds of thousands or millions of lines) this part deserves a 
second look. There are two issues, a minor and a major one.

First, rsyncd_log.readlines will read the entire file in one go. Since 
you end up essentially copying the whole file, you end up with two large 
lists of lines. There are ways to solve that, and process the lines 
lazily, one line at a time without needing to store the whole file. But 
that's not the big problem.

The bigger trap to watch for is building a list by repeated 
concatenation, as in:

    fullPath = fullPath + [line.decode('utf-8', 'ignore').strip()]

which is an O(N**2) algorithm, because each pass creates a new list and 
copies every existing item into it. Do you know that terminology? Very 
briefly: O(1) means approximately constant time: tripling the size of 
the input makes no difference to the processing time. O(N) means linear 
time: tripling the input triples the processing time. O(N**2) means 
quadratic time: tripling the input increases the processing time not by 
a factor of three, but a factor of three squared, or nine.

Your spelling with += is better than it looks: on a list, += extends 
the existing list in place instead of copying it, so you dodge the 
quadratic behaviour. It is easy to slip from one form to the other, 
though, and += still builds a throwaway one-item list for every line.

With small files, and fast computers, you won't notice either way. But 
with huge files and a slow computer, the copying form could be painful.

The clearer approach is:

    fullPath.append(line.decode('utf-8', 'ignore').strip())

which says what you mean and stays well clear of the O(N**2) 
performance trap.
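
One caveat worth checking for yourself is which spellings really copy the list. The following is only a minimal sketch, assuming the usual CPython list behaviour; the helper names (count_new_lists, concat, augmented, append) are made up for illustration. It counts how often each way of adding an item produces a brand-new list object:

```python
def count_new_lists(step, items):
    """Apply step(list, item) for each item; count how often a new list appears."""
    result = []
    new_lists = 0
    for item in items:
        old = result
        result = step(result, item)
        if result is not old:      # a different object means everything was copied
            new_lists += 1
    return result, new_lists


def concat(lst, item):
    return lst + [item]            # builds a new list every time


def augmented(lst, item):
    lst += [item]                  # list += extends the same object in place
    return lst


def append(lst, item):
    lst.append(item)               # grows the same object in place
    return lst


for step in (concat, augmented, append):
    print(step.__name__, count_new_lists(step, range(5)))
```

Only the `lst + [item]` form creates a new list on each pass, and that repeated copying is where the quadratic cost comes from.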


>     if fullPath[-1][0:10] == today:
>         print("\n   Rsyncd.log has been modified in the last 24 hours...")
>     else:
>         print("\n   No recent rsync activity. Nothing to do.\n")
>         sys.exit()
> 
> # Search for lines starting with today's date and containing 'recv'
> # Strip everything up to and including 'recv' and following last '/'
> # path separator
> for i in range(0, len(fullPath)):
>     if fullPath[i][0:10] == today and 'recv' in fullPath[i]:
>         print("got there")
>         begin = fullPath[i].find('recv ')
>         end = fullPath[i].rfind('/')
>         fullPath[i] = fullPath[i][begin+5:end]
>         truncPath.append(fullPath[i])
>         print("   ...and the following new albums have been added:\n")
>     else:
>         print("   ...but no new music has been downloaded.\n")
>         sys.exit()

Now at last we get to your immediate problem: the above is 
intended to iterate over the lines of fullPath. But it starts at the 
beginning of the file, which may not be today. The first time you hit a 
line which fails the test, either because it is not from today or 
because it lacks 'recv', the else clause runs and the program exits 
before it gets a chance to advance to the more recent days. That 
probably means that it looks at the first line in the log, determines 
that it is not today, and exits.
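
To see the early exit in isolation, here is a small sketch that mimics the shape of that loop on two made-up log lines. The function name scan_like_original and the hostnames are hypothetical, and a return stands in for sys.exit() so the result can be inspected:

```python
def scan_like_original(lines, today):
    """Mimic the original loop; return (matches, index where it gave up)."""
    matches = []
    for i, line in enumerate(lines):
        if line[0:10] == today and 'recv' in line:
            matches.append(line)
        else:
            return matches, i      # stands in for sys.exit()
    return matches, None           # reached only if every line matched


# Hypothetical log: yesterday's entry comes first, today's entry second.
log = [
    "2014/05/01 08:00:00 [100] hostA recv Old Album/01 Track.flac 1 2",
    "2014/05/02 19:43:14 [20282] hostB recv New Album/17 Track.flac 3 4",
]

matches, stopped_at = scan_like_original(log, "2014/05/02")
print(matches, stopped_at)
```

The loop gives up at index 0, so today's line at index 1 is never examined.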

I'm going to suggest a more streamlined algorithm. Most of it is actual 
Python code, assuming you're using Python 3. Only the "process this 
line" part needs to be re-written.

    new_activity = False  # Nothing has happened today.
    with open('/var/log/rsyncd.log', 'r',
              encoding='utf-8', errors='ignore') as rsyncd_log:
        for line in rsyncd_log:
            line = line.strip()
            if line[0:10] == today and 'recv' in line:
                new_activity = True
                process this line  #  <== fix this

    if not new_activity:
        print("no new albums have been added today")



This has the benefit that every line is touched only once, not three 
times as in your version, and nothing has to be stored in a list along 
the way. Performance is linear in the size of the log. You should be 
able to adapt this to your needs.
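
One possible way to fill in the "process this line" step, as a sketch only, is shown here. It assumes Python 3 and reuses the find/rfind slicing from the original script. The function names (album_from_line, new_albums, main) are made up for illustration, and skipping repeated album names is an added assumption, not something the original script did:

```python
import datetime


def album_from_line(line):
    """Return the text between 'recv ' and the last '/', or None if absent."""
    begin = line.find('recv ')
    end = line.rfind('/')
    if begin == -1 or end <= begin:
        return None                # no 'recv ', or no '/' after it
    return line[begin + 5:end]


def new_albums(lines, today):
    """Collect album names from today's 'recv' lines, in order, without repeats."""
    albums = []
    seen = set()                   # assumption: each album is reported once
    for line in lines:
        line = line.strip()
        if line[0:10] == today and 'recv' in line:
            album = album_from_line(line)
            if album is not None and album not in seen:
                seen.add(album)
                albums.append(album)
    return albums


def main(path='/var/log/rsyncd.log'):
    today = datetime.datetime.now().strftime('%Y/%m/%d')
    with open(path, 'r', encoding='utf-8', errors='ignore') as rsyncd_log:
        albums = new_albums(rsyncd_log, today)   # the file is read line by line
    if albums:
        print("the following new albums have been added:")
        for album in albums:
            print("   " + album)
    else:
        print("no new albums have been added today")
```

Calling main() runs it against the real log. Because new_albums accepts any iterable of lines, it can also be tried on a short list of sample strings first.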

Good luck, and feel free to ask questions!



-- 
Steven