Re: [Tutor] 2016-02-01 Filter STRINGS in Log File and Pass as VARAIBLE within PYTHON script

Cameron Simpson Mon, 01 Feb 2016 04:56:07 -0800

On 01Feb2016 15:53, knnleow GOOGLE <knnl...@gmail.com> wrote:

trying out on how to port my unix shell script to python.
get more complicated than i expected.....: (
i am not familiar with the modules available in python.
anyone care to share how to better the clumsy approach below.
regards,
kuenn
timestamp02 = time.strftime("%Y-%m-%d-%H%M%S")
banIPaddressesFile = os.popen("cat /var/log/fail2ban.log | egrep ssh | egrep Ban | egrep " + myDate + " | awk \'{print $7}\' | sort -n | uniq >/tmp/banIPaddressesFile." + timestamp02).read()

First up, this is still essentially a shell script. You're constructing a shell pipeline like this (paraphrased):


 cat /var/log/fail2ban.log
 | egrep ssh
 | egrep Ban
 | egrep myDate
 | awk '{print $7}'
 | sort -n
 | uniq >/tmp/banIPaddressesFile-timestamp

So really, you're doing almost nothing in Python. You're also writing intermediate results to a temporary filename, then reading from it. Unless you really need to keep that file around, you won't need that either.

Before I get into the Python side of things, there are a few small (small) criticisms of your shell script:

- it has a "useless cat"; this is a very common shell inefficiency where people put "cat filename | filter1 | filter2 ..." when they could more cleanly just go "filter1 <filename | filter2 | ..."

- you are searching for fixed strings; why are you using egrep? Just say "grep" (or even "fgrep" if you're old school - you're new to this so I presume not)

- you're using "sort -n | uniq", presumably because uniq requires sorted input; you are better off using "sort -un" here and skipping uniq. I'd also point out that since these are IP addresses, "sort -n" doesn't really do what you want here.


So, to the Python:

You seem to want to read the file /var/log/fail2ban.log and, for certain specific lines, record column 7, which I gather from the rest of the code (below) is an IP address. I gather you just want one copy of each unique IP address.


So, to read lines of the file the standard idiom goes:

 with open('/var/log/fail2ban.log') as fail_log:
   for line in fail_log:
     ... process lines here ...

You seem to be checking for two keywords and a date in the interesting lines. You can do this with a simple test:


 if 'ssh' in line and 'Ban' in line and myDate in line:
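
As a rough sketch, that test can be wrapped in a small function and tried on its own. The function name wanted_line and the sample lines are made up for illustration, and the line layout is only assumed to resemble fail2ban.log entries:

```python
def wanted_line(line, my_date):
    """Return True if the line mentions ssh, Ban and the given date string."""
    return 'ssh' in line and 'Ban' in line and my_date in line

# Made-up sample lines, assumed to resemble fail2ban.log entries.
sample_lines = [
    "2016-02-01 04:12:33,123 fail2ban.actions[1234]: WARNING [ssh] Ban 192.0.2.10",
    "2016-02-01 04:22:33,456 fail2ban.actions[1234]: WARNING [ssh] Unban 192.0.2.10",
    "2016-01-31 23:59:01,789 fail2ban.actions[1234]: WARNING [ssh] Ban 198.51.100.7",
]

# Keep only the lines that pass all three substring checks.
matches = [line for line in sample_lines if wanted_line(line, "2016-02-01")]
print(len(matches))
```

Note that "in" is a case-sensitive substring test, so the "Unban" line is skipped here: it contains "ban" but not "Ban".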

If you want the seventh column from the line (per your awk command) you can get it like this:


 words = line.split()
 word7 = words[6]

because Python lists count from 0, so index 6 is the seventh word.
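
Here is a small sketch of the split on a single line. The sample line is invented and only assumed to match the log layout, and the helper name seventh_word is hypothetical; it also guards against lines with fewer than seven words, which would otherwise raise IndexError:

```python
def seventh_word(line):
    """Return the seventh whitespace-separated word, or None if the line is too short."""
    words = line.split()   # splits on runs of whitespace and drops the trailing newline
    if len(words) < 7:
        return None
    return words[6]        # index 6 is the seventh word, like awk's $7

# Made-up sample line, assumed to resemble a fail2ban.log entry.
line = "2016-02-01 04:12:33,123 fail2ban.actions[1234]: WARNING [ssh] Ban 192.0.2.10\n"
word7 = seventh_word(line)
print(word7)
```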

You want the unique IP addresses, so I suggest storing them all in a set and not bothering with a sort until some other time. So make an empty set before you read the file:


 ip_addrs = set()

and add each address to it for the lines you select:

 ip_addrs.add(word7)

After you have read the whole file you will have the desired addresses in the ip_addrs set.
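
To illustrate just the set behaviour, and one possible way to sort later, here is a sketch using invented addresses. Using the standard ipaddress module as the sort key is a suggestion, not something from the original script; it assumes every entry is a valid IPv4 address:

```python
import ipaddress

ip_addrs = set()
# Made-up addresses; the duplicate is deliberate.
for addr in ["192.0.2.10", "10.0.0.5", "192.0.2.10", "9.0.0.1"]:
    ip_addrs.add(addr)   # adding a duplicate leaves the set unchanged

# Sort by numeric address value rather than as text.
ordered = sorted(ip_addrs, key=ipaddress.ip_address)
print(ordered)

# A plain text sort puts "10.0.0.5" before "9.0.0.1".
print(sorted(ip_addrs))
```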

Try to put all that together and come back with working code, or come back with completed but not working code and specific questions.


Cheers,
Cameron Simpson <c...@zip.com.au>
_______________________________________________
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
https://mail.python.org/mailman/listinfo/tutor
