[Tutor] regular expressions question

2006-08-12 Thread nimrodx
Hi All,

I am trying to fish through the history file for the Konqueror web 
browser and pull out the web sites visited.

The file's encoding is binary or something.

Here is the first section of the file:
'\x00\x00\x00\x02\xb8,\x08\x9f\x00\x00z\xa8\x00\x00\x01\xf4\x00\x00\x01\xf4\x00\x00\x00t\x00f\x00i\x00l\x00e\x00:\x00/\x00h\x00o\x00m\x00e\x00/\x00a\x00l'
 


Does that tell you anything?

I have been trying to replace the pesky \x00's with something less 
annoying, but with no success:

   import re
   pattern = r"\x00"
   re.sub(pattern, '', dat2)

That seems to work at the command line, but this:

   web = re.compile(r"([/a-zA-Z0-9\.]+)")
   res = re.findall(web, dat2)
tends to give me back individual alphanumeric characters, "."'s, and "/"'s,
as if they had each been separated by an unmatched character:
e.g. ['z', 't', 'f', 'i', 'l', 'e', 'h', 'o', 'm', 'e', 'a', 'l', 'p', 
'h', 'a',...]

I was hoping for one web address per element of the list...

Suggestions greatly appreciated!!

Thanks,

Matt
___
Tutor maillist  -  Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor
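
A note on the symptom described in this post, as a minimal sketch assuming Python 3 (where binary file data is a bytes object). re.sub returns a new object and leaves dat2 untouched, so a findall run on dat2 still sees a \x00 between every character. Binding the result to a name (cleaned is a hypothetical name) gives longer runs. The sample bytes are copied from the post, and the pattern is the poster's character class without the group.

```python
import re

# Sample bytes copied from the post, written as a Python 3 bytes literal.
dat2 = (b"\x00\x00\x00\x02\xb8,\x08\x9f\x00\x00z\xa8\x00\x00\x01\xf4"
        b"\x00\x00\x01\xf4\x00\x00\x00t\x00f\x00i\x00l\x00e\x00:\x00/"
        b"\x00h\x00o\x00m\x00e\x00/\x00a\x00l")

web = re.compile(rb"[/a-zA-Z0-9.]+")

# re.sub does not modify dat2 in place; keep the value it returns.
cleaned = re.sub(rb"\x00", b"", dat2)

# Matching the untouched data yields single characters,
# matching the cleaned copy yields longer runs.
single_chars = web.findall(dat2)
pieces = web.findall(cleaned)   # [b'z', b'tfile', b'/home/al']
```

The ":" is not in the character class, which is why the sample still splits into "tfile" and "/home/al".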


Re: [Tutor] regular expressions question

2006-08-12 Thread nimrodx
Hi Alan and other Gurus,

If you look carefully at the string below, you see
that in amongst the "\x" stuff you have the text I want:
z tfile://home/alpha
which I know to be an address on my system, plus a bit of preceding text.
Alan Gauld wrote:
>> The file's encoding is binary or something
>>
>> Here is the first section of the file:
>> '\x00\x00\x00\x02\xb8,\x08\x9f\x00\x00z\xa8\x00\x00\x01\xf4\x00\x00\x01\xf4\x00\x00\x00t\x00f\x00i\x00l\x00e\x00:\x00/\x00h\x00o\x00m\x00e\x00/\x00a\x00l'
>>  
>>
>>
>> Does that tell you anything?
> But that is almost certainly the wrong approach, you'll never
> figure out where the word boundaries are without them!
So I believe this is the right approach. In fact, if I print the string 
without any modifications, I get the following sort of stuff:
¸z¨ôôtfile:/home/alpha/care/my_details.aspx.html%oô¯0%oô¯0l

So this is one approach that will work.
I have no idea what sort of encoding it is, but I would be grateful if 
someone could tell me how to get rid of what I assume are hex digits.
In a hex editor it turns out to be readable, sensible URLs with 
spaces between each character, and a bit of crud at the end of each URL, 
just as above.

Any suggestions with that additional info?
I've used struct before; it is a very nice module. Could this be some 
sort of UTF encoding?

I think I was a bit light on info with that first post.
Thanks for your time,

Matt







Re: [Tutor] [whitelist] Re: regular expressions question

2006-08-17 Thread nimrodx
Hi Alan,

I found a pretty complicated way to do it (Alan's way is way more elegant).
In case someone is searching the archive, maybe they will find something 
in it that is useful.
It uses the regular expressions module.

import re

def dehexlify_websites(fle):
    # get binary data
    inpt = open(fle, 'rb')
    dat = inpt.read()
    inpt.close()
    # strip out the hex "0"'s (NUL bytes); the patterns are bytes
    # because the file is read in binary mode
    pattern = br"\x00"
    res = re.sub(pattern, b"", dat)
    #-
    # it seemed easier to do it in two passes
    # create the pattern regular expression for the stuff we want to keep
    # (the named-group label was lost in the archive; a plain group
    # behaves the same with findall)
    web = re.compile(br"([/a-zA-Z0-9.\-:_%?&=]+)")
    # grab them all and put them in temp variable
    res = re.findall(web, res)
    tmp = b""
    # oops need some new lines at the end of each one to mark end of
    # web address, and need it all as one string
    for i in res:
        tmp = tmp + i + b'\n'
    # compile reg expr for everything between :/ and the newline
    web2 = re.compile(br":/([^\n]+)")
    # find the websites
    # make them into an object we can pass
    res2 = re.findall(web2, tmp)
    # return 'em
    return res2


Thanks Alan,

Matt


Alan Gauld wrote:
>> if you look carefully at the string below, you see
>> that in amongst the "\x" stuff you have the text I want:
>> z tfile://home/alpha
>
> OK, those characters are obviously string data and it looks
> like it's using 16 bit characters, so yes some kind of
> unicode string. In between and at the end lies the binary
> data in whatever format it is.
>
>>>> Here is the first section of the file:
>>>> '\x00\x00\x00\x02\xb8,\x08\x9f\x00\x00z\xa8\x00\x00\x01\xf4\x00\x00\x01\xf4\x00\x00\x00t\x00f\x00i\x00l\x00e\x00:\x00/\x00h\x00o\x00m\x00e\x00/\x00a\x00l'

>
>
>> In a hex editor it turns out to be readable and sensible url's with 
>> spaces between each digit, and a bit of crud at the end of url's, 
>> just as above.
>
> Here's a fairly drastic approach:
>
> >>> s = '\x00\x00\x00\x02\xb8,\x08\x9f\x00\x00z\xa8\x00\x00\x01\xf4\x00\x00\x01\xf4\x00\x00\x00t\x00f\x00i\x00l\x00e\x00:\x00/\x00h\x00o\x00m\x00e\x00/\x00a\x00l'
> >>> ''.join([c for c in s if c.isalnum() or c in '/: '])
> 'ztfile:/home/al'
>
> But it gets close...
>
> Alan g.
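
Following up on the "16 bit characters" observation, here is a minimal sketch, assuming Python 3 and assuming the text is stored big-endian (the sample shows a \x00 before each letter). It looks for runs of such byte pairs and decodes them as UTF-16. The names UTF16_ASCII_RUN and extract_strings and the minimum run length of 4 are choices made for this illustration, not anything documented about the history file format. Only ASCII text is handled.

```python
import re

# Sample bytes copied from the thread, written as a Python 3 bytes literal.
data = (b"\x00\x00\x00\x02\xb8,\x08\x9f\x00\x00z\xa8\x00\x00\x01\xf4"
        b"\x00\x00\x01\xf4\x00\x00\x00t\x00f\x00i\x00l\x00e\x00:\x00/"
        b"\x00h\x00o\x00m\x00e\x00/\x00a\x00l")

# Four or more (NUL, printable ASCII) byte pairs in a row: ASCII text
# stored as big-endian 16-bit characters. The threshold is arbitrary.
UTF16_ASCII_RUN = re.compile(rb"(?:\x00[\x20-\x7e]){4,}")


def extract_strings(raw):
    """Return every run of 16-bit ASCII text found in raw, decoded to str."""
    return [m.group().decode("utf-16-be")
            for m in UTF16_ASCII_RUN.finditer(raw)]


strings = extract_strings(data)   # ['tfile:/home/al']
```

The stray leading "t" seen in the thread survives here too, because its byte pair also looks like text, so some trimming (for example, keeping what follows ":/") is still needed.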


[Tutor] os.path.walk

2006-08-22 Thread nimrodx
Hi All,

I was wondering whether anyone has used os.path.walk within a class, 
and what the pitfalls are...

What has got me worried is that the function called by os.path.walk 
must be a method of the class.
This means it will have a definition something like this:

def func_called_by_walk(self, arg, directory, names):

Will this work with os.path.walk with that definition?

Thanks,

Matt


Re: [Tutor] [whitelist] Re: os.path.walk

2006-08-23 Thread nimrodx
Thanks guys,

I will have a go at both of the methods.

Matt
Alan Gauld wrote:
>> Yes, that is the right way to do it and it will work fine. Something 
>> like
>>
>> class Walker(object):
>>     def walk(self, base):
>>         os.path.walk(base, self.callback, None)
>> 
>
>   
>> What happens is, when Python looks up self.callback it converts the 
>> method to a "bound method".
>> 
>
> Aargh! I should have remembered that. No need for lambdas here.
> Apologies...
>
>   
>> But, if you are using a recent version of Python (2.3 or greater) 
>> you should look at os.walk(), it is easier to use than 
>> os.path.walk().
>> 
>
> But I did suggest that too :-)
>
> Alan G.
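
The two suggestions quoted above can be combined in a short sketch. It assumes Python 3, where os.path.walk no longer exists, so it uses os.walk. The callback method, its signature and the found attribute are hypothetical, added only to show that a bound method can be passed around and called like a plain function.

```python
import os


class Walker(object):
    """Collects the path of every file found under a base directory."""

    def __init__(self):
        self.found = []

    def callback(self, directory, names):
        # self is already bound when this is looked up as self.callback,
        # so the caller passes only the remaining arguments.
        for name in names:
            self.found.append(os.path.join(directory, name))

    def walk(self, base):
        visit = self.callback  # a bound method, usable like a function
        for directory, subdirs, files in os.walk(base):
            visit(directory, sorted(files))
        return self.found
```

Calling Walker().walk(some_directory) returns the list of file paths below that directory.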


[Tutor] Sending an attachment with SMTP lib

2006-08-25 Thread nimrodx
Hi All,

How do I go about sending an attachment with SMTP lib?

Thanks,

Matt
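
One possible approach on current Python 3 is to build the message with the standard email package and hand it to smtplib. This is only a sketch: the function names, the generic application/octet-stream type and the localhost server are assumptions made for illustration.

```python
import smtplib
from email.message import EmailMessage


def make_mail(sender, recipient, subject, body, attachment_name, attachment_bytes):
    """Build a message with a text body and one binary attachment."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    msg.set_content(body)
    # add_attachment turns the message into multipart/mixed and sets
    # Content-Disposition: attachment on the new part.
    msg.add_attachment(attachment_bytes,
                       maintype="application",
                       subtype="octet-stream",
                       filename=attachment_name)
    return msg


def send_mail(msg, host="localhost"):
    """Send msg through an SMTP server (host is an assumption)."""
    with smtplib.SMTP(host) as server:
        server.send_message(msg)
```

Building and sending are kept apart so the message can be inspected, for example with msg.as_string(), before anything goes over the wire.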


[Tutor] content disposition header: email module

2006-09-06 Thread nimrodx
Hi Python Gurus,

I am trying to mail a txt file, then with another client I
get the email and extract the text file.
The email I send, however, does not seem to turn out correctly.
The Content-Disposition header is there, but it seems to be in the wrong 
place, and in my email client the text file just gets included in the 
message body and the file name is not visible.

This is the code:

from email.MIMEMultipart import MIMEMultipart
from email.MIMEText import MIMEText
from email.MIMEImage import MIMEImage

    def attch_send(self):
        msg = MIMEMultipart()
        #msg.add_header("From", sender)
        #msg.add_header("To", recv)
        msg.add_header('Content-Disposition', 'attachment',
                       filename='web-list.txt')
        msg.attach(MIMEText(file(os.path.join(save_dir,
                                              "web-list.txt")).read()))
        server = smtplib.SMTP('localhost')
        #server.set_debuglevel(1)
        server.sendmail(sender, recv, msg.as_string())
        server.quit()


What is wrong with that?

I'd really appreciate your suggestions.

Thanks,

Matt
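
A likely cause of the symptom described above is that the Content-Disposition header is added to the multipart container rather than to the part that carries the file. The sketch below, assuming Python 3 module names (email.mime.multipart, email.mime.text), sets the header on the attached part instead. The function name build_message and the subject line are hypothetical.

```python
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def build_message(sender, recv, filename, text):
    """Build a multipart message carrying text as a named attachment."""
    msg = MIMEMultipart()
    msg["From"] = sender
    msg["To"] = recv
    msg["Subject"] = "web list"  # hypothetical subject line

    # The attachment is its own MIME part; Content-Disposition belongs
    # on this part, not on the multipart container.
    part = MIMEText(text)
    part.add_header("Content-Disposition", "attachment", filename=filename)
    msg.attach(part)
    return msg
```

The result can then be sent as in the post, with server.sendmail(sender, recv, msg.as_string()).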


[Tutor] How do I open my browser from within a Python program

2006-09-14 Thread nimrodx
Basically a dumb question I can't seem to find the answer to.

How do I execute a bash command from within a Python program?

I've been looking through my book on Python, and the docs, but can't 
seem to find something so basic (I'm sure it is there, but I am not 
looking for the correct terms, I guess).

Sorry,

Matt
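
Two standard-library modules cover both the subject line and the question, shown here as a minimal sketch assuming Python 3.7 or later: subprocess runs an external command, and webbrowser opens a URL in the default browser. The function names run_command and open_in_browser are hypothetical wrappers, not library functions.

```python
import subprocess
import webbrowser


def run_command(args):
    """Run a command given as a list of arguments and return its stdout."""
    # check=True raises CalledProcessError if the command exits non-zero.
    result = subprocess.run(args, capture_output=True, text=True, check=True)
    return result.stdout


def open_in_browser(url):
    """Ask the default browser to open url; True if a browser was launched."""
    return webbrowser.open(url)
```

For example, run_command(["ls", "-l"]) returns the directory listing as one string. Passing a list of arguments avoids the shell entirely; a full bash command line with pipes would need subprocess.run(..., shell=True) instead.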
