Hi Nimrodx,

In case you haven't found a solution yet, I wrote a program that
encodes/decodes data similar to this. You may want to take a look at it:

http://home.townisp.com/~arobert/python/file_encoder.py

nimrodx wrote:
> Hi Alan,
>
> I found a pretty complicated way to do it (Alan's way is far more elegant).
> In case someone is searching the archive, maybe they will find something
> in it that is useful.
> It uses the regular expressions module.
>
> import re
>
> def dehexlify_websites(fle):
>     # get binary data
>     inpt = open(fle,'rb')
>     dat = inpt.read()
>     inpt.close()
>     # strip out the hex "0"s (the \x00 bytes)
>     pattern = r"\x00"
>     res = re.sub(pattern, "", dat)
>     #-----------------------------------------
>     # it seemed easier to do it in two passes
>     # create the pattern regular expression for the stuff we want to keep
>     web = re.compile(
>                      r"(?P<addr>[/a-zA-Z0-9\.\-:\_%\?&=]+)"
>                      )
>     # grab them all and put them in temp variable
>     res = re.findall(web,res)
>     tmp = ""
>     # oops, need a newline at the end of each one to mark the end of
>     # the web address,
>     # and need it all as one string
>     for i in res:
>         tmp = tmp + i+'\n'
>     # compile reg expr for everything after :/ up to the newline
>     web2 = re.compile(r":/(?P<address>[^\n]+)")
>     # find the websites
>     # make them into an object we can pass
>     res2 = re.findall(web2,tmp)
>     # return 'em
>     return res2
>
>
> Thanks Alan,
>
> Matt
>
>
> Alan Gauld wrote:
>>> if you look carefully at the string below, you see
>>> that in amongst the "\x" stuff you have the text I want:
>>> z tfile://home/alpha
>> OK, those characters are obviously string data and it looks
>> like it's using 16 bit characters, so yes some kind of
>> unicode string. In between and at the end lies the binary
>> data in whatever format it is.
>>
>>>>> Here is the first section of the file:
>>>>> '\x00\x00\x00\x02\xb8,\x08\x9f\x00\x00z\xa8\x00\x00\x01\xf4\x00\x00\x01\xf4\x00\x00\x00t\x00f\x00i\x00l\x00e\x00:\x00/\x00h\x00o\x00m\x00e\x00/\x00a\x00l'
>>
>>> In a hex editor it turns out to be readable and sensible URLs with
>>> spaces between each digit, and a bit of crud at the end of URLs,
>>> just as above.
>> Here's a fairly drastic approach:
>>
>> >>> s = '\x00\x00\x00\x02\xb8,\x08\x9f\x00\x00z\xa8\x00\x00\x01\xf4\x00\x00\x01\xf4\x00\x00\x00t\x00f\x00i\x00l\x00e\x00:\x00/\x00h\x00o\x00m\x00e\x00/\x00a\x00l'
>> >>> ''.join([c for c in s if c.isalnum() or c in '/: '])
>> 'ztfile:/home/al'
>>
>> But it gets close...
>>
>> Alan g.
>
> _______________________________________________
> Tutor maillist - Tutor@python.org
> http://mail.python.org/mailman/listinfo/tutor
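
Following up on Alan's observation that the text looks like 16-bit
characters: below is a minimal sketch, not something tested against the
real file, that pulls such runs out of the raw bytes and decodes them.
It assumes Python 3 (where a file opened with 'rb' yields bytes), and it
assumes the strings are big-endian UTF-16 holding only printable ASCII,
which is what the quoted sample looks like. The names
extract_utf16_strings and min_chars are made up for illustration.

```python
import re

# One printable ASCII character stored as big-endian UTF-16: a NUL byte
# followed by a byte in the range 0x20-0x7e. (Assumption: the file only
# holds ASCII-range text in this encoding.)
_UTF16_BE_CHAR = rb"\x00[\x20-\x7e]"


def extract_utf16_strings(data, min_chars=4):
    """Return decoded runs of at least min_chars such characters."""
    # bytes %-formatting needs Python 3.5 or later
    pattern = re.compile(b"(?:%s){%d,}" % (_UTF16_BE_CHAR, min_chars))
    return [m.group(0).decode("utf-16-be") for m in pattern.finditer(data)]


# The sample bytes quoted in the thread, written as a bytes literal.
sample = (b"\x00\x00\x00\x02\xb8,\x08\x9f\x00\x00z\xa8\x00\x00\x01\xf4"
          b"\x00\x00\x01\xf4\x00\x00\x00t\x00f\x00i\x00l\x00e\x00:\x00/"
          b"\x00h\x00o\x00m\x00e\x00/\x00a\x00l")

print(extract_utf16_strings(sample))  # prints ['tfile:/home/al']
```

The stray 'z' is dropped because it is not part of a long enough run,
but the 't' in front of 'file:' is kept since it is stored the same way
as the rest of the text; whether it belongs to the URL is something the
sample alone does not show. For a real file, the bytes would come from
open(path, 'rb').read().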