Hi Alan, I found a pretty complicated way to do it (Alan's way is far more elegant). In case someone searches the archive, maybe they will find something useful in it. It uses the regular expressions module (re).

    import re

    def dehexlify_websites(fle):
        # get binary data
        inpt = open(fle, 'rb')
        dat = inpt.read()
        inpt.close()
        # strip out the hex "0"'s
        pattern = r"\x00"
        res = re.sub(pattern, "", dat)
        #-----------------------------------------
        # it seemed easier to do it in two passes
        # create the pattern regular expression for the stuff we want to keep
        web = re.compile(r"(?P<addr>[/a-zA-Z0-9\.\-:\_%\?&=]+)")
        # grab them all and put them in temp variable
        res = re.findall(web, res)
        tmp = ""
        # oops, need a newline at the end of each one to mark the end of
        # the web address, and need it all as one string
        for i in res:
            tmp = tmp + i + '\n'
        # compile reg expr for everything between ":/" and the newline
        web2 = re.compile(r":/(?P<address>[^\n]+)")
        # find the websites
        # make them into an object we can pass
        res2 = re.findall(web2, tmp)
        # return 'em
        return res2

Thanks Alan,
Matt

Alan Gauld wrote:
>> if you look carefully at the string below, you see
>> that in amongst the "\x" stuff you have the text I want:
>> z tfile://home/alpha
>
> OK, those characters are obviously string data and it looks
> like it's using 16 bit characters, so yes some kind of
> unicode string. In between and at the end lies the binary
> data in whatever format it is.
>
>>>> Here is the first section of the file:
>>>> '\x00\x00\x00\x02\xb8,\x08\x9f\x00\x00z\xa8\x00\x00\x01\xf4\x00\x00\x01\xf4\x00\x00\x00t\x00f\x00i\x00l\x00e\x00:\x00/\x00h\x00o\x00m\x00e\x00/\x00a\x00l'
>
>> In a hex editor it turns out to be readable and sensible url's with
>> spaces between each digit, and a bit of crud at the end of url's,
>> just as above.
>
> Here's a fairly drastic approach:
>
> >>> s = '\x00\x00\x00\x02\xb8,\x08\x9f\x00\x00z\xa8\x00\x00\x01\xf4\x00\x00\x01\xf4\x00\x00\x00t\x00f\x00i\x00l\x00e\x00:\x00/\x00h\x00o\x00m\x00e\x00/\x00a\x00l'
> >>> ''.join([c for c in s if c.isalnum() or c in '/: '])
> 'ztfile:/home/al'
> >>>
>
> But it gets close...
>
> Alan g.
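
A note for anyone reading this in the archive with Python 3: there, a file opened with 'rb' gives bytes rather than str, so the patterns would need to be bytes as well. Here is a minimal sketch of the same two-pass idea under that assumption. The name extract_addresses and the sample value (copied from the file excerpt quoted earlier in this message) are only for illustration and are not part of the original function.

```python
import re

# Runs of URL-like characters, using the same character set as the
# first pass of the function earlier in this message, as a bytes pattern.
URL_CHARS = re.compile(rb"[/a-zA-Z0-9.\-:_%?&=]+")
# Everything after the first ":/" in a run (mirrors the second pass).
AFTER_SCHEME = re.compile(rb":/(.+)")


def extract_addresses(data: bytes) -> list:
    """Strip NUL bytes, then return the text after ':/' in each URL-like run."""
    stripped = data.replace(b"\x00", b"")
    found = []
    for run in URL_CHARS.findall(stripped):
        match = AFTER_SCHEME.search(run)
        if match:
            # Runs only contain ASCII characters, so this decode is safe.
            found.append(match.group(1).decode("ascii"))
    return found


# Hypothetical input: the excerpt quoted earlier in this message.
sample = (b'\x00\x00\x00\x02\xb8,\x08\x9f\x00\x00z\xa8\x00\x00\x01\xf4'
          b'\x00\x00\x01\xf4\x00\x00\x00t\x00f\x00i\x00l\x00e\x00:\x00/'
          b'\x00h\x00o\x00m\x00e\x00/\x00a\x00l')
print(extract_addresses(sample))
```

On the quoted excerpt this prints ['home/al'], because that data has a single slash after the colon. Looping over the runs directly avoids having to join them with newlines first, but it is the same idea as the two passes in the function.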
> _______________________________________________
> Tutor maillist  -  Tutor@python.org
> http://mail.python.org/mailman/listinfo/tutor