On 28/06/2012 20:48, James Chapman wrote:
> The name of the file I'm trying to open comes from a UTF-16 encoded text file, I'm then using regex to extract the string (filename) I need to open.
OK. Let's focus on that. For the moment -- although it might well be very relevant -- I'm going to ignore the regex side of things.

It's always tricky to portray things like this because there's such confusion between the characters I write to represent the data and the data represented by those characters themselves! OK, let's adopt a convention whereby I represent the data as the kind of thing you'd see in a hex editor. This obviously isn't how it would appear in a text file but hopefully it'll be clear what's going on.

I have a filename £10.txt -- that is the characters:

POUND SIGN
DIGIT ONE
DIGIT ZERO
FULL STOP
LATIN SMALL LETTER T
LATIN SMALL LETTER X
LATIN SMALL LETTER T

I have -- prior to your getting there -- placed this in a text file which I guarantee is UTF16-encoded. For the purposes of illustration I shall do that in Python code here:

<code>
with open ("filedata.dat", "wb") as f:
    f.write (u"£10.txt".encode ("utf16"))
</code>

The file is named "filedata.dat" and looks like this (per our convention):

ff fe a3 00 31 00 30 00 2e 00 74 00 78 00 74 00

I now want to read the contents of that file as a filename and open the file in question. Here goes:

<code>
#
# Open the file and extract the data as a set of
# bytes into a Python (byte) string.
#
with open ("filedata.dat", "rb") as f:
    data = f.read ()

#
# Convert the data into a unicode object by decoding
# the UTF16 bytes
#
filename = data.decode ("utf16")

# filename is now a unicode object which, depending on
# what your console offers, will either display as
# £10.txt or as \xa310.txt or as something else.

#
# Open that file by passing the unicode object directly
# to Python's file-opening mechanism
#
ten_pound_txt = open (filename, "rb")
print (ten_pound_txt.read ()) # whatever
ten_pound_txt.close ()
</code>

I don't know if that makes anything clearer for you, but at least it gives you something to try out.
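If you want to check the byte-level picture above for yourself, here is a minimal sketch. It assumes a current Python 3 interpreter (the code above is written in the Python 2 style of the day), and hex_dump is a hypothetical helper of my own, not something from the standard library. It works on the bytes in memory, so it doesn't touch your filesystem. It also shows one thing that may bite you: the "utf16" codec consumes the ff fe byte-order mark, whereas "utf-16-le" leaves it in the string as a U+FEFF character.

```python
def hex_dump(data):
    """Format bytes as space-separated hex pairs, hex-editor style.

    Hypothetical helper, written for this illustration only.
    """
    return " ".join("%02x" % b for b in bytearray(data))


# The bytes shown above: a byte-order mark (ff fe), then each
# character of the filename as two bytes, low byte first.
raw = bytes.fromhex("ff fe a3 00 31 00 30 00 2e 00 74 00 78 00 74 00")

# "utf16" reads the byte-order mark, uses it to pick the byte order,
# and drops it from the result.
filename = raw.decode("utf16")

# "utf-16-le" is told the byte order up front, so the byte-order mark
# is decoded as an ordinary character (U+FEFF) at the start of the name.
with_bom = raw.decode("utf-16-le")

print(hex_dump(raw))                  # ff fe a3 00 31 00 30 00 2e 00 74 00 78 00 74 00
print(len(filename), len(with_bom))   # 7 8
print(with_bom[1:] == filename)       # True
```

If a filename you've pulled out of a file refuses to open, it may be worth looking at its repr () and its length: a stray U+FEFF at the front (or a newline at the end) is invisible on most consoles but is enough to make the name point at a file that doesn't exist.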
The business with the regex clouds the issue: regex can play a little awkwardly with Unicode, so you'd have to show some code if you need help there.

TJG