On Wed, Feb 8, 2012 at 5:46 PM, Garry Willgoose <garry.willgo...@newcastle.edu.au> wrote:
> I'm reading a file output by the system utility WMIC in Windows (so I can
> track CPU usage by process ID), and the text file WMIC outputs seems to have
> extra characters in it that I've not seen before.
>
> I use os.system('WMIC /OUTPUT:c:\cpu.txt PROCESS GET ProcessId') to output
> the file and then parse the file c:\cpu.txt.
>
> The first few lines of the file look like this in Notepad:
>
> ProcessId
> 0
> 4
> 568
> 624
> 648
>
> I input the data with the lines
>
> infile = open('c:\cpu.txt','r')
> infile.readline()
> infile.readline()
> infile.readline()
>
> The readline()s yield the following output:
>
> '\xff\xfeP\x00r\x00o\x00c\x00e\x00s\x00s\x00I\x00d\x00 \x00 \x00\r\x00\n'
> '\x000\x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00\r\x00\n'
> '\x004\x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00 \x00\r\x00\n'
>
> Now for the first line the title 'ProcessId' is in this string, but the
> individual characters are separated by '\x00', and at least for the first
> line of the file there is an extra '\xff\xfe'. For subsequent lines it's just
> '\x00'. Now I can just replace the '\x**' with '', but that seems a bit
> inelegant. I've tried various options on the open, 'rU' and 'rb', but to no
> effect.
>
> Does anybody know what the rubbish characters are and what has caused them?
> I'm using the latest Enthought Python if that matters.

You're trying to read a Unicode text file byte-by-byte. It'll end in tears...

The "\xff\xfe" at the beginning is the byte order mark (BOM). That particular
value marks the file as UTF-16, little-endian, which is also why every ASCII
character is followed by a '\x00' byte: each character takes two bytes.

Here's a quick primer on Unicode:
http://www.joelonsoftware.com/articles/Unicode.html
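
In case it helps, here is a minimal sketch of reading such a file as text rather than as raw bytes. It assumes the file really is UTF-16 with a BOM, as the '\xff\xfe' prefix suggests. The function name read_process_ids and the temporary sample file are made up for illustration; the sample stands in for c:\cpu.txt so the snippet runs without WMIC.

```python
import io
import os
import tempfile


def read_process_ids(path):
    """Return the integer process IDs listed in a UTF-16 text file.

    Hypothetical helper: assumes one header line ("ProcessId")
    followed by one ID per line, as in the output quoted above.
    """
    # encoding="utf-16" consumes the BOM and uses it to pick the byte order.
    with io.open(path, "r", encoding="utf-16") as infile:
        lines = [line.strip() for line in infile]
    # Skip the header line and any blank lines.
    return [int(line) for line in lines[1:] if line]


# Build a small stand-in file: BOM + UTF-16 little-endian text (assumed layout).
sample_text = u"ProcessId  \r\n0          \r\n4          \r\n568        \r\n"
sample_bytes = b"\xff\xfe" + sample_text.encode("utf-16-le")
sample_path = os.path.join(tempfile.mkdtemp(), "cpu.txt")
with open(sample_path, "wb") as f:
    f.write(sample_bytes)

pids = read_process_ids(sample_path)
print(pids)
```

io.open is used because it accepts an encoding argument on both Python 2 and Python 3; codecs.open(path, 'r', 'utf-16') would be another option. For the real file you would pass something like r'c:\cpu.txt', where the raw string keeps the backslash from being treated as the start of an escape sequence.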
_______________________________________________
Tutor maillist - Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor