gmarsha11 wrote:
I'm not sure about the file's encoding. How do I tell?
If you have "file" installed, it's easy:
$ file Document.txt
Document.txt: Unicode text, UTF-16, little-endian
When I create a new file with vi, I can read the file with no problem. The
output is normal.
Look at the bottom line, vi tells you what kind of "text" it is... sort of:
"Document.txt" [converted][dos] 1L, 20C
The "converted" means it wasn't regular text, the "dos" means it has
CR-LF line endings.
If you'd like to see what it really is, try:
$ od -tx2z Document.txt
0000000 feff 0054 0068 0069 0073 0020 0069 0073 >..T.h.i.s. .i.s.<
0000020 0020 0061 0062 0063 0020 0066 0069 006c > .a.b.c. .f.i.l.<
0000040 0065 000d 000a >e.....<
0000046
So your "spaces" are really null bytes (some fonts draw them as little
smileys), and vi was right about the line ending: the 000d 000a at the
end of the dump is a CR-LF pair.
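The feff at the start of that dump is a byte-order mark (BOM), which is the part that gives the encoding away. As a rough sketch of the same check done by hand, assuming Python is available (the thread does not mention it, and `sniff_bom` is just a name made up for this example), you could compare the first bytes against the known BOMs:

```python
import codecs

# BOMs to check, longest first so that UTF-32 LE (FF FE 00 00) is not
# mistaken for UTF-16 LE (FF FE).
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def sniff_bom(data):
    """Return the encoding implied by a leading BOM, or None if there is none."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name
    return None


# First bytes of Document.txt as they sit on disk: od's 16-bit word "feff"
# on a little-endian machine is the byte pair FF FE.
sample = bytes.fromhex("fffe540068006900")
print(sniff_bom(sample))
```

This only recognizes files that start with a BOM; a file without one returns None and needs something smarter, such as what "file" does.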
These particular text files that I am working with were created by HP Data
Protector. I can easily parse and manipulate these files on HPUX servers,
but the Windows servers lack that functionality. I thought Cygwin would
help with this.
How do I tell what the file's encoding is?
As pointed out by Gary Johnson, `cat Document.txt` doesn't produce
spaced-out text; it just shows "ÿþThis is abc file" (this is using mrxvt
and the Bitstream Vera Sans Mono font).
Better to use the file command to see what it is. And no, there is no
conversion software that I know of; Cygwin 1.5.x just doesn't support
wide characters.
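To illustrate why cat shows "ÿþ" in front of the text, here is a small sketch that rebuilds the bytes from the od dump and reads them two ways. It assumes Python is at hand, which is not something this thread establishes, and it is only an illustration of the byte layout, not a Cygwin feature:

```python
# Rebuild the on-disk bytes of the example: BOM + UTF-16LE text + CR-LF.
raw = b"\xff\xfe" + "This is abc file\r\n".encode("utf-16-le")

# Read one byte per character (Latin-1): the BOM turns into "ÿþ" and every
# second byte is a NUL, which many terminals simply do not draw.
as_latin1 = raw.decode("latin-1")
print(repr(as_latin1[:6]))

# Read as UTF-16: the BOM selects the byte order and is consumed,
# leaving the plain text with its CR-LF line ending.
text = raw.decode("utf-16")
print(repr(text))
```

The rebuilt data is 38 bytes long, which matches the final offset 0000046 (octal) in the od output.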
--
René Berber