Re: Extra spaces in text files in cygwin

René Berber Tue, 10 Jun 2008 18:53:37 -0700

gmarsha11 wrote:

I'm not sure about the file's encoding.  How do I tell?


If you have "file" installed, it's easy:

$ file Document.txt
Document.txt: Unicode text, UTF-16, little-endian

When I create a new file with vi, I can read the file with no problem.  The
output is normal.


Look at the bottom line, vi tells you what kind of "text" it is... sort of:

"Document.txt" [converted][dos] 1L, 20C

The "converted" means it wasn't regular text, the "dos" means it has CR-LF line endings.


If you like to look at what it really is, try:

$ od -tx2z Document.txt
0000000 feff 0054 0068 0069 0073 0020 0069 0073  >..T.h.i.s. .i.s.<
0000020 0020 0061 0062 0063 0020 0066 0069 006c  > .a.b.c. .f.i.l.<
0000040 0065 000d 000a                           >e.....<
0000046

So your spaces are really null bytes (some fonts put little smileys); vi was wrong, no CR in there.
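
The leading "feff" word in that dump is the byte-order mark; on a little-endian machine the 2-byte view of od swaps each pair, so the file really starts with the bytes ff fe. As a rough sketch only, the same check can be scripted byte by byte. The function name bom_of is made up for illustration, and it assumes a POSIX shell with head, od and tr available:

```shell
# Hypothetical helper: report which byte-order mark, if any, starts a file.
# Only the first three bytes are inspected, so a UTF-32 little-endian BOM
# (ff fe 00 00) would be reported as UTF-16 little-endian.
bom_of() {
    # Dump the first three bytes as hex, without offsets, spaces or newlines.
    sig=$(head -c 3 "$1" | od -An -tx1 | tr -d ' \n')
    case "$sig" in
        efbbbf*) echo "UTF-8 BOM" ;;
        fffe*)   echo "UTF-16 little-endian BOM" ;;
        feff*)   echo "UTF-16 big-endian BOM" ;;
        *)       echo "no BOM" ;;
    esac
}
```

Called as `bom_of Document.txt`, this prints one of the four labels above. A file without a BOM may still be UTF-16, so "no BOM" is not proof of plain text.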

These particular text files that I am working with were created by HP Data
Protector.  I can easily parse and manipulate these files on HPUX servers,
but the Windows servers lack that functionality.  I thought Cygwin would
help with this.

How do I tell what the file's encoding is?

As pointed out by Gary Johnson, `cat Document.txt` doesn't result in spaced text, it just shows "ÿþThis is abc file" (this is using mrxvt and Bitstream Vera Sans mono font).

Better to use the file command to see what it is. And no, there is no conversion software that I know of; Cygwin 1.5.x just doesn't support wide characters.
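
One possibility not raised in this thread: if the iconv utility happens to be installed (an assumption, it may not be part of a given setup), a UTF-16 file that starts with a byte-order mark can usually be converted to UTF-8 before being parsed. The following is only a sketch under that assumption, and the function name utf16_to_utf8 is made up for illustration:

```shell
# Hypothetical helper: write a UTF-8, LF-terminated copy of a UTF-16 file
# to standard output. Assumes iconv is installed and the input begins with
# a byte-order mark, which iconv uses to pick the byte order.
utf16_to_utf8() {
    # Convert the encoding, then drop the carriage returns left over from
    # the CR-LF line endings.
    iconv -f UTF-16 -t UTF-8 "$1" | tr -d '\r'
}
```

For example, `utf16_to_utf8 Document.txt > Document.utf8.txt` would leave the original file untouched and write the converted text to a new one.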

--
René Berber


--
Unsubscribe info:      http://cygwin.com/ml/#unsubscribe-simple
Problem reports:       http://cygwin.com/problems.html
Documentation:         http://cygwin.com/docs.html
FAQ:                   http://cygwin.com/faq/
