Re: coreutils wc count multi bytes question

2009-02-06 Thread Osamu Aoki
Hi, I am a Japanese speaker, and "this is a 文件 vi 打的" looks like 6 words to me. (Please note that both Japanese and Chinese rarely use spaces as word separators, so this should be parsed by English syntax. Also, Japanese tends to treat a pair of Kanji as a word. For example, Kanji is 漢字.) I count

Re: coreutils wc count multi bytes question

2009-02-06 Thread Samuel Thibault
Neo Anderson wrote on Fri 06 Feb 2009 15:50:34 -0800:
> If I remember correctly that there is a mapping table, so possibly this can
> be done. But of course, perhaps this is just my wishful thinking.
The problem is that POSIX says `The wc utility shall consider a word to be a non-zero-lengt

Re: coreutils wc count multi bytes question

2009-02-06 Thread Neo Anderson
ps this is just my wishful thinking. My English is not very good; I hope my reply is not rude. Many thanks for your help.
--- On Fri, 6/2/09, Samuel Thibault wrote:
> From: Samuel Thibault
> Subject: Re: coreutils wc count multi bytes question
> To: "Neo Anderson"
> C

Re: coreutils wc count multi bytes question

2009-02-06 Thread Samuel Thibault
Hello,
Neo Anderson wrote on Fri 06 Feb 2009 15:18:51 -0800:
> this is a 文件 vi 打的
>
> The manual words count are 8 characters.
How do you count that?
> But the output of wc -w is 6. It seems like it is separated as token by white
> space. So the characters of Chinese which concatenates to
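The difference between the two counts discussed here (6 from wc -w, 8 by hand) can be illustrated with a small Python sketch. It assumes the manual count treated every Chinese character as one word, which the thread does not confirm. The function names and the CJK range used (U+4E00 to U+9FFF) are illustrative choices, not anything taken from coreutils.

```python
import re

SAMPLE = "this is a 文件 vi 打的"

def whitespace_words(text):
    """Split on runs of whitespace, similar in spirit to what wc -w reports."""
    return text.split()

# Assumption: a "manual" count that treats each CJK ideograph as its own word.
# Only the basic CJK Unified Ideographs block is covered here.
_TOKEN = re.compile(r"[\u4e00-\u9fff]|[^\s\u4e00-\u9fff]+")

def per_ideograph_words(text):
    """Each ideograph is one token; other non-space runs are one token each."""
    return _TOKEN.findall(text)

if __name__ == "__main__":
    print(len(whitespace_words(SAMPLE)), whitespace_words(SAMPLE))
    print(len(per_ideograph_words(SAMPLE)), per_ideograph_words(SAMPLE))
```

Under the first rule 文件 and 打的 each count once because no space separates their characters; under the second, hypothetical rule they contribute two words each.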

coreutils wc count multi bytes question

2009-02-06 Thread Neo Anderson
Hi, I am not very sure whether this is the right place to ask, but after searching the mailing lists at http://www.debian.org/MailingLists/subscribe, I can't find a better one to post my question, so I am asking it here. My question is: can wc count multibyte characters, such as Big5/UTF-8 Chinese?
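As background for the question, the distinction between bytes and characters (what GNU wc reports with -c and -m respectively) can be sketched in Python. This is only an illustration of how the two numbers diverge for multibyte encodings, not a description of how wc is implemented. The sample string is the one quoted elsewhere in this thread, and the helper name is made up for the example.

```python
SAMPLE = "this is a 文件 vi 打的"

def byte_and_char_counts(text, encoding):
    """Encode the text, then count raw bytes and decoded characters."""
    data = text.encode(encoding)
    return {
        "bytes": len(data),                   # comparable to wc -c
        "chars": len(data.decode(encoding)),  # comparable to wc -m in a matching locale
    }

if __name__ == "__main__":
    for enc in ("utf-8", "big5"):
        print(enc, byte_and_char_counts(SAMPLE, enc))
```

The character count is the same for both encodings, while the byte count differs because each of the four Chinese characters takes three bytes in UTF-8 and two in Big5. Note that the sample has no trailing newline, which a file passed to wc usually would.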