Hi,
I am a Japanese speaker, and "this is a 文件 vi 打的" looks like 6 words to me.
(Please note that both Japanese and Chinese rarely use spaces as
word separators, so this should be parsed by English syntax. Also,
Japanese tends to treat a pair of Kanji as a word. For example,
Kanji is 漢字.)
I count
Neo Anderson, le Fri 06 Feb 2009 15:50:34 -0800, a écrit :
> If I remember correctly, there is a mapping table, so possibly this can
> be done. But of course, perhaps this is just my wishful thinking.
The problem is that POSIX says
`The wc utility shall consider a word to be a non-zero-lengt
perhaps this is just my wishful
thinking.
My English is not very good. I hope my reply is not rude.
Many thanks for your help,
--- On Fri, 6/2/09, Samuel Thibault wrote:
> From: Samuel Thibault
> Subject: Re: coreutils wc count multi bytes question
> To: "Neo Anderson"
> C
Hello,
Neo Anderson, le Fri 06 Feb 2009 15:18:51 -0800, a écrit :
> this is a 文件 vi 打的
>
> The manual word count is 8 characters.
How do you count that?
> But the output of wc -w is 6. It seems the text is split into tokens by white
> space. So the characters of Chinese which concatenates to
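
The gap between 6 and 8 can be reproduced with a small Python sketch. It assumes, as a guess that the thread does not confirm, that the manual count of 8 treats every Chinese character as its own word. The names TOKEN, count_whitespace_words and count_with_ideographs are hypothetical, and the character range covers only the basic CJK Unified Ideographs block.

```python
import re

text = "this is a 文件 vi 打的"

# Assumption: only the basic CJK Unified Ideographs block (U+4E00..U+9FFF)
# is treated as "one character = one word".
# First alternative: a single ideograph.
# Second alternative: a run of characters that are neither white space
# nor ideographs.
TOKEN = re.compile(r"[\u4e00-\u9fff]|[^\s\u4e00-\u9fff]+")


def count_whitespace_words(s):
    """Count tokens delimited by white space, similar in spirit to wc -w."""
    return len(s.split())


def count_with_ideographs(s):
    """Count tokens, treating each CJK ideograph as a separate word."""
    return len(TOKEN.findall(s))


print(count_whitespace_words(text))  # 6
print(count_with_ideographs(text))   # 8
```

Under that assumption, the 8 comes from "this", "is", "a", "文", "件", "vi", "打" and "的", while splitting on white space alone gives the 6 that wc -w reports.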
Hi
I am not sure whether this is the right place to ask, but after searching the
mailing lists at http://www.debian.org/MailingLists/subscribe, I couldn't find a
better one for my question, so I am asking it here.
My question is: can wc count multibyte characters, such as Big5 or UTF-8
Chinese?
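
To make the question concrete, here is a minimal Python sketch of the three quantities involved, using the sample line discussed in this thread. It only illustrates the distinction between bytes, characters and white-space-delimited words (roughly what wc reports with -c, -m and -w). It is not how wc is implemented, and the helper name counts is hypothetical.

```python
text = "this is a 文件 vi 打的"


def counts(s, encoding="utf-8"):
    """Return byte, character and white-space-delimited word counts."""
    return {
        "bytes": len(s.encode(encoding)),  # depends on the encoding
        "chars": len(s),                   # Unicode code points
        "words": len(s.split()),           # tokens split on white space
    }


utf8_counts = counts(text)
big5_counts = counts(text, "big5")

print(utf8_counts)  # {'bytes': 26, 'chars': 18, 'words': 6}
print(big5_counts)  # {'bytes': 22, 'chars': 18, 'words': 6}
```

The byte count changes with the encoding (3 bytes per Chinese character in UTF-8, 2 in Big5), while the character and word counts stay the same. With the real wc, the character count is only meaningful if the locale matches the encoding of the file.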