Re: default character encoding for everything in debian

2009-08-14 Thread Osamu Aoki
Hi, (I want to see as much UTF-8 support. These days, it is not bad. Try using "sed" with UTF-8. It works! Of course with some understandable glitches.) On Mon, Aug 10, 2009 at 08:55:27PM +0200, Norbert Preining wrote: > On Mon, 10 Aug 2009, Roger Leigh wrote: > > Of course there's a penalty f

Re: default character encoding for everything in debian

2009-08-12 Thread Harald Braumann
On Thu, 13 Aug 2009 02:03:43 +0100 Roger Leigh wrote: > On Wed, Aug 12, 2009 at 11:44:36PM +0200, Harald Braumann wrote: > > On Wed, 12 Aug 2009 13:03:30 +0100 > > Roger Leigh wrote: > > > > > On Wed, Aug 12, 2009 at 01:18:12PM +0200, Thomas Koch wrote: > > > > I'm not sure, whether a conclusio

Re: default character encoding for everything in debian

2009-08-12 Thread Roger Leigh
On Wed, Aug 12, 2009 at 11:44:36PM +0200, Harald Braumann wrote: > On Wed, 12 Aug 2009 13:03:30 +0100 > Roger Leigh wrote: > > > On Wed, Aug 12, 2009 at 01:18:12PM +0200, Thomas Koch wrote: > > > I'm not sure, whether a conclusion is already reached. > > > > > > 1. apt-get install mysql > > > 2.

Re: default character encoding for everything in debian

2009-08-12 Thread Harald Braumann
On Wed, 12 Aug 2009 13:03:30 +0100 Roger Leigh wrote: > On Wed, Aug 12, 2009 at 01:18:12PM +0200, Thomas Koch wrote: > > I'm not sure, whether a conclusion is already reached. > > > > 1. apt-get install mysql > > 2. enter mysql client > > 3. create database test; create table test( test char(10)

Re: default character encoding for everything in debian

2009-08-12 Thread Samuel Thibault
Roger Leigh wrote on Wed 12 Aug 2009 11:30:50 +0100: > > > The default is UTF-32 or UTF-16, whichever corresponds to the width of > > > wchar_t. > > > > This documentation is bogus BTW. It should read "UCS-4 or UCS-2". > > It's "strictly" correct according to the standard. > http://en.wikip
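
Samuel's distinction only matters above U+FFFF. UCS-2 has no way to represent those code points at all, whereas UTF-16 splits each one into a surrogate pair, so a 16-bit wchar_t holding UTF-16 does not give one unit per character. A minimal Python sketch of the split, for illustration only (`utf16_units` is a hypothetical helper, not part of any library):

```python
def utf16_units(cp: int) -> list:
    """Return the UTF-16 code units for one Unicode scalar value."""
    if not 0 <= cp <= 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"not a Unicode scalar value: {cp:#x}")
    if cp <= 0xFFFF:
        # Basic Multilingual Plane: a single 16-bit unit.
        # UCS-2 and UTF-16 agree for these code points.
        return [cp]
    # Supplementary planes: not representable in UCS-2 at all.
    # UTF-16 stores the 20-bit offset as a high/low surrogate pair.
    offset = cp - 0x10000
    return [0xD800 | (offset >> 10), 0xDC00 | (offset & 0x3FF)]


print([hex(u) for u in utf16_units(0x1D11E)])  # ['0xd834', '0xdd1e']
```

U+1D11E (MUSICAL SYMBOL G CLEF) is one code point but two UTF-16 units; in UCS-4 the same string has length 1.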

Re: default character encoding for everything in debian

2009-08-12 Thread Roger Leigh
On Wed, Aug 12, 2009 at 01:18:12PM +0200, Thomas Koch wrote: > I'm not sure, whether a conclusion is already reached. > > 1. apt-get install mysql > 2. enter mysql client > 3. create database test; create table test( test char(10) ); > > Replace mysql with whatever application you like. > > What

Re: default character encoding for everything in debian

2009-08-12 Thread Thomas Koch
It's impressive how quickly threads on this list grow big. :-) I'm not sure, whether a conclusion is already reached. 1. apt-get install mysql 2. enter mysql client 3. create database test; create table test( test char(10) ); Replace mysql with whatever application you like. What should be the

Re: default character encoding for everything in debian

2009-08-12 Thread Roger Leigh
On Wed, Aug 12, 2009 at 07:54:33AM +0200, Giacomo A. Catenazzi wrote: > Samuel Thibault wrote: > > Gunnar Wolf wrote on Tue 11 Aug 2009 13:28:08 -0500: > >> while length(str) in any language up to the 1990s was a mere > >> subtraction, now we must go through the string checking each byte to >

Re: default character encoding for everything in debian

2009-08-12 Thread Roger Leigh
On Wed, Aug 12, 2009 at 09:56:49AM +0200, Samuel Thibault wrote: > Giacomo A. Catenazzi wrote on Wed 12 Aug 2009 08:03:30 +0200: > > Bastian Blank wrote: > > > On Tue, Aug 11, 2009 at 09:40:35PM +0200, Bernd Eckenfels wrote: > > >> In article <20090811183800.ge5...@const.famille.thibault.fr> y

Re: default character encoding for everything in debian

2009-08-12 Thread Samuel Thibault
Giacomo A. Catenazzi wrote on Wed 12 Aug 2009 08:03:30 +0200: > Bastian Blank wrote: > > On Tue, Aug 11, 2009 at 09:40:35PM +0200, Bernd Eckenfels wrote: > >> In article <20090811183800.ge5...@const.famille.thibault.fr> you wrote: > >>> Not necessarily. Any sane implementation should just use

Re: default character encoding for everything in debian

2009-08-12 Thread Samuel Thibault
Giacomo A. Catenazzi wrote on Wed 12 Aug 2009 07:54:33 +0200: > Samuel Thibault wrote: > > Gunnar Wolf wrote on Tue 11 Aug 2009 13:28:08 -0500: > >> while length(str) in any language up to the 1990s was a mere > >> subtraction, now we must go through the string checking each byte to > >>

Re: default character encoding for everything in debian

2009-08-11 Thread Giacomo A. Catenazzi
Bastian Blank wrote: > On Tue, Aug 11, 2009 at 09:40:35PM +0200, Bernd Eckenfels wrote: >> In article <20090811183800.ge5...@const.famille.thibault.fr> you wrote: >>> Not necessarily. Any sane implementation should just use wchar_t >> Which could be UTF16 and therefore still has complicated length

Re: default character encoding for everything in debian

2009-08-11 Thread Giacomo A. Catenazzi
Samuel Thibault wrote: > Gunnar Wolf wrote on Tue 11 Aug 2009 13:28:08 -0500: >> while length(str) in any language up to the 1990s was a mere >> subtraction, now we must go through the string checking each byte to >> see if it is a Unicode marker and subtract the appropriate number of >> byt

Re: default character encoding for everything in debian

2009-08-11 Thread Harald Braumann
On Tue, 11 Aug 2009 13:28:08 -0500 Gunnar Wolf wrote: > Harald Braumann said [Tue, Aug 11, 2009 at 01:33:58AM +0200]: > > > There are a lot of users out there that are not willing to pay the > > > price for increased generality. > > > > Don't you mean s/users/programmers? As a user I don't see w

Re: default character encoding for everything in debian

2009-08-11 Thread Adam Borowski
On Mon, Aug 10, 2009 at 09:04:37PM +0100, Roger Leigh wrote: > If having a C.UTF-8 locale always available for system services is > required for them to fully support UTF-8, then that needs adding to > glibc. It would also bring a significant speed increase. Since about everything calls setlocale()
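
Whether a C.UTF-8 locale is actually installed varies by system, and a program can probe for it at runtime. A minimal Python sketch, assuming a Unix-like system (`locale.nl_langinfo` is not available on Windows); the helper name `c_utf8_codeset` is hypothetical:

```python
import locale


def c_utf8_codeset():
    """Probe for a C.UTF-8 locale.

    Returns the codeset name reported for LC_CTYPE under C.UTF-8, or None
    if the locale is not installed. The caller's LC_CTYPE is restored.
    """
    previous = locale.setlocale(locale.LC_CTYPE)  # query only, no change
    try:
        locale.setlocale(locale.LC_CTYPE, "C.UTF-8")
    except locale.Error:
        return None  # setlocale() failed: locale not available here
    try:
        return locale.nl_langinfo(locale.CODESET)  # Unix-only call
    finally:
        locale.setlocale(locale.LC_CTYPE, previous)


print(c_utf8_codeset())
```

A result of None corresponds to the situation described in the quoted text: programs calling setlocale() cannot count on a UTF-8 C locale being present.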

Re: default character encoding for everything in debian

2009-08-11 Thread Jakub Wilk
* Bastian Blank, 2009-08-11, 22:24: > Not necessarily. Any sane implementation should just use wchar_t Which could be UTF16 and therefore still has complicated length semantics. No, wchar_t is UCS-4 (or UCS-2 in esoteric implementations like Windows). And in the most esoteric (while still co
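
The width of wchar_t on a given platform can be checked from Python, since `ctypes.c_wchar` corresponds to the C wchar_t. A small sketch (the function name `wchar_info` is hypothetical):

```python
import ctypes
import sys


def wchar_info():
    """Report the size of C wchar_t on this platform, via ctypes.c_wchar."""
    width = ctypes.sizeof(ctypes.c_wchar)  # bytes per wchar_t
    # Unicode code points need 21 bits, so only a 32-bit wchar_t
    # (UCS-4, as on glibc) holds every code point in a single unit;
    # a 16-bit wchar_t (as on Windows) covers only the BMP.
    one_unit_per_code_point = width * 8 >= 21
    return width, one_unit_per_code_point


if __name__ == "__main__":
    width, whole = wchar_info()
    print(f"{sys.platform}: wchar_t is {width} bytes, "
          f"one unit per code point: {whole}")
```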

Re: default character encoding for everything in debian

2009-08-11 Thread Samuel Thibault
Bernd Eckenfels wrote on Tue 11 Aug 2009 21:40:35 +0200: > In article <20090811183800.ge5...@const.famille.thibault.fr> you wrote: > > Not necessarily. Any sane implementation should just use wchar_t > > Which could be UTF16 and therefore still has complicated length semantics. ?? wchar_t

Re: default character encoding for everything in debian

2009-08-11 Thread Bastian Blank
On Tue, Aug 11, 2009 at 09:40:35PM +0200, Bernd Eckenfels wrote: > In article <20090811183800.ge5...@const.famille.thibault.fr> you wrote: > > Not necessarily. Any sane implementation should just use wchar_t > Which could be UTF16 and therefore still has complicated length semantics. No, wchar_t

Re: default character encoding for everything in debian

2009-08-11 Thread Bernd Eckenfels
In article <20090811183800.ge5...@const.famille.thibault.fr> you wrote: > Not necessarily. Any sane implementation should just use wchar_t Which could be UTF16 and therefore still has complicated length semantics. And even with UTF32 there are combining characters. Sadly. But the length could b
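
The point about combining characters can be seen with Python's standard `unicodedata` module: a string's code point count need not match the number of characters a reader sees, and normalization only helps where a precomposed character exists. A short sketch:

```python
import unicodedata

# 'e' followed by U+0301 COMBINING ACUTE ACCENT: one visible letter,
# two code points, even in a fixed-width 32-bit representation.
decomposed = "e\u0301"
# NFC normalization composes it into the single code point U+00E9.
composed = unicodedata.normalize("NFC", decomposed)
print(len(decomposed), len(composed))  # 2 1

# Normalization is not a general fix: 'a' + U+20DD COMBINING ENCLOSING
# CIRCLE has no precomposed form, so it stays two code points.
print(len(unicodedata.normalize("NFC", "a\u20dd")))  # 2
```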

Re: default character encoding for everything in debian

2009-08-11 Thread Bernd Eckenfels
In article <20090811182041.gd19...@cajita.gateway.2wire.net> you wrote: > encodings are _completely_ incompatible with UTF8, so it is just not > possible to tolerate broken text every now and then. Everything just > breaks completely. Or everything works out of the box, when you use it correctly..
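
The incompatibility described in the quoted text can be reproduced with Python's codecs, using ISO-8859-1 (Latin-1) as a representative legacy encoding; the encoding and the sample word are illustrative choices. The failure is asymmetric: one direction is rejected outright, the other silently garbles the text.

```python
word = "café"
latin1_bytes = word.encode("latin-1")  # b'caf\xe9'
utf8_bytes = word.encode("utf-8")      # b'caf\xc3\xa9'

# Latin-1 bytes read as UTF-8: 0xE9 announces a 3-byte sequence that never
# completes, so a strict decoder refuses the input.
try:
    latin1_bytes.decode("utf-8")
except UnicodeDecodeError as exc:
    print("rejected:", exc.reason)

# UTF-8 bytes read as Latin-1: every byte value is a valid Latin-1
# character, so decoding "succeeds" and yields mojibake.
print(utf8_bytes.decode("latin-1"))  # cafÃ©
```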

Re: default character encoding for everything in debian

2009-08-11 Thread Samuel Thibault
Gunnar Wolf wrote on Tue 11 Aug 2009 13:28:08 -0500: > while length(str) in any language up to the 1990s was a mere > subtraction, now we must go through the string checking each byte to > see if it is a Unicode marker and subtract the appropriate number of > bytes. Not necessarily. Any sa
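
Samuel's answer is to convert to wchar_t once and then work in fixed-width units. For comparison, counting code points directly in UTF-8 is a single pass over the bytes: every code point has exactly one lead byte, and continuation bytes always have the bit pattern 10xxxxxx. A minimal Python sketch, assuming well-formed UTF-8 input; `utf8_codepoint_count` is a hypothetical helper, not something proposed in the thread:

```python
def utf8_codepoint_count(data: bytes) -> int:
    """Count code points in well-formed UTF-8 without decoding it."""
    # Each code point has exactly one lead byte; continuation bytes
    # always have the bit pattern 10xxxxxx, so they are skipped.
    return sum(1 for b in data if (b & 0xC0) != 0x80)


sample = "naïve 日本".encode("utf-8")
print(len(sample), utf8_codepoint_count(sample))  # 13 8
```

This is still a linear scan, but it needs only one mask per byte; the wchar_t approach trades a conversion step (and four bytes per character on glibc) for constant-time indexing by code point.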

Re: default character encoding for everything in debian

2009-08-11 Thread Gunnar Wolf
Harald Braumann said [Tue, Aug 11, 2009 at 01:33:58AM +0200]: > > There are a lot of users out there that are not willing to pay the > > price for increased generality. > > Don't you mean s/users/programmers? As a user I don't see what price I > pay. I only see advantages in having a consistent en

Re: default character encoding for everything in debian

2009-08-11 Thread Gunnar Wolf
Norbert Preining said [Mon, Aug 10, 2009 at 08:55:27PM +0200]: > On Mon, 10 Aug 2009, Roger Leigh wrote: > > Of course there's a penalty for certain operations. But UTF-8 is about > > as compact as an extended encoding is going to get. > > Rubbish. You know why in Japan and other Asian countries U

Re: default character encoding for everything in debian

2009-08-10 Thread Samuel Thibault
Harald Braumann wrote on Tue 11 Aug 2009 01:33:58 +0200: > Or do you mean the user pays the price, because if the encoding is set > to UTF-8 then performance would suffer? In that case, I'd love to see > some real life numbers. I doubt the difference would be noticeable. Google utf-8 grep pe

Re: default character encoding for everything in debian

2009-08-10 Thread Harald Braumann
On Mon, 10 Aug 2009 13:45:40 +0200 Siggy Brentrup wrote: > On Mon, Aug 10, 2009 at 13:09 +0200, Thomas Koch wrote: > > Hi, > > > > I've an issue, that I forgot to set the character encoding of > > tomcat to utf-8 after reinstalling a server. > > Now, before I report a wishlist(?) bug to tomcat,

Re: default character encoding for everything in debian

2009-08-10 Thread brian m. carlson
On Mon, Aug 10, 2009 at 09:42:18PM +0100, Roger Leigh wrote: > On Mon, Aug 10, 2009 at 09:49:34PM +0200, Norbert Preining wrote: > > I didn't call utf-8 itself rubbish, I am myself a strong proponent for > > utf-8, only your quote that it is "about as compact as an extended encoding > > is going to

Re: default character encoding for everything in debian

2009-08-10 Thread Roger Leigh
On Mon, Aug 10, 2009 at 09:49:34PM +0200, Norbert Preining wrote: > On Mon, 10 Aug 2009, Philipp Kern wrote: > > >> Of course there's a penalty for certain operations. But UTF-8 is about > > >> as compact as an extended encoding is going to get. > [...] > > make UTF-8 bad per se to call it "rubbish

Re: default character encoding for everything in debian

2009-08-10 Thread Roger Leigh
On Mon, Aug 10, 2009 at 02:06:44PM +0200, Giacomo A. Catenazzi wrote: > Thomas Koch wrote: > >I've an issue, that I forgot to set the character encoding of > >tomcat to utf-8 after reinstalling a server. > >Now, before I report a wishlist(?) bug to tomcat, I want to ask > >(and invite to discuss) s

Re: default character encoding for everything in debian

2009-08-10 Thread Norbert Preining
On Mon, 10 Aug 2009, Philipp Kern wrote: > >> Of course there's a penalty for certain operations. But UTF-8 is about > >> as compact as an extended encoding is going to get. [...] > make UTF-8 bad per se to call it "rubbish". I didn't call utf-8 itself rubbish, I am myself a strong proponent for u

Re: default character encoding for everything in debian

2009-08-10 Thread Siggy Brentrup
On Mon, Aug 10, 2009 at 19:53 +0100, Roger Leigh wrote: > On Mon, Aug 10, 2009 at 01:45:40PM +0200, Siggy Brentrup wrote: > > While utf-8 covers the broadest set of character glyphs possible, it > > suffers from size as well as performance penalties. Characters no > > longer are guaranteed to fit
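
The size side of Siggy's point follows from how UTF-8 assigns lengths by code point range. A small Python sketch of the rule (`utf8_width` is a hypothetical helper), which can be checked against the built-in encoder:

```python
def utf8_width(cp: int) -> int:
    """Number of bytes UTF-8 uses to encode one code point."""
    if cp < 0x80:
        return 1  # ASCII
    if cp < 0x800:
        return 2  # e.g. Latin-1 supplement, Greek, Cyrillic, Hebrew, Arabic
    if cp < 0x10000:
        return 3  # rest of the BMP, including most CJK ideographs
    if cp <= 0x10FFFF:
        return 4  # supplementary planes
    raise ValueError(f"beyond the Unicode range: {cp:#x}")


print([utf8_width(ord(c)) for c in "a\u00e9\u20ac\U0001D11E"])  # [1, 2, 3, 4]
```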

Re: default character encoding for everything in debian

2009-08-10 Thread Philipp Kern
On 2009-08-10, Norbert Preining wrote: > On Mon, 10 Aug 2009, Roger Leigh wrote: >> Of course there's a penalty for certain operations. But UTF-8 is about >> as compact as an extended encoding is going to get. > Rubbish. You know why in Japan and other Asian countries UTF8 is not > so common? Beca

Re: default character encoding for everything in debian

2009-08-10 Thread Norbert Preining
On Mon, 10 Aug 2009, Roger Leigh wrote: > Of course there's a penalty for certain operations. But UTF-8 is about > as compact as an extended encoding is going to get. Rubbish. You know why in Japan and other Asian countries UTF8 is not so common? Because many of their glyphs need 4 (four!) bytes,
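
The size trade-off argued here can be measured with Python's standard codecs. For everyday kanji and kana, which all lie in the Basic Multilingual Plane, UTF-8 spends three bytes per character while the legacy EUC-JP and Shift_JIS encodings spend two; four-byte UTF-8 sequences occur only for characters outside the BMP. The sample string below is arbitrary:

```python
text = "日本語のテキスト"  # 8 characters: kanji, hiragana, katakana

# Encoded size in bytes under each codec (all are in Python's stdlib).
sizes = {codec: len(text.encode(codec))
         for codec in ("utf-8", "euc_jp", "shift_jis", "utf-16-le")}
print(sizes)
# {'utf-8': 24, 'euc_jp': 16, 'shift_jis': 16, 'utf-16-le': 16}
```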

Re: default character encoding for everything in debian

2009-08-10 Thread Roger Leigh
On Mon, Aug 10, 2009 at 01:45:40PM +0200, Siggy Brentrup wrote: > On Mon, Aug 10, 2009 at 13:09 +0200, Thomas Koch wrote: > > Hi, > > > > I've an issue, that I forgot to set the character encoding of tomcat to > > utf-8 > > after reinstalling a server. > > Now, before I report a wishlist(?) bug

Re: default character encoding for everything in debian

2009-08-10 Thread Russ Allbery
Josselin Mouette writes: > Now we could concentrate on removing from the archive programs without > proper UTF8 support. There are, sadly, some very useful programs with no adequate replacement that don't have UTF-8 support. tf5, for instance. -- Russ Allbery (r...@debian.org) <

Re: default character encoding for everything in debian

2009-08-10 Thread Josselin Mouette
On Monday 10 August 2009 at 14:06 +0200, Giacomo A. Catenazzi wrote: > But let to concentrate to the first task: having a good UTF-8 support > in all programs/terminals/etc. This task should have been completed for etch. Now we could concentrate on removing from the archive programs without proper

Re: default character encoding for everything in debian

2009-08-10 Thread Michal Čihař
Hi On Mon, 10 Aug 2009 13:09:21 +0200 Thomas Koch wrote: > I've an issue, that I forgot to set the character encoding of tomcat to utf-8 > after reinstalling a server. > Now, before I report a wishlist(?) bug to tomcat, I want to ask (and invite > to > discuss) shouldn't utf8 be the defa

Re: default character encoding for everything in debian

2009-08-10 Thread Giacomo A. Catenazzi
Thomas Koch wrote: Hi, I've an issue, that I forgot to set the character encoding of tomcat to utf-8 after reinstalling a server. Now, before I report a wishlist(?) bug to tomcat, I want to ask (and invite to discuss) shouldn't utf8 be the default character set everywhere? So when installing

Re: default character encoding for everything in debian

2009-08-10 Thread Siggy Brentrup
On Mon, Aug 10, 2009 at 13:09 +0200, Thomas Koch wrote: > Hi, > > I've an issue, that I forgot to set the character encoding of tomcat to utf-8 > after reinstalling a server. > Now, before I report a wishlist(?) bug to tomcat, I want to ask (and invite > to > discuss) shouldn't utf8 be the defa