On Tue, May 24, 2011 at 05:52:23PM +0100, Max Bolingbroke wrote:
> On 24 May 2011 02:16, Ian Lynagh wrote:
> > On Wed, May 18, 2011 at 11:14:08PM +0100, Max Bolingbroke wrote:
> >> On 18 May 2011 22:54, Mark Lentczner wrote:
> >> > The range is U+EF80 through U+EFFF, called "Reserved for encoding
On 24 May 2011 02:16, Ian Lynagh wrote:
> On Wed, May 18, 2011 at 11:14:08PM +0100, Max Bolingbroke wrote:
>> On 18 May 2011 22:54, Mark Lentczner wrote:
>> > The range is U+EF80 through U+EFFF, called "Reserved for encoding hacks".
>>
>> OK, I've applied another patch so we match this.
>
> So ho
On Wed, May 18, 2011 at 11:14:08PM +0100, Max Bolingbroke wrote:
> On 18 May 2011 22:54, Mark Lentczner wrote:
> > The range is U+EF80 through U+EFFF, called "Reserved for encoding hacks".
>
> OK, I've applied another patch so we match this.
So how do I fix the "rm a*" program below now? (if the
On 18 May 2011 22:54, Mark Lentczner wrote:
> The range is U+EF80 through U+EFFF, called "Reserved for encoding hacks".
OK, I've applied another patch so we match this. Here's hoping that
has finally put this issue to rest :-)
> On a related note, if we want to be able to round trip file names t
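
A minimal sketch of the escaping idea discussed in this thread, not the code from the patch itself. It assumes each undecodable byte in 0x80..0xFF is mapped to the code point U+EF00 plus the byte value, which lands in the U+EF80 through U+EFFF range quoted above. The names escapeByte and unescapeChar are hypothetical and exist only for this illustration.

```haskell
import Data.Char (chr, ord)
import Data.Word (Word8)

-- Hypothetical helper (assumption, not GHC's API): map an undecodable
-- byte to a private-use escape character in U+EF80..U+EFFF.
-- Bytes below 0x80 are plain ASCII and are never escaped.
escapeByte :: Word8 -> Maybe Char
escapeByte b
  | b >= 0x80 = Just (chr (0xEF00 + fromIntegral b))
  | otherwise = Nothing

-- Hypothetical inverse: recover the original byte from an escape
-- character, or Nothing if the character is not in the escape range.
unescapeChar :: Char -> Maybe Word8
unescapeChar c
  | n >= 0xEF80 && n <= 0xEFFF = Just (fromIntegral (n - 0xEF00))
  | otherwise = Nothing
  where
    n = ord c

-- Check that every escapable byte survives the round trip.
main :: IO ()
main = print (all roundTrips [0x80 .. 0xFF])
  where
    roundTrips b = (escapeByte b >>= unescapeChar) == Just b
```

The sketch also shows the trade-off raised in the thread: a file name that genuinely contains a character in U+EF80..U+EFFF is indistinguishable from an escaped byte once it is in a String, so the encoder has to pick one interpretation.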
On Wed, May 18, 2011 at 2:28 AM, Max Bolingbroke wrote:
> > U+F1E00 ~ U+F1EFF -- for "Fie! we need to encode bad encodings!"
> >
> > We can (I'll be happy to) register this with the unofficial registry(2).
>
I've prepared a draft for the registry and submitted it…. Only to have it
pointed out t
On Wed, May 18, 2011 at 1:36 AM, Max Bolingbroke wrote:
>
> Aha! You go out of your way to detect and replace them. Interesting!
>
Yes, and necessary, as otherwise text is open to data corruption.
___
Cvs-ghc mailing list
Cvs-ghc@haskell.org
http://www
On 15 May 2011 18:08, Mark Lentczner wrote:
> We can increase the unlikeliness of colliding with them by using
> a known unused range. I've looked at relevant on-line sources(1,2) and
> suggest:
>
> U+F1E00 ~ U+F1EFF -- for "Fie! we need to encode bad encodings!"
>
> We can (I'll be happy to) reg
On 17 May 2011 19:50, Bryan O'Sullivan wrote:
> Any attempt to pack a String into a Text will replace UTF-16 surrogates with
> U+FFFD:
> https://github.com/bos/text/blob/master/Data/Text/Internal.hs#L87
> https://github.com/bos/text/blob/master/Data/Text.hs#L363
Aha! You go out of your way to det
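
A small sketch of why this matters for the choice of escape range, using only base. It does not call the text package; sanitize is a hypothetical stand-in for the behaviour described above (surrogate code points replaced with U+FFFD), and isSurrogateCodePoint is likewise an illustrative name.

```haskell
import Data.Char (ord)

-- True for code points in the UTF-16 surrogate block U+D800..U+DFFF.
isSurrogateCodePoint :: Char -> Bool
isSurrogateCodePoint c = n >= 0xD800 && n <= 0xDFFF
  where
    n = ord c

-- Hypothetical stand-in for the replacement described above:
-- every surrogate code point becomes U+FFFD.
sanitize :: String -> String
sanitize = map (\c -> if isSurrogateCodePoint c then '\xFFFD' else c)

main :: IO ()
main = do
  -- GHC's Char accepts a lone surrogate, as noted elsewhere in the thread.
  print (length "\xD800")
  -- A surrogate-based escape is replaced, so the original byte is lost.
  print (sanitize "a\xDC80" == "a\xFFFD")
  -- A private-use escape passes through unchanged.
  print (sanitize "a\xEF80" == "a\xEF80")
```

Under these assumptions, an escape stored as a lone surrogate cannot be recovered after such a replacement, while a private-use character survives it.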
On Mon, May 16, 2011 at 9:22 AM, Max Bolingbroke wrote:
> This is a key point - I wonder whether you have in mind a particular
> bit of code using the "text" package that will fail if we use lone
> surrogates as escapes?
>
Any attempt to pack a String into a Text will replace UTF-16 surrogates w
On 15 May 2011 18:08, Mark Lentczner wrote:
> other hand, Haskell software generally does presume valid Unicode, and the
> broken surrogates will break things, for example the Text package. PUA
> characters will work with all Haskell software.
This is a key point - I wonder whether you have in mi
I'll push back... and apologize for perhaps making this all seem more
complicated than it probably is :-)
I think, all things given, the use of private use area (PUA) characters is
far preferable. With the exception of small ranges used by Apple &
Microsoft, PUA characters exchanging in the wi
On 11 May 2011 13:36, Max Bolingbroke wrote:
> I thought you were arguing against choice 1 and in favour of 2 in your
> initial message?
I've pushed my implementation pretty much as it was at the beginning
of this thread to master so it can go into 7.2. Please let me know of
any problems you enco
On 11 May 2011 00:40, Mark Lentczner wrote:
> That is why the Python approach hides these beasts in a non-legal part of
> the code space.
Naturally. The choice is clear. The escapes should use either:
1. The surrogate code points, in which case we can roundtrip any
string but we might confuse U
> File paths that don't decode.
> File paths with a small range of private use characters.
>
> It was always my intention to allow roundtripping of arbitrary
> bytestrings through String. I don't think that the middle ground
> (where you can *read in* a filename without error but not write it out
On 10/05/2011 15:29, Max Bolingbroke wrote:
On 18 April 2011 12:48, Simon Marlow wrote:
I'm not sure about the motivation for the factoring here. You've added an
extra member to BufferCodec:
+ recover :: Buffer from -> Buffer to -> IO (Buffer from, Buffer to),
but this always seems to be
On 18 April 2011 12:48, Simon Marlow wrote:
> I'm not sure about the motivation for the factoring here. You've added an
> extra member to BufferCodec:
>
> + recover :: Buffer from -> Buffer to -> IO (Buffer from, Buffer to),
>
> but this always seems to be instantiated by either recoverDecode or
On 7 May 2011 17:38, Mark Lentczner wrote:
> We have a choice. The current proposal maps the two following classes of
> file paths onto the same string, and so when encoding back to the system we
> must choose which it is -- the other class getting the short end of the
> stick:
>
> File paths that
(Crud - Simon just pointed out that I accidentally sent my reply to just
him, not the list. D'oh! -- Sorry for the tardy reply-all, all!)
On Wed, Apr 20, 2011 at 2:59 AM, Simon Marlow wrote:
> So that means filenames that are not legal in the current encoding won't
> round-trip? But wasn't that
On Tue, Apr 12, 2011 at 01:05:41PM +0100, Max Bolingbroke wrote:
>
> As you may know, I've been working on improving GHC's support for
> Unicode.
I think we're happy to have someone who knows what they're doing working
on this, but I'm not sure what the status is. Are you waiting for us to
do som
On 18/04/2011 21:46, Mark Lentczner wrote:
(A minor point: I think your definition D10, rather than D76,
is closest to what GHC implements as Char, since you can for
example evaluate (length "\xD800") with no complaints
Yikes - I thought earlier versions of GHC wouldn't evaluate "\xD
>
> (A minor point: I think your definition D10, rather than D76, is closest to
> what GHC implements as Char, since you can for example evaluate (length
> "\xD800") with no complaints
Yikes - I thought earlier versions of GHC wouldn't evaluate "\xD800". So you
are right - GHC seems to be D10, but
On 15/04/2011 09:37, Max Bolingbroke wrote:
On 14 April 2011 13:40, Simon Marlow wrote:
Suffice to say, this conversation is now over my head :-) So I defer to you
guys; I'm happy with whatever solution you come up with.
I was hoping to get your input on the general structure of the code
chan
On 14 April 2011 13:40, Simon Marlow wrote:
> Suffice to say, this conversation is now over my head :-) So I defer to you
> guys; I'm happy with whatever solution you come up with.
I was hoping to get your input on the general structure of the code
changes - Mark's proposal only relates to exactl
On 12/04/2011 22:04, Max Bolingbroke wrote:
Hi Mark,
Thanks for your detailed response.
(A minor point: I think your definition D10, rather than D76, is
closest to what GHC implements as Char, since you can for example
evaluate (length "\xD800") with no complaints - this comes back to
Bryan's e
Hi Mark,
Thanks for your detailed response.
(A minor point: I think your definition D10, rather than D76, is
closest to what GHC implements as Char, since you can for example
evaluate (length "\xD800") with no complaints - this comes back to
Bryan's earlier reply to this thread. Of course, you ca
Indeed, POSIX has made a mess of things, hasn't it?
That said, I don't think applying PEP-383 here would make things better for
Haskell. Please bear with this background:
*Background*
Haskell 98 and Haskell 2010 both define the type Char this way:
> The character type Char is an enumeration whos
On Tue, Apr 12, 2011 at 5:05 AM, Max Bolingbroke wrote:
> a) When decoding a byte sequence to a String (which in GHC is
> typically a sequence of 16-bit values representing a UTF-16 encoded
> Unicode string), any bytes in the input which are undecodable are
> represented in the String as a unico
Hi,
As you may know, I've been working on improving GHC's support for
Unicode. In particular, I have been trying to achieve the following:
1. Use the locale encoding to decode command line arguments,
environment variables and file names from e.g. the System.Directory
functions
2. Implement F