Re: [Python-Dev] Bytes path related questions for Guido

2014-08-29 Thread Walter Dörwald
On 28 Aug 2014, at 19:54, Glenn Linderman wrote: On 8/28/2014 10:41 AM, R. David Murray wrote: On Thu, 28 Aug 2014 10:15:40 -0700, Glenn Linderman wrote: [...] Also for cases where the data stream is *supposed* to be in a given encoding, but contains undecodable bytes. Showing the stuff tha
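Walter's case, a stream that is *supposed* to be in a given encoding but contains undecodable bytes, can already be displayed with the standard error handlers. The sketch below assumes Python 3.5 or later, where `backslashreplace` also works when decoding. The helper names `show_undecodable` and `show_escaped` are hypothetical, not stdlib APIs:

```python
def show_undecodable(data: bytes, encoding: str = "utf-8") -> str:
    """Return a printable view of *data*, rendering undecodable bytes as \\xNN.

    Hypothetical helper; it relies only on the standard 'backslashreplace'
    error handler, which supports decoding since Python 3.5.
    """
    return data.decode(encoding, "backslashreplace")


def show_escaped(text: str, encoding: str = "utf-8") -> str:
    """Same view for a str that was decoded earlier with 'surrogateescape'."""
    # Re-encoding with surrogateescape restores the original bytes exactly,
    # so the display matches what show_undecodable() gives for the raw data.
    raw = text.encode(encoding, "surrogateescape")
    return show_undecodable(raw, encoding)


raw = b"caf\xe9 menu"  # a Latin-1 'é' (0xE9) is not valid UTF-8 here
print(show_undecodable(raw))  # caf\xe9 menu
escaped = raw.decode("utf-8", "surrogateescape")
print(show_escaped(escaped))  # caf\xe9 menu
```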

Re: [Python-Dev] Bytes path related questions for Guido

2014-08-28 Thread R. David Murray
On Thu, 28 Aug 2014 10:54:44 -0700, Glenn Linderman wrote: > On 8/28/2014 10:41 AM, R. David Murray wrote: > > On Thu, 28 Aug 2014 10:15:40 -0700, Glenn Linderman > > wrote: > >> On 8/28/2014 12:30 AM, MRAB wrote: > >>> There'll be a surrogate escape if a byte couldn't be decoded, but just > >>
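The point quoted from MRAB, that there'll be a surrogate escape wherever a byte couldn't be decoded, follows from how the `surrogateescape` handler (PEP 383) works: each undecodable byte 0xNN becomes the lone surrogate U+DCNN, and encoding with the same handler maps it back. A short demonstration:

```python
raw = b"report-\xff\xfe.txt"  # 0xFF and 0xFE can never appear in valid UTF-8
text = raw.decode("utf-8", "surrogateescape")

# Each undecodable byte 0xNN became the lone surrogate U+DCNN.
escapes = [hex(ord(ch)) for ch in text if 0xDC80 <= ord(ch) <= 0xDCFF]
print(escapes)  # ['0xdcff', '0xdcfe']

# Encoding with the same handler restores the original bytes exactly.
assert text.encode("utf-8", "surrogateescape") == raw

# Without the handler, the escaped text cannot be encoded at all.
try:
    text.encode("utf-8")
except UnicodeEncodeError as exc:
    print("strict encode fails:", exc.reason)  # surrogates not allowed
```

This is why a stray escape tends to surface as an encoding error far from where the bytes were first decoded.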

Re: [Python-Dev] Bytes path related questions for Guido

2014-08-28 Thread Glenn Linderman
On 8/28/2014 10:41 AM, R. David Murray wrote: On Thu, 28 Aug 2014 10:15:40 -0700, Glenn Linderman wrote: On 8/28/2014 12:30 AM, MRAB wrote: On 2014-08-28 05:56, Glenn Linderman wrote: On 8/27/2014 6:08 PM, Stephen J. Turnbull wrote: Glenn Linderman writes: > On 8/26/2014 4:31 AM, MRAB wr

Re: [Python-Dev] Bytes path related questions for Guido

2014-08-28 Thread R. David Murray
On Thu, 28 Aug 2014 10:15:40 -0700, Glenn Linderman wrote: > On 8/28/2014 12:30 AM, MRAB wrote: > > On 2014-08-28 05:56, Glenn Linderman wrote: > >> On 8/27/2014 6:08 PM, Stephen J. Turnbull wrote: > >>> Glenn Linderman writes: > >>> > On 8/26/2014 4:31 AM, MRAB wrote: > >>> > > On 2014-08-26

Re: [Python-Dev] Bytes path related questions for Guido

2014-08-28 Thread Glenn Linderman
On 8/28/2014 12:30 AM, MRAB wrote: On 2014-08-28 05:56, Glenn Linderman wrote: On 8/27/2014 6:08 PM, Stephen J. Turnbull wrote: Glenn Linderman writes: > On 8/26/2014 4:31 AM, MRAB wrote: > > On 2014-08-26 03:11, Stephen J. Turnbull wrote: > >> Nick Coghlan writes: > > How about: > >

Re: [Python-Dev] Bytes path related questions for Guido

2014-08-28 Thread MRAB
On 2014-08-28 05:56, Glenn Linderman wrote: On 8/27/2014 6:08 PM, Stephen J. Turnbull wrote: Glenn Linderman writes: > On 8/26/2014 4:31 AM, MRAB wrote: > > On 2014-08-26 03:11, Stephen J. Turnbull wrote: > >> Nick Coghlan writes: > > How about: > > > > replace_surrogate_escapes

Re: [Python-Dev] Bytes path related questions for Guido

2014-08-27 Thread Stephen J. Turnbull
Glenn Linderman writes: > On 8/27/2014 6:08 PM, Stephen J. Turnbull wrote: > > Glenn Linderman writes: > > > And further, replacement could be a vector of 128 characters, to do > > > immediate transcoding, > > > > Using what encoding? > > The vector would contain the transcoding. Each
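Glenn's "vector of 128 characters" idea can be sketched by indexing the vector with the escaped byte's value minus 0x80. The helper name `transcode_surrogate_escapes` is hypothetical, and cp1252 is only an example source encoding; as Stephen's "Using what encoding?" question points out, the caller has to know which encoding the stray bytes really came from:

```python
def transcode_surrogate_escapes(text: str, table: str) -> str:
    """Replace each escaped byte U+DC80..U+DCFF with table[byte - 0x80].

    Hypothetical helper, not a stdlib API. *table* must hold exactly
    128 characters, one for each byte value 0x80..0xFF.
    """
    if len(table) != 128:
        raise ValueError("table must contain exactly 128 characters")
    # str.translate() maps code points to replacement strings.
    mapping = {0xDC80 + i: table[i] for i in range(128)}
    return text.translate(mapping)


# Example vector: the upper half of cp1252 ('replace' fills its five
# unassigned bytes with U+FFFD so the vector still has 128 entries).
CP1252_HIGH = bytes(range(0x80, 0x100)).decode("cp1252", "replace")

raw = b"\x93quoted\x94 costs \x80 5"
text = raw.decode("utf-8", "surrogateescape")  # 0x93, 0x94, 0x80 are escaped
print(transcode_surrogate_escapes(text, CP1252_HIGH))  # “quoted” costs € 5
```

Characters that decoded successfully are left untouched; only the escaped bytes are looked up in the vector.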

Re: [Python-Dev] Bytes path related questions for Guido

2014-08-27 Thread Glenn Linderman
On 8/27/2014 6:08 PM, Stephen J. Turnbull wrote: Glenn Linderman writes: > On 8/26/2014 4:31 AM, MRAB wrote: > > On 2014-08-26 03:11, Stephen J. Turnbull wrote: > >> Nick Coghlan writes: > > How about: > > > > replace_surrogate_escapes(s, replacement='\uFFFD') > > > > If you

Re: [Python-Dev] Bytes path related questions for Guido

2014-08-27 Thread Stephen J. Turnbull
Glenn Linderman writes: > On 8/26/2014 4:31 AM, MRAB wrote: > > On 2014-08-26 03:11, Stephen J. Turnbull wrote: > >> Nick Coghlan writes: > > How about: > > > > replace_surrogate_escapes(s, replacement='\uFFFD') > > > > If you want them removed, just pass an empty string as the > > re
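A minimal sketch of the `replace_surrogate_escapes(s, replacement='\uFFFD')` helper proposed above, assuming it should touch only the 128 code points U+DC80..U+DCFF that `surrogateescape` produces. The name and signature come from the proposal; no such function exists in the stdlib:

```python
def replace_surrogate_escapes(s: str, replacement: str = "\uFFFD") -> str:
    """Replace every surrogate-escaped byte (U+DC80..U+DCFF) in *s*.

    Hypothetical implementation of the proposed helper. Passing
    replacement="" removes the escapes instead, as suggested in the thread.
    """
    return s.translate({cp: replacement for cp in range(0xDC80, 0xDD00)})


text = b"name-\xff.txt".decode("utf-8", "surrogateescape")
print(ascii(replace_surrogate_escapes(text)))      # 'name-\ufffd.txt'
print(ascii(replace_surrogate_escapes(text, "")))  # 'name-.txt'
```

Other lone surrogates (for example ones produced by `surrogatepass`) are deliberately left alone, since `surrogateescape` can never have created them.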

Re: [Python-Dev] Bytes path related questions for Guido

2014-08-27 Thread Glenn Linderman
On 8/26/2014 4:31 AM, MRAB wrote: On 2014-08-26 03:11, Stephen J. Turnbull wrote: Nick Coghlan writes: > "purge_surrogate_escapes" was the other term that occurred to me. "purge" suggests removal, not replacement. That may be useful too. neutralize_surrogate_escapes(s, remove=False, replac

Re: [Python-Dev] Bytes path related questions for Guido

2014-08-26 Thread MRAB
On 2014-08-26 03:11, Stephen J. Turnbull wrote: Nick Coghlan writes: > "purge_surrogate_escapes" was the other term that occurred to me. "purge" suggests removal, not replacement. That may be useful too. neutralize_surrogate_escapes(s, remove=False, replacement='\uFFFD') How about: r

Re: [Python-Dev] Bytes path related questions for Guido

2014-08-25 Thread Stephen J. Turnbull
Nick Coghlan writes: > "purge_surrogate_escapes" was the other term that occurred to me. "purge" suggests removal, not replacement. That may be useful too. neutralize_surrogate_escapes(s, remove=False, replacement='\uFFFD') maybe? (Of course the remove argument is feature creep, so I'm only

Re: [Python-Dev] Bytes path related questions for Guido

2014-08-24 Thread Nick Coghlan
On 25 Aug 2014 03:55, "Guido van Rossum" wrote: > > Yes on #1 -- making the low-level functions more usable for edge cases by supporting bytes seems fine (as long as the support for strings, where it exists, is not compromised). Thanks! > The status of pathlib is a little unclear to me -- is the

Re: [Python-Dev] Bytes path related questions for Guido

2014-08-24 Thread Guido van Rossum
Yes on #1 -- making the low-level functions more usable for edge cases by supporting bytes seems fine (as long as the support for strings, where it exists, is not compromised). The status of pathlib is a little unclear to me -- is there a plan to eventually support bytes or not? For #2 I think yo
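As an illustration of the existing convention Guido refers to, many low-level `os` functions already accept bytes and return results of the same type, while `pathlib` accepts only `str`. The file name below is made up for the demonstration:

```python
import os
import pathlib
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    open(os.path.join(tmp, "example.txt"), "w").close()

    # os functions mirror their argument type: str in, str out; bytes in, bytes out.
    print(os.listdir(tmp))               # ['example.txt']
    print(os.listdir(os.fsencode(tmp)))  # [b'example.txt']

# os.fsencode()/os.fsdecode() convert between the two forms using the
# filesystem encoding and its error handler (surrogateescape on POSIX).
name = os.fsdecode(b"example.txt")
assert os.fsencode(name) == b"example.txt"

# pathlib, by contrast, accepts only str paths.
try:
    pathlib.PurePath(b"example.txt")
except TypeError as exc:
    print("pathlib rejects bytes:", exc)
```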

Re: [Python-Dev] Bytes path related questions for Guido

2014-08-24 Thread Nick Coghlan
On 25 August 2014 00:23, Antoine Pitrou wrote: > On 24/08/2014 09:04, Nick Coghlan wrote: >> Serhiy & Ezio convinced me to scale this one back to a proposal for >> "codecs.clean_surrogate_escapes(s)", which replaces surrogates that >> may be produced by surrogateescape (that's what string.clean

Re: [Python-Dev] Bytes path related questions for Guido

2014-08-24 Thread Antoine Pitrou
On 24/08/2014 09:04, Nick Coghlan wrote: On 24 August 2014 14:44, Nick Coghlan wrote: 2. Should we add some additional helpers to the string module for dealing with surrogate escaped bytes and other techniques for smuggling arbitrary binary data as text? My proposal [3] is to add: * string
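For comparison, here are two well-known ways of smuggling arbitrary bytes through `str`. Which techniques besides surrogateescape the proposal had in mind is not stated in this excerpt, so treat these as examples only:

```python
payload = bytes(range(256))  # every possible byte value

# 1. ASCII + surrogateescape: ASCII bytes stay readable, and every byte
#    >= 0x80 becomes a lone surrogate U+DC80..U+DCFF.
via_escape = payload.decode("ascii", "surrogateescape")
assert via_escape.encode("ascii", "surrogateescape") == payload

# 2. Latin-1: every byte maps to the code point with the same value,
#    so no surrogates appear at all.
via_latin1 = payload.decode("latin-1")
assert via_latin1.encode("latin-1") == payload

print(len(via_escape), len(via_latin1))                        # 256 256
print(hex(ord(via_escape[0xE9])), hex(ord(via_latin1[0xE9])))  # 0xdce9 0xe9
```

The trade-off: Latin-1 text can be re-encoded as UTF-8 without any error, silently producing mojibake, whereas surrogate-escaped text fails loudly under a strict encode.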

Re: [Python-Dev] Bytes path related questions for Guido

2014-08-24 Thread Nick Coghlan
On 24 August 2014 14:44, Nick Coghlan wrote: > 2. Should we add some additional helpers to the string module for > dealing with surrogate escaped bytes and other techniques for > smuggling arbitrary binary data as text? > > My proposal [3] is to add: > > * string.escaped_surrogates (constant with
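A sketch of what a constant like the proposed `string.escaped_surrogates` could enable, assuming it holds the 128 code points U+DC80..U+DCFF that `surrogateescape` can produce (the excerpt is cut off before the definition). `ESCAPED_SURROGATES`, `has_surrogate_escapes` and `escaped_byte_values` are hypothetical names, not stdlib APIs:

```python
# Hypothetical stand-in for the proposed string.escaped_surrogates constant.
ESCAPED_SURROGATES = "".join(chr(cp) for cp in range(0xDC80, 0xDD00))
_ESCAPED_SET = frozenset(ESCAPED_SURROGATES)  # fast membership tests


def has_surrogate_escapes(s: str) -> bool:
    """Return True if *s* contains any byte smuggled in by surrogateescape."""
    return not _ESCAPED_SET.isdisjoint(s)


def escaped_byte_values(s: str) -> list[int]:
    """Recover the original byte values (0x80..0xFF) hidden in *s*."""
    return [ord(ch) - 0xDC00 for ch in s if ch in _ESCAPED_SET]


text = b"log-\x80\x81.txt".decode("utf-8", "surrogateescape")
print(has_surrogate_escapes(text))                      # True
print([hex(b) for b in escaped_byte_values(text)])      # ['0x80', '0x81']
print(has_surrogate_escapes("plain ascii"))             # False
```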