Greg Ewing wrote:
> Ron Adam wrote:
>
>> This would apply to codecs that
>> could return either bytes or strings, or strings or unicode, or bytes or
>> unicode.
>
> I'd need to see some concrete examples of such codecs
> before being convinced that they exist, or that they
> couldn't just as we
Ron Adam wrote:
> This would apply to codecs that
> could return either bytes or strings, or strings or unicode, or bytes or
> unicode.
I'd need to see some concrete examples of such codecs
before being convinced that they exist, or that they
couldn't just as well return a fixed type that you
t
Stephen J. Turnbull wrote:
> Doesn't that make base64 non-text by analogy to other "look but don't
> touch" strings like a .gz or vmlinuz?
No, because I can take a piece of base64 encoded data
and use a text editor to manually paste it in with some
other text (e.g. a plain-text (not MIME) mail me
Greg Ewing wrote:
> Ron Adam wrote:
>
>> This uses syntax to determine the direction of encoding. It would be
>> easier and clearer to just require two arguments or a tuple.
>>
>> u = unicode(b, 'encode', 'base64')
>> b = bytes(u, 'decode', 'base64')
>
> The point of the exercise wa
> "Greg" == Greg Ewing <[EMAIL PROTECTED]> writes:
Greg> (BTW, doesn't the fact that you *can* load an XML file into
Greg> what we call a "text editor" say something?)
Why not answer that question for yourself, and then turn that answer
into a description of "text semantics"?
For me,
Stephen J. Turnbull wrote:
> What you presumably meant was "what would you consider the proper type
> for (P)CDATA?"
No, I mean the whole thing, including all the <...> tags
etc. Like you see when you load an XML file into a text
editor. (BTW, doesn't the fact that you *can* load an
XML file into
Ron Adam wrote:
> This uses syntax to determine the direction of encoding. It would be
> easier and clearer to just require two arguments or a tuple.
>
> u = unicode(b, 'encode', 'base64')
> b = bytes(u, 'decode', 'base64')
The point of the exercise was to avoid using the terms
'en
Delaney, Timothy (Tim) wrote:
> unicode.frombytes(cls, encoding)
unicode.frombytes(encoding) ...
Tim Delaney
Just van Rossum wrote:
> My preference for bytes -> unicode -> bytes API would be this:
>
> u = unicode(b, "utf8") # just like we have now
> b = u.tobytes("utf8") # like u.encode(), but being explicit
> # about the resulting type
+1 - I was going to write e
> "Greg" == Greg Ewing <[EMAIL PROTECTED]> writes:
Greg> But the base64 string itself *does* have text semantics.
What do you mean by that? The strings of abstract "characters"
defined by RFC 3548 cannot be concatenated in general, they may only
be split at 4-character intervals, they ca
Just van Rossum <[EMAIL PROTECTED]> wrote:
>
> Ron Adam wrote:
>
> > Josiah Carlson wrote:
> > > Greg Ewing <[EMAIL PROTECTED]> wrote:
> > >>u = unicode(b)
> > >>u = unicode(b, 'utf8')
> > >>b = bytes['utf8'](u)
> > >>u = unicode['base64'](b) # encoding
> > >>b = bytes(u, '
Ron Adam wrote:
> Josiah Carlson wrote:
> > Greg Ewing <[EMAIL PROTECTED]> wrote:
> >>u = unicode(b)
> >>u = unicode(b, 'utf8')
> >>b = bytes['utf8'](u)
> >>u = unicode['base64'](b) # encoding
> >>b = bytes(u, 'base64') # decoding
> >>u2 = unicode['piglatin'](u1) #
Josiah Carlson wrote:
> Greg Ewing <[EMAIL PROTECTED]> wrote:
>>u = unicode(b)
>>u = unicode(b, 'utf8')
>>b = bytes['utf8'](u)
>>u = unicode['base64'](b) # encoding
>>b = bytes(u, 'base64') # decoding
>>u2 = unicode['piglatin'](u1) # encoding
>>u1 = unicode(u2, '
Greg Ewing <[EMAIL PROTECTED]> wrote:
>u = unicode(b)
>u = unicode(b, 'utf8')
>b = bytes['utf8'](u)
>u = unicode['base64'](b) # encoding
>b = bytes(u, 'base64') # decoding
>u2 = unicode['piglatin'](u1) # encoding
>u1 = unicode(u2, 'piglatin') # decoding
Your
Ron Adam wrote:
> 1. We can specify the operation and not be sure of the resulting type.
>
>*or*
>
> 2. We can specify the type and not always be sure of the operation.
>
> maybe there's a way to specify both so it's unambiguous?
Here's another take on the matter. When we're doing
Unicode
Greg Ewing wrote:
> Ron Adam wrote:
>
>> While playing around with the example bytes class I noticed code reads
>> much better when I use methods called tounicode and tostring.
>>
>> b64ustring = b.tounicode('base64')
>> b = bytes(b64ustring, 'base64')
>
> I don't like that, because it c
[My apologies Greg; I meant to send this to the whole list. I really
need a list-reply button in GMail. ]
On 3/1/06, Greg Ewing <[EMAIL PROTECTED]> wrote:
> I don't like that, because it creates a dependency
> (conceptually, at least) between the bytes type and
> the unicode type.
I only find hal
Ron Adam wrote:
> While playing around with the example bytes class I noticed code reads
> much better when I use methods called tounicode and tostring.
>
> b64ustring = b.tounicode('base64')
> b = bytes(b64ustring, 'base64')
I don't like that, because it creates a dependency
(conceptua
Bill Janssen wrote:
> No, once it's in a particular encoding it's bytes, no longer text.
The point at issue is whether the characters produced
by base64 are in a particular encoding. According to
my reading of the RFC, they're not.
--
Greg Ewing, Computer Science Dept, +
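A small sketch of the point Greg is making, written against present-day Python 3 rather than anything that existed when this thread took place. It assumes the standard `base64` module and the `ascii`, `utf-16-le` and `cp500` (an EBCDIC variant) codecs; the variable names are illustrative only. The same base64 characters survive a round trip through each character encoding, even though the bytes on the wire differ.

```python
import base64

raw = b"\x00\xffbinary"

# b64encode returns bytes in Python 3; decoding as ASCII yields the
# abstract base64 characters as a text string.
chars = base64.b64encode(raw).decode("ascii")

wire_forms = {}
for encoding in ("ascii", "utf-16-le", "cp500"):
    # Pick a concrete byte representation for the same characters.
    wire_forms[encoding] = chars.encode(encoding)
    # Undo that representation, then undo base64 itself.
    recovered_chars = wire_forms[encoding].decode(encoding)
    assert base64.b64decode(recovered_chars) == raw
```

The byte strings in `wire_forms` are all different, which is one way to read the claim that the characters themselves are not tied to a particular encoding.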
Nick Coghlan wrote:
>ascii_bytes = orig_bytes.decode("base64").encode("ascii")
>
>orig_bytes = ascii_bytes.decode("ascii").encode("base64")
>
> The only slightly odd aspect is that this inverts the conventional meaning of
> base64 encoding and decoding,
-1. Whatever we do, we shouldn't
I wrote:
> ... I will say that if there were no legacy I'd prefer the tounicode()
> and tostring() (but shouldn't it be 'tobytes()' instead?) names for Python 3.0.
Scott Daniels replied:
> Wouldn't 'tobytes' and 'totext' be better for 3.0 where text == unicode?
Um... yes. Sorry, I'm not completely
Chermside, Michael wrote:
> ... I will say that if there were no legacy I'd prefer the tounicode()
> and tostring() (but shouldn't it be 'tobytes()' instead?) names for Python 3.0.
Wouldn't 'tobytes' and 'totext' be better for 3.0 where text == unicode?
--
-- Scott David Daniels
[EMAIL PROTECTED]
> Huh... just joining here but surely you don't mean a text string that
> doesn't use every character available in a particular encoding is
> "really bytes"... it's still a text string...
No, once it's in a particular encoding it's bytes, no longer text.
As you say,
> Keep these two concepts sepa
Ron Adam writes:
> While playing around with the example bytes class I noticed code reads
> much better when I use methods called tounicode and tostring.
[...]
> I'm not suggesting we start using to-type everywhere, just where it
> might make things clearer over decode and encode.
+1
I alwa
Nick Coghlan wrote:
> All the unicode codecs, on the other hand, use encode to get from characters
> to bytes and decode to get from bytes to characters.
>
> So if bytes objects *did* have an encode method, it should still result in a
> unicode object, just the same as a decode method does (beca
Bill Janssen wrote:
> Greg Ewing wrote:
>> Bill Janssen wrote:
>>
>>> bytes -> base64 -> text
>>> text -> de-base64 -> bytes
>> It's nice to hear I'm not out of step with
>> the entire world on this. :-)
>
> Well, I can certainly understand the bytes->base64->bytes side of
> things too. The "text"
On Tue, 2006-02-28 at 15:23 -0800, Bill Janssen wrote:
> Greg Ewing wrote:
> > Bill Janssen wrote:
> >
> > > bytes -> base64 -> text
> > > text -> de-base64 -> bytes
> >
> > It's nice to hear I'm not out of step with
> > the entire world on this. :-)
>
> Well, I can certainly understand the byte
Bill Janssen wrote:
> Well, I can certainly understand the bytes->base64->bytes side of
> things too. The "text" produced is specified as using "a 65-character
> subset of US-ASCII", so that's really bytes.
But it then goes on to say that these same characters
are also a subset of EBCDIC. So it s
Greg Ewing wrote:
> Bill Janssen wrote:
>
> > bytes -> base64 -> text
> > text -> de-base64 -> bytes
>
> It's nice to hear I'm not out of step with
> the entire world on this. :-)
Well, I can certainly understand the bytes->base64->bytes side of
things too. The "text" produced is specified as us
On 2/28/06, Greg Ewing <[EMAIL PROTECTED]> wrote:
> Bill Janssen wrote:
>
> > bytes -> base64 -> text
> > text -> de-base64 -> bytes
>
> It's nice to hear I'm not out of step with
> the entire world on this. :-)
What Bill proposes makes sense to me.
--
--Guido van Rossum (home page: http://www.py
Bill Janssen wrote:
> bytes -> base64 -> text
> text -> de-base64 -> bytes
It's nice to hear I'm not out of step with
the entire world on this. :-)
--
Greg Ewing, Computer Science Dept, +--+
University of Canterbury, | Carpe post meridiam!
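The two arrows Bill describes can be sketched in present-day Python 3 as follows. This is only an illustration under stated assumptions: the helper names `to_base64_text` and `from_base64_text` are hypothetical and were not proposed in the thread, and the standard `base64` module itself returns bytes from `b64encode`, so the text step is added explicitly here.

```python
import base64


def to_base64_text(data: bytes) -> str:
    """bytes -> base64 -> text (hypothetical helper)."""
    # b64encode yields ASCII-only bytes; decode them to get a text string.
    return base64.b64encode(data).decode("ascii")


def from_base64_text(text: str) -> bytes:
    """text -> de-base64 -> bytes (hypothetical helper)."""
    # b64decode accepts an ASCII-only str as well as bytes.
    return base64.b64decode(text)


payload = bytes(range(8))
encoded = to_base64_text(payload)
restored = from_base64_text(encoded)
```

With these helpers the result type is fixed by the direction of the operation, which is the property several posters in the thread were asking for.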
Bill Janssen wrote:
> I use it quite a bit for image processing (converting to and from the
> "data:" URL form), and various checksum applications (converting SHA
> into a string).
Aha! We have a customer!
For those cases, would you find it more convenient
for the result to be text or bytes in P
> If implementing a mime packer is really the only use case
> for base64, then it might as well be removed from the
> standard library, since 99.9% of all programmers will
> never touch it. Those that do will need to have boned up
I use it quite a bit for image processing (converting to and fr
Stephen J. Turnbull wrote:
> Greg> I'd be perfectly happy with ascii characters, but in Py3k,
> Greg> the most natural place to keep ascii characters will be in
> Greg> character strings, not byte arrays.
>
> Natural != practical.
That seems to be another thing we disagree about --
t
> "Greg" == Greg Ewing <[EMAIL PROTECTED]> writes:
Greg> Stephen J. Turnbull wrote:
>> I gave you one, MIME processing in email
Greg> If implementing a mime packer is really the only use case
Greg> for base64, then it might as well be removed from the
Greg> standard libra
Stephen J. Turnbull wrote:
> I gave you one, MIME processing in email
If implementing a mime packer is really the only use case
for base64, then it might as well be removed from the
standard library, since 99.9% of all programmers will
never touch it. Those that do will need to have boned up
> "Greg" == Greg Ewing <[EMAIL PROTECTED]> writes:
Greg> I think we need some concrete use cases to talk about if
Greg> we're to get any further with this. Do you have any such use
Greg> cases in mind?
I gave you one, MIME processing in email, and a concrete bug that is
possible w
Greg Ewing wrote:
> Stephen J. Turnbull wrote:
>> It's "what is the Python compiler/interpreter going
> > to think?" AFAICS, it's going to think that base64 is
> > a unicode codec.
>
> Only if it's designed that way, and I specifically
> think it shouldn't -- i.e. it should be an error
> to att
Stephen J. Turnbull wrote:
> The reason that Python source code is text is that the primary
> producers/consumers of Python source code are human beings, not
> compilers
I disagree with "primary" -- I think human and computer
use of source code have equal importance. Because of the
fact that Pyth
> "Ron" == Ron Adam <[EMAIL PROTECTED]> writes:
Ron> So, let's consider a "codec" and a "coding" as being two
Ron> different things where a codec is a character subset of
Ron> unicode characters expressed in a native format. And a
Ron> coding is *not* a subset of the unicode c
> "Greg" == Greg Ewing <[EMAIL PROTECTED]> writes:
Greg> Stephen J. Turnbull wrote:
>> the kind of "text" for which Unicode was designed is normally
>> produced and consumed by people, who wll pt up w/ ll knds f
>> nnsns. Base64 decoders will not put up with the same kinds of
Stephen J. Turnbull wrote:
> the kind of "text" for which Unicode was designed is normally produced
> and consumed by people, who wll pt up w/ ll knds f nnsns. Base64
> decoders will not put up with the same kinds of nonsense that people
> will.
The Python compiler won't put up with that sort of
* The following reply is a rather longer than I intended explanation of
why codings (and how they differ) like 'rot' aren't the same thing as
pure unicode codecs and probably should be treated differently.
If you already understand that, then I suggest skipping this. But if
you like detailed l
> "Greg" == Greg Ewing <[EMAIL PROTECTED]> writes:
Greg> Stephen J. Turnbull wrote:
>> No, base64 isn't a wire protocol. It's a family[...].
Greg> Yes, and it's up to the programmer to choose those code
Greg> units (i.e. pick an encoding for the characters) that will,
Gr
> "Ron" == Ron Adam <[EMAIL PROTECTED]> writes:
Ron> We could call it transform or translate if needed.
You're still losing the directionality, which is my primary objection
to "recode". The absence of directionality is precisely why "recode"
is used in that sense for i18n work.
There r
Stephen J. Turnbull wrote:
> Please define "character," and explain how its semantics map to
> Python's unicode objects.
One of the 65 abstract entities referred to in the RFC
and represented in that RFC by certain visual glyphs.
There is a subset of the Unicode code points that
are conventionall
Stephen J. Turnbull wrote:
>> "Ron" == Ron Adam <[EMAIL PROTECTED]> writes:
>
> Ron> Terry Reedy wrote:
>
> >> I prefer the shorter names and using recode, for instance, for
> >> bytes to bytes.
>
> Ron> While I prefer constructors with an explicit encode argument,
> Ron>
> "Ron" == Ron Adam <[EMAIL PROTECTED]> writes:
Ron> Terry Reedy wrote:
>> I prefer the shorter names and using recode, for instance, for
>> bytes to bytes.
Ron> While I prefer constructors with an explicit encode argument,
Ron> and use a recode() method for 'like to like
> "Greg" == Greg Ewing <[EMAIL PROTECTED]> writes:
Greg> Stephen J. Turnbull wrote:
>> Base64 is a (family of) wire protocol(s). It's not clear to me
>> that it makes sense to say that the alphabets used by "baseNN"
>> encodings are composed of characters,
Greg> Take a l
James Y Knight wrote:
> Some MIME sections
> might have a base64 Content-Transfer-Encoding, others might be 8bit
> encoded, others might be 7bit encoded, others might be quoted-printable
> encoded.
I stand corrected -- in that situation you would have to encode
the characters before combini
Ron Adam wrote:
> While I prefer constructors with an explicit encode argument, and use a
> recode() method for 'like to like' coding. Then the whole encode/decode
> confusion goes away.
I'd be happy with that, too.
--
Greg Ewing, Computer Science Dept, +-
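For comparison, here is how constructors with an explicit encoding argument look in present-day Python 3. This shows the language as it exists now, not the API that was under discussion in the thread, and the variable names are illustrative.

```python
text = "naïve café"

# Constructor form: the target type is named by the constructor,
# the encoding by the second argument.
data = bytes(text, "utf-8")
round_tripped = str(data, "utf-8")

# The method forms remain available and give the same results.
same_data = text.encode("utf-8")
same_text = data.decode("utf-8")

# Omitting the encoding when building bytes from text is an error,
# so the direction and the codec are always spelled out.
try:
    bytes(text)
    missing_encoding_rejected = False
except TypeError:
    missing_encoding_rejected = True
```

In this form the result type is always the type being constructed, which avoids the question of what an encode or decode call returns.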
Terry Reedy wrote:
> "Greg Ewing" <[EMAIL PROTECTED]> wrote in message
>
>>Efficiency is an implementation concern.
>
> It is also a user concern, especially if inefficiency overruns memory
> limits.
Sure, but what I mean is that it's better to find what's
conceptually right and then look for an
Terry Reedy wrote:
> "Greg Ewing" <[EMAIL PROTECTED]> wrote in message
>
>> Which is why I think that only *unicode* codings should be
>> available through the .encode and .decode interface. Or
>> alternatively there should be something more explicit like
>> .unicode_encode and .unicode_decode th
"Greg Ewing" <[EMAIL PROTECTED]> wrote in message
news:[EMAIL PROTECTED]
> Efficiency is an implementation concern.
It is also a user concern, especially if inefficiency overruns memory
limits.
> In Py3k, strings
> which contain only ascii or latin-1 might be stored as
> 1 byte per character,
On Feb 22, 2006, at 6:35 AM, Greg Ewing wrote:
> I'm thinking of convenience, too. Keep in mind that in Py3k,
> 'unicode' will be called 'str' (or something equally neutral
> like 'text') and you will rarely have to deal explicitly with
> unicode codings, this being done mostly for you by the I/O
Stephen J. Turnbull wrote:
> Base64 is a (family of) wire protocol(s). It's not clear to me that
> it makes sense to say that the alphabets used by "baseNN" encodings
> are composed of characters,
Take a look at
http://en.wikipedia.org/wiki/Base64
where it says
...base64 is a binary to
> "Greg" == Greg Ewing <[EMAIL PROTECTED]> writes:
Greg> Stephen J. Turnbull wrote:
>> What I advocate for Python is to require that the standard
>> base64 codec be defined only on bytes, and always produce
>> bytes.
Greg> I don't understand that. It seems quite clear to
Josiah Carlson wrote:
> It doesn't seem strange to you to need to encode data twice to be able
> to have a usable sequence of characters which can be embedded in an
> effectively 7-bit email;
I'm talking about a 3.0 world where all strings are unicode
and the unicode <-> external coding is for th
Greg Ewing <[EMAIL PROTECTED]> wrote:
>
> Stephen J. Turnbull wrote:
>
> > What I advocate for Python is to require that the standard base64
> > codec be defined only on bytes, and always produce bytes.
>
> I don't understand that. It seems quite clear to me that
> base64 encoding (in the gener
On Sun, 2006-02-19 at 23:30 +0900, Stephen J. Turnbull wrote:
> > "M" == "M.-A. Lemburg" <[EMAIL PROTECTED]> writes:
> M> * for Unicode codecs the original form is Unicode, the derived
> M> form is, in most cases, a string
>
> First of all, that's Martin's point!
>
> Second, almost a
Stephen J. Turnbull wrote:
> What I advocate for Python is to require that the standard base64
> codec be defined only on bytes, and always produce bytes.
I don't understand that. It seems quite clear to me that
base64 encoding (in the general sense of encoding, not the
unicode sense) takes binar
On Feb 20, 2006, at 7:25 PM, Stephen J. Turnbull wrote:
>> "Martin" == Martin v Löwis <[EMAIL PROTECTED]> writes:
>
> Martin> Please do take a look. It is the only way: If you were to
> Martin> embed base64 *bytes* into character data content of an XML
> Martin> element, the resul
> "Martin" == Martin v Löwis <[EMAIL PROTECTED]> writes:
Martin> Please do take a look. It is the only way: If you were to
Martin> embed base64 *bytes* into character data content of an XML
Martin> element, the resulting XML file might not be well-formed
Martin> anymore (if the
Bengt Richter wrote:
> On Sat, 18 Feb 2006 23:33:15 +0100, Thomas Wouters <[EMAIL PROTECTED]> wrote:
> note what base64 really is for. Its essence is to create a _character_
> sequence
> which can succeed in being encoded as ascii. The concept of base64 going
> str->str
> is really a mental sho
Stephen J. Turnbull wrote:
> Martin> For an example where base64 is *not* necessarily
> Martin> ASCII-encoded, see the "binary" data type in XML
> Martin> Schema. There, base64 is embedded into an XML document,
> Martin> and uses the encoding of the entire XML document. As a
> M
> "Josiah" == Josiah Carlson <[EMAIL PROTECTED]> writes:
Josiah> I try to internalize it by not thinking of strings as
Josiah> encoded data, but as binary data, and unicode as text. I
Josiah> then remind myself that unicode isn't native on-disk or
Josiah> cross-network (which
On Sat, 18 Feb 2006 23:33:15 +0100, Thomas Wouters <[EMAIL PROTECTED]> wrote:
>On Sat, Feb 18, 2006 at 01:21:18PM +0100, M.-A. Lemburg wrote:
>
[...]
>> > - The return value for the non-unicode encodings depends on the value of
>> >the encoding argument.
>
>> Not really: you'll always get a b
> "Martin" == Martin v Löwis <[EMAIL PROTECTED]> writes:
Martin> Stephen J. Turnbull wrote:
Bengt> The characters in b could be encoded in plain ascii, or
Bengt> utf16le, you have to know.
>> Which base64 are you thinking about? Both RFC 3548 and RFC
>> 2045 (MIME) speci
"Stephen J. Turnbull" <[EMAIL PROTECTED]> wrote:
>
> > "Josiah" == Josiah Carlson <[EMAIL PROTECTED]> writes:
>
> Josiah> The question remains: is str.decode() returning a string
> Josiah> or unicode depending on the argument passed, when the
> Josiah> argument quite literally na
On Feb 19, 2006, at 10:55 AM, Martin v. Löwis wrote:
> Stephen J. Turnbull wrote:
>> BTW, what use cases do you have in mind for Unicode -> Unicode
>> decoding?
>
> I think "rot13" falls into that category: it is a transformation
> on text, not on bytes.
The current implementation is a transforma
Stephen J. Turnbull wrote:
> Bengt> The characters in b could be encoded in plain ascii, or
> Bengt> utf16le, you have to know.
>
> Which base64 are you thinking about? Both RFC 3548 and RFC 2045
> (MIME) specify subsets of US-ASCII explicitly.
Unfortunately, it is ambiguous as to whethe
Stephen J. Turnbull wrote:
> Do you do any of the user education *about codec use* that you
> recommend? The people I try to teach about coding invariably find it
> difficult to understand. The problem is that the near-universal
> intuition is that for "human-usable text" is pretty much anything
Stephen J. Turnbull wrote:
> BTW, what use cases do you have in mind for Unicode -> Unicode
> decoding?
I think "rot13" falls into that category: it is a transformation
on text, not on bytes.
For other "odd" cases: "base64" goes Unicode->bytes in the *decode*
direction, not in the encode directio
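Martin's two observations can be checked against present-day Python 3, where rot13 is exposed as a text-to-text transform through the `codecs` module and base64 decoding takes text to bytes. The sketch below assumes Python 3.4 or later, where `str.encode` refuses codecs that are not text encodings; the variable names are illustrative.

```python
import base64
import codecs

# rot13: text in, text out, in both directions.
scrambled = codecs.encode("Hello, world", "rot_13")
unscrambled = codecs.decode(scrambled, "rot_13")

# str.encode is reserved for text -> bytes encodings, so rot13 is rejected.
try:
    "Hello".encode("rot_13")
    rot13_allowed_on_str = True
except LookupError:
    rot13_allowed_on_str = False

# base64: the *decode* direction is the one that goes from text to bytes.
decoded = base64.b64decode("aGVsbG8=")
```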
> "Bengt" == Bengt Richter <[EMAIL PROTECTED]> writes:
Bengt> The characters in b could be encoded in plain ascii, or
Bengt> utf16le, you have to know.
Which base64 are you thinking about? Both RFC 3548 and RFC 2045
(MIME) specify subsets of US-ASCII explicitly.
--
School of System
> "Bob" == Bob Ippolito <[EMAIL PROTECTED]> writes:
Bob> On Feb 17, 2006, at 8:33 PM, Josiah Carlson wrote:
>> But you aren't always getting *unicode* text from the decoding
>> of bytes, and you may be encoding bytes *to* bytes:
Please note that I presumed that you can indeed ass
> "Josiah" == Josiah Carlson <[EMAIL PROTECTED]> writes:
Josiah> The question remains: is str.decode() returning a string
Josiah> or unicode depending on the argument passed, when the
Josiah> argument quite literally names the codec involved,
Josiah> difficult to understand? I
> "M" == "M.-A. Lemburg" <[EMAIL PROTECTED]> writes:
M> The main reason is symmetry and the fact that strings and
M> Unicode should be as similar as possible in order to simplify
M> the task of moving from one to the other.
Those are perfectly compatible with Martin's suggestion.
> "M" == "M.-A. Lemburg" <[EMAIL PROTECTED]> writes:
M> Martin v. Löwis wrote:
>> No. The reason to ban string.decode and bytes.encode is that it
>> confuses users.
M> Instead of starting to ban everything that can potentially
M> confuse a few users, we should educate tho
> "Ian" == Ian Bicking <[EMAIL PROTECTED]> writes:
Ian> Encodings cover up eclectic interfaces, where those
Ian> interfaces fit a basic pattern -- data in, data out.
Isn't "filter" the word you're looking for?
I think you've just made a very strong case that this is a slippery
slope
"M.-A. Lemburg" <[EMAIL PROTECTED]> writes:
> Martin v. Löwis wrote:
>> M.-A. Lemburg wrote:
> True. However, note that the .encode()/.decode() methods on
> strings and Unicode narrow down the possible return types.
> The corresponding .bytes methods should only allow bytes and
> U
Josiah Carlson wrote:
> Ron Adam <[EMAIL PROTECTED]> wrote:
> Except that ambiguates it even further.
>
> Is encodings.tounicode() encoding, or decoding? According to everything
> you have said so far, it would be decoding. But if I am decoding binary
> data, why should it be spending any time
Ron Adam <[EMAIL PROTECTED]> wrote:
> Josiah Carlson wrote:
> > Ron Adam <[EMAIL PROTECTED]> wrote:
> >> Josiah Carlson wrote:
> > [snip]
> >>> Again, the problem is ambiguity; what does bytes.recode(something) mean?
> >>> Are we encoding _to_ something, or are we decoding _from_ something?
> >>
Josiah Carlson wrote:
> Ron Adam <[EMAIL PROTECTED]> wrote:
>> Josiah Carlson wrote:
> [snip]
>>> Again, the problem is ambiguity; what does bytes.recode(something) mean?
>>> Are we encoding _to_ something, or are we decoding _from_ something?
>> This was just an example of one way that might wor
"Josiah Carlson" <[EMAIL PROTECTED]> wrote in message
news:[EMAIL PROTECTED]
> Again, the problem is ambiguity; what does bytes.recode(something) mean?
> Are we encoding _to_ something, or are we decoding _from_ something?
> Are we going to need to embed the direction in the encoding/decoding
>
On Sat, Feb 18, 2006 at 01:21:18PM +0100, M.-A. Lemburg wrote:
> It's by no means a Perl attitude.
In your eyes, perhaps. It certainly feels that way to me (or I wouldn't have
said it :). Perl happens to be full of general constructs that were added
because they were easy to add, or they were use
Aahz wrote:
> On Sat, Feb 18, 2006, Ron Adam wrote:
>> I like the bytes.recode() idea a lot. +1
>>
>> It seems to me it's a far more useful idea than encoding and decoding by
>> overloading and could do both and more. It has a lot of potential to be
>> an intermediate step for encoding as well a
Ron Adam <[EMAIL PROTECTED]> wrote:
> Josiah Carlson wrote:
[snip]
> > Again, the problem is ambiguity; what does bytes.recode(something) mean?
> > Are we encoding _to_ something, or are we decoding _from_ something?
>
> This was just an example of one way that might work, but here are my
> tho
Martin v. Löwis wrote:
> M.-A. Lemburg wrote:
True. However, note that the .encode()/.decode() methods on
strings and Unicode narrow down the possible return types.
The corresponding .bytes methods should only allow bytes and
Unicode.
>>> I forgot that: what is the rationale for
M.-A. Lemburg wrote:
>>>True. However, note that the .encode()/.decode() methods on
>>>strings and Unicode narrow down the possible return types.
>>>The corresponding .bytes methods should only allow bytes and
>>>Unicode.
>>
>>I forgot that: what is the rationale for that restriction?
>
>
> To as
Martin v. Löwis wrote:
> M.-A. Lemburg wrote:
>> I've already explained why we have .encode() and .decode()
>> methods on strings and Unicode many times. I've also
>> explained the misunderstanding that codecs can only do
>> Unicode-string conversions. And I've explained that
>> the .encode() and .
Michael Hudson wrote:
> There's one extremely significant example where the *value* of
> something impacts on the type of something else: functions. The types
> of everything involved in str([1]) and len([1]) are the same but the
> results are different. This shows up in PyPy's type annotation; m
M.-A. Lemburg wrote:
> I've already explained why we have .encode() and .decode()
> methods on strings and Unicode many times. I've also
> explained the misunderstanding that codecs can only do
> Unicode-string conversions. And I've explained that
> the .encode() and .decode() method *do* check the
Aahz wrote:
> On Sat, Feb 18, 2006, Ron Adam wrote:
>> I like the bytes.recode() idea a lot. +1
>>
>> It seems to me it's a far more useful idea than encoding and decoding by
>> overloading and could do both and more. It has a lot of potential to be
>> an intermediate step for encoding as well a
On Sat, Feb 18, 2006, Ron Adam wrote:
>
> I like the bytes.recode() idea a lot. +1
>
> It seems to me it's a far more useful idea than encoding and decoding by
> overloading and could do both and more. It has a lot of potential to be
> an intermediate step for encoding as well as being used for
On 2/18/06, Josiah Carlson <[EMAIL PROTECTED]> wrote:
> Look at what we've currently got going for data transformations in the
> standard library to see what these removals will do: base64 module,
> binascii module, binhex module, uu module, ... Do we want or need to
> add another top-level module
Thomas Wouters wrote:
> On Sat, Feb 18, 2006 at 12:06:37PM +0100, M.-A. Lemburg wrote:
>
>> I've already explained why we have .encode() and .decode()
>> methods on strings and Unicode many times. I've also
>> explained the misunderstanding that codecs can only do
>> Unicode-string conversions. An
Josiah Carlson wrote:
> Ron Adam <[EMAIL PROTECTED]> wrote:
>> Josiah Carlson wrote:
>>> Bengt Richter had a good idea with bytes.recode() for strictly bytes
>>> transformations (and the equivalent for text), though it is ambiguous as
>>> to the direction; are we encoding or decoding with bytes.rec
This posting is entirely tangential. Be warned.
"Martin v. Löwis" <[EMAIL PROTECTED]> writes:
> It's worse than that. The return *type* depends on the *value* of
> the argument. I think there is little precedent for that:
There's one extremely significant example where the *value* of
something
Martin v. Löwis wrote:
> M.-A. Lemburg wrote:
>> Just because some codecs don't fit into the string.decode()
>> or bytes.encode() scenario doesn't mean that these codecs are
>> useless or that the methods should be banned.
>
> No. The reason to ban string.decode and bytes.encode is that
> it confu
On Sat, Feb 18, 2006 at 12:06:37PM +0100, M.-A. Lemburg wrote:
> I've already explained why we have .encode() and .decode()
> methods on strings and Unicode many times. I've also
> explained the misunderstanding that codecs can only do
> Unicode-string conversions. And I've explained that
> the .e