Re: [Python-Dev] Why can't I encode/decode base64 without importing a module?

Glenn Linderman Wed, 24 Apr 2013 02:21:36 -0700

On 4/24/2013 1:22 AM, M.-A. Lemburg wrote:

On 23.04.2013 19:24, Guido van Rossum wrote:

On Tue, Apr 23, 2013 at 9:04 AM, M.-A. Lemburg <m...@egenix.com> wrote:

On 23.04.2013 17:47, Guido van Rossum wrote:

On Tue, Apr 23, 2013 at 8:22 AM, M.-A. Lemburg <m...@egenix.com> wrote:

Just as a reminder: we have the general purpose
encode()/decode() functions in the codecs module:


import codecs
r13 = codecs.encode('hello world', 'rot-13')

These interface directly to the codec interfaces, without
enforcing type restrictions. The codec defines the supported
input and output types.
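
A minimal sketch of what "the codec defines the supported input and output types" looks like in practice, assuming Python 3.4 or later (where the 'rot-13' and 'base64' codec aliases are available). The variable names are illustrative only:

```python
import codecs

# str -> str: rot-13 is a text-to-text transform.
rot = codecs.encode('hello world', 'rot-13')    # 'uryyb jbeyq'

# bytes -> bytes: base64 is a binary-to-binary transform.
b64 = codecs.encode(b'hello world', 'base64')   # b'aGVsbG8gd29ybGQ=\n'

# str -> bytes: an ordinary text encoding.
utf8 = codecs.encode('hello world', 'utf-8')

# The same two functions serve all three; only the types differ.
print(type(rot).__name__, type(b64).__name__, type(utf8).__name__)
```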

As an implementation mechanism I see nothing wrong with this. I hope
the codecs module lets you introspect the input and output types of a
codec given by name?

At the moment there is no standard interface to access supported
input and output types... but then: regular Python functions or
methods also don't provide such functionality, so no surprise
there ;-)

Not quite the same though. Each function has its own unique behavior.
But codecs support a standard interface, *except* that the input and
output types sometimes vary.

The codec system itself

It's mostly a matter of specifying the supported type
combinations in the codec documentation.

BTW: What would be a use case where you'd want to
programmatically access such information before calling
the codec ?

As you know, in Python 3, most code working with bytes doesn't also
work with strings, and vice versa (except for a few cases where we've
gone out of our way to write polymorphic code -- but users rarely do
so, and any time you use a string or bytes literal you basically limit
yourself to that type).

Suppose I write a command-line utility that reads a file, runs it
through a codec, and writes the result to another file. Suppose the
name of the codec is a command-line argument (as well as the
filenames). I need to know whether to open the files in text or binary
mode based on the name of the codec.

Ok, so you need to know which codecs your tool can support and
which of those need text input and which bytes input.
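
A rough sketch of such a tool, assuming the author maintains the codec-to-mode table by hand, since the codecs module offers no standard way to ask. CODEC_MODES, open_modes() and transcode() are hypothetical names, not part of any library:

```python
import codecs

# Hypothetical, hand-maintained table: codec name -> (read mode, write mode).
CODEC_MODES = {
    'rot-13': ('r', 'w'),     # str -> str
    'utf-8': ('r', 'wb'),     # str -> bytes
    'base64': ('rb', 'wb'),   # bytes -> bytes
}

def open_modes(codec_name):
    """Return the file modes for a codec the tool knows about."""
    try:
        return CODEC_MODES[codec_name]
    except KeyError:
        raise ValueError('unsupported codec: %r' % codec_name)

def transcode(src, dst, codec_name):
    """Read src, run it through the codec's encoder, write the result to dst."""
    read_mode, write_mode = open_modes(codec_name)
    with open(src, read_mode) as fin, open(dst, write_mode) as fout:
        fout.write(codecs.encode(fin.read(), codec_name))
```

The table doubles as a whitelist: a codec that is installed but not listed is rejected before any file is opened.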

I've been thinking about this some more: I think that type
information alone is not flexible enough to cover such
use cases.

Maybe MIME type and encoding would be sufficient type information, but probably not str vs. bytes.

In your use case you'd want to only permit use of a certain
set of codecs, not simply all of them, since some might
not implement what you actually want to achieve with the tool,
e.g. a user might have installed a codec set that adds
support for reading and writing image data, but your
intended use was to only support text data.

MIME type supports this sort of concept, with the two-level hierarchy of naming the type... text/xml text/plain image/jpeg

So what we need is a way to allow the codecs to say e.g.
"I work on text", "I support encoding bytes and text",
"I encode to bytes", "I'm reversible", "I transform
input data", "I support bytes and text, and will create
same type output", "I work on image data", "I work on
X509 certificates", "I work on XML data", etc.


Guess what I think you are re-inventing here....
Nope, guess again....
Yep, MIME types _plus_ encodings.

In other words, we need a form of tagging system, with a
set of standard tags that each codec can publish and
which also allows non-standard tags (which can then at
some point be made standard, if there's agreement on them).


Hmm.  Sounds just like the registry for, um, you guessed it: MIME types.

Given a codec name you could then ask the codec registry for
the codec tags and verify that the chosen codec handles
text data, needs bytes or text encoding input and
creates bytes as encoding output. If the registry returns
codec tags that don't include the "I work on text" tag,
the tool could then raise an error.
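
A minimal sketch of that lookup, assuming a tag registry existed. Nothing like it is in the codecs module; CODEC_TAGS, the tag spellings and require_tags() are all invented here for illustration:

```python
# Hypothetical tag registry: codec name -> set of published tags.
CODEC_TAGS = {
    'utf-8': {'text', 'str-to-bytes', 'reversible'},
    'rot-13': {'text', 'str-to-str', 'reversible'},
    'base64': {'bytes-to-bytes', 'reversible'},
}

def require_tags(codec_name, *required):
    """Return the codec's tags, or raise if any required tag is missing."""
    tags = CODEC_TAGS.get(codec_name, set())
    missing = set(required) - tags
    if missing:
        raise LookupError('codec %r lacks tags: %s'
                          % (codec_name, ', '.join(sorted(missing))))
    return tags

# A text-only tool accepts utf-8 ...
require_tags('utf-8', 'text', 'str-to-bytes')

# ... and rejects a codec that does not publish the "text" tag.
try:
    require_tags('base64', 'text')
except LookupError as exc:
    print(exc)
```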

For just doing text encoding transformations, text/plain would work as a MIME type, and the encodings of interest for the encodings.

Seems like "str" always means "Unicode" but the MIME type can vary; "bytes" might mean encoded text, and the MIME type can also vary.

For non-textual transformations, "encoding" might mean Base 64, BinHex, or other such representations... but those can also be applied to text, so it might be a 3rd dimension, or it might just be a list of encodings rather than a single encoding.


Compression could be another dimension, or perhaps another encoding.

But really, then, a transformation needs to be a list of steps; a codec can sign up to perform one or more of the steps, a sequence of codecs would have to be found, capable of performing a subsequence of the steps, and then run in the appropriate order.
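
A small sketch of the "list of steps" idea using codecs that already exist, assuming Python 3.4 or later for the 'zlib' and 'base64' aliases. Here the caller supplies the order by hand; finding a suitable sequence automatically is the part that has no existing support. run_steps() and undo_steps() are hypothetical helpers:

```python
import codecs

def run_steps(data, steps):
    """Apply each named codec's encoder in order."""
    for name in steps:
        data = codecs.encode(data, name)
    return data

def undo_steps(data, steps):
    """Apply each named codec's decoder in reverse order."""
    for name in reversed(steps):
        data = codecs.decode(data, name)
    return data

# text -> UTF-8 bytes -> compressed bytes -> base64 bytes
steps = ['utf-8', 'zlib', 'base64']
packed = run_steps('hello world', steps)
restored = undo_steps(packed, steps)
```

Each step's output type has to match the next step's input type, which is exactly the information the codecs do not currently publish.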

This all sounds so general, that probably the Python compiler could be implemented as a codec :) Or any compiler. Probably a web server could be implemented as a codec too :) Well, maybe not, codecs have limited error handling and reporting abilities.

_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
