Re: [Python-Dev] Dropping bytes "support" in json

2009-04-27 Thread Damien Diederen

Hello,

Antoine Pitrou writes:
> Hello,
>
> We're in the process of forward-porting the recent (massive) json
> updates to 3.1, and we are also thinking of dropping remnants of
> support of the bytes type in the json library (in 3.1, again). This
> bytes support almost didn't work at all, but there was a lot of C and
> Python code for it nevertheless. We're also thinking of dropping the
> "encoding" argument in the various APIs, since it is useless.

I had a quick look into the module on both branches, and at Antoine's
latest patch (json_py3k-3).  The current situation on trunk is indeed
not very pretty in terms of code duplication, and I agree it would be
nice not to carry that forward.

I couldn't figure out a way to get rid of it short of multi-#including
"templates" and playing with the C preprocessor, however, and have the
nagging feeling the latter would be frowned upon by the maintainers.

There is a precedent with xmltok.c/xmltok_impl.c, though, so maybe I'm
wrong about that.  Should I give it a try, and see how "clean" the
result can be made?

> Under the new situation, json would only ever allow str as input, and
> output str as well. By posting here, I want to know whether anybody
> would oppose this (knowing, once again, that bytes support is already
> broken in the current py3k trunk).

Provided one of the alternatives is dropped, wouldn't it be better to do
the opposite, i.e., have the decoder take bytes as input, and the
encoder produce bytes—and layer the str functionality on top of that?  I
guess the answer depends on how the (most common) lower layers are
structured, but it would be nice to allow a straight bytes path to/from
the underlying transport.

(I'm willing to have a go at the conversion in case somebody is
interested.)

Bob, would you have an idea of which lower layers are most commonly used
with the json module, and whether people are more likely to expect strs
or bytes in Python 3.x?  Maybe that data could be inferred from some bug
tracking system?

> The bug entry is: http://bugs.python.org/issue4136
>
> Regards
> Antoine.

Regards,
Damien

-- 
http://crosstwine.com

"Strong Opinions, Weakly Held"
 -- Bob Johansen
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Dropping bytes "support" in json

2009-04-27 Thread Damien Diederen

Hi Eric,

"Eric Smith" writes:
>> I couldn't figure out a way to get rid of it short of multi-#including
>> "templates" and playing with the C preprocessor, however, and have the
>> nagging feeling the latter would be frowned upon by the maintainers.
>
> Not sure if this is exactly what you mean, but look at Objects/stringlib.
> str.format() and unicode.format() share the same implementation, using
> stringdefs.h and unicodedefs.h.

That's indeed a much better example!  I'm more comfortable applying the
same technique to the json module now that I see it used in the core.

(Provided Bob and Antoine are not turned away by the relative ugliness,
that is.)

> Eric.

Cheers,
Damien

--
http://crosstwine.com

"Strong Opinions, Weakly Held"
 -- Bob Johansen


Re: [Python-Dev] Dropping bytes "support" in json

2009-04-27 Thread Damien Diederen

Hi Antoine,

Antoine Pitrou writes:
> Damien Diederen writes:
>> I couldn't figure out a way to get rid of it short of multi-#including
>> "templates" and playing with the C preprocessor, however, and have the
>> nagging feeling the latter would be frowned upon by the maintainers.
>> 
>> There is a precedent with xmltok.c/xmltok_impl.c, though, so maybe I'm
>> wrong about that.  Should I give it a try, and see how "clean" the
>> result can be made?
>
> Keep in mind that json is externally maintained by Bob. The more we rework his
> code, the less easy it will be to backport other changes from the simplejson
> library.
>
> I think we should either keep the code duplication (if we want to keep fast
> paths for both bytes and str objects), or only keep one of the two versions as
> my patch does.

Yes, I was (slowly) reaching the same conclusion.

>> Provided one of the alternatives is dropped, wouldn't it be better to do
>> the opposite, i.e., have the decoder take bytes as input, and the
>> encoder produce bytes—and layer the str functionality on top of that?  I
>> guess the answer depends on how the (most common) lower layers are
>> structured, but it would be nice to allow a straight bytes path to/from
>> the underlying transport.
>
> The straightest path is actually to/from unicode, since JSON data can contain
> unicode strings but no byte strings. Also, the json library /has/ to output
> unicode when `ensure_ascii` is False. In 2.x:
>
> >>> json.dumps([u"éléphant"], ensure_ascii=False)
> u'["\xe9l\xe9phant"]'
>
> In any case, I don't think it will matter much in terms of speed
> whether we take one route or the other. UTF-8 encoding/decoding is
> probably much faster (in characters per second) than JSON
> encoding/decoding is.

You're undoubtedly right.  I was more concerned about the interaction
with other modules, and avoiding unnecessary copies/conversions
especially when they don't make sense from the user's perspective.
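
For reference, the `ensure_ascii` behaviour quoted above can be checked
along the following lines.  This is only a sketch, assuming a Python 3
json module in which `dumps` returns str; the variable names are
illustrative and not part of any patch discussed here.

```python
import json

data = ["éléphant"]

# Default (ensure_ascii=True): non-ASCII characters are escaped, so the
# result contains only ASCII characters, but it is still a str.
escaped = json.dumps(data)

# ensure_ascii=False: non-ASCII characters are emitted as-is, which is
# why the result has to be a text (unicode) string.
raw = json.dumps(data, ensure_ascii=False)

# Getting bytes for a transport is then one explicit encoding step.
wire = raw.encode("utf-8")

print(type(escaped).__name__, type(raw).__name__, type(wire).__name__)
```

The explicit `encode` call at the end is the extra copy/conversion in
question when the underlying transport deals in bytes.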

I will whip up a patch adding a {loadb,dumpb} API as you suggested in
another email, with the most trivial implementation, and then we'll see
where to go from there.
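
A minimal sketch of what such a trivial implementation might look like,
layered on top of the existing str API.  Only the names `loadb` and
`dumpb` come from this thread; the signatures, the `encoding` parameter
and its UTF-8 default are assumptions made for illustration, not a
settled design.

```python
import json


def dumpb(obj, encoding="utf-8", **kwargs):
    """Hypothetical helper: serialize obj to JSON and return bytes.

    Extra keyword arguments are passed through to json.dumps().
    """
    return json.dumps(obj, **kwargs).encode(encoding)


def loadb(data, encoding="utf-8", **kwargs):
    """Hypothetical helper: deserialize JSON from a bytes object.

    Extra keyword arguments are passed through to json.loads().
    """
    return json.loads(data.decode(encoding), **kwargs)


# Round trip through bytes, as a transport layer would see it.
payload = dumpb({"name": "éléphant"}, ensure_ascii=False)
print(payload)
print(loadb(payload))
```

In this form the bytes functions are thin wrappers, so they add one
encode or decode pass over the str result rather than a separate fast
path.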

It can still be dropped if there is a concern of perpetuating a "bad
idea," or I can follow up with a port of Bob's "bytes" implementation
from 2.x if there is any interest.

> Regards
> Antoine.

Cheers,
Damien

-- 
http://crosstwine.com

"Strong Opinions, Weakly Held"
 -- Bob Johansen