Re: [Python-Dev] Dropping bytes "support" in json
Hello,

Antoine Pitrou writes:
> Hello,
>
> We're in the process of forward-porting the recent (massive) json
> updates to 3.1, and we are also thinking of dropping remnants of
> support of the bytes type in the json library (in 3.1, again). This
> bytes support almost didn't work at all, but there was a lot of C and
> Python code for it nevertheless. We're also thinking of dropping the
> "encoding" argument in the various APIs, since it is useless.

I had a quick look into the module on both branches, and at Antoine's
latest patch (json_py3k-3). The current situation on trunk is indeed
not very pretty in terms of code duplication, and I agree it would be
nice not to carry that forward.

I couldn't figure out a way to get rid of it short of multi-#including
"templates" and playing with the C preprocessor, however, and have the
nagging feeling the latter would be frowned upon by the maintainers.

There is a precedent with xmltok.c/xmltok_impl.c, though, so maybe I'm
wrong about that. Should I give it a try, and see how "clean" the
result can be made?

> Under the new situation, json would only ever allow str as input, and
> output str as well. By posting here, I want to know whether anybody
> would oppose this (knowing, once again, that bytes support is already
> broken in the current py3k trunk).

Provided one of the alternatives is dropped, wouldn't it be better to do
the opposite, i.e., have the decoder take bytes as input, and the
encoder produce bytes, and layer the str functionality on top of that? I
guess the answer depends on how the (most common) lower layers are
structured, but it would be nice to allow a straight bytes path to/from
the underlying transport. (I'm willing to have a go at the conversion
in case somebody is interested.)

Bob, would you have an idea of which lower layers are most commonly
used with the json module, and whether people are more likely to expect
strs or bytes in Python 3.x? Maybe that data could be inferred from
some bug tracking system?
> The bug entry is: http://bugs.python.org/issue4136
>
> Regards
> Antoine.

Regards,
Damien

--
http://crosstwine.com

"Strong Opinions, Weakly Held"
    -- Bob Johansen
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
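To make the "straight bytes path to/from the underlying transport" concern concrete, here is a minimal sketch of what a caller has to do when json only speaks str and the transport only speaks bytes. The `io.BytesIO` object is merely a stand-in for a real transport, and UTF-8 is an assumed wire encoding; neither comes from the thread.

```python
import io
import json

# Stand-in for a byte-oriented transport (socket file, pipe, HTTP body, ...).
raw = io.BytesIO()

# Sending: layer a text view over the byte stream so that a str-only
# json.dump() can write to it. UTF-8 is an assumption of this sketch.
writer = io.TextIOWrapper(raw, encoding="utf-8")
json.dump({"ok": True}, writer)
writer.flush()  # push buffered text down to the byte stream
wire_bytes = raw.getvalue()

# Receiving: the same layering in the other direction.
reader = io.TextIOWrapper(io.BytesIO(wire_bytes), encoding="utf-8")
decoded = json.load(reader)
```

The text layer here is supplied by the caller rather than by json, which is the extra step the question above is about.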
Re: [Python-Dev] Dropping bytes "support" in json
Hi Eric,

"Eric Smith" writes:
>> I couldn't figure out a way to get rid of it short of multi-#including
>> "templates" and playing with the C preprocessor, however, and have the
>> nagging feeling the latter would be frowned upon by the maintainers.
>
> Not sure if this is exactly what you mean, but look at Objects/stringlib.
> str.format() and unicode.format() share the same implementation, using
> stringdefs.h and unicodedefs.h.

That's indeed a much better example! I'm more comfortable applying the
same technique to the json module now that I see it used in the core.

(Provided Bob and Antoine are not turned away by the relative ugliness,
that is.)

> Eric.

Cheers,
Damien

--
http://crosstwine.com

"Strong Opinions, Weakly Held"
    -- Bob Johansen
Re: [Python-Dev] Dropping bytes "support" in json
Hi Antoine,

Antoine Pitrou writes:
> Damien Diederen crosstwine.com> writes:
>> I couldn't figure out a way to get rid of it short of multi-#including
>> "templates" and playing with the C preprocessor, however, and have the
>> nagging feeling the latter would be frowned upon by the maintainers.
>>
>> There is a precedent with xmltok.c/xmltok_impl.c, though, so maybe I'm
>> wrong about that. Should I give it a try, and see how "clean" the
>> result can be made?
>
> Keep in mind that json is externally maintained by Bob. The more we rework his
> code, the less easy it will be to backport other changes from the simplejson
> library.
>
> I think we should either keep the code duplication (if we want to keep fast
> paths for both bytes and str objects), or only keep one of the two versions as
> my patch does.

Yes, I was (slowly) reaching the same conclusion.

>> Provided one of the alternatives is dropped, wouldn't it be better to do
>> the opposite, i.e., have the decoder take bytes as input, and the
>> encoder produce bytes, and layer the str functionality on top of that? I
>> guess the answer depends on how the (most common) lower layers are
>> structured, but it would be nice to allow a straight bytes path to/from
>> the underlying transport.
>
> The straightest path is actually to/from unicode, since JSON data can contain
> unicode strings but no byte strings. Also, the json library /has/ to output
> unicode when `ensure_ascii` is False. In 2.x:
>
> >>> json.dumps([u"éléphant"], ensure_ascii=False)
> u'["\xe9l\xe9phant"]'
>
> In any case, I don't think it will matter much in terms of speed
> whether we take one route or the other. UTF-8 encoding/decoding is
> probably much faster (in characters per second) than JSON
> encoding/decoding is.

You're undoubtedly right. I was more concerned about the interaction
with other modules, and avoiding unnecessary copies/conversions,
especially when they don't make sense from the user's perspective.

I will whip up a patch adding a {loadb,dumpb} API as you suggested in
another email, with the most trivial implementation, and then we'll see
where to go from there. It can still be dropped if there is a concern
of perpetuating a "bad idea," or I can follow up with a port of Bob's
"bytes" implementation from 2.x if there is any interest.

> Regards
> Antoine.

Cheers,
Damien

--
http://crosstwine.com

"Strong Opinions, Weakly Held"
    -- Bob Johansen
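
For reference, a "most trivial implementation" of a {loadb,dumpb} pair could look roughly like the sketch below: the str-based functions do all the work and the bytes functions only add an encode/decode step. The names are taken from the message above, but the signatures, the `encoding` parameter, and the UTF-8 default are assumptions made for illustration. This is not the patch that was posted, and these functions are not part of the standard json module.

```python
import json


def dumpb(obj, encoding="utf-8", **kwargs):
    """Hypothetical helper: serialize obj to JSON text, then encode to bytes."""
    return json.dumps(obj, **kwargs).encode(encoding)


def loadb(data, encoding="utf-8", **kwargs):
    """Hypothetical helper: decode bytes to text, then parse the JSON."""
    return json.loads(data.decode(encoding), **kwargs)


# With ensure_ascii=False the non-ASCII characters survive as UTF-8 bytes.
payload = dumpb(["éléphant"], ensure_ascii=False)

# With the default ensure_ascii=True they are escaped, so the bytes are pure ASCII.
escaped = dumpb(["éléphant"])

# Both forms parse back to the same Python object.
roundtrip = loadb(payload)
```

Layering bytes on top of str this way costs one extra copy per call, which matches the expectation stated in the thread that UTF-8 conversion is cheap next to the JSON work itself.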