Stefan Behnel added the comment:
If I understand the code right, "PY_SSIZE_T_MAX/sizeof(Py_UCS4)" would not be
correct since it would unnecessarily limit the length of ASCII-only unicode
strings.
I think the initial check avoids the risk of integer overflow in the
calculations
Stefan Behnel added the comment:
Well, if that's what it takes, then that's what it takes. I'm fine with the
change. The (unaccelerated) ET doesn't strictly require it, but a) I can't
really see a use case for non-Element classes in the tree, b) pretty much
no
New submission from Stefan Behnel :
I see reports from Cython users on Windows-64 that extension modules that use
"longintrepr.h" get miscompiled by MinGW. A failing setup looks as follows:
Stock 64 bit CPython on Windows, e.g.
Python 3.6.3 (v3.6.3:2c5fed8, Oct 3 2017, 18:11:49) [
Stefan Behnel added the comment:
Some more information. CPython uses this code snippet to decide on the PyLong
digit size:
#ifndef PYLONG_BITS_IN_DIGIT
#if SIZEOF_VOID_P >= 8
#define PYLONG_BITS_IN_DIGIT 30
#else
#define PYLONG_BITS_IN_DIGIT 15
#endif
#endif
In MinGW, "SIZEOF_VOID
Stefan Behnel added the comment:
> There's PyLong_GetInfo ...
Thanks, I understand that this information can be found at runtime. However, in
order to correctly compile C code against a given CPython runtime, there needs
to be a guarantee that extension module builds use
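(For reference, the runtime side of this can be inspected from Python itself
via sys.int_info; this only shows what the interpreter was built with, not the
compile-time guarantee discussed above.)

    # Inspect the PyLong digit configuration of the running interpreter.
    # bits_per_digit is 15 or 30, sizeof_digit is 2 or 4 bytes.
    import sys

    print(sys.int_info)
    # e.g. sys.int_info(bits_per_digit=30, sizeof_digit=4) on a 64-bit build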
Stefan Behnel added the comment:
The relevant macro seems to be "__MINGW64__". I have neither a Windows
environment nor MinGW-64 for testing, but the logic should be the same as for
the "_WIN64" macro, i.e.
#if defined(_WIN64) || defined(__MINGW64__)
#define MS_WI
Stefan Behnel added the comment:
> Maybe some people prefer sorting to get a more deterministic output
Note that those people are much better off with C14N. It is the one tool
designed for all of these use cases, even usable for cryptographic signatures.
And it's trivial to use, it
Stefan Behnel added the comment:
Making _PyGC_FINALIZED() internal broke Cython
(https://github.com/cython/cython/issues/2721). It's used in the finaliser
implementation
(https://github.com/cython/cython/blob/da657c8e326a419cde8ae6ea91be9661b9622504/Cython/Compiler/ModuleNode.py#L1442-
Stefan Behnel added the comment:
This test from lxml's ElementTree test suite crashes for me now when run
against (c)ElementTree:
    def test_parser_target_error_in_start(self):
        assertEqual = self.assertEqual
        events = []
        class Target(object):
            def
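(The lxml test itself is cut off above; the following is only a minimal,
self-contained sketch of the scenario it exercises - a parser target whose
start() handler raises - using the stdlib XMLParser. The Target class and the
error message are illustrative.)

    import xml.etree.ElementTree as ET

    class Target:
        def start(self, tag, attrib):
            # simulate a user callback that fails during parsing
            raise ValueError("error in start()")
        def end(self, tag):
            pass
        def data(self, data):
            pass
        def close(self):
            return None

    parser = ET.XMLParser(target=Target())
    try:
        parser.feed('<root><child/></root>')
        parser.close()
    except ValueError as exc:
        # the exception from the target should propagate, not crash the parser
        print("propagated:", exc)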
Stefan Behnel added the comment:
@Victor: yes, the Cython project has CI tests running against debug builds of
all CPython branches since 2.4, updated daily. lxml is part of an extended set
of tests for Cython, and the test suite of lxml includes several compatibility
tests for ElementTree
Changes by Stefan Behnel :
--
nosy: -scoder
Python tracker
<http://bugs.python.org/issue18408>
New submission from Stefan Behnel:
The C accelerator for the collections.Counter class (_count_elements() in
_collections.c) is slower than the pure Python versions for data that has many
unique entries. This is because the fast path for dicts is not taken (Counter
is a subtype of dict) and
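(A rough way to observe the effect described here; the workload below - almost
entirely unique keys - and the repetition counts are made up for illustration.)

    # Compare Counter, which delegates to the C _count_elements helper when
    # available, against a plain dict loop on data with mostly unique entries.
    import timeit
    from collections import Counter

    data = [str(i) for i in range(100000)]   # nearly all keys are unique

    def count_with_counter():
        return Counter(data)

    def count_with_dict():
        d = {}
        for item in data:
            d[item] = d.get(item, 0) + 1
        return d

    print("Counter:", timeit.timeit(count_with_counter, number=20))
    print("dict:   ", timeit.timeit(count_with_dict, number=20))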
Stefan Behnel added the comment:
Here is a patch that cleans up the current implementation to avoid making the
result of iterparse() an IncrementalParser (with all of its new API).
Please note that the tulip mailing list is not an appropriate place to discuss
additions to the XML libraries
Changes by Stefan Behnel :
--
components: +XML
Python tracker
<http://bugs.python.org/issue17741>
Stefan Behnel added the comment:
Copying the discussion between Antoine and me from python-dev:
>> IMO it should mimic the interface of the TreeBuilder, which
>> calls the data reception method "feed()" and the termination method
>> "close()". There
Stefan Behnel added the comment:
Actually, let me take that last paragraph back. There is an Obvious Way to do
it, and that's the feed() and close() methods. They are the existing and
established ElementTree interface for incremental parsing. The fact that
close() doesn't clean up
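(For illustration, the feed()/close() interface referred to here, with the
stdlib XMLParser and an arbitrary chunked document:)

    import xml.etree.ElementTree as ET

    parser = ET.XMLParser()            # builds a tree via the default TreeBuilder
    for chunk in ('<root>', '<child>data</child>', '</root>'):
        parser.feed(chunk)             # push data incrementally
    root = parser.close()              # finish parsing, get the root Element
    print(root.tag, root[0].text)      # -> root data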
Changes by Stefan Behnel :
Added file: http://bugs.python.org/file31206/iterparse_api_cleanup.patch
Python tracker
<http://bugs.python.org/issue17741>
Stefan Behnel added the comment:
Thinking about the original patch some more - I wonder why it doesn't use a
wrapper for TreeBuilder to process the events. Antoine, is there a reason why
you had to add this _setevents() method to the XMLParser, instead of making the
IncrementalPars
Stefan Behnel added the comment:
Oh, and could we reopen this ticket, please?
--
Python tracker
<http://bugs.python.org/issue17741>
Stefan Behnel added the comment:
> The point is not to build a tree of potentially unbounded size (think
> XMPP). The point is to yield events in a non-blocking way (iterparse()
> is blocking, which makes it useless for non-blocking applications).
Ok, but that's the only differen
Stefan Behnel added the comment:
> About the patch: I think changing the API names now that alpha1 has been
> released is a bit gratuitous.
Sorry for being late, but I can't see it being my fault.
A change in an alpha release is still way better than a change after a final
release.
Stefan Behnel added the comment:
> But worse than no change at all. Arguing about API naming is a bit
> futile, *especially* when the ship has sailed.
It's easy to say that as a core developer with commit rights who can
simply hide changes in a low frequented bug tracker withou
Stefan Behnel added the comment:
But IncrementalParser uses an XMLParser internally, which in turn uses a
TreeBuilder internally. So what exactly is your point?
--
Stefan Behnel added the comment:
> Well, I would rather like to understand yours.
My point is that the IncrementalParser uses a TreeBuilder that builds an XML
tree in the back. So I'm wondering why you are saying that it doesn't build a
tree.
> Whatever IncrementalParser
Stefan Behnel added the comment:
> TreeBuilder doesn't do parsing, it takes already parsed data: it has a
> start() method to open a tag, and a data() method to add raw text
> inside that tag.
That is correct. However, the XMLParser has a feed() method that sends new data
into t
Stefan Behnel added the comment:
> Unless I'm reading it wrong, when _setevents() is called, the internal
> hooks are rewired to populate the events list, rather than call the
> corresponding TreeBuilder methods. So, yes, there's a TreeBuilder
> somewhere, but it stands
Stefan Behnel added the comment:
> ask to be added to the experts list for ET
Already done, see the corresponding python-dev thread.
> Now back to a productive discussion please...
I think we already are. Keep reading through the rest of the
Stefan Behnel added the comment:
Ok, finally. ;)
Can we agree on discarding the current implementation for now and then
rewriting it based on a tree builder instead of a parser wrapper?
"Because we'd need to change internal code" is not a good argument for adding a
Stefan Behnel added the comment:
What about this idea: instead of changing the internal C implementation, we
could provide a simple parser target wrapper class (say, "EventBuilder") that
the parser would simply recognise with isinstance(). Then it would unpack the
wrapped target and
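(The EventBuilder name and the isinstance() hand-off are only proposed above;
as a rough pure-Python sketch of the idea, such a wrapper target could
delegate tree building to an inner TreeBuilder and record events on the side:)

    import xml.etree.ElementTree as ET

    class EventBuilder:
        """Hypothetical wrapper target: builds the tree and records events."""
        def __init__(self, target=None):
            self._target = target if target is not None else ET.TreeBuilder()
            self.events = []

        def start(self, tag, attrib):
            elem = self._target.start(tag, attrib)
            self.events.append(('start', elem))
            return elem

        def end(self, tag):
            elem = self._target.end(tag)
            self.events.append(('end', elem))
            return elem

        def data(self, data):
            self._target.data(data)

        def close(self):
            return self._target.close()

    target = EventBuilder()
    parser = ET.XMLParser(target=target)
    parser.feed('<root><a/><b/></root>')
    root = parser.close()
    print([(event, elem.tag) for event, elem in target.events])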
Stefan Behnel added the comment:
> I agree, but this assumes there is a "better API". I would like to see
> an API proposal and a patch before I give an opinion :-)
Well, I have already proposed an API throughout this thread, and then given a
usage example and an almost comple
Stefan Behnel added the comment:
I gave the implementation a try and attached an incomplete patch. Some tests
are failing.
It turns out that it's not entirely easy to do this. As Antoine noticed, the
hack in the C implementation of the TreeBuilder makes it tricky to integrate
with
Stefan Behnel added the comment:
FWIW, lxml.etree supports wildcards like '{*}tag' in searches, and this is
otherwise quite rarely a problem in practice.
I'm -1 on the proposed feature and wouldn't mind rejecting this altogether.
(At least change the title to someth
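(To illustrate the wildcard syntax mentioned; this is lxml behaviour, and the
namespace URI and tag names below are invented.)

    from lxml import etree

    root = etree.fromstring(
        '<root xmlns="http://example.org/ns"><item>a</item><item>b</item></root>')

    # '{*}tag' matches 'item' in any namespace (or none) in lxml's find/findall
    print([el.text for el in root.findall('{*}item')])   # -> ['a', 'b']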
Stefan Behnel added the comment:
Rejecting this ticket was the right thing to do. It's not a bug but a feature.
In Python 2.x, ElementTree returns any text content that can correctly be
represented as an ASCII encoded string in the native Py2.x string type (i.e.
'str'). Only no
Stefan Behnel added the comment:
There's also the QName class which can be used to split qualified tag names.
And it's pretty trivial to pre-process the entire tree by stripping all
namespaces from it if the intention is really to do namespace-agnostic processing.
However, in my experi
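(A minimal sketch of the pre-processing mentioned here, stripping the '{uri}'
prefix from tags and attribute names in place; the helper name and example
document are illustrative.)

    import xml.etree.ElementTree as ET

    def strip_namespaces(root):
        for elem in root.iter():
            if isinstance(elem.tag, str) and elem.tag.startswith('{'):
                elem.tag = elem.tag.split('}', 1)[1]
            for name in list(elem.attrib):
                if name.startswith('{'):
                    elem.attrib[name.split('}', 1)[1]] = elem.attrib.pop(name)

    root = ET.fromstring('<a xmlns="urn:x"><b c="1"/></a>')
    strip_namespaces(root)
    print(ET.tostring(root))   # tags are namespace-free now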
Stefan Behnel added the comment:
Just to reiterate this point, lxml.etree supports a "pretty_print" flag in its
tostring() function and ElementTree.write(). It would thus make sense to
support the same thing in ET.
http://lxml.de/api.html#serialisation
For completeness, the current
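(For reference, the lxml API referred to above; the indentation shown in the
comments is approximate.)

    from lxml import etree

    root = etree.fromstring('<root><child>text</child></root>')
    print(etree.tostring(root, pretty_print=True).decode())
    # <root>
    #   <child>text</child>
    # </root>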
Stefan Behnel added the comment:
Please leave the title as it is now.
--
title: ElementTree gets awkward to use if there is an xmlns -> ElementTree --
provide a way to ignore namespace in tags and searches
Stefan Behnel added the comment:
As I already suggested for lxml, you can use the QName class to process
qualified names, e.g.
    QName(some_element.tag).localname

Or even just

    QName(some_element).localname
It appears that ElementTree doesn't support this. It lists the QName ty
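(The .localname attribute used above is lxml's QName API; a quick illustration
with an invented namespace URI. The stdlib QName mainly exposes the fully
qualified name via .text.)

    from lxml import etree

    elem = etree.fromstring('<t:tag xmlns:t="http://example.org/ns"/>')
    qname = etree.QName(elem)      # also accepts a plain tag string
    print(qname.localname)         # -> tag
    print(qname.namespace)         # -> http://example.org/ns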
New submission from Stefan Behnel:
The get/set/delitem slicing protocol has replaced the old Py2.x
get/set/delslice protocol in Py3.x. This change introduces a substantial
overhead due to processing indices as Python objects rather than plain
Py_ssize_t values. This overhead should be reduced
Stefan Behnel added the comment:
This tiny patch adds a fast-path to _PyEval_SliceIndex() that speeds up the
slicing-heavy "fannkuch" benchmark by 8% for me.
--
keywords: +patch
Added file: http://bugs.python.org/file31421/faster_PyEval_SliceI
Stefan Behnel added the comment:
Sorry, broken patch. Here's a new one (same results).
--
Added file: http://bugs.python.org/file31422/faster_PyEval_SliceIndex.patch
Stefan Behnel added the comment:
Another patch, originally proposed in issue10227 by Kristján Valur Jónsson. It
uses local variables instead of pointers in PySlice_GetIndicesEx(). I see no
performance difference whatsoever (Linux/gcc 4.7), but also I can't see a
reason not to do this. I
Stefan Behnel added the comment:
And in fact, callgrind shows an *increase* in the number of instructions
executed for the "use locals" patch (898M with vs. 846M without that patch when
running the fannkuch benchmark twice). That speaks against making that change
Stefan Behnel added the comment:
Here is another patch that remembers the Py_ssize_t slice indices if they are
known at instantiation time. It only makes a very small difference for the
"fannkuch" benchmark, so that's no reason to add both the complexity and the
(IMHO ig
Stefan Behnel added the comment:
Ok, so what are we going to do for the next alpha?
--
Python tracker
<http://bugs.python.org/issue17741>
Stefan Behnel added the comment:
I was asking for the current implementation to be removed until we have a
working implementation that hurts neither the API nor the module design.
--
Stefan Behnel added the comment:
I don't think I understand what you mean.
In any case, it's not too late to remove the implementation. There was only one
alpha release so far that included it, so it can't really break any existing
code that relies on it. The longer we wait,
Stefan Behnel added the comment:
Could we please keep the discussion on rational terms?
It's not just the method names. The problem is that you are duplicating an
existing class (the XMLParser) for no good reason, instead of putting the
feature where it belongs: *behind* the XMLParser.
Stefan Behnel added the comment:
Given that it seems to be hard to come to a consensus in this ticket, I've
asked for removal of the code on python-dev.
http://mail.python.org/pipermail/python-dev/2013-August/128095.html
--
Stefan Behnel added the comment:
"""
1. Why have the "event builder" wrap a tree builder? Can't it just be a
separate target?
"""
You need a TreeBuilder in order to build the tree for the events.
If you want to use a different target than a TreeBu
Stefan Behnel added the comment:
I attached a patch that removes the IncrementalParser class and merges its
functionality into the _IterParseIterator. It thus retains most of the
refactoring without adding new functionality and/or APIs.
I did not take a look if anything else from later
Stefan Behnel added the comment:
"""
TreeBuilder has to support an explicit API for collecting and reporting events.
XMLParser has to call into this API and either not have _setevents at all or
have something public and documented. Note also that event hookup in the parser
m
Stefan Behnel added the comment:
> I still think IncrementalParser is worth keeping.
If you want to keep it at all cost, I think we should at least hide it behind a
function (as with iterparse()). If it's implemented as a class, chances are
that people will start relying on inte
Stefan Behnel added the comment:
Actually, let me revise my previous comment. I think we should fake the new
interface for now by adding a TreeEventBuilder that requires having its own
TreeBuilder internally, instead of wrapping an arbitrary target. That way, we
can avoid having to clean up
Stefan Behnel added the comment:
> fully working patches will be considered
Let me remind you that it's not me who wants this feature so badly.
> As for faking the new API, I don't know if that's a good idea because we're
> not yet sure what that new API is.
I
Stefan Behnel added the comment:
> Also, even if the new approach is implemented in the next release,
IncrementalParser can stay as a simple synonym to
XMLParser(target=EventBuilder(...)).
No it can't. According to your signature, it accepts a parser instance as
input. So it can'
Stefan Behnel added the comment:
Hmm, did you look at my last comment at all? It solves both the technical
issues and the API issues very nicely and avoids any problems of potential
future changes. Let me quickly explain why.
The feature in question depends on two existing parts of the API
Stefan Behnel added the comment:
BTW, I also like how short and clean iterparse() becomes when you move this
feature into the parser. It's basically just a convenience function that does
read(), feed(), and yield-from. Plus the usual bit of boilerplate code,
obvi
Stefan Behnel added the comment:
> iterparse's "parser" argument will be deprecated
No need to do that. Update the docs, yes, but otherwise keep the possibility to
improve the implementation later on, without going through a deprecation +
dedeprecation cycle. That would
Stefan Behnel added the comment:
I don't see adding one method to XMLParser as a design problem. In fact, it's
even a good design on the technical side, because if ET ever gains an
HTMLParser, then the implementation of this feature would be highly dependent
on the underlying parse
Stefan Behnel added the comment:
> ... instead require passing in a callback that accepts the target ...
That could be the parser class then, for example, except that there may be
other options to set as well. Plus, it would not actually allow iterparse to
wrap a user provided target. So,
Stefan Behnel added the comment:
> it's really about turning XMLParser's "push" API for events (where the events
> are pushed into the target object by the parser calling the appropriate
> methods), into an iterparse style pull API where the events can be retrieve
Stefan Behnel added the comment:
> in the long run we want the new class to just be a convenience API for
> combining XMLParser and a custom target object, even if it can't be
> implemented that way right now.
Just to be clear: I changed my opinion on this one and I no longer
Stefan Behnel added the comment:
> XMLParser knows nothing about Elements, at least in the direct API of today.
> The one constructing Elements is the target.
Absolutely. And I'm 100% for keeping that distinction exactly as it is.
> The "read_events" method pr
Stefan Behnel added the comment:
Here is a proof-of-concept patch that integrates the functionality of the
IncrementalParser into the XMLParser. I ended up reusing most of Antoine's
implementation and test suite. In case he'll look back into this ticket at some
point, I'll put a
Stefan Behnel added the comment:
(I still wonder why I'm the one writing all the patches here when Eli is the
one who actually wants this feature ...)
--
Stefan Behnel added the comment:
BTW, maybe "read_events()" still isn't the ideal method name to put on a parser.
--
Stefan Behnel added the comment:
> Putting _setevents aside for the moment,
Agreed, obviously.
> XMLParser is a clean and simple API. Its output is only "push" (by calling
> callbacks on the target). It doesn't deal with Elements at all.
We already agreed on that,
Stefan Behnel added the comment:
> The whole point of the new API is not to replace XMLParser, but to provide a
> convenience API to set up a particular combination of an XMLParser with a
> particular kind of custom target.
Ok, but I'm saying that we don't need that. It
Stefan Behnel added the comment:
This is a bit tricky in ET because it generally allows you to stick anything
into the Element properties (and that's a feature). So catching this at tree
building time (as lxml.etree does) isn't really possible.
However, at least catching it in the
Stefan Behnel added the comment:
Go for it. That's usually the fastest way to get things done.
--
Python tracker
<http://bugs.python.org/issue18850>
Stefan Behnel added the comment:
Eli, I agree that we've put way more than enough time into the discussion by
now. We all know each other's arguments and failed to convince each other.
Please come up with working code that shows that the approach you are
advocating for
Stefan Behnel added the comment:
Michele, could you elaborate how you would exploit this issue as a security
risk?
I mean, I can easily create a (non-)XML-document with control characters
manually, and the parser would reject it.
What part of the create-to-serialise process exactly is a
Stefan Behnel added the comment:
> The parser is *not* rejecting control chars.
The parser *is* rejecting control characters. It's an XML parser. See the
example in the link you posted.
> assume you have a script that simply stores each message it receives (from
> stdin, fro
Stefan Behnel added the comment:
Or maybe even to "enhancement". The behaviour that it writes out what you give
it isn't exactly wrong, it's just inconvenient that you have to take care
yourself that you pass it wel
Stefan Behnel added the comment:
> the push API is inactive and gets redirected to a pull API
Given that this is aimed to become a redundant 'convenience' wrapper around
something else at some point, I assume that you are aware that the above is just
an arbitrary restriction due to
Stefan Behnel added the comment:
> I think the point here is clarifying whether xml expect text or just a byte
> string. In case that's a stream of byte, I agree with you, is more a
> "behaviour" problem.
XML is *defined* as a stream of bytes.
Regarding the API
Stefan Behnel added the comment:
>> XML is *defined* as a stream of bytes.
> Can you *paste* the *source* proving what you are arguing, please?
http://www.w3.org/TR/REC-xml/
> python3 works with ElementTree(bytes(unicode))
What does this s
Stefan Behnel added the comment:
We are talking about two different things here.
I said that (serialised) XML is defined as a sequence of bytes. Read the spec
on that.
What you are talking about is the Infoset, or the parsed/generated in-memory
XML tree. That's obviously not bytes,
Stefan Behnel added the comment:
Any comments regarding my naming suggestion?
Calling it a "push" parser is just too ambiguous.
--
Stefan Behnel added the comment:
Erm, "pull" parser, but you see what I mean.
--
Python tracker
<http://bugs.python.org/issue17741>
Stefan Behnel added the comment:
Is that your actual use case? That you *want* to store binary data in XML,
instead of getting it properly rejected as non well-formed content?
Then I suggest going the canonical route of passing it through base64 first, or
any of the other binary-to-characters
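(A minimal example of the base64 route suggested here; the element name and
payload are invented.)

    import base64
    import xml.etree.ElementTree as ET

    payload = b'\x00\x01\x02binary\xff'     # arbitrary binary data

    # encode before putting it into the tree ...
    elem = ET.Element('blob')
    elem.text = base64.b64encode(payload).decode('ascii')
    serialised = ET.tostring(elem)

    # ... and decode again after parsing
    roundtripped = base64.b64decode(ET.fromstring(serialised).text)
    assert roundtripped == payload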
Stefan Behnel added the comment:
> As an advice I hope you do not take as insult, saying
> "in section {section} the spec says {argument}"
> is much more constructive than
> "read the spec on that", "{extremely_obvious_link}",
> at least to people
New submission from Stefan Behnel:
The exception handling clauses in framework_find() are weird.
def framework_find(fn, executable_path=None, env=None):
    """
    Find a framework using dyld semantics in a very loose manner.
    Will take input such as:
Stefan Behnel added the comment:
changing title as it doesn't really look like a typo, more a "converto"
--
title: typo in Lib/ctypes/macholib/dyld.py -> invalid exception handling in
Lib/ctypes/macholib/dyld.py
New submission from Stefan Behnel:
diff --git a/performance/pystone.py b/performance/pystone.py
--- a/performance/pystone.py
+++ b/performance/pystone.py
@@ -59,9 +59,9 @@
 def main(loops=LOOPS):
     benchtime, stones = pystones(loops)
-    print "Pystone(%s) time for %d passes
Stefan Behnel added the comment:
I can well imagine that the serialiser is broken for this in Py2.x, given that
the API accepts byte strings and stores them as such. The fix might be as
simple as decoding byte strings in the serialiser before writing them out.
Involves a pretty high
Stefan Behnel added the comment:
> This would make it possible to layer XMLPullParser on top of the stock
> XMLParser coupled with a special target that collects "events" from the
> callback calls.
Given that we have an XMLPullParser now, I think we should not clutter
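(For context, basic usage of the XMLPullParser discussed in this ticket, as it
ended up in 3.4; the input chunks are arbitrary.)

    from xml.etree.ElementTree import XMLPullParser

    parser = XMLPullParser(events=('start', 'end'))
    parser.feed('<root><item>one</item>')
    for event, elem in parser.read_events():
        print(event, elem.tag)     # start root, start item, end item
    parser.feed('</root>')
    parser.close()
    for event, elem in parser.read_events():
        print(event, elem.tag)     # end root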
Stefan Behnel added the comment:
(fixing subject to properly hit bug filters)
--
title: Make ET event handling more modular to allow custom targets for the
non-blocking parser -> Make ElementTree event handling more modular to allow
custom targets for the non-blocking par
Stefan Behnel added the comment:
While refactoring the iterparse() implementation in lxml to support this new
interface, I noticed that the close() method of the XMLPullParser does not
behave like the close() method of the XMLParser. Instead of setting some .root
attribute on the parser
Stefan Behnel added the comment:
Looks like we missed the alpha2 release for the close() API fix. I recommend
not letting yet another deadline go by.
--
New submission from Stefan Behnel:
The .close() method of the new XMLPullParser (see issue17741) in Py3.4 shows an
unnecessarily complicated behaviour that is inconsistent with the .close()
method of the existing XMLParser.
The attached patch removes some code to fix this
Changes by Stefan Behnel :
--
nosy: +eli.bendersky
Python tracker
<http://bugs.python.org/issue18990>
Stefan Behnel added the comment:
Created separate issue18990 to keep this one closed as is.
--
Python tracker
<http://bugs.python.org/issue17741>
Stefan Behnel added the comment:
Could the

    while thread._count() > c:
        pass

in test_thread.py be changed to this? (as used in other places)

    while thread._count() > c:
        time.sleep(0.01)

It currently hangs in Cython because it doesn't free the GIL during
Stefan Behnel added the comment:
Looks like a new feature to me.
--
versions: +Python 3.4 -Python 3.3
Stefan Behnel added the comment:
Florent, what you describe is exactly the definition of a new feature.
Users even have to change their code in order to make use of it.
--
New submission from Stefan Behnel:
Line 912 of threading.py, Py2.7, reads:
self.queue = deque()
"deque" hasn't been imported.
--
components: Library (Lib)
messages: 167554
nosy: scoder
priority: normal
severity: normal
status: open
title: threading.py con
New submission from Stefan Behnel:
The new importlib shows a regression w.r.t. previous CPython versions. It no
longer recognises an "__init__.so" file as a package. All previous CPython
versions have always tested first for an extension file before testing for a
.py/.pyc fil
Stefan Behnel added the comment:
Additional info: working around this bug from user code is fairly involved
because some FileLoader instances are already created early at initialisation
time and the overall configuration of the FileLoaders is kept in the closure of
a path hook
Stefan Behnel added the comment:
Hi, thanks for bringing in the 'historical details'. It's not so much that
"Cython has been relying on it" - it's entirely up to users what they compile
and what not. It's more that I don't see anything being wrong