[issue38011] xml.dom.pulldom splits text data at buffer size when parsing from file

2019-09-02 Thread Stefan Behnel
Stefan Behnel added the comment: I don't see anything inherently wrong with having multiple text nodes. In fact, input with very large text content can be considered a security threat (cf. compression bombs), so a tool like pulldom (which is designed for incremental processing) should not s

[issue38011] xml.dom.pulldom splits text data at buffer size when parsing from file

2019-09-02 Thread Karthikeyan Singaravelan
Change by Karthikeyan Singaravelan: nosy: +scoder

[issue38011] xml.dom.pulldom splits text data at buffer size when parsing from file

2019-09-02 Thread Noam Sturmwind
Noam Sturmwind added the comment: I believe this is working as intended, but is potentially surprising behavior. If so, perhaps a note could be added to the xml.dom documentation mentioning that this needs to be accounted for. Per https://stackoverflow.com/a/317494 a correct way to read the

[issue38011] xml.dom.pulldom splits text data at buffer size when parsing from file

2019-09-02 Thread Noam Sturmwind
Noam Sturmwind added the comment: Note that the parser handles it correctly if the buffer boundary lies in the middle of a tag name; only if it lies in the middle of text data does it result in this behavior.
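
The difference between tag names and text data can be seen one level below pulldom, at the SAX layer it is built on. The following is a minimal sketch, not code from the issue: the names TextCollector and feed_in_pieces are hypothetical, and the chunk boundaries are chosen by hand so that one falls inside a tag name and one inside text. How many characters() calls actually occur may vary with the Python and expat versions, so the sketch only relies on the concatenated result.

```python
import xml.sax


class TextCollector(xml.sax.ContentHandler):
    """Hypothetical handler that records element names and text chunks."""

    def __init__(self):
        super().__init__()
        self.started = []  # element names, in the order they were opened
        self.chunks = []   # every piece of text passed to characters()

    def startElement(self, name, attrs):
        self.started.append(name)

    def characters(self, content):
        # SAX may deliver contiguous text in one call or in several,
        # so each piece is stored and joined later.
        self.chunks.append(content)


def feed_in_pieces(pieces):
    """Feed byte chunks to an incremental SAX parser one at a time."""
    parser = xml.sax.make_parser()
    handler = TextCollector()
    parser.setContentHandler(handler)
    for piece in pieces:
        parser.feed(piece)
    parser.close()
    return handler


# The first boundary falls inside the tag name "item",
# the second inside the text "hello world".
handler = feed_in_pieces([b"<root><it", b"em>hello ", b"world</item></root>"])
print(handler.started)
print(len(handler.chunks), "".join(handler.chunks))
```

The tag name is reported once and whole, while the text may arrive in more than one piece; joining the pieces recovers the full string either way.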

[issue38011] xml.dom.pulldom splits text data at buffer size when parsing from file

2019-09-02 Thread Noam Sturmwind
New submission from Noam Sturmwind: Python 3.7.4. When parsing a file using xml.dom.pulldom.parse(), if the parser is in the middle of text data when default_bufsize is reached, it will split the text into multiple DOM Text nodes. This breaks code that reads the text data using node
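
One way to write reading code that does not depend on where the buffer boundary falls is to concatenate all Text children of an element instead of reading a single child. The sketch below is an illustration under assumptions, not the reporter's code: element_text and read_texts are hypothetical helper names, the sample document is made up, and bufsize=16 is an artificially small value chosen so that the boundary lands inside the text.

```python
import io
from xml.dom import pulldom


def element_text(node):
    """Join the data of all direct Text children of an element."""
    return "".join(
        child.data
        for child in node.childNodes
        if child.nodeType == child.TEXT_NODE
    )


def read_texts(stream, tag, bufsize=None):
    """Return the text of every <tag> element found in the stream."""
    events = pulldom.parse(stream, bufsize=bufsize)
    texts = []
    for event, node in events:
        if event == pulldom.START_ELEMENT and node.tagName == tag:
            # expandNode() pulls in the element's children, which may
            # include several adjacent Text nodes.
            events.expandNode(node)
            texts.append(element_text(node))
    return texts


# 100 characters of text read through a 16-byte buffer.
xml_bytes = b"<root><item>" + b"x" * 100 + b"</item></root>"
texts = read_texts(io.BytesIO(xml_bytes), "item", bufsize=16)
print(len(texts[0]))
```

Calling node.normalize() after expandNode() is another option: it merges adjacent Text nodes in place, so code that reads a single child keeps working.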