[issue31590] CSV module incorrectly treats escaped newlines as new records if unquoted
New submission from Vaibhav Mallya: I'm writing Python `csv`-based parsers as part of a data processing pipeline that includes Redshift and other data stores upstream and downstream. In all of these data stores (http://docs.aws.amazon.com/redshift/latest/dg/r_UNLOAD.html) it is easy and expected to generate CSV-style data with ESCAPE'd newlines, with or without quotes around the columns. Challenge: however, the 2.x CSV module has a bug where ESCAPE'd newlines in unquoted CSVs are not treated as escaped newlines, but as the start of entirely new records. This is at odds with the expected behavior of most common data warehouses (see the Redshift docs linked above, for example) and is a subtle source of bugs in data processing pipelines. After some debugging, we changed our Redshift parameters to ADDQUOTES to get around this bug. Note: this seems to be a continuation of https://bugs.python.org/issue15927, which was closed as WONTFIX for 2.x. I think this is a legitimate bug and should be fixed in 2.x. If someone is relying on the old, incorrect behavior, that might mean something else is wrong. In my view, the current behavior effectively adds an implicit, undocumented dialect to the CSV module. -- components: Library (Lib) messages: 303025 nosy: mallyvai priority: normal severity: normal status: open title: CSV module incorrectly treats escaped newlines as new records if unquoted type: behavior versions: Python 2.7 ___ Python tracker <https://bugs.python.org/issue31590> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
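A minimal sketch of the reading side, assuming Python 3 (where the fix from issue15927 landed), a backslash as the ESCAPE character, and unquoted output such as Redshift's UNLOAD with ESCAPE but without ADDQUOTES. The sample data and the `read_escaped_unquoted` helper are made up for illustration. With `escapechar` set, the reader should keep the escaped newline inside the field instead of starting a new record:

```python
import csv
import io

# Hypothetical sample of unquoted, backslash-escaped output: the "comment"
# value of the first data row contains an escaped newline.
SAMPLE = "id,comment\n1,first line\\\nsecond line\n2,plain\n"

def read_escaped_unquoted(text):
    """Parse unquoted CSV where a backslash escapes delimiters and newlines."""
    reader = csv.DictReader(
        io.StringIO(text),
        escapechar="\\",         # assumption: backslash is the ESCAPE character
        quoting=csv.QUOTE_NONE,  # fields are never wrapped in quotes
    )
    return list(reader)

for row in read_escaped_unquoted(SAMPLE):
    print(row)
```

Per this report, Python 2.7 instead ends the record at the escaped newline, which is the mismatch with warehouse exports described above.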
[issue31590] CSV module incorrectly treats escaped newlines as new records if unquoted
Vaibhav Mallya added the comment: Hello R. David & Terry! I appreciate your prompt responses. While experimenting with different test cases, I realized that escaped slashes and newlines are intrinsically annoying to reason about as stringy one-liners, so I threw together a small tarball test case (attached) to make sure we're on the same page. To be clear, I was referring *solely* to reading with csv.DictReader (we're not using the writing part). The assertion for multi_line_csv_unquoted fails, and I believe it should succeed. I hadn't considered the design-bug vs. code-bug angle. I also think that documenting this behavior explicitly would help others, since the docs currently don't mention this interaction, even though it should be a fairly common use case. It might even make sense to add a "strong recommendation" that everything be quoted and escaped (much as Redshift strongly recommends escaping). Our data pipeline is doing fine now that both sides use the right parameters; this is more about improving Python for the rest of the community. Thanks for your help; I will of course respect any decision you make. -- Added file: https://bugs.python.org/file47181/csv_test.tar ___ Python tracker <https://bugs.python.org/issue31590> ___
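The workaround described in this thread (quoting every field, as Redshift's ADDQUOTES does) can be sketched with the standard library alone. This minimal example assumes Python 3 and uses made-up rows; `round_trip_quoted` is a hypothetical helper. It writes every field quoted with `csv.QUOTE_ALL` and reads the text back; embedded newlines survive because they sit inside quotes:

```python
import csv
import io

def round_trip_quoted(rows):
    """Write rows with every field quoted, then parse them back."""
    out = io.StringIO(newline="")  # newline="" keeps csv's own line endings intact
    csv.writer(out, quoting=csv.QUOTE_ALL).writerows(rows)
    text = out.getvalue()
    return text, list(csv.reader(io.StringIO(text, newline="")))

text, parsed = round_trip_quoted([["1", "line one\nline two"], ["2", "plain"]])
print(repr(text))
print(parsed)
```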
[issue5570] Bus error when calling .poll() on a closed Connection from multiprocessing.Pipe()
Vaibhav Mallya added the comment:
Python 2.6.1 (r261:67515, Mar 22 2009, 05:39:39)
[GCC 4.2.4 (Ubuntu 4.2.4-1ubuntu4)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from multiprocessing import Pipe
>>> a, b = Pipe()
>>> a.close()
>>> a.poll()
Segmentation fault

Seems like this should raise an exception.
uname -a: Linux mememy 2.6.24-23-generic #1 SMP Thu Feb 5 15:00:25 UTC 2009 i686 GNU/Linux
Compiled Python 2.6.1 from source.
-- components: +Extension Modules -Library (Lib) nosy: +mallyvai ___ Python tracker <http://bugs.python.org/issue5570> ___
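For comparison, a quick check of what polling a closed Connection does on a modern interpreter. This sketch assumes Python 3, where Connection methods check for a closed handle first; that the exception is `OSError` is an assumption worth confirming against your version. `poll_after_close` is a hypothetical helper:

```python
from multiprocessing import Pipe

def poll_after_close():
    """Return the exception raised by poll() on a closed Connection, or None."""
    a, b = Pipe()
    a.close()
    try:
        a.poll()
    except OSError as exc:  # assumption: closed handles raise OSError on Python 3
        return exc
    finally:
        b.close()
    return None

print(repr(poll_after_close()))
```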
[issue5573] multiprocessing Pipe poll() and recv() semantics.
New submission from Vaibhav Mallya:
Python 2.6.1 (r261:67515, Mar 22 2009, 05:39:39)
[GCC 4.2.4 (Ubuntu 4.2.4-1ubuntu4)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from multiprocessing import Pipe
>>> parent, child = Pipe()
>>> parent.send(1)
>>> parent.close()
>>> print child.recv()
1
>>> print child.poll()
True
>>> print child.recv()
Traceback (most recent call last):
  File "", line 1, in
EOFError

We have to use both poll() and recv() to determine whether or not the connection was actually closed. Better behavior might be for poll() to return True only if the next recv() on that end of the pipe will succeed without an error. There may not be a way to guarantee this, but it would be useful if the documentation were clarified either way.
uname -a: Linux mememy 2.6.24-23-generic #1 SMP Thu Feb 5 15:00:25 UTC 2009 i686 GNU/Linux
Compiled Python 2.6.1 from source.
-- assignee: georg.brandl components: Documentation, Library (Lib) messages: 84204 nosy: georg.brandl, mallyvai severity: normal status: open title: multiprocessing Pipe poll() and recv() semantics. type: behavior versions: Python 2.6 ___ Python tracker <http://bugs.python.org/issue5573> ___
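Given the semantics shown in the transcript, where poll() returns True at end-of-stream and only recv() reveals the EOF, one common pattern is to treat EOFError from recv() as the end-of-stream signal rather than relying on poll() alone. A minimal sketch assuming Python 3; the helper name `drain` is hypothetical:

```python
from multiprocessing import Pipe

def drain(conn):
    """Receive items until the other end is closed and nothing is left."""
    items = []
    while True:
        try:
            items.append(conn.recv())
        except EOFError:  # raised once the peer has closed and the buffer is empty
            return items

parent, child = Pipe()
parent.send(1)
parent.send(2)
parent.close()
print(drain(child))
child.close()
```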
[issue5573] multiprocessing Pipe poll() and recv() semantics.
Changes by Vaibhav Mallya: -- nosy: +jnoller ___ Python tracker <http://bugs.python.org/issue5573> ___
[issue5574] multiprocessing queues.py doesn't include JoinableQueue in its __all__ list
New submission from Vaibhav Mallya: Should __all__ = ['Queue', 'SimpleQueue'] in queues.py have JoinableQueue as part of the list as well? Also, multiprocessing's __init__.py does not appear to have SimpleQueue as part of its __all__ - is this expected? SimpleQueue does not appear in the multiprocessing docs; is it meant to be avoided by user code then? -- assignee: georg.brandl components: Documentation, Library (Lib) messages: 84212 nosy: georg.brandl, jnoller, mallyvai severity: normal status: open title: multiprocessing queues.py doesn't include JoinableQueue in its __all__ list type: feature request versions: Python 2.6, Python 2.7, Python 3.0 ___ Python tracker <http://bugs.python.org/issue5574> ___
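Since `from module import *` only pulls in the names listed in `__all__`, a public class left out of it is invisible to star-imports. Whether that has happened can be checked mechanically. `missing_from_all` below is a hypothetical helper, not part of the standard library; the demo builds a stand-in module mirroring the situation described here rather than importing the real multiprocessing.queues:

```python
import inspect
import types

def missing_from_all(module):
    """Return public classes defined in `module` that its __all__ omits."""
    declared = set(getattr(module, "__all__", ()))
    return sorted(
        name
        for name, obj in vars(module).items()
        if inspect.isclass(obj)
        and obj.__module__ == module.__name__  # skip classes imported from elsewhere
        and not name.startswith("_")
        and name not in declared
    )

# Hypothetical stand-in for queues.py as described in the report.
fake = types.ModuleType("fake_queues")
exec(
    "class Queue: pass\n"
    "class SimpleQueue: pass\n"
    "class JoinableQueue(Queue): pass\n"
    "__all__ = ['Queue', 'SimpleQueue']\n",
    fake.__dict__,
)
print(missing_from_all(fake))
```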
[issue5573] multiprocessing Pipe poll() and recv() semantics.
Vaibhav Mallya added the comment: On second thought, this behavior doesn't seem to make sense, because it forces a destructive check. Suppose we call child.poll() and then child.recv(), but what arrives is legitimate data: that data is removed from the queue even though we only wanted to check whether the pipe was alive. It seems like this shouldn't have to happen. I'm unfamiliar with the lower-level workings of sockets; is this destructive checking behavior forced by the socket internals? Is it standard? -- ___ Python tracker <http://bugs.python.org/issue5573> ___
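One way to make the check non-destructive, sketched as a hypothetical wrapper (`PeekableConnection` is not a multiprocessing API): buffer at most one received item, so asking whether more data is available never discards it. It assumes Python 3 and the behavior shown earlier in this issue, where poll() reports a closed peer as readable and the following recv() raises EOFError:

```python
from multiprocessing import Pipe

class PeekableConnection:
    """Hypothetical wrapper that buffers one item so checks don't lose data."""

    _EMPTY = object()

    def __init__(self, conn):
        self._conn = conn
        self._buffered = self._EMPTY
        self._eof = False

    def has_next(self, timeout=0.0):
        """True if an item is ready; poll(None) would block indefinitely."""
        if self._buffered is not self._EMPTY:
            return True
        if self._eof or not self._conn.poll(timeout):
            return False
        try:
            self._buffered = self._conn.recv()  # keep it instead of discarding it
            return True
        except EOFError:  # peer closed and nothing left to read
            self._eof = True
            return False

    def recv(self):
        if not self.has_next(None):
            raise EOFError
        item, self._buffered = self._buffered, self._EMPTY
        return item

parent, child = Pipe()
parent.send("payload")
parent.close()
peek = PeekableConnection(child)
print(peek.has_next(), peek.recv(), peek.has_next())
```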
[issue5857] Return namedtuples from tokenize token generator
New submission from Vaibhav Mallya: Returning an anonymous 5-tuple seems like a suboptimal interface, since it's so easy to accidentally confuse, for example, the indices of start and end. I've used tokenize.py for several scripts in the past few weeks, and I've always ended up writing some sort of wrapper function for generate_tokens that names the returned tuple's fields to help me avoid mistakes like this. I'd like to propose the following patch, which simply decorates the generate_tokens function and names its return values' fields. Since the result is a namedtuple, it should be fully backwards compatible with the existing interface, but it also allows member access via next_token.type, next_token.string, next_token.start.row, next_token.start.col, next_token.end.row, next_token.end.col, and next_token.line. If this seems like a reasonable way to do things, I'd be happy to submit relevant doc patches as well as the corresponding patch for 3.0. -- components: Library (Lib) files: mallyvai_tokenize.patch keywords: patch messages: 86691 nosy: mallyvai severity: normal status: open title: Return namedtuples from tokenize token generator type: feature request versions: Python 2.6, Python 2.7, Python 3.0, Python 3.1 Added file: http://bugs.python.org/file13797/mallyvai_tokenize.patch ___ Python tracker <http://bugs.python.org/issue5857> ___
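A minimal sketch of the proposed interface as a wrapper rather than a patch, assuming Python 3's `tokenize.generate_tokens(readline)`. `Position`, `Token` and `named_tokens` are illustrative names, not taken from the attached patch. (Python 3's tokenize already yields named tuples with `type`, `string`, `start`, `end` and `line` fields, but `start` and `end` remain plain `(row, col)` tuples, which is the part the nested namedtuple addresses.)

```python
import io
import tokenize
from collections import namedtuple

# Illustrative names; the nested Position gives start/end named fields too.
Position = namedtuple("Position", "row col")
Token = namedtuple("Token", "type string start end line")

def named_tokens(readline):
    """Wrap tokenize.generate_tokens so every field can be accessed by name."""
    for tok_type, string, start, end, line in tokenize.generate_tokens(readline):
        yield Token(tok_type, string, Position(*start), Position(*end), line)

first = next(named_tokens(io.StringIO("x = 1\n").readline))
print(first.string, first.start.row, first.start.col, first.end.col)
# Still a plain tuple underneath, so positional unpacking keeps working.
tok_type, string, (srow, scol), (erow, ecol), line = first
```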
[issue5857] Return namedtuples from tokenize token generator
Vaibhav Mallya added the comment: Well, the reason I put in the inner row/col namedtuple initially was that the first mistake I made with the original module was mixing up the row/col indices for a particular case. It certainly caused all sorts of weird headaches. :o) I mean, there seems to be no real reason it "should" be (row,col) instead of (col,row) in the returned tuple; that is, the ordering feels arbitrary in and of itself. I really feel that allowing for start.row and start.col would make the interface completely explicit and semantically valid. Agreed with the other two points, however. Also, I take it the test suite will need an addendum, since the interface is being modified? -- ___ Python tracker <http://bugs.python.org/issue5857> ___
[issue5906] Risk of confusion in multiprocessing module - daemonic processes
Vaibhav Mallya added the comment: I understand pakal's line of reasoning. The term 'daemon' in the general Unix sense has a specific meaning that is at odds with the multiprocessing module's usage of 'daemon'. Clarification would be useful, I feel, especially if an outright rename of that part of the API is out of the question. -- nosy: +mallyvai ___ Python tracker <http://bugs.python.org/issue5906> ___
[issue6009] optparse docs say 'default' keyword is deprecated but uses it in most examples
New submission from Vaibhav Mallya: The first example, and several subsequent examples later in the optparse docs, use 'default' as an argument, even though it's apparently deprecated in favor of set_defaults. At the risk of overstating the obvious, this seems inconsistent. Even the section on defaults (http://docs.python.org/library/optparse.html#default-values) uses the 'default' keyword without stressing its deprecation. It might make more sense to leave it out of all of the examples altogether, replacing it with the appropriate set_defaults invocations. -- assignee: georg.brandl components: Documentation, Library (Lib) messages: 87668 nosy: georg.brandl, mallyvai severity: normal status: open title: optparse docs say 'default' keyword is deprecated but uses it in most examples type: behavior versions: Python 2.6, Python 2.7, Python 3.0, Python 3.1, Python 3.2 ___ Python tracker <http://bugs.python.org/issue6009> ___
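The style this report suggests for the examples, declaring options without `default=` and supplying defaults through `OptionParser.set_defaults()`, looks roughly like this. The option names and default values are made up for illustration:

```python
from optparse import OptionParser

parser = OptionParser()
parser.add_option("-f", "--file", dest="filename", help="write report to FILE")
parser.add_option("-q", "--quiet", action="store_false", dest="verbose")
# Defaults for every dest in one place instead of default= on each option.
parser.set_defaults(filename="report.txt", verbose=True)

options, args = parser.parse_args([])
print(options.filename, options.verbose)
options, args = parser.parse_args(["-q", "-f", "out.txt"])
print(options.filename, options.verbose)
```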
[issue31590] CSV module incorrectly treats escaped newlines as new records if unquoted
Vaibhav Mallya (mallyvai) added the comment: If there's any way this can be documented, that would at least be a big help. Other folks have run into this too, and the current behavior is implicit.

On Sep 29, 2017 5:44 PM, "R. David Murray" wrote:
> R. David Murray added the comment:
> I'm pretty hesitant to make this kind of change in python2. I'm going to punt, and let someone else make the decision. Which means if no one does, the status quo will win. Sorry about that.

-- nosy: +Vaibhav Mallya (mallyvai) ___ Python tracker <https://bugs.python.org/issue31590> ___