[issue31590] CSV module incorrectly treats escaped newlines as new records if unquoted

2017-09-26 Thread Vaibhav Mallya

New submission from Vaibhav Mallya:

I'm writing Python `csv`-based parsers as part of a data processing pipeline 
that includes Redshift and other data stores upstream and downstream. All of 
these data stores make it easy, and treat it as normal 
(http://docs.aws.amazon.com/redshift/latest/dg/r_UNLOAD.html), to generate 
CSV-style data with ESCAPE'd newlines, with or without quotes on the 
columns.

However, the 2.x csv module has a bug: ESCAPE'd newlines in unquoted CSVs 
are not treated as escaped newlines, but as the start of an entirely new 
record. This is at odds with expected behavior in most common data 
warehouses (see the Redshift docs linked above, for example) and is a subtle 
source of bugs for data processing pipelines. After some debugging, we changed 
our Redshift parameters to use ADDQUOTES so we could get around this bug. 

Note - This seems to be a continuation of https://bugs.python.org/issue15927 
which was closed as WONTFIX for 2.x. I think this is a legitimate bug, and 
should be fixed in 2.x. Anyone relying on the old, incorrect behavior probably 
has something else wrong. In my view, the current behavior effectively adds an 
implicit, undocumented dialect to the CSV module.
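
For readers following along, the two input shapes described above can be compared with a short sketch. It is written for Python 3, where (to my knowledge) the issue15927 fix has been applied, so both shapes parse to the same record; per this report, 2.7 splits the unquoted one into two records. The sample data and the helper name `read_rows` are made up for illustration.

```python
import csv
import io


def read_rows(text, **fmtparams):
    """Parse CSV text into a list of rows (hypothetical helper)."""
    # newline="" keeps line endings untouched so the csv module sees them as-is.
    return list(csv.reader(io.StringIO(text, newline=""), **fmtparams))


# One record whose second field contains a backslash-escaped newline.
unquoted = "1,first line\\\nsecond line,x\n"
quoted = '"1","first line\\\nsecond line","x"\n'

# Unquoted columns: quoting is off, so only the escape character protects the newline.
unquoted_rows = read_rows(unquoted, escapechar="\\", quoting=csv.QUOTE_NONE)

# Quoted columns (the ADDQUOTES-style workaround): the newline sits inside quotes.
quoted_rows = read_rows(quoted, escapechar="\\")

print(unquoted_rows)
print(quoted_rows)
```

Comparing the two printed lists on a given interpreter shows whether that version treats the unquoted escaped newline as part of the field or as a record boundary.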

--
components: Library (Lib)
messages: 303025
nosy: mallyvai
priority: normal
severity: normal
status: open
title: CSV module incorrectly treats escaped newlines as new records if unquoted
type: behavior
versions: Python 2.7

___
Python tracker 
<https://bugs.python.org/issue31590>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue31590] CSV module incorrectly treats escaped newlines as new records if unquoted

2017-09-30 Thread Vaibhav Mallya

Vaibhav Mallya  added the comment:

Hello R. David & Terry!

Appreciate your prompt responses. While experimenting with different test cases 
I realized that escaped slashes and newlines are intrinsically annoying to 
reason about as stringy-one-liners, so I threw together a small tarball test 
case - attached - to make sure we're on the same page. 

To be clear, I was referring *solely* to reading with csv.DictReader (we're not 
using the writing part).

The assertion for the multi_line_csv_unquoted fails, and I believe it should 
succeed.

I hadn't considered the design-bug vs code-bug angle. I also think that 
documenting this somehow - explicitly - would help others, since there's no 
mention of the interaction here, with what should be a fairly common use-case. 
It might even make sense to make a "strong recommendation" that everything is 
quoted + escaped (much as redshift makes a strong recommendation to escape).

Our data pipeline is doing fine now that we've set the right parameters on both 
sides; this is more about improving Python for the rest of the community. Thanks for your 
help, I will of course respect any decision you make.

--
Added file: https://bugs.python.org/file47181/csv_test.tar

___
Python tracker 
<https://bugs.python.org/issue31590>
___



[issue5570] Bus error when calling .poll() on a closed Connection from multiprocessing.Pipe()

2009-03-26 Thread Vaibhav Mallya

Vaibhav Mallya  added the comment:

Python 2.6.1 (r261:67515, Mar 22 2009, 05:39:39) 
[GCC 4.2.4 (Ubuntu 4.2.4-1ubuntu4)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from multiprocessing import Pipe
>>> a, b = Pipe()
>>> a.close()
>>> a.poll()
Segmentation fault

Seems like this should raise an exception.
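
Until that is the case, calling code can guard the call itself. The following is only a sketch, written for Python 3, where I believe poll() on a closed connection raises OSError rather than crashing; the helper name `poll_if_open` is hypothetical.

```python
from multiprocessing import Pipe


def poll_if_open(conn, timeout=0.0):
    """Return conn.poll(timeout), or None if the connection is closed."""
    if conn.closed:
        return None
    try:
        return conn.poll(timeout)
    except OSError:
        # Assumption: a handle closed between the check and the call
        # surfaces as OSError on current Python 3 releases.
        return None


a, b = Pipe()
a.close()
result = poll_if_open(a)  # None: the connection was already closed
b.close()
```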

uname -a:
Linux mememy 2.6.24-23-generic #1 SMP Thu Feb 5 15:00:25 UTC 2009 i686
GNU/Linux

Compiled Python 2.6.1 from source.

--
components: +Extension Modules -Library (Lib)
nosy: +mallyvai

___
Python tracker 
<http://bugs.python.org/issue5570>
___



[issue5573] multiprocessing Pipe poll() and recv() semantics.

2009-03-26 Thread Vaibhav Mallya

New submission from Vaibhav Mallya :

Python 2.6.1 (r261:67515, Mar 22 2009, 05:39:39) 
[GCC 4.2.4 (Ubuntu 4.2.4-1ubuntu4)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from multiprocessing import Pipe
>>> parent, child = Pipe()
>>> parent.send(1)
>>> parent.close()
>>> print child.recv()
1
>>> print child.poll()
True
>>> print child.recv()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
EOFError

We have to use both poll() and recv() to determine whether or not the
connection was actually closed.

Better behavior might be returning True on poll() only if the next
recv() on that end of the pipe will work without an error. There may not
be a way to guarantee this, but it would be useful if the documentation
was clarified either way.
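
As things stand, the combined poll()/recv() check can be wrapped up as in the sketch below. It is written for Python 3 and assumes, as in the session above, that poll() reports True once the other end has closed; `recv_if_alive` is a hypothetical helper, not part of multiprocessing.

```python
from multiprocessing import Pipe


def recv_if_alive(conn, timeout=1.0):
    """Return (alive, value) for one read attempt (hypothetical helper).

    alive is False once the other end has closed and no data is left.
    value is None when nothing was received.
    """
    if not conn.poll(timeout):
        # Nothing to read yet; the other end may still be open.
        return True, None
    try:
        return True, conn.recv()
    except EOFError:
        # poll() said "readable", but the only thing to read was end-of-file.
        return False, None


parent, child = Pipe()
parent.send(1)
parent.close()

first = recv_if_alive(child)   # the pending object is delivered first
second = recv_if_alive(child)  # then the closed pipe is detected
child.close()
```

Note that the check consumes a message whenever one is pending, so the caller has to be ready to handle the returned value.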


uname -a:
Linux mememy 2.6.24-23-generic #1 SMP Thu Feb 5 15:00:25 UTC 2009 i686
GNU/Linux

Compiled Python 2.6.1 from source.

--
assignee: georg.brandl
components: Documentation, Library (Lib)
messages: 84204
nosy: georg.brandl, mallyvai
severity: normal
status: open
title: multiprocessing Pipe poll() and recv() semantics.
type: behavior
versions: Python 2.6

___
Python tracker 
<http://bugs.python.org/issue5573>
___



[issue5573] multiprocessing Pipe poll() and recv() semantics.

2009-03-26 Thread Vaibhav Mallya

Changes by Vaibhav Mallya :


--
nosy: +jnoller

___
Python tracker 
<http://bugs.python.org/issue5573>
___



[issue5574] multiprocessing queues.py doesn't include JoinableQueue in its __all__ list

2009-03-26 Thread Vaibhav Mallya

New submission from Vaibhav Mallya :

Should __all__ = ['Queue', 'SimpleQueue'] in queues.py have
JoinableQueue as part of the list as well? 

Also, multiprocessing's __init__.py does not appear to have SimpleQueue
as part of its __all__ - is this expected?

SimpleQueue does not appear in the multiprocessing docs; is it meant to
be avoided by user code then?

--
assignee: georg.brandl
components: Documentation, Library (Lib)
messages: 84212
nosy: georg.brandl, jnoller, mallyvai
severity: normal
status: open
title: multiprocessing queues.py doesn't include JoinableQueue in its __all__ 
list
type: feature request
versions: Python 2.6, Python 2.7, Python 3.0

___
Python tracker 
<http://bugs.python.org/issue5574>
___



[issue5573] multiprocessing Pipe poll() and recv() semantics.

2009-03-26 Thread Vaibhav Mallya

Vaibhav Mallya  added the comment:

On second thought, the current behavior doesn't seem to make sense: it
forces a destructive check. Suppose we do child.poll() and then child.recv(),
and the result is legitimate data; that data is removed from the queue even
if we only wanted to check whether the pipe was alive. This seems like it
shouldn't have to happen.

I'm unfamiliar with the lower level workings of sockets; is this
destructive checking behavior forced by the socket internals? Is it
standard?

--

___
Python tracker 
<http://bugs.python.org/issue5573>
___



[issue5857] Return namedtuples from tokenize token generator

2009-04-27 Thread Vaibhav Mallya

New submission from Vaibhav Mallya :

Returning an anonymous 5-tuple seems like a suboptimal interface since
it's so easy to accidentally confuse, for example, the indices of start
and end. I've used tokenize.py for several scripts in the past few weeks
and I've always ended up writing some sort of wrapper function for
generate_tokens that names the returned tuple's fields to help me avoid
mistakes like this.

I'd like to propose the following patch that simply decorates the
generate_tokens function and names the fields of its return values. Since it's
a namedtuple, it should be fully backwards compatible with the existing
interface, but also allow member access via 

next_token.type
next_token.string
next_token.start.row, next_token.start.col
next_token.end.row, next_token.end.col
next_token.line

If this seems like a reasonable way to do things, I'd be happy to submit
relevant doc patches as well as the corresponding patch for 3.0.
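
The kind of wrapper described above might look like the sketch below. This is not the attached patch; it is an illustration written for Python 3, and the names `Position`, `Token`, and `named_tokens` are hypothetical.

```python
import io
import tokenize
from collections import namedtuple

# Hypothetical field names, following the attribute list in the proposal.
Position = namedtuple("Position", "row col")
Token = namedtuple("Token", "type string start end line")


def named_tokens(readline):
    """Yield tokens from generate_tokens with named fields and named positions."""
    for tok_type, string, start, end, line in tokenize.generate_tokens(readline):
        yield Token(tok_type, string, Position(*start), Position(*end), line)


source = "x = 1\n"
tokens = list(named_tokens(io.StringIO(source).readline))
first = tokens[0]

# Attribute access and plain tuple access both work on the same object.
print(first.string, first.start.row, first.start.col)
print(first[1], first[2][0], first[2][1])
```

Because namedtuples are still tuples, existing code that unpacks or indexes the 5-tuple keeps working unchanged.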

--
components: Library (Lib)
files: mallyvai_tokenize.patch
keywords: patch
messages: 86691
nosy: mallyvai
severity: normal
status: open
title: Return namedtuples from tokenize token generator
type: feature request
versions: Python 2.6, Python 2.7, Python 3.0, Python 3.1
Added file: http://bugs.python.org/file13797/mallyvai_tokenize.patch

___
Python tracker 
<http://bugs.python.org/issue5857>
___



[issue5857] Return namedtuples from tokenize token generator

2009-04-27 Thread Vaibhav Mallya

Vaibhav Mallya  added the comment:

Well, the reason I put in the inner row/col namedtuple initially was
because the first mistake I made with the original module was mixing up
the row/col indices for a particular case. It certainly caused all sorts
of weird headaches. :o)

I mean, it seems like there's no real reason it "should" be (row,col)
instead of (col,row) in the returned tuple; that is, it feels like the
ordering is arbitrary in and of itself.

I really feel that allowing for start.row and start.col would make the
interface completely explicit and valid semantically.

Agreed with the other two points, however.

Also, I take it there's going to be a need for an addendum to the test
suite, since the interface is being modified?

--

___
Python tracker 
<http://bugs.python.org/issue5857>
___



[issue5906] Risk of confusion in multiprocessing module - daemonic processes

2009-05-04 Thread Vaibhav Mallya

Vaibhav Mallya  added the comment:

I understand pakal's line of reasoning. The term 'daemon' in the general
Unix sense has a specific meaning that is at odds with the
multiprocessing module's usage of 'daemon'. Clarification would be
useful, I feel, especially if an outright rename of that part of the API
is out of the question.

--
nosy: +mallyvai

___
Python tracker 
<http://bugs.python.org/issue5906>
___



[issue6009] optparse docs say 'default' keyword is deprecated but uses it in most examples

2009-05-12 Thread Vaibhav Mallya

New submission from Vaibhav Mallya :

The first example, and several subsequent examples later on in the
optparse docs, use 'default' as an argument, even though it's apparently
deprecated in favor of set_defaults. At the risk of overstating the
obvious, this seems to be inconsistent. Even the section on defaults
http://docs.python.org/library/optparse.html#default-values uses the
'default' keyword without stressing its deprecation. It might make more
sense to leave it out of all of the examples altogether, replacing it
with the appropriate set_defaults invocations.
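
For comparison, here are the two spellings side by side. This is a small sketch; the option name and the two builder functions are made up for illustration, and both parsers should end up with the same default.

```python
from optparse import OptionParser


def parser_with_keyword():
    """Default supplied through the 'default' keyword of add_option."""
    parser = OptionParser()
    parser.add_option("-v", "--verbose", action="store_true",
                      dest="verbose", default=False)
    return parser


def parser_with_set_defaults():
    """Same option, with the default supplied through set_defaults."""
    parser = OptionParser()
    parser.add_option("-v", "--verbose", action="store_true", dest="verbose")
    parser.set_defaults(verbose=False)
    return parser


# Parse an empty argument list so only the defaults are in play.
keyword_opts, _ = parser_with_keyword().parse_args([])
defaults_opts, _ = parser_with_set_defaults().parse_args([])
print(keyword_opts.verbose, defaults_opts.verbose)
```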

--
assignee: georg.brandl
components: Documentation, Library (Lib)
messages: 87668
nosy: georg.brandl, mallyvai
severity: normal
status: open
title: optparse docs say 'default' keyword is deprecated but uses it in most 
examples
type: behavior
versions: Python 2.6, Python 2.7, Python 3.0, Python 3.1, Python 3.2

___
Python tracker 
<http://bugs.python.org/issue6009>
___



[issue31590] CSV module incorrectly treats escaped newlines as new records if unquoted

2017-09-29 Thread Vaibhav Mallya (mallyvai)

Vaibhav Mallya (mallyvai)  added the comment:

If there's any way this can be documented, that would be a big help, at
least. Other folks have run into this, and the current
behavior is implicit.

On Sep 29, 2017 5:44 PM, "R. David Murray"  wrote:

R. David Murray  added the comment:

I'm pretty hesitant to make this kind of change in python2.  I'm going to
punt, and let someone else make the decision.  Which means if no one does,
the status quo will win.  Sorry about that.

--
nosy: +Vaibhav Mallya (mallyvai)

___
Python tracker 
<https://bugs.python.org/issue31590>
___