[issue1818] Add named tuple reader to CSV module

2009-02-25 Thread Rob Renaud

Rob Renaud  added the comment:

I am totally new to Python dev.  I reinvented a NamedTupleReader
tonight, only to find out that it was created a year ago.  My primary
motivation is that DictReader reads headers nicely, but DictWriter
totally sucks at handling them.

Consider doing some filtering on a csv file, like so.

sample_data = [
'title,latitude,longitude',
'OHO Ofner & Hammecke Reinigungsgesellschaft mbH,48.128265,11.610848',
'Kitchen Kaboodle,45.544241,-122.715728',
'Walgreens,28.339727,-81.596367',
'Gurnigel Pass,46.731944,7.447778'
]

import csv

def filter_with_dict_reader_writer():
  accepted_rows = []
  for row in csv.DictReader(sample_data):
    if float(row['latitude']) > 0.0 and float(row['longitude']) > 0.0:
      accepted_rows.append(row)

  # Re-read the header line to recover the original column order.
  field_names = next(csv.reader(sample_data))
  output_writer = csv.DictWriter(open('accepted_by_dict.csv', 'w'),
                                 field_names)
  # Write the header by hand as a name -> name identity row.
  output_writer.writerow(dict(zip(field_names, field_names)))
  output_writer.writerows(accepted_rows)

You have to work so hard to maintain the headers when you write the file
with DictWriter.  I understand this is a limitation of dicts throwing
away the order information.  But namedtuples don't have that problem.
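
For comparison, here is a minimal sketch of the same filter, assuming a
later Python version in which DictWriter has a writeheader() method (it
did not exist when this comment was written).  The shortened sample list,
the filter_rows name, and the in-memory buffer are only for illustration.

```python
import csv
import io

sample_data = [
    'title,latitude,longitude',
    'Kitchen Kaboodle,45.544241,-122.715728',
    'Gurnigel Pass,46.731944,7.447778',
]

def filter_rows(lines, out):
    reader = csv.DictReader(lines)
    # reader.fieldnames keeps the column order of the header line.
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames)
    writer.writeheader()
    for row in reader:
        if float(row['latitude']) > 0.0 and float(row['longitude']) > 0.0:
            writer.writerow(row)

buf = io.StringIO()
filter_rows(sample_data, buf)
```

The column order still has to be passed to the writer explicitly; only
the hand-built identity row goes away.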

NamedTupleReader and NamedTupleWriter should be inverses.  This means
that NamedTupleWriter needs to write headers.  The following should
produce the same output as the dict writer example, but it's much cleaner.

def filter_with_named_tuple_reader_writer():
  accepted_rows = []
  for row in csv.NamedTupleReader(sample_data):
    if float(row.latitude) > 0.0 and float(row.longitude) > 0.0:
      accepted_rows.append(row)

  output_writer = csv.NamedTupleWriter(
      open('accepted_by_named_tuple.csv', 'w'))
  output_writer.writerows(accepted_rows)

I patched on top of the existing NamedTupleWriter patch adding support
for writing headers.  I don't know if that's bad style/etiquette, etc.
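
As a rough illustration of the reader/writer pairing described here, the
sketch below builds both on top of csv.reader, csv.writer and
collections.namedtuple.  It is not the attached patch: the
named_tuple_reader function, the NamedTupleWriter class, and its
write_header flag are hypothetical names, and the sketch assumes the
header names are valid Python identifiers.

```python
import csv
import io
from collections import namedtuple

def named_tuple_reader(lines, typename='Row'):
    """Yield one namedtuple per data row; field names come from the header."""
    reader = csv.reader(lines)
    row_type = namedtuple(typename, next(reader))
    for values in reader:
        yield row_type._make(values)

class NamedTupleWriter(object):
    """Hypothetical writer that emits the header once, before the first row."""

    def __init__(self, f, write_header=True):
        self._writer = csv.writer(f)
        self._need_header = write_header

    def writerow(self, row):
        if self._need_header:
            # The field order is carried by the namedtuple itself.
            self._writer.writerow(row._fields)
            self._need_header = False
        self._writer.writerow(row)

    def writerows(self, rows):
        for row in rows:
            self.writerow(row)

sample = [
    'title,latitude,longitude',
    'Walgreens,28.339727,-81.596367',
    'Gurnigel Pass,46.731944,7.447778',
]
out = io.StringIO()
writer = NamedTupleWriter(out)
writer.writerows(r for r in named_tuple_reader(sample)
                 if float(r.latitude) > 0.0 and float(r.longitude) > 0.0)
```

Tracking the header with a flag means it is written at most once, however
writerow and writerows are mixed.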

--
nosy: +rrenaud
Added file: http://bugs.python.org/file13187/named_tuple_write_header.patch

___
Python tracker 
<http://bugs.python.org/issue1818>
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue1818] Add named tuple reader to CSV module

2009-02-25 Thread Rob Renaud

Rob Renaud  added the comment:

My previous patch could write the header twice.  But I am not sure
how the writer should handle the fieldnames parameter on one hand,
and the namedtuple._fields on the other.

Added file: http://bugs.python.org/file13188/named_tuple_write_header2.patch




[issue1818] Add named tuple reader to CSV module

2009-02-25 Thread Rob Renaud

Changes by Rob Renaud :


Removed file: http://bugs.python.org/file13187/named_tuple_write_header.patch




[issue1818] Add named tuple reader to CSV module

2009-02-26 Thread Rob Renaud

Rob Renaud  added the comment:

I want to make sure I understand.  Am I correct in believing that Skip
thinks writing headers should be optional, while Jervis believes we
should leave the burden to the NamedTupleWriter client?  

I agree that we should not unconditionally write headers, but I think
that we should write headers by default, much like we read them by default.

I believe the implicit header writing is very elegant, and the only
reason that the DictWriter object doesn't write headers is the impedance
mismatch between dicts and CSV.  namedtuples have the field order
information, so the impedance mismatch is gone and we should no longer be
hindered.  Implicitly reading but not implicitly writing headers just
seems wrong.

It also seems wrong to require the construction of "header" namedtuple
objects.  It's much less natural than dicts holding identity mappings.

>>> Point._make(Point._fields)
Point(x='x', y='y')

To me, that just looks weird and non-obvious.  That Point instance
doesn't really fit in my mind as something that should be a Point.
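
To make the contrast concrete, here is a small sketch.  The Point type
and the in-memory buffer are assumptions for illustration; a plain
csv.writer stands in for the proposed writer.  It builds the "header"
Point and then shows that the field names can be written directly
without one.

```python
import csv
import io
from collections import namedtuple

Point = namedtuple('Point', ['x', 'y'])

# The "header" instance: a Point whose coordinates are its own field names.
header_as_point = Point._make(Point._fields)

buf = io.StringIO()
writer = csv.writer(buf)
# Writing the field names as a plain tuple avoids building that odd instance.
writer.writerow(Point._fields)
writer.writerow(Point(1, 2))
```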




[issue1818] Add named tuple reader to CSV module

2009-02-26 Thread Rob Renaud

Rob Renaud  added the comment:

I did a search on Google code for the DictReader constructor.  I
analyzed the first 3 pages, the fieldnames parameter was used in 14 of
27 cases (discounting unittest code built into Python) and was not
used in 13 of 27 cases.  I suppose that means headered csv files are
sufficiently rare that they shouldn't be created implicitly by
default.  I still don't like the lack of symmetry of supporting
implicit header reads, but not implicit header writes.

On Thu, Feb 26, 2009 at 8:00 PM, Skip Montanaro  wrote:
>
> Skip Montanaro  added the comment:
>
> More concretely, I don't think this is so onerous:
>
>     names = ["col1", "col2", "color"]
>     writer = csv.DictWriter(open("f.csv", "wb"), fieldnames=names, ...)
>     writer.writerow(dict(zip(names, names)))
>     ...
>
> or
>
>     f = open("f.csv", "rb")
>     names = csv.reader(f).next()
>     reader = csv.DictReader(f, fieldnames=names, ...)
>     ...
>
> Skip




[issue1551113] random.choice(setinstance) fails

2009-03-26 Thread Rob Renaud

Rob Renaud  added the comment:

I found this via a Google search when disappointed that random.choice
raised an exception rather than returning a random item in the set.

It's quite easy to implement random.choice for sets/dicts in O(1)
expected time in the C implementation, as long as the set/dict
implementation guarantees a minimum constant density.  Simply generate
random indices into the set object's table until one holding an object
is found.  This will take an expected O(1/density) probes.

I suppose making random.choice work for sets/dicts isn't worth a C
implementation (as happy as it would have made me a few hours ago...)?
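
The probing idea can be sketched in pure Python.  This is only an
illustration under stated assumptions: a plain list stands in for the
hash table's slot array, EMPTY marks an unused slot, and the caller
supplies the number of used slots.  None of this reflects the actual
layout of CPython's set or dict objects.

```python
import random

EMPTY = object()  # sentinel marking an unused slot (assumption of this sketch)

def choice_from_slots(slots, used, rng=random):
    """Return a uniformly random occupied entry by probing random indices."""
    if used == 0:
        raise IndexError('cannot choose from an empty table')
    while True:
        entry = slots[rng.randrange(len(slots))]
        if entry is not EMPTY:
            return entry

# 3 of 8 slots are occupied, so about 8/3 probes are expected per call.
slots = [EMPTY, 'a', EMPTY, 'b', EMPTY, EMPTY, 'c', EMPTY]
picked = choice_from_slots(slots, used=3)
```

Every occupied slot is equally likely to be hit on each probe, so the
result is uniform over the stored items.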

--
nosy: +rrenaud

___
Python tracker 
<http://bugs.python.org/issue1551113>
___