On Thu, Dec 1, 2011 at 17:39, Charles R Harris <charlesr.har...@gmail.com> wrote:
> Given that strings should be the result, this looks like a bug. It's a bit
> of a corner case that probably slipped through during the recent work on
> casting. There needs to be tests for these sorts of things, so if you find
> more oddities post them so we can add them.

I'm happy to add a patch and tests, but could use some guidance...

It looks like discover_itemsize() in core/src/multiarray/ctors.c should compute the length of the string or unicode representation of the object based on the eventual type, but judging from UNICODE_setitem() and STRING_setitem() in core/src/multiarray/arraytypes.c.src, this is not trivial.

Perhaps the object-to-unicode/string parts of UNICODE_setitem/STRING_setitem can be extracted into separate functions that can be called from *_setitem as well as from discover_itemsize. discover_itemsize would also need to know which type it is discovering for (string, unicode, or user-defined). I'm not sure how to handle user-defined types (raise an error?).

If that is too complicated, maybe discover_itemsize should return -1 if asked to discover from data that doesn't have a length (it could warn instead, but given the danger of truncation, that seems a bit weak). This would result in dtype=object when np.array is handed a mixed int/string list.

I wonder, also, whether STRING_setitem and UNICODE_setitem should emit a warning when asked to truncate data?

Ray Jones

_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion
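
To make the first proposal above concrete, here is a rough pure-Python model of the length discovery it describes. This is only a sketch under assumptions: the name discover_itemsize_sketch and its kind argument are made up for illustration and are not NumPy API, str() stands in for whatever conversion the *_setitem functions actually perform, and returning -1 for a user-defined kind is just one of the options raised above.

```python
def discover_itemsize_sketch(items, kind):
    """Hypothetical model of the proposed behaviour; not NumPy code.

    kind is "S" (string) or "U" (unicode). Any other kind is treated
    as a user-defined type, for which this sketch gives up with -1.
    """
    if kind not in ("S", "U"):
        return -1

    longest = 0
    for item in items:
        if isinstance(item, (list, tuple)):
            # Nested sequence: recurse and propagate a failure.
            n = discover_itemsize_sketch(item, kind)
            if n < 0:
                return -1
        elif isinstance(item, bytes):
            # Byte strings already have a length; use it directly.
            n = len(item)
        else:
            # Assumption: str() approximates the conversion done at
            # assignment time, so ints and floats get measured too.
            n = len(str(item))
        longest = max(longest, n)
    return longest


# A mixed int/string list: the int's representation is the longest item.
size = discover_itemsize_sketch([12345, "ab"], "S")
```

With this model the item size for a mixed list is driven by the longest converted representation rather than only by the items that already have a length, which is the case where truncation can otherwise occur.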