On Thu, Mar 31, 2011 at 5:03 PM, Bruce Southey <[email protected]> wrote:
> On Wed, Mar 30, 2011 at 9:53 PM, Charles R Harris
> <[email protected]> wrote:
>>
>>
>> On Sun, Mar 27, 2011 at 4:09 AM, Paul Anton Letnes
>> <[email protected]> wrote:
>>>
>>> On 26. mars 2011, at 21.44, Derek Homeier wrote:
>>>
>>> > Hi Paul,
>>> >
>>> > having had a look at the other tickets you dug up,
>>> >
> [snip]
>>>
>>> >> 1071:
>>> >> It is not clear to me whether loadtxt is supposed to support
>>> >> missing values in the fashion indicated in the ticket.
>>> >
>>> > In principle it should at least allow you to, by the use of
>>> > converters as described there.
>>> > The problem is, the default delimiter is described as 'any
>>> > whitespace', which in the present implementation obviously
>>> > includes any number of blanks or tabs. These are therefore
>>> > treated differently from delimiters like ',' or '&'. I'd reckon
>>> > there are too many people actually relying on this behaviour to
>>> > silently change it (e.g. I know plenty of tables with columns
>>> > separated by either one or several tabs depending on the length
>>> > of the previous entry). But the tab is apparently also treated
>>> > differently if explicitly specified with "delimiter='\t'" - and
>>> > in that case using a converter à la
>>> > {2: lambda s: float(s or 'Nan')} is working for fields in the
>>> > middle of the line, but not at the end - clearly warrants
>>> > improvement. I've prepared a patch working for Python3 as well.
>>>
>>> Great!
>>>
> This is an invalid ticket because the docstring clearly states, in
> 3 different yet critical places, that missing values are not handled
> here:
>
> "Each row in the text file must have the same number of values."
> "genfromtxt : Load data with missing values handled as specified."
> " This function aims to be a fast reader for simply formatted files.
> The `genfromtxt` function provides more sophisticated handling of,
> e.g., lines with missing values."
>
> Really I am trying to separate the usage of loadtxt and genfromtxt to
> avoid unnecessary duplication and confusion. Part of this is
> historical because loadtxt was added in 2007 and genfromtxt was added
> in 2009. So really certain features of loadtxt have been 'kept' for
> backwards compatibility purposes yet these features can be 'abused' to
> handle missing data. But I really consider that any missing values
> should cause loadtxt to fail.
I agree with you, Bruce, but it would be easier to discuss this on the
tickets instead of here. Could you add your comments there please?

Ralf

> The patch is incorrect because it should not include a space in the
> split() as indicated in the comment by the original reporter. Of
> course a corrected patch alone still is not sufficient to address the
> problem without the user providing the correct converter. Also you
> start to run into problems with multiple delimiters (such as one space
> versus two spaces) so you start down the path to add all the features
> that duplicate genfromtxt.
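
[Editorial illustration] The delimiter behaviour discussed in this thread can be sketched in plain Python. This is only a minimal sketch built on str.split(); it is not NumPy's code and not the patch under discussion. The names split_fields, to_float and strip_all_whitespace are hypothetical, and the idea that stripping the whole line is what loses a trailing empty field is an assumption made for the sake of the example.

```python
import math


def to_float(field):
    # Converter in the style quoted in the thread: an empty field becomes NaN.
    return float(field or "nan")


def split_fields(line, delimiter=None, strip_all_whitespace=True):
    """Split one text row into fields (illustrative only, not NumPy's code).

    delimiter=None mimics 'any whitespace': runs of blanks or tabs collapse
    into a single separator, so an empty field cannot be represented.
    strip_all_whitespace=True strips the whole line first, which also removes
    a trailing tab and with it a trailing empty field (assumed behaviour).
    """
    if strip_all_whitespace:
        line = line.strip()
    else:
        line = line.rstrip("\r\n")  # drop only the line ending
    return line.split(delimiter)


row_middle = "1.0\t\t3.0\n"  # missing value in the middle of the row
row_end = "1.0\t2.0\t\n"     # missing value at the end of the row

# Whitespace mode: the two tabs merge and the missing field disappears.
print(split_fields(row_middle))
# Explicit tab: the empty middle field survives and can be converted.
print([to_float(f) for f in split_fields(row_middle, "\t")])
# Explicit tab, full strip: the trailing empty field is lost.
print(split_fields(row_end, "\t"))
# Explicit tab, line ending stripped only: the trailing field is kept.
values = [to_float(f) for f in split_fields(row_end, "\t", strip_all_whitespace=False)]
print(values, math.isnan(values[-1]))
```

In this sketch a converter alone cannot recover a field that the splitting step has already discarded, which is consistent with the point above that a corrected split still requires the user to supply a suitable converter.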
